Git Internals: Objects, Refs, and the DAG

Shane

2/14/2026

A deep dive into Git's internal data model including blob, tree, commit, and tag objects, references, the DAG structure, packfiles, and plumbing commands.

git objects refs dag internals plumbing

Most developers use Git without understanding how it works internally. They memorize commands and workflows without knowing what commits actually are, how branches are implemented, or why merge and rebase produce different results. Understanding Git's internals transforms it from a mysterious tool into a predictable system. When something goes wrong, you know where to look. When a command does something unexpected, you understand why.

I learned Git internals by accident while debugging a corrupted repository. The knowledge has saved me countless hours since. This guide covers everything under the hood — the object model, the reference system, and the DAG that connects it all.

Prerequisites

Git installed
Basic Git usage (commit, branch, merge)
Comfort with the command line
Curiosity about how tools work

The Object Database

Git is fundamentally a content-addressable filesystem. Everything Git stores (files, directories, commits) is an object identified by the hash of its content: SHA-1 by default, with SHA-256 available as an opt-in object format in newer versions. The object database lives in .git/objects/.

The Four Object Types

blob    — file content (no filename, just data)
tree    — directory listing (maps filenames to blobs and subtrees)
commit  — snapshot pointer with metadata (author, message, parent)
tag     — named pointer to a commit (annotated tags)

Blob Objects

A blob stores file content. Nothing else — no filename, no permissions, no metadata.

# Create a file and add it
echo "test content" > greeting.txt
git add greeting.txt

# Find the blob object
git ls-files --stage
# 100644 d670460b4b4aece5915caf5c68d12f560a9fe3e4 0    greeting.txt

# Inspect the blob
git cat-file -t d670460
# blob

git cat-file -p d670460
# test content

# Two files with identical content share the same blob
echo "test content" > duplicate.txt
git add duplicate.txt
git ls-files --stage
# 100644 d670460b4b4aece5915caf5c68d12f560a9fe3e4 0    duplicate.txt
# 100644 d670460b4b4aece5915caf5c68d12f560a9fe3e4 0    greeting.txt

The hash is computed from a short header plus the content: SHA-1("blob <size>\0<content>"), where <size> is the content length in bytes written in decimal. Identical content always produces the same hash, which is how Git deduplicates identical files.
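To make the formula concrete, here is a minimal sketch that rebuilds a blob hash by hand and compares it with what Git reports. It assumes a sha1sum binary is available (on macOS, shasum would be the substitute) and a repository using the default SHA-1 object format; the variable names are only illustrative.

```shell
# Recompute a blob hash by hand: SHA-1 over "blob <size>\0<content>".
content='test content'

# echo would append a newline, so the stored content is the text plus "\n"
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Header, NUL byte, then the content itself
manual=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)

# Ask Git for the hash of the same bytes (no -w, so nothing is written)
via_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "size:   $size"
echo "manual: $manual"
echo "git:    $via_git"
```

If the two hashes match, the header format is right; changing a single byte of the content changes the hash completely.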

Tree Objects

A tree maps filenames to blobs (files) and other trees (subdirectories):

# After committing
git cat-file -p main^{tree}
# 100644 blob d670460b4b4aece5915caf5c68d12f560a9fe3e4    greeting.txt
# 100644 blob a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9    app.js
# 040000 tree 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b    src

Entry format:

<mode> <type> <hash>    <name>
100644 blob   abc123    file.js      (regular file)
100755 blob   def456    script.sh    (executable)
040000 tree   ghi789    src/         (directory)
120000 blob   jkl012    link         (symlink)

Trees are recursive. A tree for a project with nested directories contains subtrees:

root tree
├── 100644 blob abc123    package.json
├── 100644 blob def456    app.js
└── 040000 tree ghi789    src/
    ├── 100644 blob jkl012    index.js
    ├── 100644 blob mno345    utils.js
    └── 040000 tree pqr678    routes/
        ├── 100644 blob stu901    home.js
        └── 100644 blob vwx234    api.js
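The recursion is easy to see in a scratch repository. The sketch below is only an illustration: the temporary directory and the file names are made up, and git write-tree is used so that no commit or identity is needed.

```shell
# Build a nested layout in a throwaway repository and print its trees.
repo=$(mktemp -d) && cd "$repo" && git init -q
mkdir -p src/routes
echo '{}' > package.json
echo 'index' > src/index.js
echo 'home' > src/routes/home.js
git add .

# write-tree turns the index into tree objects and prints the root tree hash
root=$(git write-tree)

echo "--- root tree (one blob, one subtree)"
git ls-tree "$root"

echo "--- contents of the src subtree"
git ls-tree "$root" src/

echo "--- every file, recursing through all subtrees"
all_paths=$(git ls-tree -r --name-only "$root")
echo "$all_paths"
```

Each tree lists only its direct children, so the root tree never mentions home.js; that file is reached through src and then routes.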

Commit Objects

A commit points to a tree (the snapshot) and includes metadata:

git cat-file -p HEAD
# tree 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b
# parent abc1234def5678901234567890abcdef12345678
# author Shane <shane@example.com> 1707840000 -0800
# committer Shane <shane@example.com> 1707840000 -0800
#
# feat: add user authentication

Fields:

tree — the root tree object (the complete snapshot)
parent — the previous commit (merge commits have multiple parents)
author — who wrote the code (name, email, timestamp)
committer — who committed it (can differ from author in cherry-picks)
The commit message follows a blank line

A commit object is small, typically a few hundred bytes, no matter how many files changed, because it stores only a tree hash, parent hashes, and metadata. The tree it points to captures the entire project state.
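A quick way to check the size claim is to ask Git for the uncompressed size of a commit object. This is a sketch in a throwaway repository; the Demo identity and file names are placeholders, not anything from a real project.

```shell
# Measure a commit object and confirm it points at the index snapshot.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

for i in 1 2 3 4 5; do echo "file $i" > "file$i.txt"; done
git add . && git commit -q -m "feat: add five files"

commit_type=$(git cat-file -t HEAD)
commit_size=$(git cat-file -s HEAD)             # uncompressed size in bytes
tree_from_commit=$(git rev-parse 'HEAD^{tree}') # the snapshot the commit names
tree_from_index=$(git write-tree)               # same snapshot, so same hash

echo "type=$commit_type size=$commit_size bytes"
echo "tree in commit: $tree_from_commit"
echo "tree in index:  $tree_from_index"
```

Adding more files to the commit changes the tree, but the commit object itself grows only with the length of its message and metadata.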

Tag Objects

Annotated tags are objects that point to another object, almost always a commit:

git tag -a v1.0.0 -m "Release 1.0.0"

git cat-file -p v1.0.0
# object abc1234def5678901234567890abcdef12345678
# type commit
# tag v1.0.0
# tagger Shane <shane@example.com> 1707840000 -0800
#
# Release 1.0.0

Lightweight tags are not objects — they are just refs (pointers) without metadata.
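The difference shows up as soon as you ask Git what type each tag name resolves to. A small sketch, assuming a scratch repository and a placeholder Demo identity:

```shell
# Compare an annotated tag with a lightweight tag.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "initial"

git tag -a v1.0.0 -m "Release 1.0.0"   # annotated: creates a tag object
git tag v1.0.0-light                   # lightweight: only a ref

annotated_type=$(git cat-file -t v1.0.0)
light_type=$(git cat-file -t v1.0.0-light)

# Peel the annotated tag down to the commit it ultimately names
peeled=$(git rev-parse 'v1.0.0^{commit}')
head=$(git rev-parse HEAD)

echo "annotated -> $annotated_type, lightweight -> $light_type"
echo "peeled annotated tag: $peeled"
```

Both names lead to the same commit, but only the annotated tag has an object of its own carrying a tagger, date, and message.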

Object Storage

Loose Objects

New objects are stored as individual files:

# Object abc1234... is stored at:
.git/objects/ab/c1234def5678901234567890abcdef12345678

# First two characters = directory
# Remaining characters = filename

Objects are zlib-compressed. You can inspect them with plumbing commands:

# Type
git cat-file -t abc1234

# Content
git cat-file -p abc1234

# Size
git cat-file -s abc1234
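As a sketch of the directory layout, the following writes one blob and derives its on-disk path from the hash. It assumes a fresh scratch repository, where new objects start out loose.

```shell
# Write a blob and locate the loose object file that stores it.
repo=$(mktemp -d) && cd "$repo" && git init -q
hash=$(echo 'test content' | git hash-object -w --stdin)

dir=$(printf '%s' "$hash" | cut -c1-2)     # first two hex characters
file=$(printf '%s' "$hash" | cut -c3-)     # remaining characters
path=".git/objects/$dir/$file"

ls -l "$path"
obj_type=$(git cat-file -t "$hash")
obj_size=$(git cat-file -s "$hash")
echo "type=$obj_type size=$obj_size path=$path"
```

The file on disk is compressed, so its size usually differs from the size git cat-file -s reports.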

Packfiles

Git periodically packs loose objects into packfiles for efficiency:

ls .git/objects/pack/
# pack-abc123def456.idx    (index — lookup table)
# pack-abc123def456.pack   (data — compressed objects)

Packfiles use delta compression — similar objects are stored as deltas (differences) from a base object. This is extremely efficient for files that change incrementally.

# Manually trigger packing
git gc

# Verify packfile integrity
git verify-pack -v .git/objects/pack/pack-*.idx | head -20
# abc1234 commit 234 156 12
# def5678 tree   145 102 168
# ghi9012 blob   2890 1205 270
# jkl3456 blob   45 58 1475 1 ghi9012  ← delta from ghi9012

The last entry shows a delta object — it stores only the difference from ghi9012, saving space when similar file versions exist.
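To watch loose objects turn into a packfile, compare git count-objects before and after a manual git gc. This is a sketch under simple assumptions: a scratch repository, a placeholder Demo identity, and three small commits so that every object is reachable.

```shell
# Count loose objects, pack them, and count again.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

for i in 1 2 3; do
  echo "version $i" > notes.txt
  git add notes.txt
  git commit -q -m "notes v$i"
done

echo "--- before gc"
git count-objects -v

git gc -q

echo "--- after gc"
after=$(git count-objects -v)
echo "$after"

# "count" is the number of loose objects, "packs" the number of packfiles
loose_after=$(echo "$after" | awk '/^count:/ {print $2}')
packs_after=$(echo "$after" | awk '/^packs:/ {print $2}')
in_pack=$(echo "$after" | awk '/^in-pack:/ {print $2}')
```

Before packing, each blob, tree, and commit is its own file; afterwards they live together in a single pack with an index beside it.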

References (Refs)

Refs are human-readable names that point to object hashes, usually commits. With the default files backend they are stored as files under .git/refs/ or as lines in .git/packed-refs.

Branch Refs

cat .git/refs/heads/main
# abc1234def5678901234567890abcdef12345678

cat .git/refs/heads/feature-auth
# def5678901234567890abcdef12345678abc1234

A branch is literally a file containing a 40-character commit hash. Creating a branch is creating a file. That is why branches are cheap.
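Here is a small sketch that creates a branch and then looks at the file behind it. It assumes the default files ref backend and SHA-1 hashes (40 hex characters plus a newline); the branch name and Demo identity are placeholders.

```shell
# Create a branch and inspect the file that represents it.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "initial"

git branch feature-auth

ref_content=$(cat .git/refs/heads/feature-auth)
ref_bytes=$(wc -c < .git/refs/heads/feature-auth | tr -d ' ')
head_hash=$(git rev-parse HEAD)

echo "file content: $ref_content"
echo "file size:    $ref_bytes bytes"
echo "HEAD commit:  $head_hash"
```

No objects are copied when the branch is created; the new file simply repeats the hash of the current commit.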

HEAD

HEAD is a symbolic ref that points to the current branch:

cat .git/HEAD
# ref: refs/heads/main

# After checkout to a branch:
git checkout feature
cat .git/HEAD
# ref: refs/heads/feature

# Detached HEAD (pointing directly to a commit):
git checkout abc1234
cat .git/HEAD
# abc1234def5678901234567890abcdef12345678
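A script can tell the two states apart with git symbolic-ref, which succeeds only while HEAD is a symbolic ref. The sketch below assumes a scratch repository with one commit and a placeholder Demo identity.

```shell
# Distinguish an attached HEAD from a detached one.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "initial"

attached_target=$(git symbolic-ref HEAD)   # the branch ref HEAD points to
echo "attached: HEAD -> $attached_target"

git checkout -q --detach                   # HEAD now holds a commit hash directly

if git symbolic-ref -q HEAD >/dev/null; then
  state=attached
else
  state=detached
fi
head_file=$(cat .git/HEAD)
echo "state: $state, .git/HEAD contains: $head_file"
```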

Tag Refs

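cat .git/refs/tags/v1.0.0
# 7c9e1f0a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e
# For an annotated tag this is the hash of the tag object, not the commit.
# For a lightweight tag it is the commit hash itself.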

Remote Tracking Refs

cat .git/refs/remotes/origin/main
# abc1234def5678901234567890abcdef12345678

Packed Refs

When there are many refs, Git packs them into a single file:

cat .git/packed-refs
# # pack-refs with: peeled fully-peeled sorted
# abc1234def5678901234567890abcdef12345678 refs/heads/main
# def5678901234567890abcdef12345678abc1234 refs/heads/develop
# ghi9012345678901234567890abcdef12345678ab refs/tags/v1.0.0

Loose refs in .git/refs/ override packed refs. Git checks loose refs first, then falls back to packed-refs.
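The lookup order can be observed by packing the refs and then moving the branch. This is a sketch that assumes the default files ref backend, a scratch repository, and a placeholder Demo identity.

```shell
# Pack all refs, then show that a newer loose ref takes precedence.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m "first"

git pack-refs --all
if [ -f .git/refs/heads/main ]; then loose_after_pack=yes; else loose_after_pack=no; fi
echo "loose file present after packing: $loose_after_pack"

# A new commit writes a fresh loose ref; the packed entry is left stale
git commit -q --allow-empty -m "second"

packed=$(awk '$2 == "refs/heads/main" {print $1}' .git/packed-refs)
loose=$(cat .git/refs/heads/main)
resolved=$(git rev-parse main)

echo "packed:   $packed"
echo "loose:    $loose"
echo "resolved: $resolved"
```

The resolved value follows the loose file, while the packed entry still records the older commit until the refs are packed again.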

The Directed Acyclic Graph (DAG)

Every commit points to its parent(s), forming a directed acyclic graph:

A ← B ← C ← D ← E        (main)
         \       ↑
          F ← G ← H       (feature, merged at E with parents D and H)

Properties:

Directed — commits point to parents, not children
Acyclic — no cycles (a commit cannot be its own ancestor)
Roots — initial commits have no parent

Walking the DAG

# Show the DAG structure
git log --oneline --graph --all

# List all commits reachable from HEAD
git rev-list HEAD

# List commits reachable from main but not from feature
git rev-list feature..main

# List commits reachable from either but not both
git rev-list main...feature

# Find the common ancestor of two branches
git merge-base main feature
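To try these commands on a graph small enough to check by hand, the sketch below builds three commits: A on main, then B on main and F on feature, both children of A. The commit messages and Demo identity are placeholders.

```shell
# Build a tiny diverged history and query it.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "A"
first=$(git rev-parse HEAD)
git branch feature
git commit -q --allow-empty -m "B"        # main moves ahead
git checkout -q feature
git commit -q --allow-empty -m "F"        # feature diverges from A
git checkout -q main

git log --oneline --graph --all

only_main=$(git rev-list --count feature..main)   # on main, not on feature
either=$(git rev-list --count main...feature)     # on exactly one side
base=$(git merge-base main feature)               # common ancestor

echo "only on main: $only_main"
echo "on one side only: $either"
echo "merge base: $base"
```

With this layout the two-dot range contains only B, the three-dot range contains B and F, and the merge base is A.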

How Merge Works in the DAG

git merge feature

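Git finds the merge base (common ancestor), computes the changes from the base to each branch tip, and combines them in a three-way merge. When the branches have diverged, the resulting merge commit has two parents (if main has no new commits of its own, Git fast-forwards instead and creates no merge commit):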

git cat-file -p HEAD  # After merge
# tree ...
# parent abc1234   (main's previous HEAD)
# parent def5678   (feature's HEAD)
# ...
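The parent order can be confirmed in a scratch repository. This sketch uses empty commits and a placeholder Demo identity, and passes --no-ff and -m so the merge always creates a commit without opening an editor.

```shell
# Merge a diverged branch and inspect the parents of the merge commit.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "A"
git branch feature
git commit -q --allow-empty -m "B"
git checkout -q feature
git commit -q --allow-empty -m "F"
git checkout -q main

main_before=$(git rev-parse main)
feature_tip=$(git rev-parse feature)

git merge -q --no-ff -m "Merge feature" feature

parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
first_parent=$(git rev-parse 'HEAD^1')    # the branch you were on
second_parent=$(git rev-parse 'HEAD^2')   # the branch you merged in

echo "parents: $parent_count"
echo "first:   $first_parent"
echo "second:  $second_parent"
```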

How Rebase Works in the DAG

git rebase main  # On feature branch

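Git replays each commit from the feature branch on top of main. Each replayed commit is a new object with a new hash, because its parent (and usually its tree) has changed. The old commits become unreachable from the branch but remain recoverable through the reflog until it expires.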
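The following sketch shows the hash change directly. It assumes a scratch repository, a placeholder Demo identity, and changes in separate files so the rebase applies without conflicts.

```shell
# Rebase a one-commit branch and compare the old and new commit hashes.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo a > a.txt && git add a.txt && git commit -q -m "A"
git checkout -q -b feature
echo f > f.txt && git add f.txt && git commit -q -m "F"
git checkout -q main
echo b > b.txt && git add b.txt && git commit -q -m "B"

git checkout -q feature
old=$(git rev-parse HEAD)
git rebase -q main
new=$(git rev-parse HEAD)

new_parent=$(git rev-parse 'HEAD~1')
main_tip=$(git rev-parse main)
previous=$(git rev-parse 'feature@{1}')   # branch reflog: value before the rebase

echo "old F: $old"
echo "new F: $new (parent is main: $new_parent)"
echo "old commit still stored as: $(git cat-file -t "$old")"
```

The original commit is not deleted by the rebase; it just stops being reachable from the branch, which is why the reflog can bring it back.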

Plumbing Commands

Git has two layers: porcelain (user-facing) and plumbing (low-level).

Creating Objects Manually

# Create a blob from content
echo "hello" | git hash-object -w --stdin
# ce013625030ba8dba906f756967f9e9ca394464a

# Create a blob from a file
git hash-object -w myfile.js

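# Create a tree (each line is "<mode> <type> <hash><TAB><name>";
# every listed object must already exist in the object database)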
git mktree << 'EOF'
100644 blob ce013625030ba8dba906f756967f9e9ca394464a	hello.txt
100644 blob abc1234def5678901234567890abcdef12345678	app.js
EOF

# Create a commit
echo "Initial commit" | git commit-tree <tree-hash>
# Returns the new commit hash

Inspecting the Index (Staging Area)

The index is a binary file at .git/index that tracks what will go into the next commit:

# Show the index contents
git ls-files --stage
# 100644 abc1234def5678901234567890abcdef12345678 0    app.js
# 100644 def5678901234567890abcdef12345678abc1234 0    package.json

# Show unmerged entries (conflicts)
git ls-files --unmerged

# Update the index manually
git update-index --add --cacheinfo 100644,<hash>,filename
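As a sketch of how the index sits between objects and trees, the following stages a blob that never existed as a file in the working directory. The file name is arbitrary and the repository is a throwaway.

```shell
# Stage a blob directly in the index, with no file in the working tree.
repo=$(mktemp -d) && cd "$repo" && git init -q

blob=$(echo 'hello' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob",hello.txt

staged=$(git ls-files --stage)
echo "index: $staged"

# The index alone is enough to produce a tree object
tree=$(git write-tree)
git ls-tree "$tree"

if [ -e hello.txt ]; then in_worktree=yes; else in_worktree=no; fi
echo "hello.txt present in working tree: $in_worktree"
```

This is the same path git add takes, minus the step of reading the content from disk.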

Ref Operations

# Read a ref
git rev-parse HEAD
git rev-parse main
git rev-parse v1.0.0

# Create a ref
git update-ref refs/heads/new-branch abc1234

# Delete a ref
git update-ref -d refs/heads/old-branch

# List all refs
git for-each-ref
git for-each-ref --format='%(refname:short) %(objectname:short) %(subject)' refs/heads/

Complete Working Example: Building a Commit from Scratch

#!/bin/bash
# Build a commit using only plumbing commands
set -euo pipefail   # stop at the first failing step

# Initialize a new repo
mkdir plumbing-demo && cd plumbing-demo
git init

# 1. Create blob objects
BLOB_APP=$(echo 'var app = require("express")();' | git hash-object -w --stdin)
BLOB_PKG=$(echo '{"name": "demo", "version": "1.0.0"}' | git hash-object -w --stdin)
echo "Created blobs: app=$BLOB_APP pkg=$BLOB_PKG"

# 2. Create a tree object (mktree expects "<mode> <type> <hash><TAB><name>" per line)
TREE=$(printf '100644 blob %s\tapp.js\n100644 blob %s\tpackage.json\n' "$BLOB_APP" "$BLOB_PKG" | git mktree)
echo "Created tree: $TREE"

# 3. Verify the tree
git cat-file -p "$TREE"

# 4. Create a commit object (no -p parent given, so this is a root commit)
COMMIT=$(echo "Initial commit (built manually)" | \
  GIT_AUTHOR_NAME="Shane" GIT_AUTHOR_EMAIL="shane@example.com" \
  GIT_COMMITTER_NAME="Shane" GIT_COMMITTER_EMAIL="shane@example.com" \
  git commit-tree "$TREE")
echo "Created commit: $COMMIT"

# 5. Point main to the new commit
git update-ref refs/heads/main "$COMMIT"

# 6. Set HEAD to point to main
git symbolic-ref HEAD refs/heads/main

# 7. Check out the working tree
git checkout -f

# 8. Verify everything works
git log --oneline
git status
ls -la
cat app.js

echo "Done! A complete Git commit built from plumbing commands."

Common Issues and Troubleshooting

"loose object is corrupt"

A file in .git/objects/ has been corrupted, possibly by disk error or incomplete write:

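Fix: Run git fsck --full to identify the damaged object, then delete the corrupted loose file. If the same object also exists in a packfile, Git reads it from there once the loose copy is gone. Otherwise, recover it from another copy of the repository; note that git fetch origin may not re-download an object Git believes it already has, so copying the object from a fresh clone is often more reliable. git unpack-objects only helps with a pack that has first been moved out of .git/objects/pack/, because it skips objects the repository already contains, and it reads one pack at a time: git unpack-objects < /path/to/moved.pack. For severe corruption, clone fresh from the remote.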

Repository is very large despite few files

Old large objects remain in packfiles even after files are deleted from the working tree:

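Fix: Objects referenced by any commit in history persist, so deleting a file in a new commit does not shrink the repository. Use git filter-repo (a separate tool, not bundled with Git) to rewrite history without the large objects, then expire the reflog and repack, for example with git reflog expire --expire=now --all followed by git gc --prune=now --aggressive. Rewriting history changes commit hashes, so coordinate with anyone who has already cloned the repository.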

"unable to resolve reference" errors

A ref file in .git/refs/ is corrupted or empty:

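Fix: Check the ref file: cat .git/refs/heads/main. If it is empty or garbled, delete it; when .git/packed-refs still has an entry for the branch, Git falls back to that (possibly older) value. If the correct hash is known, for example from the last line of .git/logs/refs/heads/main, restore the ref with git update-ref refs/heads/main <hash> rather than writing the file by hand.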

Detached HEAD state after checkout

You checked out a commit hash or tag instead of a branch name:

Fix: Create a branch at the current position: git checkout -b my-branch. Or return to a branch: git checkout main. Detached HEAD is not an error — it just means HEAD points directly to a commit instead of through a branch ref.

Best Practices

Understand that branches are pointers. A branch is a 41-byte file. Creating, deleting, and switching branches is nearly free. Use branches liberally.
Use git cat-file -p to inspect objects. When debugging, look at the raw objects. They tell you exactly what Git sees, without porcelain formatting.
Let Git manage the object database. Do not manually modify files in .git/objects/. Use plumbing commands if you need low-level access.
Run git gc periodically on large repos. Garbage collection packs loose objects and removes unreachable ones once they are old enough. Git also triggers git gc --auto on its own after certain commands, and git maintenance can schedule it.
Remember the reflog is your safety net. Even after destructive operations, the reflog keeps references to old commits: by default 90 days, or 30 days for entries no longer reachable from the branch tip. Use git reflog to find lost work.
Study the DAG to understand merge and rebase. Once you see commits as nodes in a graph, merge creates a node with two parents, and rebase creates new copies of nodes under a different parent. The mental model makes all Git operations predictable.
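As an illustration of the reflog as a safety net, the sketch below throws away a commit with git reset --hard and then recovers it. It assumes a scratch repository with reflogs enabled (the default for non-bare repositories) and a placeholder Demo identity.

```shell
# Lose a commit with reset --hard, then recover it from the reflog.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > work.txt && git add work.txt && git commit -q -m "first"
echo two > work.txt && git commit -q -am "second"
lost=$(git rev-parse HEAD)

git reset -q --hard 'HEAD~1'          # "second" is no longer on the branch
git reflog -n 3                       # but the reflog still lists it

recovered=$(git rev-parse 'HEAD@{1}') # where HEAD was before the reset
git reset -q --hard "$recovered"

restored_content=$(cat work.txt)
echo "recovered commit: $recovered"
echo "work.txt now contains: $restored_content"
```

The same approach works after a bad rebase or an accidental branch deletion, as long as the reflog entry has not expired.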
