Git Internals: Objects, Refs, and the DAG

Deep dive into Git internals covering blob, tree, and commit objects, the DAG structure, refs, and building a mini Git in Node.js.

Overview

Most developers use Git every day without understanding what happens beneath the porcelain commands. At its core, Git is a content-addressable filesystem built on four object types, a directed acyclic graph of commits, and a thin layer of human-readable references pointing into that graph. Understanding these internals transforms Git from a mysterious tool into a predictable, debuggable system — and makes you dangerous when things go wrong.

Prerequisites

  • Comfortable working with Git from the command line (commit, branch, merge, rebase)
  • Node.js v16+ installed
  • Basic understanding of SHA-1 hashing and data structures (trees, graphs)
  • A Git repository to experiment with (create a throwaway one — you will poke at its internals)

The Four Git Object Types

Every object Git stores lives in .git/objects/. Every piece of data, whether a file, a directory listing, or a commit, is stored as one of four object types: blob, tree, commit, or tag. Each object is identified by the SHA-1 hash of its content, which gives Git its content-addressable storage model.

Content-Addressable Storage

Git does not track files by name. It tracks files by the SHA-1 hash of their contents. Two files with identical content produce the same hash and are stored exactly once. This is not an optimization detail — it is the fundamental design principle.

When you run git add README.md, Git computes the SHA-1 hash of the file's contents (with a header prefix), compresses the result with zlib, and writes it to .git/objects/ab/cdef1234.... The first two characters of the hash become the directory name, and the remaining 38 become the filename. This fan-out structure keeps any single directory from holding too many files.

The hash is computed as:

SHA-1("blob <size>\0<content>")

Where <size> is the byte length of the content and \0 is a null byte. This header prefix is what distinguishes a blob hash from a tree hash even if they happened to contain the same raw bytes.

Blobs: File Content

A blob stores raw file content. Nothing else — no filename, no permissions, no timestamps. Just bytes. This is a critical point that confuses people: the blob does not know what file it belongs to. That mapping lives in the tree object.

You can inspect any blob with git cat-file:

# Show the type of an object
git cat-file -t 5b6e7f8
# Output: blob

# Show the content of a blob
git cat-file -p 5b6e7f8
# Output: (the raw file contents)

# Show the size of the object
git cat-file -s 5b6e7f8
# Output: 2048

Because blobs are content-addressed, renaming a file does not create a new blob, and Git records no explicit rename information. Instead, it infers renames by comparing blobs across trees: the same hash under a different path is an exact rename, and for files that changed during the rename Git falls back to a content-similarity heuristic. This is why git log --follow works, and why it sometimes gets confused when a file is both renamed and heavily modified in the same commit.

Trees: Directory Structure

A tree object represents a directory listing. Each entry in a tree contains a file mode, an object type, a SHA-1 hash, and a filename. Trees can reference blobs (files) and other trees (subdirectories), forming a recursive structure that mirrors your working directory.

git cat-file -p HEAD^{tree}

Output:

100644 blob a1b2c3d4e5f6...    README.md
100644 blob f6e5d4c3b2a1...    package.json
040000 tree 1a2b3c4d5e6f...    src

The file modes tell you what kind of entry it is:

  • 100644 — regular file
  • 100755 — executable file
  • 040000 — subdirectory (tree)
  • 120000 — symbolic link
  • 160000 — gitlink (submodule reference)

A tree is hashed the same way as a blob — SHA-1("tree <size>\0<entries>") — where entries are binary-encoded. If you change one file deep in a subdirectory, every tree from that file up to the root gets a new hash. This cascading hash property is what makes Git's integrity model work: tampering with any object invalidates every object that references it.

Commits: Snapshots in Time

A commit object ties everything together. It contains:

  • A pointer to the root tree (the snapshot of the entire project)
  • Zero or more parent commit hashes (zero for the initial commit, one for normal commits, two or more for merges)
  • Author name, email, and timestamp
  • Committer name, email, and timestamp
  • The commit message

git cat-file -p HEAD

Output:

tree 4b825dc642cb6eb9a060e54bf899d1f2f430f7b5
parent 8a3b5c7d9e1f2a4b6c8d0e2f4a6b8c0d2e4f6a8b
author Shane Larson <[email protected]> 1706745600 -0800
committer Shane Larson <[email protected]> 1706745600 -0800

Add user authentication module

The distinction between author and committer matters. The author is who originally wrote the change. The committer is who applied it. They differ when you cherry-pick someone else's commit, or when patches are applied via email (the Linux kernel workflow that Git was built for).

Tags: Named References to Objects

An annotated tag is the fourth object type. It points to another object (usually a commit), includes a tagger identity and timestamp, and contains a message. Lightweight tags are not objects at all — they are just refs (more on that later).

git cat-file -p v1.0.0

Output:

object 8a3b5c7d9e1f2a4b6c8d0e2f4a6b8c0d2e4f6a8b
type commit
tag v1.0.0
tagger Shane Larson <[email protected]> 1706745600 -0800

Release version 1.0.0 - stable API

Annotated tags are cryptographically tied to the object they reference, making them suitable for release signing with GPG.


The Directed Acyclic Graph (DAG)

Every commit points to its parent(s). This creates a graph structure. It is directed (edges go from child to parent), and it is acyclic (you cannot follow parent pointers in a circle — that would require a commit to exist before itself). This structure is called a Directed Acyclic Graph, and it is the backbone of Git's branching and merging model.

         E---F  (feature)
        /     \
   A---B---C---G---H  (main)
            \     /
             D---I  (hotfix)

In this DAG:

  • A is the root commit (no parents)
  • B has A as its parent
  • E branches off from B
  • G is a merge commit with parents C and F
  • H is another merge commit with parents G and I

Every commit is reachable by traversing parent pointers. The set of commits reachable from a ref defines that branch's history. When Git says a branch "contains" a commit, it means that commit is reachable by walking the parent graph from the branch tip.
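
The reachability rule is easy to express in code. Here is a sketch using a plain object mapping each commit to its parents, with letters mirroring the diagram above:

```javascript
// Does walking parent pointers from `tip` ever reach `target`?
function contains(parentsOf, tip, target) {
    var stack = [tip];
    var seen = {};
    while (stack.length > 0) {
        var hash = stack.pop();
        if (hash === target) return true;
        if (seen[hash]) continue;
        seen[hash] = true;
        (parentsOf[hash] || []).forEach(function (p) { stack.push(p); });
    }
    return false;
}

// The DAG from the diagram: G merges C and F, H merges G and I
var dag = {
    A: [], B: ["A"], C: ["B"], D: ["C"], E: ["B"],
    F: ["E"], G: ["C", "F"], I: ["D"], H: ["G", "I"]
};

console.log(contains(dag, "H", "E")); // true  - E is reachable via G, then F
console.log(contains(dag, "F", "D")); // false - feature never saw the hotfix
```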

Merge Commits in the DAG

A merge commit has two or more parents. The first parent is conventionally the branch you merged into, and the second parent is the branch you merged from. This convention matters for git log --first-parent, which shows only the mainline history by following first-parent links.

# Show the parents of a merge commit
git cat-file -p HEAD
# tree ...
# parent abc1234  (first parent - the branch you were on)
# parent def5678  (second parent - the branch you merged in)

Understanding this structure explains why git log sometimes shows a confusing interleaving of commits from different branches: by default it walks all parents and orders commits by commit timestamp (newest first), so histories from merged branches interleave chronologically. Pass --topo-order to force a strict topological ordering of the DAG instead.

Rebase as DAG Rewriting

Rebase does not move commits. It creates entirely new commit objects with new hashes and new parent pointers, then moves the branch ref to point at the new chain. The old commits still exist (temporarily) in the object store until garbage collection removes them.

Before rebase:
      C---D---E  (feature)
     /
A---B---F---G  (main)

After rebase:
              C'--D'--E'  (feature)
             /
A---B---F---G  (main)

C', D', and E' are new commit objects. They have the same diffs and messages as C, D, and E, but different hashes because their parent pointers changed. The original C, D, E still exist in the object store — they are just unreachable. This is why you can recover from a bad rebase using the reflog.


Refs: Human-Readable Pointers

Nobody wants to type 8a3b5c7d9e1f2a4b6c8d0e2f4a6b8c0d2e4f6a8b to reference a commit. Refs are human-readable names that point to object hashes. They live in .git/refs/ and are plain text files containing a single SHA-1 hash.

# A branch is just a file containing a commit hash
cat .git/refs/heads/main
# 8a3b5c7d9e1f2a4b6c8d0e2f4a6b8c0d2e4f6a8b

# Tags are also refs
cat .git/refs/tags/v1.0.0
# 9f8e7d6c5b4a3f2e1d0c9b8a7f6e5d4c3b2a1f0e

# Remote tracking branches
cat .git/refs/remotes/origin/main
# 8a3b5c7d9e1f2a4b6c8d0e2f4a6b8c0d2e4f6a8b

HEAD: The Current Position

HEAD is a special ref stored in .git/HEAD. It usually contains a symbolic reference to a branch:

cat .git/HEAD
# ref: refs/heads/main

This indirection is what makes git commit advance the current branch. When you commit, Git updates the ref that HEAD points to. When you git checkout a specific commit hash instead of a branch name, HEAD points directly to a commit hash — this is the "detached HEAD" state.

Packed Refs

As a repository grows, thousands of individual ref files become slow. Git periodically packs refs into a single file at .git/packed-refs:

# pack-refs with: peeled fully-peeled sorted
8a3b5c7d9e1f2a4b6c8d0e2f4a6b8c0d2e4f6a8b refs/heads/main
5c4d3e2f1a0b9c8d7e6f5a4b3c2d1e0f9a8b7c6d refs/tags/v1.0.0
^1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b

Git checks loose refs first, then falls back to packed-refs. This is why you can have a loose ref override a packed ref — the loose ref takes precedence.


The Reflog: Git's Safety Net

The reflog records every change to every ref. Every commit, merge, rebase, checkout, reset — everything that moves a ref gets logged. The reflog is local-only and is your best friend when you need to recover from mistakes.

# Show reflog for HEAD
git reflog
# abc1234 HEAD@{0}: commit: Fix authentication bug
# def5678 HEAD@{1}: rebase finished: returning to refs/heads/feature
# 789abcd HEAD@{2}: rebase: Add user model
# ...

# Show reflog for a specific branch
git reflog show main

By default, reflog entries expire after 90 days for reachable commits and after 30 days for unreachable ones. Both windows are configurable via gc.reflogExpire and gc.reflogExpireUnreachable.
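
If you want a longer safety net, both windows can be widened; the values below are illustrative:

```shell
# Keep reflog entries longer than the defaults (illustrative values)
git config gc.reflogExpire "180 days"
git config gc.reflogExpireUnreachable "60 days"
```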

To recover a lost commit:

# Find the commit hash in the reflog
git reflog

# Create a branch pointing to it
git branch recovered-work abc1234

# Or cherry-pick it onto your current branch
git cherry-pick abc1234

Pack Files and Object Compression

Storing every version of every file as a separate compressed object is wasteful. A 1 MB file with a one-line change creates another ~1 MB object. Git solves this with pack files.

When you run git gc (or when Git auto-triggers it), loose objects get packed into .git/objects/pack/*.pack files. Inside a pack file, Git stores similar objects as deltas: one version is stored in full, and the others are stored as instructions for reconstructing them from it. Counterintuitively, Git tends to keep the newest version as the full base and store older versions as deltas against it, since recent history is the data accessed most often.
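
Git's actual delta encoding is a compact binary stream of copy and insert opcodes. The idea is simple enough to sketch with plain objects; this toy format is for illustration only, not Git's on-disk encoding:

```javascript
// Toy delta: each op either copies a range from the base or inserts new text
function applyDelta(base, ops) {
    var out = "";
    ops.forEach(function (op) {
        if (op.copy) {
            out += base.substring(op.copy[0], op.copy[0] + op.copy[1]);
        } else {
            out += op.insert;
        }
    });
    return out;
}

var base = "The quick brown fox";
var delta = [
    { copy: [0, 10] },   // "The quick "
    { insert: "red" },   // new content
    { copy: [15, 4] }    // " fox"
];

console.log(applyDelta(base, delta)); // "The quick red fox"
```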

# See pack file statistics
git count-objects -vH

# Manually trigger packing
git gc

# Aggressively repack (useful for imported repos)
git gc --aggressive

Pack files have an accompanying .idx index file that enables O(log n) lookup of any object by hash. This is how Git stays fast even with millions of objects — it binary-searches the index to find the offset in the pack file, then reads and decompresses only the relevant delta chain.

How git gc Optimizes Storage

git gc does several things:

  1. Packs loose objects into pack files with delta compression
  2. Removes unreachable objects (commits with no ref or reflog entry pointing to them)
  3. Packs refs from individual files into .git/packed-refs
  4. Prunes old reflog entries past the expiry window
  5. Optionally rewrites the commit-graph file for faster traversal (when gc.writeCommitGraph is enabled)

Git triggers auto-gc when the number of loose objects exceeds gc.auto (default: 6700) or the number of pack files exceeds gc.autoPackLimit (default: 50).


Building a Mini Git in Node.js

Let us build a Node.js tool that reads Git's object store directly. This is the best way to internalize how Git actually works.

Hashing Objects

var crypto = require("crypto");
var zlib = require("zlib");
var fs = require("fs");
var path = require("path");

function hashObject(type, content) {
    var header = type + " " + Buffer.byteLength(content) + "\0";
    var store = Buffer.concat([Buffer.from(header), Buffer.from(content)]);
    var hash = crypto.createHash("sha1").update(store).digest("hex");
    return { hash: hash, buffer: store };
}

// Verify against Git
var result = hashObject("blob", "Hello, Git internals!\n");
console.log("SHA-1:", result.hash);
// Run: printf "Hello, Git internals!\n" | git hash-object --stdin
// The two hashes should match

Reading Objects from the Store

var crypto = require("crypto");
var zlib = require("zlib");
var fs = require("fs");
var path = require("path");

function readObject(gitDir, hash) {
    var objectPath = path.join(
        gitDir, "objects", hash.substring(0, 2), hash.substring(2)
    );

    if (!fs.existsSync(objectPath)) {
        return readPackedObject(gitDir, hash);
    }

    var compressed = fs.readFileSync(objectPath);
    var buffer = zlib.inflateSync(compressed);
    var nullIndex = buffer.indexOf(0);
    var header = buffer.slice(0, nullIndex).toString();
    var parts = header.split(" ");
    var type = parts[0];
    var size = parseInt(parts[1], 10);
    var content = buffer.slice(nullIndex + 1);

    return { type: type, size: size, content: content };
}

function readPackedObject(gitDir, hash) {
    // Pack file reading is complex - simplified version
    var packDir = path.join(gitDir, "objects", "pack");
    if (!fs.existsSync(packDir)) {
        throw new Error("Object not found: " + hash);
    }

    var files = fs.readdirSync(packDir);
    var idxFiles = files.filter(function(f) { return f.endsWith(".idx"); });

    for (var i = 0; i < idxFiles.length; i++) {
        var offset = findInPackIndex(
            path.join(packDir, idxFiles[i]), hash
        );
        if (offset !== null) {
            var packFile = idxFiles[i].replace(".idx", ".pack");
            return readFromPack(
                path.join(packDir, packFile), offset
            );
        }
    }

    throw new Error("Object not found in packs: " + hash);
}

function findInPackIndex(idxPath, hash) {
    var idx = fs.readFileSync(idxPath);

    // Pack index v2 format
    // Header: 4 bytes magic + 4 bytes version
    var magic = idx.readUInt32BE(0);
    if (magic !== 0xff744f63) {
        return null; // Not a v2 index
    }

    // Fan-out table: 256 entries of 4 bytes each, starting at offset 8
    var firstByte = parseInt(hash.substring(0, 2), 16);
    var prevCount = firstByte > 0 ? idx.readUInt32BE(8 + (firstByte - 1) * 4) : 0;
    var count = idx.readUInt32BE(8 + firstByte * 4);
    var totalObjects = idx.readUInt32BE(8 + 255 * 4);

    // SHA-1 table starts at offset 8 + 256*4 = 1032
    var shaTableOffset = 1032;
    var hashBuf = Buffer.from(hash, "hex");

    for (var i = prevCount; i < count; i++) {
        var entrySha = idx.slice(
            shaTableOffset + i * 20,
            shaTableOffset + (i + 1) * 20
        );
        if (entrySha.equals(hashBuf)) {
            // Found it - read offset from offset table
            // CRC table: totalObjects * 4 bytes after SHA table
            // Offset table: after CRC table
            var offsetTableStart = shaTableOffset +
                totalObjects * 20 + totalObjects * 4;
            var packOffset = idx.readUInt32BE(offsetTableStart + i * 4);
            // Simplification: if the high bit is set, the real offset lives
            // in a separate 64-bit table (packs larger than 2 GB)
            return packOffset;
        }
    }

    return null;
}

function readFromPack(packPath, offset) {
    var pack = fs.readFileSync(packPath);
    var pos = offset;

    // Read object header (variable-length encoding)
    var byte = pack[pos++];
    var type = (byte >> 4) & 0x07;
    var size = byte & 0x0f;
    var shift = 4;

    while (byte & 0x80) {
        byte = pack[pos++];
        size |= (byte & 0x7f) << shift;
        shift += 7;
    }

    var typeNames = {
        1: "commit", 2: "tree", 3: "blob", 4: "tag",
        6: "ofs_delta", 7: "ref_delta"
    };

    var typeName = typeNames[type] || "unknown";

    // For non-delta objects, decompress directly; the slice also contains
    // the next objects' bytes, so stop at the end of the deflate stream
    if (type >= 1 && type <= 4) {
        var content = zlib.inflateSync(pack.slice(pos), {
            finishFlush: zlib.constants.Z_SYNC_FLUSH
        });
        return { type: typeName, size: size, content: content.slice(0, size) };
    }

    // Delta objects require resolving the base - simplified
    throw new Error("Delta object resolution not implemented");
}

Parsing Tree Objects

Trees use a binary format that needs special handling:

function parseTree(content) {
    var entries = [];
    var offset = 0;

    while (offset < content.length) {
        // Find the space separating mode from name
        var spaceIdx = content.indexOf(0x20, offset);
        var mode = content.slice(offset, spaceIdx).toString();

        // Find the null byte separating name from hash
        var nullIdx = content.indexOf(0x00, spaceIdx + 1);
        var name = content.slice(spaceIdx + 1, nullIdx).toString();

        // Next 20 bytes are the SHA-1 hash
        var hash = content.slice(nullIdx + 1, nullIdx + 21)
            .toString("hex");

        // "40000" = subtree; "160000" = gitlink; everything else is a blob
        var entryType = mode === "40000" ? "tree"
            : mode === "160000" ? "commit" : "blob";

        entries.push({
            mode: mode,
            type: entryType,
            hash: hash,
            name: name
        });

        offset = nullIdx + 21;
    }

    return entries;
}
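
To sanity-check the entry layout without a real repository, you can build one entry by hand: the mode and name as text, a space, a null byte, then 20 raw hash bytes. (The hash used here is the blob hash of "hello\n"; the filename is made up.)

```javascript
// Build a single tree entry by hand: "<mode> <name>\0" + 20 raw hash bytes
var mode = "100644";
var name = "hello.txt";
var blobHash = "ce013625030ba8dba906f756967f9e9ca394464a";

var entry = Buffer.concat([
    Buffer.from(mode + " " + name + "\0"),
    Buffer.from(blobHash, "hex")
]);

// Parse it back the same way parseTree does
var spaceIdx = entry.indexOf(0x20);
var nullIdx = entry.indexOf(0x00, spaceIdx + 1);

console.log(entry.slice(0, spaceIdx).toString());            // 100644
console.log(entry.slice(spaceIdx + 1, nullIdx).toString());  // hello.txt
console.log(entry.slice(nullIdx + 1, nullIdx + 21).toString("hex") === blobHash); // true
```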

Parsing Commit Objects

function parseCommit(content) {
    var text = content.toString();
    var lines = text.split("\n");
    var commit = { parents: [] };
    var i = 0;

    // Parse header fields
    while (i < lines.length && lines[i] !== "") {
        var line = lines[i];
        if (line.startsWith("tree ")) {
            commit.tree = line.substring(5);
        } else if (line.startsWith("parent ")) {
            commit.parents.push(line.substring(7));
        } else if (line.startsWith("author ")) {
            commit.author = parseIdentity(line.substring(7));
        } else if (line.startsWith("committer ")) {
            commit.committer = parseIdentity(line.substring(10));
        }
        i++;
    }

    // Rest is the commit message
    commit.message = lines.slice(i + 1).join("\n").trim();
    return commit;
}

function parseIdentity(raw) {
    var match = raw.match(/^(.+) <(.+)> (\d+) ([+-]\d{4})$/);
    if (!match) return { raw: raw };
    return {
        name: match[1],
        email: match[2],
        timestamp: parseInt(match[3], 10),
        timezone: match[4]
    };
}

Reading Refs

function readRef(gitDir, refPath) {
    var fullPath = path.join(gitDir, refPath);
    if (!fs.existsSync(fullPath)) {
        return readPackedRef(gitDir, refPath);
    }

    var content = fs.readFileSync(fullPath, "utf8").trim();

    // Symbolic ref (like HEAD)
    if (content.startsWith("ref: ")) {
        var target = content.substring(5);
        return { type: "symbolic", target: target, hash: readRef(gitDir, target) };
    }

    return content;
}

function readPackedRef(gitDir, refPath) {
    var packedPath = path.join(gitDir, "packed-refs");
    if (!fs.existsSync(packedPath)) return null;

    var lines = fs.readFileSync(packedPath, "utf8").split("\n");
    for (var i = 0; i < lines.length; i++) {
        var line = lines[i];
        if (line.startsWith("#") || line.startsWith("^")) continue;
        var parts = line.split(" ");
        if (parts.length === 2 && parts[1] === refPath) {
            return parts[0];
        }
    }
    return null;
}

function listBranches(gitDir) {
    var branches = [];
    var headsDir = path.join(gitDir, "refs", "heads");

    function walk(dir, prefix) {
        if (!fs.existsSync(dir)) return;
        var entries = fs.readdirSync(dir, { withFileTypes: true });
        entries.forEach(function(entry) {
            var refName = prefix ? prefix + "/" + entry.name : entry.name;
            if (entry.isDirectory()) {
                walk(path.join(dir, entry.name), refName);
            } else {
                var hash = fs.readFileSync(
                    path.join(dir, entry.name), "utf8"
                ).trim();
                branches.push({ name: refName, hash: hash });
            }
        });
    }

    walk(headsDir, "");
    return branches;
}

Complete Working Example: Git DAG Visualizer

Here is a complete Node.js tool that reads a Git repository, walks the commit graph, and outputs a textual visualization of the DAG with branch labels.

#!/usr/bin/env node

// git-dag-visualizer.js
// Usage: node git-dag-visualizer.js [path-to-repo] [--depth N]

var crypto = require("crypto");
var zlib = require("zlib");
var fs = require("fs");
var path = require("path");

// ---- Configuration ----

var args = process.argv.slice(2);
var repoPath = ".";
var maxDepth = 20;

args.forEach(function(arg, i) {
    if (arg === "--depth" && args[i + 1]) {
        maxDepth = parseInt(args[i + 1], 10);
    } else if (!arg.startsWith("--")) {
        repoPath = arg;
    }
});

var gitDir = path.join(repoPath, ".git");
if (!fs.existsSync(gitDir)) {
    console.error("Not a git repository: " + repoPath);
    process.exit(1);
}

// ---- Object Reading ----

function readLooseObject(hash) {
    var objectPath = path.join(
        gitDir, "objects", hash.substring(0, 2), hash.substring(2)
    );
    if (!fs.existsSync(objectPath)) return null;

    var compressed = fs.readFileSync(objectPath);
    var buffer = zlib.inflateSync(compressed);
    var nullIndex = buffer.indexOf(0);
    var header = buffer.slice(0, nullIndex).toString();
    var parts = header.split(" ");

    return {
        type: parts[0],
        size: parseInt(parts[1], 10),
        content: buffer.slice(nullIndex + 1)
    };
}

function readPackedObject(hash) {
    var packDir = path.join(gitDir, "objects", "pack");
    if (!fs.existsSync(packDir)) return null;

    var files = fs.readdirSync(packDir);
    var idxFiles = files.filter(function(f) { return f.endsWith(".idx"); });
    var hashBuf = Buffer.from(hash, "hex");

    for (var i = 0; i < idxFiles.length; i++) {
        var idxPath = path.join(packDir, idxFiles[i]);
        var idx = fs.readFileSync(idxPath);

        if (idx.readUInt32BE(0) !== 0xff744f63) continue;

        var firstByte = parseInt(hash.substring(0, 2), 16);
        var prevCount = firstByte > 0
            ? idx.readUInt32BE(8 + (firstByte - 1) * 4)
            : 0;
        var count = idx.readUInt32BE(8 + firstByte * 4);
        var totalObjects = idx.readUInt32BE(8 + 255 * 4);
        var shaTableOffset = 1032;

        for (var j = prevCount; j < count; j++) {
            var entrySha = idx.slice(
                shaTableOffset + j * 20,
                shaTableOffset + (j + 1) * 20
            );
            if (entrySha.equals(hashBuf)) {
                var offsetTableStart = shaTableOffset +
                    totalObjects * 20 + totalObjects * 4;
                var packOffset = idx.readUInt32BE(offsetTableStart + j * 4);
                var packFile = idxFiles[i].replace(".idx", ".pack");
                var packPath = path.join(packDir, packFile);
                return readFromPackFile(packPath, packOffset);
            }
        }
    }
    return null;
}

function readFromPackFile(packPath, offset) {
    var pack = fs.readFileSync(packPath);
    var pos = offset;
    var byte = pack[pos++];
    var type = (byte >> 4) & 0x07;
    var size = byte & 0x0f;
    var shift = 4;

    while (byte & 0x80) {
        byte = pack[pos++];
        size |= (byte & 0x7f) << shift;
        shift += 7;
    }

    var typeMap = { 1: "commit", 2: "tree", 3: "blob", 4: "tag" };
    if (!typeMap[type]) return null;

    try {
        // Tolerate the trailing pack bytes after this object's deflate stream
        var content = zlib.inflateSync(pack.slice(pos), {
            finishFlush: zlib.constants.Z_SYNC_FLUSH
        });
        return {
            type: typeMap[type],
            size: size,
            content: content.slice(0, size)
        };
    } catch (e) {
        return null;
    }
}

function readObject(hash) {
    var obj = readLooseObject(hash);
    if (obj) return obj;
    return readPackedObject(hash);
}

// ---- Commit Parsing ----

function parseCommitObject(content) {
    var text = content.toString("utf8");
    var lines = text.split("\n");
    var commit = { parents: [] };
    var i = 0;

    while (i < lines.length && lines[i] !== "") {
        if (lines[i].startsWith("tree ")) {
            commit.tree = lines[i].substring(5);
        } else if (lines[i].startsWith("parent ")) {
            commit.parents.push(lines[i].substring(7));
        } else if (lines[i].startsWith("author ")) {
            var authorMatch = lines[i].match(
                /^author (.+) <(.+)> (\d+) ([+-]\d{4})$/
            );
            if (authorMatch) {
                commit.authorName = authorMatch[1];
                commit.authorDate = new Date(
                    parseInt(authorMatch[3], 10) * 1000
                );
            }
        }
        i++;
    }

    commit.message = lines.slice(i + 1).join("\n").trim();
    var firstLine = commit.message.split("\n")[0];
    commit.subject = firstLine.length > 60
        ? firstLine.substring(0, 57) + "..."
        : firstLine;

    return commit;
}

// ---- Ref Reading ----

function resolveRef(refPath) {
    var fullPath = path.join(gitDir, refPath);
    if (fs.existsSync(fullPath)) {
        var content = fs.readFileSync(fullPath, "utf8").trim();
        if (content.startsWith("ref: ")) {
            return resolveRef(content.substring(5));
        }
        return content;
    }

    // Check packed-refs
    var packedPath = path.join(gitDir, "packed-refs");
    if (fs.existsSync(packedPath)) {
        var lines = fs.readFileSync(packedPath, "utf8").split("\n");
        for (var i = 0; i < lines.length; i++) {
            if (lines[i].startsWith("#") || lines[i].startsWith("^")) continue;
            var parts = lines[i].split(" ");
            if (parts.length >= 2 && parts[1] === refPath) {
                return parts[0];
            }
        }
    }
    return null;
}

function getAllRefs() {
    var refs = {};

    // HEAD
    var headContent = fs.readFileSync(
        path.join(gitDir, "HEAD"), "utf8"
    ).trim();
    var headHash;
    var headBranch = null;
    if (headContent.startsWith("ref: ")) {
        headBranch = headContent.substring(5);
        headHash = resolveRef(headBranch);
    } else {
        headHash = headContent;
    }
    if (headHash) {
        if (!refs[headHash]) refs[headHash] = [];
        refs[headHash].push("HEAD");
    }

    // Branches
    function walkRefs(dir, prefix) {
        if (!fs.existsSync(dir)) return;
        var entries = fs.readdirSync(dir, { withFileTypes: true });
        entries.forEach(function(entry) {
            var refName = prefix + "/" + entry.name;
            if (entry.isDirectory()) {
                walkRefs(path.join(dir, entry.name), refName);
            } else {
                var hash = fs.readFileSync(
                    path.join(dir, entry.name), "utf8"
                ).trim();
                if (!refs[hash]) refs[hash] = [];
                refs[hash].push(refName.replace("refs/heads/", "")
                    .replace("refs/tags/", "tag: "));
            }
        });
    }

    walkRefs(path.join(gitDir, "refs", "heads"), "refs/heads");
    walkRefs(path.join(gitDir, "refs", "tags"), "refs/tags");

    // Packed refs
    var packedPath = path.join(gitDir, "packed-refs");
    if (fs.existsSync(packedPath)) {
        var lines = fs.readFileSync(packedPath, "utf8").split("\n");
        lines.forEach(function(line) {
            if (line.startsWith("#") || line.startsWith("^") || !line.trim()) return;
            var parts = line.split(" ");
            if (parts.length >= 2) {
                var hash = parts[0];
                var name = parts[1]
                    .replace("refs/heads/", "")
                    .replace("refs/tags/", "tag: ");
                if (!refs[hash]) refs[hash] = [];
                if (refs[hash].indexOf(name) === -1) {
                    refs[hash].push(name);
                }
            }
        });
    }

    return refs;
}

// ---- DAG Walking ----

function walkDAG(startHashes, depth) {
    var visited = {};
    var commits = [];
    var queue = [];

    startHashes.forEach(function(hash) {
        queue.push({ hash: hash, depth: 0 });
    });

    while (queue.length > 0) {
        var item = queue.shift();
        if (visited[item.hash] || item.depth >= depth) continue;
        visited[item.hash] = true;

        var obj = readObject(item.hash);
        if (!obj || obj.type !== "commit") continue;

        var commit = parseCommitObject(obj.content);
        commit.hash = item.hash;
        commit.depth = item.depth;
        commits.push(commit);

        commit.parents.forEach(function(parentHash) {
            if (!visited[parentHash]) {
                queue.push({ hash: parentHash, depth: item.depth + 1 });
            }
        });
    }

    // Sort by date descending
    commits.sort(function(a, b) {
        return (b.authorDate || 0) - (a.authorDate || 0);
    });

    return commits;
}

// ---- Visualization ----

function visualizeDAG(commits, refs) {
    console.log("\n=== Git DAG Visualization ===\n");
    console.log("Repository: " + path.resolve(repoPath));
    console.log("Commits shown: " + commits.length);
    console.log("");

    commits.forEach(function(commit) {
        var shortHash = commit.hash.substring(0, 7);
        var refLabels = refs[commit.hash];
        var isMerge = commit.parents.length > 1;
        var marker = isMerge ? "M" : "*";

        // Format ref labels
        var labelStr = "";
        if (refLabels && refLabels.length > 0) {
            labelStr = " (" + refLabels.join(", ") + ")";
        }

        // Format date
        var dateStr = "";
        if (commit.authorDate) {
            dateStr = commit.authorDate.toISOString().substring(0, 10);
        }

        // Print the commit line
        var line = "  " + marker + " " + shortHash;
        line += " [" + dateStr + "]";
        line += labelStr;
        line += " - " + commit.subject;

        if (isMerge) {
            var parentShorts = commit.parents.map(function(p) {
                return p.substring(0, 7);
            });
            line += " (merge: " + parentShorts.join(" + ") + ")";
        }

        console.log(line);

        // Draw connectors for parents
        if (commit.parents.length === 1) {
            console.log("  |");
        } else if (commit.parents.length > 1) {
            console.log("  |\\");
        }
    });

    console.log("");
}

// ---- Statistics ----

function printStats(commits) {
    var mergeCount = 0;
    var rootCount = 0;
    var authors = {};

    commits.forEach(function(commit) {
        if (commit.parents.length > 1) mergeCount++;
        if (commit.parents.length === 0) rootCount++;
        if (commit.authorName) {
            if (!authors[commit.authorName]) {
                authors[commit.authorName] = 0;
            }
            authors[commit.authorName]++;
        }
    });

    console.log("=== DAG Statistics ===\n");
    console.log("Total commits:  " + commits.length);
    console.log("Merge commits:  " + mergeCount);
    console.log("Root commits:   " + rootCount);
    console.log("");
    console.log("Authors:");

    var authorList = Object.keys(authors).map(function(name) {
        return { name: name, count: authors[name] };
    });
    authorList.sort(function(a, b) { return b.count - a.count; });
    authorList.forEach(function(author) {
        console.log("  " + author.name + ": " + author.count + " commits");
    });

    console.log("");
}

// ---- Tree Inspection ----

function inspectCommitTree(hash) {
    var obj = readObject(hash);
    if (!obj || obj.type !== "commit") {
        console.error("Not a commit: " + hash);
        return;
    }

    var commit = parseCommitObject(obj.content);
    console.log("\n=== Tree for commit " + hash.substring(0, 7) + " ===\n");

    function walkTree(treeHash, prefix) {
        var treeObj = readObject(treeHash);
        if (!treeObj || treeObj.type !== "tree") return;

        var entries = parseTreeObject(treeObj.content);
        entries.forEach(function(entry) {
            var typeChar = entry.type === "tree" ? "d" : "-";
            console.log(
                "  " + typeChar + " " + entry.mode +
                " " + entry.hash.substring(0, 7) +
                " " + prefix + entry.name
            );
            if (entry.type === "tree") {
                walkTree(entry.hash, prefix + entry.name + "/");
            }
        });
    }

    walkTree(commit.tree, "");
}

function parseTreeObject(content) {
    var entries = [];
    var offset = 0;

    while (offset < content.length) {
        var spaceIdx = content.indexOf(0x20, offset);
        var mode = content.slice(offset, spaceIdx).toString();
        var nullIdx = content.indexOf(0x00, spaceIdx + 1);
        var name = content.slice(spaceIdx + 1, nullIdx).toString();
        var hash = content.slice(nullIdx + 1, nullIdx + 21).toString("hex");

        entries.push({
            mode: mode,
            type: mode === "40000" ? "tree" : "blob",
            hash: hash,
            name: name
        });

        offset = nullIdx + 21;
    }

    return entries;
}

// ---- Main Execution ----

console.log("Reading Git repository...");

var refs = getAllRefs();

// Collect all branch tip hashes; object keys are already unique,
// so no separate deduplication pass is needed
var tipHashes = Object.keys(refs);

var commits = walkDAG(tipHashes, maxDepth);
visualizeDAG(commits, refs);
printStats(commits);

// Show tree of HEAD commit if available
var headHash = resolveRef("HEAD");
if (headHash) {
    inspectCommitTree(headHash);
}

Run it against any repository:

node git-dag-visualizer.js /path/to/your/repo --depth 30

Understanding Merge Conflicts at the Object Level

Merge conflicts are not magic. When Git cannot automatically merge two branches, it is because the same region of the same blob was modified differently in two branches. Understanding this at the object level makes conflicts predictable.

Git performs a three-way merge using:

  1. The merge base — the common ancestor commit (found by walking the DAG backward from both branch tips)
  2. The ours version — the blob from the current branch's tree
  3. The theirs version — the blob from the branch being merged

Git compares the base blob to both branch blobs. If only one side changed a region, Git takes that change. If both sides changed the same region differently, you get a conflict.
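
The rule above can be sketched in a few lines. This is a deliberately simplified, line-by-line model (real Git diffs hunks, not individual lines, and handles insertions and deletions), but it shows exactly when a conflict marker appears:

```javascript
// Toy three-way merge over lines. Assumes base, ours, and theirs have
// the same line count -- enough to illustrate the decision rule.
function threeWayMergeLines(base, ours, theirs) {
    var merged = [];
    for (var i = 0; i < base.length; i++) {
        if (ours[i] === theirs[i]) {
            merged.push(ours[i]);           // both sides agree
        } else if (ours[i] === base[i]) {
            merged.push(theirs[i]);         // only theirs changed
        } else if (theirs[i] === base[i]) {
            merged.push(ours[i]);           // only ours changed
        } else {                            // both changed differently
            merged.push("<<<<<<< ours");
            merged.push(ours[i]);
            merged.push("=======");
            merged.push(theirs[i]);
            merged.push(">>>>>>> theirs");
        }
    }
    return merged;
}

var base   = ["a", "b", "c"];
var ours   = ["a", "B", "c"];
var theirs = ["a", "b", "C"];
console.log(threeWayMergeLines(base, ours, theirs).join("\n"));
// a
// B
// C
```

Because the two branches changed different lines, the sketch merges cleanly; change the same line on both sides and you get the familiar conflict markers.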

You can see the merge base:

git merge-base main feature
# Returns the hash of the common ancestor commit
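
The "walk the DAG backward from both tips" step can be sketched against an in-memory parent map like the one our visualizer builds. The `parents` map and the function names here are illustrative, and real Git selects the *best* common ancestor when several exist; this sketch returns the first one found breadth-first:

```javascript
// Collect every ancestor reachable from `start` in a parents map
// (hash -> array of parent hashes).
function collectAncestors(parents, start) {
    var seen = {};
    var queue = [start];
    while (queue.length > 0) {
        var hash = queue.shift();
        if (seen[hash]) continue;
        seen[hash] = true;
        (parents[hash] || []).forEach(function(p) { queue.push(p); });
    }
    return seen;
}

// Walk backward from tip `b`; the first commit already reachable from
// tip `a` is a common ancestor -- the merge base.
function mergeBase(parents, a, b) {
    var fromA = collectAncestors(parents, a);
    var seen = {};
    var queue = [b];
    while (queue.length > 0) {
        var hash = queue.shift();
        if (seen[hash]) continue;
        seen[hash] = true;
        if (fromA[hash]) return hash;
        (parents[hash] || []).forEach(function(p) { queue.push(p); });
    }
    return null; // no common ancestor (unrelated histories)
}

// A small DAG:  root <- m <- {x, y}
var dag = { x: ["m"], y: ["m"], m: ["root"], root: [] };
console.log(mergeBase(dag, "x", "y")); // m
```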

And compare the three versions:

git show :1:path/to/file   # base version (stage 1)
git show :2:path/to/file   # ours version (stage 2)
git show :3:path/to/file   # theirs version (stage 3)

These stage numbers correspond to entries in the index (.git/index), which stores the three-way conflict state until you resolve it.


Common Issues & Troubleshooting

1. "fatal: bad object HEAD"

fatal: bad object HEAD

This means .git/HEAD points to a ref or hash that does not exist. Common after a corrupted clone or interrupted operation.

Fix: Check what HEAD points to and restore it:

cat .git/HEAD
# If it's a symbolic ref, verify the target exists
cat .git/refs/heads/main
# If the file is missing, find a valid commit
git reflog  # if reflog is intact
# Or look at remote tracking branches
cat .git/refs/remotes/origin/main
# Restore HEAD
git update-ref HEAD <valid-commit-hash>

2. "error: object file .git/objects/xx/yyyy is empty"

error: object file .git/objects/ab/cdef1234567890abcdef is empty

A loose object file exists but has zero bytes. This typically happens when a disk write was interrupted (power failure, forced shutdown during git commit).

Fix:

# Remove the empty file
rm .git/objects/ab/cdef1234567890abcdef

# Check whether anything else is damaged
git fsck --full

# Rebuild the missing objects from the remote
git fetch origin
git checkout origin/main -- path/to/affected/file

3. "warning: reflog of 'HEAD' references pruned commits"

warning: reflog of 'HEAD' references pruned commits

This appears after git gc when the reflog references commits that were garbage-collected. The reflog entries pointing to those commits are now dangling.

Fix: This is usually harmless. Clean up the reflog:

git reflog expire --expire=now --all
git gc --prune=now

Warning: This permanently removes the ability to recover those commits. Only do this if you are certain you do not need them.

4. "fatal: packed object [hash] (stored in .git/objects/pack/pack-xxx.pack) is corrupt"

fatal: packed object abc1234 (stored in .git/objects/pack/pack-abc.pack) is corrupt

A pack file has been corrupted. This can happen due to disk errors, incomplete network transfers, or filesystem issues.

Fix:

# Verify the damage
git fsck --full

# If you have a remote, the easiest fix is to re-clone
# If you need to preserve local branches:
# 1. Back up the repo
cp -r .git .git-backup

# 2. Remove the corrupted pack
rm .git/objects/pack/pack-abc.pack
rm .git/objects/pack/pack-abc.idx

# 3. Fetch objects from remote
git fetch --all

# 4. Verify
git fsck --full

5. "error: cannot lock ref 'refs/heads/feature': reference already exists"

error: cannot lock ref 'refs/heads/feature/login': 'refs/heads/feature' exists

Git stores refs as filesystem paths. You cannot have both a branch called feature and a branch called feature/login because feature would need to be both a file and a directory. This is a file-or-directory conflict in the ref namespace.

Fix: Delete or rename the conflicting ref:

git branch -d feature
# Now you can create feature/login
git checkout -b feature/login

Best Practices

  • Run git fsck periodically. It verifies the integrity of every object and reports dangling or corrupted objects. Add it to your maintenance routine, especially on servers.

  • Never manually edit files inside .git/. Use Git's plumbing commands (git update-ref, git hash-object, git mktree) instead. Direct file edits bypass Git's consistency checks and can corrupt your repository.

  • Use git reflog before panicking. Almost every "lost" commit is recoverable through the reflog within 90 days. Before you re-do work, check git reflog for the commit hash you need.

  • Understand that branches are cheap. A branch is a 41-byte file (40 hex characters plus a newline). Creating, deleting, and switching branches is nearly instantaneous because it only involves updating these tiny ref files.

  • Prefer git gc --auto over manual git gc. Let Git decide when to run garbage collection. Manual git gc can interfere with concurrent operations and prematurely remove objects that another process is still writing.

  • Learn the plumbing commands. Commands like git cat-file, git rev-parse, git ls-tree, and git for-each-ref give you direct access to Git's internals. They are stable, scriptable, and produce machine-parseable output — unlike porcelain commands whose output format can change between versions.

  • Use git verify-pack to audit pack files. When troubleshooting repository size or performance issues, git verify-pack -v .git/objects/pack/pack-xxx.idx shows you every object in a pack with its type, size, and delta chain depth.

  • Set core.fsmonitor on large repositories. File system monitoring dramatically speeds up git status and other working-tree operations by tracking which files have changed since the last query, avoiding a full directory scan.

