Git Performance Optimization for Large Repositories
Techniques for improving Git performance in large repositories including shallow clones, sparse checkout, partial clones, commit graph, filesystem monitor, and maintenance commands.
Git was designed for the Linux kernel — a massive codebase with decades of history. But even Git slows down when repositories accumulate hundreds of thousands of commits, millions of files, or gigabytes of binary assets. Status checks take seconds. Log commands crawl. Cloning takes twenty minutes.
I work with repositories that have 100,000+ commits and thousands of files. Without optimization, basic operations become painful. The techniques in this guide take Git from sluggish to instant on these large repos.
Prerequisites
- Git installed (v2.38+ for all features discussed)
- A repository large enough to feel slow (10,000+ commits or 10,000+ files)
- Terminal access
- Basic Git knowledge
Diagnosing Performance Problems
Before optimizing, measure where time is spent:
# Time a git status
time git status
# Trace what Git executes internally to see where time goes
GIT_TRACE=1 git status
# Detailed timing of all Git operations
GIT_TRACE_PERFORMANCE=1 git status 2>&1 | head -30
# Repository statistics
git count-objects -vH
# count: 0
# size: 0 bytes
# in-pack: 245831
# packs: 1
# size-pack: 892.34 MiB
# prune-packable: 0
# garbage: 0
# size-garbage: 0 bytes
# Number of commits
git rev-list --count HEAD
# Number of files
git ls-files | wc -l
# Largest files in the repository
git rev-list --objects --all | \
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
sed -n 's/^blob //p' | sort -rnk2 | head -20
Shallow Clones
Clone only recent history instead of the entire repo:
# Clone only the last 10 commits
git clone --depth 10 https://github.com/myorg/large-repo.git
# Clone only the last commit (fastest)
git clone --depth 1 https://github.com/myorg/large-repo.git
# Fetch more history later if needed
git fetch --deepen 50
# Convert to full clone
git fetch --unshallow
Shallow Clone for CI/CD
Most CI pipelines only need the latest commit:
# GitHub Actions
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 1 # Shallow clone
# GitLab CI
variables:
GIT_DEPTH: 1
# Jenkins
checkout([$class: 'GitSCM',
extensions: [[$class: 'CloneOption', depth: 1, shallow: true]]])
Shallow Clone Limitations
# These operations require full history:
git log --all # Only shows shallow history
git blame file.js # May show incomplete blame
git bisect # Cannot bisect beyond shallow boundary
git merge-base main feature # May fail if common ancestor is not in shallow history
# Workaround: deepen when needed
git fetch --deepen 100
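In scripts and CI jobs it is worth guarding against these pitfalls by testing for shallowness first. A minimal sketch; the depth of 100 is an arbitrary choice:
# Deepen only when the clone is actually shallow
if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
  echo "Shallow clone detected; deepening before blame/bisect..."
  git fetch --deepen 100
fi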
Sparse Checkout
Check out only the directories you need instead of the entire working tree:
# Initialize sparse checkout
git clone --no-checkout https://github.com/myorg/large-repo.git
cd large-repo
git sparse-checkout init --cone
# Specify which directories to check out
git sparse-checkout set packages/api packages/shared
# Check out the files
git checkout main
# Result: only packages/api/ and packages/shared/ exist
ls
# packages/
# api/
# shared/
Adding and Removing Directories
# Add another directory
git sparse-checkout add packages/web
# List current sparse checkout patterns
git sparse-checkout list
# Disable sparse checkout (get everything)
git sparse-checkout disable
Sparse Checkout Patterns
# Cone mode (default) — specify directories
git sparse-checkout set src/ docs/ tests/
# Non-cone mode — use gitignore-style patterns
git sparse-checkout init --no-cone
git sparse-checkout set '/*' '!/packages/*' '/packages/api/*' '/packages/shared/*'
Combining with Shallow Clone
# Maximum speed: shallow + sparse
git clone --depth 1 --filter=blob:none --sparse https://github.com/myorg/large-repo.git
cd large-repo
git sparse-checkout set packages/api
Partial Clones
Partial clones download object metadata but defer downloading file contents until needed:
# Clone without downloading blobs (file contents)
git clone --filter=blob:none https://github.com/myorg/large-repo.git
# Clone without objects larger than 1MB
git clone --filter=blob:limit=1m https://github.com/myorg/large-repo.git
# Clone without trees (directory listings) — most aggressive
git clone --filter=tree:0 https://github.com/myorg/large-repo.git
How Partial Clones Work
Full clone:
Download: ALL commits + ALL trees + ALL blobs
Time: 20 minutes for a 5GB repo
Blobless clone (--filter=blob:none):
Download: ALL commits + ALL trees + NO blobs
Time: 2 minutes (blobs fetched on demand)
Treeless clone (--filter=tree:0):
Download: ALL commits + NO trees + NO blobs
Time: 30 seconds (trees and blobs fetched on demand)
When you git checkout a file, Git fetches the blob from the server. When you git log -- path/to/file, Git fetches the necessary trees. This is transparent — Git handles it automatically.
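You can watch the on-demand mechanism at work. A small sketch, assuming a blobless clone like the one above; the revision and path are placeholders:
# Count objects the local repo has promised but not yet downloaded
git rev-list --objects --all --missing=print | grep -c '^?'
# Reading old file content faults in the blob transparently
git show HEAD~20:README.md > /dev/null
# Re-running the count afterwards shows fewer missing objects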
Server Requirements
Partial clones require server support. GitHub, GitLab, and Bitbucket all support them. Self-hosted Git may need configuration:
# On the server, enable partial clone
git config --global uploadpack.allowFilter true
git config --global uploadpack.allowAnySHA1InWant true
Commit Graph
The commit graph is a pre-computed cache of commit relationships that dramatically speeds up log, merge-base, and reachability queries:
# Generate the commit graph
git commit-graph write --reachable
# Enable automatic commit graph updates
git config --global fetch.writeCommitGraph true
git config --global gc.writeCommitGraph true
Impact
# Before commit graph
time git log --oneline -100
# real 0m2.340s
# After commit graph
time git log --oneline -100
# real 0m0.045s
The commit graph stores generation numbers that let Git skip entire branches of the commit DAG during traversal. For repos with 100,000+ commits, this turns multi-second operations into milliseconds.
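A quick way to confirm the graph exists and is healthy (the main branch name is an assumption):
# The graph file(s) live under .git/objects/info/
ls .git/objects/info/commit-graph*
# Validate the cached data against the actual commit objects
git commit-graph verify
# merge-base benefits heavily from generation numbers
time git merge-base main HEAD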
Incremental Updates
# Write an incremental commit-graph layer (faster than a full rebuild)
git commit-graph write --reachable --changed-paths --split
# The --changed-paths option also pre-computes Bloom filters
# for file paths, speeding up git log -- <path>
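The Bloom filters pay off on path-limited logs; the path below is a placeholder for any file with a long history:
# Without Bloom filters Git diffs trees for every commit it walks;
# with --changed-paths it skips most commits outright
time git log --oneline -20 -- src/app.js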
Filesystem Monitor
Git checks every file in the working tree during git status. On repos with 50,000+ files, this takes seconds. The filesystem monitor (FSMonitor) tells Git which files changed since the last check:
# Enable the built-in filesystem monitor
git config core.fsmonitor true
git config core.untrackedCache true
# First run initializes the monitor
git status
# Subsequent runs are much faster
Impact
# 50,000 files, without FSMonitor
time git status
# real 0m3.200s
# Same repo, with FSMonitor
time git status
# real 0m0.150s
Watchman Integration
The built-in monitor ships with backends for macOS and Windows; on Linux, or if you already run Watchman, use Facebook's Watchman hook instead:
# Install Watchman
brew install watchman # macOS
sudo apt install watchman # Linux
# Configure Git to use the Watchman hook that ships with Git
cp .git/hooks/fsmonitor-watchman.sample .git/hooks/fsmonitor-watchman
git config core.fsmonitor .git/hooks/fsmonitor-watchman
# Verify Watchman is watching the repository
watchman watch-list
Git Maintenance
Git 2.31+ includes a maintenance system that runs optimization tasks automatically:
# Register a repository for automatic maintenance
git maintenance register
# Run maintenance manually
git maintenance run
# Run specific tasks
git maintenance run --task=gc
git maintenance run --task=commit-graph
git maintenance run --task=prefetch
git maintenance run --task=loose-objects
git maintenance run --task=incremental-repack
git maintenance run --task=pack-refs
Maintenance Schedule
# Enable scheduled maintenance
git maintenance start
# This creates system-level scheduled tasks (cron, systemd timers,
# launchd, or Task Scheduler). With the incremental strategy that
# registration applies:
# - Hourly: prefetch, commit-graph
# - Daily: loose-objects, incremental-repack
# - Full gc stays a manual operation
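If the platform default scheduler is unavailable (common on headless Linux boxes), newer Git versions let you pick the backend explicitly; the --scheduler flag landed around Git 2.34:
# Choose the scheduler backend explicitly
git maintenance start --scheduler=crontab        # classic cron
git maintenance start --scheduler=systemd-timer  # Linux with systemd
git maintenance start --scheduler=launchctl      # macOS
git maintenance start --scheduler=schtasks       # Windows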
Manual Garbage Collection
# Standard garbage collection
git gc
# Aggressive GC (slower but more thorough)
git gc --aggressive
# Prune objects older than 2 weeks
git gc --prune=2.weeks.ago
# Repack all objects into a single pack
git repack -a -d -f --depth=250 --window=250
Configuration for Large Repos
# Core performance settings
git config core.fsmonitor true
git config core.untrackedCache true
git config core.preloadIndex true
git config core.fscache true # Windows only
# Pack settings for large repos
git config pack.threads 0 # Use all CPU cores
git config pack.windowMemory 256m
git config pack.deltaCacheSize 256m
# Fetch optimizations
git config fetch.writeCommitGraph true
git config fetch.parallel 4 # Parallel fetches (submodules, multiple remotes)
# Index performance
git config index.version 4 # Compact index format
git config feature.manyFiles true # Enables multiple optimizations
# Diff and merge performance
git config diff.algorithm histogram # Faster diff algorithm
git config merge.renameLimit 10000 # Allow more rename detection
The feature.manyFiles Shortcut
# This single setting enables multiple optimizations:
git config feature.manyFiles true
# Equivalent to:
# index.version = 4
# core.untrackedCache = true
# Note: core.fsmonitor is not included; enable it separately
Complete Working Example: Optimizing a Large Monorepo
#!/bin/bash
# scripts/optimize-repo.sh
echo "=== Git Repository Optimization ==="
echo ""
# Show current stats
echo "Repository statistics:"
git count-objects -vH
echo ""
echo "Commits: $(git rev-list --count HEAD)"
echo "Files: $(git ls-files | wc -l)"
echo ""
# Enable performance features
echo "Enabling performance features..."
git config core.fsmonitor true
git config core.untrackedCache true
git config core.preloadIndex true
git config feature.manyFiles true
git config fetch.writeCommitGraph true
git config gc.writeCommitGraph true
git config pack.threads 0
git config diff.algorithm histogram
echo "Done."
echo ""
# Build commit graph with Bloom filters
echo "Building commit graph..."
git commit-graph write --reachable --changed-paths
echo "Done."
echo ""
# Run garbage collection (--auto repacks only when thresholds are exceeded)
echo "Running garbage collection..."
git gc --auto
echo "Done."
echo ""
# Register for maintenance
echo "Registering for automatic maintenance..."
git maintenance register
echo "Done."
echo ""
# Benchmark
echo "=== Benchmarks ==="
echo "git status:"
time git status > /dev/null 2>&1
echo ""
echo "git log -100:"
time git log --oneline -100 > /dev/null 2>&1
echo ""
echo "Optimization complete."
Common Issues and Troubleshooting
git status is slow even with FSMonitor
The filesystem monitor daemon may not be running:
Fix: Check that the monitor is enabled with git config core.fsmonitor. If set to true, the built-in daemon should start automatically on the next Git command; inspect it directly as shown below. If using Watchman, verify it is running with watchman watch-list.
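A quick check of the built-in daemon (Git 2.37+):
# Report whether the FSMonitor daemon is watching this repo
git fsmonitor--daemon status
# Start it manually if it is not running
git fsmonitor--daemon start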
Shallow clone breaks git log and git blame
History beyond the shallow depth is not available:
Fix: Use git fetch --deepen N to get more history, or git fetch --unshallow for the full history. For CI pipelines that only need the latest code, this limitation is acceptable.
Sparse checkout shows wrong files
Files outside the sparse checkout set appear after certain operations:
Fix: Run git sparse-checkout reapply to re-apply the sparse checkout rules. Ensure you are using cone mode (--cone) which is more predictable. Non-cone mode with complex patterns can produce unexpected results.
Commit graph becomes corrupted after force push
A force push rewrites history that the commit graph has cached:
Fix: Delete the commit graph files and rebuild: rm -rf .git/objects/info/commit-graph .git/objects/info/commit-graphs && git commit-graph write --reachable. The maintenance system will also rebuild it automatically.
Repository size keeps growing despite gc
Large binary files in history take space even after deletion:
Fix: Use git filter-repo to remove large files from history entirely. Or adopt Git LFS for binary assets going forward. Regular git gc cannot remove objects that are referenced by any commit in history.
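A hedged sketch with git filter-repo (a separate install, not part of core Git); the path is a hypothetical oversized asset, and the command rewrites every commit, so coordinate with collaborators before running it:
# DESTRUCTIVE: rewrites all history to drop one file everywhere
git filter-repo --invert-paths --path assets/huge-video.mp4
# Collaborators must re-clone; old clones still hold the object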
Best Practices
- Enable feature.manyFiles on any repo with 10,000+ files. It is a single setting that activates the untracked cache and index version 4; pair it with core.fsmonitor for the full effect.
- Use shallow clones in CI/CD. Build pipelines rarely need full history. --depth 1 makes cloning near-instant.
- Build the commit graph after large fetches. The commit graph accelerates log, merge-base, and reachability queries. Set fetch.writeCommitGraph to keep it updated automatically.
- Run git maintenance register on frequently used repos. The automatic maintenance schedule keeps prefetching, repacking, and commit graph updates running without manual intervention.
- Use sparse checkout for monorepos. Checking out only the packages you work on reduces the working tree size and speeds up every file-scanning operation.
- Profile before optimizing. Use GIT_TRACE_PERFORMANCE=1 to identify the actual bottleneck. Optimizing clone speed when the problem is git status wastes effort.
- Combine partial clone with sparse checkout for maximum speed. --filter=blob:none --sparse gives you the fastest clone and smallest working tree. Blobs are fetched on demand.