Git Performance Optimization for Large Repositories
Techniques for improving Git performance in large repositories including shallow clones, sparse checkout, partial clones, commit graph, filesystem monitor, and maintenance commands.
Git was designed for the Linux kernel — a massive codebase with decades of history. But even Git slows down when repositories accumulate hundreds of thousands of commits, millions of files, or gigabytes of binary assets. Status checks take seconds. Log commands crawl. Cloning takes twenty minutes.
I work with repositories that have 100,000+ commits and thousands of files. Without optimization, basic operations become painful. The techniques in this guide take Git from sluggish to instant on these large repos.
Prerequisites
- Git installed (v2.38+ for all features discussed)
- A repository large enough to feel slow (10,000+ commits or 10,000+ files)
- Terminal access
- Basic Git knowledge
Diagnosing Performance Problems
Before optimizing, measure where time is spent:
# Time a git status
time git status
# Trace what Git executes internally to see where time goes
GIT_TRACE=1 git status
# Detailed timing of all Git operations
GIT_TRACE_PERFORMANCE=1 git status 2>&1 | head -30
# Repository statistics
git count-objects -vH
# count: 0
# size: 0 bytes
# in-pack: 245831
# packs: 1
# size-pack: 892.34 MiB
# prune-packable: 0
# garbage: 0
# size-garbage: 0 bytes
# Number of commits
git rev-list --count HEAD
# Number of files
git ls-files | wc -l
# Largest files in the repository
git rev-list --objects --all | \
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
sed -n 's/^blob //p' | sort -rnk2 | head -20
Shallow Clones
Clone only recent history instead of the entire repo:
# Clone only the last 10 commits
git clone --depth 10 https://github.com/myorg/large-repo.git
# Clone only the last commit (fastest)
git clone --depth 1 https://github.com/myorg/large-repo.git
# Fetch more history later if needed
git fetch --deepen 50
# Convert to full clone
git fetch --unshallow
Shallow Clone for CI/CD
Most CI pipelines only need the latest commit:
# GitHub Actions
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 1 # Shallow clone
# GitLab CI
variables:
GIT_DEPTH: 1
# Jenkins
checkout([$class: 'GitSCM',
extensions: [[$class: 'CloneOption', depth: 1, shallow: true]]])
Shallow Clone Limitations
# These operations require full history:
git log --all # Only shows shallow history
git blame file.js # May show incomplete blame
git bisect # Cannot bisect beyond shallow boundary
git merge-base main feature # May fail if common ancestor is not in shallow history
# Workaround: deepen when needed
git fetch --deepen 100
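In scripts and CI jobs it is worth guarding against these pitfalls by testing for shallowness first. A minimal sketch; the depth of 100 is an arbitrary choice:
# Deepen only when the clone is actually shallow
if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
  echo "Shallow clone detected; deepening before blame/bisect..."
  git fetch --deepen 100
fi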
Sparse Checkout
Check out only the directories you need instead of the entire working tree:
# Initialize sparse checkout
git clone --no-checkout https://github.com/myorg/large-repo.git
cd large-repo
git sparse-checkout init --cone
# Specify which directories to check out
git sparse-checkout set packages/api packages/shared
# Check out the files
git checkout main
# Result: only packages/api/ and packages/shared/ exist
ls
# packages/
# api/
# shared/
Adding and Removing Directories
# Add another directory
git sparse-checkout add packages/web
# List current sparse checkout patterns
git sparse-checkout list
# Disable sparse checkout (get everything)
git sparse-checkout disable
Sparse Checkout Patterns
# Cone mode (default) — specify directories
git sparse-checkout set src/ docs/ tests/
# Non-cone mode — use gitignore-style patterns
git sparse-checkout init --no-cone
git sparse-checkout set '/*' '!/packages/*' '/packages/api/*' '/packages/shared/*'
Combining with Shallow Clone
# Maximum speed: shallow + sparse
git clone --depth 1 --filter=blob:none --sparse https://github.com/myorg/large-repo.git
cd large-repo
git sparse-checkout set packages/api
Partial Clones
Partial clones download object metadata but defer downloading file contents until needed:
# Clone without downloading blobs (file contents)
git clone --filter=blob:none https://github.com/myorg/large-repo.git
# Clone without objects larger than 1MB
git clone --filter=blob:limit=1m https://github.com/myorg/large-repo.git
# Clone without trees (directory listings) — most aggressive
git clone --filter=tree:0 https://github.com/myorg/large-repo.git
How Partial Clones Work
Full clone:
Download: ALL commits + ALL trees + ALL blobs
Time: 20 minutes for a 5GB repo
Blobless clone (--filter=blob:none):
Download: ALL commits + ALL trees + NO blobs
Time: 2 minutes (blobs fetched on demand)
Treeless clone (--filter=tree:0):
Download: ALL commits + NO trees + NO blobs
Time: 30 seconds (trees and blobs fetched on demand)
When you git checkout a file, Git fetches the blob from the server. When you git log -- path/to/file, Git fetches the necessary trees. This is transparent — Git handles it automatically.
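You can watch the on-demand mechanism at work. A small sketch, assuming a blobless clone like the one above; the revision and path are placeholders:
# Count objects the local repo has promised but not yet downloaded
git rev-list --objects --all --missing=print | grep -c '^?'
# Reading old file content faults in the blob transparently
git show HEAD~20:README.md > /dev/null
# Re-running the count afterwards shows fewer missing objects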
Server Requirements
Partial clones require server support. GitHub, GitLab, and Bitbucket all support them. Self-hosted Git may need configuration:
# On the server, enable partial clone
git config --global uploadpack.allowFilter true
git config --global uploadpack.allowAnySHA1InWant true
Commit Graph
The commit graph is a pre-computed cache of commit relationships that dramatically speeds up log, merge-base, and reachability queries:
# Generate the commit graph
git commit-graph write --reachable
# Enable automatic commit graph updates
git config --global fetch.writeCommitGraph true
git config --global gc.writeCommitGraph true
Impact
# Before commit graph
time git log --oneline -100
# real 0m2.340s
# After commit graph
time git log --oneline -100
# real 0m0.045s
The commit graph stores generation numbers that let Git skip entire branches of the commit DAG during traversal. For repos with 100,000+ commits, this turns multi-second operations into milliseconds.
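A quick way to confirm the graph exists and is healthy (the main branch name is an assumption):
# The graph file(s) live under .git/objects/info/
ls .git/objects/info/commit-graph*
# Validate the cached data against the actual commit objects
git commit-graph verify
# merge-base benefits heavily from generation numbers
time git merge-base main HEAD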
Incremental Updates
# Write an incremental commit-graph layer (faster than a full rebuild)
git commit-graph write --reachable --changed-paths --split
# The --changed-paths option also pre-computes Bloom filters
# for file paths, speeding up git log -- <path>
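The Bloom filters pay off on path-limited logs; the path below is a placeholder for any file with a long history:
# Without Bloom filters Git diffs trees for every commit it walks;
# with --changed-paths it skips most commits outright
time git log --oneline -20 -- src/app.js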
Filesystem Monitor
Git checks every file in the working tree during git status. On repos with 50,000+ files, this takes seconds. The filesystem monitor (FSMonitor) tells Git which files changed since the last check:
# Enable the built-in filesystem monitor
git config core.fsmonitor true
git config core.untrackedCache true
# First run initializes the monitor
git status
# Subsequent runs are much faster
Impact
# 50,000 files, without FSMonitor
time git status
# real 0m3.200s
# Same repo, with FSMonitor
time git status
# real 0m0.150s
Watchman Integration
The built-in monitor ships with backends for macOS and Windows; on Linux, or if you already run Watchman, use Facebook's Watchman hook instead:
# Install Watchman
brew install watchman # macOS
sudo apt install watchman # Linux
# Configure Git to use the Watchman hook that ships with Git
cp .git/hooks/fsmonitor-watchman.sample .git/hooks/fsmonitor-watchman
git config core.fsmonitor .git/hooks/fsmonitor-watchman
# Verify Watchman is watching the repository
watchman watch-list
Git Maintenance
Git 2.31+ includes a maintenance system that runs optimization tasks automatically:
# Register a repository for automatic maintenance
git maintenance register
# Run maintenance manually
git maintenance run
# Run specific tasks
git maintenance run --task=gc
git maintenance run --task=commit-graph
git maintenance run --task=prefetch
git maintenance run --task=loose-objects
git maintenance run --task=incremental-repack
git maintenance run --task=pack-refs
Maintenance Schedule
# Enable scheduled maintenance
git maintenance start
# This creates system-level scheduled tasks (cron, systemd timers,
# launchd, or Task Scheduler). With the incremental strategy that
# registration applies:
# - Hourly: prefetch, commit-graph
# - Daily: loose-objects, incremental-repack
# - Full gc stays a manual operation
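If the platform default scheduler is unavailable (common on headless Linux boxes), newer Git versions let you pick the backend explicitly; the --scheduler flag landed around Git 2.34:
# Choose the scheduler backend explicitly
git maintenance start --scheduler=crontab        # classic cron
git maintenance start --scheduler=systemd-timer  # Linux with systemd
git maintenance start --scheduler=launchctl      # macOS
git maintenance start --scheduler=schtasks       # Windows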
Manual Garbage Collection
# Standard garbage collection
git gc
# Aggressive GC (slower but more thorough)
git gc --aggressive
# Prune objects older than 2 weeks
git gc --prune=2.weeks.ago
# Repack all objects into a single pack
git repack -a -d -f --depth=250 --window=250
Configuration for Large Repos
# Core performance settings
git config core.fsmonitor true
git config core.untrackedCache true
git config core.preloadIndex true
git config core.fscache true # Windows only
# Pack settings for large repos
git config pack.threads 0 # Use all CPU cores
git config pack.windowMemory 256m
git config pack.deltaCacheSize 256m
# Fetch optimizations
git config fetch.writeCommitGraph true
git config fetch.parallel 4 # Parallel fetches (submodules, multiple remotes)
# Index performance
git config index.version 4 # Compact index format
git config feature.manyFiles true # Enables multiple optimizations
# Diff and merge performance
git config diff.algorithm histogram # Faster diff algorithm
git config merge.renameLimit 10000 # Allow more rename detection
The feature.manyFiles Shortcut
# This single setting enables multiple optimizations:
git config feature.manyFiles true
# Equivalent to:
# index.version = 4
# core.untrackedCache = true
# Note: core.fsmonitor is not included; enable it separately
Complete Working Example: Optimizing a Large Monorepo
#!/bin/bash
# scripts/optimize-repo.sh
echo "=== Git Repository Optimization ==="
echo ""
# Show current stats
echo "Repository statistics:"
git count-objects -vH
echo ""
echo "Commits: $(git rev-list --count HEAD)"
echo "Files: $(git ls-files | wc -l)"
echo ""
# Enable performance features
echo "Enabling performance features..."
git config core.fsmonitor true
git config core.untrackedCache true
git config core.preloadIndex true
git config feature.manyFiles true
git config fetch.writeCommitGraph true
git config gc.writeCommitGraph true
git config pack.threads 0
git config diff.algorithm histogram
echo "Done."
echo ""
# Build commit graph with Bloom filters
echo "Building commit graph..."
git commit-graph write --reachable --changed-paths
echo "Done."
echo ""
# Run garbage collection (--auto repacks only when thresholds are exceeded)
echo "Running garbage collection..."
git gc --auto
echo "Done."
echo ""
# Register for maintenance
echo "Registering for automatic maintenance..."
git maintenance register
echo "Done."
echo ""
# Benchmark
echo "=== Benchmarks ==="
echo "git status:"
time git status > /dev/null 2>&1
echo ""
echo "git log -100:"
time git log --oneline -100 > /dev/null 2>&1
echo ""
echo "Optimization complete."
Common Issues and Troubleshooting
git status is slow even with FSMonitor
The filesystem monitor daemon may not be running:
Fix: Check that the monitor is enabled with git config core.fsmonitor. If set to true, the built-in daemon should start automatically on the next Git command; inspect it directly as shown below. If using Watchman, verify it is running with watchman watch-list.
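A quick check of the built-in daemon (Git 2.37+):
# Report whether the FSMonitor daemon is watching this repo
git fsmonitor--daemon status
# Start it manually if it is not running
git fsmonitor--daemon start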
Shallow clone breaks git log and git blame
History beyond the shallow depth is not available:
Fix: Use git fetch --deepen N to get more history, or git fetch --unshallow for the full history. For CI pipelines that only need the latest code, this limitation is acceptable.
Sparse checkout shows wrong files
Files outside the sparse checkout set appear after certain operations:
Fix: Run git sparse-checkout reapply to re-apply the sparse checkout rules. Ensure you are using cone mode (--cone) which is more predictable. Non-cone mode with complex patterns can produce unexpected results.
Commit graph becomes corrupted after force push
A force push rewrites history that the commit graph has cached:
Fix: Delete the commit graph files and rebuild: rm -rf .git/objects/info/commit-graph .git/objects/info/commit-graphs && git commit-graph write --reachable. The maintenance system will also rebuild it automatically.
Repository size keeps growing despite gc
Large binary files in history take space even after deletion:
Fix: Use git filter-repo to remove large files from history entirely. Or adopt Git LFS for binary assets going forward. Regular git gc cannot remove objects that are referenced by any commit in history.
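A hedged sketch with git filter-repo (a separate install, not part of core Git); the path is a hypothetical oversized asset, and the command rewrites every commit, so coordinate with collaborators before running it:
# DESTRUCTIVE: rewrites all history to drop one file everywhere
git filter-repo --invert-paths --path assets/huge-video.mp4
# Collaborators must re-clone; old clones still hold the object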
Best Practices
- Enable feature.manyFiles on any repo with 10,000+ files. It is a single setting that activates the untracked cache and index version 4; pair it with core.fsmonitor for the full effect.
- Use shallow clones in CI/CD. Build pipelines rarely need full history. --depth 1 makes cloning near-instant.
- Build the commit graph after large fetches. The commit graph accelerates log, merge-base, and reachability queries. Set fetch.writeCommitGraph to keep it updated automatically.
- Run git maintenance register on frequently used repos. The automatic maintenance schedule keeps prefetching, repacking, and commit graph updates running without manual intervention.
- Use sparse checkout for monorepos. Checking out only the packages you work on reduces the working tree size and speeds up every file-scanning operation.
- Profile before optimizing. Use GIT_TRACE_PERFORMANCE=1 to identify the actual bottleneck. Optimizing clone speed when the problem is git status wastes effort.
- Combine partial clone with sparse checkout for maximum speed. --filter=blob:none --sparse gives you the fastest clone and smallest working tree. Blobs are fetched on demand.