Managing Large Files in Git: LFS and Alternatives
A practical guide to handling large files in Git repositories using Git LFS, git-annex, .gitignore strategies, and external storage patterns.
Git was designed for text files. Every commit records a snapshot of the whole tree, and Git's delta compression works brilliantly on source code. But add a 50MB design file, a 200MB database dump, or a collection of video assets, and the repository balloons. Compressed binary formats barely delta against their earlier versions, so each new version adds close to its full size. Every clone downloads every version of every large file ever committed. A repository with ten versions of a 100MB file is a gigabyte before anyone writes a line of code.
I have seen repositories grow to 10GB+ because someone committed build artifacts or training datasets. Git LFS solves this by storing large files outside the Git repository while keeping pointers in the repo. This guide covers LFS setup, migration, and the alternatives for when LFS is not the right fit.
Prerequisites
- Git installed (v2.20+)
- Git LFS installed, then initialized with git lfs install
- A repository with large files, or one you plan to add them to
- Terminal access
The Problem with Large Files in Git
# Check repository size
git count-objects -vH
# size-pack: 2.34 GiB ← far larger than a typical code-only repo
# Find the largest objects in history
git rev-list --objects --all | \
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize:disk) %(rest)' | \
sed -n 's/^blob //p' | \
sort -rnk2 | \
head -20
# abc1234 104857600 assets/video/demo.mp4
# def5678 52428800 data/training-set.csv
# ghi9012 31457280 docs/presentation.pptx
Each version of these files is stored as a full blob. Delete the file in a later commit and the old versions still live in history, inflating clone size permanently.
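Listing individual blobs shows the worst offenders. Choosing LFS patterns is easier if you total blob sizes per file extension instead. The shell function below is a minimal sketch of that idea, with a few assumptions:

- The name lfs_candidates is hypothetical.
- Sizes are uncompressed (%(objectsize)).
- A blob that appears at several paths is counted once, under the path git rev-list reports first.

```shell
# Sum blob sizes across all history, grouped by file extension, largest first.
# Output columns: total bytes (uncompressed), number of blobs, extension.
lfs_candidates() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '
      $1 == "blob" && NF >= 3 {
        # Everything after the first two fields is the path (it may contain spaces)
        path = $0
        sub(/^[^ ]+ [^ ]+ /, "", path)
        # Take the basename, then its last extension
        n = split(path, parts, "/")
        name = parts[n]
        if (match(name, /\.[^.]*$/)) ext = tolower(substr(name, RSTART))
        else ext = "(none)"
        total[ext] += $2
        count[ext]++
      }
      # %.0f avoids integer overflow in awks that limit %d to 32 bits
      END { for (e in total) printf "%.0f %d %s\n", total[e], count[e], e }' |
    sort -rn
}

# Usage, inside a repository: lfs_candidates | head -10
```

Extensions that dominate the top of this list are the natural first candidates for git lfs track.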
Git LFS Basics
Git LFS replaces large files in your repository with small pointer files. The actual file content is stored on a separate LFS server.
Installation
# macOS
brew install git-lfs
# Ubuntu/Debian
sudo apt install git-lfs
# Windows: bundled with Git for Windows, no separate install needed
# Initialize LFS for your user
git lfs install
Tracking File Types
# Track all PSD files
git lfs track "*.psd"
# Track all files in a directory
git lfs track "assets/videos/**"
# Track specific file types
git lfs track "*.mp4"
git lfs track "*.zip"
git lfs track "*.tar.gz"
git lfs track "*.png"
git lfs track "*.jpg"
git lfs track "*.pdf"
git lfs track "*.sqlite"
# View tracked patterns
git lfs track
# Listing tracked patterns
# *.psd (.gitattributes)
# *.mp4 (.gitattributes)
# assets/videos/** (.gitattributes)
Tracking patterns are stored in .gitattributes:
# .gitattributes
*.psd filter=lfs diff=lfs merge=lfs -text
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
assets/videos/** filter=lfs diff=lfs merge=lfs -text
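Before committing, you can confirm that a path actually matches one of these patterns. git check-attr reads the same .gitattributes rules that Git applies, so it works even for files that do not exist yet. A small sketch follows; the helper name lfs_routed is hypothetical.

```shell
# Report whether each given path would be stored through Git LFS,
# based on the "filter" attribute that .gitattributes assigns to it.
lfs_routed() {
  for path in "$@"; do
    # check-attr prints "<path>: filter: <value>"; keep only the value
    value=$(git check-attr filter -- "$path" | sed 's/.*: filter: //')
    if [ "$value" = "lfs" ]; then
      echo "LFS  $path"
    else
      echo "git  $path"
    fi
  done
}

# Usage: lfs_routed assets/design.psd src/app.js
```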
Committing LFS Files
After setting up tracking, commit the .gitattributes file and add your large files normally:
git add .gitattributes
git commit -m "chore: configure Git LFS tracking"
# Add large files as usual
git add assets/design.psd
git commit -m "feat: add homepage design mockup"
git push
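Git LFS does its work at two points. When you git add a tracked file, its clean filter stores a small pointer in the Git repository instead of the content. When you git push, its pre-push hook uploads the actual content to the LFS server.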
What Gets Stored in Git
Instead of the actual file, Git stores a pointer:
version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 104857600
This pointer is ~130 bytes regardless of the actual file size.
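The pointer is simple enough to build by hand, which makes the format concrete. It has three lines: the spec URL, the SHA-256 of the file contents, and the size in bytes. This is a sketch for illustration only: lfs_pointer is a hypothetical name, and it assumes GNU sha256sum. If Git LFS is installed, git lfs pointer --file=<path> should print the same text.

```shell
# Build the Git LFS pointer text for a file: spec version, SHA-256 oid, size in bytes.
lfs_pointer() {
  file=$1
  oid=$(sha256sum "$file" | cut -d' ' -f1)   # on macOS: shasum -a 256 "$file"
  size=$(wc -c < "$file" | tr -d ' ')        # tr strips the padding BSD wc adds
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Usage: lfs_pointer assets/design.psd
```

For a 6-byte file this produces 126 bytes of pointer text. Only the digits of the size line grow with the file, which is why the pointer stays around 130 bytes.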
Pulling LFS Files
# Clone with LFS files (automatic if LFS is installed)
git clone https://github.com/myorg/project.git
# LFS files are downloaded automatically during checkout
# Pull LFS files manually (if clone did not fetch them)
git lfs pull
# Pull specific files
git lfs pull --include="assets/images/**"
# Pull excluding certain patterns
git lfs pull --exclude="assets/videos/**"
Checking LFS Status
# List LFS-tracked files
git lfs ls-files
# Show LFS objects that would be pushed
git lfs status
# Check LFS environment
git lfs env
# Show the most recent LFS error log
git lfs logs last
LFS Configuration
Per-Repository Settings
# Set LFS transfer concurrency
git config lfs.concurrenttransfers 8
# Retry a failed transfer up to 5 times
git config lfs.transfer.maxretries 5
# Custom LFS server URL
git config lfs.url https://lfs.example.com/myorg/myrepo
# Skip downloading LFS files during clone (for CI)
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/myorg/project.git
Selective LFS Fetching
# Only download LFS content for paths matching these patterns
git config lfs.fetchinclude "src/assets/**"
# Exclude large directories from LFS fetch
git config lfs.fetchexclude "data/training/**,assets/raw/**"
LFS in CI/CD
# GitHub Actions: skip LFS for non-asset jobs
steps:
  - uses: actions/checkout@v4
    with:
      lfs: false  # Skip LFS files
# Or selectively fetch
steps:
  - uses: actions/checkout@v4
    with:
      lfs: false  # Check out pointers only; lfs: true would download everything
  # Then fetch just the LFS files this job needs
  - run: git lfs pull --include="assets/images/**"
Migrating Existing Repos to LFS
Tracking New Files
If large files have not been committed yet:
git lfs track "*.psd"
git add .gitattributes
git add design.psd
git commit -m "chore: add design file with LFS"
Migrating Files Already in History
If large files were committed without LFS:
# Check what would be migrated first (read-only report)
git lfs migrate info --include="*.psd,*.mp4" --everything
# migrate: Sorting commits: ..., done.
# migrate: Examining commits: 100% (542/542), done.
# *.psd 2 files 152 MB 2 versions
# *.mp4 5 files 890 MB 8 versions
# Migrate existing files to LFS (rewrites history)
git lfs migrate import --include="*.psd,*.mp4,*.zip" --everything
# Migrate everything under a directory (use ** so the pattern written to .gitattributes matches files inside it)
git lfs migrate import --include="assets/**" --everything
Warning: git lfs migrate import rewrites Git history. All commit hashes change. Every collaborator must re-clone after this operation.
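# After migration, force push every rewritten branch (--everything rewrote them all)
git push --force-with-lease --all origin
# Expire the reflog and prune the old, now-unreachable objects from your local clone
# (this shrinks only the local repository, not the copy on the remote)
git reflog expire --expire-unreachable=now --all
git gc --prune=now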
Untracking Files from LFS
Move files back from LFS to regular Git storage:
# Remove tracking pattern
git lfs untrack "*.png"
# Migrate files back from LFS to Git
git lfs migrate export --include="*.png" --everything
LFS Storage and Hosting
GitHub LFS
- Free tier: 1GB storage, 1GB bandwidth per month
- Data packs: 50GB storage + 50GB bandwidth for $5/month
- Files up to 2GB each
GitLab LFS
- 10GB total project storage (includes LFS)
- No per-file size limit
- Configurable on self-hosted instances
Self-Hosted LFS Server
# Using lfs-test-server for development
go install github.com/git-lfs/lfs-test-server@latest
lfs-test-server
# Configure repository to use custom server
git config lfs.url http://localhost:8080
For production self-hosted LFS, consider:
- Gitea (includes LFS support)
- MinIO + custom LFS server
- S3-compatible storage backend
Alternatives to Git LFS
.gitignore Strategy
Keep large files out of Git entirely:
# .gitignore
*.psd
*.mp4
*.zip
data/raw/
assets/large/
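# Negation (!) re-includes files that an earlier, broader pattern ignored
# (for example *.png); it cannot re-include files inside an ignored directory
!assets/icons/*.png
!assets/logos/*.svg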
Store large files on a shared drive, S3 bucket, or artifact server. Include download instructions in the README:
#!/bin/bash
# scripts/download-assets.sh
# Stop on the first failed command, unset variable, or failed pipeline stage
set -euo pipefail
echo "Downloading large assets..."
aws s3 sync s3://myproject-assets/data ./data
aws s3 sync s3://myproject-assets/videos ./assets/videos
echo "Done."
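Without LFS, nothing in Git records which version of an external asset a commit expects. One way to close that gap is to commit a SHA-256 manifest next to the download script and check it after each sync. This is an assumption, not part of the workflow above. The function names and the assets.sha256 file are hypothetical. The flags are those of GNU coreutils sha256sum; on macOS, shasum -a 256 -c is the rough equivalent.

```shell
# Write "<sha256>  <path>" lines for the given files into a manifest.
# Whoever uploads new assets would run this and commit the manifest to Git.
write_asset_manifest() {
  manifest=$1
  shift
  sha256sum -- "$@" > "$manifest"
}

# Check downloaded files against the committed manifest.
# --quiet prints only failures; the exit status is non-zero on any mismatch or missing file.
verify_assets() {
  sha256sum --check --quiet -- "$1"
}

# Usage, after scripts/download-assets.sh:
#   verify_assets assets.sha256 || echo "Assets do not match this commit; re-run the download."
```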
DVC (Data Version Control)
For data science projects with large datasets:
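# Install DVC (the [s3] extra adds support for S3 remotes)
pip install "dvc[s3]"
# Initialize DVC
dvc init
# Track a large file
dvc add data/training-set.csv
git add data/training-set.csv.dvc data/.gitignore
git commit -m "chore: track training data with DVC"
# Configure remote storage and commit the updated .dvc/config
dvc remote add -d storage s3://my-bucket/dvc
git add .dvc/config
git commit -m "chore: configure DVC remote"
dvc push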
DVC stores files in configurable backends (S3, GCS, Azure, SSH, local). It creates .dvc pointer files similar to LFS.
Git-Annex
For repositories with many large files:
# Initialize git-annex
git annex init "my laptop"
# Add large files
git annex add data/large-file.bin
# Sync with remotes
git annex sync
git annex copy --to origin
Git-annex is more complex than LFS but supports multiple storage backends, partial availability (files can be on some remotes but not others), and content locking.
External Storage with Symlinks
# Store large files outside the repo
mkdir -p /shared/project-assets/videos/
# Create the symlink in the repo (its parent directory must exist)
mkdir -p assets
ln -s /shared/project-assets/videos assets/videos
# Track the symlink, not the content
git add assets/videos
git commit -m "chore: add symlink to shared video assets"
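This works well for local development teams with shared network storage. It breaks for distributed teams and CI runners, because the link points to an absolute path that exists only on machines where the share is mounted.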
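A missing share produces a dangling link rather than an error, so a quick check at the start of a build script can give a clearer message. A minimal sketch follows; check_asset_links is a hypothetical helper.

```shell
# Report symlinks whose target is missing on this machine (e.g. the share is not mounted).
# Returns non-zero if any link is broken.
check_asset_links() {
  broken=0
  for link in "$@"; do
    # -L: the path is a symlink; ! -e: following it does not reach an existing file
    if [ -L "$link" ] && [ ! -e "$link" ]; then
      echo "missing: $link -> $(readlink "$link")" >&2
      broken=1
    fi
  done
  return "$broken"
}

# Usage: check_asset_links assets/videos || echo "Mount /shared before building."
```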
Complete Working Example: Setting Up LFS for a Project
# Start a new project with LFS configured from day one
mkdir my-project && cd my-project
git init
git lfs install
# Define tracking rules
git lfs track "*.psd"
git lfs track "*.ai"
git lfs track "*.sketch"
git lfs track "*.mp4"
git lfs track "*.mov"
git lfs track "*.zip"
git lfs track "*.tar.gz"
git lfs track "*.sqlite"
git lfs track "*.woff2"
git lfs track "assets/large/**"
# Commit the tracking configuration
git add .gitattributes
git commit -m "chore: configure Git LFS for binary assets"
# Create project structure
mkdir -p src assets/images assets/videos data
# Add regular code files (stored in Git)
cat > src/app.js << 'SCRIPT'
var express = require("express");
var app = express();
app.use("/assets", express.static("assets"));
app.listen(3000);
SCRIPT
# Add a large file (stored in LFS)
# Simulate a large file for this example
dd if=/dev/zero of=assets/videos/demo.mp4 bs=1M count=50 2>/dev/null
git add src/app.js
git add assets/videos/demo.mp4
git commit -m "feat: add application with demo video"
# Verify LFS is working
git lfs ls-files
# abc1234567 * assets/videos/demo.mp4
# Check the pointer stored in Git
git show HEAD:assets/videos/demo.mp4
# version https://git-lfs.github.com/spec/v1
# oid sha256:<SHA-256 of the 50 MB file>
# size 52428800
Common Issues and Troubleshooting
LFS files show as pointer text instead of actual content
Git LFS is not installed or the smudge filter is not running:
Fix: Run git lfs install to set up the smudge and clean filters. Then git lfs pull to download the actual file content. If cloning fresh, make sure Git LFS is installed before cloning.
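To find the affected files, look for working-tree files that still begin with the pointer's version line. The sketch below assumes that pointers are the only tracked files starting with that exact line; find_unsmudged is a hypothetical name. git lfs ls-files should also flag pointer-only files, marking them with - instead of *.

```shell
# List tracked files whose working-tree content is still an LFS pointer,
# i.e. the smudge filter never replaced it with the real file.
find_unsmudged() {
  git -c core.quotePath=false ls-files | while IFS= read -r f; do
    # Read only the first bytes so large binaries are not pulled into memory
    if [ -f "$f" ] && [ "$(head -c 100 "$f" | head -n 1)" = "version https://git-lfs.github.com/spec/v1" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Usage, from the repository root: find_unsmudged
```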
Push fails with "batch response: repository not found"
The LFS server URL is incorrect or you do not have write access:
Fix: Check git lfs env to see the configured LFS URL. Ensure you have push access to the repository. For GitHub, verify that LFS is enabled for the repository. For self-hosted, check the server logs.
Clone is still slow despite using LFS
LFS downloads all tracked files during checkout. If you have thousands of LFS files, this takes time:
Fix: Use GIT_LFS_SKIP_SMUDGE=1 git clone <url> to clone without downloading LFS files. Then selectively pull what you need: git lfs pull --include="assets/images/**".
Repository size did not shrink after migrating to LFS
The old non-LFS objects are still in Git history:
Fix: After git lfs migrate import, run git reflog expire --expire-unreachable=now --all && git gc --prune=now to remove old objects. Then force push and ask collaborators to re-clone.
LFS bandwidth quota exceeded
GitHub's free LFS bandwidth is limited to 1GB per month:
Fix: Purchase additional data packs, self-host an LFS server, or use lfs.fetchinclude/lfs.fetchexclude to limit which files are downloaded. For CI, set lfs: false in checkout actions for jobs that do not need binary assets.
Best Practices
- Track binary files from day one. Set up .gitattributes with LFS tracking before the first binary file is committed. Migrating later requires history rewriting.
- Track by extension, not by filename. *.psd catches all Photoshop files regardless of where they are added. Tracking individual files misses future additions.
- Commit .gitattributes to the repository. Other developers automatically use LFS for the tracked patterns only if this file is committed.
- Skip LFS in CI when not needed. Test suites rarely need video files or design mockups. Use GIT_LFS_SKIP_SMUDGE=1 to save bandwidth and time.
- Monitor LFS storage and bandwidth. GitHub charges for LFS beyond the free tier. Set up alerts before you hit limits.
- Do not track auto-generated files with LFS. Build artifacts, compiled binaries, and generated assets belong in .gitignore, not LFS. LFS is for source assets that humans create and need to version.
- Consider alternatives for datasets. Git LFS works for files up to a few hundred megabytes. For multi-gigabyte datasets, DVC or external storage with download scripts is more practical.
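The first practice can be enforced locally with a pre-commit hook. LFS-tracked files are staged as ~130-byte pointers. Capping the size of every staged blob therefore catches large files that no .gitattributes pattern covers, while letting LFS files through. This is a minimal sketch, not an LFS feature. The 5 MiB default, the MAX_BLOB_BYTES variable, and the check_staged_sizes name are all arbitrary.

```shell
#!/bin/sh
# Sketch of .git/hooks/pre-commit: refuse to commit any staged blob over a size limit.
# LFS-tracked files are staged as small pointers, so they pass this check.

check_staged_sizes() {
  limit=${1:-5242880}
  # Added, copied, modified, or renamed paths in the index
  oversized=$(git -c core.quotePath=false diff --cached --name-only --diff-filter=ACMR |
    while IFS= read -r path; do
      # ":<path>" names the staged blob, i.e. exactly what would be committed
      bytes=$(git cat-file -s ":$path")
      if [ "$bytes" -gt "$limit" ]; then
        printf '%s (%s bytes)\n' "$path" "$bytes"
      fi
    done)
  if [ -n "$oversized" ]; then
    echo "Staged files over $limit bytes (add a git lfs track pattern first):" >&2
    printf '%s\n' "$oversized" >&2
    return 1
  fi
  return 0
}

check_staged_sizes "${MAX_BLOB_BYTES:-5242880}"
```

Save it as .git/hooks/pre-commit and make it executable with chmod +x. Because hooks are not cloned, each developer has to install it.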