Managing Large Files in Git: LFS and Alternatives

A practical guide to handling large files in Git repositories using Git LFS, git-annex, .gitignore strategies, and external storage patterns.

Git was designed for source code. Every commit records a full snapshot of the project (unchanged files are shared between snapshots), and Git's delta compression works brilliantly on text. But add a 50MB design file, a 200MB database dump, or a collection of video assets, and the repository balloons. Binary formats, especially already-compressed ones like video, barely delta-compress, so each new version costs roughly its full size. Every clone downloads every version of every large file ever committed. A repository with ten versions of a 100MB file is a gigabyte before anyone writes a line of code.

I have seen repositories grow to 10GB+ because someone committed build artifacts or training datasets. Git LFS solves this by storing large files outside the Git repository while keeping pointers in the repo. This guide covers LFS setup, migration, and the alternatives for when LFS is not the right fit.

Prerequisites

  • Git installed (v2.20+)
  • Git LFS installed and initialized (git lfs install)
  • A repository with large files or planning to add them
  • Terminal access

The Problem with Large Files in Git

# Check repository size
git count-objects -vH
# size-pack: 2.34 GiB  ← A code-only repo is usually megabytes, not gigabytes

# Find the largest objects in history
git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize:disk) %(rest)' | \
  sed -n 's/^blob //p' | \
  sort -rnk2 | \
  head -20
# abc1234 104857600 assets/video/demo.mp4
# def5678  52428800 data/training-set.csv
# ghi9012  31457280 docs/presentation.pptx

Each version of these files is stored as a full blob. Delete the file in a later commit and the old versions still live in history, inflating clone size permanently.
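Because every version is a separate blob, the number that matters is the total across history for each path, not the single largest blob. Below is a small sketch that aggregates the `objectname size path` lines produced by the pipeline above, with its `sort` and `head` stages dropped. The function name `sum_history_by_path` is made up for this example.

```shell
# Aggregate "<objectname> <size> <path>" lines (one per historical blob)
# into "<total-bytes> <versions> <path>", largest total first.
# sum_history_by_path is a hypothetical helper, not part of Git.
sum_history_by_path() {
  awk '
    {
      path = $0
      # Strip the first two fields so paths containing spaces survive.
      sub(/^[^ ]+ [^ ]+ /, "", path)
      total[path] += $2
      versions[path]++
    }
    END {
      # %.0f instead of %d: some awks clamp %d to 32-bit integers.
      for (p in total) printf "%.0f %d %s\n", total[p], versions[p], p
    }
  ' | sort -rn -k1,1
}

# Usage (run inside a repository):
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize:disk) %(rest)' |
#     sed -n 's/^blob //p' |
#     sum_history_by_path | head -20
```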

Git LFS Basics

Git LFS replaces large files in your repository with small pointer files. The actual file content is stored on a separate LFS server.

Installation

# macOS
brew install git-lfs

# Ubuntu/Debian
sudo apt install git-lfs

# Windows: bundled with Git for Windows, no separate install needed

# Initialize LFS for your user
git lfs install

Tracking File Types

# Track all PSD files
git lfs track "*.psd"

# Track all files in a directory
git lfs track "assets/videos/**"

# Track specific file types
git lfs track "*.mp4"
git lfs track "*.zip"
git lfs track "*.tar.gz"
git lfs track "*.png"
git lfs track "*.jpg"
git lfs track "*.pdf"
git lfs track "*.sqlite"

# View tracked patterns
git lfs track
# Listing tracked patterns
#     *.psd (.gitattributes)
#     *.mp4 (.gitattributes)
#     assets/videos/** (.gitattributes)

Tracking patterns are stored in .gitattributes:

# .gitattributes
*.psd filter=lfs diff=lfs merge=lfs -text
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
assets/videos/** filter=lfs diff=lfs merge=lfs -text
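To confirm which rule applies to a given path before you add the file, you can ask Git directly with `git check-attr`. Here is a minimal sketch. `lfs_filter_for` is a hypothetical wrapper. It only reads attributes, so it does not need Git LFS installed.

```shell
# Print the value of the "filter" attribute Git would apply to a path:
# "lfs" when a .gitattributes rule routes it through Git LFS,
# "unspecified" when no rule matches. The path does not need to exist.
# lfs_filter_for is a hypothetical helper; run it inside the repository.
lfs_filter_for() {
  git check-attr filter -- "$1" | sed 's/^.*: filter: //'
}

# Usage:
#   lfs_filter_for assets/videos/intro.mp4   # "lfs" with the rules above
#   lfs_filter_for src/app.js                # "unspecified"
```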

Committing LFS Files

After setting up tracking, commit the .gitattributes file and add your large files normally:

git add .gitattributes
git commit -m "chore: configure Git LFS tracking"

# Add large files as usual
git add assets/design.psd
git commit -m "feat: add homepage design mockup"
git push

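When you git add a tracked file, the LFS clean filter writes the real content to .git/lfs/objects and stages a small pointer in its place. On git push, the LFS pre-push hook uploads that content to the LFS server. The Git commits themselves only ever contain the pointer.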

What Gets Stored in Git

Instead of the actual file, Git stores a pointer:

version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 104857600

This pointer is ~130 bytes regardless of the actual file size.
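The format is simple enough to reproduce by hand. Doing so makes it clear that the pointer depends only on the file's SHA-256 and byte count. This sketch assumes GNU coreutils `sha256sum`; on macOS, `shasum -a 256` is the equivalent. `make_lfs_pointer` is a hypothetical name, and `git lfs pointer --file=<path>` is the supported way to generate a real pointer.

```shell
# Reproduce a Git LFS v1 pointer for a local file: version line,
# SHA-256 of the content, and size in bytes, each newline-terminated.
# make_lfs_pointer is a hypothetical helper for illustration only.
make_lfs_pointer() {
  # Hash via stdin so unusual file names cannot alter sha256sum's output.
  oid=$(sha256sum < "$1" | cut -d' ' -f1)
  size=$(wc -c < "$1" | tr -d ' ')
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' \
    "$oid" "$size"
}

# Usage:
#   make_lfs_pointer assets/design.psd
```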

Pulling LFS Files

# Clone with LFS files (automatic if LFS is installed)
git clone https://github.com/myorg/project.git
# LFS files are downloaded automatically during checkout

# Pull LFS files manually (if clone did not fetch them)
git lfs pull

# Pull specific files
git lfs pull --include="assets/images/**"

# Pull excluding certain patterns
git lfs pull --exclude="assets/videos/**"

Checking LFS Status

# List LFS-tracked files
git lfs ls-files

# Show LFS objects that would be pushed
git lfs status

# Check LFS environment
git lfs env

# Show the most recent LFS error log
git lfs logs last
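git lfs ls-files marks each entry with `*` when the working-tree file holds the full content. It uses `-` when only the pointer is checked out, which is handy for checking a selective pull. The small summarizing sketch below uses a hypothetical name, `lfs_download_summary`.

```shell
# Summarize `git lfs ls-files` output ("<oid> <*|-> <path>"):
# "*" = full content in the working tree, "-" = only the pointer.
# lfs_download_summary is a hypothetical helper.
lfs_download_summary() {
  awk '
    $2 == "*" { present++ }
    $2 == "-" { missing++ }
    END { printf "downloaded=%d pointer-only=%d\n", present, missing }
  '
}

# Usage:
#   git lfs ls-files | lfs_download_summary
```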

LFS Configuration

Per-Repository Settings

# Set LFS transfer concurrency
git config lfs.concurrenttransfers 8

# Maximum number of retries for a failed transfer
git config lfs.transfer.maxretries 5

# Custom LFS server URL
git config lfs.url https://lfs.example.com/myorg/myrepo

# Skip downloading LFS files during clone (for CI)
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/myorg/project.git

Selective LFS Fetching

# Only download LFS objects whose paths match these patterns
git config lfs.fetchinclude "src/assets/**"

# Exclude large directories from LFS fetch
git config lfs.fetchexclude "data/training/**,assets/raw/**"

LFS in CI/CD

# GitHub Actions: skip LFS for non-asset jobs
steps:
  - uses: actions/checkout@v4
    with:
      lfs: false    # Skip LFS files (also the default)

# Or selectively fetch
steps:
  - uses: actions/checkout@v4
    with:
      lfs: false    # Do not download every LFS file during checkout
  # Fetch only the LFS files this job needs
  - run: git lfs pull --include="assets/images/**"

Migrating Existing Repos to LFS

Tracking New Files

If large files have not been committed yet:

git lfs track "*.psd"
git add .gitattributes
git add design.psd
git commit -m "chore: add design file with LFS"
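Before the first commit, it helps to see which extensions are actually large so you can track them by pattern. The sketch below uses only `find` and `sed`. `suggest_lfs_patterns` and its 10 MiB default threshold are assumptions. Files with compound extensions such as `.tar.gz` are reported by their last extension only.

```shell
# Suggest `git lfs track` commands for the extensions of large files found
# under a directory (default: .), skipping .git. Threshold is in KiB.
# suggest_lfs_patterns and its 10 MiB default are hypothetical choices.
suggest_lfs_patterns() {
  dir=${1:-.}
  threshold_kib=${2:-10240}
  # find lists large files; sed keeps only the last extension of each path;
  # sort -u removes duplicates before printing one track command per extension.
  find "$dir" -name .git -prune -o -type f -size +"${threshold_kib}k" -print |
    sed -n 's/.*\/[^/]*\.\([^./]*\)$/\1/p' |
    sort -u |
    while IFS= read -r ext; do
      printf 'git lfs track "*.%s"\n' "$ext"
    done
}

# Usage: list candidates over 10 MiB in the current repository
#   suggest_lfs_patterns .
```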

Migrating Files Already in History

If large files were committed without LFS:

# Migrate existing files to LFS (rewrites history)
git lfs migrate import --include="*.psd,*.mp4,*.zip" --everything

# Migrate everything under a specific path
git lfs migrate import --include="assets/**" --everything

# Check what would be migrated (dry run)
git lfs migrate info --include="*.psd,*.mp4" --everything
# migrate: Sorting commits: ..., done.
# migrate: Examining commits: 100% (542/542), done.
# *.psd    2 files   152 MB    2 versions
# *.mp4    5 files   890 MB    8 versions

Warning: git lfs migrate import rewrites Git history. All commit hashes change. Every collaborator must re-clone after this operation.

# After migration, force push (repeat for every rewritten branch and tag)
git push --force-with-lease origin main

# Purge the old, now-unreachable objects from your local clone
git reflog expire --expire-unreachable=now --all
git gc --prune=now

Untracking Files from LFS

Move files back from LFS to regular Git storage:

# Remove tracking pattern
git lfs untrack "*.png"

# Migrate files back from LFS to Git
git lfs migrate export --include="*.png" --everything

LFS Storage and Hosting

GitHub LFS

  • Free tier: 1GB storage, 1GB bandwidth per month
  • Data packs: 50GB storage + 50GB bandwidth for $5/month
  • Files up to 2GB each

GitLab LFS

  • 10GB total project storage (includes LFS)
  • No per-file size limit
  • Configurable on self-hosted instances

Self-Hosted LFS Server

# Using lfs-test-server for development
go install github.com/git-lfs/lfs-test-server@latest
lfs-test-server

# Configure repository to use custom server
git config lfs.url http://localhost:8080

For production self-hosted LFS, consider:

  • Gitea (includes LFS support)
  • MinIO + custom LFS server
  • S3-compatible storage backend

Alternatives to Git LFS

.gitignore Strategy

Keep large files out of Git entirely:

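# .gitignore
*.psd
*.mp4
*.zip
assets/**/*.png
assets/**/*.svg
data/raw/
assets/large/

# But keep small icons and logos under version control
# ("!" only re-includes a file whose parent directory is not ignored)
!assets/icons/*.png
!assets/logos/*.svg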

Store large files on a shared drive, S3 bucket, or artifact server. Include download instructions in the README:

#!/bin/bash
# scripts/download-assets.sh
set -euo pipefail
echo "Downloading large assets..."
aws s3 sync s3://myproject-assets/data ./data
aws s3 sync s3://myproject-assets/videos ./assets/videos
echo "Done."
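Without LFS, nothing in Git records which version of an external file a commit expects. One lightweight option is to commit a checksum manifest next to the download script and verify it after every sync. This sketch assumes GNU `sha256sum`. The manifest name `assets.sha256` and both function names are hypothetical.

```shell
# write_manifest MANIFEST DIR...: record a SHA-256 line for every file
# under the given directories, sorted by path so diffs stay readable.
write_manifest() {
  manifest=$1
  shift
  find "$@" -type f -exec sha256sum {} + | LC_ALL=C sort -k2 > "$manifest"
}

# verify_manifest MANIFEST: fail if any listed file is missing or changed.
# Paths are relative, so run it from the same directory as write_manifest.
verify_manifest() {
  sha256sum --check --quiet "$1"
}

# Usage:
#   write_manifest assets.sha256 data assets/videos   # after publishing assets
#   git add assets.sha256
#   ./scripts/download-assets.sh && verify_manifest assets.sha256
```

Committing the manifest gives each commit an exact record of the asset versions it was built against, which is the same idea LFS and DVC pointer files implement for you.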

DVC (Data Version Control)

For data science projects with large datasets:

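# Install DVC (the s3 extra adds support for S3 remotes)
pip install "dvc[s3]"

# Initialize DVC
dvc init

# Track a large file
dvc add data/training-set.csv
git add data/training-set.csv.dvc data/.gitignore
git commit -m "chore: track training data with DVC"

# Configure remote storage, commit the config, then upload the data
dvc remote add -d storage s3://my-bucket/dvc
git add .dvc/config
git commit -m "chore: configure DVC remote"
dvc push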

DVC stores files in configurable backends (S3, GCS, Azure, SSH, local). It creates .dvc pointer files similar to LFS.

Git-Annex

For repositories with many large files:

# Initialize git-annex
git annex init "my laptop"

# Add large files
git annex add data/large-file.bin

# Sync with remotes
git annex sync
git annex copy --to origin

Git-annex is more complex than LFS but supports multiple storage backends, partial availability (files can be on some remotes but not others), and content locking.

External Storage with Symlinks

# Store large files outside the repo
mkdir -p /shared/project-assets/videos/

# Create a symlink in the repo (its parent directory must exist)
mkdir -p assets
ln -s /shared/project-assets/videos assets/videos

# Track the symlink, not the content
git add assets/videos
git commit -m "chore: add symlink to shared video assets"

This works well for co-located teams with shared network storage. It breaks for distributed teams and CI runners. On any machine without the same mount at the same absolute path, the symlink points at nothing.
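A quick check for that failure mode is to look for symlinks whose targets do not resolve. The same check also spots git-annex files whose content is not present locally, because locked annexed files are symlinks into .git/annex/objects. `find_dangling_links` is a hypothetical helper.

```shell
# Report every symlink under a directory (default: .) whose target does
# not exist, e.g. because the shared asset mount is missing.
# find_dangling_links is a hypothetical helper.
find_dangling_links() {
  find "${1:-.}" -name .git -prune -o -type l -print |
    while IFS= read -r link; do
      # [ -e ] follows the link, so it is false when the target is missing.
      if [ ! -e "$link" ]; then
        printf 'missing target: %s -> %s\n' "$link" "$(readlink "$link")"
      fi
    done
}

# Usage:
#   find_dangling_links assets
```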

Complete Working Example: Setting Up LFS for a Project

# Start a new project with LFS configured from day one
mkdir my-project && cd my-project
git init
git lfs install

# Define tracking rules
git lfs track "*.psd"
git lfs track "*.ai"
git lfs track "*.sketch"
git lfs track "*.mp4"
git lfs track "*.mov"
git lfs track "*.zip"
git lfs track "*.tar.gz"
git lfs track "*.sqlite"
git lfs track "*.woff2"
git lfs track "assets/large/**"

# Commit the tracking configuration
git add .gitattributes
git commit -m "chore: configure Git LFS for binary assets"

# Create project structure
mkdir -p src assets/images assets/videos data

# Add regular code files (stored in Git)
cat > src/app.js << 'SCRIPT'
const express = require("express");
const app = express();

app.use("/assets", express.static("assets"));
app.listen(3000);
SCRIPT

# Add a large file (stored in LFS)
# Simulate a large file for this example
dd if=/dev/zero of=assets/videos/demo.mp4 bs=1M count=50 2>/dev/null

git add src/app.js
git add assets/videos/demo.mp4
git commit -m "feat: add application with demo video"

# Verify LFS is working
git lfs ls-files
# abc1234567 * assets/videos/demo.mp4

# Check the pointer stored in Git
git show HEAD:assets/videos/demo.mp4
# version https://git-lfs.github.com/spec/v1
# oid sha256:<64-hex-digit SHA-256 of the file contents>
# size 52428800

Common Issues and Troubleshooting

LFS files show as pointer text instead of actual content

Git LFS is not installed or the smudge filter is not running:

Fix: Run git lfs install to set up the smudge and clean filters. Then git lfs pull to download the actual file content. If cloning fresh, make sure Git LFS is installed before cloning.
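To see which files are affected, look for small files whose first line is the pointer's version line. The 1024-byte cutoff in this sketch is an assumption that relies on pointers being tiny (about 130 bytes, as noted earlier). `find_unsmudged_pointers` is a hypothetical helper.

```shell
# List files under a directory (default: .) that still contain Git LFS
# pointer text instead of real content, skipping .git.
# find_unsmudged_pointers is a hypothetical helper.
find_unsmudged_pointers() {
  # -size -1024c matches files smaller than 1024 bytes (c = bytes, no rounding).
  find "${1:-.}" -name .git -prune -o -type f -size -1024c -print |
    while IFS= read -r f; do
      if [ "$(head -n 1 "$f")" = "version https://git-lfs.github.com/spec/v1" ]; then
        printf '%s\n' "$f"
      fi
    done
}

# Usage: if this prints anything, run `git lfs install`, then `git lfs pull`.
#   find_unsmudged_pointers .
```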

Push fails with "batch response: repository not found"

The LFS server URL is incorrect or you do not have write access:

Fix: Check git lfs env to see the configured LFS URL. Ensure you have push access to the repository. For GitHub, verify that LFS is enabled for the repository. For self-hosted, check the server logs.

Clone is still slow despite using LFS

Checkout downloads every LFS file referenced by the checked-out commit. If you have thousands of LFS files, this takes time:

Fix: Use GIT_LFS_SKIP_SMUDGE=1 git clone <url> to clone without downloading LFS files. Then selectively pull what you need: git lfs pull --include="assets/images/**".

Repository size did not shrink after migrating to LFS

The old non-LFS objects are still in Git history:

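Fix: After git lfs migrate import, run git reflog expire --expire-unreachable=now --all && git gc --prune=now to purge the old objects from your local clone. Then force push and ask collaborators to re-clone. The remote only shrinks once the host garbage-collects the now-unreferenced objects. On some hosts, such as GitHub, that can require contacting support.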
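To confirm the cleanup worked, compare the packed size before and after. `git count-objects -v` (without -H) reports `size-pack` in KiB. The sketch below converts that figure to MiB; `pack_size_mib` is a hypothetical name.

```shell
# Read `git count-objects -v` output on stdin and print size-pack in MiB
# (git reports it in KiB). pack_size_mib is a hypothetical helper.
pack_size_mib() {
  awk -F': ' '$1 == "size-pack" { printf "%.1f\n", $2 / 1024 }'
}

# Usage:
#   git count-objects -v | pack_size_mib   # before cleanup
#   git reflog expire --expire-unreachable=now --all && git gc --prune=now
#   git count-objects -v | pack_size_mib   # after cleanup
```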

LFS bandwidth quota exceeded

GitHub's free LFS bandwidth is limited to 1GB per month:

Fix: Purchase additional data packs, self-host an LFS server, or use lfs.fetchinclude/lfs.fetchexclude to limit which files are downloaded. For CI, set lfs: false in checkout actions for jobs that do not need binary assets.
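A back-of-the-envelope estimate shows how quickly CI alone can exhaust that quota. Each fresh clone or checkout that downloads LFS content pulls that commit's LFS files again, unless they are cached. In this sketch the inputs are hypothetical and sizes are in whole MiB. The 1GB free tier is treated as 1024 MiB for simplicity.

```shell
# estimate_lfs_bandwidth CHECKOUTS_PER_MONTH MIB_PER_CHECKOUT [QUOTA_MIB]
# Multiplies checkouts by the LFS content each one downloads and compares
# the result with a monthly quota (default 1024 MiB, i.e. the 1GB free tier).
# estimate_lfs_bandwidth is a hypothetical helper; integer MiB only.
estimate_lfs_bandwidth() {
  used=$(( $1 * $2 ))
  quota=${3:-1024}
  if [ "$used" -gt "$quota" ]; then
    printf 'estimated=%d MiB quota=%d MiB (over quota)\n' "$used" "$quota"
  else
    printf 'estimated=%d MiB quota=%d MiB\n' "$used" "$quota"
  fi
}

# Usage: 60 CI runs a month, each pulling 150 MiB of LFS files
#   estimate_lfs_bandwidth 60 150
#   # prints: estimated=9000 MiB quota=1024 MiB (over quota)
```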

Best Practices

  • Track binary files from day one. Set up .gitattributes with LFS tracking before the first binary file is committed. Migrating later requires history rewriting.
  • Track by extension, not by filename. *.psd catches all Photoshop files regardless of where they are added. Tracking individual files misses future additions.
  • Include .gitattributes in the repository. This file must be committed so other developers automatically use LFS for tracked patterns.
  • Skip LFS in CI when not needed. Test suites rarely need video files or design mockups. Use GIT_LFS_SKIP_SMUDGE=1 to save bandwidth and time.
  • Monitor LFS storage and bandwidth. GitHub charges for LFS beyond the free tier. Set up alerts before you hit limits.
  • Do not track auto-generated files with LFS. Build artifacts, compiled binaries, and generated assets should be in .gitignore, not LFS. LFS is for source assets that humans create and need to version.
  • Consider alternatives for datasets. Git LFS works for files up to a few hundred megabytes. For multi-gigabyte datasets, DVC or external storage with download scripts is more practical.
