Git Submodules and Subtrees: When to Use Each

A practical comparison of Git submodules and subtrees for managing shared code, vendored dependencies, and multi-repo architectures with real-world patterns.

Every project eventually needs to share code across repositories. A utility library used by three services. A configuration repo that multiple deployments reference. A vendored dependency you need to patch. Git provides two mechanisms for this: submodules and subtrees. They solve the same problem differently, and choosing wrong creates ongoing friction.

I have used both extensively. Submodules are better when you need independent version control of the shared code. Subtrees are better when you want the shared code fully embedded in your project. This guide covers both approaches with the real workflows — not just setup, but the daily operations that determine whether each approach is practical.

Prerequisites

  • Git installed (v2.20+ for recent submodule improvements)
  • Two or more repositories to connect
  • Understanding of Git basics (clone, commit, push, pull)
  • Terminal access

The Core Problem

You have a shared library that multiple projects use:

project-api/
  src/
  package.json

project-web/
  src/
  package.json

shared-utils/        # Used by both projects
  src/
    validation.js
    formatting.js
  package.json

Options:

  1. Copy the code — works until you need to sync changes. Then it becomes a maintenance nightmare.
  2. Publish as a package — good for stable libraries, overhead for rapidly changing code.
  3. Git submodule — includes the shared repo by reference.
  4. Git subtree — includes the shared repo by copying its history.

Git Submodules

Submodules embed one Git repository inside another as a pointer. The parent repo stores a reference to a specific commit in the child repo.

Adding a Submodule

cd project-api/
git submodule add https://github.com/myorg/shared-utils.git libs/shared
git commit -m "chore: add shared-utils as submodule"

This creates:

  • libs/shared/: the cloned submodule repository
  • .gitmodules: configuration file tracking the submodule

# .gitmodules
[submodule "libs/shared"]
    path = libs/shared
    url = https://github.com/myorg/shared-utils.git
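Because .gitmodules uses Git's INI-like config syntax, tooling can read it to discover which submodules a project expects. The following is a minimal sketch, assuming one [submodule "name"] header per section and plain key = value lines; parseGitmodules is a hypothetical helper, not part of Git. For anything beyond a quick script, git config -f .gitmodules --list is the more robust way to read the file.

```javascript
// Minimal .gitmodules reader (a sketch, not a full Git config parser).
// Assumes simple `key = value` lines under `[submodule "name"]` headers.
function parseGitmodules(text) {
  var submodules = {};
  var current = null;

  text.split(/\r?\n/).forEach(function(rawLine) {
    var line = rawLine.trim();
    if (!line || line[0] === "#" || line[0] === ";") return; // blanks and comments

    if (line[0] === "[") {
      // Only submodule sections are collected; any other section is skipped.
      var header = line.match(/^\[submodule "(.+)"\]$/);
      current = header ? (submodules[header[1]] = {}) : null;
      return;
    }

    var pair = line.match(/^([A-Za-z][A-Za-z0-9-]*)\s*=\s*(.*)$/);
    if (pair && current) {
      current[pair[1]] = pair[2];
    }
  });

  return submodules;
}

var sampleGitmodules = [
  '[submodule "libs/shared"]',
  "    path = libs/shared",
  "    url = https://github.com/myorg/shared-utils.git"
].join("\n");

console.log(parseGitmodules(sampleGitmodules));
```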

The parent repo does not store the submodule's files; it stores a commit hash. Because git submodule add stages its changes, running git diff --cached after adding a submodule shows:

+Subproject commit abc1234def5678...
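To see the pointer from a script, you can look at the tree itself: git ls-tree -r HEAD lists a submodule as an entry of type commit with mode 160000 rather than as a directory of blobs. The sketch below parses that output; parseGitlinks is a hypothetical helper and the sample hashes are made up for illustration.

```javascript
// Extract submodule entries (gitlinks) from `git ls-tree -r HEAD` output.
// Each line has the form "<mode> <type> <object>\t<path>"; submodules use mode 160000.
function parseGitlinks(lsTreeOutput) {
  return lsTreeOutput
    .split("\n")
    .map(function(line) {
      return line.match(/^(\d{6}) (\w+) ([0-9a-f]+)\t(.+)$/);
    })
    .filter(function(m) {
      return m && m[1] === "160000" && m[2] === "commit";
    })
    .map(function(m) {
      return { path: m[4], commit: m[3] };
    });
}

// Hypothetical output for illustration only.
var sampleLsTree = [
  "100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad\tpackage.json",
  "160000 commit 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a\tlibs/shared",
  "100644 blob 9daeafb9864cf43055ae93beb0afd6c7d144bfa4\tsrc/index.js"
].join("\n");

console.log(parseGitlinks(sampleLsTree));
```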

Cloning a Project with Submodules

# Clone and initialize submodules in one command
git clone --recurse-submodules https://github.com/myorg/project-api.git

# Or clone first, then initialize
git clone https://github.com/myorg/project-api.git
cd project-api
git submodule init
git submodule update

Updating Submodules

Pull the latest changes from the submodule's remote:

# Update a specific submodule
cd libs/shared
git checkout main
git pull origin main
cd ../..
git add libs/shared
git commit -m "chore: update shared-utils to latest"

# Or update all submodules at once
git submodule update --remote --merge
git add .
git commit -m "chore: update all submodules"

Pinning a Submodule to a Specific Version

cd libs/shared
git checkout v2.1.0    # Tag, branch, or commit hash
cd ../..
git add libs/shared
git commit -m "chore: pin shared-utils to v2.1.0"

The parent repo now records that specific commit. Other developers who run git submodule update will get exactly that version.

Working Inside a Submodule

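You can edit submodule code directly. A submodule checked out by git submodule update is on a detached HEAD, so switch to a branch first; otherwise the new commit is not on the branch you push:

cd libs/shared
git checkout main

# Make changes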
git add src/validation.js
git commit -m "fix: handle empty string in email validation"
git push origin main

# Return to parent and update the reference
cd ../..
git add libs/shared
git commit -m "chore: update shared-utils with validation fix"
git push

Configuring Submodule Behavior

# Set the branch that git submodule update --remote follows (the parent still records a commit)
git config -f .gitmodules submodule.libs/shared.branch main

# Shallow clone submodules (faster, less disk space)
git config -f .gitmodules submodule.libs/shared.shallow true

# Update strategy
git config -f .gitmodules submodule.libs/shared.update merge
# Options: checkout (default), merge, rebase, none

Removing a Submodule

Removing a submodule requires several steps:

# Unregister the submodule and clear its working tree
git submodule deinit -f libs/shared

# Remove it from the index and from .gitmodules
git rm -f libs/shared

# Remove the cached repository data
rm -rf .git/modules/libs/shared

# Commit the removal
git commit -m "chore: remove shared-utils submodule"

Git Subtrees

Subtrees copy the shared repository's content and history directly into your project. There is no pointer, no separate clone — the files exist in your repo as regular files.

Adding a Subtree

cd project-api/

# Add a remote for the shared repo
git remote add shared-utils https://github.com/myorg/shared-utils.git

# Add the subtree
git subtree add --prefix=libs/shared shared-utils main --squash

The --squash option compresses the shared repo's history into a single commit. Without it, the entire history of the shared repo is merged into your project's history.
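A squashed subtree still remembers where it came from: git subtree writes the prefix and the upstream commit into the squash commit's message as git-subtree-dir and git-subtree-split lines. A small sketch for reading them back follows; parseSubtreeTrailers is a hypothetical helper, and the sample message and hash are invented to match that format.

```javascript
// Read the metadata lines that `git subtree add/pull --squash` puts in its squash commits.
// Returns null when the message does not carry both lines.
function parseSubtreeTrailers(commitMessage) {
  var dir = commitMessage.match(/^git-subtree-dir:\s*(.+)$/m);
  var split = commitMessage.match(/^git-subtree-split:\s*([0-9a-f]+)$/m);
  if (!dir || !split) return null;
  return { prefix: dir[1].trim(), upstreamCommit: split[1] };
}

// Hypothetical squash commit message; the hash is made up.
var sampleSquashMessage = [
  "Squashed 'libs/shared/' content from commit 1f7a7a4",
  "",
  "git-subtree-dir: libs/shared",
  "git-subtree-split: 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a"
].join("\n");

console.log(parseSubtreeTrailers(sampleSquashMessage));
```

Feeding this function the output of git log --format=%B for the prefix is one way to answer "which upstream commit is vendored here?" without a separate lockfile.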

Pulling Updates from Upstream

git subtree pull --prefix=libs/shared shared-utils main --squash

This pulls the latest changes from the shared repo and merges them into your project.

Pushing Changes Back to Upstream

If you edit files in the subtree directory, you can push those changes back:

git subtree push --prefix=libs/shared shared-utils main

Git extracts the commits that touched libs/shared/ and pushes them to the shared repo.

Splitting a Directory into a Subtree

Extract existing code into a separate repository:

# Create a branch with only the subtree's history
git subtree split --prefix=src/shared --branch shared-split

# Push to a new repository
git remote add shared-new https://github.com/myorg/shared-utils-new.git
git push shared-new shared-split:main

Subtree with No Squash

# Add without squash — full history preserved
git subtree add --prefix=libs/shared shared-utils main

# This merges the entire commit history of shared-utils
# into your project's log. Useful for tracing changes,
# but makes your log noisy.

Submodules vs. Subtrees: Comparison

How They Store Code

Submodule:
project-api/
  .gitmodules          # Pointer to external repo
  libs/shared/         # Separate Git repo (nested .git)
    .git               # Own history, own remote

Subtree:
project-api/
  libs/shared/         # Regular files in your repo
    validation.js      # Committed directly
    formatting.js      # Part of your history

Workflow Comparison

Operation                  Submodules                      Subtrees
-------------------------  ------------------------------  -----------------------
Clone                      Need --recurse-submodules       Normal clone works
Pull updates               git submodule update --remote   git subtree pull
Push changes               Push submodule separately       git subtree push
See shared code in diffs   No (just a hash)                Yes (regular files)
Offline access             Need to init submodules         Always available
CI/CD complexity           Need submodule init step        No extra steps
Pin exact version          Natural (commit hash)           Manual (squash commits)
Repo size                  Small (pointers only)           Larger (full copy)
History                    Separate                        Merged

Decision Framework

Use submodules when:

  • The shared code is a large, independently versioned project
  • You need to pin exact versions across consumer projects
  • Multiple teams own different repos and need clear boundaries
  • The shared code changes frequently and independently
  • You want to keep your repo size small

Use subtrees when:

  • You want everything in one repo with no external dependencies
  • CI/CD should work without special submodule setup
  • You rarely push changes back to the shared repo
  • Developers should see shared code in normal diffs and searches
  • You are vendoring a third-party dependency for patching
  • You want offline access to all code without extra steps
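The two lists above can be treated as a checklist. The sketch below is only a toy way of tallying them: the criteria names are hypothetical labels for the bullets, and weighting every bullet equally is an assumption. In practice one hard constraint (for example, a 500MB shared repo) can outweigh everything else.

```javascript
// Toy checklist scorer for the decision lists. Names and equal weights are assumptions.
var SUBMODULE_SIGNALS = [
  "largeSharedRepo", "exactPinning", "separateTeams", "independentReleases", "smallRepoSize"
];
var SUBTREE_SIGNALS = [
  "singleRepo", "simpleCi", "rarelyPushBack", "visibleInDiffs", "vendoredPatch", "offlineAccess"
];

function countMatches(signals, needs) {
  return signals.filter(function(name) {
    return needs[name] === true;
  }).length;
}

function recommend(needs) {
  var submoduleScore = countMatches(SUBMODULE_SIGNALS, needs);
  var subtreeScore = countMatches(SUBTREE_SIGNALS, needs);
  var choice = "no clear preference";
  if (submoduleScore > subtreeScore) choice = "submodule";
  if (subtreeScore > submoduleScore) choice = "subtree";
  return { choice: choice, submoduleScore: submoduleScore, subtreeScore: subtreeScore };
}

console.log(recommend({ vendoredPatch: true, simpleCi: true, exactPinning: true }));
```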

Common Patterns

Pattern 1: Shared Configuration

# Submodule approach — config updates independently
git submodule add https://github.com/myorg/eslint-config.git config/eslint

// .eslintrc.json
{
  "extends": "./config/eslint/index.js"
}

Pattern 2: Vendored Dependency

# Subtree approach — vendor a library you need to patch
git subtree add --prefix=vendor/express https://github.com/expressjs/express.git 4.18.2 --squash

# Make your patches
git add vendor/express/lib/router/index.js
git commit -m "fix: patch Express router for custom error handling"

Pattern 3: Shared Libraries in a Multi-Repo Architecture

# Submodule approach — multiple services share a validation library
cd service-users/
git submodule add https://github.com/myorg/validation.git libs/validation

cd ../service-orders/
git submodule add https://github.com/myorg/validation.git libs/validation

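# Pin both services to the same version and record the pin in each parent
cd ../service-users/libs/validation && git checkout v1.5.0
cd ../.. && git add libs/validation && git commit -m "chore: pin validation to v1.5.0"
cd ../service-orders/libs/validation && git checkout v1.5.0
cd ../.. && git add libs/validation && git commit -m "chore: pin validation to v1.5.0"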
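Keeping several services on the same pin is easier to enforce with a small check. The sketch below assumes you have already collected the commit each service records for the shared library (for example with git rev-parse HEAD:libs/validation run in each service); the function names, service names, and hashes are illustrative only.

```javascript
// Group services by the submodule commit they record, so drift is easy to spot.
function groupByPinnedCommit(pins) {
  var groups = {};
  Object.keys(pins).forEach(function(service) {
    var commit = pins[service];
    (groups[commit] = groups[commit] || []).push(service);
  });
  return groups;
}

// True when every service records the same commit (or there are no services).
function pinsAgree(pins) {
  return Object.keys(groupByPinnedCommit(pins)).length <= 1;
}

// Hypothetical data: service-orders has drifted to a different commit.
var samplePins = {
  "service-users": "1f7a7a472abf3dd9643fd615f6da379c4acb3e3a",
  "service-orders": "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"
};

console.log(pinsAgree(samplePins));
console.log(groupByPinnedCommit(samplePins));
```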

Pattern 4: Documentation in a Separate Repo

# Subtree approach — docs live in their own repo but build with the project
git subtree add --prefix=docs https://github.com/myorg/project-docs.git main --squash

# Build docs alongside code
npm run build-docs  # Reads from docs/ directory

Automation Scripts

Submodule Helper Script

// scripts/submodule-status.js
var childProcess = require("child_process");

function run(cmd) {
  return childProcess.execSync(cmd, { encoding: "utf-8" }).trim();
}

function getSubmoduleStatus() {
  var output = run("git submodule status");
  if (!output) {
    console.log("No submodules found.");
    return;
  }

  var lines = output.split("\n");
  lines.forEach(function(line) {
    // Each line looks like "<flag><sha> <path> (<describe>)"; flag is "+", "-", "U", or a space
    var parts = line.trim().split(" ");
    var hash = parts[0].replace(/^[+\-U]/, "");
    var path = parts[1];
    var tag = parts[2] || "";

    var prefix = line.trim()[0];
    var status = "up to date";
    if (prefix === "+") status = "MODIFIED (needs commit in parent)";
    if (prefix === "-") status = "NOT INITIALIZED (run git submodule update --init)";
    if (prefix === "U") status = "MERGE CONFLICT";

    console.log(path + ": " + status);
    console.log("  Commit: " + hash.substring(0, 8) + " " + tag);

    // Check for upstream updates
    try {
      var localHead = run("git -C " + path + " rev-parse HEAD");
      var remoteHead = run("git -C " + path + " rev-parse origin/main");
      if (localHead !== remoteHead) {
        console.log("  Updates available from upstream");
      }
    } catch (err) {
      // Remote not fetched
    }

    console.log("");
  });
}

getSubmoduleStatus();
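If you want to unit-test the parsing, it helps to separate it from the call to Git. The following sketch is one possible refactor, not a replacement for the script above: parseSubmoduleStatus is a hypothetical name, it expects the raw, untrimmed output of git submodule status (the first character of each line is the state flag), and the sample hashes are made up.

```javascript
// Parse raw `git submodule status` output into objects, without running Git.
var STATES = {
  " ": "up to date",
  "+": "different commit checked out",
  "-": "not initialized",
  "U": "merge conflict"
};

function parseSubmoduleStatus(output) {
  return output
    .split("\n")
    .filter(function(line) {
      return line.length > 0;
    })
    .map(function(line) {
      // <flag><sha> <path> with an optional " (<describe>)" suffix
      var match = line.match(/^([ +\-U])([0-9a-f]+) (.+?)(?: \((.*)\))?$/);
      if (!match) return null;
      return {
        state: STATES[match[1]],
        commit: match[2],
        path: match[3],
        describe: match[4] || ""
      };
    })
    .filter(Boolean);
}

// Hypothetical output for illustration only.
var sampleStatus = [
  " 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a libs/shared (v1.0.0)",
  "+3b18e512dba79e4c8300dd08aeb37f8e728b8dad libs/validation (v1.5.0-2-g3b18e51)",
  "-9daeafb9864cf43055ae93beb0afd6c7d144bfa4 libs/config"
].join("\n");

parseSubmoduleStatus(sampleStatus).forEach(function(entry) {
  console.log(entry.path + ": " + entry.state);
});
```

A CI step could call the same function and exit non-zero when any entry is "not initialized", which gives a clearer failure than a missing-module error later in the build.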

Subtree Update Script

// scripts/subtree-update.js
var childProcess = require("child_process");

var subtrees = [
  { prefix: "libs/shared", remote: "shared-utils", branch: "main" },
  { prefix: "libs/config", remote: "eslint-config", branch: "main" }
];

function run(cmd) {
  console.log("$ " + cmd);
  try {
    var output = childProcess.execSync(cmd, {
      encoding: "utf-8",
      stdio: "pipe"
    });
    if (output.trim()) console.log(output.trim());
    return true;
  } catch (err) {
    console.error("Failed: " + err.message);
    return false;
  }
}

subtrees.forEach(function(subtree) {
  console.log("\nUpdating " + subtree.prefix + "...");

  // Fetch latest
  run("git fetch " + subtree.remote);

  // Pull updates
  var success = run(
    "git subtree pull --prefix=" + subtree.prefix +
    " " + subtree.remote + " " + subtree.branch + " --squash"
  );

  if (success) {
    console.log(subtree.prefix + " updated successfully.");
  } else {
    console.log(subtree.prefix + " may need manual merge resolution.");
  }
});
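The script above builds shell command strings by concatenation, which breaks if a prefix or branch name contains spaces or shell metacharacters. A possible variation, sketched here under the same subtree configuration shape, passes the arguments as an array to execFileSync so no shell is involved. subtreePullArgs and pullSubtree are hypothetical names.

```javascript
var childProcess = require("child_process");

// Build the argument list for `git subtree pull` as an array.
// Squashing is the default; pass squash === false to keep full history.
function subtreePullArgs(subtree, squash) {
  var args = [
    "subtree", "pull",
    "--prefix=" + subtree.prefix,
    subtree.remote,
    subtree.branch
  ];
  if (squash !== false) args.push("--squash");
  return args;
}

// Runs the pull. Not invoked here, so loading this sketch never touches a repository.
function pullSubtree(subtree) {
  return childProcess.execFileSync("git", subtreePullArgs(subtree), {
    encoding: "utf-8"
  });
}

console.log(subtreePullArgs({ prefix: "libs/shared", remote: "shared-utils", branch: "main" }));
```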

Complete Working Example: Multi-Repo Project with Submodules

# Create the shared library
mkdir shared-utils && cd shared-utils
git init
mkdir src

cat > src/validation.js << 'SCRIPT'
var validator = {};

validator.isEmail = function(email) {
  var pattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return pattern.test(email);
};

validator.isNotEmpty = function(value) {
  return value !== null && value !== undefined && String(value).trim().length > 0;
};

validator.isInRange = function(value, min, max) {
  var num = Number(value);
  return !isNaN(num) && num >= min && num <= max;
};

module.exports = validator;
SCRIPT

git add . && git commit -m "feat: add validation utilities"
git tag v1.0.0

# Create the API project
cd ..
mkdir project-api && cd project-api
git init
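# Recent Git versions block local-path (file transport) submodules by default; allow it for this command only
git -c protocol.file.allow=always submodule add ../shared-utils libs/shared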

cat > app.js << 'SCRIPT'
var validator = require("./libs/shared/src/validation");

function handleRequest(data) {
  if (!validator.isEmail(data.email)) {
    return { error: "Invalid email address" };
  }
  if (!validator.isNotEmpty(data.name)) {
    return { error: "Name is required" };
  }
  return { success: true };
}

module.exports = { handleRequest: handleRequest };
SCRIPT

git add . && git commit -m "feat: add API with shared validation"
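To check the behavior without creating the two repositories, the validator and the request handler can be inlined into a single file. This is only a sanity-check sketch: the functions are copied from the example above (isInRange is omitted because the handler does not use it), and the sample inputs are made up.

```javascript
// Inlined copy of the shared validator used by the handler.
var validator = {};

validator.isEmail = function(email) {
  var pattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return pattern.test(email);
};

validator.isNotEmpty = function(value) {
  return value !== null && value !== undefined && String(value).trim().length > 0;
};

// Inlined copy of app.js's handler.
function handleRequest(data) {
  if (!validator.isEmail(data.email)) {
    return { error: "Invalid email address" };
  }
  if (!validator.isNotEmpty(data.name)) {
    return { error: "Name is required" };
  }
  return { success: true };
}

console.log(handleRequest({ email: "not-an-email", name: "Ann" })); // email check fails first
console.log(handleRequest({ email: "ann@example.com", name: "   " })); // whitespace-only name is rejected
console.log(handleRequest({ email: "ann@example.com", name: "Ann" })); // passes both checks
```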

Common Issues and Troubleshooting

Submodule directory is empty after clone

You cloned without --recurse-submodules.

Fix: Run git submodule update --init --recursive in the project root. Or re-clone with git clone --recurse-submodules <url>.

Subtree merge conflicts during pull

The same file was modified in both the parent project and the shared repo.

Fix: Resolve conflicts normally with git mergetool or by editing the conflicted files. The conflict markers show your version and the upstream version. After resolving, git add the files and git commit.

Submodule shows "modified content" in git status but you did not change anything

The submodule has modified or untracked files, or the checked-out commit differs from what the parent expects.

Fix: Go into the submodule directory and check its status. Run git submodule update to reset it to the expected commit. Add ignore = dirty to .gitmodules if you want to suppress noise from untracked files in the submodule.

CI/CD pipeline fails because submodule is not initialized

The CI system clones the repo without submodules.

Fix: Add a submodule init step to your CI configuration:

# GitHub Actions
steps:
  - uses: actions/checkout@v4
    with:
      submodules: recursive

# GitLab CI
variables:
  GIT_SUBMODULE_STRATEGY: recursive

Subtree push is extremely slow

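git subtree push scans the project's entire history to extract the relevant commits, which is slow in repos with long histories.

Fix: Use git subtree split first to create a branch, then push that branch. You can inspect or re-push the branch without repeating the scan, and adding --rejoin to the split records the split point so later splits have less history to search: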

git subtree split --prefix=libs/shared --branch subtree-push
git push shared-utils subtree-push:main
git branch -D subtree-push

Best Practices

  • Document your submodule/subtree strategy in the README. New developers need to know that submodules exist and how to initialize them. Without documentation, they will clone and wonder why directories are empty.
  • Use --recurse-submodules in clone commands. Make it a habit or alias it: git config --global alias.cloner 'clone --recurse-submodules'.
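  • Pin submodules to tags, not branches. Tags are conventionally immutable, so a pinned release stays put. If your update routine follows main with git submodule update --remote, any push to the shared repo can land in your project on the next update and break it unexpectedly.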
  • Squash subtree merges. Without --squash, the shared repo's entire commit history floods your project log. Squash keeps your history clean.
  • Initialize submodules explicitly in CI. Your CI pipeline should fail clearly if submodules are not initialized, not silently skip tests because code is missing.
  • Prefer subtrees for vendored code. When you fork a library to patch it, subtrees keep everything self-contained. No external dependency at clone time.
  • Prefer submodules for large shared code. If the shared repo is 500MB, a subtree copies all of it into every consuming project. Submodules keep a lightweight pointer.
