Pipeline Performance: Caching and Optimization Techniques
A practical guide to optimizing Azure DevOps pipeline performance with dependency caching, Docker layer caching, parallel execution, shallow checkout, and build analytics.
Overview
Slow pipelines are not just an annoyance — they are a direct tax on engineering velocity, developer focus, and infrastructure cost. Azure DevOps provides multiple mechanisms for caching dependencies, parallelizing work, reducing checkout time, and analyzing bottlenecks, but most teams only scratch the surface. This article covers every major optimization technique I have used in production to cut pipeline durations from 25+ minutes down to under 6 minutes on real Node.js and .NET projects.
Prerequisites
- An Azure DevOps organization with Pipelines enabled
- Familiarity with YAML pipeline syntax (triggers, stages, jobs, steps)
- A project with dependency management (npm, NuGet, pip, or similar)
- Basic understanding of Docker builds (for the Docker caching sections)
- Access to pipeline analytics in Azure DevOps (the Analytics tab on a pipeline's page under Pipelines)
Why Pipeline Speed Matters
I have seen teams shrug off a 20-minute pipeline as "just how CI works." That attitude costs more than people realize.
Developer feedback loops. A developer pushes a commit and opens a pull request. If the pipeline takes 18 minutes, they context-switch to something else. When the results come back — maybe a test failure — they have to reload the original context. Research consistently shows that context switching adds 15-25 minutes of recovery time. A fast pipeline that returns results in 4 minutes keeps developers in flow.
Cost. Microsoft-hosted agents bill by the minute. If you are running 200 pipeline executions per day at 18 minutes each, that is 3,600 agent-minutes daily. Cut that to 5 minutes and you drop to 1,000 agent-minutes — a 72% cost reduction. On self-hosted agents, faster pipelines mean fewer agents to maintain.
Merge velocity. Slow pipelines create PR bottlenecks. When the build queue backs up, developers stack PRs and merge conflicts multiply. Fast pipelines keep the queue moving.
The goal is not to obsess over shaving seconds. The goal is to make the pipeline fast enough that developers never have a reason to skip it.
The Cache Task
Azure DevOps provides the Cache@2 task for persisting directories between pipeline runs. The task stores a compressed archive of the specified path, keyed by a hash you define. On subsequent runs, if the key matches, the cached directory is restored instead of being rebuilt from scratch.
steps:
- task: Cache@2
displayName: 'Cache npm dependencies'
inputs:
key: 'npm | "$(Agent.OS)" | package-lock.json'
restoreKeys: |
npm | "$(Agent.OS)"
path: '$(Pipeline.Workspace)/.npm'
cacheHitVar: 'CACHE_RESTORED'
# CACHE_RESTORED is set to 'true' on a cache hit (requires the cacheHitVar input above)
The cache is scoped to the pipeline and branch by default. A cache created on the default branch (main) is available as a fallback to feature branches, but a cache created on a feature branch is not available to main or to other branches. This prevents feature branch experiments from polluting the main cache.
Key things to know:
- Cache size limit: 10 GB per cache entry. Most dependency caches are well under this.
- Retention: Caches expire after 7 days of not being accessed.
- Storage: Caches are stored in Azure Artifacts storage in your organization's region.
- Scope: Caches are scoped to project + pipeline + branch (with fallback to the default branch).
Cache Keys and Restore Keys
The cache key is a pipe-delimited string. Each segment is either a literal string or a file path that gets hashed. This is where most people get the design wrong.
Hash-Based Keys
# Good: precise key based on lockfile content
key: 'npm | "$(Agent.OS)" | package-lock.json'
# Also good: multiple lockfiles in a monorepo
key: 'npm | "$(Agent.OS)" | packages/**/package-lock.json'
# Bad: no file hash means the cache never invalidates
key: 'npm | "$(Agent.OS)"'
When you include a file path like package-lock.json, Azure DevOps hashes the file contents and uses that hash as part of the key. When the lockfile changes (new dependencies added), the hash changes, and the old cache is bypassed.
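A literal segment can also act as a manual version salt. This is a minimal sketch of that pattern, reusing the npm cache path from above; the "v2" string is arbitrary, and bumping it forces a completely fresh cache (useful after a corrupted cache or a Node upgrade).

- task: Cache@2
  inputs:
    key: 'npm | v2 | "$(Agent.OS)" | package-lock.json'
    restoreKeys: |
      npm | v2 | "$(Agent.OS)"
    path: '$(Pipeline.Workspace)/.npm'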
Fallback with Restore Keys
Restore keys provide fallback behavior when the exact key does not match. The pipeline tries the exact key first, then each restore key in order, using the most recently created cache that matches.
- task: Cache@2
inputs:
key: 'npm | "$(Agent.OS)" | package-lock.json'
restoreKeys: |
npm | "$(Agent.OS)"
npm
path: '$(Pipeline.Workspace)/.npm'
With this configuration:
- First, try to find a cache matching the exact hash of package-lock.json
- If not found, try any cache matching npm | "$(Agent.OS)" (same OS, any lockfile version)
- If not found, try any cache matching npm (any OS, any lockfile version)
- If nothing matches, proceed without cache
A partial cache hit with a restore key is still valuable. Restoring 95% of your dependencies from cache and installing the remaining 5% is far faster than installing everything from scratch.
Caching npm Dependencies
There are two strategies for caching npm, and the choice matters.
Strategy 1: Cache the npm Cache Folder
This caches npm's internal cache directory, not node_modules itself. npm still runs npm ci on every build, but it pulls packages from the local cache instead of the registry.
variables:
npm_config_cache: $(Pipeline.Workspace)/.npm
steps:
- task: Cache@2
displayName: 'Cache npm'
inputs:
key: 'npm | "$(Agent.OS)" | package-lock.json'
restoreKeys: |
npm | "$(Agent.OS)"
path: $(npm_config_cache)
- script: npm ci
displayName: 'Install dependencies'
Typical savings: npm ci drops from ~45 seconds to ~15 seconds on a medium project (150 dependencies).
Strategy 2: Cache node_modules Directly
This skips the install step entirely when the cache hits. It is faster but carries a risk — if node_modules gets into a corrupted state, the corruption persists until the cache key changes.
steps:
- task: Cache@2
displayName: 'Cache node_modules'
inputs:
key: 'node_modules | "$(Agent.OS)" | package-lock.json'
path: 'node_modules'
cacheHitVar: 'CACHE_RESTORED'
- script: npm ci
displayName: 'Install dependencies'
condition: ne(variables.CACHE_RESTORED, 'true')
Typical savings: The entire install step is skipped. Saves 30-60 seconds depending on project size.
I recommend Strategy 1 for most teams. Strategy 2 is faster but harder to debug when things go wrong. If you do use Strategy 2, always use npm ci (not npm install) when the cache misses, because npm ci deletes node_modules and does a clean install from the lockfile.
Caching NuGet Packages
The pattern for NuGet is similar. Cache the global packages folder and let dotnet restore pull from it.
variables:
NUGET_PACKAGES: $(Pipeline.Workspace)/.nuget/packages
steps:
- task: Cache@2
displayName: 'Cache NuGet packages'
inputs:
key: 'nuget | "$(Agent.OS)" | **/packages.lock.json'
restoreKeys: |
nuget | "$(Agent.OS)"
path: $(NUGET_PACKAGES)
- task: DotNetCoreCLI@2
displayName: 'Restore'
inputs:
command: 'restore'
projects: '**/*.csproj'
One gotcha: you need to enable NuGet lock files for this to work. Add <RestorePackagesWithLockFile>true</RestorePackagesWithLockFile> to your Directory.Build.props or individual .csproj files. Without lock files, you do not have a stable file to hash for the cache key.
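Once lock files are enabled, you can optionally enforce them at restore time so the file your cache key hashes cannot drift silently. A small sketch, assuming the dotnet CLI's --locked-mode flag fits your workflow:

- script: dotnet restore --locked-mode
  displayName: 'Restore (fail if the lock file is out of date)'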
Typical savings: Restore drops from ~30 seconds to ~8 seconds on a solution with 40+ packages.
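The same pattern carries over to the other package managers mentioned in the prerequisites. A sketch for pip, assuming a requirements.txt at the repository root and pip's cache redirected into the workspace via the PIP_CACHE_DIR environment variable:

variables:
  PIP_CACHE_DIR: $(Pipeline.Workspace)/.pip

steps:
  - task: Cache@2
    displayName: 'Cache pip downloads'
    inputs:
      key: 'pip | "$(Agent.OS)" | requirements.txt'
      restoreKeys: |
        pip | "$(Agent.OS)"
      path: $(PIP_CACHE_DIR)
  - script: pip install -r requirements.txt
    displayName: 'Install dependencies'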
Caching Docker Layers
Docker layer caching is the single biggest optimization for pipelines that build container images. A typical Node.js Docker build without caching takes 90-180 seconds. With proper layer caching, rebuilds that only changed application code take 15-25 seconds.
BuildKit Inline Cache
Docker BuildKit supports embedding cache metadata directly in the image. You push the image with cache metadata, and subsequent builds can pull from it.
steps:
- task: Docker@2
displayName: 'Build and push with cache'
inputs:
containerRegistry: 'myACR'
repository: 'myapp'
command: 'build'
Dockerfile: 'Dockerfile'
arguments: |
--build-arg BUILDKIT_INLINE_CACHE=1
--cache-from $(containerRegistry)/myapp:latest
--tag $(containerRegistry)/myapp:$(Build.BuildId)
--tag $(containerRegistry)/myapp:latest
env:
DOCKER_BUILDKIT: 1
Registry-Backed Cache
For more advanced scenarios, BuildKit can push cache layers to a separate registry reference:
steps:
- script: |
# The registry cache exporter is not supported by the default docker driver,
# so create (or reuse) a docker-container builder first
docker buildx create --use --name ci-builder 2>/dev/null || true
docker buildx build \
--cache-from type=registry,ref=$(containerRegistry)/myapp:cache \
--cache-to type=registry,ref=$(containerRegistry)/myapp:cache,mode=max \
--tag $(containerRegistry)/myapp:$(Build.BuildId) \
--push \
.
displayName: 'Build with registry cache'
env:
DOCKER_BUILDKIT: 1
The mode=max flag exports all layers, not just the final image layers. This maximizes cache hits on multi-stage builds.
Dockerfile Optimization for Caching
Layer ordering in your Dockerfile has a direct impact on cache hit rates. Put things that change least frequently first:
FROM node:20-alpine AS build
WORKDIR /app
# Layer 1: package files (changes occasionally)
COPY package.json package-lock.json ./
# Layer 2: install (only reruns when package files change)
RUN npm ci   # devDependencies are typically needed for the build step below
# Layer 3: application code (changes every build)
COPY . .
RUN npm run build
If you put COPY . . before RUN npm ci, every code change invalidates the install layer and you lose the cache entirely.
Caching Build Outputs Between Stages
Multi-stage pipelines need to pass build artifacts between stages. You have two options: pipeline artifacts and the cache task.
Pipeline Artifacts
Use PublishPipelineArtifact and DownloadPipelineArtifact when the output is the final deliverable (a compiled binary, a Docker image tag, a deployment package).
stages:
- stage: Build
jobs:
- job: BuildApp
steps:
- script: npm run build
- task: PublishPipelineArtifact@1
inputs:
targetPath: '$(System.DefaultWorkingDirectory)/dist'
artifact: 'app-dist'
- stage: Deploy
dependsOn: Build
jobs:
- job: DeployApp
steps:
- task: DownloadPipelineArtifact@2
inputs:
artifact: 'app-dist'
path: '$(Pipeline.Workspace)/dist'
Cache Task for Intermediate Outputs
Use the cache task when the output is an intermediate artifact that speeds up rebuilds but is not a deliverable — compiled object files, TypeScript output, Webpack bundles.
steps:
- task: Cache@2
inputs:
key: 'build-output | "$(Agent.OS)" | src/**/*.ts'
path: 'dist'
cacheHitVar: 'CACHE_RESTORED'
- script: npm run build
condition: ne(variables.CACHE_RESTORED, 'true')
Pipeline artifacts are guaranteed to be available in downstream stages. Cache restores are best-effort and only happen when the key matches (and keys typically include the agent OS). Use artifacts for cross-stage data flow and caching for within-job speed improvements.
Optimizing Checkout
The default checkout: self step clones the entire repository history. For large repositories, this is a significant time sink.
Shallow Fetch
Fetch only the latest commit instead of the full history:
steps:
- checkout: self
fetchDepth: 1
displayName: 'Shallow checkout'
Typical savings on a repo with 10,000 commits: Checkout drops from ~35 seconds to ~5 seconds.
Setting fetchDepth: 1 fetches only the tip commit. If your build needs recent history (for changelog generation, for example), use fetchDepth: 50 or whatever depth covers your needs.
Sparse Checkout
For monorepos, you can cut down how much Git transfers. A blobless partial clone (fetchFilter: 'blob:none') defers downloading file contents until they are actually needed:
steps:
- checkout: self
fetchDepth: 1
fetchFilter: 'blob:none'
displayName: 'Blobless partial clone'
If you need only specific paths, combine the checkout step with git sparse-checkout in a script step:
steps:
- checkout: self
fetchDepth: 1
- script: |
git sparse-checkout init --cone
git sparse-checkout set packages/api packages/shared
displayName: 'Sparse checkout - API only'
Skip Fetching Tags
Depending on how the pipeline was created, the checkout step may fetch every Git tag along with the branch refs. On repositories that accumulate many tags, this adds unnecessary checkout time, so disable it explicitly:
steps:
- checkout: self
fetchDepth: 1
fetchTags: false
Parallel Job Strategies
Running jobs in parallel is the most effective way to reduce wall-clock time. A pipeline with 4 test suites that each take 5 minutes runs in 5 minutes when parallelized, instead of 20.
Matrix Strategy
jobs:
- job: Test
strategy:
matrix:
unit:
TEST_SUITE: 'unit'
integration:
TEST_SUITE: 'integration'
e2e:
TEST_SUITE: 'e2e'
maxParallel: 3
steps:
- checkout: self
fetchDepth: 1
- script: npm ci
- script: npm run test:$(TEST_SUITE)
displayName: 'Run $(TEST_SUITE) tests'
Fan-Out for Monorepo Packages
jobs:
- job: Test
strategy:
matrix:
api:
PACKAGE: 'packages/api'
web:
PACKAGE: 'packages/web'
shared:
PACKAGE: 'packages/shared'
steps:
- checkout: self
fetchDepth: 1
- script: |
cd $(PACKAGE)
npm ci
npm test
displayName: 'Test $(PACKAGE)'
Dynamic Parallelism with Each
For large test suites, split tests across N agents dynamically:
parameters:
- name: testShards
type: object
default:
- shard: 1
total: 4
- shard: 2
total: 4
- shard: 3
total: 4
- shard: 4
total: 4
jobs:
- ${{ each testShard in parameters.testShards }}:
- job: Test_Shard_${{ testShard.shard }}
steps:
- script: |
npx jest --shard=${{ testShard.shard }}/${{ testShard.total }}
displayName: 'Run test shard ${{ testShard.shard }}/${{ testShard.total }}'
Incremental Builds
Running every build step on every commit is wasteful when only a few files changed. Incremental builds detect what changed and skip unnecessary work.
Path-Based Triggers
Limit which paths trigger the pipeline at all:
trigger:
branches:
include:
- main
paths:
include:
- packages/api/**
exclude:
- '**/*.md'
- docs/**
Detecting Changed Files in Scripts
For monorepos where you want one pipeline but conditional execution:
#!/bin/bash
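# Note: HEAD~1 needs at least two commits of history, so pair this script with
# fetchDepth: 2 (or 0) on the checkout step rather than fetchDepth: 1.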
CHANGED_FILES=$(git diff --name-only HEAD~1 HEAD)
if echo "$CHANGED_FILES" | grep -q "^packages/api/"; then
echo "##vso[task.setvariable variable=BUILD_API]true"
fi
if echo "$CHANGED_FILES" | grep -q "^packages/web/"; then
echo "##vso[task.setvariable variable=BUILD_WEB]true"
fi
steps:
- script: ./scripts/detect-changes.sh
displayName: 'Detect changes'
- script: |
cd packages/api
npm ci && npm run build && npm test
displayName: 'Build and test API'
condition: eq(variables.BUILD_API, 'true')
- script: |
cd packages/web
npm ci && npm run build && npm test
displayName: 'Build and test Web'
condition: eq(variables.BUILD_WEB, 'true')
Optimizing Test Execution
Tests typically consume 40-60% of total pipeline time. Optimizing them has outsized impact.
Test Splitting
Jest supports sharding natively:
steps:
- script: npx jest --shard=1/4
displayName: 'Test shard 1'
For other frameworks, split by file count:
// scripts/get-test-files.js
// Assigns every Nth test file to this shard so each agent runs a distinct subset.
var glob = require('glob');
// System.JobPositionInPhase is 1-based, so shift to a 0-based index for the modulo
var shardIndex = parseInt(process.env.SHARD_INDEX, 10) - 1;
var shardTotal = parseInt(process.env.SHARD_TOTAL, 10);
var testFiles = glob.sync('**/*.test.js', { ignore: 'node_modules/**' });
testFiles.sort();
var shardFiles = testFiles.filter(function(file, index) {
return index % shardTotal === shardIndex;
});
console.log(shardFiles.join(' '));
steps:
- script: |
TEST_FILES=$(node scripts/get-test-files.js)
npx jest $TEST_FILES
env:
SHARD_INDEX: $(System.JobPositionInPhase)
SHARD_TOTAL: $(System.TotalJobsInPhase)
displayName: 'Run test shard'
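For System.JobPositionInPhase and System.TotalJobsInPhase to take values other than 1, the job has to be sliced across multiple agents. A minimal sketch of that slicing configuration (the job name and script path are illustrative):

jobs:
  - job: Test
    strategy:
      parallel: 4   # run 4 copies of this job; each copy gets a different position
    steps:
      - script: |
          TEST_FILES=$(node scripts/get-test-files.js)
          npx jest $TEST_FILES
        env:
          SHARD_INDEX: $(System.JobPositionInPhase)
          SHARD_TOTAL: $(System.TotalJobsInPhase)
        displayName: 'Run test slice'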
Running Only Affected Tests
If you use Jest, the --changedSince flag runs only tests related to changed files:
steps:
- checkout: self
fetchDepth: 0 # Need full history for diff
- script: npx jest --changedSince=origin/main
displayName: 'Run affected tests'
This can reduce test execution from 8 minutes to 30 seconds on a large codebase when only a few files changed. It is ideal for PR validation pipelines where you know the base branch.
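On PR builds the base branch is not always main, so it helps to diff against the PR's actual target. A sketch, assuming Jest's --changedSince flag and the System.PullRequest.TargetBranch predefined variable (its value may be a full ref such as refs/heads/main depending on the repository type):

steps:
  - checkout: self
    fetchDepth: 0  # full history so the merge base can be found
  - script: |
      # Normalize refs/heads/main -> main so git and jest can resolve it
      TARGET=$(echo "$(System.PullRequest.TargetBranch)" | sed 's#refs/heads/##')
      git fetch origin "$TARGET"
      npx jest --changedSince="origin/$TARGET"
    displayName: 'Run tests affected by this PR'
    condition: eq(variables['Build.Reason'], 'PullRequest')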
Agent Selection and Sizing
The agent you run on has a direct impact on pipeline performance.
Microsoft-Hosted vs Self-Hosted
| Factor | Microsoft-Hosted | Self-Hosted |
|---|---|---|
| Startup time | 15-45 seconds (cold start) | Near-instant |
| Disk I/O | Moderate (standard SSD) | Configurable (NVMe) |
| CPU | 2 vCPU standard | Configurable |
| Cache persistence | None (clean VM every run) | Persistent across runs |
| Docker layer cache | None (rebuilt every time) | Persists between builds |
| Maintenance | Zero | You manage updates |
For build-heavy pipelines, self-hosted agents with NVMe storage and persistent Docker caches can cut build times by 50-70%. The tradeoff is operational overhead.
Right-Sizing Self-Hosted Agents
Profile your builds to match agent specs to workload:
# For compile-heavy workloads (C++, Rust, large TypeScript projects)
# Use: 8 vCPU, 16 GB RAM, NVMe SSD
# For test-heavy workloads (Node.js, Python)
# Use: 4 vCPU, 8 GB RAM, standard SSD
# For Docker builds
# Use: 4 vCPU, 8 GB RAM, 100 GB+ SSD for layer cache
A self-hosted agent running on a Standard_D8s_v5 Azure VM (8 vCPU, 32 GB) compiled a TypeScript monorepo with 200,000+ lines in 45 seconds. The same build on a Microsoft-hosted agent took 3 minutes 20 seconds. The VM costs $280/month. The time savings across 50 daily builds paid for it within the first week.
Reducing Task Overhead
Every task in a pipeline has startup overhead — typically 2-5 seconds for downloading the task, initializing the runtime, and reporting telemetry. On a pipeline with 25 tasks, that is 50-125 seconds of pure overhead.
Combine Script Steps
# Bad: 4 separate steps = ~15 seconds overhead
steps:
- script: echo "Setting up environment"
- script: export NODE_ENV=production
- script: npm run lint
- script: npm run build
# Good: 1 step = ~4 seconds overhead
steps:
- script: |
export NODE_ENV=production
npm run lint
npm run build
displayName: 'Lint and build'
Skip Unnecessary Tasks
Disable features you do not use:
steps:
- checkout: self
fetchDepth: 1
fetchTags: false
lfs: false
submodules: false
clean: false
Setting clean: false reuses the working directory from the previous run on self-hosted agents. Combined with caching, this means the agent already has node_modules populated. The pipeline skips both the cache restore and the install step.
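A sketch of what that looks like in practice on a self-hosted agent: skip npm ci entirely when the lockfile has not changed since the last run on that agent. The .last-install marker file is my own convention here, not a built-in feature.

steps:
  - checkout: self
    clean: false
  - script: |
      HASH=$(sha256sum package-lock.json | cut -d' ' -f1)
      if [ -d node_modules ] && [ "$(cat .last-install 2>/dev/null)" = "$HASH" ]; then
        echo "Lockfile unchanged; reusing node_modules from the previous run"
      else
        npm ci
        echo "$HASH" > .last-install
      fi
    displayName: 'Install dependencies (incremental)'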
Avoid Post-Job Cleanup on Self-Hosted
pool:
name: 'MySelfHostedPool'
workspace:
clean: outputs # Only clean build outputs, not sources or dependencies
Pipeline Analytics for Identifying Bottlenecks
Azure DevOps provides built-in analytics that most teams never look at. Navigate to Pipelines > Analytics to access them.
Key Metrics to Track
- P50 and P95 duration: The median tells you typical performance. The 95th percentile tells you about flaky slow runs.
- Task duration breakdown: Identifies which tasks consume the most time.
- Queue time: How long builds wait for an available agent. High queue times mean you need more agents or faster builds.
- Pass rate: Low pass rates mean wasted compute on failing builds. Fix flaky tests before optimizing speed.
Querying Analytics via API
// scripts/pipeline-metrics.js
var https = require('https');
var org = process.env.ADO_ORG;
var project = process.env.ADO_PROJECT;
var pat = process.env.ADO_PAT;
// Encode the OData query values so https.get does not reject unescaped spaces
var query = [
  '$filter=' + encodeURIComponent("PipelineName eq 'MyPipeline' and CompletedDate ge 2026-01-01Z"),
  '$select=PipelineRunId,TotalDurationSeconds,QueueDurationSeconds,RunResult',
  '$orderby=' + encodeURIComponent('CompletedDate desc'),
  '$top=100'
].join('&');
var url = 'https://analytics.dev.azure.com/' + org + '/' + project +
  '/_odata/v3.0-preview/PipelineRuns?' + query;
var options = {
headers: {
'Authorization': 'Basic ' + Buffer.from(':' + pat).toString('base64')
}
};
https.get(url, options, function(res) {
var data = '';
res.on('data', function(chunk) { data += chunk; });
res.on('end', function() {
var runs = JSON.parse(data).value;
var durations = runs.map(function(r) { return r.TotalDurationSeconds; });
durations.sort(function(a, b) { return a - b; });
var p50 = durations[Math.floor(durations.length * 0.5)];
var p95 = durations[Math.floor(durations.length * 0.95)];
console.log('P50 duration: ' + Math.round(p50 / 60) + ' minutes');
console.log('P95 duration: ' + Math.round(p95 / 60) + ' minutes');
console.log('Total runs: ' + runs.length);
});
});
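Wiring the script into a scheduled pipeline takes one step. A usage sketch, assuming the script lives at scripts/pipeline-metrics.js and the PAT is stored as a secret pipeline variable named AnalyticsPat (org, project, and variable names are placeholders):

- script: node scripts/pipeline-metrics.js
  displayName: 'Report pipeline duration percentiles'
  env:
    ADO_ORG: 'myorg'
    ADO_PROJECT: 'myproject'
    ADO_PAT: $(AnalyticsPat)  # secret variables are not exposed to scripts unless mapped here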
Build Output Artifact Optimization
Large artifacts slow down uploads and downloads between stages. Compress and filter them.
Exclude Unnecessary Files
steps:
# Before publishing, copy only what you need into the staging directory
- script: |
mkdir -p $(Build.ArtifactStagingDirectory)/app
cp -r dist/ $(Build.ArtifactStagingDirectory)/app/
cp package.json $(Build.ArtifactStagingDirectory)/app/
cp package-lock.json $(Build.ArtifactStagingDirectory)/app/
displayName: 'Stage artifacts'
- task: PublishPipelineArtifact@1
inputs:
targetPath: '$(Build.ArtifactStagingDirectory)'
artifact: 'drop'
Compress Before Upload
steps:
- script: |
tar -czf $(Build.ArtifactStagingDirectory)/app.tar.gz \
-C dist .
displayName: 'Compress build output'
- task: PublishPipelineArtifact@1
inputs:
targetPath: '$(Build.ArtifactStagingDirectory)/app.tar.gz'
artifact: 'app-compressed'
A 450 MB uncompressed dist folder containing JavaScript bundles and source maps compresses to ~85 MB. Upload time drops from 40 seconds to 8 seconds on a typical agent connection.
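The consuming stage then downloads the archive and unpacks it. A sketch of the matching download side, using the artifact and file names from the publish step above:

steps:
  - task: DownloadPipelineArtifact@2
    inputs:
      artifact: 'app-compressed'
      path: '$(Pipeline.Workspace)/artifacts'
  - script: |
      mkdir -p $(Pipeline.Workspace)/app
      tar -xzf $(Pipeline.Workspace)/artifacts/app.tar.gz -C $(Pipeline.Workspace)/app
    displayName: 'Extract build output'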
Complete Working Example
Here is a production pipeline for a Node.js monorepo with three packages. It incorporates npm caching, Docker layer caching, parallel test execution, shallow checkout, and timing instrumentation.
# azure-pipelines.yml
trigger:
branches:
include:
- main
paths:
exclude:
- '**/*.md'
- docs/**
pr:
branches:
include:
- main
variables:
npm_config_cache: $(Pipeline.Workspace)/.npm
DOCKER_BUILDKIT: 1
containerRegistry: 'myacr.azurecr.io'
stages:
- stage: Build
displayName: 'Build & Test'
jobs:
# ============================================
# Shared: Install + Lint + Build
# ============================================
- job: BuildAll
displayName: 'Install, Lint, Build'
pool:
vmImage: 'ubuntu-latest'
steps:
- checkout: self
fetchDepth: 1
fetchTags: false
displayName: 'Shallow checkout'
- task: Cache@2
displayName: 'Restore npm cache'
inputs:
key: 'npm | "$(Agent.OS)" | package-lock.json | packages/**/package-lock.json'
restoreKeys: |
npm | "$(Agent.OS)"
path: $(npm_config_cache)
- script: |
echo "##vso[task.setvariable variable=INSTALL_START]$(date +%s)"
npm ci
echo "##vso[task.setvariable variable=INSTALL_END]$(date +%s)"
displayName: 'Install dependencies'
- script: |
npm run lint --workspaces
displayName: 'Lint all packages'
- script: |
npm run build --workspaces
displayName: 'Build all packages'
- task: PublishPipelineArtifact@1
inputs:
targetPath: 'packages/api/dist'
artifact: 'api-dist'
- task: PublishPipelineArtifact@1
inputs:
targetPath: 'packages/web/dist'
artifact: 'web-dist'
# ============================================
# Parallel Test Execution (3 shards)
# ============================================
- job: Test
displayName: 'Test'
dependsOn: [] # Run in parallel with BuildAll
pool:
vmImage: 'ubuntu-latest'
strategy:
matrix:
shard_1:
SHARD: '1/3'
shard_2:
SHARD: '2/3'
shard_3:
SHARD: '3/3'
maxParallel: 3
steps:
- checkout: self
fetchDepth: 1
fetchTags: false
- task: Cache@2
displayName: 'Restore npm cache'
inputs:
key: 'npm | "$(Agent.OS)" | package-lock.json | packages/**/package-lock.json'
restoreKeys: |
npm | "$(Agent.OS)"
path: $(npm_config_cache)
- script: npm ci
displayName: 'Install dependencies'
- script: |
echo "Test shard $(SHARD) starting at $(date)"
npx jest --shard=$(SHARD) \
--ci \
--coverage \
--reporters=default \
--reporters=jest-junit
displayName: 'Run test shard $(SHARD)'
env:
JEST_JUNIT_OUTPUT_DIR: $(System.DefaultWorkingDirectory)/test-results
- task: PublishTestResults@2
inputs:
testResultsFormat: 'JUnit'
testResultsFiles: 'test-results/*.xml'
mergeTestResults: true
condition: always()
# ============================================
# Docker Build with Layer Caching
# ============================================
- stage: Docker
displayName: 'Build Docker Image'
dependsOn: Build
jobs:
- job: DockerBuild
pool:
vmImage: 'ubuntu-latest'
steps:
- checkout: self
fetchDepth: 1
fetchTags: false
- task: DownloadPipelineArtifact@2
inputs:
artifact: 'api-dist'
path: '$(Pipeline.Workspace)/api-dist'
- task: Docker@2
displayName: 'Login to ACR'
inputs:
containerRegistry: 'myACR'
command: 'login'
- script: |
echo "Docker build starting at $(date)"
START=$(date +%s)
# Registry cache export requires a docker-container builder (not the default driver)
docker buildx create --use --name ci-builder 2>/dev/null || true
docker buildx build \
--cache-from type=registry,ref=$(containerRegistry)/api:cache \
--cache-to type=registry,ref=$(containerRegistry)/api:cache,mode=max \
--tag $(containerRegistry)/api:$(Build.BuildId) \
--tag $(containerRegistry)/api:latest \
--push \
--file packages/api/Dockerfile \
.
END=$(date +%s)
echo "Docker build completed in $((END - START)) seconds"
displayName: 'Build and push with layer cache'
# ============================================
# Deploy (gated)
# ============================================
- stage: Deploy
displayName: 'Deploy to Production'
dependsOn: Docker
condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
jobs:
- deployment: DeployProd
environment: 'production'
pool:
vmImage: 'ubuntu-latest'
strategy:
runOnce:
deploy:
steps:
- script: |
echo "Deploying $(containerRegistry)/api:$(Build.BuildId)"
# Your deployment commands here
displayName: 'Deploy to production'
Before/After Timing Comparison
| Step | Before Optimization | After Optimization | Savings |
|---|---|---|---|
| Checkout | 35s | 5s | 86% |
| npm install | 48s | 12s (cached) | 75% |
| Lint | 25s | 25s | 0% |
| Build | 40s | 40s | 0% |
| Tests (total) | 12m 30s | 4m 10s (3 shards) | 67% |
| Docker build | 2m 45s | 28s (layer cache) | 83% |
| Artifact upload | 38s | 9s (compressed) | 76% |
| Total wall-clock | 18m 41s | 5m 29s | 71% |
The total savings come from three techniques stacked together: caching eliminates redundant downloads, parallelism divides test time by the shard count, and shallow checkout eliminates unnecessary Git history.
Common Issues & Troubleshooting
Issue 1: Cache Not Being Restored
Error: No matching cache found. The task proceeds without a cache hit, and installs take full time every run.
##[debug]Cache miss - no cache found matching key: npm | "Linux" | a4c8f32b...
##[debug]Attempted restore keys:
##[debug] npm | "Linux"
##[debug]No matching restore key found.
Cause: The cache key includes a hash of a file that does not exist or whose path is wrong. Common when using glob patterns like **/package-lock.json in a repo where some packages do not have lockfiles.
Fix: Verify the file paths in your cache key. Run find . -name "package-lock.json" locally to confirm the patterns match. Also check that the cache was populated by a previous successful run on the same branch or the default branch.
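A throwaway diagnostic step can show exactly what the agent sees. A sketch that lists the lockfiles your glob should match, so you can compare them against the key segments in the debug log:

- script: |
    echo "Lockfiles visible to the agent:"
    find "$(Build.SourcesDirectory)" -name "package-lock.json" -not -path "*/node_modules/*"
  displayName: 'Debug: list lockfiles for the cache key'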
Issue 2: Docker Cache Pull Fails
Error:
ERROR: failed to solve: failed to fetch cache from myacr.azurecr.io/api:cache:
unexpected status: 401 Unauthorized
Cause: Either the cache image reference does not exist yet (always the case on the first run), or the credentials established by the Docker login step are not visible to the builder that performs the --cache-from pull.
Fix: Ensure the Docker login happens before the docker buildx build command. For the first run, the cache image does not exist yet — the build will proceed without cache on the initial run and populate the cache for subsequent runs. Add || true to ignore cache pull failures:
- script: |
docker pull $(containerRegistry)/api:cache || true
docker buildx build \
--cache-from type=registry,ref=$(containerRegistry)/api:cache \
...
Issue 3: Test Sharding Produces Uneven Distribution
Symptom: Shard 1 takes 6 minutes while shard 3 takes 45 seconds. Total time is still 6 minutes because the slowest shard is the bottleneck.
Cause: Jest's default sharding splits by file count, not by execution time. A shard that gets the integration test files takes much longer.
Fix: Use --shard with --testPathPattern to manually distribute slow test files, or use a test timing file:
- script: |
npx jest --shard=$(SHARD) \
--json \
--outputFile=test-timing.json
displayName: 'Run tests with timing'
# Upload timing data as artifact for future rebalancing
- task: PublishPipelineArtifact@1
inputs:
targetPath: 'test-timing.json'
artifact: 'test-timing-$(System.JobPositionInPhase)'
Then write a script that reads historical timing data and produces balanced shard assignments.
Issue 4: Cache Keeps Growing and Hits Size Limit
Error:
##[error]Cache upload failed: The cache entry size (11534336000 bytes) exceeds
the maximum allowed size (10737418240 bytes).
Cause: Caching node_modules directly on a project with native modules (sharp, bcrypt, canvas) produces a cache that exceeds 10 GB.
Fix: Switch from caching node_modules to caching the npm cache folder (~/.npm or the npm_config_cache path). The npm cache stores compressed tarballs, which are significantly smaller than the extracted node_modules. For a project where node_modules is 4.2 GB, the npm cache is typically 800 MB.
Issue 5: Self-Hosted Agent Disk Full After Cache Accumulation
Error:
##[error]No space left on device
Cause: On a self-hosted agent, the work directory accumulates sources, restored caches, and build outputs across runs. Combined with Docker layer caches and dangling images, the disk fills up.
Fix: Schedule a maintenance job that cleans old caches and Docker images:
# maintenance-pipeline.yml
schedules:
- cron: '0 2 * * 0' # Weekly at 2 AM Sunday
displayName: 'Weekly cleanup'
branches:
include:
- main
steps:
- script: |
docker system prune -af --volumes
# Agent.WorkFolder already points at the agent's _work directory
rm -rf $(Agent.WorkFolder)/_temp/*
rm -rf $(Agent.WorkFolder)/*/s/node_modules
displayName: 'Clean agent disk'
Best Practices
- Always use fetchDepth: 1 unless you need Git history. This is the single easiest optimization and it applies to every pipeline. The only exception is when you run git log or git diff against historical commits during the build.
- Cache the package manager cache, not node_modules. Caching node_modules directly is faster on hit but fragile. Corrupted caches cause mysterious build failures that are hard to diagnose. The npm/NuGet cache folder approach is slower by a few seconds but far more reliable.
- Parallelize tests before optimizing individual test speed. Splitting a 12-minute test suite across 4 shards gives you a 3-minute suite. No amount of individual test optimization will match that improvement for the same effort.
- Profile before optimizing. Use Pipeline Analytics to identify the actual bottleneck. I have seen teams spend a week optimizing Docker builds when 60% of their pipeline time was in test execution. Check the data first.
- Use condition: ne(variables.CACHE_RESTORED, 'true') to skip install steps on cache hit. This turns a 45-second install into a 0-second skip. It works with the Cache@2 task when you set cacheHitVar: CACHE_RESTORED on the task.
- Set clean: false on self-hosted agents for incremental builds. Combined with caching, the working directory persists between runs. Source files are already checked out, dependencies are already installed, and only changed files need processing.
- Compress artifacts before publishing. A tar.gz of your build output is typically 5-8x smaller than the raw directory. This directly reduces upload and download time between stages.
- Monitor cache hit rates over time. A cache that hits 95% of the time is working well. A cache that hits 40% of the time probably has a key that is too specific or a branch-scoping issue. Track this metric monthly.
- Do not cache everything. Caching has overhead: the restore step itself takes 5-15 seconds depending on cache size. If your install step only takes 10 seconds, caching it adds complexity for zero gain. Only cache operations that take more than 20-30 seconds.
