GitHub Actions Deep Dive: Beyond Basic CI

Advanced guide to GitHub Actions covering reusable workflows, matrix strategies, custom actions, self-hosted runners, and production deployment pipelines

Most teams get their first GitHub Actions workflow running in an afternoon -- a basic lint-and-test pipeline triggered on push. Then they stop. They never touch the workflow file again until something breaks, and when it does, they have no idea how to debug it. This article goes past the beginner setup and into the patterns that make GitHub Actions a serious deployment platform: reusable workflows, matrix strategies, custom actions, self-hosted runners, OIDC authentication, and production deployment pipelines with manual approval gates.

If you are shipping Node.js applications and still running npm test in a single job with no caching, you are leaving significant time and money on the table. Let's fix that.

Prerequisites

  • GitHub account with repository admin access
  • Node.js v18+ installed locally
  • Docker installed (for custom action and container builds)
  • Familiarity with YAML syntax
  • Basic understanding of CI/CD concepts (build, test, deploy)
  • A terminal you are comfortable with (bash, zsh, Git Bash)
  • Optional: a cloud provider account (AWS, GCP, or DigitalOcean) for deployment sections

Advanced Workflow Triggers

Most developers only use push and pull_request triggers. GitHub Actions supports far more granular control over when workflows run.

Push and Pull Request Filters

You can filter by branch, tag, and path. This is critical for monorepos or any project where you do not want every file change to trigger a full build:

name: Targeted CI

on:
  push:
    branches:
      - main
      - 'release/**'
    paths:
      - 'src/**'
      - 'package.json'
      - 'package-lock.json'
    tags:
      - 'v*'
  pull_request:
    branches:
      - main
    paths-ignore:
      - 'docs/**'
      - '*.md'
      - '.github/ISSUE_TEMPLATE/**'

The paths filter is one of the most underused features. In a monorepo with a frontend and backend directory, you can create separate workflows that only trigger when their respective directories change. This saves hundreds of minutes per month on Actions billing.
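
For example, a workflow that only cares about the backend half of a monorepo might look like this (a minimal sketch -- the backend/ directory name is an assumption, adjust the paths to your layout):

name: Backend CI

on:
  push:
    branches: [main]
    paths:
      - 'backend/**'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cd backend && npm ci && npm test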

Scheduled Workflows (Cron)

Cron triggers run workflows on a schedule. They are useful for nightly builds, dependency audits, and data pipelines:

on:
  schedule:
    # Run nightly at 2 AM UTC
    - cron: '0 2 * * *'
    # Run every Monday at 9 AM UTC
    - cron: '0 9 * * 1'

One gotcha: scheduled workflows only run on the default branch. If your workflow file exists on a feature branch but not on main, the schedule will not fire. GitHub also makes no guarantees about exact timing during periods of high load -- your 2 AM job might start at 2:15 AM.

Manual Triggers with workflow_dispatch

This is essential for any workflow you want to run on demand -- deployments, database migrations, cache clearing:

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target environment'
        required: true
        type: choice
        options:
          - staging
          - production
      skip_tests:
        description: 'Skip test suite'
        required: false
        type: boolean
        default: false
      version:
        description: 'Version to deploy'
        required: false
        type: string
        default: 'latest'

You access these inputs in your workflow with ${{ github.event.inputs.environment }} or the typed variant ${{ inputs.environment }}. The workflow_dispatch trigger adds a "Run workflow" button to the Actions tab in the GitHub UI.
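
Here is a sketch of how those inputs might drive the rest of the workflow (the deploy command is a placeholder for whatever your project actually runs):

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run tests
        if: ${{ !inputs.skip_tests }}
        run: npm test

      - name: Deploy
        run: npm run deploy -- --env "${{ inputs.environment }}" --version "${{ inputs.version }}"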

Repository Dispatch (External Triggers)

repository_dispatch lets external systems trigger workflows via the GitHub API. This is how you integrate Actions with external CI systems, chatbots, or monitoring alerts:

on:
  repository_dispatch:
    types: [deploy, rollback, refresh-cache]

Trigger it from any HTTP client:

curl -X POST \
  -H "Accept: application/vnd.github.v3+json" \
  -H "Authorization: token $GITHUB_TOKEN" \
  https://api.github.com/repos/OWNER/REPO/dispatches \
  -d '{"event_type":"deploy","client_payload":{"env":"staging","ref":"v2.1.0"}}'

Access the payload in your workflow with ${{ github.event.client_payload.env }}. I use this pattern to trigger deployments from Slack bots and to chain workflows across repositories.
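
The receiving side might look like this (the deploy step is a placeholder; note that github.event.action carries the event_type you sent):

jobs:
  handle-dispatch:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.client_payload.ref }}

      - name: Deploy
        if: github.event.action == 'deploy'
        run: echo "Deploying ${{ github.event.client_payload.ref }} to ${{ github.event.client_payload.env }}"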

Matrix Strategies and Dynamic Matrices

Matrix builds run the same job across multiple configurations in parallel. The basic version tests across Node versions:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [18, 20, 22]
      fail-fast: false
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm test

Setting fail-fast: false is important. Without it, a failure in Node 18 cancels the Node 20 and 22 jobs immediately. You usually want to see all failures, not just the first one.

Multi-Dimensional Matrices

You can combine multiple dimensions. This tests across Node versions and operating systems:

strategy:
  matrix:
    node-version: [18, 20, 22]
    os: [ubuntu-latest, windows-latest, macos-latest]
    include:
      - node-version: 22
        os: ubuntu-latest
        coverage: true
    exclude:
      - node-version: 18
        os: macos-latest

This generates 8 jobs (3x3 minus 1 exclusion). The include entry adds a coverage variable only to the Node 22 / Ubuntu combination.
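
Downstream steps can then key off that extra variable -- roughly like this (the coverage upload command is a placeholder):

- name: Upload coverage
  if: matrix.coverage == true
  run: npx codecov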

Dynamic Matrices

For truly dynamic builds, generate the matrix in a preceding job:

jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - id: set-matrix
        run: |
          MATRIX=$(node -e "
            var fs = require('fs');
            var packages = fs.readdirSync('packages').filter(function(d) {
              return fs.statSync('packages/' + d).isDirectory();
            });
            console.log(JSON.stringify({ package: packages }));
          ")
          echo "matrix=$MATRIX" >> $GITHUB_OUTPUT

  test:
    needs: prepare
    runs-on: ubuntu-latest
    strategy:
      matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - run: cd packages/${{ matrix.package }} && npm ci && npm test

This scans the packages/ directory and creates a matrix job for each subdirectory. When you add a new package, the CI automatically picks it up without any workflow changes.

Reusable Workflows and Composite Actions

Once you have more than two or three workflows, you start copying YAML blocks between them. Reusable workflows and composite actions solve this.

Reusable Workflows

A reusable workflow is a complete workflow that other workflows can call. Define it in a separate file:

# .github/workflows/reusable-test.yml
name: Reusable Test Workflow

on:
  workflow_call:
    inputs:
      node-version:
        required: false
        type: string
        default: '20'
      working-directory:
        required: false
        type: string
        default: '.'
    secrets:
      NPM_TOKEN:
        required: false

jobs:
  test:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ${{ inputs.working-directory }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
          cache: 'npm'
          cache-dependency-path: '${{ inputs.working-directory }}/package-lock.json'
      - run: npm ci
        env:
          NPM_TOKEN: ${{ secrets.NPM_TOKEN }}
      - run: npm test
      - run: npm run lint

Call it from another workflow:

# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]

jobs:
  test-api:
    uses: ./.github/workflows/reusable-test.yml
    with:
      node-version: '20'
      working-directory: 'api'
    secrets:
      NPM_TOKEN: ${{ secrets.NPM_TOKEN }}

  test-web:
    uses: ./.github/workflows/reusable-test.yml
    with:
      working-directory: 'web'

You can also reference reusable workflows from other repositories: uses: org/shared-workflows/.github/workflows/test.yml@main. This is how platform teams standardize CI across an entire organization.
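
A cross-repository call looks almost identical to the local one (org/shared-workflows and its inputs are hypothetical names here; the shared repository must be accessible to the caller):

jobs:
  test:
    uses: org/shared-workflows/.github/workflows/test.yml@main
    with:
      node-version: '20'
    secrets: inherit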

Composite Actions

A composite action bundles multiple steps into a single reusable action. Unlike reusable workflows, composite actions run inside a job rather than being a separate job:

# .github/actions/setup-node-project/action.yml
name: 'Setup Node Project'
description: 'Install Node.js, restore cache, install dependencies'

inputs:
  node-version:
    description: 'Node.js version'
    required: false
    default: '20'
  working-directory:
    description: 'Working directory'
    required: false
    default: '.'

runs:
  using: 'composite'
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}

    - name: Cache node_modules
      uses: actions/cache@v4
      id: cache-deps
      with:
        path: ${{ inputs.working-directory }}/node_modules
        key: deps-${{ runner.os }}-node${{ inputs.node-version }}-${{ hashFiles(format('{0}/package-lock.json', inputs.working-directory)) }}

    - name: Install dependencies
      if: steps.cache-deps.outputs.cache-hit != 'true'
      shell: bash
      working-directory: ${{ inputs.working-directory }}
      run: npm ci

Use it in a workflow:

steps:
  - uses: actions/checkout@v4
  - uses: ./.github/actions/setup-node-project
    with:
      node-version: '20'
  - run: npm test

I strongly prefer composite actions over reusable workflows for setup logic. Reusable workflows have a call depth limit of four, and they create separate jobs with their own runners. Composite actions run inline and are simpler to reason about.

Environment Secrets and OIDC

Environment-Scoped Secrets

GitHub supports environment-level secrets with protection rules. This is how you gate production deployments:

jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - run: echo "Deploying to ${{ vars.DEPLOY_URL }}"
        env:
          API_KEY: ${{ secrets.API_KEY }}

  deploy-production:
    runs-on: ubuntu-latest
    needs: deploy-staging
    environment:
      name: production
      url: https://myapp.com
    steps:
      - run: echo "Deploying to production"
        env:
          API_KEY: ${{ secrets.API_KEY }}

In repository settings, you can configure the production environment to require manual approval from specific team members and restrict it to the main branch only. The API_KEY secret can have different values in staging versus production.

OIDC for Cloud Authentication

Hardcoding cloud credentials in GitHub secrets is a security risk. OIDC (OpenID Connect) lets your workflow request short-lived tokens directly from your cloud provider. No long-lived credentials stored anywhere:

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy
          aws-region: us-east-1

      - run: aws s3 sync ./dist s3://my-app-bucket

The id-token: write permission is required. On the AWS side, you set up an IAM role with a trust policy that only trusts tokens from your specific repository and branch. This is the correct way to handle cloud credentials in 2026 -- if you are still using AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY secrets, migrate to OIDC immediately.
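
For reference, the AWS trust policy typically looks something like this (the account ID, organization, repository, and branch are placeholders to swap for your own):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "repo:myorg/myapp:ref:refs/heads/main"
        }
      }
    }
  ]
}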

Caching Strategies

Caching is the single biggest optimization you can make. A cold npm ci on a project with 800 dependencies takes 30-45 seconds. With a warm cache, it drops to under 5 seconds.

Node Modules Caching

The built-in cache in actions/setup-node is the simplest approach:

- uses: actions/setup-node@v4
  with:
    node-version: '20'
    cache: 'npm'

For more control, use actions/cache directly:

- name: Cache node_modules
  uses: actions/cache@v4
  id: npm-cache
  with:
    path: node_modules
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-

- name: Install dependencies
  if: steps.npm-cache.outputs.cache-hit != 'true'
  run: npm ci

The restore-keys fallback is important. If the exact key does not match (because package-lock.json changed), it falls back to the most recent cache with the npm-Linux- prefix. This gives you a partial cache hit -- most modules are already there, and npm ci only downloads the delta.

Docker Layer Caching

Docker builds inside Actions are slow without layer caching. Use docker/build-push-action with cache configuration:

- uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ghcr.io/myorg/myapp:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max

The type=gha cache backend uses GitHub Actions cache storage. The mode=max setting caches all layers, not just the final image layers. This typically reduces Docker build times from 3-5 minutes to 30-60 seconds on subsequent runs.

Custom Caching

Cache anything with a deterministic key. Here is an example caching Playwright browser binaries:

- name: Cache Playwright browsers
  uses: actions/cache@v4
  id: playwright-cache
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

- name: Install Playwright browsers
  if: steps.playwright-cache.outputs.cache-hit != 'true'
  run: npx playwright install --with-deps chromium

Cache storage is limited to 10 GB per repository. Caches not accessed within 7 days are evicted. Monitor your cache usage in the repository's Actions settings.

Artifact Management and Workflow Outputs

Uploading and Downloading Artifacts

Artifacts persist data between jobs and after workflow completion. Use them for test reports, coverage files, and build outputs:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
          retention-days: 5

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/
      - run: ls -la dist/

Job Outputs

Pass data between jobs with outputs:

jobs:
  version:
    runs-on: ubuntu-latest
    outputs:
      semver: ${{ steps.get-version.outputs.version }}
      sha_short: ${{ steps.get-sha.outputs.sha }}
    steps:
      - uses: actions/checkout@v4
      - id: get-version
        run: |
          VERSION=$(node -p "require('./package.json').version")
          echo "version=$VERSION" >> $GITHUB_OUTPUT
      - id: get-sha
        run: echo "sha=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT

  build:
    needs: version
    runs-on: ubuntu-latest
    steps:
      - run: echo "Building version ${{ needs.version.outputs.semver }}"

Always use $GITHUB_OUTPUT for setting outputs. The old ::set-output command was deprecated and disabled in 2023. If you see ::set-output in a workflow, update it immediately.
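
The migration is mechanical:

# Before (deprecated, no longer supported)
- run: echo "::set-output name=version::1.2.3"

# After
- run: echo "version=1.2.3" >> "$GITHUB_OUTPUT"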

Self-Hosted Runners

GitHub-hosted runners are convenient, but they cost money (at $0.008/minute for Linux) and have limited resources. Self-hosted runners give you more control, cheaper compute, and access to internal networks.

Setup

Install the runner agent on any Linux machine:

# Create a directory for the runner
mkdir actions-runner && cd actions-runner

# Download the latest runner package
curl -o actions-runner-linux-x64-2.311.0.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.311.0/actions-runner-linux-x64-2.311.0.tar.gz

# Extract
tar xzf ./actions-runner-linux-x64-2.311.0.tar.gz

# Configure - get the token from repository Settings > Actions > Runners
./config.sh --url https://github.com/OWNER/REPO --token YOUR_TOKEN

# Install and start as a systemd service
sudo ./svc.sh install
sudo ./svc.sh start

Reference the runner in your workflow by label:

jobs:
  build:
    runs-on: [self-hosted, linux, x64]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test

Security Considerations

Self-hosted runners are a security surface. Critical rules:

  1. Never use self-hosted runners on public repositories. Anyone who opens a pull request can run arbitrary code on your machine.
  2. Run the agent as a non-root user with minimal permissions.
  3. Use ephemeral runners when possible. The --ephemeral flag causes the runner to de-register after one job (see the example after this list).
  4. Isolate runners with containers or VMs. Do not share a runner machine with production workloads.
  5. Keep the runner agent updated. GitHub releases security patches regularly.
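
Making a runner ephemeral only changes the configuration step from the setup above:

./config.sh --url https://github.com/OWNER/REPO --token YOUR_TOKEN --ephemeral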

For organizations with heavy CI load, I recommend running self-hosted runners on DigitalOcean Droplets or AWS EC2 Spot Instances behind an autoscaler. The actions-runner-controller project handles Kubernetes-based autoscaling.

Conditional Execution and Job Dependencies

Conditional Steps

Use if expressions to control step execution:

steps:
  - name: Run tests
    run: npm test

  - name: Upload coverage
    if: success() && matrix.coverage == true
    run: npx codecov

  - name: Notify on failure
    if: failure()
    run: |
      curl -X POST "${{ secrets.SLACK_WEBHOOK }}" \
        -H 'Content-type: application/json' \
        -d '{"text":"CI failed on ${{ github.ref }}"}'

  - name: Deploy on tag
    if: startsWith(github.ref, 'refs/tags/v')
    run: npm run deploy

Job Dependencies

Use needs to create job dependency chains:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run lint

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test

  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build

  deploy:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploying..."

The lint and test jobs run in parallel. build only runs when both succeed. deploy only runs on the main branch after build completes.

Conditional Jobs Based on Changed Files

Use dorny/paths-filter to conditionally run entire jobs based on which files changed:

jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      api: ${{ steps.filter.outputs.api }}
      web: ${{ steps.filter.outputs.web }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            api:
              - 'api/**'
            web:
              - 'web/**'

  test-api:
    needs: changes
    if: needs.changes.outputs.api == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cd api && npm ci && npm test

  test-web:
    needs: changes
    if: needs.changes.outputs.web == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cd web && npm ci && npm test

Custom GitHub Actions

JavaScript Actions

A JavaScript action runs directly on the runner using Node.js. Create the action structure:

.github/actions/notify-deploy/
  action.yml
  index.js
  package.json

Define the action metadata:

# .github/actions/notify-deploy/action.yml
name: 'Notify Deploy'
description: 'Send deployment notification to Slack and update commit status'

inputs:
  slack-webhook:
    description: 'Slack webhook URL'
    required: true
  environment:
    description: 'Deployment environment'
    required: true
  status:
    description: 'Deployment status'
    required: true
    default: 'success'

outputs:
  notification-id:
    description: 'Slack message timestamp'

runs:
  using: 'node20'
  main: 'index.js'

Write the action logic:

// .github/actions/notify-deploy/index.js
var core = require('@actions/core');
var github = require('@actions/github');
var https = require('https');

function sendSlackMessage(webhookUrl, message) {
  return new Promise(function(resolve, reject) {
    var url = new URL(webhookUrl);
    var data = JSON.stringify(message);

    var options = {
      hostname: url.hostname,
      path: url.pathname,
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(data)
      }
    };

    var req = https.request(options, function(res) {
      var body = '';
      res.on('data', function(chunk) { body += chunk; });
      res.on('end', function() { resolve(body); });
    });

    req.on('error', function(err) { reject(err); });
    req.write(data);
    req.end();
  });
}

async function run() {
  try {
    var webhookUrl = core.getInput('slack-webhook', { required: true });
    var environment = core.getInput('environment', { required: true });
    var status = core.getInput('status');

    var context = github.context;
    var emoji = status === 'success' ? ':white_check_mark:' : ':x:';
    var color = status === 'success' ? '#36a64f' : '#dc3545';

    var message = {
      attachments: [{
        color: color,
        blocks: [
          {
            type: 'section',
            text: {
              type: 'mrkdwn',
              text: emoji + ' *Deploy to ' + environment + '* - ' + status +
                '\n*Repo:* ' + context.repo.owner + '/' + context.repo.repo +
                '\n*Commit:* `' + context.sha.substring(0, 7) + '`' +
                '\n*Actor:* ' + context.actor
            }
          }
        ]
      }]
    };

    await sendSlackMessage(webhookUrl, message);
    core.info('Notification sent to Slack for ' + environment + ' deploy');
    core.setOutput('notification-id', Date.now().toString());
  } catch (error) {
    core.setFailed('Failed to send notification: ' + error.message);
  }
}

run();

Install the dependencies:

cd .github/actions/notify-deploy
npm init -y
npm install @actions/core @actions/github

You must commit node_modules or use @vercel/ncc to bundle everything into a single file. I prefer the ncc approach:

npm install -g @vercel/ncc
ncc build index.js -o dist

Then change action.yml to point to main: 'dist/index.js'.

Docker Actions

For actions that need specific tooling or system dependencies, use a Docker-based action:

# .github/actions/db-migrate/action.yml
name: 'Database Migration'
description: 'Run database migrations with connection validation'

inputs:
  database-url:
    description: 'PostgreSQL connection string'
    required: true
  direction:
    description: 'Migration direction (up or down)'
    required: false
    default: 'up'

runs:
  using: 'docker'
  image: 'Dockerfile'
  args:
    - ${{ inputs.database-url }}
    - ${{ inputs.direction }}

# .github/actions/db-migrate/Dockerfile
FROM node:20-alpine

RUN apk add --no-cache postgresql-client

COPY package.json package-lock.json ./
RUN npm ci --production

COPY migrate.js ./

ENTRYPOINT ["node", "migrate.js"]

Docker actions are slower to start (they build the image each run), but they give you complete control over the execution environment.

Workflow Debugging and Troubleshooting

Enable Debug Logging

Set the ACTIONS_STEP_DEBUG secret to true in your repository. This enables verbose logging for all workflow runs. For runner-level diagnostics, set ACTIONS_RUNNER_DEBUG to true as well.
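
You can set both from the CLI instead of clicking through the settings UI:

gh secret set ACTIONS_STEP_DEBUG --body true
gh secret set ACTIONS_RUNNER_DEBUG --body true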

Debug with tmate

For interactive debugging, drop into a shell session mid-workflow:

- name: Debug with tmate
  if: failure()
  uses: mxschmitt/action-tmate@v3
  with:
    limit-access-to-actor: true

This opens an SSH session you can connect to. The limit-access-to-actor setting restricts access to the user who triggered the workflow. Remove this step before merging to main -- you do not want a stale tmate session running on every failure in production.

Examining Context and Event Payloads

Dump the full GitHub context to understand what data is available:

- name: Dump context
  run: |
    echo "Event: ${{ github.event_name }}"
    echo "Ref: ${{ github.ref }}"
    echo "SHA: ${{ github.sha }}"
    echo "Actor: ${{ github.actor }}"
    echo "Workflow: ${{ github.workflow }}"
    echo "Run ID: ${{ github.run_id }}"
    echo "Run number: ${{ github.run_number }}"
    cat "$GITHUB_EVENT_PATH" | jq .

Rerunning Failed Jobs

You can re-run individual failed jobs instead of the entire workflow. In the Actions UI, click the failed run, then "Re-run failed jobs." From the CLI:

# List recent workflow runs
gh run list --limit 5

# Re-run failed jobs only
gh run rerun 12345678 --failed

# Re-run with debug logging enabled
gh run rerun 12345678 --debug

Complete Working Example: Production CI/CD Pipeline

Here is a production-grade pipeline that covers matrix testing, caching, Docker build, and deployment with manual approval:

name: Production CI/CD

on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
    branches: [main]
  workflow_dispatch:
    inputs:
      deploy_env:
        description: 'Deploy to environment'
        type: choice
        options:
          - staging
          - production
        default: staging

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  NODE_VERSION: '20'

jobs:
  # ─── Lint ───────────────────────────────────────────────
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'

      - run: npm ci
      - run: npm run lint
      - run: npx tsc --noEmit || true

  # ─── Test Matrix ────────────────────────────────────────
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [18, 20, 22]
      fail-fast: false

    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: testdb
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

      redis:
        image: redis:7
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}

      - name: Cache node_modules
        uses: actions/cache@v4
        id: cache
        with:
          path: node_modules
          key: deps-${{ runner.os }}-node${{ matrix.node-version }}-${{ hashFiles('package-lock.json') }}
          restore-keys: |
            deps-${{ runner.os }}-node${{ matrix.node-version }}-

      - name: Install dependencies
        if: steps.cache.outputs.cache-hit != 'true'
        run: npm ci

      - name: Run tests
        run: npm test -- --coverage
        env:
          DATABASE_URL: postgres://test:test@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
          NODE_ENV: test

      - name: Upload coverage
        if: matrix.node-version == 20
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: coverage/
          retention-days: 7

  # ─── Security Audit ────────────────────────────────────
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm audit --audit-level=high
        continue-on-error: true
      - uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          severity: 'HIGH,CRITICAL'

  # ─── Docker Build ──────────────────────────────────────
  docker:
    needs: [lint, test]
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}

    steps:
      - uses: actions/checkout@v4

      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,prefix=
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}

      - id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            NODE_VERSION=${{ env.NODE_VERSION }}
            BUILD_DATE=${{ github.event.head_commit.timestamp }}
            GIT_SHA=${{ github.sha }}

  # ─── Deploy Staging ────────────────────────────────────
  deploy-staging:
    needs: docker
    if: github.ref == 'refs/heads/main' || (github.event_name == 'workflow_dispatch' && inputs.deploy_env == 'staging')
    runs-on: ubuntu-latest
    environment:
      name: staging
      url: https://staging.myapp.com

    steps:
      - uses: actions/checkout@v4

      - name: Deploy to staging
        run: |
          echo "Deploying image ${{ needs.docker.outputs.image-tag }}"
          # Replace with your actual deployment command
          # doctl apps create-deployment $APP_ID --wait
        env:
          DIGITALOCEAN_ACCESS_TOKEN: ${{ secrets.DO_TOKEN }}

      - name: Run smoke tests
        run: |
          sleep 10
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://staging.myapp.com/health)
          if [ "$STATUS" != "200" ]; then
            echo "Smoke test failed: health check returned $STATUS"
            exit 1
          fi
          echo "Smoke test passed: health check returned $STATUS"

      - name: Notify Slack
        if: always()
        uses: ./.github/actions/notify-deploy
        with:
          slack-webhook: ${{ secrets.SLACK_WEBHOOK }}
          environment: staging
          status: ${{ job.status }}

  # ─── Deploy Production ─────────────────────────────────
  deploy-production:
    needs: [docker, deploy-staging]
    if: startsWith(github.ref, 'refs/tags/v') || (github.event_name == 'workflow_dispatch' && inputs.deploy_env == 'production')
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://myapp.com

    steps:
      - uses: actions/checkout@v4

      - name: Deploy to production
        run: |
          echo "Deploying to production: ${{ needs.docker.outputs.image-tag }}"
          # doctl apps create-deployment $PROD_APP_ID --wait
        env:
          DIGITALOCEAN_ACCESS_TOKEN: ${{ secrets.DO_PROD_TOKEN }}

      - name: Run production smoke tests
        run: |
          sleep 15
          node -e "
            var https = require('https');
            var endpoints = ['/health', '/api/status', '/'];
            var failed = [];

            function checkEndpoint(path) {
              return new Promise(function(resolve) {
                https.get('https://myapp.com' + path, function(res) {
                  if (res.statusCode !== 200) {
                    failed.push(path + ' returned ' + res.statusCode);
                  }
                  resolve();
                }).on('error', function(err) {
                  failed.push(path + ' error: ' + err.message);
                  resolve();
                });
              });
            }

            Promise.all(endpoints.map(checkEndpoint)).then(function() {
              if (failed.length > 0) {
                console.error('Smoke tests failed:', failed);
                process.exit(1);
              }
              console.log('All smoke tests passed');
            });
          "

      - name: Create GitHub Release
        if: startsWith(github.ref, 'refs/tags/v')
        uses: softprops/action-gh-release@v2
        with:
          generate_release_notes: true

This pipeline takes approximately 3-4 minutes for the lint/test/build phases with warm caches, and 5-7 minutes end-to-end including staging deployment. Without caching, expect 8-12 minutes.

The key architectural decisions:

  1. Lint and test run in parallel. No reason to gate tests on linting.
  2. Docker build waits for both lint and test. No point building an image if code is broken.
  3. Staging deploys automatically on main. Every merge to main goes to staging.
  4. Production requires a tag or manual trigger. The production environment has required reviewers configured in GitHub settings.
  5. Smoke tests validate each deployment. If the health check fails, the job fails and the Slack notification includes the failure.

Common Issues and Troubleshooting

1. "Resource not accessible by integration"

Error: Resource not accessible by integration

This happens when your workflow lacks the necessary permissions. GitHub Actions uses a restrictive default permission set. Fix it by explicitly declaring permissions:

permissions:
  contents: read
  packages: write
  pull-requests: write

For organization repositories, check Settings > Actions > General > Workflow permissions. If it is set to "Read repository contents," your workflows cannot write anything. The workflow-level permissions key can only restrict permissions, never elevate them beyond the organization setting.

2. Cache Miss on Every Run

Cache not found for input keys: deps-Linux-node20-abc123def456

Common causes:

  • The hashFiles() path is wrong. It is relative to the repository root, not the working directory. Use hashFiles('**/package-lock.json') for monorepos.
  • The cache key includes a volatile value like a timestamp or run ID.
  • Cache was evicted. Caches older than 7 days or exceeding the 10 GB repository limit get purged.

Debug by listing all caches for your repository:

gh cache list --limit 20

3. "Node.js 12 actions are deprecated"

Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16:
actions/checkout@v2, actions/cache@v2

This is not a failure yet, but it will be. GitHub is removing Node 12 and Node 16 runner support. Update all actions to their @v4 versions:

# Before (deprecated)
- uses: actions/checkout@v2
- uses: actions/cache@v2

# After
- uses: actions/checkout@v4
- uses: actions/cache@v4

Run this command to find all outdated action references in your workflows:

grep -r "uses:.*@v[12]" .github/workflows/

4. Docker Build Fails with "No Space Left on Device"

Error: write /tmp/buildkit-mount123456/layer.tar: no space left on device

GitHub-hosted runners have 14 GB of available disk space, but Docker images, caches, and pre-installed tools consume most of it. Solutions:

# Free up disk space before building
- name: Free disk space
  run: |
    sudo rm -rf /usr/share/dotnet
    sudo rm -rf /opt/ghc
    sudo rm -rf /usr/local/share/boost
    df -h

# Or use a larger runner
jobs:
  build:
    runs-on: ubuntu-latest-16-cores  # More disk space

For large Docker images, switch to multi-stage builds that produce smaller final images. A Node.js app built with a full node:20 base image might be 1.2 GB. The same app with node:20-alpine and a multi-stage build drops to 80-150 MB.
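
A rough sketch of such a multi-stage Dockerfile (it assumes a conventional npm run build that emits a dist/ directory):

# Build stage: full toolchain and dev dependencies
FROM node:20 AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: alpine base, production dependencies only
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
CMD ["node", "dist/index.js"]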

5. "Workflow Run Cancelled" with No Explanation

This usually happens when you push multiple commits to the same branch in quick succession and a concurrency group -- defined in the workflow itself or in a reusable workflow it calls -- cancels the superseded run. Make that behavior explicit and intentional by adding concurrency control:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

This cancels any in-progress run for the same workflow and branch when a new commit is pushed. Use this on PR workflows to avoid wasting minutes on outdated code. Do not use cancel-in-progress: true on deployment workflows -- you do not want a partial deployment.

6. Secrets Not Available in Pull Requests from Forks

Warning: Skipping step because secret 'NPM_TOKEN' is not available

For security, GitHub does not expose repository secrets to workflows triggered by pull requests from forks. This is by design -- a malicious PR could exfiltrate your secrets. Solutions:

  • Use pull_request_target instead of pull_request (but be very careful -- this runs the workflow from the base branch with full secret access).
  • Split your workflow: run tests without secrets on PRs, run the full suite on push to main (see the sketch after this list).
  • Use environment-level secrets with deployment protection rules.
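
For the split approach, guarding the secret-dependent step keeps fork PRs green while the rest of the suite still runs (a minimal sketch; the publish command is a placeholder):

- name: Publish preview
  # Secrets are only present for same-repo branches and pushes
  if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.fork == false
  run: npm run publish:preview
  env:
    NPM_TOKEN: ${{ secrets.NPM_TOKEN }}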

Best Practices

  • Pin action versions to SHA hashes, not tags. Tags can be moved or deleted. Use actions/checkout@8ade135a41bc03ea155e62e844d188df1ea18608 instead of actions/checkout@v4. At minimum, pin to major versions (@v4), never floating (@main).

  • Use concurrency groups on every PR workflow. Without concurrency control, pushing 5 quick-fix commits generates 5 parallel workflow runs. That is 5x the compute cost for code that is immediately superseded.

  • Set explicit permissions on every workflow. The default GITHUB_TOKEN permissions are too broad. Follow the principle of least privilege. Most CI workflows only need contents: read.

  • Cache aggressively, but validate cache keys. A bad cache is worse than no cache. If your cache key does not include the lockfile hash, you might restore stale dependencies and get phantom test failures that do not reproduce locally.

  • Never store secrets in workflow files or action inputs. Use encrypted secrets, OIDC for cloud credentials, or environment-level secrets with required reviewers. Audit your workflows for any hardcoded tokens, API keys, or passwords.

  • Use timeout-minutes on every job. The default timeout is 6 hours. A stuck job can burn through your Actions minutes budget silently. Most CI jobs should complete in under 15 minutes:

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15

  • Keep workflows DRY with composite actions. If you find yourself copying the same setup steps (checkout, setup node, install, cache) across multiple workflows, extract them into a composite action in .github/actions/. This reduces maintenance burden and ensures consistency.

  • Monitor your Actions usage. Go to Settings > Billing > Actions to see your monthly minutes consumption. Set up billing alerts. A runaway workflow on a schedule trigger can consume thousands of minutes before anyone notices.

  • Use continue-on-error sparingly and deliberately. Marking a step as continue-on-error: true is sometimes necessary (like for npm audit which might report vulnerabilities in dev dependencies), but overusing it masks real failures. Always add a comment explaining why a step is allowed to fail.
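
For example:

- name: Audit dependencies
  # Allowed to fail: advisories in dev dependencies are triaged weekly, not blocking CI
  run: npm audit --audit-level=high
  continue-on-error: true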

  • Version your workflow files like code. Review workflow changes in pull requests. A syntax error in a workflow file can break CI for the entire team. Use actionlint locally to validate workflow syntax before committing:

# Install actionlint
brew install actionlint

# Lint all workflow files
actionlint .github/workflows/*.yml
