GitHub Actions Deep Dive: Beyond Basic CI
Advanced guide to GitHub Actions covering reusable workflows, matrix strategies, custom actions, self-hosted runners, and production deployment pipelines
Most teams get their first GitHub Actions workflow running in an afternoon -- a basic lint-and-test pipeline triggered on push. Then they stop. They never touch the workflow file again until something breaks, and when it does, they have no idea how to debug it. This article goes past the beginner setup and into the patterns that make GitHub Actions a serious deployment platform: reusable workflows, matrix strategies, custom actions, self-hosted runners, OIDC authentication, and production deployment pipelines with manual approval gates.
If you are shipping Node.js applications and still running npm test in a single job with no caching, you are leaving significant time and money on the table. Let's fix that.
Prerequisites
- GitHub account with repository admin access
- Node.js v18+ installed locally
- Docker installed (for custom action and container builds)
- Familiarity with YAML syntax
- Basic understanding of CI/CD concepts (build, test, deploy)
- A terminal you are comfortable with (bash, zsh, Git Bash)
- Optional: a cloud provider account (AWS, GCP, or DigitalOcean) for deployment sections
Advanced Workflow Triggers
Most developers only use push and pull_request triggers. GitHub Actions supports far more granular control over when workflows run.
Push and Pull Request Filters
You can filter by branch, tag, and path. This is critical for monorepos or any project where you do not want every file change to trigger a full build:
name: Targeted CI
on:
push:
branches:
- main
- 'release/**'
paths:
- 'src/**'
- 'package.json'
- 'package-lock.json'
tags:
- 'v*'
pull_request:
branches:
- main
paths-ignore:
- 'docs/**'
- '*.md'
- '.github/ISSUE_TEMPLATE/**'
The paths filter is one of the most underused features. In a monorepo with a frontend and backend directory, you can create separate workflows that only trigger when their respective directories change. This saves hundreds of minutes per month on Actions billing.
Scheduled Workflows (Cron)
Cron triggers run workflows on a schedule. They are useful for nightly builds, dependency audits, and data pipelines:
on:
schedule:
# Run nightly at 2 AM UTC
- cron: '0 2 * * *'
# Run every Monday at 9 AM UTC
- cron: '0 9 * * 1'
One gotcha: scheduled workflows only run on the default branch. If your workflow file exists on a feature branch but not on main, the schedule will not fire. GitHub also makes no guarantees about exact timing during periods of high load -- your 2 AM job might start at 2:15 AM.
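The two fields that most often trip people up are month (1-12) and day-of-week (0-6, Sunday = 0). A toy matcher makes the field order concrete -- this is a hypothetical helper supporting only numeric values, comma lists, and `*` (no ranges or steps), not a full cron implementation:

```javascript
// Minimal cron matcher for illustration. Fields, in order: minute,
// hour, day-of-month, month (1-12), day-of-week (0-6, Sunday = 0).
// Supports numbers, comma lists, and '*' only -- no ranges or steps.
function cronMatches(expr, date) {
  var fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error('expected 5 cron fields');
  var values = [
    date.getUTCMinutes(),
    date.getUTCHours(),
    date.getUTCDate(),
    date.getUTCMonth() + 1,
    date.getUTCDay()
  ];
  return fields.every(function (field, i) {
    if (field === '*') return true;
    return field.split(',').some(function (part) {
      return parseInt(part, 10) === values[i];
    });
  });
}

// '0 2 * * *' fires at 02:00 UTC every day
console.log(cronMatches('0 2 * * *', new Date(Date.UTC(2024, 0, 1, 2, 0))));  // true
// '0 9 * * 1' fires only on Mondays (2024-01-01 was a Monday)
console.log(cronMatches('0 9 * * 1', new Date(Date.UTC(2024, 0, 1, 9, 0))));  // true
```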
Manual Triggers with workflow_dispatch
This is essential for any workflow you want to run on demand -- deployments, database migrations, cache clearing:
on:
workflow_dispatch:
inputs:
environment:
description: 'Target environment'
required: true
type: choice
options:
- staging
- production
skip_tests:
description: 'Skip test suite'
required: false
type: boolean
default: false
version:
description: 'Version to deploy'
required: false
type: string
default: 'latest'
You access these inputs in your workflow with ${{ github.event.inputs.environment }} or the typed variant ${{ inputs.environment }}. The workflow_dispatch trigger adds a "Run workflow" button to the Actions tab in the GitHub UI.
Repository Dispatch (External Triggers)
repository_dispatch lets external systems trigger workflows via the GitHub API. This is how you integrate Actions with external CI systems, chatbots, or monitoring alerts:
on:
repository_dispatch:
types: [deploy, rollback, refresh-cache]
Trigger it from any HTTP client:
curl -X POST \
-H "Accept: application/vnd.github.v3+json" \
-H "Authorization: token $GITHUB_TOKEN" \
https://api.github.com/repos/OWNER/REPO/dispatches \
-d '{"event_type":"deploy","client_payload":{"env":"staging","ref":"v2.1.0"}}'
Access the payload in your workflow with ${{ github.event.client_payload.env }}. I use this pattern to trigger deployments from Slack bots and to chain workflows across repositories.
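If you fire these dispatches from Node.js rather than curl, it helps to keep the request construction pure so it can be unit tested without hitting the API. A sketch -- the owner, repo, and token values are placeholders:

```javascript
// Builds the HTTP request for a repository_dispatch call. Keeping this
// a pure function (no network I/O) makes it trivially testable; wire it
// to fetch() or https.request() at the call site.
function buildDispatchRequest(owner, repo, token, eventType, clientPayload) {
  return {
    method: 'POST',
    url: 'https://api.github.com/repos/' + owner + '/' + repo + '/dispatches',
    headers: {
      'Accept': 'application/vnd.github.v3+json',
      'Authorization': 'token ' + token
    },
    body: JSON.stringify({
      event_type: eventType,
      client_payload: clientPayload || {}
    })
  };
}

var req = buildDispatchRequest('myorg', 'myapp', 'ghp_example', 'deploy',
  { env: 'staging', ref: 'v2.1.0' });
console.log(req.url); // https://api.github.com/repos/myorg/myapp/dispatches
```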
Matrix Strategies and Dynamic Matrices
Matrix builds run the same job across multiple configurations in parallel. The basic version tests across Node versions:
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
node-version: [18, 20, 22]
fail-fast: false
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node-version }}
- run: npm ci
- run: npm test
Setting fail-fast: false is important. Without it, a failure in Node 18 cancels the Node 20 and 22 jobs immediately. You usually want to see all failures, not just the first one.
Multi-Dimensional Matrices
You can combine multiple dimensions. This tests across Node versions and operating systems:
strategy:
matrix:
node-version: [18, 20, 22]
os: [ubuntu-latest, windows-latest, macos-latest]
include:
- node-version: 22
os: ubuntu-latest
coverage: true
exclude:
- node-version: 18
os: macos-latest
This generates 8 jobs (3x3 minus 1 exclusion). The include entry adds a coverage variable only to the Node 22 / Ubuntu combination.
Dynamic Matrices
For truly dynamic builds, generate the matrix in a preceding job:
jobs:
prepare:
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- uses: actions/checkout@v4
- id: set-matrix
run: |
MATRIX=$(node -e "
var fs = require('fs');
var packages = fs.readdirSync('packages').filter(function(d) {
return fs.statSync('packages/' + d).isDirectory();
});
console.log(JSON.stringify({ package: packages }));
")
echo "matrix=$MATRIX" >> $GITHUB_OUTPUT
test:
needs: prepare
runs-on: ubuntu-latest
strategy:
matrix: ${{ fromJson(needs.prepare.outputs.matrix) }}
steps:
- uses: actions/checkout@v4
- run: cd packages/${{ matrix.package }} && npm ci && npm test
This scans the packages/ directory and creates a matrix job for each subdirectory. When you add a new package, the CI automatically picks it up without any workflow changes.
Reusable Workflows and Composite Actions
Once you have more than two or three workflows, you start copying YAML blocks between them. Reusable workflows and composite actions solve this.
Reusable Workflows
A reusable workflow is a complete workflow that other workflows can call. Define it in a separate file:
# .github/workflows/reusable-test.yml
name: Reusable Test Workflow
on:
workflow_call:
inputs:
node-version:
required: false
type: string
default: '20'
working-directory:
required: false
type: string
default: '.'
secrets:
NPM_TOKEN:
required: false
jobs:
test:
runs-on: ubuntu-latest
defaults:
run:
working-directory: ${{ inputs.working-directory }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ inputs.node-version }}
cache: 'npm'
cache-dependency-path: '${{ inputs.working-directory }}/package-lock.json'
- run: npm ci
env:
NPM_TOKEN: ${{ secrets.NPM_TOKEN }}
- run: npm test
- run: npm run lint
Call it from another workflow:
# .github/workflows/ci.yml
name: CI
on:
push:
branches: [main]
jobs:
test-api:
uses: ./.github/workflows/reusable-test.yml
with:
node-version: '20'
working-directory: 'api'
secrets:
NPM_TOKEN: ${{ secrets.NPM_TOKEN }}
test-web:
uses: ./.github/workflows/reusable-test.yml
with:
working-directory: 'web'
You can also reference reusable workflows from other repositories: uses: org/shared-workflows/.github/workflows/test.yml@main. This is how platform teams standardize CI across an entire organization.
Composite Actions
A composite action bundles multiple steps into a single reusable action. Unlike reusable workflows, composite actions run inside a job rather than being a separate job:
# .github/actions/setup-node-project/action.yml
name: 'Setup Node Project'
description: 'Install Node.js, restore cache, install dependencies'
inputs:
node-version:
description: 'Node.js version'
required: false
default: '20'
working-directory:
description: 'Working directory'
required: false
default: '.'
runs:
using: 'composite'
steps:
- uses: actions/setup-node@v4
with:
node-version: ${{ inputs.node-version }}
- name: Cache node_modules
uses: actions/cache@v4
id: cache-deps
with:
path: ${{ inputs.working-directory }}/node_modules
key: deps-${{ runner.os }}-node${{ inputs.node-version }}-${{ hashFiles(format('{0}/package-lock.json', inputs.working-directory)) }}
- name: Install dependencies
if: steps.cache-deps.outputs.cache-hit != 'true'
shell: bash
working-directory: ${{ inputs.working-directory }}
run: npm ci
Use it in a workflow:
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/setup-node-project
with:
node-version: '20'
- run: npm test
I strongly prefer composite actions over reusable workflows for setup logic. Reusable workflows have a call depth limit of four, and they create separate jobs with their own runners. Composite actions run inline and are simpler to reason about.
Environment Secrets and OIDC
Environment-Scoped Secrets
GitHub supports environment-level secrets with protection rules. This is how you gate production deployments:
jobs:
deploy-staging:
runs-on: ubuntu-latest
environment: staging
steps:
- run: echo "Deploying to ${{ vars.DEPLOY_URL }}"
env:
API_KEY: ${{ secrets.API_KEY }}
deploy-production:
runs-on: ubuntu-latest
needs: deploy-staging
environment:
name: production
url: https://myapp.com
steps:
- run: echo "Deploying to production"
env:
API_KEY: ${{ secrets.API_KEY }}
In repository settings, you can configure the production environment to require manual approval from specific team members and restrict it to the main branch only. The API_KEY secret can have different values in staging versus production.
OIDC for Cloud Authentication
Hardcoding cloud credentials in GitHub secrets is a security risk. OIDC (OpenID Connect) lets your workflow request short-lived tokens directly from your cloud provider. No long-lived credentials stored anywhere:
jobs:
deploy:
runs-on: ubuntu-latest
permissions:
id-token: write
contents: read
steps:
- uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789012:role/github-actions-deploy
aws-region: us-east-1
- run: aws s3 sync ./dist s3://my-app-bucket
The id-token: write permission is required. On the AWS side, you set up an IAM role with a trust policy that only trusts tokens from your specific repository and branch. This is the correct way to handle cloud credentials in 2026 -- if you are still using AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY secrets, migrate to OIDC immediately.
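For reference, the trust policy on that IAM role looks roughly like this. The account ID and repository are placeholders, and you should verify the condition keys against the AWS and GitHub OIDC documentation before relying on them:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:myorg/myapp:ref:refs/heads/main"
        }
      }
    }
  ]
}
```

The sub condition is what scopes the trust: only workflows running in that repository on that branch can assume the role.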
Caching Strategies
Caching is the single biggest optimization you can make. A cold npm ci on a project with 800 dependencies takes 30-45 seconds. With a warm cache, it drops to under 5 seconds.
Node Modules Caching
The built-in cache in actions/setup-node is the simplest approach:
- uses: actions/setup-node@v4
with:
node-version: '20'
cache: 'npm'
For more control, use actions/cache directly:
- name: Cache node_modules
uses: actions/cache@v4
id: npm-cache
with:
path: node_modules
key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
restore-keys: |
npm-${{ runner.os }}-
- name: Install dependencies
if: steps.npm-cache.outputs.cache-hit != 'true'
run: npm ci
The restore-keys fallback is important. If the exact key does not match (because package-lock.json changed), it falls back to the most recent cache with the npm-Linux- prefix. This gives you a partial cache hit -- most modules are already there, and npm ci only downloads the delta.
Docker Layer Caching
Docker builds inside Actions are slow without layer caching. Use docker/build-push-action with cache configuration:
- uses: docker/build-push-action@v5
with:
context: .
push: true
tags: ghcr.io/myorg/myapp:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
The type=gha cache backend uses GitHub Actions cache storage. The mode=max setting caches all layers, not just the final image layers. This typically reduces Docker build times from 3-5 minutes to 30-60 seconds on subsequent runs.
Custom Caching
Cache anything with a deterministic key. Here is an example caching Playwright browser binaries:
- name: Cache Playwright browsers
uses: actions/cache@v4
id: playwright-cache
with:
path: ~/.cache/ms-playwright
key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
- name: Install Playwright browsers
if: steps.playwright-cache.outputs.cache-hit != 'true'
run: npx playwright install --with-deps chromium
Cache storage is limited to 10 GB per repository. Caches not accessed within 7 days are evicted. Monitor your cache usage in the repository's Actions settings.
Artifact Management and Workflow Outputs
Uploading and Downloading Artifacts
Artifacts persist data between jobs and after workflow completion. Use them for test reports, coverage files, and build outputs:
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm ci && npm run build
- uses: actions/upload-artifact@v4
with:
name: dist
path: dist/
retention-days: 5
deploy:
needs: build
runs-on: ubuntu-latest
steps:
- uses: actions/download-artifact@v4
with:
name: dist
path: dist/
- run: ls -la dist/
Job Outputs
Pass data between jobs with outputs:
jobs:
version:
runs-on: ubuntu-latest
outputs:
semver: ${{ steps.get-version.outputs.version }}
sha_short: ${{ steps.get-sha.outputs.sha }}
steps:
- uses: actions/checkout@v4
- id: get-version
run: |
VERSION=$(node -p "require('./package.json').version")
echo "version=$VERSION" >> $GITHUB_OUTPUT
- id: get-sha
run: echo "sha=$(git rev-parse --short HEAD)" >> $GITHUB_OUTPUT
build:
needs: version
runs-on: ubuntu-latest
steps:
- run: echo "Building version ${{ needs.version.outputs.semver }}"
Always use $GITHUB_OUTPUT for setting outputs. The old ::set-output command was deprecated and disabled in 2023. If you see ::set-output in a workflow, update it immediately.
Self-Hosted Runners
GitHub-hosted runners are convenient, but they cost money (at $0.008/minute for Linux) and have limited resources. Self-hosted runners give you more control, cheaper compute, and access to internal networks.
Setup
Install the runner agent on any Linux machine:
# Create a directory for the runner
mkdir actions-runner && cd actions-runner
# Download the latest runner package
curl -o actions-runner-linux-x64-2.311.0.tar.gz -L \
https://github.com/actions/runner/releases/download/v2.311.0/actions-runner-linux-x64-2.311.0.tar.gz
# Extract
tar xzf ./actions-runner-linux-x64-2.311.0.tar.gz
# Configure - get the token from repository Settings > Actions > Runners
./config.sh --url https://github.com/OWNER/REPO --token YOUR_TOKEN
# Install and start as a systemd service
sudo ./svc.sh install
sudo ./svc.sh start
Reference the runner in your workflow by label:
jobs:
build:
runs-on: [self-hosted, linux, x64]
steps:
- uses: actions/checkout@v4
- run: npm ci && npm test
Security Considerations
Self-hosted runners are a security surface. Critical rules:
- Never use self-hosted runners on public repositories. Anyone who opens a pull request can run arbitrary code on your machine.
- Run the agent as a non-root user with minimal permissions.
- Use ephemeral runners when possible. The --ephemeral flag causes the runner to de-register after one job.
- Isolate runners with containers or VMs. Do not share a runner machine with production workloads.
- Keep the runner agent updated. GitHub releases security patches regularly.
For organizations with heavy CI load, I recommend running self-hosted runners on DigitalOcean Droplets or AWS EC2 Spot Instances behind an autoscaler. The actions-runner-controller project handles Kubernetes-based autoscaling.
Conditional Execution and Job Dependencies
Conditional Steps
Use if expressions to control step execution:
steps:
- name: Run tests
run: npm test
- name: Upload coverage
if: success() && matrix.coverage == true
run: npx codecov
- name: Notify on failure
if: failure()
run: |
curl -X POST "${{ secrets.SLACK_WEBHOOK }}" \
-H 'Content-type: application/json' \
-d '{"text":"CI failed on ${{ github.ref }}"}'
- name: Deploy on tag
if: startsWith(github.ref, 'refs/tags/v')
run: npm run deploy
Job Dependencies
Use needs to create job dependency chains:
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm ci && npm run lint
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm ci && npm test
build:
needs: [lint, test]
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm ci && npm run build
deploy:
needs: build
if: github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
steps:
- run: echo "Deploying..."
The lint and test jobs run in parallel. build only runs when both succeed. deploy only runs on the main branch after build completes.
Conditional Jobs Based on Changed Files
Use dorny/paths-filter to conditionally run entire jobs based on which files changed:
jobs:
changes:
runs-on: ubuntu-latest
outputs:
api: ${{ steps.filter.outputs.api }}
web: ${{ steps.filter.outputs.web }}
steps:
- uses: actions/checkout@v4
- uses: dorny/paths-filter@v3
id: filter
with:
filters: |
api:
- 'api/**'
web:
- 'web/**'
test-api:
needs: changes
if: needs.changes.outputs.api == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: cd api && npm ci && npm test
test-web:
needs: changes
if: needs.changes.outputs.web == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: cd web && npm ci && npm test
Custom GitHub Actions
JavaScript Actions
A JavaScript action runs directly on the runner using Node.js. Create the action structure:
.github/actions/notify-deploy/
action.yml
index.js
package.json
Define the action metadata:
# .github/actions/notify-deploy/action.yml
name: 'Notify Deploy'
description: 'Send deployment notification to Slack and update commit status'
inputs:
slack-webhook:
description: 'Slack webhook URL'
required: true
environment:
description: 'Deployment environment'
required: true
status:
description: 'Deployment status'
required: true
default: 'success'
outputs:
notification-id:
description: 'Slack message timestamp'
runs:
using: 'node20'
main: 'index.js'
Write the action logic:
// .github/actions/notify-deploy/index.js
var core = require('@actions/core');
var github = require('@actions/github');
var https = require('https');
function sendSlackMessage(webhookUrl, message) {
return new Promise(function(resolve, reject) {
var url = new URL(webhookUrl);
var data = JSON.stringify(message);
var options = {
hostname: url.hostname,
path: url.pathname,
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Content-Length': Buffer.byteLength(data)
}
};
var req = https.request(options, function(res) {
var body = '';
res.on('data', function(chunk) { body += chunk; });
res.on('end', function() { resolve(body); });
});
req.on('error', function(err) { reject(err); });
req.write(data);
req.end();
});
}
async function run() {
try {
var webhookUrl = core.getInput('slack-webhook', { required: true });
var environment = core.getInput('environment', { required: true });
var status = core.getInput('status');
var context = github.context;
var emoji = status === 'success' ? ':white_check_mark:' : ':x:';
var color = status === 'success' ? '#36a64f' : '#dc3545';
var message = {
attachments: [{
color: color,
blocks: [
{
type: 'section',
text: {
type: 'mrkdwn',
text: emoji + ' *Deploy to ' + environment + '* - ' + status +
'\n*Repo:* ' + context.repo.owner + '/' + context.repo.repo +
'\n*Commit:* `' + context.sha.substring(0, 7) + '`' +
'\n*Actor:* ' + context.actor
}
}
]
}]
};
await sendSlackMessage(webhookUrl, message);
core.info('Notification sent to Slack for ' + environment + ' deploy');
core.setOutput('notification-id', Date.now().toString());
} catch (error) {
core.setFailed('Failed to send notification: ' + error.message);
}
}
run();
Install the dependencies and compile:
cd .github/actions/notify-deploy
npm init -y
npm install @actions/core @actions/github
You must commit node_modules or use @vercel/ncc to bundle everything into a single file. I prefer the ncc approach:
npm install -g @vercel/ncc
ncc build index.js -o dist
Then change action.yml to point to main: 'dist/index.js'.
Docker Actions
For actions that need specific tooling or system dependencies, use a Docker-based action:
# .github/actions/db-migrate/action.yml
name: 'Database Migration'
description: 'Run database migrations with connection validation'
inputs:
database-url:
description: 'PostgreSQL connection string'
required: true
direction:
description: 'Migration direction (up or down)'
required: false
default: 'up'
runs:
using: 'docker'
image: 'Dockerfile'
args:
- ${{ inputs.database-url }}
- ${{ inputs.direction }}
# .github/actions/db-migrate/Dockerfile
FROM node:20-alpine
RUN apk add --no-cache postgresql-client
COPY package.json package-lock.json ./
RUN npm ci --production
COPY migrate.js ./
ENTRYPOINT ["node", "migrate.js"]
Docker actions are slower to start (they build the image each run), but they give you complete control over the execution environment.
Workflow Debugging and Troubleshooting
Enable Debug Logging
Set the ACTIONS_STEP_DEBUG secret to true in your repository. This enables verbose logging for all workflow runs. For runner-level diagnostics, set ACTIONS_RUNNER_DEBUG to true as well.
Debug with tmate
For interactive debugging, drop into a shell session mid-workflow:
- name: Debug with tmate
if: failure()
uses: mxschmitt/action-tmate@v3
with:
limit-access-to-actor: true
This opens an SSH session you can connect to. The limit-access-to-actor setting restricts access to the user who triggered the workflow. Remove this step before merging to main -- you do not want a stale tmate session running on every failure in production.
Examining Context and Event Payloads
Dump the full GitHub context to understand what data is available:
- name: Dump context
run: |
echo "Event: ${{ github.event_name }}"
echo "Ref: ${{ github.ref }}"
echo "SHA: ${{ github.sha }}"
echo "Actor: ${{ github.actor }}"
echo "Workflow: ${{ github.workflow }}"
echo "Run ID: ${{ github.run_id }}"
echo "Run number: ${{ github.run_number }}"
cat "$GITHUB_EVENT_PATH" | jq .
Rerunning Failed Jobs
You can re-run individual failed jobs instead of the entire workflow. In the Actions UI, click the failed run, then "Re-run failed jobs." From the CLI:
# List recent workflow runs
gh run list --limit 5
# Re-run failed jobs only
gh run rerun 12345678 --failed
# Re-run with debug logging enabled
gh run rerun 12345678 --debug
Complete Working Example: Production CI/CD Pipeline
Here is a production-grade pipeline that covers matrix testing, caching, Docker build, and deployment with manual approval:
name: Production CI/CD
on:
push:
branches: [main]
tags: ['v*']
pull_request:
branches: [main]
workflow_dispatch:
inputs:
deploy_env:
description: 'Deploy to environment'
type: choice
options:
- staging
- production
default: staging
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
NODE_VERSION: '20'
jobs:
# ─── Lint ───────────────────────────────────────────────
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'npm'
- run: npm ci
- run: npm run lint
- run: npx tsc --noEmit || true
# ─── Test Matrix ────────────────────────────────────────
test:
runs-on: ubuntu-latest
strategy:
matrix:
node-version: [18, 20, 22]
fail-fast: false
services:
postgres:
image: postgres:16
env:
POSTGRES_USER: test
POSTGRES_PASSWORD: test
POSTGRES_DB: testdb
ports:
- 5432:5432
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
redis:
image: redis:7
ports:
- 6379:6379
options: >-
--health-cmd "redis-cli ping"
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node-version }}
- name: Cache node_modules
uses: actions/cache@v4
id: cache
with:
path: node_modules
key: deps-${{ runner.os }}-node${{ matrix.node-version }}-${{ hashFiles('package-lock.json') }}
restore-keys: |
deps-${{ runner.os }}-node${{ matrix.node-version }}-
- name: Install dependencies
if: steps.cache.outputs.cache-hit != 'true'
run: npm ci
- name: Run tests
run: npm test -- --coverage
env:
DATABASE_URL: postgres://test:test@localhost:5432/testdb
REDIS_URL: redis://localhost:6379
NODE_ENV: test
- name: Upload coverage
if: matrix.node-version == 20
uses: actions/upload-artifact@v4
with:
name: coverage-report
path: coverage/
retention-days: 7
# ─── Security Audit ────────────────────────────────────
audit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- run: npm audit --audit-level=high
continue-on-error: true
- uses: aquasecurity/trivy-action@master
with:
scan-type: 'fs'
scan-ref: '.'
severity: 'HIGH,CRITICAL'
# ─── Docker Build ──────────────────────────────────────
docker:
needs: [lint, test]
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
outputs:
image-tag: ${{ steps.meta.outputs.tags }}
image-digest: ${{ steps.build.outputs.digest }}
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- id: meta
uses: docker/metadata-action@v5
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=sha,prefix=
type=ref,event=branch
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
- id: build
uses: docker/build-push-action@v5
with:
context: .
push: ${{ github.event_name != 'pull_request' }}
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
build-args: |
NODE_VERSION=${{ env.NODE_VERSION }}
BUILD_DATE=${{ github.event.head_commit.timestamp }}
GIT_SHA=${{ github.sha }}
# ─── Deploy Staging ────────────────────────────────────
deploy-staging:
needs: docker
if: github.ref == 'refs/heads/main' || (github.event_name == 'workflow_dispatch' && inputs.deploy_env == 'staging')
runs-on: ubuntu-latest
environment:
name: staging
url: https://staging.myapp.com
steps:
- uses: actions/checkout@v4
- name: Deploy to staging
run: |
echo "Deploying image ${{ needs.docker.outputs.image-tag }}"
# Replace with your actual deployment command
# doctl apps create-deployment $APP_ID --wait
env:
DIGITALOCEAN_ACCESS_TOKEN: ${{ secrets.DO_TOKEN }}
- name: Run smoke tests
run: |
sleep 10
STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://staging.myapp.com/health)
if [ "$STATUS" != "200" ]; then
echo "Smoke test failed: health check returned $STATUS"
exit 1
fi
echo "Smoke test passed: health check returned $STATUS"
- name: Notify Slack
if: always()
uses: ./.github/actions/notify-deploy
with:
slack-webhook: ${{ secrets.SLACK_WEBHOOK }}
environment: staging
status: ${{ job.status }}
# ─── Deploy Production ─────────────────────────────────
deploy-production:
needs: [docker, deploy-staging]
if: startsWith(github.ref, 'refs/tags/v') || (github.event_name == 'workflow_dispatch' && inputs.deploy_env == 'production')
runs-on: ubuntu-latest
environment:
name: production
url: https://myapp.com
steps:
- uses: actions/checkout@v4
- name: Deploy to production
run: |
echo "Deploying to production: ${{ needs.docker.outputs.image-tag }}"
# doctl apps create-deployment $PROD_APP_ID --wait
env:
DIGITALOCEAN_ACCESS_TOKEN: ${{ secrets.DO_PROD_TOKEN }}
- name: Run production smoke tests
run: |
sleep 15
node -e "
var http = require('https');
var endpoints = ['/health', '/api/status', '/'];
var failed = [];
function checkEndpoint(path) {
return new Promise(function(resolve) {
http.get('https://myapp.com' + path, function(res) {
if (res.statusCode !== 200) {
failed.push(path + ' returned ' + res.statusCode);
}
resolve();
}).on('error', function(err) {
failed.push(path + ' error: ' + err.message);
resolve();
});
});
}
Promise.all(endpoints.map(checkEndpoint)).then(function() {
if (failed.length > 0) {
console.error('Smoke tests failed:', failed);
process.exit(1);
}
console.log('All smoke tests passed');
});
"
- name: Create GitHub Release
if: startsWith(github.ref, 'refs/tags/v')
uses: softprops/action-gh-release@v2
with:
generate_release_notes: true
This pipeline takes approximately 3-4 minutes for the lint/test/build phases with warm caches, and 5-7 minutes end-to-end including staging deployment. Without caching, expect 8-12 minutes.
The key architectural decisions:
- Lint and test run in parallel. No reason to gate tests on linting.
- Docker build waits for both lint and test. No point building an image if code is broken.
- Staging deploys automatically on main. Every merge to main goes to staging.
- Production requires a tag or manual trigger. The production environment has required reviewers configured in GitHub settings.
- Smoke tests validate each deployment. If the health check fails, the job fails and the Slack notification includes the failure.
Common Issues and Troubleshooting
1. "Resource not accessible by integration"
Error: Resource not accessible by integration
This happens when your workflow lacks the necessary permissions. GitHub Actions uses a restrictive default permission set. Fix it by explicitly declaring permissions:
permissions:
contents: read
packages: write
pull-requests: write
For organization repositories, check Settings > Actions > General > Workflow permissions. If it is set to "Read repository contents," your workflows cannot write anything. The workflow-level permissions key can only restrict permissions, never elevate them beyond the organization setting.
2. Cache Miss on Every Run
Cache not found for input keys: deps-Linux-node20-abc123def456
Common causes:
- The hashFiles() path is wrong. It is relative to the repository root, not the working directory. Use hashFiles('**/package-lock.json') for monorepos.
- The cache key includes a volatile value like a timestamp or run ID.
- Cache was evicted. Caches older than 7 days or exceeding the 10 GB repository limit get purged.
Debug by listing all caches for your repository:
gh cache list --limit 20
3. "Node.js 12 actions are deprecated"
Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16:
actions/checkout@v2, actions/cache@v2
This is not a failure yet, but it will be. GitHub is removing Node 12 and Node 16 runner support. Update all actions to their @v4 versions:
# Before (deprecated)
- uses: actions/checkout@v2
- uses: actions/cache@v2
# After
- uses: actions/checkout@v4
- uses: actions/cache@v4
Run this command to find all outdated action references in your workflows:
grep -r "uses:.*@v[12]" .github/workflows/
4. Docker Build Fails with "No Space Left on Device"
Error: write /tmp/buildkit-mount123456/layer.tar: no space left on device
GitHub-hosted runners start with roughly 14 GB of free disk space, and Docker images, build caches, and intermediate layers can consume it quickly. Solutions:
# Free up disk space before building
- name: Free disk space
run: |
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf /usr/local/share/boost
df -h
# Or use a larger runner
jobs:
build:
runs-on: ubuntu-latest-16-cores # More disk space
For large Docker images, switch to multi-stage builds that produce smaller final images. A Node.js app built with a full node:20 base image might be 1.2 GB. The same app with node:20-alpine and a multi-stage build drops to 80-150 MB.
5. "Workflow Run Cancelled" with No Explanation
This happens when you push multiple commits quickly to the same branch. By default, GitHub Actions queues runs for the same workflow and branch. Add concurrency control to cancel superseded runs:
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
This cancels any in-progress run for the same workflow and branch when a new commit is pushed. Use this on PR workflows to avoid wasting minutes on outdated code. Do not use cancel-in-progress: true on deployment workflows -- you do not want a partial deployment.
6. Secrets Not Available in Pull Requests from Forks
Warning: Skipping step because secret 'NPM_TOKEN' is not available
For security, GitHub does not expose repository secrets to workflows triggered by pull requests from forks. This is by design -- a malicious PR could exfiltrate your secrets. Solutions:
- Use pull_request_target instead of pull_request (but be very careful -- this runs the workflow from the base branch with full secret access).
- Split your workflow: run tests without secrets on PRs, run the full suite on push to main.
- Use environment-level secrets with deployment protection rules.
Best Practices
- Pin action versions to SHA hashes, not tags. Tags can be moved or deleted. Use actions/checkout@8ade135a41bc03ea155e62e844d188df1ea18608 instead of actions/checkout@v4. At minimum, pin to major versions (@v4), never floating (@main).
- Use concurrency groups on every PR workflow. Without concurrency control, pushing 5 quick-fix commits generates 5 parallel workflow runs. That is 5x the compute cost for code that is immediately superseded.
- Set explicit permissions on every workflow. The default GITHUB_TOKEN permissions are too broad. Follow the principle of least privilege. Most CI workflows only need contents: read.
- Cache aggressively, but validate cache keys. A bad cache is worse than no cache. If your cache key does not include the lockfile hash, you might restore stale dependencies and get phantom test failures that do not reproduce locally.
- Never store secrets in workflow files or action inputs. Use encrypted secrets, OIDC for cloud credentials, or environment-level secrets with required reviewers. Audit your workflows for any hardcoded tokens, API keys, or passwords.
- Use timeout-minutes on every job. The default timeout is 6 hours. A stuck job can burn through your Actions minutes budget silently. Most CI jobs should complete in under 15 minutes:
jobs:
test:
runs-on: ubuntu-latest
timeout-minutes: 15
- Keep workflows DRY with composite actions. If you find yourself copying the same setup steps (checkout, setup node, install, cache) across multiple workflows, extract them into a composite action in .github/actions/. This reduces maintenance burden and ensures consistency.
- Monitor your Actions usage. Go to Settings > Billing > Actions to see your monthly minutes consumption. Set up billing alerts. A runaway workflow on a schedule trigger can consume thousands of minutes before anyone notices.
- Use continue-on-error sparingly and deliberately. Marking a step as continue-on-error: true is sometimes necessary (like for npm audit, which might report vulnerabilities in dev dependencies), but overusing it masks real failures. Always add a comment explaining why a step is allowed to fail.
- Version your workflow files like code. Review workflow changes in pull requests. A syntax error in a workflow file can break CI for the entire team. Use actionlint locally to validate workflow syntax before committing:
# Install actionlint
brew install actionlint
# Lint all workflow files
actionlint .github/workflows/*.yml