Continuous Testing Strategies for CI/CD

Implement continuous testing across CI/CD pipeline stages with shift-left practices, quality gates, and production monitoring

Continuous testing is the practice of executing automated tests at every stage of your CI/CD pipeline, from the moment a developer saves a file to long after code reaches production. It is not just "running tests in CI" — it is a deliberate strategy where different types of tests guard different stages, each with its own purpose, speed requirements, and failure response. Done right, continuous testing catches defects within minutes of introduction and gives teams the confidence to deploy multiple times per day.

Prerequisites

  • Node.js 18 or later installed
  • Basic familiarity with Azure DevOps Pipelines
  • A Node.js project with at least unit tests in place
  • Understanding of Git branching and pull request workflows
  • Azure DevOps project with service connections configured

What Continuous Testing Actually Means

Most teams think they have continuous testing because they run a Jest suite in their CI build. That is a start, but it misses the point. Continuous testing means that at every transition point in your delivery pipeline — from local development through production — there is an automated quality gate verifying that the software works correctly.

The key distinction is coverage across stages, not just coverage across code. You need different tests running at different times, each optimized for the risks present at that stage. Unit tests catch logic errors fast. Integration tests verify service boundaries. Smoke tests confirm deployments succeeded. Canary tests detect production-specific failures. Each layer has a distinct job, and removing any one of them leaves a gap that will eventually burn you.

The goal is a pipeline where defects are caught at the earliest possible stage, because the cost of fixing a bug grows exponentially the further it travels from the developer's hands.

Testing Stages in a CI/CD Pipeline

A mature continuous testing pipeline has at least five distinct testing stages:

  1. Pre-commit — Linting, formatting, and fast unit tests run locally before code enters the repository
  2. PR validation — Full unit test suite, code coverage enforcement, and static analysis run on every pull request
  3. Integration — Service-level integration tests run against a staging environment after merge
  4. Post-deployment smoke — Critical path verification immediately after each deployment
  5. Production monitoring — Synthetic tests and canary analysis running continuously in production

Each stage serves as a gate. Code cannot progress to the next stage unless the current stage passes. This is the essence of quality gates, and they are non-negotiable in a serious continuous testing strategy.

Shift-Left Testing Strategy

Shift-left testing means moving testing activities as early as possible in the development lifecycle. Instead of discovering bugs in QA or staging, you find them on the developer's machine before a commit is ever made.

The economics are straightforward. A bug caught in a unit test costs minutes to fix. The same bug caught in integration testing costs hours. If it reaches production, it costs days of investigation, hotfix cycles, and customer trust.

Shifting left requires three things:

  • Fast feedback — Tests must run in seconds, not minutes. If pre-commit hooks take longer than 10 seconds, developers will bypass them.
  • Developer ownership — Developers write and maintain tests, not a separate QA team. The person who writes the code is the person best positioned to test it.
  • Tooling that stays out of the way — Pre-commit hooks, watch-mode test runners, and IDE integrations should run transparently without disrupting flow.

Pre-Commit Hooks and Local Testing

Pre-commit hooks are your first line of defense. Using Husky and lint-staged, you can run linters and targeted unit tests on every commit without slowing developers down. The example below configures hooks through the "husky" key in package.json, which is the Husky v4 style; Husky 5 and later move hook scripts into a .husky/ directory instead, but the workflow is the same.

Here is a practical setup for a Node.js project:

{
  "name": "my-service",
  "scripts": {
    "test": "jest",
    "test:changed": "jest --onlyChanged --passWithNoTests",
    "lint": "eslint .",
    "lint:staged": "eslint --fix"
  },
  "husky": {
    "hooks": {
      "pre-commit": "lint-staged",
      "pre-push": "npm test"
    }
  },
  "lint-staged": {
    "*.js": [
      "eslint --fix",
      "jest --findRelatedTests --passWithNoTests"
    ]
  }
}

The --findRelatedTests flag is critical. It tells Jest to only run tests that are affected by the staged files. If you changed userService.js, it runs userService.test.js and any test that imports userService. This keeps pre-commit hooks under 10 seconds for most changes.

For the pre-push hook, we run the full test suite. This catches anything the targeted pre-commit run might have missed. If the full suite takes more than 2 minutes, you have a test performance problem that needs addressing before it kills your feedback loop.

PR Validation Testing

When a developer opens a pull request, your pipeline should run the complete unit test suite with code coverage enforcement:

// jest.config.js
var config = {
  testEnvironment: "node",
  collectCoverage: true,
  coverageThreshold: {
    global: {
      branches: 80,
      functions: 85,
      lines: 85,
      statements: 85
    }
  },
  // json-summary writes coverage/coverage-summary.json, which the quality gate script reads later
  coverageReporters: ["text", "cobertura", "html", "json-summary"],
  testMatch: ["**/__tests__/**/*.js", "**/*.test.js"],
  maxWorkers: "50%"
};

module.exports = config;

The cobertura reporter is important because Azure DevOps can parse Cobertura XML and display coverage results directly in the PR. This gives reviewers immediate visibility into whether new code is tested.

Beyond unit tests, PR validation should include static analysis. Tools like SonarQube or ESLint with security-focused plugins catch entire categories of bugs that unit tests miss — SQL injection patterns, unhandled promise rejections, prototype pollution risks.

// .eslintrc.js
module.exports = {
  extends: [
    "eslint:recommended",
    "plugin:security/recommended",
    "plugin:node/recommended"
  ],
  plugins: ["security"],
  rules: {
    "security/detect-object-injection": "warn",
    "security/detect-non-literal-regexp": "error",
    "security/detect-unsafe-regex": "error",
    "no-eval": "error",
    "no-implied-eval": "error"
  }
};

Integration Testing in Staging

After code merges to your main branch, integration tests verify that services work together correctly. These tests hit real HTTP endpoints, query real databases (test instances), and validate end-to-end workflows.

Here is a practical integration test for an API service:

// tests/integration/users.integration.test.js
var request = require("supertest");
var app = require("../../app");
var db = require("../../db/postgres");

describe("User API Integration", function () {
  var testUserId;

  beforeAll(function () {
    return db.query("DELETE FROM users WHERE email LIKE '%@test.integration%'");
  });

  afterAll(function () {
    return db.query("DELETE FROM users WHERE email LIKE '%@test.integration%'")
      .then(function () {
        return db.end();
      });
  });

  it("should create a user and retrieve it", function () {
    return request(app)
      .post("/api/users")
      .send({
        name: "Integration Test User",
        email: "user-" + Date.now() + "@test.integration"
      })
      .expect(201)
      .then(function (res) {
        testUserId = res.body.id;
        expect(res.body.name).toBe("Integration Test User");
        return request(app)
          .get("/api/users/" + testUserId)
          .expect(200);
      })
      .then(function (res) {
        expect(res.body.id).toBe(testUserId);
        expect(res.body.name).toBe("Integration Test User");
      });
  });

  it("should enforce unique email constraint", function () {
    var email = "duplicate-" + Date.now() + "@test.integration";
    return request(app)
      .post("/api/users")
      .send({ name: "First User", email: email })
      .expect(201)
      .then(function () {
        return request(app)
          .post("/api/users")
          .send({ name: "Second User", email: email })
          .expect(409);
      });
  });
});

Notice the test data strategy: every test creates its own data with unique identifiers, and cleanup runs in both beforeAll and afterAll. This makes tests repeatable and prevents cross-test contamination.

Smoke Tests Post-Deployment

Smoke tests run immediately after a deployment completes. Their job is not to test functionality exhaustively — that happened in earlier stages. Smoke tests verify that the deployment itself succeeded and the application is operational.

A good smoke test suite takes under 60 seconds and covers:

  • Health check endpoints respond with 200
  • Authentication flow works end-to-end
  • The most critical business transaction completes successfully
  • External service connections (database, cache, message queue) are alive

// tests/smoke/smoke.test.js
var https = require("https");

var BASE_URL = process.env.SMOKE_TARGET_URL || "https://staging.example.com";

function httpGet(path) {
  return new Promise(function (resolve, reject) {
    var url = BASE_URL + path;
    https.get(url, function (res) {
      var body = "";
      res.on("data", function (chunk) { body += chunk; });
      res.on("end", function () {
        resolve({ statusCode: res.statusCode, body: body, headers: res.headers });
      });
    }).on("error", reject);
  });
}

describe("Smoke Tests - " + BASE_URL, function () {
  it("health endpoint returns 200", function () {
    return httpGet("/health").then(function (res) {
      expect(res.statusCode).toBe(200);
      var data = JSON.parse(res.body);
      expect(data.status).toBe("healthy");
    });
  });

  it("database connectivity is healthy", function () {
    return httpGet("/health/db").then(function (res) {
      expect(res.statusCode).toBe(200);
      var data = JSON.parse(res.body);
      expect(data.connected).toBe(true);
    });
  });

  it("critical endpoint responds correctly", function () {
    return httpGet("/api/v1/status").then(function (res) {
      expect(res.statusCode).toBe(200);
    });
  });

  it("static assets are accessible", function () {
    return httpGet("/css/styles.css").then(function (res) {
      expect(res.statusCode).toBe(200);
      expect(res.headers["content-type"]).toContain("text/css");
    });
  });
});

If any smoke test fails, the deployment should be automatically rolled back. No exceptions. A failed smoke test means your users are experiencing a broken application right now.

Canary Testing and Feature Flags

Canary deployments route a small percentage of traffic (typically 5-10%) to the new version while the majority continues hitting the stable version. This is testing in production with a safety net.

The canary strategy requires two things: a traffic routing mechanism and automated analysis of canary health.

// canary/health-checker.js
var axios = require("axios");

var CANARY_URL = process.env.CANARY_URL;
var STABLE_URL = process.env.STABLE_URL;
var THRESHOLD_ERROR_RATE = parseFloat(process.env.CANARY_ERROR_THRESHOLD || "0.02");
var CHECK_INTERVAL_MS = parseInt(process.env.CANARY_CHECK_INTERVAL || "30000", 10);
var EVALUATION_WINDOW = parseInt(process.env.CANARY_EVAL_WINDOW || "10", 10);

var canaryResults = [];
var stableResults = [];

function checkEndpoint(url) {
  var start = Date.now();
  return axios.get(url + "/health", { timeout: 5000 })
    .then(function (res) {
      return {
        success: res.status === 200,
        latency: Date.now() - start,
        timestamp: new Date().toISOString()
      };
    })
    .catch(function () {
      return {
        success: false,
        latency: Date.now() - start,
        timestamp: new Date().toISOString()
      };
    });
}

function evaluateCanary() {
  var recentCanary = canaryResults.slice(-EVALUATION_WINDOW);
  var recentStable = stableResults.slice(-EVALUATION_WINDOW);

  if (recentCanary.length < EVALUATION_WINDOW) {
    console.log("Not enough data points yet: " + recentCanary.length + "/" + EVALUATION_WINDOW);
    return "pending";
  }

  var canaryErrorRate = recentCanary.filter(function (r) { return !r.success; }).length / recentCanary.length;
  var stableErrorRate = recentStable.filter(function (r) { return !r.success; }).length / recentStable.length;

  var canaryAvgLatency = recentCanary.reduce(function (sum, r) { return sum + r.latency; }, 0) / recentCanary.length;
  var stableAvgLatency = recentStable.reduce(function (sum, r) { return sum + r.latency; }, 0) / recentStable.length;

  console.log("Canary error rate: " + (canaryErrorRate * 100).toFixed(2) + "% | Stable: " + (stableErrorRate * 100).toFixed(2) + "%");
  console.log("Canary latency: " + canaryAvgLatency.toFixed(0) + "ms | Stable: " + stableAvgLatency.toFixed(0) + "ms");

  if (canaryErrorRate > THRESHOLD_ERROR_RATE && canaryErrorRate > stableErrorRate * 2) {
    return "rollback";
  }

  if (canaryAvgLatency > stableAvgLatency * 1.5) {
    return "rollback";
  }

  return "healthy";
}

function runCanaryLoop() {
  setInterval(function () {
    Promise.all([
      checkEndpoint(CANARY_URL),
      checkEndpoint(STABLE_URL)
    ]).then(function (results) {
      canaryResults.push(results[0]);
      stableResults.push(results[1]);

      var verdict = evaluateCanary();
      if (verdict === "rollback") {
        console.error("CANARY FAILED — triggering rollback");
        process.exit(1);
      }
      if (verdict === "healthy") {
        // Exit 0 so the pipeline's canary analysis step succeeds and traffic can be promoted
        console.log("Canary healthy over the evaluation window");
        process.exit(0);
      }
    });
  }, CHECK_INTERVAL_MS);
}

// Allow the pipeline to invoke this file directly: node canary/health-checker.js
if (require.main === module) {
  runCanaryLoop();
}

module.exports = { runCanaryLoop: runCanaryLoop, evaluateCanary: evaluateCanary };

Feature flags complement canary deployments by letting you toggle functionality without deploying new code. When a canary fails, you can disable the problematic feature flag instantly instead of rolling back the entire deployment.

// middleware/featureFlags.js
var flags = require("../config/feature-flags.json");

function isFeatureEnabled(flagName, userId) {
  var flag = flags[flagName];
  if (!flag) return false;
  if (flag.enabled === false) return false;
  if (flag.rolloutPercentage === 100) return true;
  if (flag.rolloutPercentage === 0) return false;

  // Deterministic hash so same user always gets same result
  var hash = 0;
  var input = flagName + ":" + userId;
  for (var i = 0; i < input.length; i++) {
    hash = ((hash << 5) - hash) + input.charCodeAt(i);
    hash = hash & hash;
  }
  return (Math.abs(hash) % 100) < flag.rolloutPercentage;
}

function featureFlagMiddleware(req, res, next) {
  req.isFeatureEnabled = function (flagName) {
    var userId = req.user ? req.user.id : req.sessionID;
    return isFeatureEnabled(flagName, userId);
  };
  next();
}

module.exports = featureFlagMiddleware;
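
To consume the middleware, a route checks req.isFeatureEnabled before choosing a code path. Here is a hedged sketch wired into an Express app; the route, the flag name, and the assumed shape of config/feature-flags.json are illustrative, not part of the original setup.

// app.js (excerpt): hypothetical wiring for the feature flag middleware above
var express = require("express");
var featureFlagMiddleware = require("./middleware/featureFlags");

// config/feature-flags.json is assumed to look like:
// { "new-checkout": { "enabled": true, "rolloutPercentage": 25 } }

var app = express();
// In a real app, auth or session middleware would run first so req.user or req.sessionID is populated
app.use(featureFlagMiddleware);

app.get("/checkout", function (req, res) {
  if (req.isFeatureEnabled("new-checkout")) {
    res.send("new checkout flow");
  } else {
    res.send("legacy checkout flow");
  }
});

module.exports = app;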

Test Environment Management

One of the most common reasons continuous testing fails in practice is flaky environments. Tests pass locally, fail in CI, pass again when re-run. This is almost always an environment problem, not a test problem.

The solution is infrastructure-as-code for your test environments. Every test environment should be reproducible from a script:

// scripts/setup-test-env.js
var childProcess = require("child_process");
var path = require("path");

var ENV_CONFIG = {
  database: {
    image: "postgres:15",
    port: 5433,
    env: {
      POSTGRES_DB: "testdb",
      POSTGRES_USER: "testuser",
      POSTGRES_PASSWORD: "testpass"
    }
  },
  redis: {
    image: "redis:7-alpine",
    port: 6380
  }
};

function startService(name, config) {
  var containerName = "test-" + name + "-" + process.pid;
  var envArgs = [];
  if (config.env) {
    Object.keys(config.env).forEach(function (key) {
      envArgs.push("-e");
      envArgs.push(key + "=" + config.env[key]);
    });
  }

  // Map the host port from ENV_CONFIG to the container's default port (5432 for postgres, 6379 for redis)
  var containerPort = name === "database" ? "5432" : "6379";
  var args = [
    "run", "-d", "--rm",
    "--name", containerName,
    "-p", config.port + ":" + containerPort
  ].concat(envArgs).concat([config.image]);

  console.log("Starting " + name + " on port " + config.port);
  childProcess.execSync("docker " + args.join(" "), { stdio: "inherit" });

  return containerName;
}

function waitForReady(name, port, retries) {
  retries = retries || 30;
  return new Promise(function (resolve, reject) {
    var attempts = 0;
    var interval = setInterval(function () {
      attempts++;
      try {
        // Probe inside the container: pg_isready for postgres, redis-cli ping for redis
        var probe = name === "database" ? "pg_isready" : "redis-cli ping";
        childProcess.execSync("docker exec test-" + name + "-" + process.pid + " " + probe, { stdio: "pipe" });
        clearInterval(interval);
        resolve();
      } catch (e) {
        if (attempts >= retries) {
          clearInterval(interval);
          reject(new Error(name + " failed to start after " + retries + " attempts"));
        }
      }
    }, 1000);
  });
}

module.exports = { startService: startService, waitForReady: waitForReady, ENV_CONFIG: ENV_CONFIG };

Test Data Strategies

Test data is the silent killer of reliable test suites. Shared mutable test data creates coupling between tests, ordering dependencies, and intermittent failures.

There are three proven test data strategies:

1. Factory pattern — Generate fresh test data for each test:

// tests/factories/userFactory.js
var counter = 0;

function createUser(overrides) {
  counter++;
  var defaults = {
    name: "Test User " + counter,
    email: "testuser-" + counter + "-" + Date.now() + "@test.local",
    role: "member",
    active: true,
    createdAt: new Date().toISOString()
  };

  return Object.assign({}, defaults, overrides || {});
}

function createUsers(count, overrides) {
  var users = [];
  for (var i = 0; i < count; i++) {
    users.push(createUser(overrides));
  }
  return users;
}

module.exports = { createUser: createUser, createUsers: createUsers };

2. Database snapshots — Restore a known-good state before each test suite.

3. Transactional rollback — Wrap each test in a database transaction that rolls back after the test completes. This is the fastest approach because it avoids the overhead of insert/delete cycles.
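
Here is a minimal sketch of the transactional rollback strategy, assuming node-postgres (pg); the helper name is hypothetical, and it only isolates writes that go through the client passed into the test body.

// tests/helpers/withRollback.js: minimal sketch of strategy 3, assuming node-postgres (pg)
var Pool = require("pg").Pool;

var pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Run a test body inside a transaction that is always rolled back,
// so writes made during the test never leak into other tests.
function withRollback(testBody) {
  return pool.connect().then(function (client) {
    return client.query("BEGIN")
      .then(function () { return testBody(client); })
      .then(
        function () { return client.query("ROLLBACK"); },
        function (err) {
          return client.query("ROLLBACK").then(function () { throw err; });
        }
      )
      .finally(function () { client.release(); });
  });
}

module.exports = withRollback;

A test then issues all of its queries through the provided client, for example: return withRollback(function (client) { return client.query("INSERT INTO users (name) VALUES ($1)", ["Temp"]); });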

Test Parallelization for Speed

Slow test suites kill continuous testing adoption. If your pipeline takes 45 minutes, developers will stop waiting for results and merge without green builds.

Jest supports parallelization out of the box with the --maxWorkers flag, but real parallelization means running test suites across multiple pipeline agents:

# azure-pipelines.yml — parallel test execution
strategy:
  parallel: 4

steps:
  - script: |
      TOTAL_AGENTS=4
      # System.JobPositionInPhase is 1-based, while NR % TOTAL_AGENTS yields 0..3, so shift it down by one
      AGENT_INDEX=$(( $(System.JobPositionInPhase) - 1 ))
      npx jest --listTests | sort | awk "NR % $TOTAL_AGENTS == $AGENT_INDEX" | xargs npx jest --runTestsByPath
    displayName: "Run partitioned tests"

This splits your test files evenly across four agents. Each agent runs a different subset, and the stage only passes when all four agents succeed. A test suite that takes 20 minutes on a single agent now takes 5 minutes.

For even smarter splitting, use test timing data:

// scripts/split-tests-by-timing.js
var fs = require("fs");

function splitTestsByTiming(testFiles, agentCount, agentIndex) {
  var timingFile = ".jest-test-timings.json";
  var timings = {};

  if (fs.existsSync(timingFile)) {
    timings = JSON.parse(fs.readFileSync(timingFile, "utf8"));
  }

  var withTimings = testFiles.map(function (file) {
    return { file: file, duration: timings[file] || 5000 };
  });

  withTimings.sort(function (a, b) { return b.duration - a.duration; });

  var buckets = [];
  for (var i = 0; i < agentCount; i++) {
    buckets.push({ files: [], totalTime: 0 });
  }

  withTimings.forEach(function (test) {
    var smallest = buckets.reduce(function (min, bucket) {
      return bucket.totalTime < min.totalTime ? bucket : min;
    });
    smallest.files.push(test.file);
    smallest.totalTime += test.duration;
  });

  return buckets[agentIndex].files;
}

module.exports = { splitTestsByTiming: splitTestsByTiming };
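
The splitter reads .jest-test-timings.json, so something has to write that file. One option, sketched here as an assumption rather than a required setup, is a small custom Jest reporter that records per-file durations at the end of every run:

// scripts/timing-reporter.js: hedged sketch of a custom Jest reporter that writes .jest-test-timings.json
var fs = require("fs");

function TimingReporter() {
  this.timings = {};
}

// Called once per test file with that file's results
TimingReporter.prototype.onTestResult = function (test, testResult) {
  this.timings[testResult.testFilePath] = testResult.perfStats.end - testResult.perfStats.start;
};

// Called once after the whole run; persist the timing map for the next pipeline run to consume
TimingReporter.prototype.onRunComplete = function () {
  fs.writeFileSync(".jest-test-timings.json", JSON.stringify(this.timings, null, 2));
};

module.exports = TimingReporter;

Register it in jest.config.js with reporters: ["default", "<rootDir>/scripts/timing-reporter.js"] and publish the JSON file as a pipeline artifact so the next run can download it before splitting.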

Quality Gates Between Stages

Quality gates are the enforcement mechanism that prevents bad code from progressing through your pipeline. Without explicit gates, continuous testing becomes continuous ignoring of test results.

Every quality gate needs three things:

  1. A measurable threshold — "80% code coverage" not "good enough coverage"
  2. An automated check — Human gates do not scale
  3. A blocking consequence — Failed gates must actually prevent progression

// scripts/quality-gate.js
var fs = require("fs");

function evaluateQualityGate(config) {
  var results = {
    passed: true,
    checks: []
  };

  // Check code coverage
  if (config.coverage) {
    var coverageFile = config.coverage.reportPath || "coverage/coverage-summary.json";
    var coverage = JSON.parse(fs.readFileSync(coverageFile, "utf8"));
    var total = coverage.total;

    Object.keys(config.coverage.thresholds).forEach(function (metric) {
      var threshold = config.coverage.thresholds[metric];
      var actual = total[metric].pct;
      var checkPassed = actual >= threshold;

      results.checks.push({
        name: "Coverage: " + metric,
        threshold: threshold + "%",
        actual: actual + "%",
        passed: checkPassed
      });

      if (!checkPassed) results.passed = false;
    });
  }

  // Check test results
  if (config.tests) {
    var testResultsFile = config.tests.reportPath || "test-results.json";
    var testResults = JSON.parse(fs.readFileSync(testResultsFile, "utf8"));

    var failedTests = testResults.numFailedTests || 0;
    var checkPassed = failedTests === 0;

    results.checks.push({
      name: "Test failures",
      threshold: "0",
      actual: String(failedTests),
      passed: checkPassed
    });

    if (!checkPassed) results.passed = false;
  }

  // Check for known vulnerability patterns
  if (config.security) {
    // npm audit exits non-zero when vulnerabilities are found, so capture the JSON report from the thrown error
    var auditOutput;
    try {
      auditOutput = require("child_process").execSync("npm audit --json", { encoding: "utf8" });
    } catch (e) {
      auditOutput = e.stdout;
    }
    var audit = JSON.parse(auditOutput);
    var criticalCount = (audit.metadata && audit.metadata.vulnerabilities && audit.metadata.vulnerabilities.critical) || 0;

    var securityPassed = criticalCount === 0;
    results.checks.push({
      name: "Critical vulnerabilities",
      threshold: "0",
      actual: String(criticalCount),
      passed: securityPassed
    });

    if (!securityPassed) results.passed = false;
  }

  return results;
}

// Run as CLI
var config = JSON.parse(fs.readFileSync("quality-gate.json", "utf8"));
var results = evaluateQualityGate(config);

console.log("\n=== Quality Gate Results ===\n");
results.checks.forEach(function (check) {
  var icon = check.passed ? "PASS" : "FAIL";
  console.log("[" + icon + "] " + check.name + ": " + check.actual + " (threshold: " + check.threshold + ")");
});

console.log("\nOverall: " + (results.passed ? "PASSED" : "FAILED"));

if (!results.passed) {
  process.exit(1);
}

Monitoring as Testing in Production

Once code reaches production, testing does not stop. Synthetic monitoring replaces traditional tests — automated scripts that continuously exercise your application's critical paths from the user's perspective.

// monitoring/synthetic-tests.js
var https = require("https");

var ENDPOINTS = [
  { name: "Homepage", path: "/", expectedStatus: 200, maxLatency: 2000 },
  { name: "API Health", path: "/api/health", expectedStatus: 200, maxLatency: 500 },
  { name: "Articles List", path: "/articles", expectedStatus: 200, maxLatency: 3000 },
  { name: "Search API", path: "/api/search?q=test", expectedStatus: 200, maxLatency: 1000 }
];

function runSyntheticTest(endpoint) {
  return new Promise(function (resolve) {
    var start = Date.now();
    var options = {
      hostname: process.env.PRODUCTION_HOST,
      path: endpoint.path,
      method: "GET",
      timeout: 10000
    };

    var req = https.request(options, function (res) {
      var latency = Date.now() - start;
      var passed = res.statusCode === endpoint.expectedStatus && latency <= endpoint.maxLatency;

      resolve({
        name: endpoint.name,
        passed: passed,
        statusCode: res.statusCode,
        latency: latency,
        expectedStatus: endpoint.expectedStatus,
        maxLatency: endpoint.maxLatency
      });
    });

    req.on("error", function (err) {
      resolve({
        name: endpoint.name,
        passed: false,
        error: err.message,
        latency: Date.now() - start
      });
    });

    req.on("timeout", function () {
      req.destroy();
      resolve({
        name: endpoint.name,
        passed: false,
        error: "Timeout after 10s",
        latency: Date.now() - start
      });
    });

    req.end();
  });
}

function runAllSyntheticTests() {
  return Promise.all(ENDPOINTS.map(runSyntheticTest))
    .then(function (results) {
      var failures = results.filter(function (r) { return !r.passed; });

      results.forEach(function (r) {
        var status = r.passed ? "OK" : "FAIL";
        console.log("[" + status + "] " + r.name + " — " + r.latency + "ms" + (r.error ? " (" + r.error + ")" : ""));
      });

      if (failures.length > 0) {
        console.error("\n" + failures.length + " synthetic test(s) failed — sending alert");
        // Trigger PagerDuty, Slack, or email notification here
      }

      return { total: results.length, passed: results.length - failures.length, failed: failures.length };
    });
}

module.exports = { runAllSyntheticTests: runAllSyntheticTests };
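
The module above runs the suite once. To make monitoring continuous, something must invoke it on a schedule, either a scheduled pipeline or a small long-running process. Here is a hedged sketch of the latter; the file name and interval variable are assumptions:

// monitoring/run-continuously.js: hypothetical entry point that re-runs the synthetic suite on an interval
var runAllSyntheticTests = require("./synthetic-tests").runAllSyntheticTests;

// Default to one run every 5 minutes
var INTERVAL_MS = parseInt(process.env.SYNTHETIC_INTERVAL_MS || "300000", 10);

function loop() {
  runAllSyntheticTests()
    .then(function (summary) {
      console.log("Synthetic run complete: " + summary.passed + "/" + summary.total + " passed");
    })
    .catch(function (err) {
      console.error("Synthetic run crashed: " + err.message);
    })
    .then(function () {
      // Schedule the next run only after the current one finishes, so runs never overlap
      setTimeout(loop, INTERVAL_MS);
    });
}

loop();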

Feedback Loops and Test Metrics

The final piece of continuous testing is closing the feedback loop. Metrics tell you whether your testing strategy is actually working or just creating busywork.

Track these metrics weekly:

  • Mean time to detect (MTTD) — How quickly do tests catch defects after introduction?
  • Escaped defect rate — What percentage of bugs reach production uncaught by tests?
  • Test suite execution time — Is your pipeline getting slower over time?
  • Flaky test rate — What percentage of tests fail intermittently? Anything above 2% is a fire.
  • Change failure rate — What percentage of deployments cause incidents?

// metrics/test-metrics-collector.js
var fs = require("fs");

function collectPipelineMetrics(runResults) {
  var metrics = {
    timestamp: new Date().toISOString(),
    pipelineId: process.env.BUILD_BUILDID,
    branch: process.env.BUILD_SOURCEBRANCH,
    stages: {}
  };

  Object.keys(runResults).forEach(function (stage) {
    var stageData = runResults[stage];
    metrics.stages[stage] = {
      duration: stageData.endTime - stageData.startTime,
      testsPassed: stageData.passed,
      testsFailed: stageData.failed,
      testsSkipped: stageData.skipped,
      coveragePercent: stageData.coverage || null
    };
  });

  // Append to metrics log
  var logFile = "test-metrics.jsonl";
  fs.appendFileSync(logFile, JSON.stringify(metrics) + "\n");

  return metrics;
}

module.exports = { collectPipelineMetrics: collectPipelineMetrics };
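
Collecting metrics only pays off if someone reads them. A small companion script, sketched here with a hypothetical file name, can turn the JSONL log into trend numbers such as average pipeline duration and the share of runs with failing tests:

// metrics/report.js: hypothetical companion that summarizes test-metrics.jsonl
var fs = require("fs");

function summarize(logFile) {
  logFile = logFile || "test-metrics.jsonl";
  var lines = fs.readFileSync(logFile, "utf8").trim().split("\n");
  var runs = lines.map(function (line) { return JSON.parse(line); });

  var totalDurationMs = 0;
  var failedRuns = 0;

  runs.forEach(function (run) {
    var runFailed = false;
    Object.keys(run.stages).forEach(function (stage) {
      totalDurationMs += run.stages[stage].duration;
      if (run.stages[stage].testsFailed > 0) runFailed = true;
    });
    if (runFailed) failedRuns++;
  });

  return {
    runs: runs.length,
    avgPipelineDurationMs: Math.round(totalDurationMs / runs.length),
    failedRunRatePercent: Number(((failedRuns / runs.length) * 100).toFixed(1))
  };
}

console.log(summarize(process.argv[2]));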

Complete Working Example

Here is a multi-stage Azure Pipeline that implements continuous testing at every stage:

# azure-pipelines.yml
trigger:
  branches:
    include:
      - main

pr:
  branches:
    include:
      - main

variables:
  nodeVersion: '18.x'
  npmCache: $(Pipeline.Workspace)/.npm

stages:
  # Stage 1: PR Validation — Unit Tests + Static Analysis
  - stage: Validate
    displayName: 'PR Validation'
    condition: eq(variables['Build.Reason'], 'PullRequest')
    jobs:
      - job: UnitTests
        displayName: 'Unit Tests & Coverage'
        strategy:
          parallel: 3
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: $(nodeVersion)

          - task: Cache@2
            inputs:
              key: 'npm | "$(Agent.OS)" | package-lock.json'
              path: $(npmCache)
            displayName: 'Cache npm'

          - script: npm ci
            displayName: 'Install dependencies'

          - script: |
              TOTAL_AGENTS=3
              # System.JobPositionInPhase is 1-based, while NR % TOTAL_AGENTS yields 0..2, so shift it down by one
              AGENT_INDEX=$(( $(System.JobPositionInPhase) - 1 ))
              npx jest --listTests | sort | awk "NR % $TOTAL_AGENTS == $AGENT_INDEX" | xargs npx jest --runTestsByPath --ci --coverage --reporters=default --reporters=jest-junit
            displayName: 'Run partitioned unit tests'
            env:
              JEST_JUNIT_OUTPUT_DIR: $(System.DefaultWorkingDirectory)/test-results

          - task: PublishTestResults@2
            inputs:
              testResultsFormat: 'JUnit'
              testResultsFiles: 'test-results/*.xml'
              mergeTestResults: true
            condition: always()

          - task: PublishCodeCoverageResults@2
            inputs:
              summaryFileLocation: 'coverage/cobertura-coverage.xml'
            condition: always()

      - job: StaticAnalysis
        displayName: 'Lint & Security Scan'
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: $(nodeVersion)

          - script: npm ci
            displayName: 'Install dependencies'

          - script: npx eslint . --format json --output-file eslint-results.json
            displayName: 'Run ESLint'

          - script: npm audit --audit-level=critical
            displayName: 'Security audit'
            continueOnError: false

      - job: QualityGate
        displayName: 'Quality Gate'
        dependsOn:
          - UnitTests
          - StaticAnalysis
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - script: node scripts/quality-gate.js
            displayName: 'Evaluate quality gate'

  # Stage 2: Build & Integration Tests
  - stage: Integration
    displayName: 'Integration Testing'
    dependsOn: []
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - job: IntegrationTests
        displayName: 'Integration Tests'
        pool:
          vmImage: 'ubuntu-latest'
        services:
          postgres:
            image: postgres:15
            ports:
              - 5432:5432
            env:
              POSTGRES_DB: testdb
              POSTGRES_USER: testuser
              POSTGRES_PASSWORD: testpass
          redis:
            image: redis:7-alpine
            ports:
              - 6379:6379
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: $(nodeVersion)

          - script: npm ci
            displayName: 'Install dependencies'

          - script: |
              npx jest --testPathPattern='integration' --ci --forceExit --detectOpenHandles
            displayName: 'Run integration tests'
            env:
              DATABASE_URL: 'postgresql://testuser:testpass@localhost:5432/testdb'
              REDIS_URL: 'redis://localhost:6379'

  # Stage 3: Deploy to Staging
  - stage: DeployStaging
    displayName: 'Deploy to Staging'
    dependsOn: Integration
    jobs:
      - deployment: StagingDeploy
        displayName: 'Deploy to Staging'
        environment: 'staging'
        strategy:
          runOnce:
            deploy:
              steps:
                - script: |
                    echo "Deploying to staging..."
                    # Your deployment script here
                  displayName: 'Deploy application'

                - script: |
                    sleep 30
                    npx jest --testPathPattern='smoke' --ci
                  displayName: 'Run smoke tests'
                  env:
                    SMOKE_TARGET_URL: 'https://staging.example.com'

            on:
              failure:
                steps:
                  - script: |
                      echo "Smoke tests failed — rolling back staging"
                      # Rollback script here
                    displayName: 'Rollback staging'

  # Stage 4: Deploy to Production (Canary)
  - stage: DeployProduction
    displayName: 'Production Canary Deploy'
    dependsOn: DeployStaging
    jobs:
      - deployment: CanaryDeploy
        displayName: 'Canary Deployment'
        environment: 'production'
        strategy:
          canary:
            increments: [10, 50, 100]
            deploy:
              steps:
                - script: |
                    echo "Deploying canary at $(Strategy.Increment)%"
                  displayName: 'Deploy canary'

            routeTraffic:
              steps:
                - script: |
                    echo "Routing $(Strategy.Increment)% traffic to canary"
                  displayName: 'Route traffic'

            postRouteTraffic:
              steps:
                - script: |
                    node canary/health-checker.js
                  displayName: 'Canary health analysis'
                  timeoutInMinutes: 10
                  env:
                    CANARY_URL: 'https://canary.example.com'
                    STABLE_URL: 'https://example.com'
                    CANARY_EVAL_WINDOW: '10'
                    CANARY_CHECK_INTERVAL: '30000'

            on:
              failure:
                steps:
                  - script: |
                      echo "Canary failed — rolling back production"
                    displayName: 'Rollback canary'

  # Stage 5: Post-Deployment Synthetic Monitoring
  - stage: PostDeployMonitoring
    displayName: 'Post-Deploy Monitoring'
    dependsOn: DeployProduction
    jobs:
      - job: SyntheticTests
        displayName: 'Synthetic Monitoring'
        pool:
          vmImage: 'ubuntu-latest'
        steps:
          - script: node monitoring/synthetic-tests.js
            displayName: 'Run synthetic tests'
            env:
              PRODUCTION_HOST: 'example.com'

Common Issues and Troubleshooting

Tests pass locally but fail in CI

This is almost always caused by environment differences — different Node.js versions, missing environment variables, or timing-dependent tests. Lock your Node.js version explicitly in your pipeline, set all required env vars in your CI config, and replace setTimeout-based waits with polling loops that check for actual conditions.
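
A generic polling helper makes that last fix easy to apply consistently. This is a hedged sketch; the helper name and defaults are assumptions:

// tests/helpers/waitFor.js: hypothetical polling helper to replace fixed sleeps
function waitFor(condition, options) {
  options = options || {};
  var timeoutMs = options.timeoutMs || 10000;
  var intervalMs = options.intervalMs || 250;
  var deadline = Date.now() + timeoutMs;

  return new Promise(function (resolve, reject) {
    function attempt() {
      Promise.resolve()
        .then(condition)
        .then(function (result) {
          if (result) return resolve(result);
          if (Date.now() > deadline) return reject(new Error("waitFor timed out after " + timeoutMs + "ms"));
          setTimeout(attempt, intervalMs);
        })
        .catch(function (err) {
          if (Date.now() > deadline) return reject(err);
          setTimeout(attempt, intervalMs);
        });
    }
    attempt();
  });
}

// Example: return waitFor(function () { return db.query("SELECT 1 FROM jobs WHERE id = $1", [jobId]).then(function (r) { return r.rowCount > 0; }); });
module.exports = waitFor;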

Flaky tests causing false pipeline failures

Flaky tests are tests with non-deterministic behavior. Common causes include shared mutable state between tests, dependency on test execution order, or reliance on system time. Fix them by isolating test data with factories, resetting state in beforeEach, and using jest.useFakeTimers() for time-dependent logic. Do not use retry logic as a fix — retries mask the underlying problem and make your suite untrustworthy.
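
For the time-dependent case specifically, fake timers keep the clock under the test's control. A minimal sketch, where the scheduled callback stands in for your own time-based logic:

// tests/unit/retry.test.js: minimal fake-timer example for time-dependent logic
describe("delayed retry", function () {
  afterEach(function () {
    jest.useRealTimers();
  });

  it("invokes the callback only after the delay elapses", function () {
    jest.useFakeTimers();
    var callback = jest.fn();

    // Hypothetical code under test: schedule a retry in 60 seconds
    setTimeout(callback, 60000);

    expect(callback).not.toHaveBeenCalled();
    jest.advanceTimersByTime(60000);
    expect(callback).toHaveBeenCalledTimes(1);
  });
});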

Integration tests timing out waiting for services

Container services in CI take time to start. Add a readiness check that polls the service before running tests. For PostgreSQL, use pg_isready. For Redis, use redis-cli ping. Set a maximum wait time of 60 seconds with a 1-second polling interval. If the service is not ready after 60 seconds, fail fast with a clear error message instead of letting the test timeout obscure the root cause.

Code coverage numbers dropping after adding new features

This happens when developers add code without corresponding tests. The fix is enforcement: set coverageThreshold in your Jest config and make the build fail when coverage drops below the threshold. Use the --changedSince flag to show coverage for recently changed files specifically, making it clear which new code lacks tests. Also set a branch coverage threshold — line coverage alone misses untested conditional branches.

Quality gate passes but bugs still reach production

Your quality gates are probably measuring the wrong things. High code coverage does not guarantee correct behavior — you can have 100% line coverage with zero assertion density. Add mutation testing with Stryker to verify that your tests actually catch bugs, not just execute code paths. Track your escaped defect rate and use each production incident to add a regression test and tighten the gate that should have caught it.

Best Practices

  • Keep unit tests under 10 seconds total. If your unit suite takes longer, you have too many integration tests masquerading as unit tests. Extract the slow ones into a separate integration stage.

  • Use test factories instead of fixtures. Fixture files create hidden coupling between tests. Factories generate fresh, isolated data for each test and make dependencies explicit in the test code.

  • Make every quality gate a blocking gate. Advisory warnings get ignored. If a metric matters enough to measure, it matters enough to enforce. Failed gates should prevent deployment, not just generate a notification.

  • Run smoke tests against the actual deployment, not a proxy. Smoke tests that hit localhost or a mock do not verify that the deployment succeeded. Point them at the real URL with real DNS resolution and real TLS.

  • Track and kill flaky tests aggressively. A flaky test suite teaches developers to ignore failures. Set a policy: if a test flakes more than twice in a week, it gets quarantined and fixed before any new feature work.

  • Instrument your pipeline with timing metrics. You cannot optimize what you do not measure. Track stage duration, test count per stage, and failure rates. Set alerts when pipeline duration exceeds your target (aim for under 15 minutes total).

  • Treat test code with the same rigor as production code. Review test PRs carefully. Refactor test utilities. Delete tests that no longer provide value. Test code that rots becomes worse than no tests because it gives false confidence.

  • Never skip tests to meet a deadline. Skipping tests to ship faster is borrowing against your future velocity at a very high interest rate. The time you save now will cost you triple in debugging, hotfixes, and lost customer trust.
