Deployment Gates and Pre-Deployment Approvals

A comprehensive guide to configuring deployment gates and pre-deployment approvals in Azure DevOps multi-stage pipelines, covering manual approvals, automated gates, health checks, and gate evaluation policies.

Overview

Deployment gates are automated and manual checkpoints in Azure DevOps pipelines that must pass before a deployment can proceed to the next stage. They exist to prevent bad releases from reaching production by enforcing business rules, health checks, and human oversight at critical pipeline boundaries. If you have ever pushed a build to production only to realize the monitoring dashboard was already on fire from the last deployment, gates are how you stop that from happening again.

Prerequisites

  • An Azure DevOps organization and project with Pipelines enabled
  • At least one YAML-based multi-stage pipeline
  • Access to create and configure Environments in Azure DevOps
  • Basic familiarity with YAML pipeline syntax (stages, jobs, deployment jobs)
  • An Azure subscription if you plan to use Azure Monitor gates
  • Project Administrator or Environment Administrator permissions

Understanding Environments in Azure DevOps

Before you configure gates, you need to understand how Azure DevOps Environments work. An Environment is a named target that represents where your code runs. Unlike classic release pipeline stages, YAML pipelines use environments as first-class resources with their own approval and check policies.

You create environments in Azure DevOps under Pipelines > Environments. Each environment can have:

  • Approval checks (manual sign-off from designated users)
  • Branch control (restrict which branches can deploy)
  • Business hours restrictions
  • Template-required checks
  • Exclusive lock policies
  • Custom gates via Azure Functions or REST APIs

When a stage contains a deployment job that targets an environment, Azure DevOps evaluates all checks configured on that environment before the stage starts. The run pauses until every check passes; if any check is rejected or times out, the stage does not run.

# Referencing an environment in a deployment job
stages:
  - stage: DeployProduction
    jobs:
      - deployment: DeployWeb
        environment: 'production'
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploying to production"

The key insight here is that gates and approvals are configured on the environment resource, not in the pipeline YAML itself. This is a deliberate design decision. It means a team lead can enforce approval policies across every pipeline that targets production, without trusting individual pipeline authors to include the right checks.

Manual Approval Gates

Manual approvals are the most common gate type. You configure them on an environment, and when a pipeline reaches a deployment job targeting that environment, it pauses and notifies the designated approvers.

Configuring Manual Approvals

Navigate to Pipelines > Environments > [your environment] > Approvals and checks > Add check > Approvals.

You can configure:

  • Approvers: Individual users or groups
  • Instructions: Context for approvers about what to verify
  • Timeout: How long to wait before the approval times out (default is 30 days, but you should lower this)
  • Minimum number of approvers: Require multiple sign-offs for critical environments
  • Allow approvers to approve their own runs: Usually disabled for production

Here is how the approval gate interacts with pipeline execution:

Pipeline Run Starts
  |
  v
Build Stage --> succeeds
  |
  v
Deploy to Staging --> succeeds
  |
  v
Deploy to Production --> PAUSED (waiting for approval)
  |
  v
Approver receives email/Teams notification
  |
  v
Approver reviews changes, approves or rejects
  |
  v
Deployment proceeds or fails

Approval Timeout Behavior

When an approval times out, the deployment job fails with status Canceled. The pipeline run shows:

Approval timed out. The approval for environment 'production' was not completed within the timeout period of 72:00:00.

Set timeouts that match your team's workflow. For staging environments, 4 hours is reasonable. For production, 24-72 hours gives people enough time to review without leaving pipeline runs dangling forever.

Multi-Approver Configuration

For high-risk environments, require multiple approvers. You can set this to require all specified approvers or a minimum count:

Minimum number of approvers: 2
Approvers:
  - [Team Lead Group]
  - [Release Manager]
  - [QA Lead]
Order: Any order

With this configuration, at least 2 of the 3 designated approvers must sign off before the deployment proceeds. This prevents a single person from pushing a risky change through.
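To make the quorum rule concrete, here is a minimal JavaScript model of how a minimum-approver policy resolves. The record shapes (approver, user, decision), the function name evaluateApprovals, and the productionApprovals object are hypothetical, not the Azure DevOps API. The sketch assumes that any rejection fails the check, that self-approvals are ignored when disallowed, and that each designated approver entry counts once.

```javascript
// Hypothetical model of a "minimum number of approvers" policy.
// Record shapes ({ approver, user, decision }) are assumptions, not the
// Azure DevOps API.
function evaluateApprovals(config, approvals, requestedBy) {
  // Keep only responses from designated approvers.
  const eligible = approvals.filter(function (a) {
    if (!config.approvers.includes(a.approver)) {
      return false;
    }
    // Assumption: self-approvals are ignored unless explicitly allowed.
    if (!config.allowSelfApproval && a.user === requestedBy) {
      return false;
    }
    return true;
  });

  // Assumption: a single rejection fails the check immediately.
  if (eligible.some(function (a) { return a.decision === 'rejected'; })) {
    return 'rejected';
  }

  // Each designated approver entry counts once, however many members approve.
  const approvedEntries = new Set(
    eligible
      .filter(function (a) { return a.decision === 'approved'; })
      .map(function (a) { return a.approver; })
  );
  return approvedEntries.size >= config.minApprovers ? 'approved' : 'pending';
}

// Mirrors the example configuration above: 2 of 3 approvers, no self-approval.
const productionApprovals = {
  approvers: ['Team Lead Group', 'Release Manager', 'QA Lead'],
  minApprovers: 2,
  allowSelfApproval: false
};
```

In this model the run's requester cannot count toward the quorum, so a release manager who queued the run still needs two other approver entries to sign off.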

Automated Gates

Automated gates are where things get powerful. Instead of relying on humans to click a button, you configure automated checks that evaluate conditions programmatically.

Azure Monitor Gates

Azure Monitor gates query your monitoring infrastructure to verify that your application is healthy before proceeding. This is particularly useful for post-deployment validation: deploy to staging, then gate the production deployment on staging health metrics.

Configure an Azure Monitor gate under Approvals and checks > Add check > Query Azure Monitor alerts:

  • Azure subscription: The subscription containing your monitoring resources
  • Resource group: Where your monitored resources live
  • Resource: The specific Application Insights or Azure Monitor resource
  • Alert rules: Which alert rules to evaluate

The gate checks whether any active alerts exist on the specified resources. If alerts are firing, the gate fails and the deployment is blocked.
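The pass/fail rule can be sketched as a small filter. This is a hypothetical model: the alert objects (rule, severity, monitorCondition) loosely mirror Azure Monitor's Fired/Resolved monitor condition and Sev0 to Sev4 levels, but the exact shape and the function name evaluateMonitorGate are assumptions. Setting maxSeverity to 2 corresponds to blocking on severity 0-2 alerts.

```javascript
// Hypothetical model of "fail if any selected alert is currently firing".
// The alert shape is an illustrative assumption, not the Azure Monitor API.
function evaluateMonitorGate(alerts, options) {
  const rules = options.alertRules;        // null means "evaluate every rule"
  const maxSeverity = options.maxSeverity; // e.g. 2 => Sev0-Sev2 alerts block
  const blocking = alerts.filter(function (a) {
    const ruleSelected = rules === null || rules.includes(a.rule);
    return ruleSelected && a.monitorCondition === 'Fired' && a.severity <= maxSeverity;
  });
  return {
    passed: blocking.length === 0,
    blocking: blocking.map(function (a) { return a.rule; })
  };
}
```

Resolved alerts and alerts below the severity cutoff are ignored, so a noisy Sev3 rule does not hold up a release.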

REST API Gates (Invoke REST API)

REST API gates are the most flexible gate type. They call an HTTP endpoint and evaluate the response to determine pass/fail. This lets you integrate any external system into your deployment pipeline.

Configure under Approvals and checks > Add check > Invoke REST API.

For example, the Node.js service below checks whether your feature flag system is ready for the target environment and blocks the deployment while a P1 incident is active:

var express = require('express');
var axios = require('axios');
var app = express();

app.get('/api/deployment-gate', function(req, res) {
    var environment = req.query.environment;
    var buildId = req.query.buildId;

    // Check feature flag service health
    checkFeatureFlags(environment, function(err, flagsReady) {
        if (err) {
            return res.status(500).json({
                status: 'failed',
                message: 'Could not reach feature flag service: ' + err.message
            });
        }

        // Check active incident status
        checkIncidentStatus(function(err, incidents) {
            if (err) {
                return res.status(500).json({
                    status: 'failed',
                    message: 'Could not check incident status'
                });
            }

            var activeP1 = incidents.filter(function(i) {
                return i.severity === 'P1' && i.status === 'active';
            });

            if (activeP1.length > 0) {
                return res.status(200).json({
                    status: 'failed',
                    message: 'Active P1 incident: ' + activeP1[0].title + '. Deployment blocked.'
                });
            }

            if (!flagsReady) {
                return res.status(200).json({
                    status: 'failed',
                    message: 'Feature flags not configured for ' + environment
                });
            }

            res.status(200).json({
                status: 'succeeded',
                message: 'All pre-deployment checks passed for build ' + buildId
            });
        });
    });
});

function checkFeatureFlags(environment, callback) {
    axios.get('https://flags.internal.company.com/api/status/' + environment)
        .then(function(response) {
            callback(null, response.data.ready === true);
        })
        .catch(function(err) {
            callback(err);
        });
}

function checkIncidentStatus(callback) {
    axios.get('https://incidents.internal.company.com/api/active')
        .then(function(response) {
            callback(null, response.data.incidents || []);
        })
        .catch(function(err) {
            callback(err);
        });
}

var port = process.env.PORT || 3500;
app.listen(port, function() {
    console.log('Deployment gate service listening on port ' + port);
});

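The Invoke REST API check does not impose a fixed response format. Instead, you give it a success criteria expression that is evaluated against the JSON response body. With a criterion such as eq(root['status'], 'succeeded'), which matches the status field this service returns, Azure DevOps evaluates responses like this: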

HTTP status code    Response body "status"    Gate result
2xx                 succeeded                 Pass
2xx                 failed                    Fail (will retry)
4xx/5xx             any                       Fail (will retry)
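The table can be expressed as a small function, which is handy for unit-testing a gate service's responses before wiring it into an environment. It assumes a success criterion equivalent to eq(root['status'], 'succeeded'); evaluateRestGate is a hypothetical helper, not part of any Azure DevOps SDK.

```javascript
// Model of the pass/fail table above, assuming the success criterion is
// equivalent to eq(root['status'], 'succeeded'). "fail" means "not yet":
// the check retries at its next evaluation until it times out.
function evaluateRestGate(httpStatus, body) {
  // Non-2xx responses never pass, regardless of the body.
  if (httpStatus < 200 || httpStatus > 299) {
    return 'fail';
  }
  let parsed;
  try {
    parsed = typeof body === 'string' ? JSON.parse(body) : body;
  } catch (e) {
    return 'fail'; // an unparseable body cannot satisfy the criterion
  }
  return parsed !== null && typeof parsed === 'object' && parsed.status === 'succeeded'
    ? 'pass'
    : 'fail';
}
```

Note that the gate services in this guide return HTTP 200 with status "failed" for blocked deployments, which keeps "blocked by policy" distinct from "the gate service itself is broken" (HTTP 500) in your logs, even though both outcomes fail the check.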

Work Item Validation Gates

Work item gates verify that all work items linked to the pipeline run meet specific criteria. For example, you can require that all linked user stories are in a "Ready for Release" state before deploying.

Configure under Approvals and checks > Add check > Query Work Items.

Specify a work item query that returns items blocking deployment:

SELECT [System.Id], [System.Title], [System.State]
FROM WorkItems
WHERE [System.TeamProject] = @project
AND [System.State] NOT IN ('Ready for Release', 'Closed', 'Resolved')
AND [System.Tags] CONTAINS 'release-blocker'
ORDER BY [System.CreatedDate] DESC

The gate passes when the query returns zero results (no blockers found). If the query returns any work items, the gate fails and displays them in the pipeline run summary.
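When a work item gate blocks unexpectedly, it helps to run the same query yourself. The sketch below builds a request for the Azure DevOps WIQL REST endpoint (POST .../_apis/wit/wiql) and applies the zero-results rule to the response. The organization, project, and personal access token are placeholders, api-version 7.0 is an assumption about what your organization supports, and both helper names are hypothetical.

```javascript
// Sketch: build a WIQL query request against the Azure DevOps REST API.
// organization, project, and pat are placeholders you supply.
function buildWiqlRequest(organization, project, pat, wiql) {
  return {
    method: 'POST',
    url: 'https://dev.azure.com/' + encodeURIComponent(organization) + '/' +
         encodeURIComponent(project) + '/_apis/wit/wiql?api-version=7.0',
    headers: {
      'Content-Type': 'application/json',
      // Personal access tokens use Basic auth with an empty user name.
      Authorization: 'Basic ' + Buffer.from(':' + pat).toString('base64')
    },
    body: JSON.stringify({ query: wiql })
  };
}

// Mirrors the gate rule: pass only when at most maxAllowed items (normally 0)
// come back. A flat WIQL response lists matches in workItems: [{ id, url }].
function evaluateWorkItemGate(wiqlResponse, maxAllowed) {
  const ids = (wiqlResponse.workItems || []).map(function (w) { return w.id; });
  return { passed: ids.length <= maxAllowed, blockingIds: ids };
}
```

With Node 18 or later you could send it with fetch(req.url, { method: req.method, headers: req.headers, body: req.body }) and pass the parsed JSON to evaluateWorkItemGate(json, 0).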

Gate Evaluation Timing and Retry Logic

This is where most people get confused. Gates do not evaluate once. They evaluate repeatedly on a schedule until they pass or time out.

Evaluation Flow

Gate configured with:
  - Delay before first evaluation: 5 minutes
  - Time between evaluations: 10 minutes
  - Timeout: 60 minutes
  - Minimum duration: 15 minutes

Timeline:
  T+0:00  - Deployment reaches gate, timer starts
  T+0:05  - First evaluation (fails - monitoring alert active)
  T+0:15  - Second evaluation (fails - alert still active); 15-minute minimum duration reached
  T+0:25  - Third evaluation (succeeds - alert resolved); minimum duration already met, so the gate passes and the deployment proceeds

The key parameters:

  • Delay before first evaluation: Wait this long before the first check. Useful for post-deployment gates where you need the deployment to stabilize before checking health.
  • Time between re-evaluation of gates: How often to retry failed gates.
  • Timeout after which gates fail: Maximum time to wait for all gates to pass.
  • Minimum duration: Even if gates pass immediately, wait at least this long. This is useful for "soak time" where you want to observe a deployment for a minimum period.

The Minimum Duration Trap

Here is a subtle behavior that catches people. If you set a minimum duration of 30 minutes and all your gates pass on the first evaluation at T+5 minutes, the pipeline still waits until T+30 minutes. However, the gates continue to be re-evaluated during this period. If a gate that passed at T+5 fails at T+20, the gate result resets and the timer keeps running.

This means a gate with a 30-minute minimum duration and 5-minute evaluation interval must pass consistently across multiple evaluations, not just once. This is actually a good thing for health check gates. You want the system to be healthy over a sustained period, not just at one point in time.
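The evaluation schedule and the minimum-duration behavior described above can be modeled in a few lines, which makes it easy to check how long a gate configuration can hold a deployment. This is a simulation of the rules as this guide describes them, not Azure DevOps source code; simulateGate and the evaluate callback (returning true or false for an evaluation at minute t) are hypothetical.

```javascript
// Model of the gate evaluation rules described above (not Azure DevOps code).
// All times are in minutes. Evaluations run at delay, delay + interval, ...
// up to and including the timeout. The gate passes at the first successful
// evaluation at or after minDuration; an earlier failure only means the gate
// has not passed yet, while the overall timer keeps running.
function simulateGate(settings, evaluate) {
  const log = [];
  for (let t = settings.delay; t <= settings.timeout; t += settings.interval) {
    const ok = evaluate(t);
    log.push({ t: t, ok: ok });
    if (ok && t >= settings.minDuration) {
      return { result: 'passed', at: t, log: log };
    }
  }
  return { result: 'timedOut', at: settings.timeout, log: log };
}
```

With the example settings from the timeline (delay 5, interval 10, timeout 60, minimum 15) and an alert that clears by minute 25, the simulation reports a pass at T+25. With a 30-minute minimum and a failure exactly at T+30, the earliest pass moves to the next evaluation.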

Pre-Deployment vs Post-Deployment Gates

Azure DevOps classic release pipelines have explicit pre-deployment and post-deployment gate phases. YAML pipelines handle this differently, through stage dependencies and environment checks.

Pre-Deployment Pattern

Gates configured on the target environment act as pre-deployment gates. They evaluate before the deployment job runs:

stages:
  - stage: Build
    jobs:
      - job: BuildApp
        steps:
          - script: npm ci && npm run build
          - publish: $(System.DefaultWorkingDirectory)/dist  # assumes npm run build writes to dist/
            artifact: drop

  - stage: DeployStaging
    dependsOn: Build
    jobs:
      - deployment: DeployToStaging
        environment: 'staging'  # gates on 'staging' environment evaluate here
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: drop
                - script: |
                    echo "Deploying to staging..."

Post-Deployment Validation Pattern

For post-deployment validation, use a separate stage with its own gate checks. The trick is to make the next stage depend on a validation stage that runs automated tests or health checks:

stages:
  - stage: DeployStaging
    jobs:
      - deployment: Deploy
        environment: 'staging'
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploy to staging"

  - stage: ValidateStaging
    dependsOn: DeployStaging
    jobs:
      - job: RunSmokeTests
        steps:
          - script: |
              echo "Running smoke tests against staging..."
              npm run test:smoke -- --url https://staging.myapp.com
            displayName: 'Smoke Tests'

          - script: |
              echo "Checking staging health endpoint..."
              curl -f https://staging.myapp.com/health || exit 1
            displayName: 'Health Check'

  - stage: DeployProduction
    dependsOn: ValidateStaging
    jobs:
      - deployment: Deploy
        environment: 'production'  # approval + automated gates here
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploy to production"

This pattern creates a natural flow: deploy to staging, validate staging automatically, then gate production behind both manual approval and automated checks on the production environment.

Multi-Stage Pipeline Gate Configurations

Real-world pipelines have more than two stages. Here is how to structure gates across a full deployment pipeline with multiple environments:

trigger:
  branches:
    include:
      - main

pool:
  vmImage: 'ubuntu-latest'

variables:
  - group: 'app-settings'
  - name: nodeVersion
    value: '20.x'

stages:
  # Stage 1: Build and unit test
  - stage: Build
    displayName: 'Build & Test'
    jobs:
      - job: BuildAndTest
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: $(nodeVersion)

          - script: |
              npm ci
              npm run lint
              npm test
            displayName: 'Install, Lint, Test'

          - script: npm run build
            displayName: 'Build Application'

          - publish: $(System.DefaultWorkingDirectory)/dist
            artifact: app-package

  # Stage 2: Deploy to Dev (no gates, auto-deploy)
  - stage: DeployDev
    displayName: 'Deploy to Dev'
    dependsOn: Build
    jobs:
      - deployment: DeployDev
        environment: 'dev'
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app-package
                - task: AzureWebApp@1
                  inputs:
                    azureSubscription: 'Azure-Dev'
                    appType: 'webAppLinux'  # required input; assumes a Linux App Service
                    appName: 'myapp-dev'
                    package: '$(Pipeline.Workspace)/app-package'

  # Stage 3: Integration tests on Dev
  - stage: IntegrationTests
    displayName: 'Integration Tests'
    dependsOn: DeployDev
    jobs:
      - job: RunIntegrationTests
        steps:
          - script: |
              npm ci
              npm run test:integration -- --baseUrl https://myapp-dev.azurewebsites.net
            displayName: 'Run Integration Tests'
            timeoutInMinutes: 15

  # Stage 4: Deploy to Staging (branch control gate on environment)
  - stage: DeployStaging
    displayName: 'Deploy to Staging'
    dependsOn: IntegrationTests
    jobs:
      - deployment: DeployStaging
        environment: 'staging'  # Has branch control: only 'main' branch allowed
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app-package
                - task: AzureWebApp@1
                  inputs:
                    azureSubscription: 'Azure-Staging'
                    appType: 'webAppLinux'  # required input; assumes a Linux App Service
                    appName: 'myapp-staging'
                    package: '$(Pipeline.Workspace)/app-package'

  # Stage 5: Performance and smoke tests on Staging
  - stage: StagingValidation
    displayName: 'Staging Validation'
    dependsOn: DeployStaging
    jobs:
      - job: SmokeTests
        steps:
          - script: |
              echo "Running smoke tests..."
              npm ci
              npm run test:smoke -- --baseUrl https://myapp-staging.azurewebsites.net
            displayName: 'Smoke Tests'

      - job: PerformanceTests
        steps:
          - script: |
              echo "Running k6 load tests..."
              k6 run --vus 50 --duration 2m tests/load/baseline.js
            displayName: 'Load Tests'
            timeoutInMinutes: 10

  # Stage 6: Deploy to Production (manual approval + automated gates)
  - stage: DeployProduction
    displayName: 'Deploy to Production'
    dependsOn: StagingValidation
    jobs:
      - deployment: DeployProduction
        environment: 'production'
        # Environment 'production' has:
        #   - Manual approval (2 approvers required)
        #   - Azure Monitor gate (no active P1/P2 alerts)
        #   - REST API gate (change management system approval)
        #   - Business hours check (Mon-Thu, 9am-3pm)
        #   - Exclusive lock (one deployment at a time)
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app-package
                - task: AzureWebApp@1
                  inputs:
                    azureSubscription: 'Azure-Production'
                    appType: 'webAppLinux'  # required input; assumes a Linux App Service
                    appName: 'myapp-prod'
                    package: '$(Pipeline.Workspace)/app-package'
                    deploymentMethod: 'zipDeploy'
                    appSettings: |
                      -NODE_ENV production
            on:
              success:
                steps:
                  - script: |
                      echo "Production deployment succeeded"
                      echo "Build: $(Build.BuildNumber)"
                      echo "Commit: $(Build.SourceVersion)"
                    displayName: 'Post-deploy notification'
              failure:
                steps:
                  - script: |
                      echo "##vso[task.logissue type=error]Production deployment failed!"
                    displayName: 'Failure notification'

  # Stage 7: Post-deployment verification
  - stage: ProductionVerification
    displayName: 'Production Verification'
    dependsOn: DeployProduction
    jobs:
      - job: VerifyProduction
        steps:
          - script: |
              echo "Verifying production deployment..."
              response=$(curl -s -o /dev/null -w "%{http_code}" https://myapp.com/health)
              if [ "$response" != "200" ]; then
                echo "##vso[task.logissue type=error]Health check failed with status $response"
                exit 1
              fi
              echo "Health check passed: HTTP $response"
            displayName: 'Production Health Check'

          - script: |
              echo "Running production smoke tests..."
              npm ci
              npm run test:smoke -- --baseUrl https://myapp.com --subset critical
            displayName: 'Critical Path Tests'
            timeoutInMinutes: 5

Gate Timeout and Evaluation Policies

Getting timeout and evaluation policies right is critical. Set them too aggressively and you get false failures. Set them too loosely and bad deployments slip through.

Recommended Settings by Environment

Setting                          Staging      Production
Delay before first evaluation    0 minutes    5 minutes
Time between evaluations         5 minutes    10 minutes
Timeout                          2 hours      8 hours
Minimum duration                 0 minutes    15 minutes
Approval timeout                 4 hours      72 hours
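A quick way to sanity-check a settings table like this one is to compute how many evaluations fit before the timeout and the earliest time a gate could pass. The sketch assumes evaluations at the delay, then every interval up to and including the timeout, and that a gate cannot pass before the minimum duration; evaluationBudget and the recommended object are hypothetical, and all values are in minutes.

```javascript
// Back-of-the-envelope check for gate settings. All values in minutes.
// Assumes evaluations at delay, delay + interval, ... up to the timeout,
// and that a gate can pass no earlier than the minimum duration.
function evaluationBudget(s) {
  if (s.delay > s.timeout) {
    return { evaluations: 0, earliestPass: null };
  }
  const evaluations = Math.floor((s.timeout - s.delay) / s.interval) + 1;
  // First scheduled evaluation at or after the minimum duration.
  const steps = s.minDuration <= s.delay
    ? 0
    : Math.ceil((s.minDuration - s.delay) / s.interval);
  const earliest = s.delay + steps * s.interval;
  // null means the minimum duration can never be met before the timeout.
  return { evaluations: evaluations, earliestPass: earliest <= s.timeout ? earliest : null };
}

// Gate settings from the table, converted to minutes.
const recommended = {
  staging:    { delay: 0, interval: 5,  timeout: 120, minDuration: 0 },
  production: { delay: 5, interval: 10, timeout: 480, minDuration: 15 }
};
```

For the values in the table this gives 25 evaluations for staging and 48 for production, with the earliest possible production pass at T+15 minutes. A null earliestPass flags a configuration whose minimum duration can never be satisfied.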

Business Hours Check

The Business Hours check restricts deployments to specific time windows. This is invaluable for production environments where you want to avoid deploying on Friday at 4:55 PM.

Configure under Approvals and checks > Add check > Business Hours:

Time zone: (UTC-08:00) Pacific Time
Days: Monday, Tuesday, Wednesday, Thursday
Start time: 09:00
End time: 15:00

If a pipeline reaches the gate outside business hours, it waits until the next valid window. This combines well with approval gates: the approver gets notified, reviews the changes, and the deployment naturally lands during the next safe window.
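The business-hours rule is easy to reproduce when you want to predict when a waiting deployment will be released. The sketch uses the standard Intl.DateTimeFormat API with the IANA zone America/Los_Angeles. Treating the check's "(UTC-08:00) Pacific Time" setting as daylight-saving-aware, and treating the start time as inclusive and the end time as exclusive, are assumptions; isWithinBusinessHours and pacificWindow are hypothetical names.

```javascript
// Sketch of a business-hours test (Mon-Thu, 09:00-15:00 Pacific).
// Assumption: "Pacific Time" is modeled with America/Los_Angeles (DST-aware).
function isWithinBusinessHours(date, hours) {
  // Convert the instant into weekday, hour, and minute in the target zone.
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: hours.timeZone,
    weekday: 'short',
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23'
  }).formatToParts(date);

  function part(type) {
    return parts.find(function (p) { return p.type === type; }).value;
  }

  const minuteOfDay = Number(part('hour')) * 60 + Number(part('minute'));
  // Assumption: start is inclusive, end is exclusive (15:00 is outside).
  return hours.days.includes(part('weekday')) &&
    minuteOfDay >= hours.startMinute &&
    minuteOfDay < hours.endMinute;
}

const pacificWindow = {
  timeZone: 'America/Los_Angeles',
  days: ['Mon', 'Tue', 'Wed', 'Thu'],
  startMinute: 9 * 60, // 09:00
  endMinute: 15 * 60   // 15:00
};
```

Working from an absolute Date and converting into the configured zone avoids depending on the build agent's local time zone, which on hosted agents is typically UTC.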

Exclusive Lock Check

The Exclusive Lock check ensures only one pipeline run deploys to an environment at a time. This prevents race conditions when multiple pipeline runs target the same environment.

Add the check under Approvals and checks > Add check > Exclusive Lock. How queued runs behave is not chosen in the check itself but with the lockBehavior property in the pipeline YAML, at the pipeline or stage level:

lockBehavior: sequential   # runs queue and execute in order
  -- or --
lockBehavior: runLatest    # queued runs are canceled; only the latest proceeds

For production, runLatest is usually correct. If three builds queue up, you only care about the latest one.
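The difference between the two lock behaviors is easiest to see on a queue of waiting runs. This is a toy model, assuming runs are queued in arrival order while another run holds the lock; drainLockQueue is hypothetical, and its 'sequential' and 'runLatest' strings simply mirror the values of the YAML lockBehavior property.

```javascript
// Toy model of what happens to runs queued behind an exclusive lock.
// queuedRunIds is in arrival order (oldest first).
function drainLockQueue(queuedRunIds, behavior) {
  if (behavior === 'sequential') {
    // Every queued run deploys, one at a time, in arrival order.
    return { deployed: queuedRunIds.slice(), canceled: [] };
  }
  if (behavior === 'runLatest') {
    if (queuedRunIds.length === 0) {
      return { deployed: [], canceled: [] };
    }
    // Only the newest run deploys; older queued runs are canceled.
    return {
      deployed: [queuedRunIds[queuedRunIds.length - 1]],
      canceled: queuedRunIds.slice(0, -1)
    };
  }
  throw new Error('Unknown lock behavior: ' + behavior);
}
```

For three queued builds 101, 102, and 103, sequential deploys all three in order, while runLatest deploys only 103 and cancels the other two.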

Combining Gates with Environment Checks

The real power of gates emerges when you combine multiple check types on a single environment. Azure DevOps requires every one of them to pass before the stage that targets the environment can start.

Here is a production environment configuration that I use on real projects:

Check 1: Manual Approval

  • Approvers: Release Managers group
  • Minimum approvers: 2
  • Timeout: 72 hours
  • Instructions: "Review the staging test results, check the deployment diff, and verify the change management ticket."

Check 2: Azure Monitor

  • Resource: Application Insights (production)
  • Alert rules: All severity 0-2 alerts
  • Evaluation: Fail if any alerts are active

Check 3: Invoke REST API

  • URL: https://gates.internal.company.com/api/deployment-ready
  • Method: POST
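  • Headers: {"Content-Type": "application/json", "Authorization": "Bearer $(gate-service-token)"}, with gate-service-token supplied as a secret from a variable group linked to the check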
  • Body:
{
    "buildId": "$(Build.BuildId)",
    "environment": "production",
    "repository": "$(Build.Repository.Name)",
    "branch": "$(Build.SourceBranch)",
    "requestedBy": "$(Build.RequestedFor)"
}
  • Success criteria: eq(root['status'], 'succeeded'), so the response body's status field must equal "succeeded"
  • Evaluation interval: 10 minutes
  • Timeout: 4 hours

Check 4: Business Hours

  • Mon-Thu, 9am-3pm Pacific

Check 5: Exclusive Lock

  • Run latest only

Writing a Comprehensive Gate Service

Here is a more complete gate service that checks multiple conditions and returns detailed results:

var express = require('express');
var axios = require('axios');
var app = express();

app.use(express.json());

// Gate evaluation endpoint
app.post('/api/deployment-ready', function(req, res) {
    var buildId = req.body.buildId;
    var environment = req.body.environment;
    var requestedBy = req.body.requestedBy;

    console.log('[Gate] Evaluating deployment gate for build ' + buildId + ' to ' + environment + ' (requested by ' + requestedBy + ')');

    var checks = [
        checkChangeManagement(buildId),
        checkActiveIncidents(),
        checkDeploymentFreeze(),
        checkDependencyHealth(environment)
    ];

    Promise.all(checks)
        .then(function(results) {
            var failures = results.filter(function(r) { return r.passed === false; });

            if (failures.length > 0) {
                var messages = failures.map(function(f) { return f.reason; });
                console.log('[Gate] BLOCKED: ' + messages.join('; '));

                return res.status(200).json({
                    status: 'failed',
                    message: 'Deployment blocked: ' + messages.join('; '),
                    checks: results
                });
            }

            console.log('[Gate] PASSED: All checks succeeded for build ' + buildId);

            res.status(200).json({
                status: 'succeeded',
                message: 'All deployment gates passed',
                checks: results,
                evaluatedAt: new Date().toISOString()
            });
        })
        .catch(function(err) {
            console.error('[Gate] ERROR: ' + err.message);
            res.status(500).json({
                status: 'failed',
                message: 'Gate evaluation error: ' + err.message
            });
        });
});

function checkChangeManagement(buildId) {
    return axios.get('https://changemgmt.internal.company.com/api/tickets', {
        params: { buildId: buildId, status: 'approved' }
    })
    .then(function(response) {
        var hasApprovedTicket = response.data.tickets && response.data.tickets.length > 0;
        return {
            check: 'change-management',
            passed: hasApprovedTicket,
            reason: hasApprovedTicket ? 'Change ticket approved' : 'No approved change ticket found for build ' + buildId
        };
    })
    .catch(function(err) {
        return {
            check: 'change-management',
            passed: false,
            reason: 'Change management system unreachable: ' + err.message
        };
    });
}

function checkActiveIncidents() {
    return axios.get('https://incidents.internal.company.com/api/active')
        .then(function(response) {
            var severeIncidents = (response.data.incidents || []).filter(function(i) {
                return i.severity <= 2; // P1 or P2
            });
            var hasSevereIncidents = severeIncidents.length > 0;
            return {
                check: 'active-incidents',
                passed: !hasSevereIncidents,
                reason: hasSevereIncidents
                    ? 'Active P' + severeIncidents[0].severity + ' incident: ' + severeIncidents[0].title
                    : 'No active P1/P2 incidents'
            };
        })
        .catch(function(err) {
            return {
                check: 'active-incidents',
                passed: false,
                reason: 'Incident system unreachable: ' + err.message
            };
        });
}

function checkDeploymentFreeze() {
    return axios.get('https://changemgmt.internal.company.com/api/freeze-windows/active')
        .then(function(response) {
            var freezeActive = response.data.freezeActive === true;
            return {
                check: 'deployment-freeze',
                passed: !freezeActive,
                reason: freezeActive
                    ? 'Deployment freeze active until ' + response.data.endsAt
                    : 'No deployment freeze in effect'
            };
        })
        .catch(function(err) {
            return {
                check: 'deployment-freeze',
                passed: false,
                reason: 'Could not verify freeze window status: ' + err.message
            };
        });
}

function checkDependencyHealth(environment) {
    var endpoints = {
        staging: [
            'https://api-staging.internal.company.com/health',
            'https://auth-staging.internal.company.com/health'
        ],
        production: [
            'https://api.internal.company.com/health',
            'https://auth.internal.company.com/health',
            'https://payments.internal.company.com/health'
        ]
    };

    var urls = endpoints[environment] || endpoints.production;

    var healthChecks = urls.map(function(url) {
        return axios.get(url, { timeout: 5000 })
            .then(function(response) {
                return { url: url, healthy: response.status === 200 };
            })
            .catch(function() {
                return { url: url, healthy: false };
            });
    });

    return Promise.all(healthChecks)
        .then(function(results) {
            var unhealthy = results.filter(function(r) { return !r.healthy; });
            return {
                check: 'dependency-health',
                passed: unhealthy.length === 0,
                reason: unhealthy.length === 0
                    ? 'All ' + results.length + ' dependencies healthy'
                    : 'Unhealthy dependencies: ' + unhealthy.map(function(u) { return u.url; }).join(', ')
            };
        });
}

var port = process.env.PORT || 3500;
app.listen(port, function() {
    console.log('Deployment gate service running on port ' + port);
});
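The service above accepts any caller, even though Check 3 sends an Authorization header. A minimal sketch of verifying that bearer token follows. GATE_SERVICE_TOKEN is an assumed environment variable holding the same secret as the gate-service-token pipeline variable, and isAuthorized and requireGateToken are hypothetical names.

```javascript
const { timingSafeEqual } = require('crypto');

// Sketch: verify the bearer token sent by the Invoke REST API check.
// GATE_SERVICE_TOKEN is an assumed environment variable.
function isAuthorized(authorizationHeader, expectedToken) {
  if (typeof authorizationHeader !== 'string' || !expectedToken) {
    return false;
  }
  const prefix = 'Bearer ';
  if (!authorizationHeader.startsWith(prefix)) {
    return false;
  }
  const supplied = Buffer.from(authorizationHeader.slice(prefix.length));
  const expected = Buffer.from(expectedToken);
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return supplied.length === expected.length && timingSafeEqual(supplied, expected);
}

// Express-style middleware: call next() on success, otherwise answer 401.
function requireGateToken(req, res, next) {
  if (isAuthorized(req.headers.authorization, process.env.GATE_SERVICE_TOKEN)) {
    return next();
  }
  return res.status(401).json({
    status: 'failed',
    message: 'Unauthorized gate request'
  });
}
```

To enable it, register app.use(requireGateToken); right after app.use(express.json()); so it runs before the /api/deployment-ready handler. Callers without the token get a 401, which the check treats as a failed evaluation.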

The corresponding package.json for this gate service:

{
  "name": "deployment-gate-service",
  "version": "1.0.0",
  "description": "Azure DevOps deployment gate evaluation service",
  "main": "server.js",
  "scripts": {
    "start": "node server.js",
    "dev": "nodemon server.js"
  },
  "dependencies": {
    "express": "^4.18.2",
    "axios": "^1.6.0"
  },
  "devDependencies": {
    "nodemon": "^3.0.0"
  }
}

Complete Working Example

Here is a complete multi-stage YAML pipeline that demonstrates environment approvals, automated health check gates, and post-deployment validation. This is a real-world pattern for deploying a Node.js application to Azure App Service.

# azure-pipelines.yml
trigger:
  branches:
    include:
      - main

pr: none

pool:
  vmImage: 'ubuntu-latest'

variables:
  - name: nodeVersion
    value: '20.x'
  - name: appName
    value: 'myapp'
  - group: 'deployment-secrets'

stages:
  # ============================================================
  # STAGE 1: Build, Lint, Test
  # ============================================================
  - stage: Build
    displayName: 'Build & Test'
    jobs:
      - job: BuildJob
        displayName: 'Build Application'
        steps:
          - task: NodeTool@0
            displayName: 'Use Node $(nodeVersion)'
            inputs:
              versionSpec: $(nodeVersion)

          - script: npm ci
            displayName: 'Install dependencies'

          - script: npm run lint
            displayName: 'Run linter'

          - script: npm test -- --ci --coverage
            displayName: 'Run unit tests'

          - task: PublishTestResults@2
            displayName: 'Publish test results'
            inputs:
              testResultsFormat: 'JUnit'
              testResultsFiles: '**/junit.xml'
            condition: always()

          - task: PublishCodeCoverageResults@1
            displayName: 'Publish coverage'
            inputs:
              codeCoverageTool: 'Cobertura'
              summaryFileLocation: '**/coverage/cobertura-coverage.xml'

          - script: npm run build
            displayName: 'Build for production'

          - task: ArchiveFiles@2
            displayName: 'Archive build output'
            inputs:
              rootFolderOrFile: '$(System.DefaultWorkingDirectory)'
              includeRootFolder: false
              archiveType: 'zip'
              archiveFile: '$(Build.ArtifactStagingDirectory)/$(appName)-$(Build.BuildNumber).zip'

          - publish: '$(Build.ArtifactStagingDirectory)/$(appName)-$(Build.BuildNumber).zip'
            artifact: drop
            displayName: 'Publish artifact'

  # ============================================================
  # STAGE 2: Deploy to Staging
  # Environment 'staging' has:
  #   - Branch control (main only)
  # ============================================================
  - stage: DeployStaging
    displayName: 'Deploy to Staging'
    dependsOn: Build
    condition: succeeded()
    jobs:
      - deployment: DeployStagingJob
        displayName: 'Deploy to Staging'
        environment: 'staging'
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: drop

                - task: AzureWebApp@1
                  displayName: 'Deploy to Azure App Service (Staging)'
                  inputs:
                    azureSubscription: 'Azure-NonProd'
                    appType: 'webAppLinux'
                    appName: '$(appName)-staging'
                    package: '$(Pipeline.Workspace)/drop/$(appName)-$(Build.BuildNumber).zip'
                    runtimeStack: 'NODE|20-lts'

  # ============================================================
  # STAGE 3: Validate Staging
  # Automated tests and health checks
  # ============================================================
  - stage: ValidateStaging
    displayName: 'Validate Staging'
    dependsOn: DeployStaging
    condition: succeeded()
    jobs:
      - job: HealthCheck
        displayName: 'Health Check'
        steps:
          - script: |
              echo "Waiting 30 seconds for app to start..."
              sleep 30

              echo "Checking health endpoint..."
              HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
                https://$(appName)-staging.azurewebsites.net/health)

              echo "Health check response: HTTP $HTTP_STATUS"

              if [ "$HTTP_STATUS" != "200" ]; then
                echo "##vso[task.logissue type=error]Health check failed: HTTP $HTTP_STATUS"
                exit 1
              fi

              echo "##vso[task.complete result=Succeeded;]Health check passed"
            displayName: 'Verify health endpoint'
            retryCountOnTaskFailure: 3

      - job: SmokeTests
        displayName: 'Smoke Tests'
        dependsOn: HealthCheck
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: $(nodeVersion)

          - script: |
              npm ci
              STAGING_URL=https://$(appName)-staging.azurewebsites.net \
                npm run test:smoke
            displayName: 'Run smoke tests'
            timeoutInMinutes: 10

      - job: SecurityScan
        displayName: 'Security Scan'
        dependsOn: HealthCheck
        steps:
          - script: |
              npm audit --production --audit-level=high
            displayName: 'NPM audit check'
            continueOnError: false

  # ============================================================
  # STAGE 4: Deploy to Production
  # Environment 'production' has:
  #   - Manual approval (2 required from Release Managers)
  #   - Azure Monitor gate (no active alerts)
  #   - REST API gate (change management approval)
  #   - Business hours (Mon-Thu, 9am-3pm PT)
  #   - Exclusive lock (run latest only)
  # ============================================================
  - stage: DeployProduction
    displayName: 'Deploy to Production'
    dependsOn: ValidateStaging
    condition: succeeded()
    jobs:
      - deployment: DeployProductionJob
        displayName: 'Deploy to Production'
        environment: 'production'
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: drop

                - task: AzureWebApp@1
                  displayName: 'Deploy to Azure App Service (Production)'
                  inputs:
                    azureSubscription: 'Azure-Production'
                    appType: 'webAppLinux'
                    appName: '$(appName)-prod'
                    package: '$(Pipeline.Workspace)/drop/$(appName)-$(Build.BuildNumber).zip'
                    runtimeStack: 'NODE|20-lts'
                    deploymentMethod: 'zipDeploy'

            on:
              success:
                steps:
                  - script: |
                      echo "Production deployment completed successfully"
                      echo "Build Number: $(Build.BuildNumber)"
                      echo "Source Version: $(Build.SourceVersion)"
                      echo "Deployed by: $(Build.RequestedFor)"
                      echo "Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
                    displayName: 'Log deployment details'

  # ============================================================
  # STAGE 5: Post-Deployment Verification
  # ============================================================
  - stage: ProductionVerification
    displayName: 'Production Verification'
    dependsOn: DeployProduction
    condition: succeeded()
    jobs:
      - job: VerifyDeployment
        displayName: 'Verify Production'
        steps:
          - script: |
              echo "Waiting 60 seconds for production deployment to stabilize..."
              sleep 60

              echo "=== Production Health Check ==="
              HTTP_STATUS=$(curl -s -o response.json -w "%{http_code}" \
                https://$(appName).com/health)

              echo "HTTP Status: $HTTP_STATUS"
              cat response.json | python3 -m json.tool 2>/dev/null || cat response.json

              if [ "$HTTP_STATUS" != "200" ]; then
                echo "##vso[task.logissue type=error]Production health check failed: HTTP $HTTP_STATUS"
                exit 1
              fi

              echo ""
              echo "=== Version Verification ==="
              DEPLOYED_VERSION=$(curl -s https://$(appName).com/health | \
                python3 -c "import sys,json; print(json.load(sys.stdin).get('version','unknown'))")
              echo "Deployed version: $DEPLOYED_VERSION"
              echo "Expected build: $(Build.BuildNumber)"
            displayName: 'Production health and version check'
            retryCountOnTaskFailure: 2

          - script: |
              npm ci
              PROD_URL=https://$(appName).com \
                npm run test:smoke -- --subset critical-path
            displayName: 'Critical path smoke tests'
            timeoutInMinutes: 5

The pipeline run for this configuration takes approximately 15-25 minutes for the automated stages, plus whatever time the manual approval and gate evaluations require. A typical production deployment might look like:

Build & Test          ~4 minutes
Deploy to Staging     ~2 minutes
Validate Staging      ~5 minutes
  - Health Check        30s + check
  - Smoke Tests         ~3 minutes
  - Security Scan       ~1 minute
Deploy to Production  ~15 minutes (waiting for gates) + ~2 minutes (deploy)
  - Manual Approval     variable (minutes to hours)
  - Azure Monitor       evaluated every 10 min
  - REST API Gate       evaluated every 10 min
  - Business Hours      waits until next window
Production Verify     ~3 minutes
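SmokeTests and SecurityScan both depend only on HealthCheck, so they can run in parallel. That means the automated time is set by the longest dependency chain, not by the sum of every job. The sketch below computes that critical path from the estimates above. It assumes HealthCheck takes about 0.5 minutes, that at least two agents are free for the parallel jobs, and it ignores agent queueing, checkout, and all approval and gate wait time, which is why it lands at the low end of the 15-25 minute range.

```python
from functools import lru_cache

# Per-job estimates in minutes, taken from the timing breakdown above.
# Assumptions: HealthCheck ~0.5 min (30 s sleep plus one request);
# approval/gate waits, agent queueing, and checkout are excluded.
DURATIONS = {
    "Build": 4.0,
    "DeployStaging": 2.0,
    "HealthCheck": 0.5,
    "SmokeTests": 3.0,
    "SecurityScan": 1.0,
    "DeployProduction": 2.0,
    "ProductionVerification": 3.0,
}

# dependsOn edges from the pipeline. DeployProduction waits for the whole
# ValidateStaging stage, i.e. for both jobs that follow HealthCheck.
DEPENDS_ON = {
    "Build": [],
    "DeployStaging": ["Build"],
    "HealthCheck": ["DeployStaging"],
    "SmokeTests": ["HealthCheck"],
    "SecurityScan": ["HealthCheck"],
    "DeployProduction": ["SmokeTests", "SecurityScan"],
    "ProductionVerification": ["DeployProduction"],
}


@lru_cache(maxsize=None)
def finish_time(job: str) -> float:
    """Earliest finish time of a job, assuming enough parallel agents."""
    start = max((finish_time(dep) for dep in DEPENDS_ON[job]), default=0.0)
    return start + DURATIONS[job]


def critical_path(last_job: str = "ProductionVerification") -> list:
    """Walk back from the last job, always following the slowest dependency."""
    path = [last_job]
    while DEPENDS_ON[path[-1]]:
        path.append(max(DEPENDS_ON[path[-1]], key=finish_time))
    return list(reversed(path))


if __name__ == "__main__":
    print(" -> ".join(critical_path()))
    print(f"{finish_time('ProductionVerification'):.1f} minutes")  # 14.5 minutes
```

The chain runs Build, DeployStaging, HealthCheck, SmokeTests, DeployProduction, ProductionVerification for 14.5 minutes; running every job back to back would take 15.5.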

Common Issues and Troubleshooting

Issue 1: Gate Evaluation Returns "Task Was Canceled"

##[error]The gate evaluation was canceled because the timeout of '480' minutes was reached.
Task 'InvokeRestAPI' was canceled.

Cause: Your REST API gate endpoint is returning a non-success response, and the gate keeps retrying until the timeout expires. This usually means the endpoint is down, returning 500 errors, or the response body does not match the expected format.

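Fix: Verify that your gate endpoint returns a 200 status code and a body that satisfies the check's success criteria. If the criteria is eq(root['status'], 'succeeded'), the endpoint must return {"status": "succeeded"} to pass; {"status": "failed"} is a deliberate failure that makes the check re-evaluate until it times out. Check the endpoint logs as well. A common mistake is returning {"result": "success"} instead of {"status": "succeeded"}: the field name and value must match whatever the success criteria expression tests. If no success criteria is defined, the response body is ignored and only the status code matters.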
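To catch a format mismatch before it costs you an eight-hour (480-minute) timeout, you can replay your endpoint's responses against the same rule locally. The sketch below assumes the check's success criteria is eq(root['status'], 'succeeded') and that a non-200 response counts as a failed evaluation. The function name is hypothetical, and this is a testing aid, not how Azure DevOps evaluates expressions internally.

```python
import json


def meets_success_criteria(status_code: int, body: str) -> bool:
    """Local stand-in for a REST API check whose success criteria is
    eq(root['status'], 'succeeded').

    Assumption: a non-200 status counts as a failed evaluation.
    """
    if status_code != 200:
        return False
    try:
        root = json.loads(body)
    except json.JSONDecodeError:
        return False
    status = root.get("status") if isinstance(root, dict) else None
    # Pipeline expressions compare strings ordinally, ignoring case.
    return isinstance(status, str) and status.lower() == "succeeded"


if __name__ == "__main__":
    responses = [
        (200, '{"status": "succeeded"}'),  # passes
        (200, '{"status": "failed"}'),     # deliberate failure, re-evaluated later
        (200, '{"result": "success"}'),    # common mistake: wrong field and value
        (500, '{"status": "succeeded"}'),  # endpoint error
    ]
    for code, body in responses:
        print(code, body, meets_success_criteria(code, body))
```

Only the first response passes; the other three keep the gate re-evaluating until the timeout.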

Issue 2: Approval Notification Not Sent

The approval for environment 'production' is pending. No notification was sent.
Approval status: Pending
Approvers: [Release Managers]

Cause: Notification settings for the approver group or individual are not configured, or the notification subscription for "A deployment approval is pending" is disabled in the user's notification settings.

Fix: Each approver needs to verify their notification settings under User Settings > Notifications. The subscription "A deployment approval is pending" must be enabled. Also confirm that the approver group contains actual users, not just nested groups that might resolve to empty sets.

Issue 3: Business Hours Gate Blocks All Deployments

##[section]Checking Business Hours...
Business hours check failed. Current time is outside the allowed deployment window.
Next allowed window: Monday 09:00 AM (UTC-08:00)

Cause: You configured business hours in the wrong timezone, or you set the allowed days too restrictively. The business hours check always uses the timezone you configured, not each user's local timezone, which is easy to forget when your team is distributed across timezones.

Fix: Choose a timezone and window that accommodate your entire team. If your team spans US Pacific to European time, consider configuring the check in UTC with a wider window, or set an 8am-4pm window in whichever timezone is most restrictive for your team. Also remember that if you only allow Monday through Thursday, Friday deployments are blocked. That might be intentional, but it catches people off guard.
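Most of the confusion comes from reading the window in your own timezone. The sketch below evaluates the Mon-Thu 9am-3pm Pacific window from the pipeline comments for any instant and reports when the next window opens. It assumes a fixed UTC-08:00 offset, as shown in the log above, and therefore ignores daylight saving time; the function names are hypothetical and only model the check's behavior.

```python
from datetime import datetime, timedelta, timezone

# Assumed window, taken from the pipeline comments: Mon-Thu, 09:00-15:00 PT.
# A fixed UTC-08:00 offset is used (as in the log above), so DST is ignored.
WINDOW_TZ = timezone(timedelta(hours=-8))
ALLOWED_DAYS = {0, 1, 2, 3}  # datetime.weekday(): Monday=0 ... Thursday=3
START_HOUR, END_HOUR = 9, 15  # end hour is exclusive


def in_window(now: datetime) -> bool:
    """True if a timezone-aware datetime falls inside the deployment window."""
    local = now.astimezone(WINDOW_TZ)
    return local.weekday() in ALLOWED_DAYS and START_HOUR <= local.hour < END_HOUR


def next_window_start(now: datetime) -> datetime:
    """Return `now` if inside the window, else the start of the next window."""
    local = now.astimezone(WINDOW_TZ)
    if in_window(local):
        return local
    candidate = local.replace(hour=START_HOUR, minute=0, second=0, microsecond=0)
    if candidate <= local:
        candidate += timedelta(days=1)
    while candidate.weekday() not in ALLOWED_DAYS:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    friday = datetime(2024, 1, 19, 18, 0, tzinfo=timezone.utc)  # Fri 10:00 PT
    print(in_window(friday))               # False
    nxt = next_window_start(friday)
    print(nxt)                             # 2024-01-22 09:00:00-08:00
    print(nxt.astimezone(timezone.utc))    # 2024-01-22 17:00:00+00:00
```

Converting the result to UTC, as in the last line, is a quick way to tell a distributed team when the window actually opens for them.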

Issue 4: Exclusive Lock Causes Pipeline Runs to Queue Indefinitely

##[section]Checking Exclusive Lock...
Waiting for exclusive lock on environment 'production'.
Position in queue: 3
Currently locked by run: #20240115.4

Cause: A previous pipeline run has the exclusive lock on the environment and is stuck (waiting for approval, gate evaluation, or failed without releasing the lock). All subsequent runs queue behind it.

Fix: Go to Pipelines > Environments > production > Runs and cancel the stuck run. This releases the lock and lets the next queued run proceed. You can also cancel queued runs from the pipeline runs list. If this happens frequently, consider switching to "Run latest only": with that behavior, only the most recent queued run proceeds and older queued runs are canceled automatically.
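Neither behavior helps while the current lock holder is stuck; that run still has to be canceled. The difference is what happens to the queue behind it. The toy model below illustrates this. The behavior names are borrowed from the YAML lockBehavior setting (sequential and runLatest), and the run IDs are made up for illustration.

```python
from typing import List, Tuple


def resolve_queue(queued: List[str], behavior: str) -> Tuple[List[str], List[str]]:
    """Toy model of what happens to runs queued behind an exclusive lock.

    `queued` is ordered oldest first. Returns (runs that will deploy, in order,
    once the current lock holder finishes; runs that get canceled).
    """
    if behavior == "sequential":
        # Every run waits its turn; a stuck holder blocks all of them.
        return list(queued), []
    if behavior == "runLatest":
        # Only the newest run proceeds; older queued runs are canceled.
        return queued[-1:], queued[:-1]
    raise ValueError(f"unknown lock behavior: {behavior!r}")


if __name__ == "__main__":
    queue = ["20240115.5", "20240115.6", "20240115.7"]  # hypothetical run IDs
    print(resolve_queue(queue, "sequential"))
    print(resolve_queue(queue, "runLatest"))
```

With three runs queued, sequential deploys all three one after another, while runLatest deploys only 20240115.7.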

Issue 5: REST API Gate Passes But Deployment Still Blocked

Gate 'Invoke REST API' evaluation result: Succeeded
Gate 'Azure Monitor' evaluation result: Succeeded
Overall gate status: Waiting (minimum duration not reached)
Time remaining: 12 minutes

Cause: You configured a minimum duration on the gate evaluation, and while all gates have passed, the minimum soak time has not elapsed. The gates will continue to be re-evaluated during this period.

Fix: This is working as intended. If the minimum duration is too long for your workflow, reduce it. A 5-minute minimum duration is usually sufficient for health check soak time; the 30-minute values some teams set are excessive for most applications.
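The trade-off is easier to see with numbers. The sketch below uses an assumed model: the soak clock starts at the first passing evaluation, any failing evaluation resets it, and release happens at the first evaluation where the passing streak has lasted the minimum duration. The function and its reset rule are an illustration, not Azure DevOps' documented algorithm.

```python
from typing import Iterable, Optional, Tuple


def release_minute(samples: Iterable[Tuple[int, bool]], min_duration: int) -> Optional[int]:
    """Minute at which the stage would be released, or None if never.

    `samples` are (minute, all_gates_passed) pairs in time order. Assumed model:
    the soak clock starts at the first passing evaluation, any failing
    evaluation resets it, and release happens at the first evaluation where
    the passing streak has lasted at least `min_duration` minutes.
    """
    streak_start = None
    for minute, passed in samples:
        if not passed:
            streak_start = None  # a failure restarts the soak period
            continue
        if streak_start is None:
            streak_start = minute
        if minute - streak_start >= min_duration:
            return minute
    return None


if __name__ == "__main__":
    healthy = [(m, True) for m in range(0, 65, 5)]  # every 5 minutes, always passing
    print(release_minute(healthy, 5))   # 5
    print(release_minute(healthy, 30))  # 30
    flaky = [(0, True), (5, True), (10, False), (15, True), (20, True)]
    print(release_minute(flaky, 10))    # None: the failure at minute 10 resets the clock
```

With evaluations every 5 minutes and every gate passing, a 5-minute minimum releases at minute 5, while a 30-minute minimum holds a healthy deployment until minute 30.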

Best Practices

  • Start with manual approvals and add automated gates incrementally. Do not try to automate every gate on day one. Begin with manual approvals for production, get your team comfortable with the flow, then layer on automated health checks and REST API gates as you identify what matters most.

  • Set gate timeouts shorter than you think you need. A 30-day approval timeout means forgotten pipeline runs linger for a month. Set production approvals to 72 hours maximum. If nobody approves in three days, the change should be re-evaluated anyway.

  • Use the REST API gate response body to communicate context. Do not just return {"status": "failed"}. Include a message field explaining why the gate failed. Approvers and pipeline operators need to know whether the gate is blocked by an active incident, a deployment freeze, or a missing change ticket. Azure DevOps displays this message in the pipeline run UI.

  • Never configure gates that call your own application's health endpoint as a pre-deployment gate. If you are deploying version 2.0 and your pre-deployment gate checks the health of the currently running version 1.9, a health issue in 1.9 will block the 2.0 deployment that might fix it. Pre-deployment gates should check external conditions (incidents, change management, dependencies). Post-deployment gates should check the deployed application's health.

  • Use separate environments for each deployment target, even if they share infrastructure. Creating environments named production-us-east, production-us-west, and production-eu lets you configure different gate policies for each region. You might allow a single approver for US regions but require two for EU due to GDPR considerations.

  • Always configure exclusive locks on production environments. Without exclusive locks, two pipeline runs can deploy to the same environment simultaneously, causing race conditions, partial deployments, and difficult-to-diagnose issues. The "Run latest only" lock behavior is almost always what you want.

  • Keep your gate service endpoints highly available. If your REST API gate endpoint goes down, every deployment is blocked. Host the gate service on separate infrastructure from the application it gates, so the two do not share failure domains, and monitor it independently.

  • Document your gate policies in your team's runbook. When a deployment is blocked at 2 AM during an incident, the on-call engineer needs to know which gates are configured, what they check, and how to override them. Create a runbook page that lists every gate on every environment with bypass procedures for emergencies.

  • Test your gates in non-production environments first. Configure the same gate types on staging with shorter timeouts and more lenient thresholds. Verify that your REST API gate endpoint handles all response scenarios correctly before relying on it for production.
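Two of these practices fit together: a gate that always explains its decision, and a test that exercises every response scenario before production relies on it. The sketch below shows both. The inputs (active incident, deployment freeze, change ticket) come from the examples in the practice about communicating context; the function names and the ticket ID format are hypothetical, and how a real gate service looks these values up is left out.

```python
import itertools
import json
from typing import Optional


def gate_response(active_incident: bool, freeze_active: bool,
                  change_ticket: Optional[str]) -> dict:
    """Hypothetical gate logic: always return a status plus a human-readable reason."""
    if active_incident:
        return {"status": "failed", "message": "Blocked: an incident is currently active."}
    if freeze_active:
        return {"status": "failed", "message": "Blocked: a deployment freeze is in effect."}
    if not change_ticket:
        return {"status": "failed", "message": "Blocked: no approved change ticket was found."}
    return {"status": "succeeded", "message": f"Change ticket {change_ticket} is approved."}


def check_all_scenarios() -> int:
    """Exercise every input combination; return how many scenarios were checked."""
    count = 0
    for incident, freeze, ticket in itertools.product(
            [False, True], [False, True], [None, "CHG-0001"]):
        body = gate_response(incident, freeze, ticket)
        expected_pass = not incident and not freeze and ticket is not None
        assert body["status"] == ("succeeded" if expected_pass else "failed"), body
        assert body["message"], "every response should explain itself"
        json.dumps(body)  # the body must serialize cleanly for the HTTP response
        count += 1
    return count


if __name__ == "__main__":
    print(check_all_scenarios(), "scenarios OK")  # 8 scenarios OK
```

Only one of the eight combinations succeeds, and each failing combination names its reason, which is what an on-call engineer reading the check log needs.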
