Deployment Gates and Pre-Deployment Approvals
A comprehensive guide to configuring deployment gates and pre-deployment approvals in Azure DevOps multi-stage pipelines, covering manual approvals, automated gates, health checks, and gate evaluation policies.
Overview
Deployment gates are automated and manual checkpoints in Azure DevOps pipelines that must pass before a deployment can proceed to the next stage. They exist to prevent bad releases from reaching production by enforcing business rules, health checks, and human oversight at critical pipeline boundaries. If you have ever pushed a build to production only to realize the monitoring dashboard was already on fire from the last deployment, gates are how you stop that from happening again.
Prerequisites
- An Azure DevOps organization and project with Pipelines enabled
- At least one YAML-based multi-stage pipeline
- Access to create and configure Environments in Azure DevOps
- Basic familiarity with YAML pipeline syntax (stages, jobs, deployment jobs)
- An Azure subscription if you plan to use Azure Monitor gates
- Project Administrator or Environment Administrator permissions
Understanding Environments in Azure DevOps
Before you configure gates, you need to understand how Azure DevOps Environments work. An Environment is a named target that represents where your code runs. Unlike classic release pipeline stages, YAML pipelines use environments as first-class resources with their own approval and check policies.
You create environments in Azure DevOps under Pipelines > Environments. Each environment can have:
- Approval checks (manual sign-off from designated users)
- Branch control (restrict which branches can deploy)
- Business hours restrictions
- Template-required checks
- Exclusive lock policies
- Custom gates via Azure Functions or REST APIs
When a deployment job targets an environment, Azure DevOps evaluates all checks and gates configured on that environment before the job runs. The deployment job pauses and waits until every check passes or times out.
# Referencing an environment in a deployment job
stages:
  - stage: DeployProduction
    jobs:
      - deployment: DeployWeb
        environment: 'production'
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploying to production"
The key insight here is that gates and approvals are configured on the environment resource, not in the pipeline YAML itself. This is a deliberate design decision. It means a team lead can enforce approval policies across every pipeline that targets production, without trusting individual pipeline authors to include the right checks.
Manual Approval Gates
Manual approvals are the most common gate type. You configure them on an environment, and when a pipeline reaches a deployment job targeting that environment, it pauses and notifies the designated approvers.
Configuring Manual Approvals
Navigate to Pipelines > Environments > [your environment] > Approvals and checks > Add check > Approvals.
You can configure:
- Approvers: Individual users or groups
- Instructions: Context for approvers about what to verify
- Timeout: How long to wait before the approval times out (default is 30 days, but you should lower this)
- Minimum number of approvers: Require multiple sign-offs for critical environments
- Allow approvers to approve their own runs: Usually disabled for production
Here is how the approval gate interacts with pipeline execution:
Pipeline Run Starts
|
v
Build Stage --> succeeds
|
v
Deploy to Staging --> succeeds
|
v
Deploy to Production --> PAUSED (waiting for approval)
|
v
Approver receives email/Teams notification
|
v
Approver reviews changes, approves or rejects
|
v
Deployment proceeds or fails
Approval Timeout Behavior
When an approval times out, the deployment job fails with status Canceled. The pipeline run shows:
Approval timed out. The approval for environment 'production' was not completed within the timeout period of 72:00:00.
Set timeouts that match your team's workflow. For staging environments, 4 hours is reasonable. For production, 24-72 hours gives people enough time to review without leaving pipeline runs dangling forever.
Multi-Approver Configuration
For high-risk environments, require multiple approvers. You can set this to require all specified approvers or a minimum count:
Minimum number of approvers: 2
Approvers:
- [Team Lead Group]
- [Release Manager]
- [QA Lead]
Order: Any order
With this configuration, at least 2 of the 3 designated approvers must sign off before the deployment proceeds. This prevents a single person from pushing a risky change through.
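To make the "2 of 3" rule concrete, here is a minimal sketch of the quorum logic. Azure DevOps evaluates approvals itself, so this is only an illustration: the function name `approvalState`, the shape of the `decisions` object, and the rule that any single rejection fails the approval are all assumptions made for the example.

```javascript
// Hypothetical model of a minimum-approver rule. Not an Azure DevOps API.
// decisions maps an approver name to 'approved', 'rejected', or 'pending'.
function approvalState(decisions, minimumApprovers) {
  var names = Object.keys(decisions);
  var approved = names.filter(function(n) {
    return decisions[n] === 'approved';
  }).length;
  var rejected = names.filter(function(n) {
    return decisions[n] === 'rejected';
  }).length;

  // Assumption for this sketch: one rejection blocks the deployment.
  if (rejected > 0) {
    return 'rejected';
  }
  if (approved >= minimumApprovers) {
    return 'approved';
  }
  return 'pending';
}

console.log(approvalState({
  'Team Lead Group': 'approved',
  'Release Manager': 'approved',
  'QA Lead': 'pending'
}, 2));
```

With two approvals recorded and a minimum of 2, the sketch reports the approval as satisfied even though the third approver has not responded.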
Automated Gates
Automated gates are where things get powerful. Instead of relying on humans to click a button, you configure automated checks that evaluate conditions programmatically.
Azure Monitor Gates
Azure Monitor gates query your monitoring infrastructure to verify that your application is healthy before proceeding. This is particularly useful for post-deployment validation: deploy to staging, then gate the production deployment on staging health metrics.
Configure an Azure Monitor gate under Approvals and checks > Add check > Azure Monitor alerts:
- Azure subscription: The subscription containing your monitoring resources
- Resource group: Where your monitored resources live
- Resource: The specific Application Insights or Azure Monitor resource
- Alert rules: Which alert rules to evaluate
The gate checks whether any active alerts exist on the specified resources. If alerts are firing, the gate fails and the deployment is blocked.
REST API Gates (Invoke REST API)
REST API gates are the most flexible gate type. They call an HTTP endpoint and evaluate the response to determine pass/fail. This lets you integrate any external system into your deployment pipeline.
Configure under Approvals and checks > Add check > Invoke REST API.
Here is an example: you have a Node.js service that checks whether your feature flag system is ready for a deployment:
var express = require('express');
var axios = require('axios');
var app = express();

app.get('/api/deployment-gate', function(req, res) {
  var environment = req.query.environment;
  var buildId = req.query.buildId;

  // Check feature flag service health
  checkFeatureFlags(environment, function(err, flagsReady) {
    if (err) {
      return res.status(500).json({
        status: 'failed',
        message: 'Could not reach feature flag service: ' + err.message
      });
    }

    // Check active incident status
    checkIncidentStatus(function(err, incidents) {
      if (err) {
        return res.status(500).json({
          status: 'failed',
          message: 'Could not check incident status'
        });
      }

      var activeP1 = incidents.filter(function(i) {
        return i.severity === 'P1' && i.status === 'active';
      });

      if (activeP1.length > 0) {
        return res.status(200).json({
          status: 'failed',
          message: 'Active P1 incident: ' + activeP1[0].title + '. Deployment blocked.'
        });
      }

      if (!flagsReady) {
        return res.status(200).json({
          status: 'failed',
          message: 'Feature flags not configured for ' + environment
        });
      }

      res.status(200).json({
        status: 'succeeded',
        message: 'All pre-deployment checks passed for build ' + buildId
      });
    });
  });
});

function checkFeatureFlags(environment, callback) {
  axios.get('https://flags.internal.company.com/api/status/' + environment)
    .then(function(response) {
      callback(null, response.data.ready === true);
    })
    .catch(function(err) {
      callback(err);
    });
}

function checkIncidentStatus(callback) {
  axios.get('https://incidents.internal.company.com/api/active')
    .then(function(response) {
      callback(null, response.data.incidents || []);
    })
    .catch(function(err) {
      callback(err);
    });
}

var port = process.env.PORT || 3500;
app.listen(port, function() {
  console.log('Deployment gate service listening on port ' + port);
});
The REST API gate configuration expects a specific response format. The response body must include a status field. Azure DevOps evaluates the response like this:
| HTTP Status Code | Response Body status | Gate Result |
|---|---|---|
| 2xx | succeeded | Pass |
| 2xx | failed | Fail (will retry) |
| 4xx/5xx | any | Fail (will retry) |
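If you want to unit test your gate service against the rules in the table, a small helper like the following can stand in for the evaluation. This is a sketch, not Azure DevOps code: `evaluateGateResponse` and its return values `'pass'` and `'fail-retry'` are names invented for this example.

```javascript
// Hypothetical helper that mirrors the table above for local testing.
function evaluateGateResponse(httpStatus, body) {
  var is2xx = httpStatus >= 200 && httpStatus < 300;
  if (!is2xx) {
    // 4xx/5xx: the body is ignored and the gate will be retried
    return 'fail-retry';
  }
  if (body && body.status === 'succeeded') {
    return 'pass';
  }
  // 2xx with status 'failed' (or a missing/unknown status value)
  return 'fail-retry';
}

console.log(evaluateGateResponse(200, { status: 'succeeded' })); // pass
console.log(evaluateGateResponse(200, { status: 'failed' }));    // fail-retry
console.log(evaluateGateResponse(500, { status: 'succeeded' })); // fail-retry
```

Treating a missing or unrecognized status as a failure is a deliberate choice in this sketch: it keeps the helper strict, so a misnamed field shows up in your tests rather than in a stalled pipeline.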
Work Item Validation Gates
Work item gates verify that all work items linked to the pipeline run meet specific criteria. For example, you can require that all linked user stories are in a "Ready for Release" state before deploying.
Configure under Approvals and checks > Add check > Query Work Items.
Specify a work item query that returns items blocking deployment:
SELECT [System.Id], [System.Title], [System.State]
FROM WorkItems
WHERE [System.TeamProject] = @project
AND [System.State] NOT IN ('Ready for Release', 'Closed', 'Resolved')
AND [System.Tags] CONTAINS 'release-blocker'
ORDER BY [System.CreatedDate] DESC
The gate passes when the query returns zero results (no blockers found). If the query returns any work items, the gate fails and displays them in the pipeline run summary.
Gate Evaluation Timing and Retry Logic
This is where most people get confused. Gates do not evaluate once. They evaluate repeatedly on a schedule until they pass or time out.
Evaluation Flow
Gate configured with:
- Delay before first evaluation: 5 minutes
- Time between evaluations: 10 minutes
- Timeout: 60 minutes
- Minimum duration: 15 minutes
Timeline:
T+0:00 - Deployment reaches gate, timer starts
T+0:05 - First evaluation (fails - monitoring alert active)
T+0:15 - Second evaluation (fails - alert still active); minimum duration of 15 minutes has now elapsed
T+0:25 - Third evaluation (succeeds - alert resolved)
T+0:25 - Gate passes, deployment proceeds (minimum duration was already met at T+0:15)
The key parameters:
- Delay before first evaluation: Wait this long before the first check. Useful for post-deployment gates where you need the deployment to stabilize before checking health.
- Time between re-evaluation of gates: How often to retry failed gates.
- Timeout after which gates fail: Maximum time to wait for all gates to pass.
- Minimum duration: Even if gates pass immediately, wait at least this long. This is useful for "soak time" where you want to observe a deployment for a minimum period.
The Minimum Duration Trap
Here is a subtle behavior that catches people. If you set a minimum duration of 30 minutes and all your gates pass on the first evaluation at T+5 minutes, the pipeline still waits until T+30 minutes. However, the gates continue to be re-evaluated during this period. If a gate that passed at T+5 fails at T+20, the gate result resets and the timer keeps running.
This means a gate with a 30-minute minimum duration and 5-minute evaluation interval must pass consistently across multiple evaluations, not just once. This is actually a good thing for health check gates. You want the system to be healthy over a sustained period, not just at one point in time.
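The timing rules are easier to reason about when you can replay them. The sketch below simulates one simplified reading of the behavior described above: evaluations start after the delay, repeat at the interval, and the gate passes at the first successful evaluation that lands at or after the minimum duration. The function `simulateGate`, its option names, and the `outcomes` array (one boolean per evaluation) are hypothetical and exist only for this illustration.

```javascript
// Simplified simulation of gate evaluation timing (all times in minutes).
// This models the description in the text, not Azure DevOps internals.
function simulateGate(options, outcomes) {
  var t = options.delay;
  var index = 0;
  var evaluations = [];

  while (t <= options.timeout) {
    // Evaluations beyond the supplied outcomes are treated as failures
    var passed = index < outcomes.length ? outcomes[index] : false;
    evaluations.push({ minute: t, passed: passed });

    // A success only counts once the minimum duration has elapsed
    if (passed && t >= options.minimumDuration) {
      return { result: 'passed', atMinute: t, evaluations: evaluations };
    }
    index += 1;
    t += options.interval;
  }
  return { result: 'timed-out', atMinute: options.timeout, evaluations: evaluations };
}

// The timeline example: fail, fail, succeed
var timeline = simulateGate(
  { delay: 5, interval: 10, timeout: 60, minimumDuration: 15 },
  [false, false, true]
);
console.log(timeline.result, timeline.atMinute); // passed 25

// The minimum duration trap: early passes do not count, and a failure at T+20
// means the gate still has to succeed again at or after T+30
var trap = simulateGate(
  { delay: 5, interval: 5, timeout: 60, minimumDuration: 30 },
  [true, true, true, false, true, true]
);
console.log(trap.result, trap.atMinute); // passed 30
```

In the second run the gate succeeds three times before T+20, yet none of those results lets the deployment through on its own, which is the sustained-health behavior you want from a soak period.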
Pre-Deployment vs Post-Deployment Gates
Azure DevOps classic release pipelines had explicit pre-deployment and post-deployment gate phases. YAML pipelines handle this differently through stage dependencies and environment checks.
Pre-Deployment Pattern
Gates configured on the target environment act as pre-deployment gates. They evaluate before the deployment job runs:
stages:
  - stage: Build
    jobs:
      - job: BuildApp
        steps:
          - script: npm ci && npm run build
          - publish: $(Build.ArtifactStagingDirectory)
            artifact: drop
  - stage: DeployStaging
    dependsOn: Build
    jobs:
      - deployment: DeployToStaging
        environment: 'staging' # gates on 'staging' environment evaluate here
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: drop
                - script: |
                    echo "Deploying to staging..."
Post-Deployment Validation Pattern
For post-deployment validation, use a separate stage with its own gate checks. The trick is to make the next stage depend on a validation stage that runs automated tests or health checks:
stages:
  - stage: DeployStaging
    jobs:
      - deployment: Deploy
        environment: 'staging'
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploy to staging"
  - stage: ValidateStaging
    dependsOn: DeployStaging
    jobs:
      - job: RunSmokeTests
        steps:
          - script: |
              echo "Running smoke tests against staging..."
              npm run test:smoke -- --url https://staging.myapp.com
            displayName: 'Smoke Tests'
          - script: |
              echo "Checking staging health endpoint..."
              curl -f https://staging.myapp.com/health || exit 1
            displayName: 'Health Check'
  - stage: DeployProduction
    dependsOn: ValidateStaging
    jobs:
      - deployment: Deploy
        environment: 'production' # approval + automated gates here
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploy to production"
This pattern creates a natural flow: deploy to staging, validate staging automatically, then gate production behind both manual approval and automated checks on the production environment.
Multi-Stage Pipeline Gate Configurations
Real-world pipelines have more than two stages. Here is how to structure gates across a full deployment pipeline with multiple environments:
trigger:
  branches:
    include:
      - main
pool:
  vmImage: 'ubuntu-latest'
variables:
  - group: 'app-settings'
  - name: nodeVersion
    value: '20.x'
stages:
  # Stage 1: Build and unit test
  - stage: Build
    displayName: 'Build & Test'
    jobs:
      - job: BuildAndTest
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: $(nodeVersion)
          - script: |
              npm ci
              npm run lint
              npm test
            displayName: 'Install, Lint, Test'
          - script: npm run build
            displayName: 'Build Application'
          - publish: $(System.DefaultWorkingDirectory)/dist
            artifact: app-package
  # Stage 2: Deploy to Dev (no gates, auto-deploy)
  - stage: DeployDev
    displayName: 'Deploy to Dev'
    dependsOn: Build
    jobs:
      - deployment: DeployDev
        environment: 'dev'
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app-package
                - task: AzureWebApp@1
                  inputs:
                    azureSubscription: 'Azure-Dev'
                    appName: 'myapp-dev'
                    package: '$(Pipeline.Workspace)/app-package'
  # Stage 3: Integration tests on Dev
  - stage: IntegrationTests
    displayName: 'Integration Tests'
    dependsOn: DeployDev
    jobs:
      - job: RunIntegrationTests
        steps:
          - script: |
              npm ci
              npm run test:integration -- --baseUrl https://myapp-dev.azurewebsites.net
            displayName: 'Run Integration Tests'
            timeoutInMinutes: 15
  # Stage 4: Deploy to Staging (branch control gate on environment)
  - stage: DeployStaging
    displayName: 'Deploy to Staging'
    dependsOn: IntegrationTests
    jobs:
      - deployment: DeployStaging
        environment: 'staging' # Has branch control: only 'main' branch allowed
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app-package
                - task: AzureWebApp@1
                  inputs:
                    azureSubscription: 'Azure-Staging'
                    appName: 'myapp-staging'
                    package: '$(Pipeline.Workspace)/app-package'
  # Stage 5: Performance and smoke tests on Staging
  - stage: StagingValidation
    displayName: 'Staging Validation'
    dependsOn: DeployStaging
    jobs:
      - job: SmokeTests
        steps:
          - script: |
              echo "Running smoke tests..."
              npm ci
              npm run test:smoke -- --baseUrl https://myapp-staging.azurewebsites.net
            displayName: 'Smoke Tests'
      - job: PerformanceTests
        steps:
          - script: |
              echo "Running k6 load tests..."
              k6 run --vus 50 --duration 2m tests/load/baseline.js
            displayName: 'Load Tests'
            timeoutInMinutes: 10
  # Stage 6: Deploy to Production (manual approval + automated gates)
  - stage: DeployProduction
    displayName: 'Deploy to Production'
    dependsOn: StagingValidation
    jobs:
      - deployment: DeployProduction
        environment: 'production'
        # Environment 'production' has:
        # - Manual approval (2 approvers required)
        # - Azure Monitor gate (no active P1/P2 alerts)
        # - REST API gate (change management system approval)
        # - Business hours check (Mon-Thu, 9am-3pm)
        # - Exclusive lock (one deployment at a time)
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app-package
                - task: AzureWebApp@1
                  inputs:
                    azureSubscription: 'Azure-Production'
                    appName: 'myapp-prod'
                    package: '$(Pipeline.Workspace)/app-package'
                    deploymentMethod: 'zipDeploy'
                    appSettings: |
                      -NODE_ENV production
            on:
              success:
                steps:
                  - script: |
                      echo "Production deployment succeeded"
                      echo "Build: $(Build.BuildNumber)"
                      echo "Commit: $(Build.SourceVersion)"
                    displayName: 'Post-deploy notification'
              failure:
                steps:
                  - script: |
                      echo "##vso[task.logissue type=error]Production deployment failed!"
                    displayName: 'Failure notification'
  # Stage 7: Post-deployment verification
  - stage: ProductionVerification
    displayName: 'Production Verification'
    dependsOn: DeployProduction
    jobs:
      - job: VerifyProduction
        steps:
          - script: |
              echo "Verifying production deployment..."
              response=$(curl -s -o /dev/null -w "%{http_code}" https://myapp.com/health)
              if [ "$response" != "200" ]; then
                echo "##vso[task.logissue type=error]Health check failed with status $response"
                exit 1
              fi
              echo "Health check passed: HTTP $response"
            displayName: 'Production Health Check'
          - script: |
              echo "Running production smoke tests..."
              npm ci
              npm run test:smoke -- --baseUrl https://myapp.com --subset critical
            displayName: 'Critical Path Tests'
            timeoutInMinutes: 5
Gate Timeout and Evaluation Policies
Getting timeout and evaluation policies right is critical. Set them too aggressively and you get false failures. Set them too loosely and bad deployments slip through.
Recommended Settings by Environment
| Setting | Staging | Production |
|---|---|---|
| Delay before first evaluation | 0 minutes | 5 minutes |
| Time between evaluations | 5 minutes | 10 minutes |
| Timeout | 2 hours | 8 hours |
| Minimum duration | 0 minutes | 15 minutes |
| Approval timeout | 4 hours | 72 hours |
Business Hours Check
The Business Hours check restricts deployments to specific time windows. This is invaluable for production environments where you want to avoid deploying on Friday at 4:55 PM.
Configure under Approvals and checks > Add check > Business Hours:
Time zone: (UTC-08:00) Pacific Time
Days: Monday, Tuesday, Wednesday, Thursday
Start time: 09:00
End time: 15:00
If a pipeline reaches the gate outside business hours, it waits until the next valid window. This combines well with approval gates: the approver gets notified, reviews the changes, and the deployment naturally lands during the next safe window.
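When a deployment is blocked unexpectedly, it helps to check what the configured window means for a given instant. The sketch below reproduces the window logic with the built-in `Intl.DateTimeFormat` API. It is only an illustration: Azure DevOps performs the real check, the helper names `isWithinBusinessHours` and `toMinutes` are invented here, and mapping "(UTC-08:00) Pacific Time" to the IANA zone `America/Los_Angeles` is an assumption.

```javascript
// Hypothetical local check of a business hours window.
function toMinutes(hhmm) {
  var pieces = hhmm.split(':');
  return parseInt(pieces[0], 10) * 60 + parseInt(pieces[1], 10);
}

function isWithinBusinessHours(date, config) {
  // Convert the instant to weekday/hour/minute in the configured time zone
  var parts = new Intl.DateTimeFormat('en-US', {
    timeZone: config.timeZone,
    weekday: 'long',
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23'
  }).formatToParts(date);

  var values = {};
  parts.forEach(function(p) { values[p.type] = p.value; });

  var minutes = parseInt(values.hour, 10) * 60 + parseInt(values.minute, 10);
  return config.days.indexOf(values.weekday) !== -1 &&
    minutes >= toMinutes(config.start) &&
    minutes < toMinutes(config.end);
}

var productionWindow = {
  timeZone: 'America/Los_Angeles', // assumed equivalent of the Pacific Time setting
  days: ['Monday', 'Tuesday', 'Wednesday', 'Thursday'],
  start: '09:00',
  end: '15:00'
};

// Tuesday 10:00 Pacific (18:00 UTC in January)
console.log(isWithinBusinessHours(new Date('2024-01-16T18:00:00Z'), productionWindow)); // true
// Friday 10:00 Pacific
console.log(isWithinBusinessHours(new Date('2024-01-19T18:00:00Z'), productionWindow)); // false
```

Passing the instant as a UTC timestamp and letting the formatter do the conversion avoids the most common mistake, which is comparing the window against the local clock of whoever is debugging.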
Exclusive Lock Check
The Exclusive Lock check ensures only one pipeline run deploys to an environment at a time. This prevents race conditions when multiple pipeline runs target the same environment.
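Add the check under Approvals and checks > Add check > Exclusive Lock. The check itself has no options; the queueing behavior is chosen with the lockBehavior property in the pipeline YAML, at the pipeline or stage level:
lockBehavior: sequential   # "Sequential": runs queue and execute in order
-- or --
lockBehavior: runLatest    # "Run latest only": cancels older queued runs, only the latest proceeds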
For production, "Run latest only" is usually correct. If three builds queue up, you only care about the latest one.
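The difference between the two behaviors is easiest to see with a queue of waiting runs. The following sketch models it; `resolveLockQueue` and the run IDs are hypothetical, and the function only describes which queued runs would eventually get the lock, not how Azure DevOps schedules them.

```javascript
// Hypothetical model of the two exclusive lock behaviors.
// queuedRunIds is ordered oldest first.
function resolveLockQueue(queuedRunIds, behavior) {
  if (queuedRunIds.length === 0) {
    return { proceed: [], canceled: [] };
  }
  if (behavior === 'runLatest') {
    // Only the newest queued run proceeds; older ones are canceled
    return {
      proceed: [queuedRunIds[queuedRunIds.length - 1]],
      canceled: queuedRunIds.slice(0, -1)
    };
  }
  // sequential: every run proceeds, one at a time, in queue order
  return { proceed: queuedRunIds.slice(), canceled: [] };
}

console.log(resolveLockQueue(['#101', '#102', '#103'], 'runLatest'));
console.log(resolveLockQueue(['#101', '#102', '#103'], 'sequential'));
```

With three queued builds, the "run latest" behavior deploys only #103, while the sequential behavior deploys all three in order.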
Combining Gates with Environment Checks
The real power of gates emerges when you combine multiple check types on a single environment. Azure DevOps evaluates all checks in parallel and requires all of them to pass.
Here is a production environment configuration that I use on real projects:
Check 1: Manual Approval
- Approvers: Release Managers group
- Minimum approvers: 2
- Timeout: 72 hours
- Instructions: "Review the staging test results, check the deployment diff, and verify the change management ticket."
Check 2: Azure Monitor
- Resource: Application Insights (production)
- Alert rules: All severity 0-2 alerts
- Evaluation: Fail if any alerts are active
Check 3: Invoke REST API
- URL: https://gates.internal.company.com/api/deployment-ready
- Method: POST
- Headers: Authorization: Bearer $(gate-service-token)
- Body:
{
  "buildId": "$(Build.BuildId)",
  "environment": "production",
  "repository": "$(Build.Repository.Name)",
  "branch": "$(Build.SourceBranch)",
  "requestedBy": "$(Build.RequestedFor)"
}
- Success criteria: Response body contains "status": "succeeded"
- Evaluation interval: 10 minutes
- Timeout: 4 hours
Check 4: Business Hours
- Mon-Thu, 9am-3pm Pacific
Check 5: Exclusive Lock
- Run latest only
Writing a Comprehensive Gate Service
Here is a more complete gate service that checks multiple conditions and returns detailed results:
var express = require('express');
var axios = require('axios');
var app = express();

app.use(express.json());

// Gate evaluation endpoint
app.post('/api/deployment-ready', function(req, res) {
  var buildId = req.body.buildId;
  var environment = req.body.environment;
  var requestedBy = req.body.requestedBy;

  console.log('[Gate] Evaluating deployment gate for build ' + buildId + ' to ' + environment);

  var checks = [
    checkChangeManagement(buildId),
    checkActiveIncidents(),
    checkDeploymentFreeze(),
    checkDependencyHealth(environment)
  ];

  Promise.all(checks)
    .then(function(results) {
      var failures = results.filter(function(r) { return r.passed === false; });

      if (failures.length > 0) {
        var messages = failures.map(function(f) { return f.reason; });
        console.log('[Gate] BLOCKED: ' + messages.join('; '));
        return res.status(200).json({
          status: 'failed',
          message: 'Deployment blocked: ' + messages.join('; '),
          checks: results
        });
      }

      console.log('[Gate] PASSED: All checks succeeded for build ' + buildId);
      res.status(200).json({
        status: 'succeeded',
        message: 'All deployment gates passed',
        checks: results,
        evaluatedAt: new Date().toISOString()
      });
    })
    .catch(function(err) {
      console.error('[Gate] ERROR: ' + err.message);
      res.status(500).json({
        status: 'failed',
        message: 'Gate evaluation error: ' + err.message
      });
    });
});

function checkChangeManagement(buildId) {
  return axios.get('https://changemgmt.internal.company.com/api/tickets', {
    params: { buildId: buildId, status: 'approved' }
  })
    .then(function(response) {
      var hasApprovedTicket = response.data.tickets && response.data.tickets.length > 0;
      return {
        check: 'change-management',
        passed: hasApprovedTicket,
        reason: hasApprovedTicket ? 'Change ticket approved' : 'No approved change ticket found for build ' + buildId
      };
    })
    .catch(function(err) {
      return {
        check: 'change-management',
        passed: false,
        reason: 'Change management system unreachable: ' + err.message
      };
    });
}

function checkActiveIncidents() {
  return axios.get('https://incidents.internal.company.com/api/active')
    .then(function(response) {
      var severeIncidents = (response.data.incidents || []).filter(function(i) {
        return i.severity <= 2; // P1 or P2
      });
      var hasSevereIncidents = severeIncidents.length > 0;
      return {
        check: 'active-incidents',
        passed: !hasSevereIncidents,
        reason: hasSevereIncidents
          ? 'Active P' + severeIncidents[0].severity + ' incident: ' + severeIncidents[0].title
          : 'No active P1/P2 incidents'
      };
    })
    .catch(function(err) {
      return {
        check: 'active-incidents',
        passed: false,
        reason: 'Incident system unreachable: ' + err.message
      };
    });
}

function checkDeploymentFreeze() {
  return axios.get('https://changemgmt.internal.company.com/api/freeze-windows/active')
    .then(function(response) {
      var freezeActive = response.data.freezeActive === true;
      return {
        check: 'deployment-freeze',
        passed: !freezeActive,
        reason: freezeActive
          ? 'Deployment freeze active until ' + response.data.endsAt
          : 'No deployment freeze in effect'
      };
    })
    .catch(function(err) {
      return {
        check: 'deployment-freeze',
        passed: false,
        reason: 'Could not verify freeze window status: ' + err.message
      };
    });
}

function checkDependencyHealth(environment) {
  var endpoints = {
    staging: [
      'https://api-staging.internal.company.com/health',
      'https://auth-staging.internal.company.com/health'
    ],
    production: [
      'https://api.internal.company.com/health',
      'https://auth.internal.company.com/health',
      'https://payments.internal.company.com/health'
    ]
  };
  var urls = endpoints[environment] || endpoints.production;

  var healthChecks = urls.map(function(url) {
    return axios.get(url, { timeout: 5000 })
      .then(function(response) {
        return { url: url, healthy: response.status === 200 };
      })
      .catch(function() {
        return { url: url, healthy: false };
      });
  });

  return Promise.all(healthChecks)
    .then(function(results) {
      var unhealthy = results.filter(function(r) { return !r.healthy; });
      return {
        check: 'dependency-health',
        passed: unhealthy.length === 0,
        reason: unhealthy.length === 0
          ? 'All ' + results.length + ' dependencies healthy'
          : 'Unhealthy dependencies: ' + unhealthy.map(function(u) { return u.url; }).join(', ')
      };
    });
}

var port = process.env.PORT || 3500;
app.listen(port, function() {
  console.log('Deployment gate service running on port ' + port);
});
The corresponding package.json for this gate service:
{
  "name": "deployment-gate-service",
  "version": "1.0.0",
  "description": "Azure DevOps deployment gate evaluation service",
  "main": "server.js",
  "scripts": {
    "start": "node server.js",
    "dev": "nodemon server.js"
  },
  "dependencies": {
    "express": "^4.18.2",
    "axios": "^1.6.0"
  },
  "devDependencies": {
    "nodemon": "^3.0.0"
  }
}
Complete Working Example
Here is a complete multi-stage YAML pipeline that demonstrates environment approvals, automated health check gates, and post-deployment validation. This is a real-world pattern for deploying a Node.js application to Azure App Service.
# azure-pipelines.yml
trigger:
  branches:
    include:
      - main
pr: none
pool:
  vmImage: 'ubuntu-latest'
variables:
  - name: nodeVersion
    value: '20.x'
  - name: appName
    value: 'myapp'
  - group: 'deployment-secrets'
stages:
  # ============================================================
  # STAGE 1: Build, Lint, Test
  # ============================================================
  - stage: Build
    displayName: 'Build & Test'
    jobs:
      - job: BuildJob
        displayName: 'Build Application'
        steps:
          - task: NodeTool@0
            displayName: 'Use Node $(nodeVersion)'
            inputs:
              versionSpec: $(nodeVersion)
          - script: npm ci
            displayName: 'Install dependencies'
          - script: npm run lint
            displayName: 'Run linter'
          - script: npm test -- --ci --coverage
            displayName: 'Run unit tests'
          - task: PublishTestResults@2
            displayName: 'Publish test results'
            inputs:
              testResultsFormat: 'JUnit'
              testResultsFiles: '**/junit.xml'
            condition: always()
          - task: PublishCodeCoverageResults@1
            displayName: 'Publish coverage'
            inputs:
              codeCoverageTool: 'Cobertura'
              summaryFileLocation: '**/coverage/cobertura-coverage.xml'
          - script: npm run build
            displayName: 'Build for production'
          - task: ArchiveFiles@2
            displayName: 'Archive build output'
            inputs:
              rootFolderOrFile: '$(System.DefaultWorkingDirectory)'
              includeRootFolder: false
              archiveType: 'zip'
              archiveFile: '$(Build.ArtifactStagingDirectory)/$(appName)-$(Build.BuildNumber).zip'
          - publish: '$(Build.ArtifactStagingDirectory)/$(appName)-$(Build.BuildNumber).zip'
            artifact: drop
            displayName: 'Publish artifact'
  # ============================================================
  # STAGE 2: Deploy to Staging
  # Environment 'staging' has:
  # - Branch control (main only)
  # ============================================================
  - stage: DeployStaging
    displayName: 'Deploy to Staging'
    dependsOn: Build
    condition: succeeded()
    jobs:
      - deployment: DeployStagingJob
        displayName: 'Deploy to Staging'
        environment: 'staging'
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: drop
                - task: AzureWebApp@1
                  displayName: 'Deploy to Azure App Service (Staging)'
                  inputs:
                    azureSubscription: 'Azure-NonProd'
                    appType: 'webAppLinux'
                    appName: '$(appName)-staging'
                    package: '$(Pipeline.Workspace)/drop/$(appName)-$(Build.BuildNumber).zip'
                    runtimeStack: 'NODE|20-lts'
  # ============================================================
  # STAGE 3: Validate Staging
  # Automated tests and health checks
  # ============================================================
  - stage: ValidateStaging
    displayName: 'Validate Staging'
    dependsOn: DeployStaging
    condition: succeeded()
    jobs:
      - job: HealthCheck
        displayName: 'Health Check'
        steps:
          - script: |
              echo "Waiting 30 seconds for app to start..."
              sleep 30
              echo "Checking health endpoint..."
              HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
                https://$(appName)-staging.azurewebsites.net/health)
              echo "Health check response: HTTP $HTTP_STATUS"
              if [ "$HTTP_STATUS" != "200" ]; then
                echo "##vso[task.logissue type=error]Health check failed: HTTP $HTTP_STATUS"
                exit 1
              fi
              echo "##vso[task.complete result=Succeeded;]Health check passed"
            displayName: 'Verify health endpoint'
            retryCountOnTaskFailure: 3
      - job: SmokeTests
        displayName: 'Smoke Tests'
        dependsOn: HealthCheck
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: $(nodeVersion)
          - script: |
              npm ci
              STAGING_URL=https://$(appName)-staging.azurewebsites.net \
                npm run test:smoke
            displayName: 'Run smoke tests'
            timeoutInMinutes: 10
      - job: SecurityScan
        displayName: 'Security Scan'
        dependsOn: HealthCheck
        steps:
          - script: |
              npm audit --production --audit-level=high
            displayName: 'NPM audit check'
            continueOnError: false
  # ============================================================
  # STAGE 4: Deploy to Production
  # Environment 'production' has:
  # - Manual approval (2 required from Release Managers)
  # - Azure Monitor gate (no active alerts)
  # - REST API gate (change management approval)
  # - Business hours (Mon-Thu, 9am-3pm PT)
  # - Exclusive lock (run latest only)
  # ============================================================
  - stage: DeployProduction
    displayName: 'Deploy to Production'
    dependsOn: ValidateStaging
    condition: succeeded()
    jobs:
      - deployment: DeployProductionJob
        displayName: 'Deploy to Production'
        environment: 'production'
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: drop
                - task: AzureWebApp@1
                  displayName: 'Deploy to Azure App Service (Production)'
                  inputs:
                    azureSubscription: 'Azure-Production'
                    appType: 'webAppLinux'
                    appName: '$(appName)-prod'
                    package: '$(Pipeline.Workspace)/drop/$(appName)-$(Build.BuildNumber).zip'
                    runtimeStack: 'NODE|20-lts'
                    deploymentMethod: 'zipDeploy'
            on:
              success:
                steps:
                  - script: |
                      echo "Production deployment completed successfully"
                      echo "Build Number: $(Build.BuildNumber)"
                      echo "Source Version: $(Build.SourceVersion)"
                      echo "Deployed by: $(Build.RequestedFor)"
                      echo "Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
                    displayName: 'Log deployment details'
  # ============================================================
  # STAGE 5: Post-Deployment Verification
  # ============================================================
  - stage: ProductionVerification
    displayName: 'Production Verification'
    dependsOn: DeployProduction
    condition: succeeded()
    jobs:
      - job: VerifyDeployment
        displayName: 'Verify Production'
        steps:
          - script: |
              echo "Waiting 60 seconds for production deployment to stabilize..."
              sleep 60
              echo "=== Production Health Check ==="
              HTTP_STATUS=$(curl -s -o response.json -w "%{http_code}" \
                https://$(appName).com/health)
              echo "HTTP Status: $HTTP_STATUS"
              cat response.json | python3 -m json.tool 2>/dev/null || cat response.json
              if [ "$HTTP_STATUS" != "200" ]; then
                echo "##vso[task.logissue type=error]Production health check failed: HTTP $HTTP_STATUS"
                exit 1
              fi
              echo ""
              echo "=== Version Verification ==="
              DEPLOYED_VERSION=$(curl -s https://$(appName).com/health | \
                python3 -c "import sys,json; print(json.load(sys.stdin).get('version','unknown'))")
              echo "Deployed version: $DEPLOYED_VERSION"
              echo "Expected build: $(Build.BuildNumber)"
            displayName: 'Production health and version check'
            retryCountOnTaskFailure: 2
          - script: |
              npm ci
              PROD_URL=https://$(appName).com \
                npm run test:smoke -- --subset critical-path
            displayName: 'Critical path smoke tests'
            timeoutInMinutes: 5
The pipeline run for this configuration takes approximately 15-25 minutes for the automated stages, plus whatever time the manual approval and gate evaluations require. A typical production deployment might look like:
Build & Test ~4 minutes
Deploy to Staging ~2 minutes
Validate Staging ~5 minutes
- Health Check 30s + check
- Smoke Tests ~3 minutes
- Security Scan ~1 minute
Deploy to Production ~15 minutes (waiting for gates) + ~2 minutes (deploy)
- Manual Approval variable (minutes to hours)
- Azure Monitor evaluated every 10 min
- REST API Gate evaluated every 10 min
- Business Hours waits until next window
Production Verify ~3 minutes
Common Issues and Troubleshooting
Issue 1: Gate Evaluation Returns "Task Was Canceled"
##[error]The gate evaluation was canceled because the timeout of '480' minutes was reached.
Task 'InvokeRestAPI' was canceled.
Cause: Your REST API gate endpoint is returning a non-success response, and the gate keeps retrying until the timeout expires. This usually means the endpoint is down, returning 500 errors, or the response body does not match the expected format.
Fix: Verify your gate endpoint returns a 200 status code with {"status": "succeeded"} or {"status": "failed"} in the response body. Check the endpoint logs. A common mistake is returning {"result": "success"} instead of {"status": "succeeded"}. The field name and value must be exact.
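Before digging through pipeline logs, you can paste a captured response body into a quick checker. The sketch below is a hypothetical diagnostic helper (the name `diagnoseGateBody` and its messages are made up for this example) that flags the format mistakes described above.

```javascript
// Hypothetical diagnostic for a gate response body captured as a string.
function diagnoseGateBody(rawBody) {
  var body;
  try {
    body = JSON.parse(rawBody);
  } catch (err) {
    return 'Body is not valid JSON';
  }
  if (body === null || typeof body !== 'object' || !('status' in body)) {
    return 'Missing "status" field';
  }
  if (body.status !== 'succeeded' && body.status !== 'failed') {
    return 'Unexpected status value: ' + JSON.stringify(body.status);
  }
  return 'OK';
}

console.log(diagnoseGateBody('{"result": "success"}'));   // Missing "status" field
console.log(diagnoseGateBody('{"status": "success"}'));   // Unexpected status value: "success"
console.log(diagnoseGateBody('{"status": "succeeded"}')); // OK
```

The second case is the one that tends to hide longest: the field name is right, but the value is "success" rather than "succeeded", so the gate keeps retrying until it times out.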
Issue 2: Approval Notification Not Sent
The approval for environment 'production' is pending. No notification was sent.
Approval status: Pending
Approvers: [Release Managers]
Cause: Notification settings for the approver group or individual are not configured, or the notification subscription for "A deployment approval is pending" is disabled in the user's notification settings.
Fix: Each approver needs to verify their notification settings under User Settings > Notifications. The subscription "A deployment approval is pending" must be enabled. Also confirm that the approver group contains actual users, not just nested groups that might resolve to empty sets.
Issue 3: Business Hours Gate Blocks All Deployments
##[section]Checking Business Hours...
Business hours check failed. Current time is outside the allowed deployment window.
Next allowed window: Monday 09:00 AM (UTC-08:00)
Cause: You configured business hours in the wrong timezone, or you set the days too restrictively. If your team is distributed across timezones, the business hours check uses the timezone you configured, not the user's local timezone.
Fix: Choose a timezone that accommodates your entire team. If your team spans US Pacific to European time, consider using UTC and setting a wider window, or set the window to 8am-4pm in your most restrictive timezone. Also remember that Friday deployments are blocked if the allowed days are only Monday through Thursday, which might be intentional but catches people off guard.
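To see why a window configured in one timezone surprises engineers in another, it can help to model the check locally. This is a rough sketch, not how Azure DevOps evaluates the check internally. The window (Monday through Thursday, 09:00 to 17:00 at a fixed UTC-08:00 offset) and the function name in_deployment_window are assumptions chosen for illustration, and a fixed offset ignores daylight saving time.

```python
from datetime import datetime, time, timedelta, timezone

# Hypothetical window: Monday-Thursday, 09:00-17:00 at a fixed UTC-08:00 offset.
WINDOW_TZ = timezone(timedelta(hours=-8))
ALLOWED_WEEKDAYS = {0, 1, 2, 3}  # Monday=0 ... Thursday=3
WINDOW_START = time(9, 0)
WINDOW_END = time(17, 0)


def in_deployment_window(moment: datetime) -> bool:
    """Return True if a timezone-aware moment falls inside the configured window."""
    local = moment.astimezone(WINDOW_TZ)  # evaluate in the window's timezone
    return (
        local.weekday() in ALLOWED_WEEKDAYS
        and WINDOW_START <= local.time() < WINDOW_END
    )


if __name__ == "__main__":
    # 09:00 UTC on a Tuesday is 01:00 in UTC-08:00, so the check fails.
    print(in_deployment_window(datetime(2024, 1, 16, 9, 0, tzinfo=timezone.utc)))
```

The check only ever looks at the configured timezone, so an engineer starting their morning in Europe is still hours away from the window opening.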
Issue 4: Exclusive Lock Causes Pipeline Runs to Queue Indefinitely
##[section]Checking Exclusive Lock...
Waiting for exclusive lock on environment 'production'.
Position in queue: 3
Currently locked by run: #20240115.4
Cause: A previous pipeline run has the exclusive lock on the environment and is stuck (waiting for approval, gate evaluation, or failed without releasing the lock). All subsequent runs queue behind it.
Fix: Go to Pipelines > Environments > production > Runs and cancel the stuck run. This releases the lock and allows the next queued run to proceed. If you configured "Run latest only," only the most recent run will proceed and intermediate runs are automatically canceled. Consider switching to "Run latest only" if this happens frequently. You can also cancel queued runs from the pipeline runs list.
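The difference between the two lock behaviors is easier to reason about with a toy model. The sketch below is a simplified illustration of the queueing semantics described above, not the service's actual implementation. The function name next_run and the run numbers are hypothetical, and the behavior labels "sequential" and "runLatest" are used here only as names for the two modes.

```python
def next_run(queued_runs: list[str], behavior: str) -> tuple[str, list[str]]:
    """Decide which queued run gets the lock once it is released.

    Returns (run that proceeds, runs that are canceled).
    queued_runs is ordered oldest first.
    """
    if not queued_runs:
        raise ValueError("no runs are waiting for the lock")
    if behavior == "sequential":
        # Every run deploys in order; nothing is canceled.
        return queued_runs[0], []
    if behavior == "runLatest":
        # Only the newest run deploys; intermediate runs are canceled.
        return queued_runs[-1], queued_runs[:-1]
    raise ValueError(f"unknown lock behavior: {behavior}")


if __name__ == "__main__":
    waiting = ["#20240115.5", "#20240115.6", "#20240115.7"]
    print(next_run(waiting, "sequential"))
    print(next_run(waiting, "runLatest"))
```

With three runs waiting, the sequential mode still has two more deployments to work through after the next one, while the latest-only mode goes straight to the newest build.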
Issue 5: REST API Gate Passes But Deployment Still Blocked
Gate 'Invoke REST API' evaluation result: Succeeded
Gate 'Azure Monitor' evaluation result: Succeeded
Overall gate status: Waiting (minimum duration not reached)
Time remaining: 12 minutes
Cause: You configured a minimum duration on the gate evaluation, and while all gates have passed, the minimum soak time has not elapsed. The gates will continue to be re-evaluated during this period.
Fix: This is working as intended, but if the minimum duration is too long for your workflow, reduce it. A 5-minute minimum duration is usually sufficient for health check soak time. The 30-minute defaults some teams set are excessive for most applications.
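The interaction between gate results and the minimum duration can be summarized as a small decision function. This is a sketch of the logic as described above, under the assumption that a deployment proceeds only when every gate has succeeded and the soak time has elapsed. The function name overall_gate_status and the status labels are hypothetical.

```python
def overall_gate_status(
    results: dict[str, bool],
    elapsed_minutes: int,
    minimum_duration_minutes: int,
) -> tuple[str, int]:
    """Return (status, minutes remaining in the minimum duration)."""
    remaining = max(0, minimum_duration_minutes - elapsed_minutes)
    if not all(results.values()):
        # At least one gate failed its latest evaluation; it will be re-evaluated.
        return "retrying", remaining
    if remaining > 0:
        # All gates passed, but the soak time has not elapsed yet.
        return "waiting", remaining
    return "proceed", 0


if __name__ == "__main__":
    gates = {"Invoke REST API": True, "Azure Monitor": True}
    # 18 minutes into a 30-minute minimum duration leaves 12 minutes to wait.
    print(overall_gate_status(gates, elapsed_minutes=18, minimum_duration_minutes=30))
```

Lowering the minimum duration to 5 minutes in this model turns the same inputs into an immediate "proceed", which mirrors the fix suggested above.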
Best Practices
Start with manual approvals and add automated gates incrementally. Do not try to automate every gate on day one. Begin with manual approvals for production, get your team comfortable with the flow, then layer on automated health checks and REST API gates as you identify what matters most.
Set gate timeouts shorter than you think you need. A 30-day approval timeout means forgotten pipeline runs linger for a month. Set production approvals to 72 hours maximum. If nobody approves in three days, the change should be re-evaluated anyway.
Use the REST API gate response body to communicate context. Do not just return {"status": "failed"}. Include a message field explaining why the gate failed. Approvers and pipeline operators need to know whether the gate is blocked by an active incident, a deployment freeze, or a missing change ticket. Azure DevOps displays this message in the pipeline run UI.
Never configure gates that call your own application's health endpoint as a pre-deployment gate. If you are deploying version 2.0 and your pre-deployment gate checks the health of the currently running version 1.9, a health issue in 1.9 will block the 2.0 deployment that might fix it. Pre-deployment gates should check external conditions (incidents, change management, dependencies). Post-deployment gates should check the deployed application's health.
Use separate environments for each deployment target, even if they share infrastructure. Creating environments named production-us-east, production-us-west, and production-eu lets you configure different gate policies for each region. You might allow a single approver for US regions but require two for EU due to GDPR considerations.
Always configure exclusive locks on production environments. Without exclusive locks, two pipeline runs can deploy to the same environment simultaneously, causing race conditions, partial deployments, and difficult-to-diagnose issues. The "Run latest only" lock behavior is almost always what you want.
Keep your gate service endpoints highly available. If your REST API gate endpoint goes down, every deployment is blocked. Run the gate service in a different environment from the one it gates, on separate infrastructure that does not share failure domains with your application. Monitor it independently.
Document your gate policies in your team's runbook. When a deployment is blocked at 2 AM during an incident, the on-call engineer needs to know which gates are configured, what they check, and how to override them. Create a runbook page that lists every gate on every environment with bypass procedures for emergencies.
Test your gates in non-production environments first. Configure the same gate types on staging with shorter timeouts and more lenient thresholds. Verify that your REST API gate endpoint handles all response scenarios correctly before relying on it for production.
