Implementing Blue-Green Deployments with Azure Pipelines
A practical guide to implementing blue-green deployments with Azure Pipelines for zero-downtime releases using deployment slots.
Overview
Blue-green deployment is a release strategy where you maintain two identical production environments and switch traffic between them during releases, eliminating downtime entirely. Azure App Service deployment slots make this pattern straightforward to implement, and Azure Pipelines gives you the automation backbone to deploy to a staging slot, verify health, swap traffic, and roll back automatically if something goes wrong. If you are running a Node.js application in production and your users cannot tolerate downtime during releases, this is the approach you should be using.
I have been running blue-green deployments in production for over five years now, across dozens of Node.js services. The pattern works. It catches problems before they reach users, and when something does slip through, you can roll back in seconds instead of minutes. This article walks through the complete implementation from scratch -- slot configuration, pipeline YAML, health checks, traffic routing, automated rollback, and the database migration strategy that makes it all work together.
Prerequisites
- An Azure subscription with an App Service Plan on the Standard tier or higher (deployment slots are not available on the Free or Basic tiers)
- An Azure DevOps organization with a project and a pipeline configured
- An Azure Resource Manager service connection in Azure DevOps linked to your subscription
- A Node.js application deployed to Azure App Service
- Familiarity with YAML pipeline syntax in Azure DevOps
- Azure CLI installed locally for testing slot operations
The Blue-Green Deployment Concept
Traditional deployments follow a simple pattern: stop the old version, deploy the new version, start it up, hope nothing breaks. During that window your users see errors, timeouts, or a maintenance page. Blue-green deployment eliminates that window completely.
The idea is simple. You have two environments:
- Blue -- the live production environment currently serving traffic
- Green -- an idle environment where you deploy the new version
You deploy your new code to the green environment. You run health checks against it. You verify it works. Then you swap traffic from blue to green in one atomic operation. The old blue environment becomes your instant rollback target. If something goes wrong in production after the swap, you swap back in seconds.
In Azure App Service, deployment slots are purpose-built for this pattern. Every App Service has a production slot by default. You create a staging slot (the green environment), deploy there, validate, and then swap. The swap operation is handled at the load balancer level -- it redirects traffic by swapping the virtual IP mappings. No cold starts, no connection drops.
Here is why this matters for Node.js applications specifically:
- Node.js cold starts are real. A Node.js app needs to load modules, establish database connection pools, warm caches. With blue-green, the staging slot is fully warmed before it receives production traffic.
- npm install failures are caught early. If a dependency fails to install or a native module fails to compile, you find out in the staging slot, not in production.
- Database migrations can be tested. Your staging slot can run against the production database (or a replica) to verify migrations before the swap.
Setting Up Azure App Service Deployment Slots
First, create the staging slot using Azure CLI:
# Create a staging deployment slot
az webapp deployment slot create \
  --name my-node-app \
  --resource-group my-resource-group \
  --slot staging

# Verify the slot was created
az webapp deployment slot list \
  --name my-node-app \
  --resource-group my-resource-group \
  --output table
Output:
Name     ResourceGroup      Status
-------  -----------------  -------
staging  my-resource-group  Running
Now configure the staging slot with appropriate settings. There is a critical distinction in Azure App Service between slot settings (sticky to the slot) and regular settings (travel with the app during swap):
# Set a slot-specific setting (stays with the slot, does not swap)
az webapp config appsettings set \
  --name my-node-app \
  --resource-group my-resource-group \
  --slot staging \
  --slot-settings SLOT_NAME=staging

# Set connection strings that should swap with the app
az webapp config appsettings set \
  --name my-node-app \
  --resource-group my-resource-group \
  --slot staging \
  --settings DB_CONNECTION="mongodb://prod-server:27017/myapp"
This is important: slot settings (configured with --slot-settings) are sticky. They stay with the slot even after a swap. Use these for things like logging levels, feature flags that should differ between environments, and slot identifiers. Regular settings (configured with --settings) travel with the app code during a swap. Use these for database connection strings and API keys that the app code needs regardless of which slot it runs in.
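Before the first swap, it is worth confirming which settings are actually marked as sticky. The query below lists every setting on the staging slot together with its slotSetting flag, using the same app and resource group names as above:
# List staging slot settings and whether each one is sticky to the slot
az webapp config appsettings list \
  --name my-node-app \
  --resource-group my-resource-group \
  --slot staging \
  --query '[].{Name:name, Sticky:slotSetting}' \
  --output table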
For your Node.js app, add a health check endpoint:
var express = require("express");
var mongoose = require("mongoose");
var app = express();
var isReady = false;
// Readiness probe - checks that the app is fully initialized
app.get("/health", function(req, res) {
if (!isReady) {
return res.status(503).json({
status: "not_ready",
timestamp: new Date().toISOString()
});
}
var checks = {
uptime: process.uptime(),
memory: process.memoryUsage(),
timestamp: new Date().toISOString()
};
// Check database connectivity
if (mongoose.connection.readyState !== 1) {
checks.database = "disconnected";
return res.status(503).json(checks);
}
checks.database = "connected";
checks.status = "healthy";
res.status(200).json(checks);
});
// Liveness probe - basic check that the process is alive
app.get("/health/live", function(req, res) {
res.status(200).json({ status: "alive" });
});
mongoose.connect(process.env.DB_CONNECTION).then(function() {
  console.log("Database connected");
  isReady = true;
}).catch(function(err) {
  // Keep isReady false so /health keeps returning 503 until the connection succeeds
  console.error("Database connection failed:", err.message);
});
var port = process.env.PORT || 8080;
app.listen(port, function() {
console.log("Server running on port " + port);
});
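A quick local sanity check of both endpoints before wiring them into the pipeline -- this sketch assumes the entry point is saved as app.js and that a test database is reachable at the connection string you pass in:
# Start the app against a local test database (adjust the connection string as needed)
DB_CONNECTION="mongodb://localhost:27017/myapp" node app.js
# Readiness: returns 503 until the database connects, then 200
curl -i http://localhost:8080/health
# Liveness: returns 200 as soon as the process is listening
curl -i http://localhost:8080/health/live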
Building the Pipeline YAML
The pipeline has four stages: Build, Deploy, Validate, and Swap.
Build Stage
trigger:
branches:
include:
- main
pool:
vmImage: ubuntu-latest
variables:
azureSubscription: my-azure-service-connection
appName: my-node-app
resourceGroup: my-resource-group
slotName: staging
stages:
- stage: Build
jobs:
- job: BuildJob
steps:
- task: NodeTool@0
inputs:
versionSpec: 20.x
- script: npm ci
- script: npm test
- script: npm prune --production
- task: ArchiveFiles@2
inputs:
rootFolderOrFile: $(System.DefaultWorkingDirectory)
includeRootFolder: false
archiveType: zip
archiveFile: $(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip
- task: PublishBuildArtifacts@1
inputs:
pathToPublish: $(Build.ArtifactStagingDirectory)
artifactName: drop
I use npm ci instead of npm install. It installs from package-lock.json, which is faster and deterministic.
Deploy to Staging Stage
- stage: DeployStaging
dependsOn: Build
jobs:
- deployment: DeployToStaging
environment: staging
strategy:
runOnce:
deploy:
steps:
- task: AzureWebApp@1
inputs:
azureSubscription: $(azureSubscription)
appType: webAppLinux
appName: $(appName)
deployToSlotOrASE: true
resourceGroupName: $(resourceGroup)
slotName: $(slotName)
package: $(Pipeline.Workspace)/drop/$(Build.BuildId).zip
runtimeStack: NODE|20-lts
- task: AzureAppServiceSettings@1
inputs:
azureSubscription: $(azureSubscription)
appName: $(appName)
resourceGroupName: $(resourceGroup)
slotName: $(slotName)
appSettings: |
[
{ "name": "SLOT_NAME", "value": "staging", "slotSetting": true }
]
The deployment job integrates with Azure DevOps environments for deployment history and approval gates.
Health Checks Before Swap
This is where most teams skip a step and pay for it later. After deploying to staging, verify the application is healthy before swapping.
- stage: ValidateHealth
dependsOn: DeployStaging
jobs:
- job: HealthCheck
steps:
- script: |
echo Waiting 30 seconds...
sleep 30
STAGING_URL=https://$(appName)-$(slotName).azurewebsites.net
MAX_RETRIES=10
ATTEMPT=0
while [ $ATTEMPT -lt $MAX_RETRIES ]; do
ATTEMPT=$((ATTEMPT + 1))
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" $STAGING_URL/health)
if [ $HTTP_STATUS = 200 ]; then
echo Health check passed
exit 0
fi
sleep 15
done
echo Health checks failed
exit 1
displayName: Verify staging health
- script: |
STAGING_URL=https://$(appName)-$(slotName).azurewebsites.net
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" $STAGING_URL/)
        if [ "$HTTP_STATUS" != "200" ]; then exit 1; fi
echo All smoke tests passed
displayName: Run smoke tests
The health check waits 30 seconds for warm-up, then retries up to 10 times with 15-second intervals.
Slot Swapping and Traffic Routing
Once health checks pass, swap traffic from staging to production.
Instant Swap
- stage: SwapToProduction
dependsOn: ValidateHealth
jobs:
- deployment: SwapSlots
environment: production
strategy:
runOnce:
deploy:
steps:
- task: AzureAppServiceManage@0
inputs:
azureSubscription: $(azureSubscription)
action: Swap Slots
webAppName: $(appName)
resourceGroupName: $(resourceGroup)
sourceSlot: $(slotName)
The AzureAppServiceManage@0 task performs the atomic swap. The old production code is now in staging -- your rollback target.
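If you ever need to run the same swap outside the pipeline -- for example, from a terminal during an incident -- the Azure CLI equivalent is:
# Swap the staging slot into production manually
az webapp deployment slot swap \
  --name my-node-app \
  --resource-group my-resource-group \
  --slot staging \
  --target-slot production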
Gradual Traffic Routing
For cautious releases, route a percentage of traffic to staging first:
- task: AzureCLI@2
inputs:
azureSubscription: $(azureSubscription)
scriptType: bash
scriptLocation: inlineScript
inlineScript: |
      az webapp traffic-routing set --name $(appName) --resource-group $(resourceGroup) --distribution staging=10
      sleep 300
      az webapp traffic-routing set --name $(appName) --resource-group $(resourceGroup) --distribution staging=50
      sleep 300
      az webapp traffic-routing clear --name $(appName) --resource-group $(resourceGroup)
This routes 10% of traffic to the staging slot first, waits five minutes, raises it to 50%, waits again, and then clears the routing rules so the swap stage can move all traffic at once.
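While the split is active, you can verify the current distribution before moving to the next step:
# Show the current traffic-routing rules for the app
az webapp traffic-routing show \
  --name my-node-app \
  --resource-group my-resource-group \
  --output table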
Automated Rollback
The rollback stage runs only when the swap stage fails:
- stage: Rollback
dependsOn: SwapToProduction
condition: failed()
jobs:
- deployment: RollbackSwap
environment: production
strategy:
runOnce:
deploy:
steps:
- task: AzureAppServiceManage@0
inputs:
azureSubscription: $(azureSubscription)
action: Swap Slots
webAppName: $(appName)
resourceGroupName: $(resourceGroup)
sourceSlot: $(slotName)
- task: AzureCLI@2
inputs:
azureSubscription: $(azureSubscription)
scriptType: bash
scriptLocation: inlineScript
inlineScript: |
                az webapp traffic-routing clear --name $(appName) --resource-group $(resourceGroup)
                echo "Rollback complete."
The condition: failed() means this stage only runs if SwapToProduction fails. Rollback is just another swap. It takes seconds, not minutes. The old version is sitting in the staging slot, fully warmed and ready.
Database Migration Considerations
Blue-green deployments add complexity when your release includes database schema changes. During the swap, both versions need to work with the same database. The rule is simple: all database migrations must be backward-compatible.
Adding a New Field
This is safe. Add the new field with a default value, or let it be absent, so documents written by either version of the app stay readable:
// migration-001-add-preferences.js
// The migration runner passes a connected db handle into up() and down()
function up(db) {
return db.collection("users").updateMany(
{ preferences: { $exists: false } },
{ $set: { preferences: { theme: "light", notifications: true } } }
);
}
function down(db) {
return db.collection("users").updateMany(
{},
{ $unset: { preferences: "" } }
);
}
module.exports = { up: up, down: down };
Renaming a Field
This requires a two-phase approach. In the first release, add the new column and write to both:
// Phase 1 -- write to both fields
function updateUserName(userId, newName) {
return db.collection("users").updateOne(
{ _id: userId },
{ $set: { name: newName, displayName: newName } }
);
}
// Read from new field with fallback
function getUserDisplayName(user) {
return user.displayName || user.name;
}
In the second release, once the old version is fully retired, read and write only displayName and drop the old name field with a cleanup migration.
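A minimal sketch of that cleanup migration, following the same up/down convention as before (the file name is illustrative, and the pipeline-style update in down requires MongoDB 4.2 or later):
// migration-002-remove-name.js -- phase 2 cleanup after the rename
function up(db) {
  // Drop the old field now that nothing reads or writes it
  return db.collection("users").updateMany(
    { name: { $exists: true } },
    { $unset: { name: "" } }
  );
}
function down(db) {
  // Restore the old field by copying it back from displayName (MongoDB 4.2+ pipeline update)
  return db.collection("users").updateMany(
    { displayName: { $exists: true } },
    [{ $set: { name: "$displayName" } }]
  );
}
module.exports = { up: up, down: down };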
Monitoring Post-Swap
After a swap, monitor the application closely. Azure Application Insights gives you real-time visibility into error rates, response times, and throughput:
// monitoring.js - Post-deployment monitoring helper
var https = require("https");
// appId is the Application Insights Application ID (not the instrumentation key)
function checkApplicationHealth(appId, callback) {
var query = encodeURIComponent(
"requests | where timestamp > ago(5m) | summarize " +
"totalRequests = count(), " +
"failedRequests = countif(success == false), " +
"avgDuration = avg(duration)"
);
var options = {
hostname: "api.applicationinsights.io",
  path: "/v1/apps/" + appId + "/query?query=" + query,
method: "GET",
headers: {
"x-api-key": process.env.APP_INSIGHTS_API_KEY
}
};
var req = https.request(options, function(res) {
var data = "";
res.on("data", function(chunk) { data += chunk; });
res.on("end", function() {
var result = JSON.parse(data);
var row = result.tables[0].rows[0];
var metrics = {
totalRequests: row[0],
failedRequests: row[1],
avgDuration: row[2],
errorRate: row[0] > 0 ? (row[1] / row[0] * 100).toFixed(2) : 0
};
callback(null, metrics);
});
});
req.on("error", function(err) { callback(err); });
req.end();
}
checkApplicationHealth(process.env.APP_INSIGHTS_APP_ID, function(err, metrics) {
if (err) {
console.error("Failed to check health:", err.message);
process.exit(1);
}
console.log("Post-swap metrics:");
console.log(" Total requests:", metrics.totalRequests);
console.log(" Failed requests:", metrics.failedRequests);
console.log(" Error rate:", metrics.errorRate + "%");
console.log(" Avg duration:", Math.round(metrics.avgDuration) + "ms");
if (parseFloat(metrics.errorRate) > 5) {
console.error("Error rate exceeds 5%. Consider rollback.");
process.exit(1);
}
console.log("Post-swap health check passed.");
});
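One way to wire this check into the release is a monitoring stage after the swap. The stage below is a sketch: it assumes monitoring.js is committed at the repository root and that appInsightsAppId and appInsightsApiKey are defined as pipeline variables (the API key as a secret):
- stage: PostSwapMonitor
  dependsOn: SwapToProduction
  condition: succeeded()
  jobs:
  - job: CheckMetrics
    steps:
    - task: NodeTool@0
      inputs:
        versionSpec: 20.x
    - script: node monitoring.js
      displayName: Check post-swap error rate
      env:
        # Assumed pipeline variables; secret variables must be mapped into env explicitly
        APP_INSIGHTS_APP_ID: $(appInsightsAppId)
        APP_INSIGHTS_API_KEY: $(appInsightsApiKey)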
Complete Working Example
Here is the full pipeline YAML combining every stage into one file. Save this as azure-pipelines.yml in your repository root:
trigger:
branches:
include:
- main
pool:
vmImage: ubuntu-latest
variables:
azureSubscription: my-azure-service-connection
appName: my-node-app
resourceGroup: my-resource-group
slotName: staging
stages:
- stage: Build
jobs:
- job: BuildJob
steps:
- task: NodeTool@0
inputs:
versionSpec: 20.x
- script: npm ci
displayName: Install dependencies
- script: npm test
displayName: Run tests
- script: npm prune --production
displayName: Remove dev dependencies
- task: ArchiveFiles@2
inputs:
rootFolderOrFile: $(System.DefaultWorkingDirectory)
includeRootFolder: false
archiveType: zip
archiveFile: $(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip
- task: PublishBuildArtifacts@1
inputs:
pathToPublish: $(Build.ArtifactStagingDirectory)
artifactName: drop
- stage: DeployStaging
dependsOn: Build
jobs:
- deployment: DeployToStaging
environment: staging
strategy:
runOnce:
deploy:
steps:
- task: AzureWebApp@1
inputs:
azureSubscription: $(azureSubscription)
appType: webAppLinux
appName: $(appName)
deployToSlotOrASE: true
resourceGroupName: $(resourceGroup)
slotName: $(slotName)
package: $(Pipeline.Workspace)/drop/$(Build.BuildId).zip
runtimeStack: NODE|20-lts
- task: AzureAppServiceSettings@1
inputs:
azureSubscription: $(azureSubscription)
appName: $(appName)
resourceGroupName: $(resourceGroup)
slotName: $(slotName)
appSettings: |
[
{ "name": "SLOT_NAME", "value": "staging", "slotSetting": true }
]
- stage: ValidateHealth
dependsOn: DeployStaging
jobs:
- job: HealthCheck
steps:
- script: |
echo Waiting 30 seconds for warm-up...
sleep 30
STAGING_URL=https://$(appName)-$(slotName).azurewebsites.net
MAX_RETRIES=10
ATTEMPT=0
while [ $ATTEMPT -lt $MAX_RETRIES ]; do
ATTEMPT=$((ATTEMPT + 1))
echo "Attempt $ATTEMPT of $MAX_RETRIES"
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" $STAGING_URL/health)
if [ $HTTP_STATUS = 200 ]; then
echo "Health check passed (HTTP $HTTP_STATUS)"
exit 0
fi
echo "Health check returned HTTP $HTTP_STATUS, retrying in 15s..."
sleep 15
done
echo "Health checks failed after $MAX_RETRIES attempts"
exit 1
displayName: Verify staging health
- script: |
STAGING_URL=https://$(appName)-$(slotName).azurewebsites.net
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" $STAGING_URL/)
        if [ "$HTTP_STATUS" != "200" ]; then
echo "Smoke test failed (HTTP $HTTP_STATUS)"
exit 1
fi
echo "All smoke tests passed"
displayName: Run smoke tests
- stage: SwapToProduction
dependsOn: ValidateHealth
jobs:
- deployment: SwapSlots
environment: production
strategy:
runOnce:
deploy:
steps:
- task: AzureAppServiceManage@0
inputs:
azureSubscription: $(azureSubscription)
action: Swap Slots
webAppName: $(appName)
resourceGroupName: $(resourceGroup)
sourceSlot: $(slotName)
- stage: Rollback
dependsOn: SwapToProduction
condition: failed()
jobs:
- deployment: RollbackSwap
environment: production
strategy:
runOnce:
deploy:
steps:
- task: AzureAppServiceManage@0
inputs:
azureSubscription: $(azureSubscription)
action: Swap Slots
webAppName: $(appName)
resourceGroupName: $(resourceGroup)
sourceSlot: $(slotName)
- task: AzureCLI@2
inputs:
azureSubscription: $(azureSubscription)
scriptType: bash
scriptLocation: inlineScript
inlineScript: |
az webapp traffic-routing clear --name $(appName) --resource-group $(resourceGroup)
echo "Rollback complete. Traffic restored to previous version."
This pipeline builds your Node.js application, deploys it to the staging slot, validates its health, swaps it into production, and automatically rolls back if the swap stage fails. Every stage is conditional on the previous one succeeding, except for Rollback which runs only on failure.
Common Issues and Troubleshooting
1. Swap Fails with 409 Conflict
Error:
Error: Conflict - Cannot swap slots because target slot 'production' has ongoing operation.
This happens when Azure is still processing a previous operation on the slot. Wait for the operation to complete or cancel it:
az webapp deployment slot list --name my-node-app --resource-group my-resource-group --query "[].{Name:name, State:state}"
The fix is to add a retry loop in your swap step or increase the timeout in your pipeline task.
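If the conflict is transient, a retry wrapper around the CLI swap usually gets past it. A sketch, using the same app and resource group names as above:
# Retry the slot swap up to 5 times, waiting 30 seconds between attempts
ATTEMPT=0
until az webapp deployment slot swap \
        --name my-node-app \
        --resource-group my-resource-group \
        --slot staging \
        --target-slot production; do
  ATTEMPT=$((ATTEMPT + 1))
  if [ $ATTEMPT -ge 5 ]; then
    echo "Swap still failing after $ATTEMPT attempts"
    exit 1
  fi
  echo "Swap attempt $ATTEMPT failed, retrying in 30 seconds..."
  sleep 30
done
echo "Swap succeeded"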
2. Health Check Returns 503 After Deploy
Error:
Health check returned HTTP 503, retrying in 15s...
Health checks failed after 10 attempts
The app has not finished starting. Common causes: slow database connection initialization, large dependency trees taking time to load, or missing environment variables in the staging slot. Increase the initial wait time from 30 seconds to 60 or 90, and verify that all required environment variables are set on the staging slot:
az webapp config appsettings list --name my-node-app --resource-group my-resource-group --slot staging --output table
3. Slot Settings Not Sticking After Swap
Error: Your staging slot starts behaving like production after a swap, or vice versa.
This means you configured settings with --settings instead of --slot-settings. Regular settings travel with the app during swap. Slot settings stay with the slot. Review which settings should be sticky:
az webapp config appsettings list --name my-node-app --resource-group my-resource-group --query '[?slotSetting==`true`].{Name:name, SlotSetting:slotSetting}'
4. Traffic Routing Not Clearing After Rollback
Error: After a rollback, some users still see the new (broken) version.
Traffic routing percentages persist until explicitly cleared. Make sure your rollback stage includes the traffic routing clear command:
az webapp traffic-routing clear --name my-node-app --resource-group my-resource-group
az webapp traffic-routing show --name my-node-app --resource-group my-resource-group
5. Database Connection Pool Exhaustion During Swap
Error:
MongoServerError: connection pool exhausted, no available connections
During a swap, both slots run simultaneously for a short period, and each holds its own connection pool against the same database. If a single instance's pool is sized close to the database's connection limit, two instances will exceed it. Reduce the pool size per instance or raise the database's connection limit:
// Reduce pool size to account for two slots running during swap
var mongoose = require("mongoose");
mongoose.connect(process.env.DB_CONNECTION, {
  maxPoolSize: 5 // Half of the usual size, since both slots open pools against the same database during a swap
});
6. Pipeline Hangs on Deployment Environment Approval
Error: The pipeline sits at "Waiting for approval" indefinitely.
Azure DevOps environments can have approval gates configured. If your production environment requires manual approval, the pipeline will wait. Either approve the deployment in Azure DevOps or remove the approval gate for automated deployments:
Navigate to Pipelines > Environments > production > Approvals and checks to manage gates.
Best Practices
Always run health checks before swapping. Never swap blind. A five-minute health validation catches the vast majority of deployment issues before they reach users. If you skip this step, you are relying on luck.
Make database migrations backward-compatible. Both the old and new versions of your app will briefly run against the same database during a swap. Additive changes (new columns, new tables) are safe. Destructive changes (dropping columns, renaming tables) require a two-phase approach across two releases.
Use slot-sticky settings for environment identifiers. Settings like SLOT_NAME, logging levels, and feature flags that should differ between staging and production must be configured as slot settings. Otherwise, they swap with the app and your staging slot starts behaving like production.
Start with gradual traffic routing for high-risk releases. Route 10% of traffic first, monitor for five minutes, then increase to 50%, then complete the swap. This limits the blast radius if something goes wrong.
Keep the staging slot warm between deployments. Do not delete the staging slot after each deployment. Keeping it running means your rollback target is always ready. The cost of an idle slot is far less than the cost of downtime.
Set connection pool sizes to account for dual-slot operation. During a swap, both slots run simultaneously and share database connections. Size your connection pools at half of what a single instance would use.
Monitor error rates for at least 15 minutes after a swap. Some issues only surface under sustained load. Use Application Insights or your monitoring tool to watch error rates, response times, and throughput after every swap.
Tag your deployments in source control. After a successful swap, tag the commit in Git. This makes it easy to identify exactly which code is running in production and what changed between releases. A minimal pipeline step for this is sketched at the end of this section.
Test your rollback procedure regularly. Do not wait for a production incident to find out your rollback is broken. Practice rolling back in a staging environment at least once a month.
Keep your pipeline YAML in source control. Treat your pipeline definition as code. Review changes to it with the same rigor as application code changes. A bad pipeline change can be worse than a bad application change.
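For the tagging practice above, here is a sketch of a pipeline step that tags the deployed commit. It assumes the job checks out the repository with persistCredentials: true and that the build service account has permission to push tags:
- checkout: self
  persistCredentials: true
- script: |
    git tag "release-$(Build.BuildId)"
    git push origin "release-$(Build.BuildId)"
  displayName: Tag the deployed commit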