Implementing Blue-Green Deployments with Azure Pipelines
A practical guide to implementing blue-green deployments with Azure Pipelines for zero-downtime releases using deployment slots.
Overview
Blue-green deployment is a release strategy where you maintain two identical production environments and switch traffic between them during releases, eliminating downtime entirely. Azure App Service deployment slots make this pattern straightforward to implement, and Azure Pipelines gives you the automation backbone to deploy to a staging slot, verify health, swap traffic, and roll back automatically if something goes wrong. If you are running a Node.js application in production and your users cannot tolerate downtime during releases, this is the approach you should be using.
I have been running blue-green deployments in production for over five years now, across dozens of Node.js services. The pattern works. It catches problems before they reach users, and when something does slip through, you can roll back in seconds instead of minutes. This article walks through the complete implementation from scratch -- slot configuration, pipeline YAML, health checks, traffic routing, automated rollback, and the database migration strategy that makes it all work together.
Prerequisites
- An Azure subscription with an App Service Plan on the Standard tier or higher (deployment slots are not available on the Free or Basic tiers)
- An Azure DevOps organization with a project and a pipeline configured
- An Azure Resource Manager service connection in Azure DevOps linked to your subscription
- A Node.js application deployed to Azure App Service
- Familiarity with YAML pipeline syntax in Azure DevOps
- Azure CLI installed locally for testing slot operations
The Blue-Green Deployment Concept
Traditional deployments follow a simple pattern: stop the old version, deploy the new version, start it up, hope nothing breaks. During that window your users see errors, timeouts, or a maintenance page. Blue-green deployment eliminates that window completely.
The idea is simple. You have two environments:
- Blue -- the live production environment currently serving traffic
- Green -- an idle environment where you deploy the new version
You deploy your new code to the green environment. You run health checks against it. You verify it works. Then you swap traffic from blue to green in one atomic operation. The old blue environment becomes your instant rollback target. If something goes wrong in production after the swap, you swap back in seconds.
In Azure App Service, deployment slots are purpose-built for this pattern. Every App Service has a production slot by default. You create a staging slot (the green environment), deploy there, validate, and then swap. The swap operation is handled at the load balancer level -- it redirects traffic by swapping the virtual IP mappings. No cold starts, no connection drops.
Here is why this matters for Node.js applications specifically:
- Node.js cold starts are real. A Node.js app needs to load modules, establish database connection pools, warm caches. With blue-green, the staging slot is fully warmed before it receives production traffic.
- npm install failures are caught early. If a dependency fails to install or a native module fails to compile, you find out in the staging slot, not in production.
- Database migrations can be tested. Your staging slot can run against the production database (or a replica) to verify migrations before the swap.
Setting Up Azure App Service Deployment Slots
First, create the staging slot using Azure CLI:
# Create a staging deployment slot
az webapp deployment slot create \
  --name my-node-app \
  --resource-group my-resource-group \
  --slot staging

# Verify the slot was created
az webapp deployment slot list \
  --name my-node-app \
  --resource-group my-resource-group \
  --output table
Output:
Name     ResourceGroup      Status
-------  -----------------  -------
staging  my-resource-group  Running
Now configure the staging slot with appropriate settings. There is a critical distinction in Azure App Service between slot settings (sticky to the slot) and regular settings (travel with the app during swap):
# Set a slot-specific setting (stays with the slot, does not swap)
az webapp config appsettings set \
  --name my-node-app \
  --resource-group my-resource-group \
  --slot staging \
  --slot-settings SLOT_NAME=staging

# Set connection strings that should swap with the app
az webapp config appsettings set \
  --name my-node-app \
  --resource-group my-resource-group \
  --slot staging \
  --settings DB_CONNECTION="mongodb://prod-server:27017/myapp"
This is important: slot settings (configured with --slot-settings) are sticky. They stay with the slot even after a swap. Use these for things like logging levels, feature flags that should differ between environments, and slot identifiers. Regular settings (configured with --settings) travel with the app code during a swap. Use these for database connection strings and API keys that the app code needs regardless of which slot it runs in.
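Before the first swap, it is worth confirming which settings are actually marked as sticky. The query below lists every setting on the staging slot together with its slotSetting flag, using the same app and resource group names as above:
# List staging slot settings and whether each one is sticky to the slot
az webapp config appsettings list \
  --name my-node-app \
  --resource-group my-resource-group \
  --slot staging \
  --query '[].{Name:name, Sticky:slotSetting}' \
  --output table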
For your Node.js app, add a health check endpoint:
var express = require("express");
var mongoose = require("mongoose");
var app = express();
var isReady = false;
// Readiness probe - checks that the app is fully initialized
app.get("/health", function(req, res) {
if (!isReady) {
return res.status(503).json({
status: "not_ready",
timestamp: new Date().toISOString()
});
}
var checks = {
uptime: process.uptime(),
memory: process.memoryUsage(),
timestamp: new Date().toISOString()
};
// Check database connectivity
if (mongoose.connection.readyState !== 1) {
checks.database = "disconnected";
return res.status(503).json(checks);
}
checks.database = "connected";
checks.status = "healthy";
res.status(200).json(checks);
});
// Liveness probe - basic check that the process is alive
app.get("/health/live", function(req, res) {
res.status(200).json({ status: "alive" });
});
mongoose.connect(process.env.DB_CONNECTION).then(function() {
  console.log("Database connected");
  isReady = true;
}).catch(function(err) {
  // Keep isReady false so /health keeps returning 503 until the connection succeeds
  console.error("Database connection failed:", err.message);
});
var port = process.env.PORT || 8080;
app.listen(port, function() {
console.log("Server running on port " + port);
});
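A quick local sanity check of both endpoints before wiring them into the pipeline -- this sketch assumes the entry point is saved as app.js and that a test database is reachable at the connection string you pass in:
# Start the app against a local test database (adjust the connection string as needed)
DB_CONNECTION="mongodb://localhost:27017/myapp" node app.js
# Readiness: returns 503 until the database connects, then 200
curl -i http://localhost:8080/health
# Liveness: returns 200 as soon as the process is listening
curl -i http://localhost:8080/health/live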
Building the Pipeline YAML
The pipeline has four stages: Build, Deploy, Validate, and Swap.
Build Stage
trigger:
branches:
include:
- main
pool:
vmImage: ubuntu-latest
variables:
azureSubscription: my-azure-service-connection
appName: my-node-app
resourceGroup: my-resource-group
slotName: staging
stages:
- stage: Build
jobs:
- job: BuildJob
steps:
- task: NodeTool@0
inputs:
versionSpec: 20.x
- script: npm ci
- script: npm test
- script: npm prune --production
- task: ArchiveFiles@2
inputs:
rootFolderOrFile: $(System.DefaultWorkingDirectory)
includeRootFolder: false
archiveType: zip
archiveFile: $(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip
- task: PublishBuildArtifacts@1
inputs:
pathToPublish: $(Build.ArtifactStagingDirectory)
artifactName: drop
I use npm ci instead of npm install. It installs from package-lock.json, which is faster and deterministic.
Deploy to Staging Stage
- stage: DeployStaging
dependsOn: Build
jobs:
- deployment: DeployToStaging
environment: staging
strategy:
runOnce:
deploy:
steps:
- task: AzureWebApp@1
inputs:
azureSubscription: $(azureSubscription)
appType: webAppLinux
appName: $(appName)
deployToSlotOrASE: true
resourceGroupName: $(resourceGroup)
slotName: $(slotName)
package: $(Pipeline.Workspace)/drop/$(Build.BuildId).zip
runtimeStack: NODE|20-lts
- task: AzureAppServiceSettings@1
inputs:
azureSubscription: $(azureSubscription)
appName: $(appName)
resourceGroupName: $(resourceGroup)
slotName: $(slotName)
appSettings: |
[
{ "name": "SLOT_NAME", "value": "staging", "slotSetting": true }
]
The deployment job integrates with Azure DevOps environments for deployment history and approval gates.
Health Checks Before Swap
This is where most teams skip a step and pay for it later. After deploying to staging, verify the application is healthy before swapping.
- stage: ValidateHealth
dependsOn: DeployStaging
jobs:
- job: HealthCheck
steps:
- script: |
echo Waiting 30 seconds...
sleep 30
STAGING_URL=https://$(appName)-$(slotName).azurewebsites.net
MAX_RETRIES=10
ATTEMPT=0
while [ $ATTEMPT -lt $MAX_RETRIES ]; do
ATTEMPT=$((ATTEMPT + 1))
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" $STAGING_URL/health)
if [ $HTTP_STATUS = 200 ]; then
echo Health check passed
exit 0
fi
sleep 15
done
echo Health checks failed
exit 1
displayName: Verify staging health
- script: |
STAGING_URL=https://$(appName)-$(slotName).azurewebsites.net
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" $STAGING_URL/)
        if [ "$HTTP_STATUS" != "200" ]; then exit 1; fi
echo All smoke tests passed
displayName: Run smoke tests
The health check waits 30 seconds for warm-up, then retries up to 10 times with 15-second intervals.
Slot Swapping and Traffic Routing
Once health checks pass, swap traffic from staging to production.
Instant Swap
- stage: SwapToProduction
dependsOn: ValidateHealth
jobs:
- deployment: SwapSlots
environment: production
strategy:
runOnce:
deploy:
steps:
- task: AzureAppServiceManage@0
inputs:
azureSubscription: $(azureSubscription)
action: Swap Slots
webAppName: $(appName)
resourceGroupName: $(resourceGroup)
sourceSlot: $(slotName)
The AzureAppServiceManage@0 task performs the atomic swap. The old production code is now in staging -- your rollback target.
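If you ever need to run the same swap outside the pipeline -- for example, from a terminal during an incident -- the Azure CLI equivalent is:
# Swap the staging slot into production manually
az webapp deployment slot swap \
  --name my-node-app \
  --resource-group my-resource-group \
  --slot staging \
  --target-slot production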
Gradual Traffic Routing
For cautious releases, route a percentage of traffic to staging first:
- task: AzureCLI@2
inputs:
azureSubscription: $(azureSubscription)
scriptType: bash
scriptLocation: inlineScript
inlineScript: |
      az webapp traffic-routing set --name $(appName) --resource-group $(resourceGroup) --distribution staging=10
      sleep 300
      az webapp traffic-routing set --name $(appName) --resource-group $(resourceGroup) --distribution staging=50
      sleep 300
      az webapp traffic-routing clear --name $(appName) --resource-group $(resourceGroup)
This routes 10% of traffic to the staging slot first, waits five minutes, raises it to 50%, waits again, and then clears the routing rules so the swap stage can move all traffic at once.
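While the split is active, you can verify the current distribution before moving to the next step:
# Show the current traffic-routing rules for the app
az webapp traffic-routing show \
  --name my-node-app \
  --resource-group my-resource-group \
  --output table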
Automated Rollback
The rollback stage runs only when the swap stage fails:
- stage: Rollback
dependsOn: SwapToProduction
condition: failed()
jobs:
- deployment: RollbackSwap
environment: production
strategy:
runOnce:
deploy:
steps:
- task: AzureAppServiceManage@0
inputs:
azureSubscription: $(azureSubscription)
action: Swap Slots
webAppName: $(appName)
resourceGroupName: $(resourceGroup)
sourceSlot: $(slotName)
- task: AzureCLI@2
inputs:
azureSubscription: $(azureSubscription)
scriptType: bash
scriptLocation: inlineScript
inlineScript: |
                az webapp traffic-routing clear --name $(appName) --resource-group $(resourceGroup)
                echo "Rollback complete."
The condition: failed() means this stage only runs if SwapToProduction fails. Rollback is just another swap. It takes seconds, not minutes. The old version is sitting in the staging slot, fully warmed and ready.
Database Migration Considerations
Blue-green deployments add complexity when your release includes database schema changes. During the swap, both versions need to work with the same database. The rule is simple: all database migrations must be backward-compatible.
Adding a New Field
This is safe. Add the new field with a default value, or let it be absent, so documents written by either version of the app stay readable:
// migration-001-add-preferences.js
// The migration runner passes a connected db handle into up() and down()
function up(db) {
return db.collection("users").updateMany(
{ preferences: { $exists: false } },
{ $set: { preferences: { theme: "light", notifications: true } } }
);
}
function down(db) {
return db.collection("users").updateMany(
{},
{ $unset: { preferences: "" } }
);
}
module.exports = { up: up, down: down };
Renaming a Field
This requires a two-phase approach. In the first release, add the new column and write to both:
// Phase 1 -- write to both fields
function updateUserName(userId, newName) {
return db.collection("users").updateOne(
{ _id: userId },
{ $set: { name: newName, displayName: newName } }
);
}
// Read from new field with fallback
function getUserDisplayName(user) {
return user.displayName || user.name;
}
In the second release, once the old version is fully retired, read and write only displayName and drop the old name field with a cleanup migration.
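A minimal sketch of that cleanup migration, following the same up/down convention as before (the file name is illustrative, and the pipeline-style update in down requires MongoDB 4.2 or later):
// migration-002-remove-name.js -- phase 2 cleanup after the rename
function up(db) {
  // Drop the old field now that nothing reads or writes it
  return db.collection("users").updateMany(
    { name: { $exists: true } },
    { $unset: { name: "" } }
  );
}
function down(db) {
  // Restore the old field by copying it back from displayName (MongoDB 4.2+ pipeline update)
  return db.collection("users").updateMany(
    { displayName: { $exists: true } },
    [{ $set: { name: "$displayName" } }]
  );
}
module.exports = { up: up, down: down };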
Monitoring Post-Swap
After a swap, monitor the application closely. Azure Application Insights gives you real-time visibility into error rates, response times, and throughput:
// monitoring.js - Post-deployment monitoring helper
var https = require("https");
// appId is the Application Insights Application ID (not the instrumentation key)
function checkApplicationHealth(appId, callback) {
var query = encodeURIComponent(
"requests | where timestamp > ago(5m) | summarize " +
"totalRequests = count(), " +
"failedRequests = countif(success == false), " +
"avgDuration = avg(duration)"
);
var options = {
hostname: "api.applicationinsights.io",
  path: "/v1/apps/" + appId + "/query?query=" + query,
method: "GET",
headers: {
"x-api-key": process.env.APP_INSIGHTS_API_KEY
}
};
var req = https.request(options, function(res) {
var data = "";
res.on("data", function(chunk) { data += chunk; });
res.on("end", function() {
var result = JSON.parse(data);
var row = result.tables[0].rows[0];
var metrics = {
totalRequests: row[0],
failedRequests: row[1],
avgDuration: row[2],
errorRate: row[0] > 0 ? (row[1] / row[0] * 100).toFixed(2) : 0
};
callback(null, metrics);
});
});
req.on("error", function(err) { callback(err); });
req.end();
}
checkApplicationHealth(process.env.APP_INSIGHTS_APP_ID, function(err, metrics) {
if (err) {
console.error("Failed to check health:", err.message);
process.exit(1);
}
console.log("Post-swap metrics:");
console.log(" Total requests:", metrics.totalRequests);
console.log(" Failed requests:", metrics.failedRequests);
console.log(" Error rate:", metrics.errorRate + "%");
console.log(" Avg duration:", Math.round(metrics.avgDuration) + "ms");
if (parseFloat(metrics.errorRate) > 5) {
console.error("Error rate exceeds 5%. Consider rollback.");
process.exit(1);
}
console.log("Post-swap health check passed.");
});
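One way to wire this check into the release is a monitoring stage after the swap. The stage below is a sketch: it assumes monitoring.js is committed at the repository root and that appInsightsAppId and appInsightsApiKey are defined as pipeline variables (the API key as a secret):
- stage: PostSwapMonitor
  dependsOn: SwapToProduction
  condition: succeeded()
  jobs:
  - job: CheckMetrics
    steps:
    - task: NodeTool@0
      inputs:
        versionSpec: 20.x
    - script: node monitoring.js
      displayName: Check post-swap error rate
      env:
        # Assumed pipeline variables; secret variables must be mapped into env explicitly
        APP_INSIGHTS_APP_ID: $(appInsightsAppId)
        APP_INSIGHTS_API_KEY: $(appInsightsApiKey)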
Complete Working Example
Here is the full pipeline YAML combining every stage into one file. Save this as azure-pipelines.yml in your repository root:
trigger:
branches:
include:
- main
pool:
vmImage: ubuntu-latest
variables:
azureSubscription: my-azure-service-connection
appName: my-node-app
resourceGroup: my-resource-group
slotName: staging
stages:
- stage: Build
jobs:
- job: BuildJob
steps:
- task: NodeTool@0
inputs:
versionSpec: 20.x
- script: npm ci
displayName: Install dependencies
- script: npm test
displayName: Run tests
- script: npm prune --production
displayName: Remove dev dependencies
- task: ArchiveFiles@2
inputs:
rootFolderOrFile: $(System.DefaultWorkingDirectory)
includeRootFolder: false
archiveType: zip
archiveFile: $(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip
- task: PublishBuildArtifacts@1
inputs:
pathToPublish: $(Build.ArtifactStagingDirectory)
artifactName: drop
- stage: DeployStaging
dependsOn: Build
jobs:
- deployment: DeployToStaging
environment: staging
strategy:
runOnce:
deploy:
steps:
- task: AzureWebApp@1
inputs:
azureSubscription: $(azureSubscription)
appType: webAppLinux
appName: $(appName)
deployToSlotOrASE: true
resourceGroupName: $(resourceGroup)
slotName: $(slotName)
package: $(Pipeline.Workspace)/drop/$(Build.BuildId).zip
runtimeStack: NODE|20-lts
- task: AzureAppServiceSettings@1
inputs:
azureSubscription: $(azureSubscription)
appName: $(appName)
resourceGroupName: $(resourceGroup)
slotName: $(slotName)
appSettings: |
[
{ "name": "SLOT_NAME", "value": "staging", "slotSetting": true }
]
- stage: ValidateHealth
dependsOn: DeployStaging
jobs:
- job: HealthCheck
steps:
- script: |
echo Waiting 30 seconds for warm-up...
sleep 30
STAGING_URL=https://$(appName)-$(slotName).azurewebsites.net
MAX_RETRIES=10
ATTEMPT=0
while [ $ATTEMPT -lt $MAX_RETRIES ]; do
ATTEMPT=$((ATTEMPT + 1))
echo "Attempt $ATTEMPT of $MAX_RETRIES"
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" $STAGING_URL/health)
if [ $HTTP_STATUS = 200 ]; then
echo "Health check passed (HTTP $HTTP_STATUS)"
exit 0
fi
echo "Health check returned HTTP $HTTP_STATUS, retrying in 15s..."
sleep 15
done
echo "Health checks failed after $MAX_RETRIES attempts"
exit 1
displayName: Verify staging health
- script: |
STAGING_URL=https://$(appName)-$(slotName).azurewebsites.net
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" $STAGING_URL/)
        if [ "$HTTP_STATUS" != "200" ]; then
echo "Smoke test failed (HTTP $HTTP_STATUS)"
exit 1
fi
echo "All smoke tests passed"
displayName: Run smoke tests
- stage: SwapToProduction
dependsOn: ValidateHealth
jobs:
- deployment: SwapSlots
environment: production
strategy:
runOnce:
deploy:
steps:
- task: AzureAppServiceManage@0
inputs:
azureSubscription: $(azureSubscription)
action: Swap Slots
webAppName: $(appName)
resourceGroupName: $(resourceGroup)
sourceSlot: $(slotName)
- stage: Rollback
dependsOn: SwapToProduction
condition: failed()
jobs:
- deployment: RollbackSwap
environment: production
strategy:
runOnce:
deploy:
steps:
- task: AzureAppServiceManage@0
inputs:
azureSubscription: $(azureSubscription)
action: Swap Slots
webAppName: $(appName)
resourceGroupName: $(resourceGroup)
sourceSlot: $(slotName)
- task: AzureCLI@2
inputs:
azureSubscription: $(azureSubscription)
scriptType: bash
scriptLocation: inlineScript
inlineScript: |
az webapp traffic-routing clear --name $(appName) --resource-group $(resourceGroup)
echo "Rollback complete. Traffic restored to previous version."
This pipeline builds your Node.js application, deploys it to the staging slot, validates its health, swaps it into production, and automatically rolls back if the swap stage fails. Every stage is conditional on the previous one succeeding, except for Rollback which runs only on failure.
Common Issues and Troubleshooting
1. Swap Fails with 409 Conflict
Error:
Error: Conflict - Cannot swap slots because target slot 'production' has ongoing operation.
This happens when Azure is still processing a previous operation on the slot. Wait for the operation to complete or cancel it:
az webapp deployment slot list --name my-node-app --resource-group my-resource-group --query "[].{Name:name, State:state}"
The fix is to add a retry loop in your swap step or increase the timeout in your pipeline task.
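If the conflict is transient, a retry wrapper around the CLI swap usually gets past it. A sketch, using the same app and resource group names as above:
# Retry the slot swap up to 5 times, waiting 30 seconds between attempts
ATTEMPT=0
until az webapp deployment slot swap \
        --name my-node-app \
        --resource-group my-resource-group \
        --slot staging \
        --target-slot production; do
  ATTEMPT=$((ATTEMPT + 1))
  if [ $ATTEMPT -ge 5 ]; then
    echo "Swap still failing after $ATTEMPT attempts"
    exit 1
  fi
  echo "Swap attempt $ATTEMPT failed, retrying in 30 seconds..."
  sleep 30
done
echo "Swap succeeded"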
2. Health Check Returns 503 After Deploy
Error:
Health check returned HTTP 503, retrying in 15s...
Health checks failed after 10 attempts
The app has not finished starting. Common causes: slow database connection initialization, large dependency trees taking time to load, or missing environment variables in the staging slot. Increase the initial wait time from 30 seconds to 60 or 90, and verify that all required environment variables are set on the staging slot:
az webapp config appsettings list --name my-node-app --resource-group my-resource-group --slot staging --output table
3. Slot Settings Not Sticking After Swap
Error: Your staging slot starts behaving like production after a swap, or vice versa.
This means you configured settings with --settings instead of --slot-settings. Regular settings travel with the app during swap. Slot settings stay with the slot. Review which settings should be sticky:
az webapp config appsettings list --name my-node-app --resource-group my-resource-group --query '[?slotSetting==`true`].{Name:name, SlotSetting:slotSetting}'
4. Traffic Routing Not Clearing After Rollback
Error: After a rollback, some users still see the new (broken) version.
Traffic routing percentages persist until explicitly cleared. Make sure your rollback stage includes the traffic routing clear command:
az webapp traffic-routing clear --name my-node-app --resource-group my-resource-group
az webapp traffic-routing show --name my-node-app --resource-group my-resource-group
5. Database Connection Pool Exhaustion During Swap
Error:
MongoServerError: connection pool exhausted, no available connections
During a swap, both slots run simultaneously for a short period, and each holds its own connection pool against the same database. If a single instance's pool is sized close to the database's connection limit, two instances will exceed it. Reduce the pool size per instance or raise the database's connection limit:
// Reduce pool size to account for two slots running during swap
var mongoose = require("mongoose");
mongoose.connect(process.env.DB_CONNECTION, {
  maxPoolSize: 5 // Half of the usual size, since both slots open pools against the same database during a swap
});
6. Pipeline Hangs on Deployment Environment Approval
Error: The pipeline sits at "Waiting for approval" indefinitely.
Azure DevOps environments can have approval gates configured. If your production environment requires manual approval, the pipeline will wait. Either approve the deployment in Azure DevOps or remove the approval gate for automated deployments:
Navigate to Pipelines > Environments > production > Approvals and checks to manage gates.
Best Practices
Always run health checks before swapping. Never swap blind. A five-minute health validation catches the vast majority of deployment issues before they reach users. If you skip this step, you are relying on luck.
Make database migrations backward-compatible. Both the old and new versions of your app will briefly run against the same database during a swap. Additive changes (new columns, new tables) are safe. Destructive changes (dropping columns, renaming tables) require a two-phase approach across two releases.
Use slot-sticky settings for environment identifiers. Settings like SLOT_NAME, logging levels, and feature flags that should differ between staging and production must be configured as slot settings. Otherwise, they swap with the app and your staging slot starts behaving like production.
Start with gradual traffic routing for high-risk releases. Route 10% of traffic first, monitor for five minutes, then increase to 50%, then complete the swap. This limits the blast radius if something goes wrong.
Keep the staging slot warm between deployments. Do not delete the staging slot after each deployment. Keeping it running means your rollback target is always ready. The cost of an idle slot is far less than the cost of downtime.
Set connection pool sizes to account for dual-slot operation. During a swap, both slots run simultaneously and share database connections. Size your connection pools at half of what a single instance would use.
Monitor error rates for at least 15 minutes after a swap. Some issues only surface under sustained load. Use Application Insights or your monitoring tool to watch error rates, response times, and throughput after every swap.
Tag your deployments in source control. After a successful swap, tag the commit in Git. This makes it easy to identify exactly which code is running in production and what changed between releases. A minimal pipeline step for this is sketched at the end of this section.
Test your rollback procedure regularly. Do not wait for a production incident to find out your rollback is broken. Practice rolling back in a staging environment at least once a month.
Keep your pipeline YAML in source control. Treat your pipeline definition as code. Review changes to it with the same rigor as application code changes. A bad pipeline change can be worse than a bad application change.
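For the tagging practice above, here is a sketch of a pipeline step that tags the deployed commit. It assumes the job checks out the repository with persistCredentials: true and that the build service account has permission to push tags:
- checkout: self
  persistCredentials: true
- script: |
    git tag "release-$(Build.BuildId)"
    git push origin "release-$(Build.BuildId)"
  displayName: Tag the deployed commit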