Kubernetes Deployment from Azure DevOps

Deploy Node.js applications to AKS from Azure DevOps with canary strategies, Helm charts, and health verification

Azure DevOps provides first-class integration with Azure Kubernetes Service (AKS) that goes well beyond running kubectl apply in a script step. The KubernetesManifest task, Helm task, and Kubernetes environments give you canary deployments, automated rollbacks, health verification, and approval gates — all wired into your pipeline YAML. If you are running Node.js workloads on AKS, this is the deployment model that will keep you sane at scale.

This article walks through everything from initial AKS setup and service connections to advanced deployment strategies with canary traffic splitting and automated health checks. We will build a complete pipeline that takes a Node.js application from source code to a verified production deployment on AKS.

Prerequisites

  • An Azure subscription with permissions to create AKS clusters and Azure Container Registry (ACR)
  • An Azure DevOps organization and project
  • Azure CLI installed locally (az version 2.50+)
  • kubectl installed and on your PATH
  • Docker installed for local testing
  • Basic familiarity with Kubernetes concepts (pods, deployments, services)
  • A Node.js application you want to deploy (we will create a sample one)

Azure Kubernetes Service (AKS) Setup

Before we touch Azure DevOps, we need an AKS cluster and an ACR instance. I always script this rather than clicking through the portal — it is repeatable, reviewable, and you can tear it down and recreate it in minutes.

# Create a resource group
az group create --name rg-nodejs-prod --location eastus

# Create Azure Container Registry
az acr create \
  --resource-group rg-nodejs-prod \
  --name mynodejsacr \
  --sku Basic

# Create AKS cluster with ACR integration
az aks create \
  --resource-group rg-nodejs-prod \
  --name aks-nodejs-prod \
  --node-count 3 \
  --node-vm-size Standard_B2s \
  --attach-acr mynodejsacr \
  --generate-ssh-keys \
  --network-plugin azure \
  --enable-managed-identity

# Get credentials for kubectl
az aks get-credentials \
  --resource-group rg-nodejs-prod \
  --name aks-nodejs-prod

The --attach-acr flag is critical. It creates a role assignment that lets the AKS cluster pull images from your ACR without any image pull secret configuration. Skip this and you will spend an hour debugging ImagePullBackOff errors.

For production, bump the node count and VM size. I typically run Standard_D2s_v3 with 3-5 nodes for mid-traffic Node.js services. The Standard_B2s is fine for staging.

Kubernetes Service Connection

Azure DevOps needs a service connection to talk to your AKS cluster. Go to Project Settings > Service connections > New service connection > Kubernetes. You have three options:

  1. Azure Subscription — uses your Azure AD credentials. Simplest for AKS.
  2. Service Account — uses a Kubernetes service account token. Works with any cluster.
  3. Kubeconfig — paste a kubeconfig file. Useful for non-Azure clusters.

For AKS, always use the Azure Subscription method. It handles token rotation automatically and integrates with Azure RBAC. Name it something descriptive like aks-nodejs-prod — you will reference this name in your pipeline YAML.

# How you reference the service connection in pipeline YAML
- task: KubernetesManifest@1
  inputs:
    kubernetesServiceConnection: 'aks-nodejs-prod'
    namespace: 'production'
    action: 'deploy'
    manifests: 'k8s/deployment.yaml'

Building and Pushing Images to ACR

Every Kubernetes deployment starts with a container image. The Docker@2 task in Azure Pipelines handles building and pushing to ACR in a single step. Its containerRegistry input refers to a Docker Registry service connection pointing at your ACR instance (named mynodejsacr-connection here), created under Project Settings > Service connections.

variables:
  imageRepository: 'nodejs-api'
  containerRegistry: 'mynodejsacr.azurecr.io'
  dockerfilePath: '$(Build.SourcesDirectory)/Dockerfile'
  tag: '$(Build.BuildId)'

stages:
- stage: Build
  displayName: 'Build and Push Image'
  jobs:
  - job: BuildImage
    pool:
      vmImage: 'ubuntu-latest'
    steps:
    - task: Docker@2
      displayName: 'Build and push to ACR'
      inputs:
        containerRegistry: 'mynodejsacr-connection'
        repository: '$(imageRepository)'
        command: 'buildAndPush'
        Dockerfile: '$(dockerfilePath)'
        tags: |
          $(tag)
          latest

Here is the Dockerfile for a production Node.js application. Multi-stage builds keep the final image small:

# Stage 1: install production dependencies
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

# Stage 2: slim runtime image running as a non-root user
FROM node:20-alpine
WORKDIR /app
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001
COPY --from=builder /app .
USER nodejs
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "server.js"]

Tag images with the build ID, not just latest. When Kubernetes sees the same tag, it may not pull the updated image depending on your imagePullPolicy. Using the build ID guarantees every deployment pulls a fresh image.
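
With a unique tag per build, imagePullPolicy: IfNotPresent is safe because the tag never points at different content; reach for imagePullPolicy: Always only if you must reuse a mutable tag. A sketch of the rendered container spec (the tag 4217 stands in for a real build ID):

# Rendered container spec after image substitution
containers:
- name: nodejs-api
  image: mynodejsacr.azurecr.io/nodejs-api:4217   # unique tag from $(Build.BuildId)
  imagePullPolicy: IfNotPresent                   # safe: the tag is immutable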

The KubernetesManifest Task

The KubernetesManifest@1 task is the workhorse for Kubernetes deployments in Azure Pipelines. It is not just a wrapper around kubectl apply — it handles image substitution, deployment strategies, and rollback.

- task: KubernetesManifest@1
  displayName: 'Deploy to AKS'
  inputs:
    action: 'deploy'
    kubernetesServiceConnection: 'aks-nodejs-prod'
    namespace: 'production'
    manifests: |
      k8s/deployment.yaml
      k8s/service.yaml
    containers: |
      mynodejsacr.azurecr.io/nodejs-api:$(Build.BuildId)

The containers input is the key feature. It automatically replaces the image reference in your manifest YAML with the tagged image you just built. Your manifest can reference a placeholder, and the task swaps it at deploy time:

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodejs-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nodejs-api
  template:
    metadata:
      labels:
        app: nodejs-api
    spec:
      containers:
      - name: nodejs-api
        image: mynodejsacr.azurecr.io/nodejs-api
        ports:
        - containerPort: 3000
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "250m"
        readinessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 15
          periodSeconds: 20
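
The pipeline manifests also list k8s/service.yaml, which is not shown in this article. A minimal version fronting these pods might look like the following (ClusterIP is an assumption; swap in a LoadBalancer or an Ingress for external traffic):

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nodejs-api
spec:
  type: ClusterIP
  selector:
    app: nodejs-api
  ports:
  - port: 3000
    targetPort: 3000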

Deployment Strategies

Rolling Updates

The default strategy. Kubernetes replaces pods gradually, keeping the service available throughout. Configure it in your deployment manifest:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

Setting maxUnavailable: 0 ensures zero downtime — Kubernetes will not kill an old pod until a new one passes its readiness probe.

Canary Deployments

Canary is where the KubernetesManifest task really shines. It deploys a subset of pods running the new version alongside the stable set. With the default pod-based traffic split, the share of traffic the canary receives is approximate, roughly proportional to its pod count behind the shared Service; with trafficSplitMethod: 'smi' and a service mesh, the split is an exact percentage. If the canary fails health checks, you reject it and roll back automatically.

- stage: DeployCanary
  displayName: 'Canary Deployment'
  jobs:
  - deployment: DeployCanary
    environment: 'production.nodejs-api'
    pool:
      vmImage: 'ubuntu-latest'
    strategy:
      canary:
        increments: [25, 50]
        preDeploy:
          steps:
          - script: echo "Preparing canary deployment"
        deploy:
          steps:
          - task: KubernetesManifest@1
            displayName: 'Deploy canary'
            inputs:
              action: 'deploy'
              kubernetesServiceConnection: 'aks-nodejs-prod'
              namespace: 'production'
              strategy: 'canary'
              percentage: $(strategy.increment)
              manifests: 'k8s/deployment.yaml'
              containers: |
                mynodejsacr.azurecr.io/nodejs-api:$(Build.BuildId)
        postRouteTraffic:
          steps:
          - task: Bash@3
            displayName: 'Verify canary health'
            inputs:
              targetType: 'inline'
              script: |
                echo "Running health verification..."
                # Note: this cluster-internal DNS name only resolves if the agent
                # runs inside the cluster network (for example, a self-hosted agent
                # running in AKS). On a Microsoft-hosted agent, use kubectl-based
                # checks instead, as in the complete example later in this article.
                for i in $(seq 1 10); do
                  STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://nodejs-api-canary.production.svc.cluster.local:3000/health)
                  if [ "$STATUS" != "200" ]; then
                    echo "Health check failed with status $STATUS on attempt $i"
                    exit 1
                  fi
                  echo "Health check passed (attempt $i)"
                  sleep 10
                done
        on:
          failure:
            steps:
            - task: KubernetesManifest@1
              displayName: 'Reject canary'
              inputs:
                action: 'reject'
                kubernetesServiceConnection: 'aks-nodejs-prod'
                namespace: 'production'
                strategy: 'canary'
                manifests: 'k8s/deployment.yaml'
          success:
            steps:
            - task: KubernetesManifest@1
              displayName: 'Promote canary'
              inputs:
                action: 'promote'
                kubernetesServiceConnection: 'aks-nodejs-prod'
                namespace: 'production'
                strategy: 'canary'
                manifests: 'k8s/deployment.yaml'
                containers: |
                  mynodejsacr.azurecr.io/nodejs-api:$(Build.BuildId)

The increments: [25, 50] means the pipeline runs the deploy and verification steps twice, first with $(strategy.increment) set to 25, then to 50, and finally promotes the canary to a full rollout if everything passes. With the default pod-based split, each increment scales the canary pod count to that percentage of the stable replicas rather than guaranteeing an exact traffic share. At any point, a failure triggers the on.failure steps, which reject the canary and roll back.
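
If the cluster runs a service mesh that implements the SMI TrafficSplit API (Linkerd or Open Service Mesh, for example), the task can split traffic by exact percentages instead of by pod count. A sketch of the deploy step with SMI enabled, assuming the mesh is already installed and wired up in the production namespace:

- task: KubernetesManifest@1
  displayName: 'Deploy canary (SMI traffic split)'
  inputs:
    action: 'deploy'
    kubernetesServiceConnection: 'aks-nodejs-prod'
    namespace: 'production'
    strategy: 'canary'
    trafficSplitMethod: 'smi'
    percentage: $(strategy.increment)
    baselineAndCanaryReplicas: 1
    manifests: 'k8s/deployment.yaml'
    containers: |
      mynodejsacr.azurecr.io/nodejs-api:$(Build.BuildId)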

Blue-Green Deployments

Blue-green is conceptually simpler: deploy the new version alongside the old one, then switch traffic all at once. The KubernetesManifest task has no built-in blue-green mode (its strategy input covers canary), so you implement it with label selectors: run two Deployments, say version: blue and version: green, behind a single Service, and repoint the Service selector once the new version checks out. A sketch, where deployment-green.yaml is assumed to be a copy of the deployment manifest labeled version: green:

- task: KubernetesManifest@1
  displayName: 'Deploy green'
  inputs:
    action: 'deploy'
    kubernetesServiceConnection: 'aks-nodejs-prod'
    namespace: 'production'
    manifests: 'k8s/deployment-green.yaml'
    containers: |
      mynodejsacr.azurecr.io/nodejs-api:$(Build.BuildId)

- task: Bash@3
  displayName: 'Switch traffic to green'
  inputs:
    targetType: 'inline'
    script: |
      kubectl patch service nodejs-api -n production \
        -p '{"spec":{"selector":{"app":"nodejs-api","version":"green"}}}'

After verifying the green deployment, patch the Service selector so production traffic reaches it. If something is wrong, the blue deployment is still running and you revert by switching the selector back.

Helm Chart Deployment from Pipelines

For more complex applications, Helm charts are the right abstraction. The HelmDeploy@0 task handles chart installation, upgrades, and value overrides.

- task: HelmDeploy@0
  displayName: 'Deploy with Helm'
  inputs:
    connectionType: 'Azure Resource Manager'
    azureSubscription: 'my-azure-subscription'
    azureResourceGroup: 'rg-nodejs-prod'
    kubernetesCluster: 'aks-nodejs-prod'
    namespace: 'production'
    command: 'upgrade'
    chartType: 'FilePath'
    chartPath: '$(Build.SourcesDirectory)/charts/nodejs-api'
    releaseName: 'nodejs-api'
    overrideValues: |
      image.repository=mynodejsacr.azurecr.io/nodejs-api
      image.tag=$(Build.BuildId)
      replicaCount=3
      resources.requests.memory=128Mi
      resources.requests.cpu=100m
    install: true
    waitForExecution: true
    arguments: '--timeout 5m0s'

A minimal Helm chart for a Node.js service looks like this:

# charts/nodejs-api/Chart.yaml
apiVersion: v2
name: nodejs-api
description: Node.js API service
version: 1.0.0
appVersion: "1.0.0"
# charts/nodejs-api/values.yaml
replicaCount: 3

image:
  repository: mynodejsacr.azurecr.io/nodejs-api
  tag: latest
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 3000

resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "250m"

env:
  NODE_ENV: production
  PORT: "3000"
# charts/nodejs-api/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      containers:
      - name: {{ .Release.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        - containerPort: {{ .Values.service.port }}
        env:
        {{- range $key, $value := .Values.env }}
        - name: {{ $key }}
          value: {{ $value | quote }}
        {{- end }}
        readinessProbe:
          httpGet:
            path: /health
            port: {{ .Values.service.port }}
          initialDelaySeconds: 5
          periodSeconds: 10
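
The values file above defines a service type and port, but the chart as shown has no Service template. A minimal charts/nodejs-api/templates/service.yaml to go with it might look like this:

# charts/nodejs-api/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}
spec:
  type: {{ .Values.service.type }}
  selector:
    app: {{ .Release.Name }}
  ports:
  - port: {{ .Values.service.port }}
    targetPort: {{ .Values.service.port }}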

Set the --timeout flag explicitly. Helm's default is 5 minutes, and the example above simply pins that default; raise it if you are pulling large images or your health checks take a while to pass. For Node.js apps that boot fast, 5 minutes is usually sufficient, but stating it in the pipeline means you are not guessing when a deployment hangs.

Namespace Management

I run separate namespaces per environment. Create them in your pipeline or as a one-time setup:

- task: Kubernetes@1
  displayName: 'Create namespace if not exists'
  inputs:
    connectionType: 'Kubernetes Service Connection'
    kubernetesServiceConnection: 'aks-nodejs-prod'
    command: 'apply'
    useConfigurationFile: true
    configurationType: 'inline'
    inline: |
      apiVersion: v1
      kind: Namespace
      metadata:
        name: production
        labels:
          environment: production

Set resource quotas per namespace to prevent a runaway deployment from starving other services:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "4"
    requests.memory: "8Gi"
    limits.cpu: "8"
    limits.memory: "16Gi"
    pods: "50"

ConfigMaps and Secrets in Pipelines

Never hardcode configuration in your Docker image. Use ConfigMaps for non-sensitive config and Secrets for credentials.

- task: KubernetesManifest@1
  displayName: 'Deploy ConfigMap'
  inputs:
    action: 'deploy'
    kubernetesServiceConnection: 'aks-nodejs-prod'
    namespace: 'production'
    manifests: 'k8s/configmap.yaml'

- task: Kubernetes@1
  displayName: 'Create secrets from pipeline variables'
  inputs:
    connectionType: 'Kubernetes Service Connection'
    kubernetesServiceConnection: 'aks-nodejs-prod'
    namespace: 'production'
    command: 'apply'
    useConfigurationFile: true
    configurationType: 'inline'
    inline: |
      apiVersion: v1
      kind: Secret
      metadata:
        name: nodejs-api-secrets
        namespace: production
      type: Opaque
      stringData:
        DB_CONNECTION: $(DB_CONNECTION_STRING)
        API_KEY: $(API_KEY)
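
The k8s/configmap.yaml referenced above is not shown elsewhere in this article. A minimal version matching the nodejs-api-config name used below might be (the keys are illustrative):

# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nodejs-api-config
  namespace: production
data:
  NODE_ENV: production
  PORT: "3000"
  LOG_LEVEL: info   # illustrative key; add whatever non-sensitive config your app reads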

Reference these in your deployment:

spec:
  containers:
  - name: nodejs-api
    envFrom:
    - configMapRef:
        name: nodejs-api-config
    - secretRef:
        name: nodejs-api-secrets

In your Node.js application, access them as environment variables:

var express = require('express');
var app = express();

var dbConnection = process.env.DB_CONNECTION;
var apiKey = process.env.API_KEY;
var nodeEnv = process.env.NODE_ENV;
var port = process.env.PORT || 3000;

app.get('/health', function(req, res) {
  res.status(200).json({ status: 'healthy', environment: nodeEnv });
});

app.listen(port, function() {
  console.log('Server running on port ' + port);
});

Health Check Verification

Health checks are not optional. Every Node.js service I deploy has a /health endpoint that verifies critical dependencies:

var express = require('express');
var mongoose = require('mongoose');
var router = express.Router();

router.get('/health', function(req, res) {
  var checks = {
    uptime: process.uptime(),
    timestamp: Date.now(),
    memory: process.memoryUsage(),
    database: 'unknown'
  };

  // Check database connectivity
  if (mongoose.connection.readyState === 1) {
    checks.database = 'connected';
    res.status(200).json(checks);
  } else {
    checks.database = 'disconnected';
    res.status(503).json(checks);
  }
});

module.exports = router;

In your pipeline, verify the health endpoint after deployment:

- task: Bash@3
  displayName: 'Verify deployment health'
  inputs:
    targetType: 'inline'
    script: |
      echo "Waiting for deployment to stabilize..."
      sleep 30

      # Check rollout status
      kubectl rollout status deployment/nodejs-api \
        --namespace production \
        --timeout=300s

      # Verify pod health
      READY_PODS=$(kubectl get deployment nodejs-api \
        --namespace production \
        -o jsonpath='{.status.readyReplicas}')
      DESIRED_PODS=$(kubectl get deployment nodejs-api \
        --namespace production \
        -o jsonpath='{.spec.replicas}')

      echo "Ready: $READY_PODS / Desired: $DESIRED_PODS"

      if [ "$READY_PODS" != "$DESIRED_PODS" ]; then
        echo "ERROR: Not all pods are ready"
        kubectl describe deployment nodejs-api --namespace production
        kubectl logs -l app=nodejs-api --namespace production --tail=50
        exit 1
      fi

      echo "Deployment verified successfully"

Kubernetes Environments in Azure DevOps

Environments give you approval gates, deployment history, and traceability. Define them in your pipeline and they auto-create in Azure DevOps:

stages:
- stage: DeployStaging
  jobs:
  - deployment: DeployStaging
    environment: 'staging.nodejs-api'
    strategy:
      runOnce:
        deploy:
          steps:
          - task: KubernetesManifest@1
            inputs:
              action: 'deploy'
              kubernetesServiceConnection: 'aks-nodejs-staging'
              namespace: 'staging'
              manifests: 'k8s/deployment.yaml'
              containers: |
                mynodejsacr.azurecr.io/nodejs-api:$(Build.BuildId)

- stage: DeployProduction
  dependsOn: DeployStaging
  condition: succeeded()
  jobs:
  - deployment: DeployProduction
    environment: 'production.nodejs-api'
    strategy:
      runOnce:
        deploy:
          steps:
          - task: KubernetesManifest@1
            inputs:
              action: 'deploy'
              kubernetesServiceConnection: 'aks-nodejs-prod'
              namespace: 'production'
              manifests: 'k8s/deployment.yaml'
              containers: |
                mynodejsacr.azurecr.io/nodejs-api:$(Build.BuildId)

The environment: 'production.nodejs-api' syntax creates a resource named nodejs-api inside the production environment. You can add manual approval checks, business hours restrictions, or exclusive lock policies through the Azure DevOps UI under Pipelines > Environments.

kubectl Commands in Pipelines

Sometimes you need raw kubectl access. The Kubernetes@1 task lets you run arbitrary commands:

- task: Kubernetes@1
  displayName: 'Scale deployment'
  inputs:
    connectionType: 'Kubernetes Service Connection'
    kubernetesServiceConnection: 'aks-nodejs-prod'
    namespace: 'production'
    command: 'scale'
    arguments: 'deployment/nodejs-api --replicas=5'

- task: Kubernetes@1
  displayName: 'Get pod status'
  inputs:
    connectionType: 'Kubernetes Service Connection'
    kubernetesServiceConnection: 'aks-nodejs-prod'
    namespace: 'production'
    command: 'get'
    arguments: 'pods -l app=nodejs-api -o wide'

For debugging failed deployments, dump logs in a script step:

- task: Bash@3
  displayName: 'Debug failed deployment'
  condition: failed()
  inputs:
    targetType: 'inline'
    script: |
      echo "=== Deployment Status ==="
      kubectl get deployment nodejs-api -n production -o yaml
      echo "=== Pod Events ==="
      kubectl get events -n production --sort-by='.lastTimestamp' | tail -20
      echo "=== Pod Logs ==="
      kubectl logs -l app=nodejs-api -n production --tail=100 --all-containers

Kustomize Overlays

Kustomize lets you maintain a base configuration with per-environment overlays. This is cleaner than templating for simple variations.

k8s/
├── base/
│   ├── kustomization.yaml
│   ├── deployment.yaml
│   └── service.yaml
├── overlays/
│   ├── staging/
│   │   ├── kustomization.yaml
│   │   └── replica-patch.yaml
│   └── production/
│       ├── kustomization.yaml
│       └── replica-patch.yaml
# k8s/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
# k8s/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: replica-patch.yaml
images:
  - name: mynodejsacr.azurecr.io/nodejs-api
    newTag: PLACEHOLDER
# k8s/overlays/production/replica-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodejs-api
spec:
  replicas: 5
  template:
    spec:
      containers:
      - name: nodejs-api
        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "500m"

Deploy with Kustomize in your pipeline:

- task: Bash@3
  displayName: 'Deploy with Kustomize'
  inputs:
    targetType: 'inline'
    script: |
      cd $(Build.SourcesDirectory)/k8s/overlays/production
      kustomize edit set image \
        mynodejsacr.azurecr.io/nodejs-api=mynodejsacr.azurecr.io/nodejs-api:$(Build.BuildId)
      kustomize build . | kubectl apply -f - --namespace production

Monitoring Deployments

Wire up deployment monitoring directly in your pipeline. I check three things after every deployment: rollout status, pod readiness, and HTTP endpoint availability.

- task: Bash@3
  displayName: 'Monitor deployment rollout'
  inputs:
    targetType: 'inline'
    script: |
      echo "Monitoring rollout..."
      kubectl rollout status deployment/nodejs-api \
        -n production --timeout=600s

      echo "Checking pod distribution across nodes..."
      kubectl get pods -l app=nodejs-api -n production \
        -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase

      echo "Verifying resource usage..."
      kubectl top pods -l app=nodejs-api -n production || true

For production, integrate with Azure Monitor. Add annotations to your deployment so you can correlate deployments with metrics:

spec:
  template:
    metadata:
      annotations:
        deployment-timestamp: "$(Build.BuildId)"
        prometheus.io/scrape: "true"
        prometheus.io/port: "3000"
        prometheus.io/path: "/metrics"

Complete Working Example

Here is the full pipeline that builds a Node.js Docker image, pushes it to ACR, and deploys to AKS with a canary strategy and health verification.

First, the Node.js application:

// server.js
var express = require('express');
var app = express();
var port = process.env.PORT || 3000;
var version = process.env.APP_VERSION || '1.0.0';

app.use(express.json());

app.get('/health', function(req, res) {
  var healthCheck = {
    status: 'healthy',
    version: version,
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
    memory: {
      used: Math.round(process.memoryUsage().heapUsed / 1024 / 1024) + 'MB',
      total: Math.round(process.memoryUsage().heapTotal / 1024 / 1024) + 'MB'
    }
  };
  res.status(200).json(healthCheck);
});

app.get('/api/items', function(req, res) {
  res.json({ items: [], version: version });
});

app.get('/', function(req, res) {
  res.json({ message: 'Node.js API', version: version });
});

app.listen(port, function() {
  console.log('Server v' + version + ' running on port ' + port);
});

module.exports = app;

The full pipeline YAML:

# azure-pipelines.yaml
trigger:
  branches:
    include:
      - main
  paths:
    exclude:
      - README.md
      - docs/*

variables:
  imageRepository: 'nodejs-api'
  containerRegistry: 'mynodejsacr.azurecr.io'
  dockerRegistryServiceConnection: 'mynodejsacr-connection'
  kubernetesServiceConnection: 'aks-nodejs-prod'
  dockerfilePath: '$(Build.SourcesDirectory)/Dockerfile'
  tag: '$(Build.BuildId)'
  namespace: 'production'
  vmImageName: 'ubuntu-latest'

stages:
# Stage 1: Run tests
- stage: Test
  displayName: 'Run Tests'
  jobs:
  - job: Test
    pool:
      vmImage: $(vmImageName)
    steps:
    - task: NodeTool@0
      inputs:
        versionSpec: '20.x'
    - script: |
        npm ci
        npm test
      displayName: 'Install and test'

# Stage 2: Build and push Docker image
- stage: Build
  displayName: 'Build and Push'
  dependsOn: Test
  jobs:
  - job: Build
    pool:
      vmImage: $(vmImageName)
    steps:
    - task: Docker@2
      displayName: 'Build and push image'
      inputs:
        containerRegistry: $(dockerRegistryServiceConnection)
        repository: $(imageRepository)
        command: 'buildAndPush'
        Dockerfile: $(dockerfilePath)
        tags: |
          $(tag)
          latest

# Stage 3: Deploy canary to production
- stage: DeployCanary
  displayName: 'Canary Deployment'
  dependsOn: Build
  jobs:
  - deployment: DeployCanary
    displayName: 'Deploy Canary'
    pool:
      vmImage: $(vmImageName)
    environment: 'production.nodejs-api'
    strategy:
      canary:
        increments: [25]
        preDeploy:
          steps:
          - script: |
              echo "Deploying canary for build $(Build.BuildId)"
              echo "Image: $(containerRegistry)/$(imageRepository):$(tag)"
            displayName: 'Pre-deploy info'
        deploy:
          steps:
          - task: KubernetesManifest@1
            displayName: 'Deploy canary pods'
            inputs:
              action: 'deploy'
              kubernetesServiceConnection: $(kubernetesServiceConnection)
              namespace: $(namespace)
              strategy: 'canary'
              percentage: $(strategy.increment)
              manifests: |
                k8s/deployment.yaml
                k8s/service.yaml
              containers: |
                $(containerRegistry)/$(imageRepository):$(tag)
        postRouteTraffic:
          steps:
          - task: Bash@3
            displayName: 'Canary health verification'
            inputs:
              targetType: 'inline'
              script: |
                echo "Verifying canary health for 2 minutes..."
                FAILURES=0
                for i in $(seq 1 12); do
                  # Check rollout status of canary
                  READY=$(kubectl get pods -n $(namespace) \
                    -l app=nodejs-api,azure-pipelines/version=canary \
                    -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}' \
                    2>/dev/null)

                  if echo "$READY" | grep -q "False"; then
                    FAILURES=$((FAILURES + 1))
                    echo "WARNING: Canary pod not ready (attempt $i, failures: $FAILURES)"
                  else
                    echo "Canary health check passed (attempt $i)"
                  fi

                  if [ "$FAILURES" -ge 3 ]; then
                    echo "ERROR: Too many consecutive failures. Canary is unhealthy."
                    kubectl logs -l app=nodejs-api,azure-pipelines/version=canary \
                      -n $(namespace) --tail=30
                    exit 1
                  fi

                  sleep 10
                done

                echo "Canary verification passed. Promoting to full deployment."
        on:
          failure:
            steps:
            - task: KubernetesManifest@1
              displayName: 'Reject canary'
              inputs:
                action: 'reject'
                kubernetesServiceConnection: $(kubernetesServiceConnection)
                namespace: $(namespace)
                strategy: 'canary'
                manifests: 'k8s/deployment.yaml'
            - script: |
                echo "##vso[task.logissue type=error]Canary deployment rejected for build $(Build.BuildId)"
              displayName: 'Log rejection'
          success:
            steps:
            - task: KubernetesManifest@1
              displayName: 'Promote canary to stable'
              inputs:
                action: 'promote'
                kubernetesServiceConnection: $(kubernetesServiceConnection)
                namespace: $(namespace)
                strategy: 'canary'
                manifests: |
                  k8s/deployment.yaml
                  k8s/service.yaml
                containers: |
                  $(containerRegistry)/$(imageRepository):$(tag)

# Stage 4: Post-deployment verification
- stage: Verify
  displayName: 'Post-Deploy Verification'
  dependsOn: DeployCanary
  jobs:
  - job: VerifyDeployment
    pool:
      vmImage: $(vmImageName)
    steps:
    - task: Kubernetes@1
      displayName: 'Check deployment status'
      inputs:
        connectionType: 'Kubernetes Service Connection'
        kubernetesServiceConnection: $(kubernetesServiceConnection)
        namespace: $(namespace)
        command: 'get'
        arguments: 'deployment nodejs-api -o wide'
    - task: Bash@3
      displayName: 'Verify all pods running'
      inputs:
        targetType: 'inline'
        script: |
          READY=$(kubectl get deployment nodejs-api -n $(namespace) \
            -o jsonpath='{.status.readyReplicas}')
          DESIRED=$(kubectl get deployment nodejs-api -n $(namespace) \
            -o jsonpath='{.spec.replicas}')

          echo "Ready replicas: $READY"
          echo "Desired replicas: $DESIRED"

          if [ "$READY" != "$DESIRED" ]; then
            echo "##vso[task.logissue type=error]Deployment incomplete: $READY/$DESIRED pods ready"
            exit 1
          fi

          echo "All $READY pods are healthy and running"

Common Issues and Troubleshooting

1. ImagePullBackOff — ACR Authentication Failure

Events:
  Warning  Failed   Back-off pulling image "mynodejsacr.azurecr.io/nodejs-api:42"
  Warning  Failed   Error: ErrImagePull

This happens when AKS cannot authenticate to ACR. Verify the ACR integration:

az aks check-acr --resource-group rg-nodejs-prod \
  --name aks-nodejs-prod \
  --acr mynodejsacr.azurecr.io

If it fails, reattach:

az aks update \
  --resource-group rg-nodejs-prod \
  --name aks-nodejs-prod \
  --attach-acr mynodejsacr

2. CrashLoopBackOff — Node.js Process Exits

Events:
  Warning  BackOff  Back-off restarting failed container

This usually means your Node.js process is crashing on startup. Common causes: missing environment variables, database connection failures, port conflicts. Check the logs:

kubectl logs deployment/nodejs-api -n production --previous

The --previous flag shows logs from the crashed container, not the current restart attempt. Nine times out of ten, it is a missing environment variable that you forgot to add to the ConfigMap or Secret.

3. Service Connection Authorization Failure

##[error]Error: Could not find any Kubernetes cluster matching the specified criteria

Your service connection lost its credentials. For Azure Subscription connections, this happens when the service principal expires. Go to Project Settings > Service connections, edit the connection, and click Verify. If verification fails, delete and recreate the connection.

4. Canary Pods Stuck in Pending State

Events:
  Warning  FailedScheduling  0/3 nodes are available: 3 Insufficient cpu

Your cluster is out of resources. The canary creates additional pods on top of your existing deployment, so you need headroom. Either scale your node pool or reduce resource requests:

az aks nodepool scale \
  --resource-group rg-nodejs-prod \
  --cluster-name aks-nodejs-prod \
  --name nodepool1 \
  --node-count 5

Or enable the cluster autoscaler:

az aks update \
  --resource-group rg-nodejs-prod \
  --name aks-nodejs-prod \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 10

5. Helm Release Stuck in "pending-upgrade" State

Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

A previous Helm operation was interrupted. Force a rollback:

helm rollback nodejs-api 0 --namespace production

If that does not work, uninstall and reinstall:

helm uninstall nodejs-api --namespace production
# Then re-run your pipeline

Best Practices

  • Always set resource requests and limits. Without them, a single Node.js memory leak can take down your entire node. Set memory limits to 2x your typical usage and CPU limits to 1.5x.

  • Use readiness probes, not just liveness probes. A readiness probe prevents traffic from reaching a pod that is not ready. A liveness probe restarts a stuck pod. You need both, and they should check different things — readiness checks the whole dependency chain, liveness just checks if the process is alive.

  • Tag images with build IDs, never deploy :latest to production. The latest tag makes rollbacks impossible and debugging a nightmare. Every deployment should reference an exact, immutable image tag tied to a specific commit.

  • Run canary deployments in production, not just staging. Staging environments never perfectly mirror production traffic patterns. Canary in production with 10-25% traffic catches issues that staging misses.

  • Store Kubernetes manifests in the same repository as your application code. GitOps purists will disagree, but for most teams, co-locating manifests with application code reduces coordination overhead and makes PRs self-contained.

  • Set up namespace-level resource quotas from day one. Before you have a runaway pod eating 32GB of memory at 3 AM, put guardrails in place. It takes five minutes and saves hours of incident response.

  • Use pod disruption budgets (PDBs) for zero-downtime upgrades. A PDB ensures Kubernetes never kills too many pods at once during node maintenance or cluster upgrades. Set minAvailable to at least 50% of your replicas (a minimal manifest follows this list).

  • Separate your CI (build/test) from CD (deploy) stages with approval gates. Automated tests should gate the build stage. Human approval or automated health checks should gate the production deploy stage. Never let a single pipeline run straight from commit to production without a checkpoint.

  • Monitor deployment frequency and failure rate. Track how often you deploy and how often deployments fail. If your failure rate is above 5%, your pipeline needs more testing stages or your canary verification is not strict enough.
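
A minimal PodDisruptionBudget for the deployment used throughout this article, keeping at least two of the three replicas available during voluntary disruptions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nodejs-api-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: nodejs-api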
