PagerDuty Alerts from Azure DevOps

A comprehensive guide to integrating PagerDuty with Azure DevOps for incident management, covering pipeline failure alerts, deployment incident triggers, webhook configuration, bidirectional work item sync, and pre-deployment incident gates, with working automation examples.

Overview

PagerDuty is the incident management platform that wakes people up when production breaks. Azure DevOps is where you build and deploy the code. Connecting them means pipeline failures, failed deployments, and quality gate violations trigger PagerDuty incidents that route to the right on-call engineer through the right escalation policy. I have implemented this integration for teams where a broken production deployment at 3 AM used to go unnoticed until morning standup. With PagerDuty connected to the deployment pipeline, the on-call engineer gets paged within seconds of a failed deploy, and the incident includes all the context needed to start troubleshooting.

Prerequisites

  • PagerDuty account (any tier — free tier works for basic integration)
  • Azure DevOps organization with Pipelines enabled
  • PagerDuty integration key (Events API v2) for your service
  • Node.js 16 or later for automation scripts
  • Azure DevOps Personal Access Token for bidirectional integration
  • Familiarity with PagerDuty services, escalation policies, and incidents

PagerDuty Events API v2

PagerDuty's Events API v2 is the primary integration point. It accepts trigger, acknowledge, and resolve events that create and manage incidents.
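
Every event is a JSON document POSTed to https://events.pagerduty.com/v2/enqueue. A trigger must carry a payload with summary, source, and severity; acknowledge and resolve only reference an existing incident by its dedup key. A minimal sketch of the shapes (all values here are placeholders):

// Minimal Events API v2 shapes
var triggerEvent = {
    routing_key: "YOUR_INTEGRATION_KEY",   // identifies the target PagerDuty service
    event_action: "trigger",
    dedup_key: "deploy-my-pipeline",       // optional; PagerDuty generates one if omitted
    payload: {
        summary: "Deployment failed",      // required
        source: "azure-devops",            // required
        severity: "critical"               // required: critical, error, warning, or info
    }
};

// Acknowledge and resolve events only need the routing key and the dedup key
// of the incident they act on
var resolveEvent = {
    routing_key: "YOUR_INTEGRATION_KEY",
    event_action: "resolve",
    dedup_key: "deploy-my-pipeline"
};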

Creating a PagerDuty Integration

  1. In PagerDuty, navigate to Services > Service Directory
  2. Select your service (or create one for your deployment pipeline)
  3. Go to Integrations > Add Integration
  4. Select Events API v2
  5. Copy the Integration Key (also called routing key)

Sending Events from Pipeline Steps

# azure-pipelines.yml
steps:
  - script: npm ci && npm test && npm run build
    displayName: "Build and Test"

  - script: echo "Deploying to production..."
    displayName: "Deploy"

  - script: |
      curl -s -X POST "https://events.pagerduty.com/v2/enqueue" \
        -H "Content-Type: application/json" \
        -d '{
          "routing_key": "'"$PAGERDUTY_ROUTING_KEY"'",
          "event_action": "trigger",
          "dedup_key": "deploy-$(Build.DefinitionName)-$(Build.BuildId)",
          "payload": {
            "summary": "Deployment failed: $(Build.DefinitionName) #$(Build.BuildNumber)",
            "severity": "critical",
            "source": "azure-devops",
            "component": "$(Build.Repository.Name)",
            "group": "deployments",
            "class": "pipeline_failure",
            "custom_details": {
              "build_number": "$(Build.BuildNumber)",
              "branch": "$(Build.SourceBranchName)",
              "triggered_by": "$(Build.RequestedFor)",
              "build_url": "$(System.TeamFoundationCollectionUri)$(System.TeamProject)/_build/results?buildId=$(Build.BuildId)"
            }
          },
          "links": [
            {
              "href": "$(System.TeamFoundationCollectionUri)$(System.TeamProject)/_build/results?buildId=$(Build.BuildId)",
              "text": "View Build in Azure DevOps"
            }
          ]
        }'
    displayName: "Page on-call (deployment failed)"
    condition: failed()
    env:
      PAGERDUTY_ROUTING_KEY: $(PagerDutyRoutingKey)

The condition: failed() ensures the PagerDuty step only runs when the pipeline fails, and the dedup_key prevents duplicate incidents if the run is retried. The routing key is mapped into the step through env: because secret pipeline variables are not exposed to scripts automatically; the script then reads it with ordinary shell expansion rather than Azure DevOps macro syntax.

PagerDuty Alert Service

Build a reusable Node.js module for sending PagerDuty events from any pipeline or automation:

// pagerduty/client.js
var https = require("https");

function sendEvent(routingKey, eventAction, dedupKey, payload, links, callback) {
    var event = {
        routing_key: routingKey,
        event_action: eventAction,
        dedup_key: dedupKey,
        links: links || []
    };
    // Acknowledge and resolve events carry no payload, so only include it when present
    if (payload) {
        event.payload = payload;
    }
    var body = JSON.stringify(event);

    var options = {
        hostname: "events.pagerduty.com",
        path: "/v2/enqueue",
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            "Content-Length": Buffer.byteLength(body)
        }
    };

    var req = https.request(options, function (res) {
        var data = "";
        res.on("data", function (chunk) { data += chunk; });
        res.on("end", function () {
            if (res.statusCode >= 200 && res.statusCode < 300) {
                var parsed = JSON.parse(data);
                callback(null, parsed);
            } else {
                callback(new Error("PagerDuty error " + res.statusCode + ": " + data));
            }
        });
    });

    req.on("error", callback);
    req.write(body);
    req.end();
}

function triggerIncident(routingKey, summary, severity, source, details, links, callback) {
    // Prefer a stable dedup key from the caller so retries update the same incident
    // and a later resolve event can match it; fall back to a unique key per call
    var dedupKey = details.dedup_key || (source + "-" + Date.now());
    var payload = {
        summary: summary,
        severity: severity, // critical, error, warning, info
        source: source,
        component: details.component || "",
        group: details.group || "",
        class: details.class || "",
        custom_details: details
    };
    sendEvent(routingKey, "trigger", dedupKey, payload, links, callback);
}

function acknowledgeIncident(routingKey, dedupKey, callback) {
    sendEvent(routingKey, "acknowledge", dedupKey, null, null, callback);
}

function resolveIncident(routingKey, dedupKey, callback) {
    sendEvent(routingKey, "resolve", dedupKey, null, null, callback);
}

module.exports = {
    triggerIncident: triggerIncident,
    acknowledgeIncident: acknowledgeIncident,
    resolveIncident: resolveIncident,
    sendEvent: sendEvent
};
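
A quick way to exercise the module before wiring it into a pipeline is a smoke test; the sketch below assumes the integration key is exported as PAGERDUTY_ROUTING_KEY and that client.js sits in the pagerduty/ directory:

// pagerduty/smoke-test.js
// Fires a low-severity test event and resolves it a few seconds later
var pd = require("./client");

var routingKey = process.env.PAGERDUTY_ROUTING_KEY;
var dedupKey = "smoke-test-" + Date.now();

pd.sendEvent(routingKey, "trigger", dedupKey, {
    summary: "Test event from Azure DevOps integration setup",
    severity: "info",
    source: "integration-smoke-test",
    custom_details: { note: "Safe to resolve" }
}, [], function (err, result) {
    if (err) { return console.error("Trigger failed: " + err.message); }
    console.log("Triggered test event, dedup key: " + result.dedup_key);

    // Resolve the same incident shortly afterwards so the test does not linger
    setTimeout(function () {
        pd.resolveIncident(routingKey, dedupKey, function (resolveErr) {
            console.log(resolveErr ? "Resolve failed: " + resolveErr.message : "Resolved test incident");
        });
    }, 5000);
});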

Pipeline Failure Alerting

Severity-Based Routing

Different pipeline failures deserve different urgency levels. A failed nightly build is less urgent than a failed production deployment:

// pagerduty/pipeline-alert.js
var pd = require("./client");

var ROUTING_KEY = process.env.PAGERDUTY_ROUTING_KEY;
var BUILD_DEFINITION = process.env.BUILD_DEFINITIONNAME || "unknown";
var BUILD_NUMBER = process.env.BUILD_BUILDNUMBER || "local";
var BUILD_BRANCH = (process.env.BUILD_SOURCEBRANCH || "").replace("refs/heads/", "");
var BUILD_REASON = process.env.BUILD_REASON || "manual";
var BUILD_URL = (process.env.SYSTEM_TEAMFOUNDATIONCOLLECTIONURI || "") +
    (process.env.SYSTEM_TEAMPROJECT || "") +
    "/_build/results?buildId=" + (process.env.BUILD_BUILDID || "0");
var JOB_STATUS = process.env.AGENT_JOBSTATUS || "Unknown";

// Determine severity based on context (comparisons are case-insensitive because
// definition names vary and Build.Reason values are "Schedule", "IndividualCI", etc.)
function determineSeverity() {
    var definition = BUILD_DEFINITION.toLowerCase();
    var reason = BUILD_REASON.toLowerCase();
    // Production deployments are critical
    if (definition.indexOf("prod") !== -1 || definition.indexOf("release") !== -1) {
        return "critical";
    }
    // Main branch failures are high severity
    if (BUILD_BRANCH === "main" || BUILD_BRANCH === "master") {
        return "error";
    }
    // Scheduled/CI builds are warnings
    if (reason === "schedule" || reason === "individualci") {
        return "warning";
    }
    // Everything else is info
    return "info";
}

if (JOB_STATUS === "Failed" || JOB_STATUS === "Canceled") {
    var severity = determineSeverity();

    console.log("Pipeline failed. Severity: " + severity);
    console.log("Definition: " + BUILD_DEFINITION);
    console.log("Branch: " + BUILD_BRANCH);

    pd.triggerIncident(
        ROUTING_KEY,
        "Pipeline failed: " + BUILD_DEFINITION + " #" + BUILD_NUMBER + " on " + BUILD_BRANCH,
        severity,
        "azure-devops-pipeline",
        {
            // Stable key shared with the resolve step, so a successful retry closes this incident
            dedup_key: "pipeline-" + BUILD_DEFINITION + "-failure",
            component: BUILD_DEFINITION,
            group: "ci-cd",
            class: "pipeline_failure",
            build_number: BUILD_NUMBER,
            branch: BUILD_BRANCH,
            reason: BUILD_REASON,
            status: JOB_STATUS
        },
        [{
            href: BUILD_URL,
            text: "View Build #" + BUILD_NUMBER
        }],
        function (err, result) {
            if (err) {
                console.error("PagerDuty trigger failed: " + err.message);
            } else {
                console.log("PagerDuty incident triggered: " + result.dedup_key);
            }
        }
    );
} else {
    console.log("Pipeline status: " + JOB_STATUS + " — no alert needed");
}

Auto-Resolve on Successful Retry

When a failed pipeline is retried and succeeds, automatically resolve the PagerDuty incident:

# In your pipeline
steps:
  - script: npm ci && npm test
    displayName: "Build and Test"

  # On failure: trigger PagerDuty
  - script: node pagerduty/pipeline-alert.js
    displayName: "Alert on failure"
    condition: failed()
    env:
      PAGERDUTY_ROUTING_KEY: $(PagerDutyRoutingKey)

  # On success: resolve any existing incident for this pipeline
  - script: |
      node -e "
      var pd = require('./pagerduty/client');
      var dedupKey = 'pipeline-' + process.env.BUILD_DEFINITIONNAME + '-failure';
      pd.resolveIncident(process.env.PAGERDUTY_ROUTING_KEY, dedupKey, function(err) {
        if (err) { console.log('No incident to resolve (expected if none was open)'); }
        else { console.log('Resolved PagerDuty incident: ' + dedupKey); }
      });
      "
    displayName: "Resolve PagerDuty on success"
    condition: succeeded()
    env:
      PAGERDUTY_ROUTING_KEY: $(PagerDutyRoutingKey)

Bidirectional Work Item Integration

When a PagerDuty incident is created, automatically create an Azure DevOps work item. When the incident is resolved, update the work item.

Webhook from PagerDuty to Azure DevOps

Configure a PagerDuty v3 webhook subscription that calls your middleware:

  1. In PagerDuty, go to Integrations > Generic Webhooks (v3)
  2. Add a webhook subscription for your service
  3. Events: incident.triggered, incident.acknowledged, incident.resolved
  4. URL: Your middleware endpoint

The receiver below creates a Bug work item when an incident triggers and updates it when the incident resolves:
// pagerduty/webhook-receiver.js
var express = require("express");
var https = require("https");

var app = express();
app.use(express.json());

var AZURE_ORG = process.env.AZURE_ORG;
var AZURE_PROJECT = process.env.AZURE_PROJECT;
var AZURE_PAT = process.env.AZURE_PAT;

function createAzureWorkItem(incident, callback) {
    var auth = Buffer.from(":" + AZURE_PAT).toString("base64");

    // V3 webhook payloads expose the incident number as "number"; fall back to
    // "incident_number" in case the event came in the REST API shape instead
    var incidentNumber = incident.number || incident.incident_number;

    var priority;
    switch (incident.urgency) {
        case "high": priority = 1; break;
        case "low": priority = 3; break;
        default: priority = 2;
    }

    var patchDoc = [
        // Embed the incident number in the title so the resolve handler can find this work item later
        { op: "add", path: "/fields/System.Title", value: "[Incident #" + incidentNumber + "] " + incident.title },
        { op: "add", path: "/fields/System.Description", value:
            "<p><strong>PagerDuty Incident:</strong> <a href=\"" + incident.html_url + "\">#" + incidentNumber + "</a></p>" +
            "<p><strong>Service:</strong> " + (incident.service ? incident.service.summary : "unknown") + "</p>" +
            "<p><strong>Urgency:</strong> " + incident.urgency + "</p>" +
            "<p><strong>Created:</strong> " + incident.created_at + "</p>" +
            "<p><strong>Description:</strong> " + (incident.description || "No description") + "</p>"
        },
        { op: "add", path: "/fields/Microsoft.VSTS.Common.Priority", value: priority },
        { op: "add", path: "/fields/System.Tags", value: "pagerduty;incident;" + incident.urgency }
    ];

    var body = JSON.stringify(patchDoc);
    var options = {
        hostname: "dev.azure.com",
        path: "/" + AZURE_ORG + "/" + AZURE_PROJECT + "/_apis/wit/workitems/$Bug?api-version=7.1",
        method: "POST",
        headers: {
            "Authorization": "Basic " + auth,
            "Content-Type": "application/json-patch+json",
            "Content-Length": Buffer.byteLength(body)
        }
    };

    var req = https.request(options, function (res) {
        var data = "";
        res.on("data", function (chunk) { data += chunk; });
        res.on("end", function () {
            if (res.statusCode >= 200 && res.statusCode < 300) {
                callback(null, JSON.parse(data));
            } else {
                callback(new Error("Work item creation failed: " + res.statusCode));
            }
        });
    });
    req.on("error", callback);
    req.write(body);
    req.end();
}

function updateAzureWorkItem(incidentNumber, state, callback) {
    var auth = Buffer.from(":" + AZURE_PAT).toString("base64");

    // Find the work item created for this incident via the number embedded in its title
    var wiql = {
        query: "SELECT [System.Id] FROM WorkItems WHERE [System.Title] CONTAINS '[Incident #" + incidentNumber + "]' AND [System.Tags] CONTAINS 'pagerduty' AND [System.State] <> 'Closed' ORDER BY [System.CreatedDate] DESC"
    };

    var searchBody = JSON.stringify(wiql);
    var searchOptions = {
        hostname: "dev.azure.com",
        path: "/" + AZURE_ORG + "/" + AZURE_PROJECT + "/_apis/wit/wiql?api-version=7.1",
        method: "POST",
        headers: {
            "Authorization": "Basic " + auth,
            "Content-Type": "application/json",
            "Content-Length": Buffer.byteLength(searchBody)
        }
    };

    var req = https.request(searchOptions, function (res) {
        var data = "";
        res.on("data", function (chunk) { data += chunk; });
        res.on("end", function () {
            var result = JSON.parse(data);
            if (!result.workItems || result.workItems.length === 0) {
                return callback(null, null);
            }

            var workItemId = result.workItems[0].id;
            var patchDoc = [
                { op: "add", path: "/fields/System.State", value: state },
                { op: "add", path: "/fields/System.History", value: "PagerDuty incident resolved." }
            ];

            var patchBody = JSON.stringify(patchDoc);
            var patchOptions = {
                hostname: "dev.azure.com",
                path: "/" + AZURE_ORG + "/" + AZURE_PROJECT + "/_apis/wit/workitems/" + workItemId + "?api-version=7.1",
                method: "PATCH",
                headers: {
                    "Authorization": "Basic " + auth,
                    "Content-Type": "application/json-patch+json",
                    "Content-Length": Buffer.byteLength(patchBody)
                }
            };

            var patchReq = https.request(patchOptions, function (patchRes) {
                var patchData = "";
                patchRes.on("data", function (chunk) { patchData += chunk; });
                patchRes.on("end", function () { callback(null, workItemId); });
            });
            patchReq.on("error", callback);
            patchReq.write(patchBody);
            patchReq.end();
        });
    });
    req.on("error", callback);
    req.write(searchBody);
    req.end();
}

app.post("/webhooks/pagerduty", function (req, res) {
    var payload = req.body;
    var event = payload.event;

    if (!event || !event.data) {
        return res.status(200).json({ received: true });
    }

    var eventType = event.event_type;
    var incident = event.data;

    console.log("[PagerDuty] " + eventType + ": " + incident.title);

    if (eventType === "incident.triggered") {
        createAzureWorkItem(incident, function (err, workItem) {
            if (err) { console.error("Work item creation failed:", err.message); }
            else { console.log("Created work item #" + workItem.id); }
        });
    }

    if (eventType === "incident.resolved") {
        updateAzureWorkItem(incident.number || incident.incident_number, "Resolved", function (err, id) {
            if (err) { console.error("Work item update failed:", err.message); }
            else if (id) { console.log("Resolved work item #" + id); }
        });
    }

    res.status(200).json({ received: true });
});

var PORT = process.env.PORT || 4200;
app.listen(PORT, function () {
    console.log("PagerDuty webhook receiver on port " + PORT);
});
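
The receiver above trusts any caller. PagerDuty signs v3 webhook deliveries with an X-PagerDuty-Signature header (an HMAC-SHA256 of the raw request body using the signing secret shown when the subscription is created), so a production receiver should verify it before acting on the event. A sketch of the check, assuming the raw body is captured (for example via the verify callback of express.json()) and the secret is stored in PAGERDUTY_WEBHOOK_SECRET:

// pagerduty/verify-signature.js
var crypto = require("crypto");

// Returns true when any signature in the header matches the HMAC of the raw body
function isValidSignature(rawBody, signatureHeader, secret) {
    if (!signatureHeader || !secret) { return false; }
    var expected = "v1=" + crypto.createHmac("sha256", secret).update(rawBody).digest("hex");
    // The header can carry several comma-separated signatures; accept any exact match
    return signatureHeader.split(",").some(function (candidate) {
        return candidate.trim() === expected;
    });
}

module.exports = { isValidSignature: isValidSignature };

In the receiver, call isValidSignature(req.rawBody, req.headers["x-pagerduty-signature"], process.env.PAGERDUTY_WEBHOOK_SECRET) at the top of the route handler and respond with 401 when it fails.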

Complete Working Example: Deployment with PagerDuty Gates

A full deployment pipeline that checks PagerDuty for active incidents before deploying, sends deployment events, and pages on failure:

// pagerduty/check-incidents.js
// Pre-deployment gate: block deploy if there are active critical incidents
var https = require("https");

var PD_API_KEY = process.env.PAGERDUTY_API_KEY;
var SERVICE_IDS = (process.env.PD_SERVICE_IDS || "").split(",").filter(function (s) { return s.trim(); });

function pdRequest(path, callback) {
    var options = {
        hostname: "api.pagerduty.com",
        path: path,
        method: "GET",
        headers: {
            "Authorization": "Token token=" + PD_API_KEY,
            "Accept": "application/json",
            "Content-Type": "application/json"
        }
    };

    var req = https.request(options, function (res) {
        var data = "";
        res.on("data", function (chunk) { data += chunk; });
        res.on("end", function () {
            if (res.statusCode === 200) {
                callback(null, JSON.parse(data));
            } else {
                callback(new Error("PagerDuty API error: " + res.statusCode));
            }
        });
    });
    req.on("error", callback);
    req.end();
}

var serviceFilter = SERVICE_IDS.length > 0
    ? "&service_ids[]=" + SERVICE_IDS.join("&service_ids[]=")
    : "";

pdRequest("/incidents?statuses[]=triggered&statuses[]=acknowledged&urgencies[]=high" + serviceFilter, function (err, data) {
    if (err) {
        console.error("Failed to check PagerDuty: " + err.message);
        console.log("Proceeding with deployment (PagerDuty check failed gracefully)");
        process.exit(0);
    }

    var incidents = data.incidents || [];
    console.log("Active high-urgency incidents: " + incidents.length);

    if (incidents.length > 0) {
        console.error("\n=== DEPLOYMENT BLOCKED ===");
        console.error("Active critical incidents detected:\n");
        incidents.forEach(function (inc) {
            console.error("  [" + inc.status.toUpperCase() + "] #" + inc.incident_number + ": " + inc.title);
            console.error("    Service: " + inc.service.summary);
            console.error("    Assigned: " + (inc.assignments.length > 0 ? inc.assignments[0].assignee.summary : "unassigned"));
            console.error("    URL: " + inc.html_url);
            console.error("");
        });
        console.error("Resolve active incidents before deploying.");
        process.exit(1);
    }

    console.log("No active critical incidents. Deployment can proceed.");
});

Pipeline integrating all components:

# azure-pipelines-pagerduty.yml
trigger:
  branches:
    include:
      - main

pool:
  vmImage: "ubuntu-latest"

stages:
  - stage: Build
    jobs:
      - job: BuildAndTest
        steps:
          - script: npm ci && npm test && npm run build
            displayName: "Build, test, and package"
          - task: PublishPipelineArtifact@1
            inputs:
              targetPath: dist
              artifact: app

  - stage: PreDeployGate
    displayName: "Pre-Deploy: Check PagerDuty"
    dependsOn: Build
    jobs:
      - job: CheckIncidents
        steps:
          - script: node pagerduty/check-incidents.js
            displayName: "Check for active incidents"
            env:
              PAGERDUTY_API_KEY: $(PagerDutyApiKey)
              PD_SERVICE_IDS: $(PagerDutyServiceIds)

  - stage: Deploy
    dependsOn: PreDeployGate
    jobs:
      - deployment: Production
        environment: production
        strategy:
          runOnce:
            deploy:
              steps:
                # Deployment jobs skip the automatic checkout; the alert scripts
                # used in the lifecycle hooks below need the repo on the agent
                - checkout: self
                - script: echo "Deploying..."
                  displayName: "Deploy to production"

            on:
              failure:
                steps:
                  - script: node pagerduty/pipeline-alert.js
                    displayName: "Page on-call: deployment failed"
                    env:
                      PAGERDUTY_ROUTING_KEY: $(PagerDutyRoutingKey)

              success:
                steps:
                  - script: |
                      node -e "
                      var pd = require('./pagerduty/client');
                      pd.resolveIncident(
                        process.env.PAGERDUTY_ROUTING_KEY,
                        'pipeline-' + process.env.BUILD_DEFINITIONNAME + '-failure',
                        function(err) { console.log(err ? 'No incident to resolve' : 'Resolved previous incident'); }
                      );
                      "
                    displayName: "Resolve any prior incident"
                    env:
                      PAGERDUTY_ROUTING_KEY: $(PagerDutyRoutingKey)

Common Issues and Troubleshooting

PagerDuty Events API returns "Invalid Routing Key"

{"status":"invalid event","message":"Event object is invalid","errors":["Invalid Routing Key"]}

The routing key is tied to a specific integration on a specific service. Verify you are using the Events API v2 integration key, not a REST API key or a legacy service key — those produce exactly this error. Each service has its own integration key, and a key copied from the wrong service routes incidents to the wrong team. Check Services > Your Service > Integrations for the correct key.

Duplicate incidents created for the same pipeline failure

The dedup_key field prevents duplicates. If you are getting duplicates, verify the dedup key is consistent across retries. Use a stable identifier like "pipeline-" + definitionName + "-failure" rather than a timestamp or a per-build ID. The dedup key is case-sensitive and must match exactly for deduplication to work.

Incidents not auto-resolving when pipeline succeeds

The resolve event must use the exact same dedup_key as the trigger event. If the trigger used "deploy-MyPipeline-12345" but the resolve tries "deploy-mypipeline-12345", it will not match. Log the dedup key in both the trigger and resolve steps to verify they match.
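
One way to avoid that drift is to compute the key in a single shared helper and require it from both the trigger and resolve paths; a minimal sketch:

// pagerduty/dedup-key.js
// Single source of truth for pipeline dedup keys, shared by trigger and resolve steps
function pipelineFailureKey(definitionName) {
    return "pipeline-" + definitionName + "-failure";
}

module.exports = { pipelineFailureKey: pipelineFailureKey };

Both the alert script and the resolve step can then call pipelineFailureKey(process.env.BUILD_DEFINITIONNAME) and log the result, which makes any mismatch obvious in the pipeline output.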

PagerDuty webhook not reaching your receiver

PagerDuty webhooks require a publicly accessible HTTPS endpoint. If your receiver is behind a firewall or on a private network, PagerDuty cannot reach it. Check the webhook delivery history in PagerDuty under Integrations > Generic Webhooks > Your Webhook > Recent Deliveries for HTTP status codes and error details.
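
To rule out the receiver itself, post a hand-built payload that mimics the v3 envelope straight at the endpoint. A sketch, assuming the receiver from earlier is running locally on port 4200 (note this exercises the Azure DevOps call too, so point AZURE_PROJECT at a sandbox project):

// pagerduty/send-test-webhook.js
// Posts a fake incident.triggered event to the local receiver to verify the handler path
var http = require("http");

var body = JSON.stringify({
    event: {
        event_type: "incident.triggered",
        data: {
            number: 999,
            title: "Test incident from local webhook check",
            html_url: "https://example.pagerduty.com/incidents/TEST",
            urgency: "high",
            created_at: new Date().toISOString(),
            service: { summary: "test-service" }
        }
    }
});

var req = http.request({
    hostname: "localhost",
    port: process.env.PORT || 4200,
    path: "/webhooks/pagerduty",
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body)
    }
}, function (res) {
    console.log("Receiver responded with HTTP " + res.statusCode);
});
req.on("error", function (err) { console.error("Request failed: " + err.message); });
req.write(body);
req.end();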

Best Practices

  • Use dedup keys based on the failure context, not unique IDs. A dedup key like "pipeline-api-service-main" groups all failures of the same pipeline on main into one incident. A key like "pipeline-api-service-12345" creates a new incident per build, which overwhelms the on-call person with duplicates.

  • Auto-resolve incidents when the pipeline recovers. Keeping incident state accurate maintains trust in the alerting system. If on-call engineers keep seeing stale incidents, they start ignoring PagerDuty.

  • Set severity based on the deployment target. Production failures are critical. Staging failures are warnings. Dev environment failures are informational. Do not page someone at 3 AM for a failed dev build.

  • Include direct links to Azure DevOps in the incident. The on-call engineer needs to go from "I got paged" to "I'm looking at the logs" in one click. Always include the build URL in the PagerDuty incident links.

  • Block deployments during active critical incidents. Deploying new code while production is on fire almost always makes things worse. Check PagerDuty for active incidents as a pre-deployment gate.

  • Use PagerDuty maintenance windows during planned deployments. If your deployment is expected to cause brief availability issues, create a maintenance window in PagerDuty to suppress alerts for the affected services during the deployment window, as sketched below.
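
Creating a maintenance window from a pipeline step uses the REST API rather than the Events API. A sketch, assuming a REST API key in PAGERDUTY_API_KEY and service IDs in PD_SERVICE_IDS as in the incident gate above; PD_FROM_EMAIL and MAINTENANCE_MINUTES are variable names invented for this example:

// pagerduty/create-maintenance-window.js
// Opens a maintenance window covering the deployment so expected alerts are suppressed
var https = require("https");

var DURATION_MINUTES = parseInt(process.env.MAINTENANCE_MINUTES || "30", 10);
var start = new Date();
var end = new Date(start.getTime() + DURATION_MINUTES * 60 * 1000);

var services = (process.env.PD_SERVICE_IDS || "").split(",")
    .filter(function (s) { return s.trim(); })
    .map(function (id) { return { id: id.trim(), type: "service_reference" }; });

var body = JSON.stringify({
    maintenance_window: {
        type: "maintenance_window",
        start_time: start.toISOString(),
        end_time: end.toISOString(),
        description: "Planned deployment window (created by Azure DevOps pipeline)",
        services: services
    }
});

var req = https.request({
    hostname: "api.pagerduty.com",
    path: "/maintenance_windows",
    method: "POST",
    headers: {
        "Authorization": "Token token=" + process.env.PAGERDUTY_API_KEY,
        "From": process.env.PD_FROM_EMAIL, // the REST API requires the email of a valid PagerDuty user
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body)
    }
}, function (res) {
    var data = "";
    res.on("data", function (chunk) { data += chunk; });
    res.on("end", function () {
        if (res.statusCode === 201) {
            console.log("Maintenance window created: " + JSON.parse(data).maintenance_window.id);
        } else {
            console.error("Failed to create maintenance window: HTTP " + res.statusCode + " " + data);
        }
    });
});
req.on("error", function (err) { console.error("Request failed: " + err.message); });
req.write(body);
req.end();

Delete the window (DELETE /maintenance_windows/{id}) or simply let it expire once the deployment completes.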
