PagerDuty Alerts from Azure DevOps
Integrate PagerDuty with Azure DevOps for pipeline failure alerts, deployment tracking, and automated incident management
When a production deployment fails at 2 AM, the difference between a five-minute fix and a two-hour outage comes down to how fast the right person gets notified. PagerDuty is the industry standard for incident management and on-call routing, and Azure DevOps is where most enterprise teams run their CI/CD pipelines. Connecting the two properly means your team gets actionable alerts with full context, automatic resolution when pipelines recover, and a clean audit trail of every deployment incident.
This article walks through the full integration: from basic webhook-driven alerts to a production-grade Node.js service that handles severity mapping, deduplication, bidirectional sync with work items, and change event tracking.
Prerequisites
- An Azure DevOps organization with at least one pipeline
- A PagerDuty account (any tier — free trial works for development)
- Node.js 18+ installed locally
- Basic familiarity with Azure DevOps service hooks and REST APIs
- A PagerDuty Events API v2 integration key (routing key)
- An Azure DevOps Personal Access Token (PAT) with Build read/execute and Work Items read/write permissions
PagerDuty Events API v2
PagerDuty's Events API v2 is the primary mechanism for programmatic incident creation. Unlike the REST API, the Events API is designed for machine-to-machine communication and is the correct choice for pipeline integrations. It accepts events at https://events.pagerduty.com/v2/enqueue and supports three event actions: trigger, acknowledge, and resolve.
Every event requires a routing_key (your integration key), an event_action, and a payload containing the summary, source, and severity. The severity field accepts four values: critical, error, warning, and info.
Here is a basic client for the Events API:
var https = require("https");

// Send an event to the PagerDuty Events API v2 (trigger, acknowledge, or resolve).
function sendPagerDutyEvent(routingKey, action, dedupKey, payload, callback) {
  var event = {
    routing_key: routingKey,
    event_action: action
  };
  // Omit empty fields: payload is required for trigger, optional for
  // acknowledge/resolve; dedup_key is required for acknowledge/resolve.
  if (dedupKey) event.dedup_key = dedupKey;
  if (payload) event.payload = payload;

  var body = JSON.stringify(event);
  var options = {
    hostname: "events.pagerduty.com",
    port: 443,
    path: "/v2/enqueue",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body)
    }
  };

  var req = https.request(options, function (res) {
    var chunks = [];
    res.on("data", function (chunk) { chunks.push(chunk); });
    res.on("end", function () {
      var responseBody = Buffer.concat(chunks).toString();
      try {
        var parsed = JSON.parse(responseBody);
        callback(null, parsed);
      } catch (err) {
        callback(new Error("Failed to parse PagerDuty response: " + responseBody));
      }
    });
  });

  req.on("error", function (err) {
    callback(err);
  });

  req.write(body);
  req.end();
}

module.exports = { sendPagerDutyEvent: sendPagerDutyEvent };
The dedup_key is critical. PagerDuty uses this key to correlate trigger, acknowledge, and resolve events into a single incident. Without it, every pipeline failure creates a new incident — and your on-call engineer's phone becomes unusable. A good dedup key for pipeline failures is a combination of the project name, pipeline ID, and the branch or environment.
Triggering Incidents from Pipeline Failures
The simplest integration is a pipeline task that fires a PagerDuty alert when a stage fails. In Azure DevOps YAML pipelines, you can use the condition: failed() expression to run a step only when the preceding stage has failed:
stages:
  - stage: Deploy
    jobs:
      - job: DeployProduction
        steps:
          - script: ./deploy.sh
            displayName: "Deploy to production"

  - stage: NotifyOnFailure
    dependsOn: Deploy
    condition: failed()
    jobs:
      - job: PagerDutyAlert
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: "18.x"
          - script: |
              node pagerduty-alert.js \
                --pipeline "$(Build.DefinitionName)" \
                --buildId "$(Build.BuildId)" \
                --branch "$(Build.SourceBranchName)" \
                --reason "$(Build.Reason)" \
                --url "$(System.CollectionUri)$(System.TeamProject)/_build/results?buildId=$(Build.BuildId)"
            displayName: "Trigger PagerDuty incident"
            env:
              PAGERDUTY_ROUTING_KEY: $(PagerDutyRoutingKey)
The corresponding Node.js script parses those arguments and fires the event:
var pdClient = require("./pagerduty-client");

var args = {};
process.argv.slice(2).forEach(function (arg, i, arr) {
  if (arg.startsWith("--") && arr[i + 1]) {
    args[arg.substring(2)] = arr[i + 1];
  }
});

var dedupKey = "azdo-" + args.pipeline + "-" + args.branch;

var payload = {
  summary: "Pipeline failed: " + args.pipeline + " on " + args.branch,
  source: "Azure DevOps",
  severity: "error",
  timestamp: new Date().toISOString(),
  component: args.pipeline,
  group: args.branch,
  custom_details: {
    build_id: args.buildId,
    trigger_reason: args.reason,
    build_url: args.url
  }
};

pdClient.sendPagerDutyEvent(
  process.env.PAGERDUTY_ROUTING_KEY,
  "trigger",
  dedupKey,
  payload,
  function (err, result) {
    if (err) {
      console.error("Failed to send PagerDuty alert:", err.message);
      process.exit(1);
    }
    console.log("PagerDuty incident triggered:", result.dedup_key);
  }
);
Azure DevOps Service Hooks to PagerDuty
For a more robust approach than in-pipeline scripts, Azure DevOps service hooks let you subscribe to events at the organization level. This way, every pipeline failure triggers PagerDuty without modifying individual pipeline YAML files.
Navigate to Project Settings > Service hooks > Create subscription and select "Web Hooks" as the service. The relevant event types to subscribe to:
- Build completed — fires when any pipeline run finishes (filter on status = failed)
- Release deployment completed — fires when a classic release deployment finishes
- Run stage completed — fires for individual stage completions in YAML pipelines
Point the webhook URL at your integration service. The payload from Azure DevOps includes the build result, pipeline name, project, and a link back to the build. Your service processes this and forwards it to PagerDuty.
Here is the webhook receiver:
var express = require("express");
var bodyParser = require("body-parser");
var pdClient = require("./pagerduty-client");

var app = express();
app.use(bodyParser.json());

app.post("/webhooks/azure-devops", function (req, res) {
  var event = req.body;
  var eventType = event.eventType;
  if (eventType === "build.complete") {
    handleBuildComplete(event, function (err) {
      if (err) {
        console.error("Error handling build event:", err.message);
        return res.status(500).json({ error: err.message });
      }
      res.status(200).json({ status: "processed" });
    });
  } else {
    res.status(200).json({ status: "ignored", eventType: eventType });
  }
});

function handleBuildComplete(event, callback) {
  var resource = event.resource;
  var result = resource.result;
  var definition = resource.definition;
  var project = resource.project;
  var dedupKey = "azdo-" + project.name + "-" + definition.id + "-" +
    (resource.sourceBranch || "unknown").replace("refs/heads/", "");

  if (result === "succeeded") {
    // Auto-resolve any existing incident for this pipeline
    pdClient.sendPagerDutyEvent(
      process.env.PAGERDUTY_ROUTING_KEY,
      "resolve",
      dedupKey,
      null,
      callback
    );
    return;
  }

  if (result === "failed" || result === "partiallySucceeded") {
    var severity = mapSeverity(definition.name, resource.sourceBranch);
    var payload = {
      summary: "[" + project.name + "] Pipeline failed: " + definition.name,
      source: "Azure DevOps - " + project.name,
      severity: severity,
      timestamp: resource.finishTime,
      component: definition.name,
      group: project.name,
      class: "pipeline_failure",
      custom_details: {
        build_id: resource.id,
        build_number: resource.buildNumber,
        result: result,
        reason: resource.reason,
        requested_by: resource.requestedFor ? resource.requestedFor.displayName : "unknown",
        source_branch: resource.sourceBranch,
        build_url: resource._links.web.href
      }
    };
    pdClient.sendPagerDutyEvent(
      process.env.PAGERDUTY_ROUTING_KEY,
      "trigger",
      dedupKey,
      payload,
      callback
    );
  } else {
    callback(null);
  }
}

app.listen(3000, function () {
  console.log("PagerDuty integration service running on port 3000");
});
The key behavior here is the auto-resolution. When a pipeline that previously failed runs successfully again, we send a resolve event with the same dedup key. PagerDuty automatically resolves the open incident. This prevents stale alerts from lingering after a hotfix goes through.
Custom Severity Mapping
Not every pipeline failure is a production outage. A failing PR validation build is noise if it pages someone at 3 AM. Severity mapping ensures the right urgency reaches the right people.
var SEVERITY_RULES = [
  {
    match: function (pipelineName, branch) {
      return branch && branch.indexOf("refs/heads/main") === 0;
    },
    severity: "critical"
  },
  {
    match: function (pipelineName, branch) {
      return branch && branch.indexOf("refs/heads/release") === 0;
    },
    severity: "error"
  },
  {
    match: function (pipelineName, branch) {
      return pipelineName.toLowerCase().indexOf("deploy") !== -1;
    },
    severity: "error"
  },
  {
    match: function (pipelineName, branch) {
      return branch && branch.indexOf("refs/heads/develop") === 0;
    },
    severity: "warning"
  }
];

function mapSeverity(pipelineName, branch) {
  for (var i = 0; i < SEVERITY_RULES.length; i++) {
    if (SEVERITY_RULES[i].match(pipelineName, branch)) {
      return SEVERITY_RULES[i].severity;
    }
  }
  return "info"; // Default for feature branches, PR builds, etc.
}
This means a main branch deployment failure triggers a critical incident that pages immediately, while a feature branch CI failure creates an informational alert that shows up in the PagerDuty dashboard without waking anyone up. In PagerDuty, notification behavior is configured per urgency level: map critical and error severity to high urgency so they page, and warning and info to low urgency so they land in Slack or simply get logged.
Deployment Failure Escalation
Sometimes a single failure is a fluke, but repeated failures indicate a systemic problem that needs escalation. You can track consecutive failures and escalate severity accordingly:
var failureTracker = {};

function getEscalatedSeverity(dedupKey, baseSeverity) {
  if (!failureTracker[dedupKey]) {
    failureTracker[dedupKey] = { count: 0, firstFailure: null };
  }
  failureTracker[dedupKey].count++;
  if (!failureTracker[dedupKey].firstFailure) {
    failureTracker[dedupKey].firstFailure = new Date();
  }
  var count = failureTracker[dedupKey].count;
  if (count >= 5) return "critical";
  if (count >= 3) return "error";
  return baseSeverity;
}

function clearFailureTracking(dedupKey) {
  delete failureTracker[dedupKey];
}
In production, you would persist failureTracker to a database or Redis rather than keeping it in memory. The idea is simple: the third consecutive failure on the same pipeline gets promoted from warning to error, and the fifth gets promoted to critical, regardless of the branch.
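A minimal sketch of that persistence, assuming an ioredis client (the package choice, the REDIS_URL variable, and the failures: key prefix are illustrative, not part of the integration above):

var Redis = require("ioredis");
var redis = new Redis(process.env.REDIS_URL); // hypothetical connection string

// INCR is atomic, so concurrent webhook deliveries cannot race on the count.
function trackFailure(dedupKey, callback) {
  var key = "failures:" + dedupKey;
  redis.incr(key, function (err, count) {
    if (err) return callback(err);
    // Expire stale counters so renamed or deleted pipelines do not leak keys.
    redis.expire(key, 7 * 24 * 3600);
    callback(null, count);
  });
}

function clearFailureTracking(dedupKey, callback) {
  redis.del("failures:" + dedupKey, callback);
}

Note the callbacks: unlike the in-memory version, the count arrives asynchronously, so the webhook handler has to wait for it before choosing a severity.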
Change Events for Deployment Tracking
PagerDuty's Change Events API lets you send deployment markers that appear on the service timeline. This is separate from incidents — change events represent intentional actions, like a deployment or configuration change. They help on-call engineers correlate incidents with recent changes.
function sendChangeEvent(routingKey, summary, source, links, customDetails, callback) {
  var event = {
    routing_key: routingKey,
    payload: {
      summary: summary,
      source: source,
      timestamp: new Date().toISOString(),
      custom_details: customDetails || {}
    },
    links: links || []
  };
  var body = JSON.stringify(event);
  var options = {
    hostname: "events.pagerduty.com",
    port: 443,
    path: "/v2/change/enqueue",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body)
    }
  };
  var req = require("https").request(options, function (res) {
    var chunks = [];
    res.on("data", function (chunk) { chunks.push(chunk); });
    res.on("end", function () {
      var responseBody = Buffer.concat(chunks).toString();
      try {
        callback(null, JSON.parse(responseBody));
      } catch (err) {
        callback(new Error("Failed to parse change event response: " + responseBody));
      }
    });
  });
  req.on("error", callback);
  req.write(body);
  req.end();
}

// Call this on every successful deployment
function trackDeployment(resource) {
  var definition = resource.definition;
  var links = [
    {
      href: resource._links.web.href,
      text: "View Build"
    }
  ];
  sendChangeEvent(
    process.env.PAGERDUTY_ROUTING_KEY,
    "Deployed " + definition.name + " build #" + resource.buildNumber,
    "Azure DevOps",
    links,
    {
      build_number: resource.buildNumber,
      source_branch: resource.sourceBranch,
      requested_by: resource.requestedFor ? resource.requestedFor.displayName : "unknown",
      commit: resource.sourceVersion
    },
    function (err) {
      if (err) console.error("Failed to send change event:", err.message);
      else console.log("Change event tracked for build #" + resource.buildNumber);
    }
  );
}
Send change events for every successful production deployment. When an incident fires ten minutes after a deployment, the timeline in PagerDuty makes the correlation obvious.
PagerDuty Service Configuration
Your PagerDuty setup matters as much as the code. Here is the recommended service configuration for an Azure DevOps integration:
Create a dedicated service per environment (e.g., "Production Pipelines", "Staging Pipelines"). Do not dump all pipeline alerts into a single service.
Use Events API v2 integration on each service. Go to Service > Integrations > Add Integration > Events API v2. Copy the integration key — this is your routing key.
Configure urgency rules on the service:
- High urgency for critical and error severity (pages immediately)
- Low urgency for warning and info severity (no paging, Slack only)
Set up an escalation policy that reflects your team structure. A typical pipeline failure escalation: primary on-call for 15 minutes, then secondary on-call, then the engineering manager at 30 minutes.
Enable auto-resolution — since our integration sends resolve events, PagerDuty should be configured to accept them. This is the default behavior, but verify it is not disabled.
Add response plays for common failure scenarios. A response play can automatically add responders, post to a Slack channel, and start a conference bridge.
On-Call Schedule Awareness in Pipelines
Sometimes you want pipeline behavior to change based on who is on-call. Maybe you skip certain risky deployments during off-hours, or you want the pipeline to tag the on-call engineer in the failure notification. PagerDuty's REST API lets you query the current on-call schedule:
var https = require("https");

function getCurrentOnCall(scheduleId, apiToken, callback) {
  var now = new Date().toISOString();
  var path = "/schedules/" + scheduleId + "/users?since=" +
    encodeURIComponent(now) + "&until=" + encodeURIComponent(now);
  var options = {
    hostname: "api.pagerduty.com",
    port: 443,
    path: path,
    method: "GET",
    headers: {
      "Authorization": "Token token=" + apiToken,
      "Content-Type": "application/json",
      "Accept": "application/vnd.pagerduty+json;version=2"
    }
  };
  var req = https.request(options, function (res) {
    var chunks = [];
    res.on("data", function (chunk) { chunks.push(chunk); });
    res.on("end", function () {
      var raw = Buffer.concat(chunks).toString();
      try {
        var body = JSON.parse(raw);
        callback(null, body.users || []);
      } catch (err) {
        callback(new Error("Failed to parse PagerDuty response: " + raw));
      }
    });
  });
  req.on("error", callback);
  req.end();
}

// Use in pipeline context
getCurrentOnCall(
  process.env.PAGERDUTY_SCHEDULE_ID,
  process.env.PAGERDUTY_API_TOKEN,
  function (err, users) {
    if (err) {
      console.error("Could not fetch on-call:", err.message);
      return;
    }
    if (users.length > 0) {
      console.log("Current on-call: " + users[0].name + " (" + users[0].email + ")");
      // Include in PagerDuty alert custom_details or pipeline output
    }
  }
);
This is useful for building deployment gates. If no one is on-call (schedule gap), block the production deployment. If the on-call engineer is on a secondary schedule (backup), require manual approval before deploying.
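As a sketch of such a gate, assuming the getCurrentOnCall function above is exported from an oncall-client module (that module name and the fail-closed policy are illustrative choices, not requirements):

// deployment-gate.js: run as a pipeline step before production deploys.
// Exits non-zero to fail the step when the schedule has a gap.
var onCall = require("./oncall-client");

onCall.getCurrentOnCall(
  process.env.PAGERDUTY_SCHEDULE_ID,
  process.env.PAGERDUTY_API_TOKEN,
  function (err, users) {
    if (err) {
      // Fail closed: if we cannot confirm coverage, do not deploy.
      console.log("##vso[task.logissue type=error]On-call lookup failed: " + err.message);
      process.exit(1);
    }
    if (users.length === 0) {
      console.log("##vso[task.logissue type=error]No one is on call; blocking production deployment");
      process.exit(1);
    }
    console.log("On-call coverage confirmed: " + users[0].name);
  }
);

The ##vso[task.logissue] logging command surfaces the message as an error annotation in the Azure DevOps run summary.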
Incident Lifecycle Management
A well-integrated system manages the full incident lifecycle: trigger, acknowledge, escalate, resolve. Here is a module that wraps the full lifecycle:
var pdClient = require("./pagerduty-client");

var IncidentManager = {
  trigger: function (routingKey, dedupKey, summary, severity, details, callback) {
    var payload = {
      summary: summary,
      source: "Azure DevOps Pipeline Monitor",
      severity: severity,
      timestamp: new Date().toISOString(),
      custom_details: details
    };
    pdClient.sendPagerDutyEvent(routingKey, "trigger", dedupKey, payload, function (err, result) {
      if (err) return callback(err);
      console.log("Incident triggered: " + dedupKey);
      callback(null, result);
    });
  },

  acknowledge: function (routingKey, dedupKey, callback) {
    pdClient.sendPagerDutyEvent(routingKey, "acknowledge", dedupKey, null, function (err, result) {
      if (err) return callback(err);
      console.log("Incident acknowledged: " + dedupKey);
      callback(null, result);
    });
  },

  resolve: function (routingKey, dedupKey, callback) {
    pdClient.sendPagerDutyEvent(routingKey, "resolve", dedupKey, null, function (err, result) {
      if (err) return callback(err);
      console.log("Incident resolved: " + dedupKey);
      callback(null, result);
    });
  },

  triggerWithEscalation: function (routingKey, dedupKey, summary, baseSeverity, details, failureCount, callback) {
    var severity = baseSeverity;
    if (failureCount >= 5) severity = "critical";
    else if (failureCount >= 3) severity = "error";
    details.consecutive_failures = failureCount;
    details.escalated = severity !== baseSeverity;
    this.trigger(routingKey, dedupKey, summary, severity, details, callback);
  }
};

module.exports = IncidentManager;
Bidirectional Sync: PagerDuty Incidents to Azure DevOps Work Items
When an incident fires, you often want a corresponding bug or task in Azure DevOps for tracking. PagerDuty webhooks (v3) can notify your service when an incident is created, acknowledged, or resolved. Your service then creates or updates a work item in Azure DevOps.
var https = require("https");

function createAzureDevOpsWorkItem(orgUrl, project, pat, title, description, callback) {
  var patchDoc = [
    { op: "add", path: "/fields/System.Title", value: title },
    { op: "add", path: "/fields/System.Description", value: description },
    { op: "add", path: "/fields/System.Tags", value: "pagerduty-incident;auto-created" },
    { op: "add", path: "/fields/Microsoft.VSTS.Common.Priority", value: 1 }
  ];
  var body = JSON.stringify(patchDoc);
  var auth = Buffer.from(":" + pat).toString("base64");
  var url = new URL(orgUrl);
  // Keep the org segment of the URL path: dev.azure.com/{org} URLs carry the
  // organization in the path, while {org}.visualstudio.com URLs do not.
  var basePath = url.pathname.replace(/\/$/, "");
  var options = {
    hostname: url.hostname,
    port: 443,
    path: basePath + "/" + encodeURIComponent(project) + "/_apis/wit/workitems/$Bug?api-version=7.1",
    method: "POST",
    headers: {
      "Content-Type": "application/json-patch+json",
      "Authorization": "Basic " + auth,
      "Content-Length": Buffer.byteLength(body)
    }
  };
  var req = https.request(options, function (res) {
    var chunks = [];
    res.on("data", function (chunk) { chunks.push(chunk); });
    res.on("end", function () {
      var raw = Buffer.concat(chunks).toString();
      var responseBody;
      try {
        responseBody = JSON.parse(raw);
      } catch (err) {
        // A bad PAT can yield an HTML sign-in page instead of JSON
        return callback(new Error("Azure DevOps returned non-JSON response: " + res.statusCode));
      }
      if (res.statusCode >= 200 && res.statusCode < 300) {
        callback(null, responseBody);
      } else {
        callback(new Error("Azure DevOps API error: " + res.statusCode + " - " + JSON.stringify(responseBody)));
      }
    });
  });
  req.on("error", callback);
  req.write(body);
  req.end();
}
// Webhook handler for PagerDuty v3 webhooks
app.post("/webhooks/pagerduty", function (req, res) {
var event = req.body.event;
if (!event) return res.status(200).json({ status: "no event" });
var eventType = event.event_type;
var incident = event.data;
if (eventType === "incident.triggered") {
var title = "[PagerDuty] " + incident.title;
var description = "<h3>PagerDuty Incident</h3>" +
"<p><strong>Service:</strong> " + (incident.service ? incident.service.summary : "Unknown") + "</p>" +
"<p><strong>Urgency:</strong> " + incident.urgency + "</p>" +
"<p><strong>Link:</strong> <a href='" + incident.html_url + "'>" + incident.html_url + "</a></p>";
createAzureDevOpsWorkItem(
process.env.AZDO_ORG_URL,
process.env.AZDO_PROJECT,
process.env.AZDO_PAT,
title,
description,
function (err, workItem) {
if (err) {
console.error("Failed to create work item:", err.message);
return res.status(500).json({ error: err.message });
}
console.log("Created work item #" + workItem.id + " for incident " + incident.id);
res.status(200).json({ work_item_id: workItem.id });
}
);
} else {
res.status(200).json({ status: "ignored", eventType: eventType });
}
});
For the reverse direction — resolving the PagerDuty incident when the Azure DevOps work item is closed — set up an Azure DevOps service hook for work item updates and filter on state transitions to "Closed" or "Resolved" where the tag includes pagerduty-incident.
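Here is a sketch of that handler. It assumes the dedup key was stored on the work item as a tag of the form pd-dedup:<key> at creation time (a step not shown above), and it relies on the usual shape of the workitem.updated payload, where changed fields arrive as oldValue/newValue pairs and revision.fields holds the full current values:

app.post("/webhooks/azure-devops-workitem", function (req, res) {
  var event = req.body;
  if (event.eventType !== "workitem.updated") {
    return res.status(200).json({ status: "ignored" });
  }
  // Changed fields arrive as { oldValue, newValue } pairs.
  var stateChange = (event.resource.fields || {})["System.State"];
  var newState = stateChange ? stateChange.newValue : null;
  if (newState !== "Closed" && newState !== "Resolved") {
    return res.status(200).json({ status: "no relevant transition" });
  }
  // Tags are a "; "-separated string on the current revision.
  var tags = (event.resource.revision.fields["System.Tags"] || "").split(";");
  var dedupTag = tags.map(function (t) { return t.trim(); })
    .filter(function (t) { return t.indexOf("pd-dedup:") === 0; })[0];
  if (!dedupTag) {
    return res.status(200).json({ status: "no pagerduty dedup tag" });
  }
  pdClient.sendPagerDutyEvent(
    process.env.PAGERDUTY_ROUTING_KEY,
    "resolve",
    dedupTag.substring("pd-dedup:".length),
    null,
    function (err) {
      if (err) return res.status(500).json({ error: err.message });
      res.status(200).json({ status: "incident resolved" });
    }
  );
});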
Runbook Linking
Every PagerDuty incident should link to a runbook. When an engineer gets paged at 3 AM, they should not have to guess what to do. Add runbook links to your PagerDuty alerts by including them in the links array of the event payload:
var RUNBOOKS = {
  "production-deploy": "https://wiki.example.com/runbooks/production-deploy-failure",
  "database-migration": "https://wiki.example.com/runbooks/database-migration-failure",
  "integration-tests": "https://wiki.example.com/runbooks/integration-test-failure",
  "default": "https://wiki.example.com/runbooks/general-pipeline-failure"
};

function getRunbookUrl(pipelineName) {
  var key = pipelineName.toLowerCase().replace(/\s+/g, "-");
  return RUNBOOKS[key] || RUNBOOKS["default"];
}

function buildEventWithRunbook(routingKey, dedupKey, payload, pipelineName) {
  var runbookUrl = getRunbookUrl(pipelineName);
  return {
    routing_key: routingKey,
    event_action: "trigger",
    dedup_key: dedupKey,
    payload: payload,
    links: [
      { href: runbookUrl, text: "Runbook: " + pipelineName },
      { href: payload.custom_details.build_url, text: "View Build in Azure DevOps" }
    ],
    images: []
  };
}
PagerDuty renders these links directly in the incident detail view, the mobile app, and Slack notifications. Engineers get one-tap access to the relevant documentation.
Complete Working Example
Here is the full integration service that ties everything together: a Node.js service that receives Azure DevOps service hook webhooks, creates PagerDuty incidents with proper severity and deduplication, auto-resolves on successful re-runs, and tracks deployments as change events. One caveat for production use: failure counts are held in memory here, so persist them (as discussed earlier) before relying on escalation.
var express = require("express");
var bodyParser = require("body-parser");
var https = require("https");

// ============================================================
// Configuration
// ============================================================
var CONFIG = {
  port: process.env.PORT || 3000,
  pagerduty: {
    routingKey: process.env.PAGERDUTY_ROUTING_KEY,
    apiToken: process.env.PAGERDUTY_API_TOKEN
  },
  azureDevOps: {
    orgUrl: process.env.AZDO_ORG_URL,
    project: process.env.AZDO_PROJECT,
    pat: process.env.AZDO_PAT
  }
};
// ============================================================
// PagerDuty Client
// ============================================================
function pagerDutyRequest(path, payload, callback) {
  var body = JSON.stringify(payload);
  var options = {
    hostname: "events.pagerduty.com",
    port: 443,
    path: path,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(body)
    }
  };
  var req = https.request(options, function (res) {
    var chunks = [];
    res.on("data", function (chunk) { chunks.push(chunk); });
    res.on("end", function () {
      var responseBody = Buffer.concat(chunks).toString();
      try {
        callback(null, JSON.parse(responseBody), res.statusCode);
      } catch (e) {
        callback(new Error("Invalid JSON from PagerDuty: " + responseBody));
      }
    });
  });
  req.on("error", callback);
  req.write(body);
  req.end();
}

function triggerIncident(dedupKey, summary, severity, details, links, callback) {
  var event = {
    routing_key: CONFIG.pagerduty.routingKey,
    event_action: "trigger",
    dedup_key: dedupKey,
    payload: {
      summary: summary,
      source: "Azure DevOps Pipeline Monitor",
      severity: severity,
      timestamp: new Date().toISOString(),
      custom_details: details
    },
    links: links || []
  };
  pagerDutyRequest("/v2/enqueue", event, callback);
}

function resolveIncident(dedupKey, callback) {
  var event = {
    routing_key: CONFIG.pagerduty.routingKey,
    event_action: "resolve",
    dedup_key: dedupKey
  };
  pagerDutyRequest("/v2/enqueue", event, callback);
}

function sendChangeEvent(summary, details, links, callback) {
  var event = {
    routing_key: CONFIG.pagerduty.routingKey,
    payload: {
      summary: summary,
      source: "Azure DevOps",
      timestamp: new Date().toISOString(),
      custom_details: details
    },
    links: links || []
  };
  pagerDutyRequest("/v2/change/enqueue", event, callback);
}
// ============================================================
// Severity Mapping
// ============================================================
var SEVERITY_RULES = [
  { test: function (p, b) { return /refs\/heads\/main$/.test(b); }, severity: "critical" },
  { test: function (p, b) { return /refs\/heads\/master$/.test(b); }, severity: "critical" },
  { test: function (p, b) { return /refs\/heads\/release/.test(b); }, severity: "error" },
  { test: function (p, b) { return /deploy/i.test(p); }, severity: "error" },
  { test: function (p, b) { return /refs\/heads\/develop$/.test(b); }, severity: "warning" }
];

function mapSeverity(pipelineName, branch) {
  for (var i = 0; i < SEVERITY_RULES.length; i++) {
    if (SEVERITY_RULES[i].test(pipelineName, branch || "")) {
      return SEVERITY_RULES[i].severity;
    }
  }
  return "info";
}

// ============================================================
// Failure Tracking & Escalation
// ============================================================
var failureCounts = {};

function trackFailure(dedupKey) {
  if (!failureCounts[dedupKey]) failureCounts[dedupKey] = 0;
  failureCounts[dedupKey]++;
  return failureCounts[dedupKey];
}

function clearFailures(dedupKey) {
  delete failureCounts[dedupKey];
}

function escalateSeverity(baseSeverity, failureCount) {
  if (failureCount >= 5) return "critical";
  if (failureCount >= 3 && baseSeverity !== "critical") return "error";
  return baseSeverity;
}

// ============================================================
// Runbook Mapping
// ============================================================
var RUNBOOKS = {};

function getRunbookUrl(pipelineName) {
  var key = pipelineName.toLowerCase().replace(/\s+/g, "-");
  return RUNBOOKS[key] || null;
}

// ============================================================
// Dedup Key Generation
// ============================================================
function buildDedupKey(project, definitionId, branch) {
  var branchName = (branch || "unknown").replace("refs/heads/", "");
  return "azdo-" + project + "-" + definitionId + "-" + branchName;
}
// ============================================================
// Express App
// ============================================================
var app = express();
app.use(bodyParser.json());

// Health check
app.get("/health", function (req, res) {
  res.json({ status: "ok", uptime: process.uptime() });
});

// Azure DevOps webhook receiver
app.post("/webhooks/azure-devops", function (req, res) {
  var event = req.body;
  var eventType = event.eventType;
  console.log("Received event: " + eventType);
  if (eventType === "build.complete") {
    processBuildComplete(event.resource, function (err) {
      if (err) {
        console.error("Error processing build event:", err.message);
        return res.status(500).json({ error: err.message });
      }
      res.status(200).json({ status: "processed" });
    });
  } else {
    res.status(200).json({ status: "ignored" });
  }
});
function processBuildComplete(resource, callback) {
  var result = resource.result;
  var definition = resource.definition;
  var project = resource.project;
  var branch = resource.sourceBranch;
  var dedupKey = buildDedupKey(project.name, definition.id, branch);

  // Successful build: resolve any open incident and track as change event
  if (result === "succeeded") {
    clearFailures(dedupKey);
    // Send resolve event
    resolveIncident(dedupKey, function (err) {
      if (err) console.error("Resolve failed (may not have open incident):", err.message);
      else console.log("Resolved incident: " + dedupKey);
    });
    // Track successful deployment as change event
    if (/deploy/i.test(definition.name) || /refs\/heads\/(main|master)$/.test(branch)) {
      sendChangeEvent(
        "Deployed " + definition.name + " #" + resource.buildNumber,
        {
          build_number: resource.buildNumber,
          branch: branch,
          requested_by: resource.requestedFor ? resource.requestedFor.displayName : "unknown",
          commit: resource.sourceVersion
        },
        [{ href: resource._links.web.href, text: "View Build" }],
        function (err) {
          if (err) console.error("Change event failed:", err.message);
        }
      );
    }
    callback(null);
    return;
  }

  // Failed build: trigger or update incident
  if (result === "failed" || result === "partiallySucceeded") {
    var failureCount = trackFailure(dedupKey);
    var baseSeverity = mapSeverity(definition.name, branch);
    var severity = escalateSeverity(baseSeverity, failureCount);
    var summary = "[" + project.name + "] Pipeline failed: " + definition.name +
      " (#" + resource.buildNumber + ")";
    if (failureCount > 1) {
      summary += " [" + failureCount + " consecutive failures]";
    }
    var details = {
      build_id: resource.id,
      build_number: resource.buildNumber,
      result: result,
      reason: resource.reason,
      source_branch: branch,
      source_version: resource.sourceVersion,
      requested_by: resource.requestedFor ? resource.requestedFor.displayName : "unknown",
      build_url: resource._links.web.href,
      consecutive_failures: failureCount,
      severity_escalated: severity !== baseSeverity
    };
    var links = [
      { href: resource._links.web.href, text: "View Build in Azure DevOps" }
    ];
    var runbook = getRunbookUrl(definition.name);
    if (runbook) {
      links.push({ href: runbook, text: "Runbook: " + definition.name });
    }
    triggerIncident(dedupKey, summary, severity, details, links, function (err) {
      if (err) return callback(err);
      console.log("Incident triggered: " + dedupKey + " (severity: " + severity +
        ", failures: " + failureCount + ")");
      callback(null);
    });
    return;
  }

  // Cancelled or other results — ignore
  callback(null);
}
// ============================================================
// Start Server
// ============================================================
app.listen(CONFIG.port, function () {
  console.log("PagerDuty integration service running on port " + CONFIG.port);
  console.log("Webhook endpoint: POST /webhooks/azure-devops");
  if (!CONFIG.pagerduty.routingKey) {
    console.warn("WARNING: PAGERDUTY_ROUTING_KEY not set");
  }
});

module.exports = app;
To deploy this service, host it where your Azure DevOps organization can reach it, then set up a service hook pointing to https://your-service.example.com/webhooks/azure-devops with the "Build completed" event type. Every pipeline completion — success or failure — flows through this service, and PagerDuty gets the right signal.
Common Issues & Troubleshooting
1. Duplicate incidents for the same failure. This happens when your dedup key is too specific — for example, including the build ID in the key. The dedup key should represent the thing that broke, not the specific failure instance. Use project + pipeline + branch as the key. Two consecutive failures on the same pipeline and branch should merge into one incident.
2. Incidents not auto-resolving. Verify that the dedup key used in the resolve event matches the dedup key from the trigger event exactly. Character casing matters. Also check that the PagerDuty service has not been configured to disable event-based resolution. Log both the trigger and resolve dedup keys to confirm they match.
3. Service hook payloads missing fields. Azure DevOps service hook payloads vary depending on the event type and pipeline type (YAML vs. classic). The resource.sourceBranch field may be absent for manually triggered builds without a branch specified. Always use defensive coding with fallbacks: resource.sourceBranch || "unknown".
4. PagerDuty returns 429 (rate limit). The Events API has a rate limit of approximately 120 events per minute per integration key. If you have hundreds of pipelines failing simultaneously (which itself is a problem worth investigating), implement retry logic with exponential backoff:
function sendWithRetry(fn, maxRetries, callback) {
  var attempts = 0;
  function attempt() {
    attempts++;
    fn(function (err, result, statusCode) {
      // Retry on transport errors and on HTTP 429, which pagerDutyRequest
      // surfaces as a parsed response (err is null) with its status code.
      var retryable = err || statusCode === 429;
      if (retryable && attempts < maxRetries) {
        var delay = Math.pow(2, attempts) * 1000;
        console.log("Retrying in " + delay + "ms (attempt " + attempts + "/" + maxRetries + ")");
        setTimeout(attempt, delay);
        return;
      }
      callback(err, result);
    });
  }
  attempt();
}
5. Webhook signature validation failing. PagerDuty v3 webhooks include an X-PagerDuty-Signature header for HMAC verification. If you have configured a webhook secret, validate this signature against the raw request body before trusting the payload, and reject mismatches with a 4xx response. Use crypto.createHmac("sha256", secret).update(rawBody).digest("hex") and compare with the header value.
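A sketch of that check. It assumes the v1= prefix PagerDuty documents for its signatures, and uses body-parser's verify hook to capture the raw body, since the HMAC must be computed over the exact bytes received:

var crypto = require("crypto");

// Capture the raw request body alongside the parsed JSON.
app.use(bodyParser.json({
  verify: function (req, res, buf) { req.rawBody = buf; }
}));

function isValidPagerDutySignature(req, secret) {
  var header = req.headers["x-pagerduty-signature"] || "";
  var expected = "v1=" + crypto.createHmac("sha256", secret)
    .update(req.rawBody)
    .digest("hex");
  // The header may carry several comma-separated signatures (e.g. during
  // secret rotation); accept the request if any of them matches.
  return header.split(",").some(function (candidate) {
    var a = Buffer.from(candidate.trim());
    var b = Buffer.from(expected);
    return a.length === b.length && crypto.timingSafeEqual(a, b);
  });
}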
6. Build status shows "partiallySucceeded" but no alert fires. Make sure your handler treats partiallySucceeded as a failure state. This result means some tasks failed but the pipeline continued. It is often more dangerous than a clean failure because it may indicate a partial deployment.
Best Practices
Use separate PagerDuty services per environment. Production, staging, and development pipelines should route to different services with different escalation policies and urgency rules. A staging failure should never page anyone at night.
Always include a build URL in the alert. The engineer who gets paged needs one click to see the failing build. Put the URL in both the custom_details and the links array of the PagerDuty event.
Set up auto-resolution from the start. Stale incidents erode trust in your alerting system. If a pipeline failure resolves itself on the next run, the incident should close automatically. This requires consistent dedup keys and sending resolve events on success.
Use change events for every production deployment. Even when nothing breaks, PagerDuty's change timeline is invaluable for post-incident analysis. The on-call engineer can instantly see what changed in the last hour.
Keep dedup keys stable and deterministic. Do not include timestamps, build IDs, or random values in dedup keys. The key represents a specific failure condition (this pipeline, on this branch), not a specific build run.
Rate-limit your integration service, not just PagerDuty. If Azure DevOps fires 500 webhook events in a burst (common during branch cleanup or mass pipeline runs), your service should queue events and process them at a controlled rate rather than hammering PagerDuty's API.
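A minimal sketch of that buffering, reusing the pagerDutyRequest helper from the complete example (the 600 ms drain interval, roughly 100 events per minute, is an arbitrary choice sized under the documented limit; a real deployment would want a durable queue):

var eventQueue = [];

// Handlers call enqueueEvent instead of pagerDutyRequest directly.
function enqueueEvent(path, payload) {
  eventQueue.push({ path: path, payload: payload });
}

// Drain one event every 600 ms to stay under PagerDuty's rate limit.
setInterval(function () {
  var next = eventQueue.shift();
  if (!next) return;
  pagerDutyRequest(next.path, next.payload, function (err) {
    if (err) {
      console.error("Queued event failed, re-queueing:", err.message);
      eventQueue.push(next); // naive retry; a real queue would cap attempts
    }
  });
}, 600);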
Test your integration with non-critical pipelines first. Create a test pipeline that intentionally fails, trigger it, verify the PagerDuty incident looks correct, then trigger a success and verify auto-resolution. Do this before connecting production pipelines.
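One way to run that check locally is to post a hand-built build.complete payload at the service before any real service hook exists. The payload below is fabricated for testing and contains only the fields the complete example's handler actually reads:

// test-webhook.js: simulate an Azure DevOps build.complete delivery.
var http = require("http");

var fakeEvent = {
  eventType: "build.complete",
  resource: {
    result: "failed",
    id: 9999,
    buildNumber: "20240101.1",
    reason: "manual",
    sourceBranch: "refs/heads/main",
    sourceVersion: "abc1234",
    requestedFor: { displayName: "Test User" },
    definition: { id: 42, name: "test-pipeline" },
    project: { name: "TestProject" },
    _links: { web: { href: "https://dev.azure.com/org/TestProject/_build/results?buildId=9999" } }
  }
};

var body = JSON.stringify(fakeEvent);
var req = http.request({
  hostname: "localhost",
  port: 3000,
  path: "/webhooks/azure-devops",
  method: "POST",
  headers: { "Content-Type": "application/json", "Content-Length": Buffer.byteLength(body) }
}, function (res) {
  console.log("Service responded with status " + res.statusCode);
  res.resume();
});
req.write(body);
req.end();

Flip result to "succeeded" and post again to verify the auto-resolve path.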
Map runbooks to pipelines before you go live. An alert without a runbook is just noise. Every production pipeline should have a corresponding runbook linked in the PagerDuty incident. If you do not have a runbook, the pipeline is not ready for on-call alerting.
Audit your dedup key strategy monthly. As pipelines get renamed, branches get restructured, and projects evolve, your dedup keys may drift. Review open incidents periodically to check for key mismatches or orphaned incidents that never resolved.