Grafana Dashboards for Pipeline Metrics

Build Grafana dashboards for Azure DevOps pipeline metrics with DORA metrics, deployment tracking, and team performance visualization

Pipeline metrics tell you whether your engineering team is actually delivering software or just pretending to. Grafana gives you the visualization layer to turn raw Azure DevOps pipeline data into actionable dashboards that track deployment frequency, lead time, failure rates, and recovery time. In this article, we will build a complete Node.js metric collector that pulls pipeline data from Azure DevOps and feeds it into Grafana dashboards covering all four DORA metrics.

Prerequisites

  • Node.js 18+ installed
  • An Azure DevOps organization with active pipelines
  • A Personal Access Token (PAT) with Build read scope
  • Grafana 9+ running locally or on a server
  • InfluxDB 2.x installed and running
  • Basic familiarity with REST APIs and dashboard concepts

Grafana Overview for DevOps Teams

Grafana is an open-source observability platform that connects to dozens of data sources and renders time-series data into panels, graphs, and alerts. For DevOps teams, it fills a gap that Azure DevOps itself does not cover well: historical trend analysis across pipelines, teams, and repositories.

The built-in Azure DevOps analytics views give you per-pipeline run history, but they do not let you overlay deployment frequency against change failure rate, or compare lead times across multiple repositories on a single screen. Grafana does.

The architecture is straightforward. You collect pipeline metrics from Azure DevOps, store them in a time-series database like InfluxDB or Prometheus, and point Grafana at that database. Grafana handles the visualization, alerting, and sharing.
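
If you do not already have InfluxDB and Grafana running, one quick way to stand up the stack locally is Docker. The commands below are a sketch; pin the image versions that match your environment:

docker run -d --name influxdb -p 8086:8086 influxdb:2.7
docker run -d --name grafana -p 3000:3000 grafana/grafana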

Data Sources for Azure DevOps

There are three main approaches to getting Azure DevOps data into Grafana.

Azure DevOps Plugin

Grafana has a community plugin for Azure DevOps that connects directly to the Azure DevOps REST API. It works for basic queries but has limitations. The plugin does not support complex aggregations, and it queries the API on every dashboard refresh, which can hit rate limits in busy organizations.

grafana-cli plugins install grafana-azure-devops-datasource

This is fine for simple use cases, but for DORA metrics and historical analysis, you need a time-series database in between.

Prometheus

Prometheus uses a pull model. You expose pipeline metrics on an HTTP endpoint, and Prometheus scrapes them on a schedule. This works well if you already have Prometheus in your stack, but it requires you to maintain a metrics exporter that is always running.
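
For completeness, here is a rough sketch of what such an exporter could look like using the prom-client package (an assumption of this example; it is not used elsewhere in this article, and the metric name and port are illustrative):

var http = require("http");
var client = require("prom-client"); // assumed dependency: npm install prom-client

// Gauge holding the duration of the most recent completed run per pipeline/result
var durationGauge = new client.Gauge({
  name: "azure_pipeline_last_duration_seconds",
  help: "Duration of the most recent completed pipeline run in seconds",
  labelNames: ["pipeline", "result"]
});

// In a real exporter you would refresh this from the Azure DevOps API on a timer
durationGauge.set({ pipeline: "api-service-build", result: "succeeded" }, 312);

// Expose /metrics for Prometheus to scrape
http.createServer(function(req, res) {
  if (req.url !== "/metrics") {
    res.statusCode = 404;
    return res.end();
  }
  client.register.metrics().then(function(body) {
    res.setHeader("Content-Type", client.register.contentType);
    res.end(body);
  });
}).listen(9464);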

Custom Collector with InfluxDB

This is the approach I recommend for most teams. A scheduled Node.js job pulls pipeline data from the Azure DevOps API, calculates metrics, and writes them to InfluxDB. Grafana reads from InfluxDB using Flux queries. You get full control over what data you collect, how you aggregate it, and how long you retain it.

Collecting Pipeline Metrics with Node.js

The Azure DevOps REST API exposes pipeline runs through the Build API. Each build record includes the start time, finish time, result, source branch, repository, and the pipeline definition. That is everything you need to calculate DORA metrics.

Here is a basic collector that fetches recent pipeline runs:

var https = require("https");

var ORG = process.env.AZURE_DEVOPS_ORG;
var PROJECT = process.env.AZURE_DEVOPS_PROJECT;
var PAT = process.env.AZURE_DEVOPS_PAT;

function fetchBuilds(minTime, callback) {
  var token = Buffer.from(":" + PAT).toString("base64");
  var minTimeISO = minTime.toISOString();
  var path = "/" + ORG + "/" + PROJECT +
    "/_apis/build/builds?minTime=" + minTimeISO +
    "&api-version=7.1&$top=500";

  var options = {
    hostname: "dev.azure.com",
    path: path,
    method: "GET",
    headers: {
      "Authorization": "Basic " + token,
      "Content-Type": "application/json"
    }
  };

  var req = https.request(options, function(res) {
    var body = "";
    res.on("data", function(chunk) { body += chunk; });
    res.on("end", function() {
      if (res.statusCode !== 200) {
        return callback(new Error("API returned " + res.statusCode));
      }
      var data = JSON.parse(body);
      callback(null, data.value || []);
    });
  });

  req.on("error", callback);
  req.end();
}
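
A quick smoke test: fetch the last 24 hours of runs and print a one-line summary for each. This assumes the environment variables above are set and the PAT has Build read access.

var oneDayAgo = new Date(Date.now() - 24 * 60 * 60 * 1000);

fetchBuilds(oneDayAgo, function(err, builds) {
  if (err) return console.error("Fetch failed:", err.message);
  builds.forEach(function(b) {
    console.log(b.definition.name, b.result, b.finishTime);
  });
});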

Each build object from the API contains fields we care about (a trimmed sample record follows the list):

  • startTime and finishTime for duration calculations
  • result (succeeded, failed, canceled, partiallySucceeded)
  • sourceBranch for filtering production deployments
  • definition.name for grouping by pipeline
  • repository.name for grouping by repository
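
For orientation, here is a trimmed build record showing only the fields this article uses. The field names come from the Build API; the values are illustrative:

{
  "id": 4712,
  "buildNumber": "20240318.4",
  "status": "completed",
  "result": "succeeded",
  "reason": "individualCI",
  "startTime": "2024-03-18T14:02:11Z",
  "finishTime": "2024-03-18T14:09:47Z",
  "sourceBranch": "refs/heads/main",
  "sourceVersion": "9f2c1d3e8a7b4c5d",
  "definition": { "id": 12, "name": "api-service-build" },
  "repository": { "id": "d3adb33f-1234-5678-9abc-def012345678", "name": "api-service" },
  "requestedBy": { "displayName": "Jane Doe" }
}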

Storing Metrics in InfluxDB

InfluxDB 2.x uses the line protocol for writes and Flux for queries. The Node.js client library handles both. Here is how to write pipeline metrics to InfluxDB:

var Influx = require("@influxdata/influxdb-client");

var INFLUX_URL = process.env.INFLUX_URL || "http://localhost:8086";
var INFLUX_TOKEN = process.env.INFLUX_TOKEN;
var INFLUX_ORG = process.env.INFLUX_ORG;
var INFLUX_BUCKET = process.env.INFLUX_BUCKET || "pipeline-metrics";

var client = new Influx.InfluxDB({
  url: INFLUX_URL,
  token: INFLUX_TOKEN
});

var writeApi = client.getWriteApi(INFLUX_ORG, INFLUX_BUCKET, "s");

function writeBuildMetric(build) {
  var point = new Influx.Point("pipeline_run")
    .tag("pipeline", build.definition.name)
    .tag("repository", build.repository.name)
    .tag("result", build.result)
    .tag("sourceBranch", build.sourceBranch)
    .tag("reason", build.reason)
    .floatField("duration_seconds", calculateDuration(build))
    .intField("succeeded", build.result === "succeeded" ? 1 : 0)
    .intField("failed", build.result === "failed" ? 1 : 0)
    .timestamp(new Date(build.finishTime));

  writeApi.writePoint(point);
}

function calculateDuration(build) {
  if (!build.startTime || !build.finishTime) return 0;
  var start = new Date(build.startTime).getTime();
  var finish = new Date(build.finishTime).getTime();
  return (finish - start) / 1000;
}

function flushWrites(callback) {
  writeApi.flush().then(function() {
    callback(null);
  }).catch(function(err) {
    callback(err);
  });
}

The key design decision here is what to store as tags versus fields. Tags are indexed and used for filtering and grouping. Fields hold the numeric values you want to aggregate. Pipeline name, repository, and result are tags. Duration and success/failure counts are fields.
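
With those pieces in place, a minimal end-to-end pass ties them together, reusing fetchBuilds, writeBuildMetric, and flushWrites from above:

// Fetch the last hour of builds, write each completed run, then flush
fetchBuilds(new Date(Date.now() - 60 * 60 * 1000), function(err, builds) {
  if (err) return console.error("Fetch failed:", err.message);

  // Skip runs that are still in progress
  var completed = builds.filter(function(b) { return b.finishTime; });
  completed.forEach(writeBuildMetric);

  flushWrites(function(err) {
    if (err) return console.error("Flush failed:", err.message);
    console.log("Wrote " + completed.length + " pipeline runs to InfluxDB");
  });
});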

Building Pipeline Duration Dashboards

Once metrics are flowing into InfluxDB, building the Grafana dashboard is the straightforward part. Create a new dashboard and add a time series panel with this Flux query:

from(bucket: "pipeline-metrics")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "pipeline_run")
  |> filter(fn: (r) => r._field == "duration_seconds")
  |> group(columns: ["pipeline"])
  |> aggregateWindow(every: 1d, fn: mean)

This gives you average pipeline duration per day, grouped by pipeline. You can see at a glance if a pipeline is getting slower over time. Add a second panel showing the 95th percentile to catch outlier runs:

from(bucket: "pipeline-metrics")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "pipeline_run")
  |> filter(fn: (r) => r._field == "duration_seconds")
  |> group(columns: ["pipeline"])
  |> aggregateWindow(every: 1d, fn: (tables=<-, column) =>
    tables |> quantile(q: 0.95, column: column))

Deployment Frequency Tracking

Deployment frequency is the first DORA metric. It measures how often your team deploys to production. Filter by the production branch and count runs per time window:

from(bucket: "pipeline-metrics")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "pipeline_run")
  |> filter(fn: (r) => r._field == "succeeded")
  |> filter(fn: (r) => r.sourceBranch == "refs/heads/main" or
                        r.sourceBranch == "refs/heads/master")
  |> filter(fn: (r) => r.result == "succeeded")
  |> group(columns: ["pipeline"])
  |> aggregateWindow(every: 1d, fn: sum)

Display this as a bar chart. Daily bars make weekly patterns obvious. You will quickly see which days your team ships and whether deployment frequency is trending up or down.

For the DORA classification, use a stat panel with a value mapping (a sample stat panel query follows the list):

  • Elite: multiple deploys per day
  • High: between once per day and once per week
  • Medium: between once per week and once per month
  • Low: less than once per month
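
As a sketch, the stat panel behind that classification can average daily production deployments over the last 30 days; Grafana value mappings then translate the number into the tiers above:

from(bucket: "pipeline-metrics")
  |> range(start: -30d)
  |> filter(fn: (r) => r._measurement == "pipeline_run")
  |> filter(fn: (r) => r._field == "succeeded")
  |> filter(fn: (r) => r.sourceBranch == "refs/heads/main")
  |> filter(fn: (r) => r.result == "succeeded")
  |> group()
  |> aggregateWindow(every: 1d, fn: sum, createEmpty: true)
  |> fill(value: 0)
  |> mean()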

Lead Time for Changes Visualization

Lead time for changes measures the time from code commit to production deployment. This requires correlating commit timestamps with build completion times. Extend the collector to capture this:

function calculateLeadTime(build, callback) {
  if (build.sourceVersion && build.finishTime) {
    fetchCommit(build.repository.id, build.sourceVersion, function(err, commit) {
      if (err) return callback(err);
      var commitTime = new Date(commit.author.date).getTime();
      var deployTime = new Date(build.finishTime).getTime();
      var leadTimeHours = (deployTime - commitTime) / (1000 * 60 * 60);
      callback(null, leadTimeHours);
    });
  } else {
    callback(null, null);
  }
}

function fetchCommit(repoId, commitId, callback) {
  var token = Buffer.from(":" + PAT).toString("base64");
  var path = "/" + ORG + "/" + PROJECT +
    "/_apis/git/repositories/" + repoId +
    "/commits/" + commitId + "?api-version=7.1";

  var options = {
    hostname: "dev.azure.com",
    path: path,
    method: "GET",
    headers: {
      "Authorization": "Basic " + token
    }
  };

  var req = https.request(options, function(res) {
    var body = "";
    res.on("data", function(chunk) { body += chunk; });
    res.on("end", function() {
      if (res.statusCode !== 200) {
        return callback(new Error("Commit fetch failed: " + res.statusCode));
      }
      callback(null, JSON.parse(body));
    });
  });

  req.on("error", callback);
  req.end();
}

Write lead time as a separate field in the same measurement:

function writeLeadTimeMetric(build, leadTimeHours) {
  var point = new Influx.Point("pipeline_run")
    .tag("pipeline", build.definition.name)
    .tag("repository", build.repository.name)
    .floatField("lead_time_hours", leadTimeHours)
    .timestamp(new Date(build.finishTime));

  writeApi.writePoint(point);
}
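
To chart lead time, add a panel that reads the field written above. A daily median, grouped by pipeline, is a reasonable starting point:

from(bucket: "pipeline-metrics")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "pipeline_run")
  |> filter(fn: (r) => r._field == "lead_time_hours")
  |> group(columns: ["pipeline"])
  |> aggregateWindow(every: 1d, fn: median)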

DORA Metrics Dashboards

The four DORA metrics belong on a single dashboard with four stat panels across the top and detailed time series below each one. Here is the full layout:

Row 1: Current DORA Scores

  • Deployment Frequency (stat panel, last 7 days)
  • Lead Time for Changes (stat panel, median hours)
  • Mean Time to Restore (stat panel, median hours)
  • Change Failure Rate (stat panel, percentage)

Row 2: Deployment Frequency Trend

  • Daily deployment count as bar chart
  • Weekly rolling average as line overlay

Row 3: Lead Time Trend

  • Median lead time per day
  • 95th percentile lead time per day

Row 4: Failure and Recovery

  • Change failure rate per week
  • Mean time to restore per incident

Change failure rate is the ratio of failed deployments to total deployments:

succeeded = from(bucket: "pipeline-metrics")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "pipeline_run")
  |> filter(fn: (r) => r._field == "succeeded")
  |> filter(fn: (r) => r.sourceBranch == "refs/heads/main")
  |> aggregateWindow(every: 1w, fn: sum)

failed = from(bucket: "pipeline-metrics")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "pipeline_run")
  |> filter(fn: (r) => r._field == "failed")
  |> filter(fn: (r) => r.sourceBranch == "refs/heads/main")
  |> aggregateWindow(every: 1w, fn: sum)

join(tables: {succeeded: succeeded, failed: failed}, on: ["_time"])
  |> map(fn: (r) => ({
    _time: r._time,
    _value: float(v: r._value_failed) /
            (float(v: r._value_succeeded) + float(v: r._value_failed)) * 100.0
  }))

Mean Time to Restore (MTTR)

MTTR measures how long it takes to recover after a failed deployment. The collector needs to detect failure-to-success transitions on production pipelines:

function calculateMTTR(builds) {
  var prodBuilds = builds.filter(function(b) {
    return b.sourceBranch === "refs/heads/main" ||
           b.sourceBranch === "refs/heads/master";
  });

  prodBuilds.sort(function(a, b) {
    return new Date(a.finishTime) - new Date(b.finishTime);
  });

  var incidents = [];
  var failureStart = null;

  for (var i = 0; i < prodBuilds.length; i++) {
    var build = prodBuilds[i];
    if (build.result === "failed" && failureStart === null) {
      failureStart = new Date(build.finishTime);
    } else if (build.result === "succeeded" && failureStart !== null) {
      var recoveryTime = new Date(build.finishTime);
      var mttrHours = (recoveryTime - failureStart) / (1000 * 60 * 60);
      incidents.push({
        failureStart: failureStart,
        recoveryTime: recoveryTime,
        mttrHours: mttrHours,
        pipeline: build.definition.name
      });
      failureStart = null;
    }
  }

  return incidents;
}

function writeMTTRMetrics(incidents) {
  incidents.forEach(function(incident) {
    var point = new Influx.Point("mttr_incident")
      .tag("pipeline", incident.pipeline)
      .floatField("mttr_hours", incident.mttrHours)
      .timestamp(incident.recoveryTime);

    writeApi.writePoint(point);
  });
}
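
The matching panel query reads the mttr_incident measurement and averages recovery time per week:

from(bucket: "pipeline-metrics")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "mttr_incident")
  |> filter(fn: (r) => r._field == "mttr_hours")
  |> group(columns: ["pipeline"])
  |> aggregateWindow(every: 1w, fn: mean)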

Alerting on Pipeline Failures

Grafana alerts trigger when a metric crosses a threshold. For pipeline monitoring, set up alerts for:

  1. Consecutive failures: Alert when a pipeline fails 3 times in a row
  2. Duration spike: Alert when a pipeline takes more than 2x its average duration
  3. Deployment drought: Alert when no production deployment has occurred in 48 hours

Configure alerts in Grafana's Alerting section. Create a contact point for Slack or email, then add alert rules:

# Grafana alert rule example (provisioned via YAML)
apiVersion: 1
groups:
  - orgId: 1
    name: pipeline_alerts
    folder: Pipeline Monitoring
    interval: 5m
    rules:
      - uid: pipeline-failure-alert
        title: Pipeline Failure Spike
        condition: C
        data:
          - refId: A
            datasourceUid: influxdb
            model:
              query: |
                from(bucket: "pipeline-metrics")
                  |> range(start: -1h)
                  |> filter(fn: (r) => r._measurement == "pipeline_run")
                  |> filter(fn: (r) => r._field == "failed")
                  |> filter(fn: (r) => r.sourceBranch == "refs/heads/main")
                  |> sum()
          - refId: C
            datasourceUid: __expr__
            model:
              type: threshold
              conditions:
                - evaluator:
                    type: gt
                    params: [2]
        for: 0s
        labels:
          severity: critical
        annotations:
          summary: "More than 2 production pipeline failures in the last hour"

Annotation Markers for Deployments

Annotations add vertical markers on Grafana graphs at specific timestamps. They are perfect for marking deployments so you can correlate application behavior with code changes.

Add a webhook to your Azure DevOps pipeline that posts annotations to Grafana when a deployment succeeds:

var http = require("http");

function postGrafanaAnnotation(build, callback) {
  var annotation = {
    text: build.definition.name + " deployed by " + build.requestedBy.displayName,
    tags: ["deployment", build.definition.name, build.repository.name],
    time: new Date(build.finishTime).getTime()
  };

  var payload = JSON.stringify(annotation);

  var options = {
    hostname: process.env.GRAFANA_HOST || "localhost",
    port: process.env.GRAFANA_PORT || 3000,
    path: "/api/annotations",
    method: "POST",
    headers: {
      "Authorization": "Bearer " + process.env.GRAFANA_API_KEY,
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(payload)
    }
  };

  var req = http.request(options, function(res) {
    var body = "";
    res.on("data", function(chunk) { body += chunk; });
    res.on("end", function() {
      callback(null, JSON.parse(body));
    });
  });

  req.on("error", callback);
  req.write(payload);
  req.end();
}

Enable the annotation query in your dashboard settings so the markers appear on all panels.
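
If you provision dashboards as JSON, the annotation query lives in the dashboard's annotations.list array. The exact schema varies between Grafana versions, but a tag-based query against Grafana's built-in annotation store looks roughly like this:

"annotations": {
  "list": [
    {
      "name": "Deployments",
      "datasource": "-- Grafana --",
      "enable": true,
      "iconColor": "green",
      "type": "tags",
      "tags": ["deployment"]
    }
  ]
}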

Team Performance Dashboards

Team dashboards aggregate metrics across all pipelines owned by a team. Add a team tag to your metrics by mapping pipeline names to teams in a configuration file:

var TEAM_MAP = {
  "api-service-build": "backend",
  "api-service-deploy": "backend",
  "web-app-build": "frontend",
  "web-app-deploy": "frontend",
  "mobile-build": "mobile",
  "data-pipeline": "data-engineering"
};

function getTeamForPipeline(pipelineName) {
  return TEAM_MAP[pipelineName] || "unassigned";
}

function writeBuildMetricWithTeam(build) {
  var team = getTeamForPipeline(build.definition.name);

  var point = new Influx.Point("pipeline_run")
    .tag("pipeline", build.definition.name)
    .tag("repository", build.repository.name)
    .tag("result", build.result)
    .tag("sourceBranch", build.sourceBranch)
    .tag("team", team)
    .floatField("duration_seconds", calculateDuration(build))
    .intField("succeeded", build.result === "succeeded" ? 1 : 0)
    .intField("failed", build.result === "failed" ? 1 : 0)
    .timestamp(new Date(build.finishTime));

  writeApi.writePoint(point);
}

In Grafana, add a dashboard variable for team and use it to filter all panels:

from(bucket: "pipeline-metrics")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "pipeline_run")
  |> filter(fn: (r) => r.team == "${team}")
  |> filter(fn: (r) => r._field == "duration_seconds")
  |> group(columns: ["pipeline"])
  |> aggregateWindow(every: 1d, fn: mean)
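
To populate the team variable itself, use a Flux variable query; the schema.tagValues call shown in the troubleshooting section below works well for this.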

Sharing and Embedding Dashboards

Grafana supports several ways to share dashboards:

  • Direct links: Share the dashboard URL with teammates. Use the time range picker to lock the view to a specific period.
  • Snapshots: Create a read-only snapshot that anyone can view without Grafana access.
  • Embedded panels: Use the embed panel option to get an iframe URL for any single panel. Drop it into your team wiki or Confluence page.
  • PDF reports: Grafana Enterprise and the Image Renderer plugin support scheduled PDF reports via email.

For embedding a DORA metrics summary in your team's internal tooling:

<iframe
  src="https://grafana.yourcompany.com/d/dora-metrics/dora?orgId=1&from=now-30d&to=now&theme=light&panelId=1"
  width="800"
  height="400"
  frameborder="0">
</iframe>
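
Grafana blocks iframe embedding by default. Expect to set allow_embedding = true in the [security] section of grafana.ini, and plan an authentication path for viewers of the embedded panel, such as anonymous viewer access on an internal network.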

Complete Working Example

Here is the full metric collector as a single Node.js script. It pulls Azure DevOps pipeline data, calculates all four DORA metrics, writes them to InfluxDB, and posts deployment annotations to Grafana.

var https = require("https");
var http = require("http");
var Influx = require("@influxdata/influxdb-client");

// Configuration
var CONFIG = {
  azureOrg: process.env.AZURE_DEVOPS_ORG,
  azureProject: process.env.AZURE_DEVOPS_PROJECT,
  azurePAT: process.env.AZURE_DEVOPS_PAT,
  influxUrl: process.env.INFLUX_URL || "http://localhost:8086",
  influxToken: process.env.INFLUX_TOKEN,
  influxOrg: process.env.INFLUX_ORG,
  influxBucket: process.env.INFLUX_BUCKET || "pipeline-metrics",
  grafanaHost: process.env.GRAFANA_HOST || "localhost",
  grafanaPort: parseInt(process.env.GRAFANA_PORT || "3000", 10),
  grafanaApiKey: process.env.GRAFANA_API_KEY,
  collectIntervalMinutes: parseInt(process.env.COLLECT_INTERVAL || "15", 10),
  productionBranches: ["refs/heads/main", "refs/heads/master"]
};

var TEAM_MAP = {
  "api-service-build": "backend",
  "api-service-deploy": "backend",
  "web-app-build": "frontend",
  "web-app-deploy": "frontend"
};

// InfluxDB client
var influxClient = new Influx.InfluxDB({
  url: CONFIG.influxUrl,
  token: CONFIG.influxToken
});
var writeApi = influxClient.getWriteApi(CONFIG.influxOrg, CONFIG.influxBucket, "s");

// Azure DevOps API helper
function azureRequest(path, callback) {
  var token = Buffer.from(":" + CONFIG.azurePAT).toString("base64");

  var options = {
    hostname: "dev.azure.com",
    path: path,
    method: "GET",
    headers: {
      "Authorization": "Basic " + token,
      "Content-Type": "application/json"
    }
  };

  var req = https.request(options, function(res) {
    var body = "";
    res.on("data", function(chunk) { body += chunk; });
    res.on("end", function() {
      if (res.statusCode !== 200) {
        return callback(new Error("Azure API " + res.statusCode + ": " + body));
      }
      try {
        callback(null, JSON.parse(body));
      } catch (e) {
        callback(new Error("Invalid JSON response"));
      }
    });
  });

  req.on("error", callback);
  req.end();
}

// Fetch builds since a given time
function fetchBuilds(sinceMinutes, callback) {
  var since = new Date(Date.now() - sinceMinutes * 60 * 1000);
  var path = "/" + CONFIG.azureOrg + "/" + CONFIG.azureProject +
    "/_apis/build/builds?minTime=" + since.toISOString() +
    "&api-version=7.1&$top=500&statusFilter=completed";

  azureRequest(path, function(err, data) {
    if (err) return callback(err);
    callback(null, data.value || []);
  });
}

// Fetch commit details for lead time calculation
function fetchCommit(repoId, commitId, callback) {
  var path = "/" + CONFIG.azureOrg + "/" + CONFIG.azureProject +
    "/_apis/git/repositories/" + repoId +
    "/commits/" + commitId + "?api-version=7.1";

  azureRequest(path, callback);
}

// Calculate build duration in seconds
function calculateDuration(build) {
  if (!build.startTime || !build.finishTime) return 0;
  var start = new Date(build.startTime).getTime();
  var finish = new Date(build.finishTime).getTime();
  return (finish - start) / 1000;
}

// Check if build is a production deployment
function isProductionBuild(build) {
  return CONFIG.productionBranches.indexOf(build.sourceBranch) !== -1;
}

// Get team for a pipeline
function getTeam(pipelineName) {
  return TEAM_MAP[pipelineName] || "unassigned";
}

// Write build metrics to InfluxDB
function writeBuildMetrics(builds) {
  builds.forEach(function(build) {
    if (!build.finishTime) return;

    var point = new Influx.Point("pipeline_run")
      .tag("pipeline", build.definition.name)
      .tag("repository", build.repository.name)
      .tag("result", build.result || "unknown")
      .tag("sourceBranch", build.sourceBranch)
      .tag("reason", build.reason)
      .tag("team", getTeam(build.definition.name))
      .tag("is_production", isProductionBuild(build) ? "true" : "false")
      .floatField("duration_seconds", calculateDuration(build))
      .intField("succeeded", build.result === "succeeded" ? 1 : 0)
      .intField("failed", build.result === "failed" ? 1 : 0)
      .intField("count", 1)
      .timestamp(new Date(build.finishTime));

    writeApi.writePoint(point);
  });
}

// Calculate and write lead time metrics
function writeLeadTimeMetrics(builds, callback) {
  var prodBuilds = builds.filter(function(b) {
    return isProductionBuild(b) && b.result === "succeeded" && b.sourceVersion;
  });

  var pending = prodBuilds.length;
  if (pending === 0) return callback(null);

  prodBuilds.forEach(function(build) {
    fetchCommit(build.repository.id, build.sourceVersion, function(err, commit) {
      if (!err && commit && commit.author && commit.author.date) {
        var commitTime = new Date(commit.author.date).getTime();
        var deployTime = new Date(build.finishTime).getTime();
        var leadTimeHours = (deployTime - commitTime) / (1000 * 60 * 60);

        if (leadTimeHours >= 0 && leadTimeHours < 720) {
          var point = new Influx.Point("lead_time")
            .tag("pipeline", build.definition.name)
            .tag("repository", build.repository.name)
            .tag("team", getTeam(build.definition.name))
            .floatField("hours", leadTimeHours)
            .timestamp(new Date(build.finishTime));

          writeApi.writePoint(point);
        }
      }

      pending--;
      if (pending === 0) callback(null);
    });
  });
}

// Calculate and write MTTR metrics
function writeMTTRMetrics(builds) {
  var prodBuilds = builds.filter(isProductionBuild);
  prodBuilds.sort(function(a, b) {
    return new Date(a.finishTime) - new Date(b.finishTime);
  });

  // Group by pipeline
  var pipelineBuilds = {};
  prodBuilds.forEach(function(build) {
    var name = build.definition.name;
    if (!pipelineBuilds[name]) pipelineBuilds[name] = [];
    pipelineBuilds[name].push(build);
  });

  Object.keys(pipelineBuilds).forEach(function(pipelineName) {
    var pBuilds = pipelineBuilds[pipelineName];
    var failureStart = null;

    for (var i = 0; i < pBuilds.length; i++) {
      var build = pBuilds[i];
      if (build.result === "failed" && failureStart === null) {
        failureStart = new Date(build.finishTime);
      } else if (build.result === "succeeded" && failureStart !== null) {
        var recoveryTime = new Date(build.finishTime);
        var mttrHours = (recoveryTime - failureStart) / (1000 * 60 * 60);

        var point = new Influx.Point("mttr_incident")
          .tag("pipeline", pipelineName)
          .tag("team", getTeam(pipelineName))
          .floatField("mttr_hours", mttrHours)
          .timestamp(recoveryTime);

        writeApi.writePoint(point);
        failureStart = null;
      }
    }
  });
}

// Write change failure rate
function writeChangeFailureRate(builds) {
  var prodBuilds = builds.filter(isProductionBuild);
  var total = prodBuilds.length;
  var failed = prodBuilds.filter(function(b) {
    return b.result === "failed";
  }).length;

  if (total === 0) return;

  var rate = (failed / total) * 100;

  var point = new Influx.Point("change_failure_rate")
    .floatField("rate_percent", rate)
    .intField("total_deployments", total)
    .intField("failed_deployments", failed)
    .timestamp(new Date());

  writeApi.writePoint(point);
}

// Post deployment annotations to Grafana
function postAnnotations(builds) {
  var deployments = builds.filter(function(b) {
    return isProductionBuild(b) && b.result === "succeeded";
  });

  deployments.forEach(function(build) {
    var annotation = {
      text: build.definition.name + " deployed (" +
            build.buildNumber + ")",
      tags: ["deployment", build.definition.name],
      time: new Date(build.finishTime).getTime()
    };

    var payload = JSON.stringify(annotation);

    var options = {
      hostname: CONFIG.grafanaHost,
      port: CONFIG.grafanaPort,
      path: "/api/annotations",
      method: "POST",
      headers: {
        "Authorization": "Bearer " + CONFIG.grafanaApiKey,
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(payload)
      }
    };

    var req = http.request(options, function(res) {
      var body = "";
      res.on("data", function(chunk) { body += chunk; });
      res.on("end", function() {
        if (res.statusCode !== 200) {
          console.error("Annotation failed:", res.statusCode, body);
        }
      });
    });

    req.on("error", function(err) {
      console.error("Annotation error:", err.message);
    });

    req.write(payload);
    req.end();
  });
}

// Main collection cycle
function collect() {
  var intervalMinutes = CONFIG.collectIntervalMinutes;
  console.log("[" + new Date().toISOString() + "] Collecting pipeline metrics...");

  fetchBuilds(intervalMinutes + 5, function(err, builds) {
    if (err) {
      console.error("Failed to fetch builds:", err.message);
      return;
    }

    console.log("Fetched " + builds.length + " builds");

    if (builds.length === 0) return;

    writeBuildMetrics(builds);
    writeMTTRMetrics(builds);
    writeChangeFailureRate(builds);

    writeLeadTimeMetrics(builds, function(err) {
      if (err) console.error("Lead time error:", err.message);

      writeApi.flush().then(function() {
        console.log("Metrics written to InfluxDB");

        if (CONFIG.grafanaApiKey) {
          postAnnotations(builds);
        }
      }).catch(function(err) {
        console.error("InfluxDB flush error:", err.message);
      });
    });
  });
}

// Run on startup and schedule
collect();
setInterval(collect, CONFIG.collectIntervalMinutes * 60 * 1000);
console.log("Pipeline metric collector started. Interval: " +
            CONFIG.collectIntervalMinutes + " minutes");

Run it with:

npm init -y
npm install @influxdata/influxdb-client

AZURE_DEVOPS_ORG=myorg \
AZURE_DEVOPS_PROJECT=myproject \
AZURE_DEVOPS_PAT=xxxx \
INFLUX_TOKEN=xxxx \
INFLUX_ORG=myorg \
GRAFANA_API_KEY=xxxx \
node collector.js

Common Issues and Troubleshooting

1. Azure DevOps API rate limiting

Azure DevOps throttles API usage per identity once a usage threshold is crossed. If you hit a 429 status code, the response includes a Retry-After header. Add exponential backoff to your collector:

function azureRequestWithRetry(path, retries, callback) {
  azureRequest(path, function(err, data) {
    if (err && err.message.indexOf("429") !== -1 && retries > 0) {
      var delay = Math.pow(2, 4 - retries) * 1000;
      console.log("Rate limited. Retrying in " + delay + "ms");
      setTimeout(function() {
        azureRequestWithRetry(path, retries - 1, callback);
      }, delay);
    } else {
      callback(err, data);
    }
  });
}
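
Call it in place of azureRequest with an initial retry budget of 3, for example azureRequestWithRetry(path, 3, callback), which produces backoff delays of 2, 4, and 8 seconds.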

2. InfluxDB timestamp conflicts

If two builds finish at the exact same second and have the same tag set, InfluxDB silently overwrites one with the other, because the unique key is measurement + tags + timestamp. Fix this by adding the build ID as a tag (note that this raises series cardinality, which the Best Practices section warns about; writing with millisecond precision instead of seconds also avoids most collisions):

var point = new Influx.Point("pipeline_run")
  .tag("build_id", build.id.toString())
  // ... rest of tags

3. Grafana dashboard variables not filtering

When using template variables like ${team}, make sure the variable query matches your tag values exactly. A common mistake is using show tag values from "pipeline_run" with key = "team" syntax from InfluxQL when your datasource is configured for Flux. Use this Flux query for the variable instead:

import "influxdata/influxdb/schema"
schema.tagValues(bucket: "pipeline-metrics", tag: "team")

4. Lead time calculations showing negative values

This happens when the commit timestamp is ahead of the build finish time, usually because of timezone mismatches or clock skew between systems. The collector filters out negative values and values over 720 hours (30 days) as invalid data points. If you see persistent issues, normalize all timestamps to UTC before comparing.

5. Missing data gaps in Grafana panels

When the collector is not running (server restart, maintenance), gaps appear in the dashboard. Configure Grafana panels to use "connect null values" in the panel options to draw lines through gaps. For stat panels, increase the query range to ensure there is always data available, even during short outages.

Best Practices

  • Start with deployment frequency. It is the easiest DORA metric to collect and provides immediate value. Add the other three metrics once the pipeline is stable.

  • Use tags sparingly. Every unique tag combination creates a new series in InfluxDB. High-cardinality values like commit SHAs or build numbers will bloat the database if stored as tags. Use them as fields instead, unless you need to filter by them.

  • Set retention policies. Raw pipeline metrics older than 90 days are rarely useful. Configure InfluxDB retention policies to automatically expire old data, and use continuous queries or tasks to downsample into hourly or daily aggregates for long-term trend analysis.

  • Separate production from non-production. Always filter by the production branch when calculating DORA metrics. Including feature branch builds in deployment frequency makes the number meaningless.

  • Version your dashboard JSON. Export your Grafana dashboard as JSON and commit it to your repository. This lets you track changes, roll back broken dashboards, and provision dashboards automatically in new Grafana instances.

  • Use dashboard folders and permissions. Organize dashboards by team. Give each team edit access to their own folder and read access to the organization-wide DORA dashboard. This prevents accidental edits while keeping metrics visible.

  • Add context with annotations. Deployment annotations on application performance dashboards help you correlate code changes with behavior changes. When a latency spike lines up with a deployment marker, you know where to look.

  • Alert on trends, not individual failures. A single pipeline failure is normal. Three failures in a row on a production pipeline is worth alerting on. Set alert thresholds based on patterns rather than individual events to reduce alert fatigue.

  • Run the collector as a service. Use systemd, PM2, or a container to keep the collector running. Add health checks and restart policies so it recovers from crashes automatically.
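
As a minimal sketch of that last point, assuming PM2 is acceptable in your environment:

npm install -g pm2
pm2 start collector.js --name pipeline-collector
pm2 save
pm2 startup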
