Grafana Dashboards for Pipeline Metrics

A comprehensive guide to building Grafana dashboards for Azure DevOps pipeline metrics, covering data collection with the Azure DevOps REST API, Prometheus exporters, InfluxDB storage, dashboard design patterns, and alerting on CI/CD performance indicators.

Overview

Azure DevOps has built-in analytics, but the dashboards are limited to what Microsoft provides. Grafana gives you complete control over how pipeline data is visualized — custom panels, mixed data sources, alerting, and the ability to combine CI/CD metrics with infrastructure and application monitoring on the same dashboard. I build Grafana dashboards for every team I work with because the question "how healthy is our build pipeline?" deserves a better answer than digging through the Azure DevOps portal. This article covers collecting pipeline metrics, storing them in time-series databases, and building dashboards that surface real insights.

Prerequisites

  • Grafana instance (self-hosted or Grafana Cloud) version 9.x or later
  • Azure DevOps organization with Pipelines
  • Azure DevOps Personal Access Token with Build (Read) and Analytics (Read) scopes
  • Time-series database: InfluxDB, Prometheus, or PostgreSQL/TimescaleDB
  • Node.js 16 or later for the data collection scripts
  • Basic familiarity with Grafana dashboard concepts (panels, queries, variables)

Data Collection Architecture

Azure DevOps does not natively push metrics to Grafana. You need a collection layer that periodically queries the Azure DevOps REST API and writes metrics to a data source Grafana can query.

The architecture:

Azure DevOps REST API
        │
        ▼
  Collector Script (Node.js)
  (runs on schedule — cron/pipeline/container)
        │
        ▼
  Time-Series Database
  (InfluxDB / Prometheus / PostgreSQL)
        │
        ▼
  Grafana Dashboards

InfluxDB Collector

InfluxDB is a natural fit for CI/CD metrics — it handles time-series data efficiently, supports tags for filtering, and Grafana has excellent InfluxDB query support.
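
The collector below writes InfluxDB line protocol: a measurement name, comma-separated tags (the indexed dimensions), a space, the fields (numeric values), and a Unix timestamp. A sample point in the shape the script emits (values are illustrative):

pipeline_build,definition=web-app,result=succeeded,branch=main,reason=individualCI duration=312i,succeeded=1i,failed=0i 1718000000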

// collectors/pipeline-metrics.js
var https = require("https");
var http = require("http");

var AZURE_ORG = process.env.AZURE_ORG;
var AZURE_PROJECT = process.env.AZURE_PROJECT;
var AZURE_PAT = process.env.AZURE_PAT;
var INFLUX_URL = process.env.INFLUX_URL || "http://localhost:8086";
var INFLUX_ORG = process.env.INFLUX_ORG || "default";
var INFLUX_DB = process.env.INFLUX_DB || "devops_metrics"; // used as the bucket name on InfluxDB 2.x
var INFLUX_TOKEN = process.env.INFLUX_TOKEN || "";

function azureRequest(path, callback) {
    var auth = Buffer.from(":" + AZURE_PAT).toString("base64");
    var options = {
        hostname: "dev.azure.com",
        path: "/" + AZURE_ORG + "/" + AZURE_PROJECT + "/_apis" + path,
        method: "GET",
        headers: {
            "Authorization": "Basic " + auth,
            "Accept": "application/json"
        }
    };

    var req = https.request(options, function (res) {
        var data = "";
        res.on("data", function (chunk) { data += chunk; });
        res.on("end", function () {
            try { callback(null, JSON.parse(data)); }
            catch (e) { callback(new Error("Parse error: " + data.substring(0, 200))); }
        });
    });
    req.on("error", callback);
    req.end();
}

function writeToInflux(lines, callback) {
    var parsed = new URL(INFLUX_URL);
    var protocol = parsed.protocol === "https:" ? https : http;
    var body = lines.join("\n");

    var writePath = "/api/v2/write?org=" + INFLUX_ORG + "&bucket=" + INFLUX_DB + "&precision=s";
    var options = {
        hostname: parsed.hostname,
        port: parsed.port,
        path: writePath,
        method: "POST",
        headers: {
            "Content-Type": "text/plain",
            "Authorization": "Token " + INFLUX_TOKEN,
            "Content-Length": Buffer.byteLength(body)
        }
    };

    var req = protocol.request(options, function (res) {
        var data = "";
        res.on("data", function (chunk) { data += chunk; });
        res.on("end", function () {
            if (res.statusCode >= 200 && res.statusCode < 300) {
                callback(null);
            } else {
                callback(new Error("InfluxDB write failed (" + res.statusCode + "): " + data));
            }
        });
    });

    req.on("error", callback);
    req.write(body);
    req.end();
}

function escapeTag(value) {
    return (value || "unknown").replace(/ /g, "\\ ").replace(/,/g, "\\,").replace(/=/g, "\\=");
}

function collectBuildMetrics(callback) {
    var since = new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString();
    var path = "/build/builds?minTime=" + encodeURIComponent(since) +
        "&statusFilter=completed&$top=500&api-version=7.1";

    azureRequest(path, function (err, data) {
        if (err) { return callback(err); }

        var lines = [];
        var builds = data.value || [];

        builds.forEach(function (build) {
            var startTime = new Date(build.startTime);
            var finishTime = new Date(build.finishTime);
            var durationSec = Math.round((finishTime - startTime) / 1000);
            var timestamp = Math.floor(finishTime.getTime() / 1000);

            var definition = escapeTag(build.definition.name);
            var result = escapeTag(build.result);
            var branch = escapeTag((build.sourceBranch || "").replace("refs/heads/", ""));
            var reason = escapeTag(build.reason);
            var requestedBy = escapeTag(build.requestedFor ? build.requestedFor.displayName : "unknown");

            // Build duration metric
            lines.push(
                "pipeline_build,definition=" + definition + ",result=" + result +
                ",branch=" + branch + ",reason=" + reason +
                " duration=" + durationSec + "i" +
                ",succeeded=" + (build.result === "succeeded" ? "1i" : "0i") +
                ",failed=" + (build.result === "failed" ? "1i" : "0i") +
                " " + timestamp
            );

            // Build count metric (for rate calculations)
            lines.push(
                "pipeline_build_count,definition=" + definition + ",result=" + result +
                ",branch=" + branch +
                " count=1i " + timestamp
            );
        });

        console.log("Collected " + builds.length + " build metrics");

        if (lines.length > 0) {
            writeToInflux(lines, callback);
        } else {
            callback(null);
        }
    });
}

function collectQueueMetrics(callback) {
    azureRequest("/build/builds?statusFilter=notStarted,inProgress&api-version=7.1", function (err, data) {
        if (err) { return callback(err); }

        var queued = 0;
        var running = 0;
        var builds = data.value || [];

        builds.forEach(function (build) {
            if (build.status === "notStarted") { queued++; }
            if (build.status === "inProgress") { running++; }
        });

        var timestamp = Math.floor(Date.now() / 1000);
        var lines = [
            "pipeline_queue queued=" + queued + "i,running=" + running + "i " + timestamp
        ];

        console.log("Queue: " + queued + " queued, " + running + " running");
        writeToInflux(lines, callback);
    });
}

// Run collection
console.log("Collecting Azure DevOps pipeline metrics...");
console.log("Org: " + AZURE_ORG + ", Project: " + AZURE_PROJECT);

var pending = 2;
function done(err) {
    if (err) { console.error("Error: " + err.message); }
    pending--;
    if (pending === 0) { console.log("Collection complete."); }
}

collectBuildMetrics(done);
collectQueueMetrics(done);
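
Run the collector once manually to verify the end-to-end flow (all values are placeholders):

AZURE_ORG=myorg AZURE_PROJECT=myproject AZURE_PAT=xxxx \
INFLUX_URL=http://localhost:8086 INFLUX_TOKEN=xxxx \
node collectors/pipeline-metrics.js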

Run the collector on a schedule:

# Cron: every 5 minutes
*/5 * * * * node /opt/collectors/pipeline-metrics.js

# Or as a scheduled Azure Pipeline
# azure-pipelines-collector.yml
trigger: none  # run only on the schedule, not on every push

schedules:
  - cron: "*/5 * * * *"
    displayName: "Collect pipeline metrics"
    branches:
      include:
        - main
    always: true

steps:
  - script: node collectors/pipeline-metrics.js
    displayName: "Run collector"
    env:
      AZURE_PAT: $(COLLECTOR_PAT)  # secret pipeline variable holding the PAT (name is an assumption)
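
For local development, a docker-compose sketch can stand up InfluxDB, Grafana, and the collector on a loop — image tags, credentials, and the sleep-based scheduling here are assumptions, not production guidance:

# docker-compose.yml — local development sketch
services:
  influxdb:
    image: influxdb:2.7
    ports: ["8086:8086"]
    environment:
      DOCKER_INFLUXDB_INIT_MODE: setup
      DOCKER_INFLUXDB_INIT_USERNAME: admin
      DOCKER_INFLUXDB_INIT_PASSWORD: changeme123
      DOCKER_INFLUXDB_INIT_ORG: default
      DOCKER_INFLUXDB_INIT_BUCKET: devops_metrics
      DOCKER_INFLUXDB_INIT_ADMIN_TOKEN: dev-token
  grafana:
    image: grafana/grafana:10.4.2
    ports: ["3000:3000"]
  collector:
    image: node:18-alpine
    volumes: ["./collectors:/collectors:ro"]
    environment:
      AZURE_ORG: ${AZURE_ORG}
      AZURE_PROJECT: ${AZURE_PROJECT}
      AZURE_PAT: ${AZURE_PAT}
      INFLUX_URL: http://influxdb:8086
      INFLUX_TOKEN: dev-token
    command: sh -c "while true; do node /collectors/pipeline-metrics.js; sleep 300; done"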

Prometheus Exporter

If your stack uses Prometheus, expose Azure DevOps metrics as a Prometheus endpoint:

// exporters/prometheus-exporter.js
var http = require("http");
var https = require("https");

var AZURE_ORG = process.env.AZURE_ORG;
var AZURE_PROJECT = process.env.AZURE_PROJECT;
var AZURE_PAT = process.env.AZURE_PAT;
var PORT = parseInt(process.env.PORT, 10) || 9185;

var cachedMetrics = "";
var lastFetch = 0;
var CACHE_TTL = 60000; // 1 minute

function azureRequest(path, callback) {
    var auth = Buffer.from(":" + AZURE_PAT).toString("base64");
    var options = {
        hostname: "dev.azure.com",
        path: "/" + AZURE_ORG + "/" + AZURE_PROJECT + "/_apis" + path,
        method: "GET",
        headers: { "Authorization": "Basic " + auth, "Accept": "application/json" }
    };

    var req = https.request(options, function (res) {
        var data = "";
        res.on("data", function (chunk) { data += chunk; });
        res.on("end", function () {
            try { callback(null, JSON.parse(data)); }
            catch (e) { callback(e); }
        });
    });
    req.on("error", callback);
    req.end();
}

function formatLabel(key, value) {
    // Escape backslashes, quotes, and newlines per the Prometheus text format
    var escaped = String(value || "").replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
    return key + '="' + escaped + '"';
}

function generateMetrics(callback) {
    var now = Date.now();
    if (now - lastFetch < CACHE_TTL && cachedMetrics) {
        return callback(null, cachedMetrics);
    }

    var since = new Date(now - 24 * 60 * 60 * 1000).toISOString();

    azureRequest("/build/builds?minTime=" + encodeURIComponent(since) +
        "&statusFilter=completed&$top=200&api-version=7.1", function (err, data) {
        if (err) { return callback(err); }

        var lines = [];
        lines.push("# HELP azure_devops_build_duration_seconds Duration of pipeline builds");
        lines.push("# TYPE azure_devops_build_duration_seconds gauge");
        lines.push("# HELP azure_devops_build_success Build success indicator (1=success, 0=failure)");
        lines.push("# TYPE azure_devops_build_success gauge");

        var builds = data.value || [];
        var definitionStats = {};
        var latest = {}; // most recent completed build per definition+branch

        builds.forEach(function (build) {
            var def = build.definition.name;
            var branch = (build.sourceBranch || "").replace("refs/heads/", "");
            var result = build.result || "unknown";
            var finish = new Date(build.finishTime);
            var key = def + "|" + branch;

            // Prometheus rejects duplicate label sets within one scrape, so
            // only the most recent build per definition/branch is exposed
            if (!latest[key] || finish > latest[key].finish) {
                latest[key] = {
                    def: def, branch: branch, result: result,
                    start: new Date(build.startTime), finish: finish
                };
            }

            if (!definitionStats[def]) {
                definitionStats[def] = { total: 0, succeeded: 0, failed: 0 };
            }
            definitionStats[def].total++;
            if (result === "succeeded") { definitionStats[def].succeeded++; }
            if (result === "failed") { definitionStats[def].failed++; }
        });

        Object.keys(latest).forEach(function (key) {
            var b = latest[key];
            var duration = (b.finish - b.start) / 1000;
            var labels = formatLabel("definition", b.def) + "," +
                formatLabel("branch", b.branch) + "," +
                formatLabel("result", b.result);
            lines.push("azure_devops_build_duration_seconds{" + labels + "} " + duration.toFixed(1));
            lines.push("azure_devops_build_success{" + labels + "} " + (b.result === "succeeded" ? "1" : "0"));
        });

        lines.push("# HELP azure_devops_builds_total Total builds in last 24h by definition");
        lines.push("# TYPE azure_devops_builds_total gauge");
        lines.push("# HELP azure_devops_build_success_rate Success rate percentage");
        lines.push("# TYPE azure_devops_build_success_rate gauge");

        Object.keys(definitionStats).forEach(function (def) {
            var stats = definitionStats[def];
            var rate = stats.total > 0 ? (stats.succeeded / stats.total * 100) : 0;
            var label = formatLabel("definition", def);
            lines.push("azure_devops_builds_total{" + label + "} " + stats.total);
            lines.push("azure_devops_build_success_rate{" + label + "} " + rate.toFixed(1));
        });

        cachedMetrics = lines.join("\n") + "\n";
        lastFetch = now;
        callback(null, cachedMetrics);
    });
}

var server = http.createServer(function (req, res) {
    if (req.url === "/metrics") {
        generateMetrics(function (err, metrics) {
            if (err) {
                res.writeHead(500);
                res.end("Error: " + err.message);
            } else {
                res.writeHead(200, { "Content-Type": "text/plain; version=0.0.4; charset=utf-8" });
                res.end(metrics);
            }
        });
    } else {
        res.writeHead(200);
        res.end("Azure DevOps Prometheus Exporter. GET /metrics for data.\n");
    }
});

server.listen(PORT, function () {
    console.log("Prometheus exporter on port " + PORT);
});
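
The scrape config below points at exporter:9185, i.e. the exporter running as a container. A minimal Dockerfile sketch (base image tag is an assumption):

# exporters/Dockerfile
FROM node:18-alpine
WORKDIR /app
COPY prometheus-exporter.js .
EXPOSE 9185
CMD ["node", "prometheus-exporter.js"]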

Add the scrape target in Prometheus:

# prometheus.yml
scrape_configs:
  - job_name: "azure-devops"
    scrape_interval: 60s
    scrape_timeout: 30s
    static_configs:
      - targets: ["exporter:9185"]

Grafana Dashboard Design

Key Panels

A CI/CD pipeline dashboard should include these panels:

Build Success Rate (Stat panel)

InfluxQL:
SELECT mean("succeeded") * 100 FROM "pipeline_build"
WHERE $timeFilter GROUP BY time(1h) fill(null)

PromQL:
avg(azure_devops_build_success_rate) by (definition)

Build Duration Trend (Time series panel)

InfluxQL:
SELECT mean("duration") FROM "pipeline_build"
WHERE $timeFilter GROUP BY time(1h), "definition" fill(null)

PromQL:
avg(azure_devops_build_duration_seconds) by (definition)

Failed Builds Table (Table panel)

InfluxQL:
SELECT "definition", "branch", "duration" FROM "pipeline_build"
WHERE "result" = 'failed' AND $timeFilter ORDER BY time DESC LIMIT 20

Queue Depth (Time series panel)

InfluxQL:
SELECT "queued", "running" FROM "pipeline_queue" WHERE $timeFilter

DORA Metrics Dashboard

The four DORA (DevOps Research and Assessment) metrics are the gold standard for measuring engineering team performance:

  1. Deployment Frequency — How often you deploy to production
  2. Lead Time for Changes — Time from commit to production
  3. Change Failure Rate — Percentage of deployments that cause failures
  4. Mean Time to Recovery — Time to restore service after a failure

The collector below computes the first three from build data; MTTR needs incident data that the Build API does not expose:

// collectors/dora-metrics.js
var https = require("https");

var AZURE_ORG = process.env.AZURE_ORG;
var AZURE_PROJECT = process.env.AZURE_PROJECT;
var AZURE_PAT = process.env.AZURE_PAT;

function azureRequest(path, callback) {
    var auth = Buffer.from(":" + AZURE_PAT).toString("base64");
    var options = {
        hostname: "dev.azure.com",
        path: "/" + AZURE_ORG + "/" + AZURE_PROJECT + "/_apis" + path,
        method: "GET",
        headers: { "Authorization": "Basic " + auth, "Accept": "application/json" }
    };
    var req = https.request(options, function (res) {
        var data = "";
        res.on("data", function (chunk) { data += chunk; });
        res.on("end", function () {
            try { callback(null, JSON.parse(data)); } catch (e) { callback(e); }
        });
    });
    req.on("error", callback);
    req.end();
}

function calculateDORA(days, callback) {
    var since = new Date(Date.now() - days * 24 * 60 * 60 * 1000).toISOString();

    // Get production deployments (builds on main that succeeded)
    azureRequest("/build/builds?minTime=" + encodeURIComponent(since) +
        "&statusFilter=completed&resultFilter=succeeded" +
        "&branchName=refs/heads/main&$top=1000&api-version=7.1", function (err, data) {
        if (err) { return callback(err); }

        var builds = data.value || [];

        // Deployment Frequency
        var deploymentFrequency = builds.length / days;

        // Lead Time for Changes (commit to deploy)
        var leadTimes = [];
        builds.forEach(function (build) {
            if (build.startTime && build.finishTime && build.sourceVersion) {
                var buildFinish = new Date(build.finishTime).getTime();
                var buildStart = new Date(build.startTime).getTime();
                // Approximate lead time as build duration (full lead time needs commit timestamp)
                leadTimes.push((buildFinish - buildStart) / 1000 / 60); // minutes
            }
        });

        var avgLeadTime = leadTimes.length > 0
            ? leadTimes.reduce(function (a, b) { return a + b; }, 0) / leadTimes.length
            : 0;

        // Get failed builds for change failure rate
        azureRequest("/build/builds?minTime=" + encodeURIComponent(since) +
            "&statusFilter=completed&resultFilter=failed" +
            "&branchName=refs/heads/main&$top=1000&api-version=7.1", function (err2, failedData) {
            if (err2) { return callback(err2); }

            var failedBuilds = failedData.value || [];
            var totalMainBuilds = builds.length + failedBuilds.length;
            var changeFailureRate = totalMainBuilds > 0
                ? (failedBuilds.length / totalMainBuilds * 100)
                : 0;

            callback(null, {
                period: days + " days",
                deploymentFrequency: Math.round(deploymentFrequency * 100) / 100,
                deploymentFrequencyUnit: "deploys/day",
                avgLeadTimeMinutes: Math.round(avgLeadTime),
                changeFailureRate: Math.round(changeFailureRate * 10) / 10,
                totalDeployments: builds.length,
                failedDeployments: failedBuilds.length
            });
        });
    });
}

calculateDORA(30, function (err, metrics) {
    if (err) { return console.error("DORA calculation failed:", err.message); }

    console.log("\n=== DORA Metrics (Last 30 Days) ===");
    console.log("Deployment Frequency: " + metrics.deploymentFrequency + " " + metrics.deploymentFrequencyUnit);
    console.log("Avg Lead Time: " + metrics.avgLeadTimeMinutes + " minutes");
    console.log("Change Failure Rate: " + metrics.changeFailureRate + "%");
    console.log("Total Deployments: " + metrics.totalDeployments);
    console.log("Failed Deployments: " + metrics.failedDeployments);

    // Write to InfluxDB or expose as Prometheus metrics
    // ...
});
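
To graph these in Grafana, persist each run as a single InfluxDB point. A minimal sketch, assuming the writeToInflux() helper from collectors/pipeline-metrics.js is shared with this script, to run inside the calculateDORA callback:

// Sketch: persist DORA metrics to InfluxDB (assumes writeToInflux()
// from collectors/pipeline-metrics.js is available here)
var timestamp = Math.floor(Date.now() / 1000);
var lines = [
    "dora_metrics" +
    " deployment_frequency=" + metrics.deploymentFrequency +
    ",lead_time_minutes=" + metrics.avgLeadTimeMinutes + "i" +
    ",change_failure_rate=" + metrics.changeFailureRate +
    " " + timestamp
];
writeToInflux(lines, function (err) {
    if (err) { console.error("InfluxDB write failed: " + err.message); }
});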

Dashboard Variables

Add template variables so users can filter by pipeline definition, branch, and time range:

Variable: definition
  Type: Query
  Query: SHOW TAG VALUES FROM "pipeline_build" WITH KEY = "definition"

Variable: branch
  Type: Query
  Query: SHOW TAG VALUES FROM "pipeline_build" WITH KEY = "branch" WHERE "definition" =~ /^$definition$/

Variable: result
  Type: Custom
  Values: succeeded, failed, canceled, all

Use these in panel queries:

SELECT mean("duration") FROM "pipeline_build"
WHERE "definition" =~ /^$definition$/ AND "branch" =~ /^$branch$/
AND $timeFilter GROUP BY time($__interval) fill(null)

Grafana Alerting

Set up alerts for pipeline health degradation:

Build Failure Rate Alert

Create a Grafana alert rule:

Rule name: High Build Failure Rate
Folder: CI/CD Alerts
Evaluation group: pipeline-health
Evaluation interval: 5m

Query A (InfluxDB):
SELECT count("failed") / count("succeeded") * 100 FROM "pipeline_build"
WHERE time > now() - 1h GROUP BY "definition"

Condition: WHEN last() OF query(A) IS ABOVE 30

Notification:
  - Slack channel: #build-alerts
  - Message: "Build failure rate for {{ $labels.definition }} is {{ $values.A }}%"

Build Duration Anomaly

Rule name: Build Duration Spike
Query A:
SELECT mean("duration") FROM "pipeline_build"
WHERE time > now() - 1h AND "definition" = '$definition'

Query B (baseline):
SELECT mean("duration") FROM "pipeline_build"
WHERE time > now() - 7d AND time < now() - 1h AND "definition" = '$definition'

Condition: Math expression $A > 2 * $B (classic WHEN ... IS ABOVE conditions compare only against constants, so comparing two queries needs an expression node)

Complete Working Example: Full Dashboard Provisioning

This script provisions a complete Grafana dashboard programmatically:

// grafana/provision-dashboard.js
var http = require("http");
var https = require("https");

var GRAFANA_URL = process.env.GRAFANA_URL || "http://localhost:3000";
var GRAFANA_API_KEY = process.env.GRAFANA_API_KEY;

function grafanaRequest(method, path, body, callback) {
    var parsed = new URL(GRAFANA_URL);
    var protocol = parsed.protocol === "https:" ? https : http;
    var options = {
        hostname: parsed.hostname,
        port: parsed.port,
        path: "/api" + path,
        method: method,
        headers: {
            "Content-Type": "application/json",
            "Authorization": "Bearer " + GRAFANA_API_KEY
        }
    };

    var req = protocol.request(options, function (res) {
        var data = "";
        res.on("data", function (chunk) { data += chunk; });
        res.on("end", function () {
            try { callback(null, JSON.parse(data)); }
            catch (e) { callback(null, data); }
        });
    });

    req.on("error", callback);
    if (body) { req.write(JSON.stringify(body)); }
    req.end();
}

var dashboard = {
    dashboard: {
        title: "Azure DevOps Pipeline Metrics",
        tags: ["azure-devops", "ci-cd", "pipelines"],
        timezone: "browser",
        refresh: "5m",
        templating: {
            list: [
                {
                    name: "definition",
                    type: "query",
                    datasource: "InfluxDB",
                    query: 'SHOW TAG VALUES FROM "pipeline_build" WITH KEY = "definition"',
                    refresh: 2,
                    includeAll: true,
                    multi: true
                }
            ]
        },
        panels: [
            {
                title: "Build Success Rate",
                type: "stat",
                gridPos: { h: 4, w: 6, x: 0, y: 0 },
                targets: [{
                    query: 'SELECT mean("succeeded") * 100 FROM "pipeline_build" WHERE $timeFilter AND "definition" =~ /^$definition$/ GROUP BY time($__interval) fill(null)',
                    datasource: "InfluxDB"
                }],
                fieldConfig: {
                    defaults: {
                        unit: "percent",
                        thresholds: {
                            steps: [
                                { value: 0, color: "red" },
                                { value: 80, color: "yellow" },
                                { value: 95, color: "green" }
                            ]
                        }
                    }
                }
            },
            {
                title: "Avg Build Duration",
                type: "stat",
                gridPos: { h: 4, w: 6, x: 6, y: 0 },
                targets: [{
                    query: 'SELECT mean("duration") FROM "pipeline_build" WHERE $timeFilter AND "definition" =~ /^$definition$/',
                    datasource: "InfluxDB"
                }],
                fieldConfig: { defaults: { unit: "s" } }
            },
            {
                title: "Builds Today",
                type: "stat",
                gridPos: { h: 4, w: 6, x: 12, y: 0 },
                targets: [{
                    query: 'SELECT count("duration") FROM "pipeline_build" WHERE time > now() - 24h AND "definition" =~ /^$definition$/',
                    datasource: "InfluxDB"
                }]
            },
            {
                title: "Queue Depth",
                type: "stat",
                gridPos: { h: 4, w: 6, x: 18, y: 0 },
                targets: [{
                    query: 'SELECT last("queued") + last("running") FROM "pipeline_queue" WHERE $timeFilter',
                    datasource: "InfluxDB"
                }]
            },
            {
                title: "Build Duration Over Time",
                type: "timeseries",
                gridPos: { h: 8, w: 12, x: 0, y: 4 },
                targets: [{
                    query: 'SELECT mean("duration") FROM "pipeline_build" WHERE $timeFilter AND "definition" =~ /^$definition$/ GROUP BY time($__interval), "definition" fill(null)',
                    datasource: "InfluxDB"
                }],
                fieldConfig: { defaults: { unit: "s" } }
            },
            {
                title: "Success/Failure Trend",
                type: "timeseries",
                gridPos: { h: 8, w: 12, x: 12, y: 4 },
                targets: [
                    {
                        query: 'SELECT count("succeeded") FROM "pipeline_build" WHERE "result" = \'succeeded\' AND $timeFilter AND "definition" =~ /^$definition$/ GROUP BY time(1h) fill(0)',
                        alias: "Succeeded",
                        datasource: "InfluxDB"
                    },
                    {
                        query: 'SELECT count("failed") FROM "pipeline_build" WHERE "result" = \'failed\' AND $timeFilter AND "definition" =~ /^$definition$/ GROUP BY time(1h) fill(0)',
                        alias: "Failed",
                        datasource: "InfluxDB"
                    }
                ]
            },
            {
                title: "Recent Failed Builds",
                type: "table",
                gridPos: { h: 8, w: 24, x: 0, y: 12 },
                targets: [{
                    query: 'SELECT "definition", "branch", "duration" FROM "pipeline_build" WHERE "result" = \'failed\' AND $timeFilter ORDER BY time DESC LIMIT 20',
                    datasource: "InfluxDB"
                }]
            }
        ]
    },
    overwrite: true
};

grafanaRequest("POST", "/dashboards/db", dashboard, function (err, result) {
    if (err) { return console.error("Failed: " + err.message); }
    console.log("Dashboard created: " + result.url);
});
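
Run it with a Grafana service account token (URL and token are placeholders):

GRAFANA_URL=http://localhost:3000 GRAFANA_API_KEY=glsa_xxxxxxxx \
node grafana/provision-dashboard.js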

Common Issues and Troubleshooting

InfluxDB write fails with "field type conflict"

partial write: field type conflict: input field "duration" on measurement "pipeline_build" is type float, already exists as type integer

InfluxDB enforces consistent field types. If the first write sent duration=120i (integer), all subsequent writes must also be integer. Drop the measurement and start fresh, or use a consistent type. In the collector script, always append i for integers or use float format consistently.

Grafana shows "No data" despite collector running

Check the InfluxDB data source configuration in Grafana. The database name, retention policy, and URL must match exactly. Test the connection in Grafana's data source settings. Also verify the time range — if the collector only writes recent data, selecting "Last 7 days" on a new installation returns nothing.
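
To confirm points are actually landing, query InfluxDB directly — for example via the 1.x compatibility endpoint (on InfluxDB 2.x this assumes a DBRP mapping exists for the bucket; values are placeholders):

curl -G "http://localhost:8086/query" \
  --header "Authorization: Token $INFLUX_TOKEN" \
  --data-urlencode "db=devops_metrics" \
  --data-urlencode "q=SELECT count(*) FROM pipeline_build WHERE time > now() - 6h"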

Prometheus exporter scrapes are timing out

scrape error: context deadline exceeded

The Azure DevOps API calls take longer than the default Prometheus scrape timeout (10 seconds). Either raise scrape_timeout in prometheus.yml or cache in the exporter — the example above caches results for 60 seconds. Never call the Azure DevOps API afresh on every scrape.

Dashboard variables not populating

Template variable queries run against the data source, not the Grafana backend. If the SHOW TAG VALUES query returns nothing, the data may not have the expected tag structure. Run the query directly in Grafana's Explore view to debug. Also ensure the variable's data source matches the panel data sources.

Metrics are duplicated after collector restarts

The collector may re-write metrics for the same builds on each run. InfluxDB handles this with its timestamp-based deduplication — if you write the same measurement with the same tags and timestamp, it overwrites rather than duplicates. Ensure your collector uses the build finish time as the InfluxDB timestamp, not the collection time.

Best Practices

  • Collect metrics every 5 minutes, no more often. More frequent collection hits Azure DevOps rate limits and provides minimal additional value. Pipeline builds take minutes to hours — 5-minute granularity captures all meaningful changes.

  • Use tags for dimensions, fields for values. In InfluxDB, pipeline definition name, branch, and result are tags (indexed, filterable). Duration and count are fields (aggregatable). This structure enables efficient queries and proper Grafana variable support.

  • Build one overview dashboard and separate detail dashboards. The overview shows all pipelines with success rates and durations. Click a pipeline name to drill down into a detail dashboard with per-branch analysis, step-level timing, and failure patterns.

  • Include DORA metrics on engineering leadership dashboards. Deployment frequency, lead time, change failure rate, and mean time to recovery are the metrics that matter most for organizational health. Grafana can compute these from raw build data.

  • Set up Grafana alerts for sustained degradation, not individual failures. A single failed build is not alertable. A failure rate above 30% sustained for 30 minutes is. Use FOR clauses on alert rules to avoid noise.

  • Store at least 90 days of pipeline data. Trend analysis requires history. Configure InfluxDB retention policies to keep pipeline metrics for at least 90 days. Aggregate older data into daily summaries to save storage.

  • Export dashboards as JSON and store them in Git. Grafana dashboards are code. Version them alongside your pipeline YAML so changes to metrics collection and visualization are reviewed together.
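
A sketch of the export side, assuming the grafanaRequest() helper from provision-dashboard.js is extracted into a shared module (the endpoint paths are the standard Grafana HTTP API):

// grafana/export-dashboards.js — sketch; assumes grafanaRequest() from
// provision-dashboard.js is shared/required by this script
var fs = require("fs");

fs.mkdirSync("dashboards", { recursive: true });

grafanaRequest("GET", "/search?type=dash-db", null, function (err, results) {
    if (err) { return console.error("Search failed: " + err.message); }
    results.forEach(function (d) {
        grafanaRequest("GET", "/dashboards/uid/" + d.uid, null, function (err2, full) {
            if (err2) { return console.error("Fetch failed: " + err2.message); }
            var file = "dashboards/" + d.uid + ".json";
            fs.writeFileSync(file, JSON.stringify(full.dashboard, null, 2));
            console.log("Exported " + file);
        });
    });
});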
