Quality Metrics and Test Coverage Tracking

A comprehensive guide to tracking quality metrics and test coverage in Azure DevOps, covering code coverage integration, test pass rate trends, defect density analysis, coverage gaps, dashboard widgets, OData analytics queries, and building custom quality reports with the REST API.

Overview

Running tests is not the same as tracking quality. A team can have 1,000 automated tests and still ship buggy software if those tests cover the wrong code, if pass rates are declining unnoticed, or if manual test coverage gaps exist in critical areas. Quality metrics turn raw test data into actionable insights -- which areas of the codebase are undertested, whether quality is trending up or down, how test execution maps to requirements, and where the team should invest testing effort next. Azure DevOps provides built-in analytics, dashboard widgets, and APIs for tracking these metrics across both automated and manual tests.

I have built quality dashboards for teams ranging from 5 developers to 50, and the pattern is consistent: teams that measure test coverage, defect density, and pass rate trends find and fix quality problems earlier. Teams that only look at "did the build pass?" discover issues in production. This article covers the specific metrics that matter, how to collect them in Azure DevOps, and how to build dashboards and reports that drive quality decisions.

Prerequisites

  • An Azure DevOps organization with Azure Pipelines and Azure Test Plans
  • Automated tests publishing results via PublishTestResults@2 task
  • Code coverage published via PublishCodeCoverageResults@2 task
  • Azure DevOps Analytics extension enabled (default for Azure DevOps Services)
  • Dashboard edit permissions
  • Node.js 18+ for custom reporting scripts
  • At least 2-3 weeks of test execution history for meaningful trends

Core Quality Metrics

Code Coverage Percentage

Code coverage measures the percentage of source code lines, branches, or functions exercised by automated tests. It is the most commonly tracked metric and the most commonly misunderstood.

What it tells you: Which code paths are exercised by tests. A line coverage of 80% means 80% of your source lines execute during test runs.

What it does not tell you: Whether the tests actually verify correct behavior. You can have 100% coverage with zero meaningful assertions. Coverage without assertions is just code execution, not testing.

Useful thresholds:

  • Below 50%: Significant risk areas exist with no test coverage
  • 60-80%: Reasonable for most applications. Focus coverage on business logic
  • Above 80%: Diminishing returns for most codebases. The last 20% is often error handling, edge cases, and framework plumbing
  • 100%: Only practical for critical libraries and shared utilities

Test Pass Rate

The percentage of tests that pass over a time period. Track this as a trend, not a single number.

Pass Rate = (Passed Tests / Total Tests) × 100

Week 1: 245 passed / 250 total = 98.0%
Week 2: 243 passed / 255 total = 95.3%
Week 3: 240 passed / 260 total = 92.3%
Week 4: 235 passed / 265 total = 88.7%

A declining pass rate means tests are failing faster than they are being fixed. This signals accumulating technical debt, flaky tests being ignored, or insufficient developer attention to test maintenance.
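
As a sketch, the weekly calculation above plus a simple decline check (the counts are the same hypothetical numbers shown above):

// Weekly totals; in practice these come from the Analytics queries shown
// later in this article.
var weeks = [
  { week: "Week 1", passed: 245, total: 250 },
  { week: "Week 2", passed: 243, total: 255 },
  { week: "Week 3", passed: 240, total: 260 },
  { week: "Week 4", passed: 235, total: 265 },
];

var previousRate = null;
weeks.forEach(function (w) {
  var rate = (w.passed / w.total) * 100;
  var flag = previousRate !== null && rate < previousRate ? "  <- declining" : "";
  console.log(w.week + ": " + rate.toFixed(1) + "%" + flag);
  previousRate = rate;
});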

Defect Density

Defects per unit of code (typically per 1,000 lines or per feature area):

Defect Density = Bugs Found / Lines of Code (in thousands)

Module: Authentication
  Lines of code: 4,200
  Bugs found (last 90 days): 12
  Defect density: 2.86 bugs per KLOC

Module: Dashboard
  Lines of code: 8,500
  Bugs found (last 90 days): 3
  Defect density: 0.35 bugs per KLOC

High defect density areas need more testing investment. Low defect density areas may be over-tested relative to their risk.
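
As a sketch, the per-module calculation simply divides a bug count per area path by that module's size. The module names and numbers below are placeholders; in practice the bug counts come from a WIQL query grouped by area path, and lines of code from a tool such as cloc:

// Placeholder inputs for the defect density calculation above.
var modules = [
  { name: "Authentication", linesOfCode: 4200, bugsLast90Days: 12 },
  { name: "Dashboard", linesOfCode: 8500, bugsLast90Days: 3 },
];

modules.forEach(function (m) {
  var density = m.bugsLast90Days / (m.linesOfCode / 1000);
  console.log(m.name + ": " + density.toFixed(2) + " bugs per KLOC");
});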

Test Execution Coverage

The percentage of test cases that have been executed in the current test plan or sprint:

Execution Coverage = (Executed Test Points / Total Test Points) × 100

Sprint 48 Test Plan:
  Total test points: 180
  Executed: 145 (passed: 130, failed: 12, blocked: 3)
  Not executed: 35
  Execution coverage: 80.6%

This metric specifically tracks manual and planned test execution, not automated test pass rates. A low execution coverage at sprint end means the team did not finish testing before the release.

Requirement Coverage

The percentage of requirements (user stories, features) that have associated test cases:

Requirement Coverage = (Stories with Tests / Total Stories) × 100

Sprint 48:
  User stories: 12
  Stories with test cases: 9
  Stories with executed tests: 7
  Requirement coverage: 75.0%
  Tested requirement coverage: 58.3%

Stories without test cases are a coverage gap. They may ship without any verification beyond developer self-testing.
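
One way to find this gap automatically is a WIQL link query that returns stories with no Tested By link. A sketch using the same WIQL endpoint as the reporting script later in this article (the TestedBy-Forward link type name is the standard one; adjust the work item type to match your process template):

// WIQL link query: stories that have no "Tested By" link to a test case.
// MODE (DoesNotContain) returns source items lacking the specified link.
var storiesWithoutTests = {
  query:
    "SELECT [System.Id], [System.Title] FROM WorkItemLinks " +
    "WHERE ([Source].[System.WorkItemType] = 'User Story' AND [Source].[System.State] <> 'Closed') " +
    "AND ([System.Links.LinkType] = 'Microsoft.VSTS.Common.TestedBy-Forward') " +
    "AND ([Target].[System.WorkItemType] = 'Test Case') " +
    "MODE (DoesNotContain)",
};
// POST this body to https://dev.azure.com/{org}/{project}/_apis/wit/wiql?api-version=7.1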

Flaky Test Rate

The percentage of tests that produce inconsistent results (pass in one run, fail in the next, with no code changes):

Flaky Rate = (Flaky Tests / Total Tests) × 100

Total automated tests: 400
Tests flagged as flaky: 18
Flaky rate: 4.5%

A flaky rate above 5% erodes trust in the test suite. Developers start ignoring test failures because "it's probably just a flaky test." Actively track and fix flaky tests.
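
Azure DevOps can flag flaky tests automatically when flaky test detection is enabled for a pipeline. As a rough manual sketch, the rate can also be estimated from a flat export of recent results -- any test name that both passed and failed over the period is a candidate (the input shape here is hypothetical):

// results: [{ testName, outcome }, ...] collected from recent runs,
// e.g. via the test results REST API. The shape is an assumption.
function computeFlakyRate(results) {
  var outcomesByTest = {};
  results.forEach(function (r) {
    outcomesByTest[r.testName] = outcomesByTest[r.testName] || {};
    outcomesByTest[r.testName][r.outcome] = true;
  });

  var testNames = Object.keys(outcomesByTest);
  var flaky = testNames.filter(function (name) {
    return outcomesByTest[name].Passed && outcomesByTest[name].Failed;
  });

  return {
    totalTests: testNames.length,
    flakyTests: flaky.length,
    flakyRate: testNames.length > 0 ? (flaky.length / testNames.length) * 100 : 0,
  };
}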

Dashboard Widgets for Quality Tracking

Azure DevOps provides several built-in widgets for quality dashboards.

Test Results Trend Widget

Shows pass/fail/inconclusive counts over time as a stacked bar or line chart. Configure it to show:

  • Test run title filter (e.g., only show "Unit Tests" results)
  • Time range (last 7 days, 14 days, 30 days)
  • Group by: Daily, Weekly

This is the single most important quality widget. A glance tells you whether quality is stable, improving, or declining.

Code Coverage Widget

Displays the latest code coverage percentage. Limited compared to the trend widget -- it only shows the current value, not the trend. Pair it with a custom chart that tracks coverage over time.

Deployment Status Widget

Shows the last deployment status per environment. Not directly a quality metric, but relevant for understanding which environments have been tested against.

Query-Based Charts

Create work item queries and pin charts to dashboards:

Active Bugs by Priority:

Query: WorkItemType = 'Bug' AND State <> 'Closed'
Chart: Pie chart grouped by Priority

Bug Trend (Open vs Closed):

Query: WorkItemType = 'Bug'
Chart: Stacked area chart, grouped by State, over time

Test Cases Without Execution:

Query: WorkItemType = 'Test Case' AND State = 'Ready'
Chart: Bar chart showing count by Area Path
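
These query definitions are shorthand. As a sketch, the "Active Bugs by Priority" query expressed as full WIQL (save it as a shared query so its chart can be pinned to a dashboard):

SELECT [System.Id], [System.Title], [Microsoft.VSTS.Common.Priority]
FROM WorkItems
WHERE [System.TeamProject] = @Project
  AND [System.WorkItemType] = 'Bug'
  AND [System.State] <> 'Closed'
ORDER BY [Microsoft.VSTS.Common.Priority] ASC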

Building a Quality Dashboard

Create a dashboard named "Quality Metrics" with these widgets:

Row 1: Overview
┌──────────────────┬──────────────────┬──────────────────┐
│ Test Pass Rate   │ Code Coverage    │ Active Bug Count │
│ (Trend, 30 days) │ (Current %)      │ (Query tile)     │
└──────────────────┴──────────────────┴──────────────────┘

Row 2: Trends
┌────────────────────────────────────┬───────────────────┐
│ Test Results Trend (14 days)       │ Bug Trend         │
│ (Stacked bar: pass/fail/other)     │ (Open vs Closed)  │
└────────────────────────────────────┴───────────────────┘

Row 3: Coverage Gaps
┌────────────────────────────────────┬───────────────────┐
│ Requirements Without Tests         │ Untested Areas    │
│ (Query chart by Area Path)         │ (Coverage by area)│
└────────────────────────────────────┴───────────────────┘

Row 4: Sprint Progress
┌────────────────────────────────────────────────────────┐
│ Test Plan Progress (current sprint)                    │
│ (Passed/Failed/Blocked/Not Run by suite)               │
└────────────────────────────────────────────────────────┘

OData Analytics for Custom Reports

Azure DevOps Analytics provides an OData endpoint for querying test data. This enables custom reports that go beyond what built-in widgets offer.

Querying Test Result Trends

https://analytics.dev.azure.com/{org}/{project}/_odata/v4.0-preview/TestResultsDaily
  ?$apply=
    filter(DateSK ge 20260101 and Workflow eq 'Build')
    /groupby(
      (DateSK),
      aggregate(
        ResultCount with sum as TotalTests,
        ResultPassCount with sum as PassedTests,
        ResultFailCount with sum as FailedTests
      )
    )
  &$orderby=DateSK desc

Querying Test Coverage by Area

https://analytics.dev.azure.com/{org}/{project}/_odata/v4.0-preview/TestPoints
  ?$apply=
    groupby(
      (TestSuite/TestPlanTitle, TestSuite/Title),
      aggregate(
        $count as TotalPoints,
        cast(LastResultOutcome eq 'Passed', Edm.Int32) with sum as Passed,
        cast(LastResultOutcome eq 'Failed', Edm.Int32) with sum as Failed,
        cast(LastResultOutcome eq 'None', Edm.Int32) with sum as NotRun
      )
    )
  &$orderby=TotalPoints desc

Building a Custom OData Report

var https = require("https");
var url = require("url");

var ORG = "my-organization";
var PROJECT = "my-project";
var PAT = process.env.AZURE_DEVOPS_PAT;

var AUTH = "Basic " + Buffer.from(":" + PAT).toString("base64");

function queryOData(query) {
  var apiUrl = "https://analytics.dev.azure.com/" + ORG + "/" + PROJECT +
    "/_odata/v4.0-preview/" + query;
  var parsed = url.parse(apiUrl);

  return new Promise(function (resolve, reject) {
    var options = {
      hostname: parsed.hostname,
      path: parsed.path,
      method: "GET",
      headers: {
        Authorization: AUTH,
        Accept: "application/json",
      },
    };

    var req = https.request(options, function (res) {
      var data = "";
      res.on("data", function (chunk) {
        data += chunk;
      });
      res.on("end", function () {
        if (res.statusCode >= 200 && res.statusCode < 300) {
          resolve(JSON.parse(data));
        } else {
          reject(new Error("OData query failed: " + res.statusCode + " " + data));
        }
      });
    });

    req.on("error", reject);
    req.end();
  });
}

function getTestPassRateTrend(days) {
  var startDate = new Date();
  startDate.setDate(startDate.getDate() - days);
  var dateSK = startDate.toISOString().slice(0, 10).replace(/-/g, "");

  var query = "TestResultsDaily?" +
    "$apply=" +
    "filter(DateSK ge " + dateSK + " and Workflow eq 'Build')" +
    "/groupby(" +
    "(CompletedDateSK)," +
    "aggregate(" +
    "ResultCount with sum as TotalTests," +
    "ResultPassCount with sum as PassedTests," +
    "ResultFailCount with sum as FailedTests" +
    ")" +
    ")" +
    "&$orderby=CompletedDateSK asc";

  return queryOData(query).then(function (response) {
    var data = response.value || [];
    console.log("=== Test Pass Rate Trend (Last " + days + " Days) ===");
    console.log("");

    var totalPassed = 0;
    var totalTests = 0;

    data.forEach(function (day) {
      var date = String(day.DateSK);
      var formatted = date.slice(0, 4) + "-" + date.slice(4, 6) + "-" + date.slice(6, 8);
      var passRate = day.TotalTests > 0
        ? ((day.PassedTests / day.TotalTests) * 100).toFixed(1)
        : "N/A";

      totalPassed += day.PassedTests;
      totalTests += day.TotalTests;

      var bar = "#".repeat(Math.round(parseFloat(passRate) / 2));
      console.log(
        formatted + "  " +
        passRate.padStart(5) + "%  " +
        bar + "  (" +
        day.PassedTests + "/" + day.TotalTests + ")"
      );
    });

    console.log("");
    var overallRate = totalTests > 0
      ? ((totalPassed / totalTests) * 100).toFixed(1)
      : "N/A";
    console.log("Overall: " + overallRate + "% (" + totalPassed + "/" + totalTests + ")");

    return data;
  });
}

function getCoverageByArea() {
  var query = "TestPoints?" +
    "$apply=groupby(" +
    "(TestSuite/TestPlanTitle)," +
    "aggregate(" +
    "$count as TotalPoints" +
    ")" +
    ")" +
    "&$orderby=TotalPoints desc" +
    "&$top=20";

  return queryOData(query).then(function (response) {
    var data = response.value || [];
    console.log("=== Test Points by Test Plan ===");
    console.log("");

    data.forEach(function (plan) {
      // Grouped navigation properties come back nested in the OData response
      var planName = (plan.TestSuite && plan.TestSuite.TestPlanTitle) ||
        plan.TestSuite_TestPlanTitle || "Unknown";
      console.log("  " + planName + ": " + plan.TotalPoints + " test points");
    });

    return data;
  });
}

function getDefectDensity() {
  // Query bugs grouped by area path
  var wiqlUrl = "https://dev.azure.com/" + ORG + "/" + PROJECT +
    "/_apis/wit/wiql?api-version=7.1";
  var parsed = url.parse(wiqlUrl);

  var wiql = {
    query: "SELECT [System.Id], [System.AreaPath] FROM WorkItems " +
      "WHERE [System.WorkItemType] = 'Bug' " +
      "AND [System.CreatedDate] >= @Today - 90 " +
      "ORDER BY [System.AreaPath] ASC",
  };

  return new Promise(function (resolve, reject) {
    var options = {
      hostname: parsed.hostname,
      path: parsed.path,
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: AUTH,
      },
    };

    var req = https.request(options, function (res) {
      var data = "";
      res.on("data", function (chunk) {
        data += chunk;
      });
      res.on("end", function () {
        if (res.statusCode >= 200 && res.statusCode < 300) {
          var result = JSON.parse(data);
          var bugCount = result.workItems ? result.workItems.length : 0;
          console.log("=== Defect Summary (Last 90 Days) ===");
          console.log("Total bugs filed: " + bugCount);
          resolve(result);
        } else {
          reject(new Error("WIQL query failed: " + res.statusCode));
        }
      });
    });

    req.on("error", reject);
    req.write(JSON.stringify(wiql));
    req.end();
  });
}

// Main execution
var action = process.argv[2] || "all";
var days = parseInt(process.argv[3]) || 30;

if (action === "pass-rate" || action === "all") {
  getTestPassRateTrend(days)
    .then(function () {
      if (action === "all") {
        console.log("");
        return getDefectDensity();
      }
    })
    .catch(function (err) {
      console.error("Error: " + err.message);
      process.exit(1);
    });
} else if (action === "coverage") {
  getCoverageByArea().catch(function (err) {
    console.error("Error: " + err.message);
    process.exit(1);
  });
} else if (action === "defects") {
  getDefectDensity().catch(function (err) {
    console.error("Error: " + err.message);
    process.exit(1);
  });
} else {
  console.log("Usage: node quality-metrics.js [action] [days]");
  console.log("Actions: pass-rate, coverage, defects, all");
  console.log("Example: node quality-metrics.js pass-rate 14");
}

Running the report:

$ node quality-metrics.js pass-rate 14
=== Test Pass Rate Trend (Last 14 Days) ===

2026-01-27  97.2%  ################################################  (350/360)
2026-01-28  96.8%  ################################################  (362/374)
2026-01-29  95.1%  ###############################################  (348/366)
2026-01-30  98.0%  #################################################  (392/400)
2026-01-31  94.5%  ###############################################  (345/365)
2026-02-01  93.2%  ##############################################  (358/384)
2026-02-02  96.7%  ################################################  (380/393)
2026-02-03  97.5%  ################################################  (390/400)
2026-02-04  95.8%  ###############################################  (367/383)
2026-02-05  94.1%  ###############################################  (352/374)
2026-02-06  93.5%  ##############################################  (346/370)
2026-02-07  96.2%  ################################################  (378/393)
2026-02-08  97.8%  ################################################  (400/409)
2026-02-09  95.4%  ###############################################  (373/391)

Overall: 95.9% (5141/5362)

Complete Working Example

A comprehensive quality metrics pipeline stage that runs after tests, collects metrics, and publishes a quality report:

stages:
  - stage: Test
    jobs:
      - job: RunTests
        steps:
          - script: npm test
            displayName: Run tests

          - task: PublishTestResults@2
            inputs:
              testResultsFormat: JUnit
              testResultsFiles: "**/junit.xml"
              testRunTitle: "Unit Tests"
            condition: always()

          - task: PublishCodeCoverageResults@2
            inputs:
              summaryFileLocation: "**/cobertura-coverage.xml"
            condition: always()

          # Publish the raw coverage report so the QualityGate stage,
          # which runs as a separate job on a fresh agent, can read it
          - publish: coverage
            artifact: coverage
            condition: always()

  - stage: QualityGate
    dependsOn: Test
    condition: always()
    jobs:
      - job: CheckQuality
        steps:
          - task: NodeTool@0
            inputs:
              versionSpec: "20.x"

          - script: |
              node quality-metrics.js pass-rate 7
              node quality-metrics.js defects
            displayName: Generate quality report
            env:
              AZURE_DEVOPS_PAT: $(System.AccessToken)

          - download: current
            artifact: coverage
            displayName: Download coverage report

          - script: |
              COVERAGE=$(grep -oP 'line-rate="[^"]*"' "$(Pipeline.Workspace)/coverage/cobertura-coverage.xml" | head -1 | grep -oP '[\d.]+')
              COVERAGE_PCT=$(echo "$COVERAGE * 100" | bc)
              echo "Code coverage: ${COVERAGE_PCT}%"
              if [ $(echo "$COVERAGE_PCT < 70" | bc) -eq 1 ]; then
                echo "##vso[task.logissue type=warning]Code coverage ${COVERAGE_PCT}% is below 70% threshold"
              fi
            displayName: Check coverage threshold
            condition: always()
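
To make this a blocking gate rather than a warning-only check (see the best practices below), log the issue as an error and fail the step. A minimal sketch, assuming the same downloaded Cobertura report and a 70% threshold:

          - script: |
              COVERAGE=$(grep -oP 'line-rate="[^"]*"' "$(Pipeline.Workspace)/coverage/cobertura-coverage.xml" | head -1 | grep -oP '[\d.]+')
              COVERAGE_PCT=$(echo "$COVERAGE * 100" | bc)
              if [ $(echo "$COVERAGE_PCT < 70" | bc) -eq 1 ]; then
                echo "##vso[task.logissue type=error]Code coverage ${COVERAGE_PCT}% is below the 70% threshold"
                exit 1
              fi
            displayName: Enforce coverage threshold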

Common Issues and Troubleshooting

Analytics Data Is Delayed

Azure DevOps Analytics processes data asynchronously. Test results and work item changes can take 15-60 minutes to appear in OData queries and Analytics widgets. If you run a quality report immediately after a test run completes, the latest results may not be included. For pipeline-integrated quality gates, use the direct REST API for test results (_apis/test/runs) instead of OData, as it reflects data in real time.
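
A minimal sketch of that real-time approach, using the test runs query endpoint (Node 18+ built-in fetch; the one-day window and the org/project placeholders are assumptions):

var org = "my-organization";
var project = "my-project";
var auth = "Basic " + Buffer.from(":" + process.env.AZURE_DEVOPS_PAT).toString("base64");

// Runs updated in the last 24 hours; the query form of this endpoint takes
// a min and max last-updated date.
var now = new Date();
var since = new Date(now.getTime() - 24 * 60 * 60 * 1000);
var runsUrl = "https://dev.azure.com/" + org + "/" + project +
  "/_apis/test/runs?minLastUpdatedDate=" + since.toISOString() +
  "&maxLastUpdatedDate=" + now.toISOString() +
  "&api-version=7.1";

fetch(runsUrl, { headers: { Authorization: auth, Accept: "application/json" } })
  .then(function (res) { return res.json(); })
  .then(function (body) {
    (body.value || []).forEach(function (run) {
      console.log(run.name + ": " + (run.passedTests || 0) + "/" + (run.totalTests || 0) + " passed");
    });
  });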

Code Coverage Shows Different Numbers Than Local

Azure DevOps code coverage is calculated from the coverage report file (Cobertura or JaCoCo XML). If the pipeline builds different files than your local environment (e.g., different build configurations, excluded files), the coverage numbers will differ. Verify that the coverage report includes the same source files in both environments. Also check that the coverage tool is configured identically -- Istanbul and c8 can produce different numbers for the same codebase depending on instrumentation settings.

Dashboard Widgets Show "No Data Available"

Common causes: (a) the Analytics extension is not installed or enabled, (b) the widget is configured to filter by a test run title that does not match any runs, (c) the time range filter excludes all data, (d) the widget requires a specific data source (like Test Plans) that the project does not use. Check the widget configuration and verify that matching data exists in the selected time range.

OData Query Returns 403 Forbidden

The PAT used for OData queries needs the Analytics (read) scope. If using $(System.AccessToken) in a pipeline, it has read access to Analytics by default. For personal PATs, ensure the Analytics scope is checked. Also verify that the Analytics extension is installed in your organization -- it is enabled by default for Azure DevOps Services but must be installed separately for Azure DevOps Server.

Coverage Trend Shows Sudden Drops

A sudden coverage drop usually means new code was added without tests, or test files were excluded from the coverage report. Less commonly, it means the coverage tool changed configuration. Investigate by comparing the coverage report file between the last good run and the current run. Look for new source files in the report with 0% coverage -- those are the uncovered additions.

Best Practices

  • Track trends, not snapshots. A single coverage number (e.g., 78%) is meaningless without context. Is it going up or down? Was it 72% last month? Trends reveal whether the team is investing in quality or accumulating debt. Configure all quality widgets to show at least 14 days of history.

  • Set quality gates that block deployment, not just warn. A quality gate that produces a warning is a quality suggestion. A quality gate that fails the pipeline and blocks deployment is a quality gate. Enforce minimum coverage thresholds, maximum bug counts, and minimum pass rates in the pipeline.

  • Measure coverage at the feature level, not just globally. Global coverage of 80% can hide a critical module at 20% coverage. Break coverage down by area path, module, or feature to identify specific areas that need testing investment (a per-module parsing sketch follows at the end of this list).

  • Track both automated and manual test coverage. Automated test coverage tells you what code is exercised. Manual test coverage (from Azure Test Plans) tells you what requirements are verified. A feature can have 90% code coverage but zero manual testing of the user experience.

  • Review quality metrics weekly in team meetings. Quality metrics that nobody reviews are just data. Include a 5-minute quality review in your weekly team meeting: pass rate trend, coverage changes, open bug count, and any new flaky tests. This keeps quality visible as a team responsibility.

  • Distinguish leading and lagging metrics. Code coverage and pass rate are leading indicators -- they predict future quality. Bug reports from production are lagging indicators -- they report past failures. Focus on improving leading indicators to reduce lagging ones.

  • Automate quality reports. The custom report scripts in this article should run automatically, either in the pipeline or on a schedule. Manual report generation is unreliable -- someone forgets, and the team goes weeks without quality visibility.

  • Set achievable, incremental targets. Do not set a coverage target of 90% when you are at 40%. Set a target of 50% for this quarter, 60% for next quarter. Unreachable targets are ignored; achievable targets drive action.
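
A sketch of the per-module breakdown referenced above, pulling package-level line rates out of a Cobertura report (the file path is an assumption; adjust it to wherever your coverage tool writes the report):

var fs = require("fs");

// Naive parse of <package name="..." line-rate="..."> entries; assumes the
// attribute order that common Cobertura reporters emit.
var xml = fs.readFileSync("coverage/cobertura-coverage.xml", "utf8");
var packagePattern = /<package[^>]*name="([^"]*)"[^>]*line-rate="([^"]*)"/g;

var match;
while ((match = packagePattern.exec(xml)) !== null) {
  var moduleName = match[1];
  var lineRate = parseFloat(match[2]) * 100;
  console.log(moduleName.padEnd(40) + lineRate.toFixed(1) + "%");
}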
