Test Result Analysis and Trend Reporting
Analyze test results and track quality trends in Azure DevOps with OData analytics, custom dashboards, and automated reporting
Running tests is only half the battle. The real value comes from analyzing results over time, spotting regressions before they compound, and building quality gates that prevent bad code from shipping. Azure DevOps provides a rich set of test analytics capabilities, from built-in dashboards to OData feeds and REST APIs, that let you build exactly the reporting pipeline your team needs.
Prerequisites
- An Azure DevOps organization with an active project
- Test runs publishing results to Azure DevOps (via pipelines or manual runs)
- Node.js 16+ installed locally
- A Personal Access Token (PAT) with Test Management read permissions
- Basic familiarity with Azure DevOps pipelines and test plans
- Analytics enabled on your Azure DevOps organization (enabled by default)
Understanding Test Result Data in Azure DevOps
Every time a pipeline runs tests, Azure DevOps captures a structured hierarchy of data. At the top level you have a Test Run, which contains individual Test Results. Each result records the test name, outcome (passed, failed, not executed, etc.), duration, error message, stack trace, and the pipeline context that produced it.
This data accumulates over time and becomes the foundation for trend analysis. The key entities you will work with are:
- TestRuns — A collection of test results from a single execution. Contains metadata like the build, pipeline, environment, and completion date.
- TestResults — Individual test case outcomes within a run. Each has an outcome, duration, error details, and an associated test case reference.
- TestResultsDaily — A pre-aggregated Analytics view that rolls up pass/fail counts by day. This is the fastest way to build trend charts.
- TestSuite and TestPoint — Used in manual test plans. Points track the assignment of a test case to a configuration and tester.
The distinction between raw results and the Analytics aggregations matters. Raw results via the REST API give you every detail but require pagination and client-side aggregation. The Analytics OData feed gives you pre-computed rollups that are far more efficient for dashboards and trend queries.
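Concretely, the two surfaces live at different endpoints (org and project names are placeholders):
Raw results (REST API, full detail, paginated):
GET https://dev.azure.com/{org}/{project}/_apis/test/runs/{runId}/results?api-version=7.1
Pre-aggregated rollups (Analytics OData, fast for trends):
GET https://analytics.dev.azure.com/{org}/{project}/_odata/v4.0-preview/TestResultsDaily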
Built-in Test Analytics
Before writing any code, know what Azure DevOps gives you out of the box. Navigate to Test Plans > Analytics or add Test analytics widgets to your dashboards.
The built-in analytics provide:
- Pass Rate Trend — A line chart showing the percentage of passing tests over a configurable time window. You can filter by pipeline, branch, or test file.
- Test Failures — A ranked list of the most frequently failing tests. This is where you start your investigation.
- Test Duration — Average execution time per test over time. Useful for catching performance regressions in your test suite.
- Flaky Tests — Tests that flip between pass and fail across runs without code changes. Azure DevOps flags these automatically when it detects inconsistent outcomes on the same commit.
These built-in views cover 80% of what most teams need. But when you need custom aggregations, cross-project reporting, or automated alerting, you need to go deeper.
Pass Rate Trends and Failure Analysis
Pass rate is the single most important quality metric. A declining pass rate means your codebase is accumulating defects faster than you are fixing them. Track it at multiple levels:
- Overall pass rate — Across all test suites and pipelines
- Per-pipeline pass rate — Isolate which pipeline is degrading
- Per-test-suite pass rate — Narrow down to the area of the product
- Per-branch pass rate — Compare main branch health versus feature branches
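For instance, the per-pipeline breakdown can come straight from the Analytics OData feed covered later in this article. A sketch of the query (the date bound is a placeholder):
TestResultsDaily?
$apply=filter(DateSK ge 20260101)
/groupby(
(Pipeline/PipelineName),
aggregate(
ResultPassCount with sum as Passed,
ResultCount with sum as Total
)
)
&$orderby=Total desc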
Failure categorization is equally important. Not all failures are created equal:
- Product bugs — The test correctly caught a regression. This is what tests are for.
- Test bugs — The test itself is broken. Flaky assertions, environment dependencies, race conditions.
- Infrastructure failures — Timeouts, agent crashes, network issues. These mask real results.
- Configuration drift — A dependency updated, a service endpoint moved, an environment variable changed.
Azure DevOps lets you set a failure type and resolution on each test result through the UI or API. Disciplined teams categorize every failure. This data feeds back into your trend reports and tells you whether your quality problems are in the product, the tests, or the infrastructure.
Identifying Flaky Tests
Flaky tests are a cancer. They erode trust in the test suite, waste developer time investigating false failures, and eventually lead to teams ignoring test results entirely.
Azure DevOps identifies flaky tests by looking at results across multiple runs of the same branch and commit. If a test passes in one run and fails in another with no code change, it gets flagged. You can enable automatic flaky test detection in Project Settings > Test Management.
Once enabled, you have options:
- Report flaky tests — Mark them in results but still count them as failures
- Exclude flaky tests from pass rate — Leave flagged tests out of the pass rate calculation so intermittent failures do not drag the number down
I recommend reporting but not excluding. Excluding flaky tests from pass rate gives you a rosier picture than reality. Instead, track flaky test count as its own metric and drive it to zero.
Programmatically, you can identify flaky tests by querying for tests that have both pass and fail outcomes within a rolling window:
var axios = require("axios");
// ORG is the REST API base; the OData Analytics feed lives on analytics.dev.azure.com
var ORG = "https://dev.azure.com/myorg";
var PROJECT = "myproject";
var PAT = process.env.AZURE_DEVOPS_PAT;
var ANALYTICS = ORG.replace("dev.azure.com", "analytics.dev.azure.com") + "/" + PROJECT;
function getAuthHeader() {
  // PATs use Basic auth with an empty username
  var token = Buffer.from(":" + PAT).toString("base64");
  return { Authorization: "Basic " + token };
}
function findFlakyTests(pipelineName, days, callback) {
  // Find tests with both passes and failures inside the lookback window
  var minDate = new Date();
  minDate.setDate(minDate.getDate() - days);
  var dateStr = minDate.toISOString().split("T")[0];
  var url = ANALYTICS + "/_odata/v4.0-preview/TestResultsDaily?" +
    "$apply=filter(" +
    "Pipeline/PipelineName eq '" + pipelineName + "' " +
"and DateSK ge " + dateStr.replace(/-/g, "") +
")" +
"/groupby(" +
"(TestSK, Test/TestName)," +
"aggregate(" +
"ResultPassCount with sum as TotalPass," +
"ResultFailCount with sum as TotalFail" +
")" +
")" +
"&$filter=TotalPass gt 0 and TotalFail gt 0" +
"&$orderby=TotalFail desc";
axios.get(url, { headers: getAuthHeader() })
.then(function(response) {
var flakyTests = response.data.value.map(function(item) {
var total = item.TotalPass + item.TotalFail;
return {
testName: item.Test.TestName,
totalRuns: total,
passes: item.TotalPass,
failures: item.TotalFail,
flakyRate: ((item.TotalFail / total) * 100).toFixed(1) + "%"
};
});
callback(null, flakyTests);
})
.catch(function(err) {
callback(err);
});
}
findFlakyTests("MyPipeline", 14, function(err, results) {
if (err) {
console.error("Error:", err.message);
return;
}
console.log("Flaky tests in last 14 days:");
results.forEach(function(t) {
console.log(" " + t.testName + " — " + t.flakyRate + " failure rate (" + t.totalRuns + " runs)");
});
});
Test Duration Trends
Slow tests are expensive tests. They slow down your feedback loop, increase pipeline costs, and encourage developers to skip running tests locally. Track duration trends to catch regressions early.
The OData Analytics feed exposes ResultDurationSeconds which you can aggregate over time:
function getTestDurationTrends(pipelineName, days, callback) {
var minDate = new Date();
minDate.setDate(minDate.getDate() - days);
var dateStr = minDate.toISOString().split("T")[0];
var url = ANALYTICS + "/_odata/v4.0-preview/TestResultsDaily?" +
"$apply=filter(" +
"Pipeline/PipelineName eq '" + pipelineName + "' " +
"and DateSK ge " + dateStr.replace(/-/g, "") +
")" +
"/groupby(" +
"(DateSK)," +
"aggregate(" +
"ResultDurationSeconds with sum as TotalDuration," +
"ResultCount with sum as TotalTests" +
")" +
")" +
"&$orderby=DateSK asc";
axios.get(url, { headers: getAuthHeader() })
.then(function(response) {
var trends = response.data.value.map(function(day) {
return {
date: String(day.DateSK).replace(/(\d{4})(\d{2})(\d{2})/, "$1-$2-$3"),
totalDuration: Math.round(day.TotalDuration),
testCount: day.TotalTests,
avgDuration: (day.TotalDuration / day.TotalTests).toFixed(2)
};
});
callback(null, trends);
})
.catch(function(err) {
callback(err);
});
}
Set alerts when average test duration increases by more than 20% week-over-week. This usually means someone added a sleep, a real network call leaked into a unit test, or a test data setup is doing way more work than necessary.
Custom Queries with OData Analytics
The OData Analytics service in Azure DevOps is powerful but poorly documented. Here are the query patterns you will use most often.
The base URL is:
https://analytics.dev.azure.com/{org}/{project}/_odata/v4.0-preview/
The primary entities for test analytics are TestRuns, TestResults, and TestResultsDaily. The daily aggregation is best for trend reporting because it is pre-computed and fast.
Pass rate by day for the last 90 days:
TestResultsDaily?
$apply=filter(DateSK ge 20260101)
/groupby(
(DateSK),
aggregate(
ResultPassCount with sum as Passed,
ResultFailCount with sum as Failed,
ResultCount with sum as Total
)
)
&$orderby=DateSK asc
Top 10 slowest tests:
TestResultsDaily?
$apply=filter(DateSK ge 20260201)
/groupby(
(TestSK, Test/TestName),
aggregate(
ResultDurationSeconds with average as AvgDuration,
ResultCount with sum as Runs
)
)
&$orderby=AvgDuration desc
&$top=10
Failure count by test container (the test assembly or file):
TestResultsDaily?
$apply=filter(DateSK ge 20260201 and Workflow eq 'Build')
/groupby(
(Test/ContainerName),
aggregate(ResultFailCount with sum as Failures)
)
&$filter=Failures gt 0
&$orderby=Failures desc
These queries return JSON and can be consumed directly in Power BI, custom dashboards, or your Node.js reporting scripts.
Building Custom Dashboards
Azure DevOps dashboards support test analytics widgets, but for truly custom views you will want to build your own. The approach is straightforward: query OData or the REST API, transform the data, and render it.
For internal dashboards, I recommend generating static HTML files that you serve from a simple Express app or host on Azure Static Web Apps. This avoids the complexity of a full frontend framework while giving you complete control over the visualization.
Use Chart.js for the visualizations. It is lightweight, works without a build step, and handles line charts, bar charts, and doughnut charts well.
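As a minimal sketch of the serving side, assuming the generated reports land in a local ./reports folder and the express package is installed:
var express = require("express");
var app = express();

// Serve every generated HTML report as a static file
app.use(express.static("./reports"));

app.listen(3000, function() {
  console.log("Dashboard reports available at http://localhost:3000/");
});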
The key metrics to display on a test quality dashboard:
- Pass rate trend — Line chart, 30/60/90 day views
- Failure hotspots — Bar chart of most-failing tests
- Flaky test count — Single number with trend arrow
- Average test duration — Line chart with threshold line
- Test count over time — Are you adding tests? Track it.
- Time to fix failures — How long do failures persist before being resolved?
Test Result REST API
For scenarios where you need individual test result details rather than aggregations, use the REST API directly. The OData Analytics feed is better for trends, but the REST API gives you stack traces, error messages, and attachments.
Key endpoints:
GET {org}/{project}/_apis/test/runs?api-version=7.1
GET {org}/{project}/_apis/test/runs/{runId}/results?api-version=7.1
GET {org}/{project}/_apis/test/runs/{runId}/results/{resultId}?api-version=7.1
Fetching recent test runs with their results:
function getRecentTestRuns(daysBack, callback) {
var minDate = new Date();
minDate.setDate(minDate.getDate() - daysBack);
var url = ORG + "/" + PROJECT + "/_apis/test/runs?" +
"minLastUpdatedDate=" + minDate.toISOString() +
"&api-version=7.1" +
"&$top=100";
axios.get(url, { headers: getAuthHeader() })
.then(function(response) {
callback(null, response.data.value);
})
.catch(function(err) {
callback(err);
});
}
function getFailedResults(runId, callback) {
var url = ORG + "/" + PROJECT + "/_apis/test/runs/" + runId + "/results?" +
"outcomes=Failed" +
"&api-version=7.1" +
"&$top=200";
axios.get(url, { headers: getAuthHeader() })
.then(function(response) {
var failures = response.data.value.map(function(result) {
return {
testName: result.testCaseTitle,
outcome: result.outcome,
duration: result.durationInMs,
errorMessage: result.errorMessage || "",
stackTrace: result.stackTrace || "",
startedDate: result.startedDate,
completedDate: result.completedDate
};
});
callback(null, failures);
})
.catch(function(err) {
callback(err);
});
}
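A short usage sketch that chains the two helpers: list the last week of runs, then print failure details for the most recent one. Only the id and name fields of the run objects are assumed here:
getRecentTestRuns(7, function(err, runs) {
  if (err) {
    console.error("Error fetching runs:", err.message);
    return;
  }
  if (runs.length === 0) {
    console.log("No test runs found in the last 7 days");
    return;
  }
  // Take the last run returned for the window and inspect its failures
  var latest = runs[runs.length - 1];
  getFailedResults(latest.id, function(err, failures) {
    if (err) {
      console.error("Error fetching results:", err.message);
      return;
    }
    console.log(failures.length + " failed result(s) in run " + latest.id + " (" + latest.name + ")");
    failures.forEach(function(f) {
      console.log("  " + f.testName + ": " + f.errorMessage);
    });
  });
});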
Automated Trend Reports with Node.js
Here is where it all comes together. Build a scheduled job that queries your test analytics, detects quality regressions, and sends alerts. Run it as a cron job, a scheduled pipeline, or a serverless function.
var axios = require("axios");
var ORG = process.env.AZURE_DEVOPS_ORG;
var PROJECT = process.env.AZURE_DEVOPS_PROJECT;
var PAT = process.env.AZURE_DEVOPS_PAT;
function getHeaders() {
return {
Authorization: "Basic " + Buffer.from(":" + PAT).toString("base64"),
"Content-Type": "application/json"
};
}
function fetchPassRateTrend(pipelineName, days, callback) {
var minDate = new Date();
minDate.setDate(minDate.getDate() - days);
var dateSK = minDate.toISOString().split("T")[0].replace(/-/g, "");
// Accept either a bare org name or a full https://dev.azure.com/{org}/ collection URI
var baseUrl = "https://analytics.dev.azure.com/" +
  ORG.replace("https://dev.azure.com/", "").replace(/\/+$/, "") + "/" + PROJECT;
var query = baseUrl + "/_odata/v4.0-preview/TestResultsDaily?" +
"$apply=filter(" +
"Pipeline/PipelineName eq '" + pipelineName + "' " +
"and DateSK ge " + dateSK +
")" +
"/groupby(" +
"(DateSK)," +
"aggregate(" +
"ResultPassCount with sum as Passed," +
"ResultFailCount with sum as Failed," +
"ResultNotExecutedCount with sum as Skipped," +
"ResultCount with sum as Total" +
")" +
")" +
"&$orderby=DateSK asc";
axios.get(query, { headers: getHeaders() })
.then(function(response) {
var trend = response.data.value.map(function(day) {
var passRate = day.Total > 0 ? ((day.Passed / day.Total) * 100) : 0;
return {
date: String(day.DateSK).replace(/(\d{4})(\d{2})(\d{2})/, "$1-$2-$3"),
passed: day.Passed,
failed: day.Failed,
skipped: day.Skipped,
total: day.Total,
passRate: parseFloat(passRate.toFixed(2))
};
});
callback(null, trend);
})
.catch(function(err) {
callback(err);
});
}
function detectRegression(trend) {
if (trend.length < 7) {
return { regressing: false, message: "Not enough data" };
}
var recentDays = trend.slice(-3);
var previousDays = trend.slice(-7, -3);
var recentAvg = recentDays.reduce(function(sum, d) { return sum + d.passRate; }, 0) / recentDays.length;
var previousAvg = previousDays.reduce(function(sum, d) { return sum + d.passRate; }, 0) / previousDays.length;
var delta = recentAvg - previousAvg;
return {
regressing: delta < -2,
recentAvg: parseFloat(recentAvg.toFixed(2)),
previousAvg: parseFloat(previousAvg.toFixed(2)),
delta: parseFloat(delta.toFixed(2)),
message: delta < -2
? "Pass rate dropped " + Math.abs(delta).toFixed(1) + "% (from " + previousAvg.toFixed(1) + "% to " + recentAvg.toFixed(1) + "%)"
: "Pass rate stable at " + recentAvg.toFixed(1) + "%"
};
}
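Wiring the two together; the pipeline name is a placeholder:
fetchPassRateTrend("CI-Build", 30, function(err, trend) {
  if (err) {
    console.error("Error:", err.message);
    process.exit(1);
  }
  var regression = detectRegression(trend);
  console.log(regression.message);
  // A non-zero exit code lets a scheduled pipeline surface the regression
  if (regression.regressing) {
    process.exit(1);
  }
});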
Test Result Notifications
When a regression is detected, you need to alert the right people immediately. Slack is the most common destination for these notifications. Here is a reusable notification function:
function sendSlackAlert(webhookUrl, report, callback) {
var color = report.regression.regressing ? "#e74c3c" : "#2ecc71";
var emoji = report.regression.regressing ? ":rotating_light:" : ":white_check_mark:";
var blocks = [
{
type: "header",
text: {
type: "plain_text",
text: emoji + " Test Quality Report — " + report.pipeline
}
},
{
type: "section",
fields: [
{
type: "mrkdwn",
text: "*Current Pass Rate:*\n" + report.regression.recentAvg + "%"
},
{
type: "mrkdwn",
text: "*Previous Period:*\n" + report.regression.previousAvg + "%"
},
{
type: "mrkdwn",
text: "*Trend:*\n" + (report.regression.delta >= 0 ? "+" : "") + report.regression.delta + "%"
},
{
type: "mrkdwn",
text: "*Total Tests:*\n" + report.latestTotal
}
]
}
];
if (report.topFailures && report.topFailures.length > 0) {
var failureText = report.topFailures.slice(0, 5).map(function(f, i) {
return (i + 1) + ". `" + f.testName + "` (" + f.failures + " failures)";
}).join("\n");
blocks.push({
type: "section",
text: {
type: "mrkdwn",
text: "*Top Failing Tests:*\n" + failureText
}
});
}
var payload = {
attachments: [{
color: color,
blocks: blocks
}]
};
axios.post(webhookUrl, payload)
.then(function() {
callback(null, "Notification sent");
})
.catch(function(err) {
callback(err);
});
}
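The report object it expects can be assembled from the earlier helpers; a hedged sketch with placeholder values:
var report = {
  pipeline: "CI-Build",                 // placeholder pipeline name
  latestTotal: 1240,                    // total tests in the latest run (placeholder)
  regression: detectRegression(trend),  // trend from fetchPassRateTrend
  topFailures: [                        // e.g. from an OData failure query
    { testName: "CheckoutTests.AppliesDiscount", failures: 6 }
  ]
};
sendSlackAlert(process.env.SLACK_WEBHOOK_URL, report, function(err, status) {
  if (err) {
    console.error("Slack notification failed:", err.message);
    return;
  }
  console.log(status);
});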
Quality Gates Based on Trends
Quality gates turn passive reporting into active prevention. Instead of just telling you the pass rate is dropping, a quality gate blocks a deployment or merge when quality thresholds are not met.
Implement quality gates in your pipeline YAML:
- task: PowerShell@2
  displayName: 'Check Test Quality Gate'
  inputs:
    targetType: 'inline'
    script: |
      # The Analytics OData feed is served from analytics.dev.azure.com, not the collection URI
      $orgName = "$(System.CollectionUri)".Replace("https://dev.azure.com/", "").TrimEnd("/")
      $headers = @{
        Authorization = "Basic $([Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$(System.AccessToken)")))"
      }
      $url = "https://analytics.dev.azure.com/$orgName/$(System.TeamProject)/_odata/v4.0-preview/TestResultsDaily?" +
        "`$apply=filter(Pipeline/PipelineId eq $(System.DefinitionId) and DateSK ge $((Get-Date).AddDays(-7).ToString('yyyyMMdd')))" +
        "/aggregate(ResultPassCount with sum as Passed, ResultCount with sum as Total)"
      $result = Invoke-RestMethod -Uri $url -Headers $headers
      $passRate = ($result.value[0].Passed / $result.value[0].Total) * 100
      Write-Host "Current 7-day pass rate: $([math]::Round($passRate, 2))%"
      if ($passRate -lt 95) {
        Write-Error "Quality gate failed: pass rate $([math]::Round($passRate, 2))% is below 95% threshold"
        exit 1
      }
You can also implement quality gates in Node.js as a pipeline task:
function checkQualityGate(pipelineName, thresholds, callback) {
fetchPassRateTrend(pipelineName, 7, function(err, trend) {
if (err) {
callback(err);
return;
}
var totalPassed = 0;
var totalTests = 0;
trend.forEach(function(day) {
totalPassed += day.passed;
totalTests += day.total;
});
var passRate = totalTests > 0 ? (totalPassed / totalTests) * 100 : 0;
var result = {
passed: passRate >= thresholds.minPassRate,
passRate: parseFloat(passRate.toFixed(2)),
threshold: thresholds.minPassRate,
totalTests: totalTests,
message: ""
};
if (!result.passed) {
result.message = "QUALITY GATE FAILED: Pass rate " + result.passRate +
"% is below threshold of " + thresholds.minPassRate + "%";
} else {
result.message = "Quality gate passed: " + result.passRate + "% pass rate";
}
callback(null, result);
});
}
// Usage in a pipeline script
checkQualityGate("CI-Pipeline", { minPassRate: 95 }, function(err, gate) {
if (err) {
console.error("Error checking quality gate:", err.message);
process.exit(1);
}
console.log(gate.message);
if (!gate.passed) {
process.exit(1);
}
});
Complete Working Example
This is a full Node.js tool that ties everything together: queries Azure DevOps test analytics, identifies failing trends, generates an HTML report, and sends Slack notifications when quality degrades.
// test-trend-reporter.js
var axios = require("axios");
var fs = require("fs");
var path = require("path");
// ── Configuration ──────────────────────────────────────────────
var CONFIG = {
org: process.env.AZURE_DEVOPS_ORG || "myorg",
project: process.env.AZURE_DEVOPS_PROJECT || "myproject",
pat: process.env.AZURE_DEVOPS_PAT,
slackWebhook: process.env.SLACK_WEBHOOK_URL,
pipelines: (process.env.PIPELINES || "CI-Build,Integration-Tests").split(","),
lookbackDays: parseInt(process.env.LOOKBACK_DAYS || "30", 10),
passRateThreshold: parseFloat(process.env.PASS_RATE_THRESHOLD || "95"),
regressionThreshold: parseFloat(process.env.REGRESSION_THRESHOLD || "2"),
outputDir: process.env.OUTPUT_DIR || "./reports"
};
function getHeaders() {
return {
Authorization: "Basic " + Buffer.from(":" + CONFIG.pat).toString("base64")
};
}
function getAnalyticsBaseUrl() {
  // Accept either a bare org name or a full collection URI like https://dev.azure.com/myorg/
  var orgName = CONFIG.org.replace("https://dev.azure.com/", "").replace(/\/+$/, "");
  return "https://analytics.dev.azure.com/" + orgName + "/" + CONFIG.project;
}
// ── Data Fetching ──────────────────────────────────────────────
function fetchDailyTrend(pipelineName, callback) {
var minDate = new Date();
minDate.setDate(minDate.getDate() - CONFIG.lookbackDays);
var dateSK = minDate.toISOString().split("T")[0].replace(/-/g, "");
var url = getAnalyticsBaseUrl() + "/_odata/v4.0-preview/TestResultsDaily?" +
"$apply=filter(" +
"Pipeline/PipelineName eq '" + pipelineName + "' " +
"and DateSK ge " + dateSK +
")" +
"/groupby(" +
"(DateSK)," +
"aggregate(" +
"ResultPassCount with sum as Passed," +
"ResultFailCount with sum as Failed," +
"ResultNotExecutedCount with sum as Skipped," +
"ResultCount with sum as Total," +
"ResultDurationSeconds with sum as Duration" +
")" +
")" +
"&$orderby=DateSK asc";
axios.get(url, { headers: getHeaders() })
.then(function(response) {
var trend = response.data.value.map(function(day) {
return {
date: String(day.DateSK).replace(/(\d{4})(\d{2})(\d{2})/, "$1-$2-$3"),
dateSK: day.DateSK,
passed: day.Passed,
failed: day.Failed,
skipped: day.Skipped || 0,
total: day.Total,
duration: Math.round(day.Duration || 0),
passRate: day.Total > 0 ? parseFloat(((day.Passed / day.Total) * 100).toFixed(2)) : 0
};
});
callback(null, trend);
})
.catch(function(err) {
callback(err);
});
}
function fetchTopFailures(pipelineName, callback) {
var minDate = new Date();
minDate.setDate(minDate.getDate() - CONFIG.lookbackDays);
var dateSK = minDate.toISOString().split("T")[0].replace(/-/g, "");
var url = getAnalyticsBaseUrl() + "/_odata/v4.0-preview/TestResultsDaily?" +
"$apply=filter(" +
"Pipeline/PipelineName eq '" + pipelineName + "' " +
"and DateSK ge " + dateSK +
")" +
"/groupby(" +
"(TestSK, Test/TestName, Test/ContainerName)," +
"aggregate(" +
"ResultFailCount with sum as Failures," +
"ResultPassCount with sum as Passes," +
"ResultCount with sum as Total" +
")" +
")" +
"&$filter=Failures gt 0" +
"&$orderby=Failures desc" +
"&$top=20";
axios.get(url, { headers: getHeaders() })
.then(function(response) {
var failures = response.data.value.map(function(item) {
return {
testName: item.Test.TestName,
container: item.Test.ContainerName,
failures: item.Failures,
passes: item.Passes,
total: item.Total,
failRate: parseFloat(((item.Failures / item.Total) * 100).toFixed(1))
};
});
callback(null, failures);
})
.catch(function(err) {
callback(err);
});
}
function fetchFlakyTests(pipelineName, callback) {
var minDate = new Date();
minDate.setDate(minDate.getDate() - 14);
var dateSK = minDate.toISOString().split("T")[0].replace(/-/g, "");
var url = getAnalyticsBaseUrl() + "/_odata/v4.0-preview/TestResultsDaily?" +
"$apply=filter(" +
"Pipeline/PipelineName eq '" + pipelineName + "' " +
"and DateSK ge " + dateSK +
")" +
"/groupby(" +
"(TestSK, Test/TestName)," +
"aggregate(" +
"ResultPassCount with sum as Passes," +
"ResultFailCount with sum as Failures" +
")" +
")" +
"&$filter=Passes gt 0 and Failures gt 0" +
"&$orderby=Failures desc" +
"&$top=15";
axios.get(url, { headers: getHeaders() })
.then(function(response) {
var flaky = response.data.value.map(function(item) {
var total = item.Passes + item.Failures;
return {
testName: item.Test.TestName,
passes: item.Passes,
failures: item.Failures,
total: total,
flakyRate: parseFloat(((item.Failures / total) * 100).toFixed(1))
};
});
callback(null, flaky);
})
.catch(function(err) {
callback(err);
});
}
// ── Analysis ───────────────────────────────────────────────────
function analyzeRegression(trend) {
if (trend.length < 7) {
return { regressing: false, message: "Insufficient data", delta: 0, recentAvg: 0, previousAvg: 0 };
}
var recent = trend.slice(-3);
var previous = trend.slice(-7, -3);
var recentAvg = recent.reduce(function(s, d) { return s + d.passRate; }, 0) / recent.length;
var previousAvg = previous.reduce(function(s, d) { return s + d.passRate; }, 0) / previous.length;
var delta = recentAvg - previousAvg;
return {
regressing: delta < -CONFIG.regressionThreshold,
recentAvg: parseFloat(recentAvg.toFixed(2)),
previousAvg: parseFloat(previousAvg.toFixed(2)),
delta: parseFloat(delta.toFixed(2)),
message: delta < -CONFIG.regressionThreshold
? "REGRESSION: Pass rate dropped " + Math.abs(delta).toFixed(1) + "%"
: "Stable at " + recentAvg.toFixed(1) + "%"
};
}
function analyzeDurationTrend(trend) {
if (trend.length < 7) {
return { increasing: false, message: "Insufficient data" };
}
var recent = trend.slice(-3);
var previous = trend.slice(-7, -3);
var recentAvgDuration = recent.reduce(function(s, d) { return s + d.duration; }, 0) / recent.length;
var previousAvgDuration = previous.reduce(function(s, d) { return s + d.duration; }, 0) / previous.length;
var change = previousAvgDuration > 0
? ((recentAvgDuration - previousAvgDuration) / previousAvgDuration) * 100
: 0;
return {
increasing: change > 20,
recentAvg: Math.round(recentAvgDuration),
previousAvg: Math.round(previousAvgDuration),
changePercent: parseFloat(change.toFixed(1)),
message: change > 20
? "Duration increased " + change.toFixed(0) + "%"
: "Duration stable"
};
}
// ── HTML Report Generation ─────────────────────────────────────
function generateHtmlReport(reports) {
var timestamp = new Date().toISOString().replace(/[:.]/g, "-").slice(0, 19);
var pipelineSections = reports.map(function(report) {
var statusColor = report.regression.regressing ? "#e74c3c" : "#2ecc71";
var statusText = report.regression.regressing ? "REGRESSING" : "HEALTHY";
var trendRows = report.trend.map(function(day) {
var rowColor = day.passRate < CONFIG.passRateThreshold ? "#fff3f3" : "transparent";
return "<tr style='background:" + rowColor + "'>" +
"<td>" + day.date + "</td>" +
"<td>" + day.passed + "</td>" +
"<td>" + day.failed + "</td>" +
"<td>" + day.skipped + "</td>" +
"<td>" + day.total + "</td>" +
"<td><strong>" + day.passRate + "%</strong></td>" +
"<td>" + day.duration + "s</td>" +
"</tr>";
}).join("\n");
var failureRows = report.topFailures.slice(0, 10).map(function(f) {
return "<tr>" +
"<td>" + escapeHtml(f.testName) + "</td>" +
"<td>" + f.container + "</td>" +
"<td>" + f.failures + "</td>" +
"<td>" + f.failRate + "%</td>" +
"</tr>";
}).join("\n");
var flakyRows = report.flakyTests.slice(0, 10).map(function(f) {
return "<tr>" +
"<td>" + escapeHtml(f.testName) + "</td>" +
"<td>" + f.passes + "</td>" +
"<td>" + f.failures + "</td>" +
"<td>" + f.flakyRate + "%</td>" +
"</tr>";
}).join("\n");
var passRateData = report.trend.map(function(d) { return d.passRate; });
var dateLabels = report.trend.map(function(d) { return "'" + d.date + "'"; });
return '<div class="pipeline-section">' +
'<h2>' + escapeHtml(report.pipeline) +
' <span class="status" style="background:' + statusColor + '">' + statusText + '</span></h2>' +
'<div class="metrics-grid">' +
'<div class="metric"><span class="metric-value">' + report.regression.recentAvg + '%</span>' +
'<span class="metric-label">Current Pass Rate</span></div>' +
'<div class="metric"><span class="metric-value">' + (report.regression.delta >= 0 ? "+" : "") +
report.regression.delta + '%</span><span class="metric-label">Trend (vs prev period)</span></div>' +
'<div class="metric"><span class="metric-value">' + report.flakyTests.length + '</span>' +
'<span class="metric-label">Flaky Tests</span></div>' +
'<div class="metric"><span class="metric-value">' + report.durationTrend.recentAvg + 's</span>' +
'<span class="metric-label">Avg Duration</span></div>' +
'</div>' +
'<canvas id="chart-' + report.pipeline.replace(/[^a-zA-Z0-9]/g, "") + '" height="80"></canvas>' +
'<script>new Chart(document.getElementById("chart-' +
report.pipeline.replace(/[^a-zA-Z0-9]/g, "") + '").getContext("2d"),{' +
'type:"line",data:{labels:[' + dateLabels.join(",") + '],' +
'datasets:[{label:"Pass Rate %",data:[' + passRateData.join(",") + '],' +
'borderColor:"#3498db",backgroundColor:"rgba(52,152,219,0.1)",fill:true,tension:0.3}]},' +
'options:{scales:{y:{min:Math.max(0,Math.min.apply(null,[' + passRateData.join(",") + '])-5),' +
'max:100}},plugins:{annotation:{annotations:{threshold:{type:"line",yMin:' +
CONFIG.passRateThreshold + ',yMax:' + CONFIG.passRateThreshold +
',borderColor:"#e74c3c",borderDash:[5,5],label:{display:true,content:"Threshold: ' +
CONFIG.passRateThreshold + '%"}}}}}}});</script>' +
'<h3>Daily Results</h3>' +
'<table><thead><tr><th>Date</th><th>Passed</th><th>Failed</th><th>Skipped</th>' +
'<th>Total</th><th>Pass Rate</th><th>Duration</th></tr></thead>' +
'<tbody>' + trendRows + '</tbody></table>' +
'<h3>Top Failures</h3>' +
'<table><thead><tr><th>Test Name</th><th>Suite</th><th>Failures</th>' +
'<th>Fail Rate</th></tr></thead><tbody>' + failureRows + '</tbody></table>' +
'<h3>Flaky Tests (14 day window)</h3>' +
'<table><thead><tr><th>Test Name</th><th>Passes</th><th>Failures</th>' +
'<th>Flaky Rate</th></tr></thead><tbody>' + flakyRows + '</tbody></table>' +
'</div>';
}).join("\n");
var html = '<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8">' +
'<meta name="viewport" content="width=device-width,initial-scale=1.0">' +
'<title>Test Quality Report — ' + timestamp + '</title>' +
'<script src="https://cdn.jsdelivr.net/npm/chart.js@4"></script>' +
'<script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-annotation@3"></script>' +
'<style>' +
'body{font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,sans-serif;' +
'max-width:1200px;margin:0 auto;padding:20px;background:#f5f5f5;color:#333}' +
'h1{color:#2c3e50;border-bottom:3px solid #3498db;padding-bottom:10px}' +
'.pipeline-section{background:white;border-radius:8px;padding:24px;margin:20px 0;' +
'box-shadow:0 2px 4px rgba(0,0,0,0.1)}' +
'.status{color:white;padding:4px 12px;border-radius:12px;font-size:0.8em;' +
'vertical-align:middle;margin-left:10px}' +
'.metrics-grid{display:grid;grid-template-columns:repeat(4,1fr);gap:16px;margin:20px 0}' +
'.metric{text-align:center;padding:16px;background:#f8f9fa;border-radius:8px}' +
'.metric-value{display:block;font-size:1.8em;font-weight:bold;color:#2c3e50}' +
'.metric-label{display:block;font-size:0.85em;color:#7f8c8d;margin-top:4px}' +
'table{width:100%;border-collapse:collapse;margin:12px 0;font-size:0.9em}' +
'th,td{padding:8px 12px;text-align:left;border-bottom:1px solid #ecf0f1}' +
'th{background:#f8f9fa;font-weight:600;color:#2c3e50}' +
'tr:hover{background:#f8f9fa}' +
'.generated{text-align:center;color:#95a5a6;margin-top:30px;font-size:0.85em}' +
'</style></head><body>' +
'<h1>Test Quality Report</h1>' +
'<p>Generated: ' + new Date().toLocaleString() + ' | ' +
'Lookback: ' + CONFIG.lookbackDays + ' days | ' +
'Threshold: ' + CONFIG.passRateThreshold + '%</p>' +
pipelineSections +
'<p class="generated">Generated by test-trend-reporter.js</p>' +
'</body></html>';
return html;
}
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
// ── Slack Notification ─────────────────────────────────────────
function notifySlack(reports, callback) {
if (!CONFIG.slackWebhook) {
console.log("No Slack webhook configured, skipping notification");
callback(null);
return;
}
var regressions = reports.filter(function(r) { return r.regression.regressing; });
if (regressions.length === 0) {
console.log("No regressions detected, skipping Slack notification");
callback(null);
return;
}
var blocks = [
{
type: "header",
text: { type: "plain_text", text: ":rotating_light: Test Quality Regression Detected" }
}
];
regressions.forEach(function(report) {
var topFails = report.topFailures.slice(0, 3).map(function(f) {
return "`" + f.testName + "` (" + f.failures + " failures)";
}).join("\n");
blocks.push({
type: "section",
text: {
type: "mrkdwn",
text: "*" + report.pipeline + "*\n" +
"Pass rate: " + report.regression.previousAvg + "% -> " +
report.regression.recentAvg + "% (" + report.regression.delta + "%)\n" +
"*Top failures:*\n" + topFails
}
});
blocks.push({ type: "divider" });
});
var payload = {
blocks: blocks,
text: "Test quality regression detected in " + regressions.length + " pipeline(s)"
};
axios.post(CONFIG.slackWebhook, payload)
.then(function() {
console.log("Slack notification sent for " + regressions.length + " regression(s)");
callback(null);
})
.catch(function(err) {
console.error("Failed to send Slack notification:", err.message);
callback(err);
});
}
// ── Main Execution ─────────────────────────────────────────────
function processPipeline(pipelineName, callback) {
console.log("Processing pipeline: " + pipelineName);
fetchDailyTrend(pipelineName, function(err, trend) {
if (err) {
console.error("Error fetching trend for " + pipelineName + ":", err.message);
callback(err);
return;
}
fetchTopFailures(pipelineName, function(err, failures) {
if (err) {
console.error("Error fetching failures for " + pipelineName + ":", err.message);
callback(err);
return;
}
fetchFlakyTests(pipelineName, function(err, flakyTests) {
if (err) {
console.error("Error fetching flaky tests for " + pipelineName + ":", err.message);
callback(err);
return;
}
var regression = analyzeRegression(trend);
var durationTrend = analyzeDurationTrend(trend);
var report = {
pipeline: pipelineName,
trend: trend,
topFailures: failures,
flakyTests: flakyTests,
regression: regression,
durationTrend: durationTrend,
latestTotal: trend.length > 0 ? trend[trend.length - 1].total : 0
};
console.log(" " + regression.message);
console.log(" " + durationTrend.message);
console.log(" Top failures: " + failures.length + ", Flaky: " + flakyTests.length);
callback(null, report);
});
});
});
}
function run() {
if (!CONFIG.pat) {
console.error("AZURE_DEVOPS_PAT environment variable is required");
process.exit(1);
}
console.log("Test Trend Reporter");
console.log("Organization: " + CONFIG.org);
console.log("Project: " + CONFIG.project);
console.log("Pipelines: " + CONFIG.pipelines.join(", "));
console.log("Lookback: " + CONFIG.lookbackDays + " days");
console.log("---");
var remaining = CONFIG.pipelines.length;
var allReports = [];
var hasError = false;
CONFIG.pipelines.forEach(function(pipeline) {
processPipeline(pipeline.trim(), function(err, report) {
if (err) {
hasError = true;
} else {
allReports.push(report);
}
remaining--;
if (remaining === 0) {
if (allReports.length === 0) {
console.error("No reports generated");
process.exit(1);
}
// Generate HTML report
var html = generateHtmlReport(allReports);
var outputDir = path.resolve(CONFIG.outputDir);
if (!fs.existsSync(outputDir)) {
fs.mkdirSync(outputDir, { recursive: true });
}
var filename = "test-report-" + new Date().toISOString().split("T")[0] + ".html";
var outputPath = path.join(outputDir, filename);
fs.writeFileSync(outputPath, html);
console.log("\nReport saved to: " + outputPath);
// Send Slack notification if regressions detected
notifySlack(allReports, function(err) {
if (err) {
console.error("Slack notification failed");
}
var regressions = allReports.filter(function(r) { return r.regression.regressing; });
if (regressions.length > 0) {
console.log("\nWARNING: " + regressions.length + " pipeline(s) showing regression");
process.exit(1);
}
console.log("\nAll pipelines healthy");
process.exit(0);
});
}
});
});
}
run();
Save this as test-trend-reporter.js and run it with:
export AZURE_DEVOPS_PAT="your-pat-token"
export AZURE_DEVOPS_ORG="your-org"
export AZURE_DEVOPS_PROJECT="your-project"
export PIPELINES="CI-Build,Integration-Tests,E2E-Tests"
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/xxx/yyy/zzz"
node test-trend-reporter.js
To run it on a schedule in Azure Pipelines:
schedules:
- cron: "0 8 * * 1-5"
  displayName: "Daily test quality report"
  branches:
    include:
    - main
  always: true
pool:
  vmImage: 'ubuntu-latest'
steps:
- task: NodeTool@0
  inputs:
    versionSpec: '20.x'
- script: |
    npm install axios
    node test-trend-reporter.js
  displayName: 'Generate test quality report'
  env:
    AZURE_DEVOPS_PAT: $(System.AccessToken)
    AZURE_DEVOPS_ORG: $(System.CollectionUri)
    AZURE_DEVOPS_PROJECT: $(System.TeamProject)
    SLACK_WEBHOOK_URL: $(SlackWebhookUrl)
- publish: $(System.DefaultWorkingDirectory)/reports
  artifact: test-quality-report
  condition: always()
Common Issues and Troubleshooting
OData query returns empty results despite test runs existing. The Analytics service indexes data asynchronously. After a test run completes, it can take 5 to 15 minutes for results to appear in OData queries. The REST API (_apis/test/runs) reflects results immediately. If your automated report runs too soon after a pipeline completes, add a delay or use the REST API for the latest run and OData for historical trends.
"The query specified in the URI is not valid" errors from OData. OData filter syntax is strict. Common mistakes include using double quotes instead of single quotes around string values, forgetting to URL-encode spaces, and incorrect $apply ordering. The filter must come before groupby in $apply clauses. Test your queries in a browser first by pasting the full URL and authenticating with your PAT.
Pass rate calculations differ between the built-in widget and your custom query. The built-in test analytics widgets filter by workflow type (Build vs Release) and may exclude certain outcomes like NotImpacted. Make sure your OData query uses the same filters. Add Workflow eq 'Build' to your filter clause if you only want pipeline test results and exclude manual test plan runs.
PAT authentication returns 401, or 203 with an HTML sign-in page instead of JSON. Ensure your PAT has the Analytics (Read) scope for OData queries and Test Management (Read) for REST API calls. If you are using System.AccessToken in a pipeline, the project's build service account needs permissions to the Analytics service. Also verify the PAT has not expired — a common oversight in scheduled jobs that worked fine for months and then suddenly stop.
Chart.js not rendering in generated HTML reports. The HTML report uses CDN links for Chart.js. If the report is viewed in an environment without internet access (like an air-gapped network), the charts will not load. Bundle Chart.js locally by downloading the minified file and embedding it in a <script> tag within the HTML, or use inline SVG charts generated server-side with a library like d3-node.
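One way to do the inlining, assuming you have downloaded the minified Chart.js bundle to a local file (the file name below is a placeholder):
var fs = require("fs");

// Read the locally downloaded Chart.js build and embed it directly in the report
var chartJsSource = fs.readFileSync("./chart.min.js", "utf8");
var inlineChartTag = "<script>" + chartJsSource + "</script>";
// Substitute inlineChartTag for the CDN <script src="..."> tag when assembling the HTML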
Best Practices
Set a pass rate threshold and enforce it. Pick a number (95% is a good starting point), make it a quality gate, and do not lower it. If you cannot meet the threshold, fix the tests rather than moving the goalpost.
Categorize every failure. Uncategorized failures are wasted data. When a test fails, mark it as a product bug, test bug, or infrastructure issue. This discipline gives you the data to make informed decisions about where to invest engineering effort.
Track flaky test count as a first-class metric. Do not just flag flaky tests and move on. Set a target (zero is ideal), assign ownership, and review the count weekly. Flaky tests compound — one becomes five becomes fifty.
Alert on trends, not individual failures. A single test failure is noise. Three consecutive days of declining pass rate is a signal. Build your alerting around rolling averages and deltas, not raw fail counts.
Separate test duration from pass rate reporting. A test suite that passes 100% of the time but takes 45 minutes is still a problem. Track duration trends independently and set alerts when average duration increases more than 20% week-over-week.
Archive reports for historical comparison. Save your generated HTML reports with date-stamped filenames. When someone asks "how was our test quality six months ago?", you want to be able to answer immediately. Store them as pipeline artifacts or in blob storage.
Run your trend reporter on a schedule, not just on-demand. A daily report at 8 AM gives your team a consistent quality pulse. If you only generate reports when someone remembers to check, you have already missed the regression window.
Use the OData feed for dashboards, the REST API for investigations. OData is pre-aggregated and fast for charts and trends. When you need to dig into a specific failure's stack trace or error message, switch to the REST API. Do not try to use one for everything.