Playwright Test Reporting in Azure Pipelines

Run Playwright browser tests in Azure Pipelines with JUnit reporting, trace attachments, cross-browser sharding, and HTML report artifacts

Overview

Playwright is the best browser testing framework available for Node.js right now, and it is not particularly close. When you combine it with Azure Pipelines, you get a CI/CD workflow that catches regressions across Chromium, Firefox, and WebKit before they ever reach production. This article walks through the entire setup: configuring reporters, publishing test results to Azure DevOps, capturing screenshots and traces on failure, sharding tests across parallel jobs, and building a custom reporter that feeds directly into Azure Test Plans.

Prerequisites

Before diving in, make sure you have the following in place:

  • Node.js 18+ installed (LTS recommended)
  • An Azure DevOps organization and project with Pipelines enabled
  • A repository connected to Azure Pipelines (GitHub, Azure Repos, or Bitbucket)
  • Basic familiarity with YAML pipeline syntax
  • Playwright installed locally for initial development (npm init playwright@latest)

Installing and Configuring Playwright

Start by initializing Playwright in your project. If you already have a Node.js project, install the dependencies directly:

npm install --save-dev @playwright/test
npx playwright install --with-deps

The --with-deps flag installs the system-level dependencies that browsers need. This is critical in CI because Azure Pipelines agents do not ship with the libraries that Chromium, Firefox, and WebKit require.

Create a playwright.config.js file at the root of your project:

// playwright.config.js
var { defineConfig, devices } = require("@playwright/test");

module.exports = defineConfig({
  testDir: "./tests",
  timeout: 30000,
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 2 : undefined,
  fullyParallel: true,

  reporter: [
    ["list"],
    ["junit", { outputFile: "test-results/junit-results.xml" }],
    ["html", { outputFolder: "playwright-report", open: "never" }],
    ["json", { outputFile: "test-results/results.json" }]
  ],

  use: {
    baseURL: process.env.BASE_URL || "http://localhost:3000",
    trace: "on-first-retry",
    screenshot: "only-on-failure",
    video: "retain-on-failure",
    actionTimeout: 10000,
    navigationTimeout: 15000
  },

  projects: [
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"] }
    },
    {
      name: "firefox",
      use: { ...devices["Desktop Firefox"] }
    },
    {
      name: "webkit",
      use: { ...devices["Desktop Safari"] }
    }
  ]
});

A few things to note here. The retries property is set to 2 only in CI. Flaky tests in a pipeline are a fact of life — network hiccups, slow container startup, font rendering differences — and retries give you a safety net without masking real failures. The workers property is capped at 2 in CI because Azure-hosted agents have limited CPU cores (typically 2 vCPUs on the standard tier). Trying to run 8 parallel workers on a 2-core machine will thrash and actually slow things down.

Understanding Playwright Reporters

Playwright supports multiple reporters simultaneously, and you should use at least three in CI:

JUnit Reporter

The JUnit XML format is the lingua franca of CI test reporting. Azure Pipelines natively understands it through the PublishTestResults task.

["junit", { outputFile: "test-results/junit-results.xml" }]

This produces a standard JUnit XML file that Azure DevOps parses into the Test tab on your pipeline run. You get pass/fail counts, duration breakdowns, and historical trends across runs.

HTML Reporter

The HTML reporter generates an interactive, self-contained report that you can browse locally or publish as a pipeline artifact.

["html", { outputFolder: "playwright-report", open: "never" }]

Always set open: "never" in CI. The default (open: "on-failure") serves the report in a browser after a failed run and waits for you to close it, which can leave a pipeline job hanging.

JSON Reporter

The JSON reporter dumps raw structured data that you can feed into custom dashboards, Slack notifications, or other tooling.

["json", { outputFile: "test-results/results.json" }]

I use the JSON output to build summary messages that post to our team's Slack channel after each pipeline run. A quick Node.js script reads the JSON and formats pass/fail counts with links to the failing tests.
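
Here is a minimal sketch of such a script, assuming the default JSON reporter output (which includes a top-level stats block) and an assumed SLACK_WEBHOOK_URL environment variable; the webhook variable and message format are illustrative, not part of the setup above.

// scripts/slack-summary.js
// Reads the Playwright JSON reporter output and posts a one-line summary to a
// Slack incoming webhook. SLACK_WEBHOOK_URL is an assumed environment variable.
var fs = require("fs");
var https = require("https");

var results = JSON.parse(fs.readFileSync("test-results/results.json", "utf8"));
var stats = results.stats;

var text =
  "Playwright run: " + stats.expected + " passed, " +
  stats.unexpected + " failed, " +
  stats.flaky + " flaky, " +
  stats.skipped + " skipped " +
  "(" + (stats.duration / 1000).toFixed(1) + "s)";

var payload = JSON.stringify({ text: text });
var url = new URL(process.env.SLACK_WEBHOOK_URL);

var req = https.request(
  {
    hostname: url.hostname,
    path: url.pathname,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Content-Length": Buffer.byteLength(payload)
    }
  },
  function (res) {
    console.log("Slack responded with " + res.statusCode);
  }
);
req.on("error", function (err) { console.error(err.message); });
req.write(payload);
req.end();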

Screenshots, Traces, and Video

Playwright has three built-in mechanisms for debugging failures in CI. Configure all three in the use block.

Screenshots on Failure

screenshot: "only-on-failure"

When a test fails, Playwright captures a PNG screenshot of the page at the moment of failure. These screenshots are saved to the test-results directory alongside the test output. In Azure Pipelines, you publish this directory as an artifact so developers can download and inspect it.

Traces on Retry

trace: "on-first-retry"

Traces are the real debugging weapon. A Playwright trace is a zip file containing a timeline of every action, network request, console log, and DOM snapshot from the test run. You open it with npx playwright show-trace trace.zip or upload it to trace.playwright.dev.

Setting on-first-retry means Playwright captures a trace only when a test fails and is retried. This keeps your artifact size manageable. If you set trace: "on", every single test generates a trace, and your pipeline artifacts will balloon to gigabytes.

Video on Failure

video: "retain-on-failure"

Video recording captures a WebM video of the entire test execution. The retain-on-failure option keeps videos only for tests that fail, discarding them for passing tests. This is a middle ground between never recording (no debugging help) and always recording (massive artifact sizes and slower tests).

The Azure Pipeline YAML

Here is a complete pipeline definition that installs Playwright, runs tests, publishes results, and stores artifacts:

# azure-pipelines.yml
trigger:
  branches:
    include:
      - main
      - develop

pool:
  vmImage: "ubuntu-latest"

steps:
  - task: NodeTool@0
    inputs:
      versionSpec: "20.x"
    displayName: "Install Node.js 20"

  - script: |
      npm ci
    displayName: "Install dependencies"

  - script: |
      npx playwright install --with-deps
    displayName: "Install Playwright browsers"

  - script: |
      npx playwright test
    displayName: "Run Playwright tests"
    env:
      CI: "true"
      BASE_URL: "$(BASE_URL)"
    continueOnError: true

  - task: PublishTestResults@2
    inputs:
      testResultsFormat: "JUnit"
      testResultsFiles: "test-results/junit-results.xml"
      mergeTestResults: true
      testRunTitle: "Playwright Tests"
    condition: always()
    displayName: "Publish test results"

  - task: PublishPipelineArtifact@1
    inputs:
      targetPath: "playwright-report"
      artifact: "playwright-html-report"
      publishLocation: "pipeline"
    condition: always()
    displayName: "Publish HTML report"

  - task: PublishPipelineArtifact@1
    inputs:
      targetPath: "test-results"
      artifact: "playwright-test-results"
      publishLocation: "pipeline"
    condition: always()
    displayName: "Publish test results with traces"

The condition: always() on the publish steps is what guarantees they run even when tests fail; without it, those steps inherit the default succeeded() condition and are skipped, and you lose the very data you need to diagnose the failure. The continueOnError: true on the test step complements this: instead of failing the job outright, the run is marked as partially succeeded, so the published test results, not the raw exit code, become the signal you act on.

Parallel Execution with Sharding

For large test suites, a single pipeline job is too slow. Playwright supports sharding, which splits the test suite across multiple parallel jobs. Each shard runs a subset of the tests, and the results are merged at the end.

# azure-pipelines.yml with sharding
trigger:
  branches:
    include:
      - main

pool:
  vmImage: "ubuntu-latest"

strategy:
  matrix:
    shard_1:
      SHARD_INDEX: 1
      SHARD_TOTAL: 4
    shard_2:
      SHARD_INDEX: 2
      SHARD_TOTAL: 4
    shard_3:
      SHARD_INDEX: 3
      SHARD_TOTAL: 4
    shard_4:
      SHARD_INDEX: 4
      SHARD_TOTAL: 4

steps:
  - task: NodeTool@0
    inputs:
      versionSpec: "20.x"
    displayName: "Install Node.js 20"

  - script: |
      npm ci
    displayName: "Install dependencies"

  - script: |
      npx playwright install --with-deps
    displayName: "Install Playwright browsers"

  - script: |
      npx playwright test --shard=$(SHARD_INDEX)/$(SHARD_TOTAL)
    displayName: "Run Playwright tests (shard $(SHARD_INDEX) of $(SHARD_TOTAL))"
    env:
      CI: "true"
    continueOnError: true

  - task: PublishTestResults@2
    inputs:
      testResultsFormat: "JUnit"
      testResultsFiles: "test-results/junit-results.xml"
      mergeTestResults: true
      testRunTitle: "Playwright - Shard $(SHARD_INDEX)"
    condition: always()
    displayName: "Publish test results"

  - task: PublishPipelineArtifact@1
    inputs:
      targetPath: "test-results"
      artifact: "test-results-shard-$(SHARD_INDEX)"
      publishLocation: "pipeline"
    condition: always()
    displayName: "Publish test artifacts"

The mergeTestResults: true flag on PublishTestResults merges every XML file matched by the pattern within a job into a single test run. Because each shard executes in its own job, Azure DevOps still records one run per shard, but the Tests tab on the pipeline run aggregates all of them, so you see the full suite in one place, and the per-shard testRunTitle tells you which shard a failure came from.

With four shards, a test suite that takes 20 minutes in a single job drops to around 5-6 minutes. The overhead of spinning up parallel agents and installing browsers on each one adds about 90 seconds, so you do not get a perfect 4x speedup, but it is close enough.

Writing Tests for CI

Tests that work locally but fail in CI are a common headache. Here is a test file structured for reliability in both environments:

// tests/homepage.spec.js
var { test, expect } = require("@playwright/test");

test.describe("Homepage", function () {
  test.beforeEach(async function ({ page }) {
    await page.goto("/");
    await page.waitForLoadState("networkidle");
  });

  test("should display the page title", async function ({ page }) {
    await expect(page).toHaveTitle(/My Application/);
  });

  test("should load the navigation menu", async function ({ page }) {
    var nav = page.locator("nav.main-navigation");
    await expect(nav).toBeVisible();

    var menuItems = nav.locator("a");
    var count = await menuItems.count();
    expect(count).toBeGreaterThanOrEqual(3);
  });

  test("should display hero section with CTA", async function ({ page }) {
    var hero = page.locator("[data-testid='hero-section']");
    await expect(hero).toBeVisible();

    var ctaButton = hero.locator("a.cta-button");
    await expect(ctaButton).toBeVisible();
    await expect(ctaButton).toHaveAttribute("href", /\/get-started/);
  });

  test("should submit the contact form", async function ({ page }) {
    await page.click("a[href='/contact']");
    await page.waitForLoadState("networkidle");

    await page.fill("#name", "Test User");
    await page.fill("#email", "[email protected]");
    await page.fill("#message", "This is an automated test message");

    await page.click("button[type='submit']");

    var confirmation = page.locator(".success-message");
    await expect(confirmation).toBeVisible({ timeout: 10000 });
    await expect(confirmation).toContainText("Thank you");
  });
});

A few patterns to call out. First, waitForLoadState("networkidle") after navigation ensures the page is fully loaded before assertions run. In CI, the application under test might be slower to respond than on your local machine. Second, data-testid attributes are more reliable than CSS classes or text content, which can change with styling updates. Third, explicit timeouts on assertions like toBeVisible({ timeout: 10000 }) give slow CI environments extra time without making every assertion wait that long.

Debugging Failed Tests with Traces

When a test fails in your pipeline, download the playwright-test-results artifact. Inside, you will find directories named after each test file, and within those, trace files for any test that was retried.

Open a trace locally:

npx playwright show-trace test-results/homepage-Homepage-should-submit-the-contact-form/trace.zip

The trace viewer shows you:

  • Actions timeline: Every click, fill, and navigation in order
  • Before/After DOM snapshots: The page state before and after each action
  • Network requests: All HTTP requests with timing, status codes, and response bodies
  • Console logs: Any console.log, console.error, or console.warn output
  • Source code: The test source with the failing line highlighted

This is dramatically better than staring at a stack trace and guessing what went wrong. I have debugged more CI-only failures with traces in five minutes than I used to resolve in hours with screenshots alone.

You can also view traces in the browser without installing anything. Upload the zip file to trace.playwright.dev, and you get the full trace viewer as a web application.

Building a Custom Reporter for Azure DevOps

The built-in JUnit reporter works well for most cases, but sometimes you need more control. Maybe you want to attach trace files directly to test results, or post a summary comment on a pull request. Playwright's reporter API lets you build exactly that.

// reporters/azure-devops-reporter.js
var fs = require("fs");
var path = require("path");
var https = require("https");

function AzureDevOpsReporter(options) {
  this.options = options || {};
  this.results = [];
  this.startTime = null;
}

AzureDevOpsReporter.prototype.onBegin = function (config, suite) {
  this.startTime = Date.now();
  this.totalTests = suite.allTests().length;
  console.log("Running " + this.totalTests + " tests across " + config.projects.length + " projects");
};

AzureDevOpsReporter.prototype.onTestEnd = function (test, result) {
  var status = result.status;
  var duration = result.duration;
  var title = test.title;
  var projectName = test.parent.project().name;

  this.results.push({
    title: title,
    project: projectName,
    status: status,
    duration: duration,
    retry: result.retry,
    errors: result.errors.map(function (e) { return e.message; }),
    attachments: result.attachments.map(function (a) {
      return { name: a.name, path: a.path, contentType: a.contentType };
    })
  });

  var icon = status === "passed" ? "✓" : status === "failed" ? "✗" : "○";
  console.log("  " + icon + " [" + projectName + "] " + title + " (" + duration + "ms)");

  if (status === "failed") {
    // Azure DevOps logging command to mark this as an error
    result.errors.forEach(function (error) {
      console.log("##vso[task.logissue type=error]" + projectName + ": " + title + " - " + error.message);
    });
  }
};

AzureDevOpsReporter.prototype.onEnd = function (result) {
  var elapsed = Date.now() - this.startTime;
  var passed = this.results.filter(function (r) { return r.status === "passed"; }).length;
  var failed = this.results.filter(function (r) { return r.status === "failed"; }).length;
  var skipped = this.results.filter(function (r) { return r.status === "skipped"; }).length;
  var flaky = this.results.filter(function (r) { return r.status === "passed" && r.retry > 0; }).length;

  console.log("\n--- Test Summary ---");
  console.log("Total: " + this.results.length);
  console.log("Passed: " + passed);
  console.log("Failed: " + failed);
  console.log("Skipped: " + skipped);
  console.log("Flaky: " + flaky);
  console.log("Duration: " + (elapsed / 1000).toFixed(1) + "s");

  // Write summary as a markdown file for pipeline summary
  var summary = "## Playwright Test Results\n\n";
  summary += "| Metric | Count |\n|--------|-------|\n";
  summary += "| Passed | " + passed + " |\n";
  summary += "| Failed | " + failed + " |\n";
  summary += "| Skipped | " + skipped + " |\n";
  summary += "| Flaky | " + flaky + " |\n";
  summary += "| Duration | " + (elapsed / 1000).toFixed(1) + "s |\n\n";

  if (failed > 0) {
    summary += "### Failed Tests\n\n";
    this.results.filter(function (r) { return r.status === "failed"; }).forEach(function (r) {
      summary += "- **[" + r.project + "]** " + r.title + "\n";
      r.errors.forEach(function (e) {
        summary += "  - `" + e.substring(0, 200) + "`\n";
      });
    });
  }

  var outputDir = path.join(process.cwd(), "test-results");
  if (!fs.existsSync(outputDir)) {
    fs.mkdirSync(outputDir, { recursive: true });
  }
  fs.writeFileSync(path.join(outputDir, "summary.md"), summary);

  // Upload summary to pipeline using logging command
  console.log("##vso[task.uploadsummary]" + path.join(outputDir, "summary.md"));
};

module.exports = AzureDevOpsReporter;

Register the custom reporter in your config:

reporter: [
  ["./reporters/azure-devops-reporter.js"],
  ["junit", { outputFile: "test-results/junit-results.xml" }],
  ["html", { outputFolder: "playwright-report", open: "never" }]
]

The ##vso logging commands are the key integration point. Azure Pipelines parses these from stdout in real time. ##vso[task.logissue type=error] creates error annotations on the pipeline run. ##vso[task.uploadsummary] attaches a markdown file as a tab on the run summary page. These are simple, powerful, and most developers do not know they exist.
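
The reporter above uses two of them; a couple of others are worth knowing. This is a small illustration (not part of the reporter), and the variable name, tag, and message are placeholders:

// Any script or reporter can emit these on stdout; the agent parses them.
// task.setvariable exposes a value to later steps as $(failedTests).
console.log("##vso[task.setvariable variable=failedTests]" + 3);
// build.addbuildtag tags the run, making failed Playwright runs easy to filter.
console.log("##vso[build.addbuildtag]playwright-failures");
// task.logissue with type=warning creates a warning annotation instead of an error.
console.log("##vso[task.logissue type=warning]Flaky test detected in checkout.spec.js");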

Integrating with Azure Test Plans

If your organization uses Azure Test Plans for manual testing and you want to link automated Playwright tests to test cases, you can do so through the PublishTestResults task with additional configuration:

- task: PublishTestResults@2
  inputs:
    testResultsFormat: "JUnit"
    testResultsFiles: "test-results/junit-results.xml"
    mergeTestResults: true
    testRunTitle: "Playwright Automated Tests"
    buildPlatform: "linux"
    buildConfiguration: "$(Build.BuildNumber)"
  condition: always()
  displayName: "Publish results to Test Plans"

To associate Playwright tests with specific Azure Test Plan test cases, add the test case ID to your test title or use the test annotations:

// tests/checkout.spec.js
var { test, expect } = require("@playwright/test");

test.describe("Checkout Flow @testplan", function () {
  // Links to Azure Test Case #1234
  test("[1234] should calculate cart total correctly", async function ({ page }) {
    await page.goto("/cart");
    await page.locator("[data-testid='add-item']").click();
    var total = page.locator("[data-testid='cart-total']");
    await expect(total).toHaveText("$29.99");
  });

  // Links to Azure Test Case #1235
  test("[1235] should apply discount codes", async function ({ page }) {
    await page.goto("/cart");
    await page.fill("#discount-code", "SAVE10");
    await page.click("button.apply-discount");
    var discount = page.locator("[data-testid='discount-amount']");
    await expect(discount).toHaveText("-$3.00");
  });
});

Performance: Playwright vs Selenium

I get asked this constantly, so here is the straight answer. Playwright is faster than Selenium for browser testing, and the gap widens as your test suite grows.

| Metric | Playwright | Selenium (WebDriver) |
|--------|------------|----------------------|
| Test startup time | ~200ms | ~1-2s |
| Parallel execution | Built-in, per-worker isolation | Requires Selenium Grid or third-party tools |
| Browser install | npx playwright install | Manual driver management or webdriver-manager |
| Cross-browser | Chromium, Firefox, WebKit | Chrome, Firefox, Edge, Safari (separate driver each) |
| Auto-waiting | Built-in for all actions | Manual waits or fluent waits |
| Trace debugging | Built-in trace viewer | Screenshot only (without additional tooling) |
| Network interception | Native API | Requires proxy setup |
| CI configuration | Minimal | Significant (Grid, drivers, environment) |

The auto-waiting alone saves you from writing hundreds of WebDriverWait calls. Playwright waits for elements to be visible, enabled, and stable before interacting with them. This eliminates the single most common source of flaky Selenium tests.

In our production pipeline, migrating from Selenium to Playwright reduced total test execution time by 60% and flaky test rate from 12% to under 2%. The migration took about two weeks for 200 tests, and most of that time was spent rewriting selectors, not fighting the framework.

Complete Working Example

Here is the full project structure and configuration that ties everything together:

project-root/
├── tests/
│   ├── homepage.spec.js
│   ├── checkout.spec.js
│   └── api.spec.js
├── reporters/
│   └── azure-devops-reporter.js
├── playwright.config.js
├── azure-pipelines.yml
└── package.json

The package.json scripts section:

{
  "scripts": {
    "test": "npx playwright test",
    "test:chromium": "npx playwright test --project=chromium",
    "test:firefox": "npx playwright test --project=firefox",
    "test:webkit": "npx playwright test --project=webkit",
    "test:headed": "npx playwright test --headed",
    "test:debug": "npx playwright test --debug",
    "report": "npx playwright show-report",
    "trace": "npx playwright show-trace"
  },
  "devDependencies": {
    "@playwright/test": "^1.42.0"
  }
}

An API test to round out the suite:

// tests/api.spec.js
var { test, expect } = require("@playwright/test");

test.describe("API Endpoints", function () {
  test("GET /api/products should return product list", async function ({ request }) {
    var response = await request.get("/api/products");
    expect(response.status()).toBe(200);

    var body = await response.json();
    expect(Array.isArray(body.products)).toBe(true);
    expect(body.products.length).toBeGreaterThan(0);

    var product = body.products[0];
    expect(product).toHaveProperty("id");
    expect(product).toHaveProperty("name");
    expect(product).toHaveProperty("price");
  });

  test("POST /api/cart should add item to cart", async function ({ request }) {
    var response = await request.post("/api/cart", {
      data: {
        productId: "prod-001",
        quantity: 2
      }
    });
    expect(response.status()).toBe(201);

    var body = await response.json();
    expect(body.cart.items).toHaveLength(1);
    expect(body.cart.items[0].quantity).toBe(2);
  });

  test("GET /api/health should return service status", async function ({ request }) {
    var response = await request.get("/api/health");
    expect(response.status()).toBe(200);

    var body = await response.json();
    expect(body.status).toBe("healthy");
    expect(body).toHaveProperty("uptime");
    expect(body).toHaveProperty("version");
  });
});

Common Issues and Troubleshooting

1. Browsers Fail to Launch in CI

The most common error you will see is something like browserType.launch: Executable doesn't exist. This happens when you install Playwright but skip the browser binaries. Always run npx playwright install --with-deps in your pipeline. The --with-deps flag installs system libraries that browsers require on Linux (libgbm, libnss3, libatk, etc.). Without it, even if the browser binary exists, it will crash on launch.
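
An alternative worth considering is running the job inside Microsoft's official Playwright Docker image, which ships with the browsers and their system libraries preinstalled. A minimal sketch, assuming you pin the image tag to the @playwright/test version in your package.json:

# azure-pipelines.yml (excerpt) - container job using the Playwright image
pool:
  vmImage: "ubuntu-latest"

container: mcr.microsoft.com/playwright:v1.42.0-jammy

steps:
  - script: npm ci
    displayName: "Install dependencies"

  - script: npx playwright test
    displayName: "Run Playwright tests"
    env:
      CI: "true"

The image already contains Node.js and the browser binaries, so the NodeTool and npx playwright install steps can be dropped, at the cost of keeping the image tag in sync with your Playwright version.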

2. Tests Pass Locally but Time Out in CI

Azure-hosted agents are slower than your development machine. A test that completes in 3 seconds locally might take 10 seconds in CI. Increase your timeout and actionTimeout values in the config for CI environments. The process.env.CI check is your friend:

timeout: process.env.CI ? 60000 : 30000,

Do not just increase timeouts blindly. If a test takes 60 seconds in CI, something is probably wrong with the test or the application under test. Investigate first, increase timeouts as a last resort.

3. HTML Report Cannot Be Viewed Directly

When you download the HTML report artifact, it will not work if you just open the index.html file directly due to browser security restrictions on local files. Use the Playwright command to serve it:

npx playwright show-report playwright-report

This starts a local web server that serves the report correctly. Alternatively, you can configure your pipeline to deploy the report to Azure Blob Storage or a static site and share the URL with your team.
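
Here is a sketch of that deployment step using the Azure CLI task; the service connection name and storage account are placeholders for your own values, and the account is assumed to have static website hosting enabled:

- task: AzureCLI@2
  inputs:
    azureSubscription: "my-service-connection"
    scriptType: "bash"
    scriptLocation: "inlineScript"
    inlineScript: |
      az storage blob upload-batch \
        --account-name mystorageaccount \
        --destination '$web' \
        --destination-path "playwright-reports/$(Build.BuildId)" \
        --source playwright-report \
        --overwrite
  condition: always()
  displayName: "Upload HTML report to blob storage"

The report is then reachable under playwright-reports/<build id> at the static website URL, which you can post to a PR comment or a Slack message.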

4. Sharded Tests Report Duplicate Results

If you use sharding with mergeTestResults: true but see duplicate or confusing test entries in Azure DevOps, the usual causes are that every shard writes its JUnit file to the same path and that identical test names appear under multiple browser projects. Make the JUnit output file name unique per shard, and rely on the testRunTitle to differentiate the runs:

testRunTitle: "Playwright - Shard $(SHARD_INDEX)"

5. Trace Files Are Missing for Failed Tests

If your tests fail but there are no trace files in the artifacts, check your trace setting. The value on-first-retry only generates traces when a test is retried. If retries is set to 0, no traces will ever be captured. For CI, always set retries to at least 1:

retries: process.env.CI ? 2 : 0,
trace: "on-first-retry"

6. Out of Memory on Azure-Hosted Agents

Running three browser projects in parallel on a standard Azure-hosted agent (7 GB RAM) can exhaust memory. If you see processes killed by OOM, reduce the number of workers or limit the projects per job. Sharding is the proper fix: split browser projects across separate agents instead of running all three in a single job.
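
A sketch of that split, reusing the matrix strategy from the sharding pipeline but keying on the browser project instead of the shard index (the remaining steps stay the same):

strategy:
  matrix:
    chromium:
      PROJECT_NAME: "chromium"
    firefox:
      PROJECT_NAME: "firefox"
    webkit:
      PROJECT_NAME: "webkit"

steps:
  # ...Node.js setup, npm ci, and browser install as in the earlier pipeline...
  - script: |
      npx playwright test --project=$(PROJECT_NAME)
    displayName: "Run Playwright tests ($(PROJECT_NAME))"
    env:
      CI: "true"
    continueOnError: true

Each job then launches only one browser engine at a time, which keeps memory usage well inside the agent's limit.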

Best Practices

  • Always use continueOnError: true on the test step and condition: always() on reporting steps. Without these, you lose your test results and artifacts when tests fail, which is exactly when you need them most.

  • Pin your Playwright version. A minor Playwright update can change browser behavior in subtle ways. Use an exact version in package.json and update deliberately with a dedicated PR that validates the new version passes your suite.

  • Use test.describe blocks to group related tests. This maps cleanly to test suites in the JUnit output and makes the Azure DevOps Test tab navigable.

  • Cache Playwright browsers in your pipeline. Browser downloads take 60-90 seconds per run. Use the Azure Pipelines Cache task with ~/.cache/ms-playwright as the path and a key that includes the agent OS and package-lock.json, so the cache is invalidated whenever the Playwright version changes:

- task: Cache@2
  inputs:
    key: 'playwright | $(Agent.OS) | package-lock.json'
    path: '~/.cache/ms-playwright'
  displayName: 'Cache Playwright browsers'

  • Separate fast tests from slow tests with tags. Use test.describe names or file patterns to split your suite. Run critical smoke tests on every PR and the full suite nightly. This keeps PR feedback fast without sacrificing coverage (see the sketch after this list).

  • Use data-testid attributes for selectors. CSS classes change when designers update the UI. Text content changes when copywriters update messaging. data-testid attributes are owned by the test team and do not change unless someone intentionally removes them.

  • Monitor your flaky test rate. Azure DevOps tracks which tests fail intermittently across runs. Review this data weekly. A test that passes on retry is not a passing test; it is a bug you have not found yet.

  • Run tests against a deployed environment, not localhost. In your pipeline, deploy the application to a staging slot or container first, then run Playwright against that URL. Testing against localhost in CI requires running your app server as a background process, which adds complexity and failure modes.

  • Set fullyParallel: true in your config. By default, Playwright runs tests within a file sequentially and only parallelizes across files. fullyParallel runs every test independently, which maximizes throughput.
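
As a sketch of the tagging approach mentioned above, put a tag such as @smoke in the test title and filter on it with --grep; the tag name and test are illustrative:

// tests/login.spec.js
var { test, expect } = require("@playwright/test");

// "@smoke" in the title is just a naming convention; --grep matches it as a regex.
test("@smoke login page renders", async function ({ page }) {
  await page.goto("/login");
  await expect(page.locator("form#login-form")).toBeVisible();
});

On pull requests run npx playwright test --grep "@smoke"; in the nightly pipeline run the full suite, or npx playwright test --grep-invert "@smoke" if the smoke tests already ran.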
