Most accessibility regressions ship because nobody checked before merge. The numbers back that up: 95.9% of the top one million home pages had detectable WCAG 2 failures in 2026, up from 94.8% the year before (WebAIM Million, 2026). The average page carried 56.1 detectable errors, a 10.1% jump from 51 in 2025 (WebAIM Million, 2026).
A GitHub Actions gate fixes the most common version of this problem: a contrast change or a missing label that slips through review and lands in production. This tutorial walks through a complete setup with axe, Playwright, and pa11y-ci, using current action versions as of June 2026. It builds on our developer's guide to web accessibility testing. The DIY setup is vendor-neutral and fully runnable; if you'd rather not maintain it yourself, there's a managed alternative (our own API) covered at the end, clearly marked.
TL;DR: Automated CI testing catches roughly 57% of accessibility issues by volume (Deque, 2021), or at most 30% of WCAG success criteria (European Commission, 2025). Wire @axe-core/playwright into a workflow, assert results.violations is empty, cache Playwright binaries, and use a threshold ratchet so legacy debt doesn't block the build. Manual and assistive-tech testing still cover the rest.
Why Run Accessibility Tests in GitHub Actions?
Moving testing into development cuts accessibility spend from around 15% to 5% of project budget, a roughly two-thirds reduction, by catching issues before they reach production (Deque, 2023). Post-production fixes can cost up to 30x more than catching the same issue during coding (Microsoft, 2023). A CI gate is the cheapest place to catch the issues automation handles well.
The litigation pressure is real too. Federal court website-accessibility lawsuit filings reached 3,117 in 2025, up about 27% from 2,452 in 2024 (Seyfarth Shaw, 2026). Most complaints cite the same handful of failures, and those failures are exactly what a CI gate catches reliably.
What can an automated gate actually catch?
The six most common error types account for 96% of all detected errors. By share of home pages affected, they are: low-contrast text (83.9%), missing alt text (53.1%), missing form labels (51%), empty links (46.3%), empty buttons (30.6%), and missing document language (13.5%) (WebAIM Million, 2026). Every one of those is programmatically detectable.
That is the case for a gate. You are not trying to verify the full screen-reader experience in CI. You are stopping the regressions that make up the bulk of real-world failures, before they reach users. Manual review handles the rest, which is also the part accessibility overlays claim to solve but cannot.
Citation capsule: The six most common WCAG failures, led by low-contrast text (found on 83.9% of home pages), missing alt text (53.1%), missing form labels (51%), empty links (46.3%), empty buttons (30.6%), and missing document language (13.5%), account for 96% of all detected errors across the top one million home pages (WebAIM Million, 2026). These are precisely the issues an automated CI gate detects reliably.
Which Accessibility Testing Tool Should You Use in CI?
There is no single right tool, because each one targets a different layer of your test pyramid. Automated detection is possible for at most 30% of WCAG success criteria (European Commission, 2025), so the question is where in your stack you want that coverage to run. The table below maps each tool to the layer it fits best.
| Tool | Best layer | When to use | Current version (June 2026) |
|---|---|---|---|
| jest-axe | Component / unit (JSDOM) | Assert single components are accessible in unit tests; fast, with no browser | 10.0.0 |
| cypress-axe | End-to-end (Cypress) | Already on Cypress for E2E; note a reported compatibility issue with Cypress v14 | 1.7.0 |
| @axe-core/playwright | End-to-end (Playwright) | Real browser checks on full pages, the default for most CI a11y gates | 4.11.3 |
| pa11y-ci | Crawl many URLs | Scan a list or sitemap of pages with a pass/fail threshold | 4.1.1 |
| Lighthouse CI | Page score (perf + a11y) | Track an a11y score over time alongside performance budgets | 0.15.1 (@lhci/cli) |
| a11yFlow API (our product) | Hosted scan + score | You want axe-core in CI without running browsers yourself | REST ( |
For a deeper engine-by-engine comparison, see axe vs Lighthouse vs WAVE vs pa11y. The short version: use jest-axe at the component layer for speed, @axe-core/playwright for full-page E2E checks, and pa11y-ci when you need to sweep many URLs at once. Lighthouse CI is best as a score trend, not a hard gate.
Citation capsule: Automated detection is possible for at most 30% of WCAG success criteria (European Commission, 2025), while Deque measured automated coverage at ~57% of issues by volume (Deque, 2021). The denominators differ (issue count versus criteria count), so the figures describe the same limit rather than conflict.
How Do You Set Up an Accessibility Gate in GitHub Actions?
A working gate needs three pieces: a checkout, a Node setup with caching, and a test step that fails the build on violations. With @axe-core/playwright 4.11.3, the fail pattern is one assertion: run the scan, then assert the violations array is empty (Playwright docs). Start with the test, then wire the workflow around it.
Step 1: Write the Playwright + axe test
This test loads a page, runs axe with the WCAG 2.0/2.1 A and AA tags, and fails if any violation is returned. Save it as tests/a11y.spec.ts.
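```ts
import { test, expect } from '@playwright/test';
import AxeBuilder from '@axe-core/playwright';

test('home page has no detectable WCAG A/AA violations', async ({ page }) => {
  // Relative URL: resolves against baseURL in playwright.config.ts.
  await page.goto('/');

  // wcag2a/wcag2aa cover WCAG 2.0; wcag21a/wcag21aa add the WCAG 2.1 rules.
  const results = await new AxeBuilder({ page })
    .withTags(['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa'])
    .analyze();

  // Any violation fails the test, and therefore the CI job.
  expect(results.violations).toEqual([]);
});
```

The withTags call scopes axe to the rule sets you care about. Limiting it to the WCAG 2.0 and 2.1 A/AA tags (wcag2a, wcag2aa, wcag21a, wcag21aa) avoids noise from best-practice rules that are not strictly WCAG. When a violation fires, the assertion diff includes each violation's rule ID and the selectors of the offending nodes, so the failure tells you what to fix.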
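If a legacy page already carries minor or moderate issues, an empty-array assertion blocks every build. One option, swapped in here rather than taken from the test above, is to fail only on serious and critical impacts and print a compact one-line summary per failing node. This sketch declares its own simplified types for the fields it reads (id, impact, help, nodes[].target); they mirror axe-core's result shape but are not its official type definitions, and both helper names are hypothetical.

```typescript
// Simplified local types for the parts of an axe-core result this helper reads.
type Impact = 'minor' | 'moderate' | 'serious' | 'critical';

interface AxeNode {
  // Selector path; an inner array appears for shadow DOM targets.
  target: (string | string[])[];
}

interface AxeViolation {
  id: string; // rule ID, e.g. 'color-contrast'
  impact?: Impact | null;
  help: string; // short human-readable description of the rule
  nodes: AxeNode[];
}

// Keep only violations whose impact is in the blocking set.
function blockingViolations(
  violations: AxeViolation[],
  blocking: Impact[] = ['serious', 'critical'],
): AxeViolation[] {
  return violations.filter((v) => v.impact != null && blocking.indexOf(v.impact) !== -1);
}

// One line per failing node: "rule-id [impact] selector: help".
function summarize(violations: AxeViolation[]): string[] {
  const lines: string[] = [];
  for (const v of violations) {
    for (const n of v.nodes) {
      const selector = n.target
        .map((t) => (Array.isArray(t) ? t.join(' ') : t))
        .join(' ');
      lines.push(`${v.id} [${v.impact ?? 'unknown'}] ${selector}: ${v.help}`);
    }
  }
  return lines;
}
```

In the spec, you could then write const blocking = blockingViolations(results.violations); followed by expect(blocking, summarize(blocking).join('\n')).toEqual([]); so the failure message lists each rule and selector. Whether minor and moderate issues should block is a team decision; the plain empty-array assertion is the stricter default.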
Step 2: Write the workflow
This workflow runs on pushes to main and on every pull request. It checks out the code, sets up Node 22 with npm caching, installs dependencies, installs Chromium for Playwright, runs the a11y test, and uploads the report as an artifact. Save it as .github/workflows/accessibility.yml.
```yaml
name: Accessibility

on:
  push:
    branches: [main]
  pull_request:

jobs:
  a11y:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: actions/setup-node@v6
        with:
          node-version: '22'
          cache: 'npm'
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - name: Run accessibility tests
        run: npx playwright test tests/a11y.spec.ts --reporter=html
      - name: Upload report
        if: ${{ !cancelled() }}
        uses: actions/upload-artifact@v4
        with:
          name: a11y-report
          path: playwright-report/
          retention-days: 14
```

The if: ${{ !cancelled() }} guard uploads the report even when the test step fails, which is when you most want to read it. The --reporter=html flag produces the playwright-report/ folder that the artifact step grabs. The test's relative page.goto('/') also needs a baseURL in your Playwright config and a running app, which this workflow does not start on its own.
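A relative page.goto('/') only works when Playwright has a baseURL and the app is actually serving. Below is a minimal playwright.config.ts sketch using Playwright's webServer option; the npm run start command and port 3000 are assumptions (the port matches the pa11y-ci examples later), so substitute whatever serves your app.

```typescript
// playwright.config.ts: a minimal sketch; the start command and port are assumptions.
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './tests',
  use: {
    // page.goto('/') in tests/a11y.spec.ts resolves against this URL.
    baseURL: 'http://localhost:3000',
  },
  projects: [
    // Chromium only, matching `npx playwright install --with-deps chromium`.
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
  ],
  webServer: {
    // Assumed start script; Playwright waits until `url` responds before testing.
    command: 'npm run start',
    url: 'http://localhost:3000',
    // Reuse an already-running dev server locally; always start fresh in CI.
    reuseExistingServer: !process.env.CI,
    timeout: 120_000,
  },
});
```

With webServer set, the Run accessibility tests step starts and stops the app itself. If your start script serves a production build, add an npm run build step before it.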
Current action versions as of June 2026
Versions matter here because of a runner change. GitHub-hosted runners default to Node 24 starting June 2026, and Node 20 reached end of life in April 2026 and is now deprecated for Actions (GitHub Changelog). Pin to versions that ship a current Node runtime.
- actions/checkout is at v7.0.0. Use @v5 for maximum stability, or @v6/@v7 if you want the Node 24 runtime.
- actions/setup-node is at v6.4.0. It auto-caches when a packageManager field is present, and supports cache: 'npm' as shown above.
- actions/upload-artifact is at v4. Older v3 artifacts were deprecated, so v4 is the current baseline.
Citation capsule: GitHub-hosted runners default to Node 24 from June 2026, and Node 20 reached end of life in April 2026, deprecating it for GitHub Actions (GitHub Changelog). As of June 2026, actions/checkout is v7.0.0, actions/setup-node is v6.4.0, actions/upload-artifact is v4, and @axe-core/playwright is 4.11.3.
How Do You Crawl Many Pages with pa11y-ci?
When you need to check a list of URLs rather than assert single pages in a test runner, pa11y-ci 4.1.1 is the simpler fit. It loads each URL, runs the configured rules, and fails the job when any URL's error count exceeds your threshold. With no threshold set, it fails on any error above zero (pa11y-ci). You configure everything in a single JSON file.
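To make the threshold rule concrete, here is a small model of the pass/fail decision. It assumes the threshold is compared with each URL's own error count rather than with the run's total (check this against the pa11y-ci README for your version); the function names are illustrative, not part of pa11y-ci.

```typescript
// Hypothetical model of pa11y-ci's pass/fail rule, assuming the threshold is
// compared with each URL's own error count rather than the total.
interface UrlResult {
  url: string;
  errors: number; // error-level issues reported for this URL
}

// URLs whose error count is above the threshold. With no threshold (or 0),
// a single error fails the URL.
function failingUrls(results: UrlResult[], threshold = 0): string[] {
  return results.filter((r) => r.errors > threshold).map((r) => r.url);
}

// The job passes only if no URL fails.
function runPasses(results: UrlResult[], threshold = 0): boolean {
  return failingUrls(results, threshold).length === 0;
}
```

Under this model, two pages with 2 errors each pass a threshold of 2 even though the run has 4 errors in total, because no single page exceeds it.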
Step 1: Add a .pa11yci config
Save this as .pa11yci at the repo root. The defaults block applies to every URL, and threshold sets how many errors each URL may have before the run fails.
```json
{
  "defaults": {
    "standard": "WCAG2AA",
    "timeout": 30000,
    "threshold": 0
  },
  "urls": [
    "http://localhost:3000/",
    "http://localhost:3000/pricing",
    "http://localhost:3000/contact"
  ]
}
```

Step 2: Add the workflow step
Drop this step into a workflow that has already started your app server and waited for it to respond (for example via a background npm run start & step or a service). pa11y-ci reads .pa11yci automatically.
```yaml
- name: Run pa11y-ci
  run: npx pa11y-ci
```

A threshold of 0 is the strictest setting and works well on a greenfield project where every page already passes. On an existing codebase with known issues, a hard zero will block every build, which brings us to the next section.
How Do You Handle Pre-Existing Accessibility Debt?
Turning on a strict gate against a legacy codebase fails the first build and every build after it. The fix is a ratchet: allow a known count of existing violations, then fail only when the count climbs. This lets you stop new debt without forcing a full remediation before the gate ships. Most competitor tutorials skip this and leave teams stuck.
The pa11y-ci threshold ratchet
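pa11y-ci has this built in. It compares the threshold with each URL's error count, so set it to the highest error count any single page has today, commit it, and the build passes as long as every page stays at or below that number. When a change pushes a page over the threshold, that URL fails and so does the job. Because a threshold in defaults is shared by every URL, pages that start cleaner can still regress up to that number, which is one more reason to keep lowering it.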
```json
{
  "defaults": {
    "standard": "WCAG2AA",
    "threshold": 12
  },
  "urls": ["http://localhost:3000/", "http://localhost:3000/legacy-report"]
}
```

The discipline is to lower the number as you fix issues, never raise it. Each fixed violation gives you a chance to ratchet the threshold down by one. Over time it reaches zero, and at that point any new failure breaks the build immediately. Teams that practice proper CI testing instead of overlays tend to drive this number down fast, because real fixes stick.
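The ratchet rule itself is small enough to automate. Here is a hypothetical helper, assuming the threshold applies per URL and that you already have each page's error count from a passing run: it returns the tightest threshold that still passes, and refuses to raise it.

```typescript
// Hypothetical ratchet helper for a pa11y-ci threshold, assuming the threshold
// is applied per URL. Given the committed threshold and the current error count
// of every scanned URL, return the tightest threshold that still passes.
function nextThreshold(current: number, errorsPerUrl: number[]): number {
  let worst = 0;
  for (const count of errorsPerUrl) {
    if (count > worst) worst = count;
  }
  if (worst > current) {
    // A page regressed past the committed threshold: fail instead of raising it.
    throw new Error(`worst page has ${worst} errors, above the threshold of ${current}`);
  }
  // Ratchet down to the worst page's count; this never exceeds `current`.
  return worst;
}
```

For example, with a committed threshold of 12 and pages at 3, 10, and 0 errors, it returns 10, which you would write back into .pa11yci. Throwing instead of returning a higher number encodes the never-raise rule directly.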
Citation capsule: pa11y-ci fails a job when any URL's detected errors exceed the configured threshold, and with no threshold it fails on any error count above zero (pa11y-ci). Setting the threshold to the highest current error count on any single page creates a ratchet: no page may get worse than today's worst, while pre-existing debt is paid down incrementally.
How Do You Speed Up and Scale the Workflow?
Two changes make a real difference: caching the Playwright browser binaries and running the suite as a matrix. A fresh playwright install downloads browser binaries on every run, which is the slowest step in a typical a11y job. Caching ~/.cache/ms-playwright removes that download once the cache warms. Many tutorials never mention it.
Cache Playwright browser binaries
Add this cache step before the install step. It keys on a hash of package-lock.json, so the cache invalidates whenever the lockfile changes, including when you upgrade Playwright (and also on unrelated dependency changes).
```yaml
- name: Cache Playwright browsers
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
- run: npx playwright install --with-deps chromium
```

When the cache hits, playwright install finds the browser binaries already present and skips the download. The --with-deps flag still installs Chromium's system packages through apt on every run, so the step gets much shorter but not instant.
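Keying on hashFiles('package-lock.json') evicts the browser cache on any dependency change, even when Playwright did not move. Below is a hypothetical TypeScript helper that builds the key from the installed playwright-core version instead, assuming an npm lockfile with lockfileVersion 2 or 3 (which lists packages under a packages map keyed by node_modules path) and a hoisted playwright-core.

```typescript
// Hypothetical helper: derive a Playwright-specific cache key from an npm
// lockfile (lockfileVersion 2 or 3, with a top-level "packages" map).
interface Lockfile {
  packages?: { [path: string]: { version?: string } };
}

// playwright-core carries the browser revision list, so its version decides
// which browser builds `playwright install` downloads.
function playwrightVersion(lock: Lockfile): string | undefined {
  const entry = lock.packages ? lock.packages['node_modules/playwright-core'] : undefined;
  return entry ? entry.version : undefined;
}

// Falls back to "unknown" if the lockfile layout differs from the assumption.
function playwrightCacheKey(os: string, lock: Lockfile): string {
  return `playwright-${os}-${playwrightVersion(lock) ?? 'unknown'}`;
}
```

You could run this in a step before the cache step (for example with a TypeScript runner such as tsx), append key=<value> to $GITHUB_OUTPUT, and reference that step output as the cache key. The unknown fallback keeps the job running if the lockfile layout differs, at the cost of a less precise key.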
Run a matrix over routes
A matrix lets you fan the same test across multiple routes or Node versions in parallel, so a 10-route suite runs as 10 concurrent jobs instead of one serial pass. This is another step most guides leave out.
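```yaml
jobs:
  a11y:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        route: ['/', '/pricing', '/contact', '/blog']
    steps:
      - uses: actions/checkout@v5
      - uses: actions/setup-node@v6
        with:
          node-version: '22'
          cache: 'npm'
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - run: npx playwright test --grep "${{ matrix.route }}"
```

fail-fast: false keeps the other routes running when one fails, so a single broken page does not hide problems on the rest. The --grep value is a regular expression matched against test titles, so this assumes each route has its own test whose title contains the route; a bare "/" matches every title with a slash in it, so the home-page job would run every such test. Pair the matrix with the artifact upload from earlier, giving each job a unique artifact name (for example a11y-report-${{ strategy.job-index }}), since upload-artifact v4 rejects duplicate names within a run. Each job then produces its own report, which gives you a per-route breakdown rather than one merged result and is far easier to triage.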
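An alternative to matching test titles with --grep is to hand each matrix job its route through an environment variable and let the spec generate its tests from it. This sketch assumes a variable named A11Y_ROUTE (hypothetical) set from matrix.route on the test step; the route list mirrors the matrix above.

```typescript
// Hypothetical: the matrix passes one route per job through an environment
// variable (assumed name A11Y_ROUTE), e.g. on the test step:
//   env:
//     A11Y_ROUTE: ${{ matrix.route }}
// Local runs without the variable scan every route.
const ALL_ROUTES = ['/', '/pricing', '/contact', '/blog'];

function routesToScan(env: { [name: string]: string | undefined }): string[] {
  const route = (env.A11Y_ROUTE ?? '').trim();
  if (route === '') return ALL_ROUTES.slice();
  if (route.charAt(0) !== '/') {
    throw new Error(`A11Y_ROUTE must start with "/", got "${route}"`);
  }
  return [route];
}

// In tests/a11y.spec.ts, one test per selected route (sketch):
//   for (const route of routesToScan(process.env)) {
//     test(`a11y ${route}`, async ({ page }) => {
//       await page.goto(route);
//       const results = await new AxeBuilder({ page })
//         .withTags(['wcag2a', 'wcag2aa', 'wcag21a', 'wcag21aa'])
//         .analyze();
//       expect(results.violations).toEqual([]);
//     });
//   }
```

With this in place, the test step can run npx playwright test tests/a11y.spec.ts without --grep, so no route pattern can accidentally match another route's test. Rejecting values without a leading slash catches a mistyped matrix entry early.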
What If You Don't Want to Maintain Any of This?
Full disclosure: this section is about a11yFlow, our own product, so weigh it accordingly. Everything above is the DIY path, and it works. But it also means you own the Playwright binaries, the browser cache, the runner, and the reporting. If you'd rather offload that, a hosted scanner runs the same axe-core engine behind a REST API and gives you back a score and remediation guidance per call.
The CI step becomes an HTTP request instead of a browser install. You create a scan, poll until it finishes, and fail the build on the result:
```yaml
# .github/workflows/a11y-api.yml (managed alternative)
- name: Accessibility scan via a11yFlow
  env:
    A11YFLOW_KEY: ${{ secrets.A11YFLOW_API_KEY }}
  run: |
    # 1. Create the scan
    id=$(curl -s https://api.a11yflow.dev/v1/scans \
      -H "Authorization: Bearer $A11YFLOW_KEY" \
      -H "Content-Type: application/json" \
      -d '{"url":"https://staging.example.com"}' | jq -r '.id')

    # 2. Poll until the job is finished, giving up after about 10 minutes
    status=""
    for attempt in $(seq 1 120); do
      status=$(curl -s "https://api.a11yflow.dev/v1/scans/$id" \
        -H "Authorization: Bearer $A11YFLOW_KEY" | jq -r '.status')
      [ "$status" = "completed" ] && break
      sleep 5
    done
    if [ "$status" != "completed" ]; then
      echo "Scan $id did not complete (last status: $status)" >&2
      exit 1
    fi

    # 3. Print the score and violations, then fail the build if any remain.
    #    Assumes .violations is an array; filtering by severity depends on the
    #    response schema documented in the API reference.
    result=$(curl -s "https://api.a11yflow.dev/v1/scans/$id" \
      -H "Authorization: Bearer $A11YFLOW_KEY")
    echo "$result" | jq '.score, .violations'
    echo "$result" | jq -e '(.violations | length) == 0' > /dev/null
```

Store your key as the A11YFLOW_API_KEY repository secret. The free tier covers 25 scans a month with no card, which is enough to wire this into a pipeline and decide whether it fits before you pay for anything. The exact request and response fields are in the API documentation.
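The same create-then-poll flow can live in a TypeScript script instead of shell. The sketch below covers only polling, with a hard cap on attempts so a stuck scan fails the job instead of hanging it. The 'completed' status comes from the shell example; 'failed' is an assumed terminal status, and the fetch wiring in the comment uses the endpoint shown above but is not a documented client.

```typescript
// Minimal sketch of bounded polling. 'completed' matches the status the shell
// example waits for; 'failed' is an assumed terminal status.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function waitForScan(
  getStatus: () => Promise<string>,
  intervalMs = 5000,
  maxAttempts = 120, // 120 x 5 s is about 10 minutes
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === 'completed') return;
    if (status === 'failed') throw new Error('scan failed');
    // Skip the pointless wait after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`scan did not complete after ${maxAttempts} attempts`);
}

// Example wiring with fetch (Node 18+), using the endpoint from the shell example:
//   await waitForScan(async () => {
//     const res = await fetch(`https://api.a11yflow.dev/v1/scans/${id}`, {
//       headers: { Authorization: `Bearer ${process.env.A11YFLOW_KEY}` },
//     });
//     return (await res.json()).status;
//   });
```

Injecting getStatus keeps the loop testable without network access. The attempt cap plays a similar role to the job's timeout-minutes setting, but fails with a clearer message.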
This does not replace the manual and assistive-technology testing the DIY gate also skips. It is the same automated layer, just hosted.
Frequently Asked Questions
Can automated accessibility testing in CI catch all WCAG issues?
No. Automated detection reaches at most 30% of WCAG success criteria (European Commission, 2025), or about 57% of issues by volume (Deque, 2021). Keyboard operability, screen-reader behavior, and cognitive accessibility still need manual and assistive-technology testing. A CI gate is necessary but not sufficient on its own.
Which tool should I use, axe, pa11y, or Lighthouse?
Pick by layer. Use jest-axe for component unit tests, @axe-core/playwright for full-page E2E assertions, and pa11y-ci to crawl many URLs with a threshold. Lighthouse CI is best for tracking an accessibility score trend rather than as a hard gate. Our tool comparison covers the underlying engines in detail.
How do I make a GitHub Actions build fail on violations?
Two patterns. In Playwright, run await new AxeBuilder({ page }).analyze() and assert expect(results.violations).toEqual([]), which fails the test and the job (Playwright docs). With pa11y-ci, the job fails automatically when errors exceed the configured threshold, or on any error when no threshold is set (pa11y-ci).
How do I avoid the build failing on pre-existing debt?
Use a ratchet. Set the pa11y-ci threshold to the highest error count any single page currently has, so the build passes while every page stays at or below that number and fails when a page goes over it (pa11y-ci). Lower the threshold as you fix issues, never raise it, until you reach zero.
Should accessibility tests run on every PR or only on merge to main?
Run them on every pull request. Catching issues during development cuts accessibility spend from around 15% to 5% of project budget (Deque, 2023), while post-production fixes can cost up to 30x more (Microsoft, 2023). PR feedback is where the savings live.
The Bottom Line
A GitHub Actions accessibility gate catches the issues that make up the bulk of real-world failures: the top six error types, led by low-contrast text, missing alt text, and missing form labels, account for 96% of all detected errors (WebAIM Million, 2026). It does this at the cheapest point in the pipeline, where fixes cost up to 30x less than post-production ones (Microsoft, 2023).
The setup is short. Wire @axe-core/playwright into a workflow on current action versions, assert the violations array is empty, cache the browser binaries, and run a matrix across your routes. For legacy projects, start with a pa11y-ci threshold ratchet so the gate ships today without a full remediation first.
Remember the gate's limits. Automated detection reaches at most 30% of WCAG success criteria, so pair it with manual and assistive-tech testing for the rest. Start with the workflow above, then read our developer's guide to web accessibility testing for the full picture, and our accessibility testing tools comparison when you are ready to choose what runs in each layer.