Agile Regression Testing Checklist: Per-PR, Sprint, Release

Written by Crosscheck Team

Content Team

May 15, 2025 11 minutes

A Copy-And-Use Regression Testing Checklist for Agile Teams

An agile regression testing checklist is a fixed, written set of verifications that runs against every change to confirm nothing previously working has broken. The version below is organised into nine layers — smoke, critical path, integration, cross-browser, mobile responsive, accessibility, API contract, performance, and release notes — with timing guidance for which run per PR, which per sprint, and which gate a release.

TL;DR — what runs when:

  • Per PR merge to main: smoke pass + scoped critical path (5–15 minutes, automated).
  • Per sprint, before deploy: integration + cross-browser + mobile + a11y + API contract + perf smoke (1–3 hours, mixed).
  • Per release, after deploy: full critical path on prod + a11y check + release notes verification (30–60 minutes).
  • Same checklist; the scope changes by trigger.

What an Agile Regression Checklist Has to Do

Regression testing is the work that consistently gets compressed in a two-week sprint. Feature testing has a clear owner — the engineer who built it. Regression has no owner by default, so it expands or contracts to fit the time left.

A good checklist defines scope in writing so "did we regress anything?" has a verifiable answer; separates layers by cost so smoke runs every push and accessibility audits gate the release; and ties each layer to a trigger so nobody negotiates at sprint-end whether the cross-browser sweep happens this week.


The Nine-Layer Regression Checklist

Each layer states what it checks, when it triggers, and a copy-ready list of items. Treat them as a baseline — add to the list every time a regression slips through that the current version did not catch.


Layer 1: Smoke Pass

What it checks: the build boots, the app loads, and the most basic interaction surface responds. If smoke fails, no further regression work happens until the build is restored.

When it triggers: every PR merge to main, automated, under 10 minutes in CI.

  • Build deploys to staging or preview without errors
  • Landing page loads under 5 seconds on a clean session
  • No JavaScript errors in the console on first paint
  • Primary navigation renders and all top-level links return a 200
  • Login accepts valid credentials and lands the user on the authenticated home
  • Health-check endpoint returns 200 with all dependencies healthy

Anything past ten minutes belongs in the next layer.
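
As a rough illustration of the "all top-level links return a 200" item, the sketch below checks a list of URLs and reports the ones that fail. The function names (`failing_links`, `http_status`) are hypothetical, and the status lookup is passed in as a parameter so the check can be exercised without a network; this is a minimal sketch, not a full smoke suite.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_status(url: str, timeout: float = 5.0) -> int:
    """Fetch a URL and return its HTTP status code (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is still the answer we want.
        return err.code


def failing_links(urls: Iterable[str],
                  fetch_status: Callable[[str], int] = http_status) -> List[str]:
    """Return the URLs that did not answer with a 200."""
    failures = []
    for url in urls:
        try:
            status = fetch_status(url)
        except Exception:
            status = None  # timeouts and connection errors count as failures
        if status != 200:
            failures.append(url)
    return failures
```

In CI this might run against the preview deployment's navigation URLs and fail the job whenever the returned list is non-empty.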


Layer 2: Critical Path

What it checks: the handful of end-to-end flows that, if broken, produce a severity-one incident regardless of how minor the change. For B2B SaaS, that is usually sign up → onboard → first value action. For e-commerce, browse → cart → checkout.

When it triggers: scoped to the PR diff per merge, full sweep per sprint, and again per release against production after deploy.

  • New user can register, verify email, and complete onboarding
  • Existing user can log in via every supported method (password, SSO, OAuth)
  • Password reset email arrives, link is valid, new password takes effect immediately
  • The most important workflow in your product completes end-to-end with realistic data
  • Multi-step flows allow backward navigation without losing entered data
  • Confirmation states (order placed, account created) display and trigger expected emails or webhooks
  • Logout invalidates the session — the back button does not restore access
  • Role and permission changes take effect at the next navigation unless re-login is documented

Keep the full sweep under twenty minutes of automated runtime; past that, split into per-PR and per-sprint subsets.


Layer 3: Integration

What it checks: the boundaries between your code and third-party services. A webhook schema update from a payment processor or a deprecated OAuth scope can break your app with no visible change in your own codebase.

When it triggers: per sprint, before release. Payments and auth also belong in the per-PR smoke pass for any PR touching them.

  • Test-mode payment completes and updates the user's subscription or order state
  • Declined payment shows the correct error and does not charge the user
  • Subscription create, upgrade, downgrade, and cancel update entitlements correctly
  • All configured OAuth providers (Google, Microsoft, GitHub, Apple) complete the authorisation handshake
  • Transactional emails (welcome, reset, receipt, invoice) arrive with correct content and working links
  • Inbound webhooks are received, validated, and processed without errors
  • Outbound webhook payloads match the documented schema
  • Failed webhook deliveries retry per policy and surface in the dashboard if exhausted
  • Analytics events fire exactly once on signup, conversion, and key milestones — and not at all on failures

Integration regressions are silent in your logs and loud in support tickets two days later. Treat them accordingly.
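
The "exactly once, and not at all on failures" rule for analytics events is easy to state and easy to get wrong. One way to assert it, assuming your test harness can capture the names of the events fired during a flow (the capture mechanism and the event names here are hypothetical), might look like this:

```python
from collections import Counter
from typing import Iterable, List


def event_count_violations(captured: Iterable[str],
                           expected_once: Iterable[str],
                           forbidden: Iterable[str] = ()) -> List[str]:
    """Compare captured analytics event names against the expected counts."""
    counts = Counter(captured)
    problems = []
    for name in expected_once:
        if counts[name] != 1:
            problems.append(f"{name}: expected 1, saw {counts[name]}")
    for name in forbidden:
        if counts[name] > 0:
            problems.append(f"{name}: expected 0, saw {counts[name]}")
    return problems
```

An empty list means the flow fired each expected event once and none of the forbidden ones, for example a conversion event during a declined payment.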


Layer 4: Cross-Browser

What it checks: the app renders and behaves consistently across the browsers your users actually use. The 2026 baseline for most products is Chrome, Safari, Firefox, and Edge on the latest two stable versions. Opera and Samsung Internet are usually optional unless telemetry shows real traffic.

When it triggers: per sprint, with a scoped pass per PR for changes touching CSS, layout, or browser-specific APIs (clipboard, file system, payment request, web push).

  • Layout renders correctly in Chrome, Firefox, Safari, and Edge (latest two stable versions)
  • No JavaScript errors in the console in any supported browser
  • Web fonts load without visible FOUT in Safari
  • CSS features with known cross-browser gaps (subgrid, :has(), container queries) behave per spec or fall back gracefully
  • Browser-specific APIs (clipboard, web share, payment request, push) work or degrade gracefully where unsupported
  • Form autofill works in each browser — especially Safari's password manager
  • PDF, video, and image rendering looks correct, including HEIC/AVIF support gaps

Chromium-only automation needs a manual sweep here — or a Playwright suite across all three engines (Chromium, WebKit, Firefox), the lowest-friction option in 2026. See Selenium vs Playwright vs Cypress for a fuller framework comparison.


Layer 5: Mobile Responsive

What it checks: the app looks and behaves correctly at real-world viewports. Mobile accounts for more than half of global web traffic, and a regression at 390px (iPhone) or 393px (Pixel) is invisible from a desktop laptop until customers complain.

When it triggers: per sprint, scoped per PR for any change touching layout, navigation, or breakpoints.

  • Layout holds at 360px, 390px, 768px, 1024px, and 1280px viewports without horizontal scroll
  • No content is clipped, hidden behind fixed headers, or pushed below the fold
  • Tap targets meet WCAG 2.2's 24x24 CSS pixel minimum; Apple HIG still recommends 44x44 pt and Material 48x48 dp
  • Mobile navigation (hamburger, drawer, bottom bar) opens, closes, and traps focus correctly
  • Forms submit cleanly — keyboard does not obscure submit buttons, autocomplete works, input types are correct (tel, email, number)
  • Sticky and fixed elements do not jitter on scroll, especially on iOS Safari
  • Hover-only interactions have a tap-or-focus equivalent
  • Landscape orientation does not break the layout

Real-device testing on one iOS and one Android device beats emulators every time — emulators miss the URL-bar collapse, keyboard overlay, and autofill quirks that produce most mobile regressions.
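
The tap-target item can be checked mechanically once element sizes are known. The sketch below assumes the bounding boxes have already been collected in CSS pixels, for example by a browser automation tool; the `Target` type and function name are hypothetical, and the check ignores the spacing and inline-text exceptions that WCAG 2.2 allows.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Target:
    name: str
    width: float   # CSS pixels
    height: float  # CSS pixels


def undersized_targets(targets: Iterable[Target], minimum: float = 24.0) -> List[str]:
    """Return names of targets smaller than `minimum` in either dimension."""
    return [t.name for t in targets if t.width < minimum or t.height < minimum]
```

Passing `minimum=44.0` or `minimum=48.0` applies the stricter platform recommendations instead of the WCAG floor.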


Layer 6: Accessibility Quick Check

What it checks: baseline accessibility, as a quick check rather than a full audit. It catches the regressions a single PR can introduce — a button that lost its accessible name, a colour tweaked below contrast, a focus outline removed by a CSS reset.

When it triggers: per sprint as a baseline, again per release as a gate. With the European Accessibility Act in force since 28 June 2025, accessibility regressions on products sold into the EU carry direct legal risk.

  • Automated axe-core or Pa11y scan returns zero serious or critical violations on the top ten routes
  • Interactive elements have visible focus indicators (WCAG 2.2 SC 2.4.7 Focus Visible at AA; SC 2.4.13 Focus Appearance is the stricter AAA target)
  • Text meets WCAG AA contrast: 4.5:1 for body, 3:1 for large text and UI components
  • Images and icons have correct alt attributes or aria-hidden where decorative
  • Forms have an associated <label> or aria-label for every input
  • Keyboard navigation reaches every interactive element in visual order
  • Modal dialogs trap focus while open and restore focus to the trigger on close
  • Screen reader smoke test on one critical flow with VoiceOver or NVDA

For deeper manual coverage and assistive-tech testing, the accessibility testing checklist for WCAG 2.2 goes layer by layer; Axe vs WAVE vs Pa11y compares the three scanners most teams plug into CI.
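
The contrast thresholds in the list above come from the WCAG 2.x contrast-ratio formula, which is simple enough to compute directly when a colour token changes. The following is a minimal sketch for opaque six-digit hex colours; the function names are illustrative, and scanners such as axe-core handle cases this does not (transparency, gradients, background images).

```python
def _linear(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a colour given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_or_ui: bool = False) -> bool:
    """4.5:1 for body text; 3:1 for large text and UI components."""
    return contrast_ratio(foreground, background) >= (3.0 if large_or_ui else 4.5)
```

For instance, `#767676` on white passes as body text while the slightly lighter `#777777` does not, which is the kind of one-step colour tweak this layer is meant to catch.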


Layer 7: API Contract Verification

What it checks: that API responses still match the contract clients expect. A renamed field, a removed enum value, a status code changed from 200 to 204 — any of these can break a mobile client, a partner, or your own front-end without an obvious server-side error.

When it triggers: per sprint, and per PR for any change touching the API layer or the schemas it serialises.

  • OpenAPI or GraphQL schema validates against the spec — no removed fields, changed types, or renamed routes without deprecation
  • Contract tests pass against a golden response set for the top twenty endpoints
  • Versioned endpoints continue to serve their documented contract — no new fields leaking into old versions
  • Error envelopes match the documented shape (e.g. { error: { code, message, details } })
  • Pagination metadata (total, next_cursor, has_more) behaves consistently across listing endpoints
  • Authentication errors return 401, authorisation 403, rate-limit 429 with Retry-After
  • Idempotency keys behave correctly: a repeated request with the same key returns the original result without repeating the side effect
  • Webhook payloads to subscribers match the schema published in developer docs

Pact, Dredd, or a golden-file diff all work — the point is that the contract gets enforced in CI rather than discovered by a partner two weeks later.
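
To make the golden-file option concrete, here is a minimal sketch of a diff that flags removed fields and changed types between a stored golden response and the current one. It assumes both are already parsed JSON, samples only the first element of each array, and treats added fields as non-breaking; the function name and path notation are illustrative rather than any tool's actual output.

```python
from typing import Any, List


def contract_breaks(golden: Any, current: Any, path: str = "$") -> List[str]:
    """List removed fields and type changes in `current` relative to `golden`."""
    breaks: List[str] = []
    if isinstance(golden, dict):
        if not isinstance(current, dict):
            return [f"{path}: expected object, got {type(current).__name__}"]
        for key, value in golden.items():
            if key not in current:
                breaks.append(f"{path}.{key}: field removed")
            else:
                breaks.extend(contract_breaks(value, current[key], f"{path}.{key}"))
    elif isinstance(golden, list):
        if not isinstance(current, list):
            return [f"{path}: expected array, got {type(current).__name__}"]
        if golden and current:
            # Only the first element is compared; assumes homogeneous arrays.
            breaks.extend(contract_breaks(golden[0], current[0], f"{path}[0]"))
    elif type(golden) is not type(current):
        breaks.append(
            f"{path}: type changed from {type(golden).__name__} "
            f"to {type(current).__name__}"
        )
    return breaks
```

For versioned endpoints, where new fields should not leak into old versions, a stricter variant would also report keys present in `current` but absent from `golden`.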


Layer 8: Performance Smoke

What it checks: that recent changes did not silently regress core performance. Performance regressions rarely fail outright — they make everything slightly slower until, a few sprints later, the app feels sluggish and nobody can point at the PR that did it.

When it triggers: per sprint, with Core Web Vitals tracked per PR via Lighthouse CI or a synthetic monitor.

  • LCP under 2.5 seconds on a simulated mobile 4G connection
  • INP under 200 milliseconds at the 75th percentile (replaced FID in March 2024)
  • CLS under 0.1 — no elements shifting after first paint
  • JavaScript bundle size has not grown past the agreed budget (commonly 5–10% over the previous baseline) without a documented reason
  • No render-blocking scripts added to the critical path
  • New images compressed and served as WebP or AVIF with srcset
  • API p50 and p95 response times for the top ten endpoints have not regressed against last sprint's baseline
  • Database query count per page has not grown — watch for accidental N+1s from ORM changes

The three Core Web Vitals thresholds above remain Google's current ranking-signal thresholds heading into 2026. LCP and INP are the two that sites most often fail. For deeper diagnosis, Chrome DevTools performance auditing walks through how to localise the culprit.
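
A per-PR or per-sprint gate on these numbers can be a few lines once samples are available. The sketch below assumes you have collected raw samples per metric from a synthetic monitor or field data; the metric keys and function names are hypothetical, it uses a nearest-rank 75th percentile, and it treats a value exactly at the threshold as passing.

```python
import math
from typing import Dict, List, Sequence

# Thresholds from the checklist above: LCP 2.5 s, INP 200 ms, CLS 0.1.
THRESHOLDS: Dict[str, float] = {"lcp_ms": 2500, "inp_ms": 200, "cls": 0.1}


def p75(samples: Sequence[float]) -> float:
    """Nearest-rank 75th percentile of a non-empty sample set."""
    if not samples:
        raise ValueError("p75 needs at least one sample")
    ordered: List[float] = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def vitals_failures(samples_by_metric: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Return the metrics whose p75 exceeds its threshold, with the p75 value."""
    failures = {}
    for metric, limit in THRESHOLDS.items():
        value = p75(samples_by_metric[metric])
        if value > limit:
            failures[metric] = value
    return failures
```

Comparing against last sprint's p75 as well as the fixed thresholds helps catch the slow drift this layer is meant to surface before a hard limit is crossed.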


Layer 9: Release Notes Verification

What it checks: the release notes match what actually shipped — and that things they do not mention did not sneak through. Most teams skip this and pay for it with embarrassing release-day issues: a feature flag still off, a copy change that did not deploy, a setting defaulted differently than documented.

When it triggers: per release, after deploy to production, before the announcement goes out.

  • Every item in the release notes is verifiable in production by an outside observer
  • Feature flags listed as "enabled" are actually enabled in production
  • Migration scripts have run and the production schema matches expectations
  • Configuration changes (rate limits, retention, email templates) are live and match documentation
  • No undocumented breaking changes — diff the public API surface against the previous release
  • Documentation, in-app help, and developer references reflect the new state
  • Status page or changelog entry is published and matches what shipped
  • Rollback procedure verified — previous build tagged and rollback tested in staging this week

If support sees something on day one that contradicts the release notes, you want it caught before a customer does.
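
The feature-flag item lends itself to a mechanical check. A minimal sketch, assuming the release notes can be reduced to a flag-to-state mapping and that production flag states can be exported from your flag service (both inputs, and the flag names in the test data, are hypothetical):

```python
from typing import Dict, List


def flag_mismatches(announced: Dict[str, bool],
                    production: Dict[str, bool]) -> List[str]:
    """Report flags whose production state differs from the release notes."""
    problems = []
    for flag in sorted(announced):
        if flag not in production:
            problems.append(f"{flag}: not found in production")
        elif production[flag] != announced[flag]:
            want = "enabled" if announced[flag] else "disabled"
            got = "enabled" if production[flag] else "disabled"
            problems.append(f"{flag}: notes say {want}, production is {got}")
    return problems
```

Running this after deploy and before the announcement gives a short, readable list to resolve rather than a manual walk through the flag dashboard.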


Per-PR vs Per-Sprint vs Per-Release Slices

The same nine layers, scoped differently by trigger, give you a workable cadence without three separate checklists.

Trigger | Layers that run | Target runtime | Mostly
Per PR merge to main | Smoke + scoped critical path + scoped contract | 5–15 minutes | Automated
Per sprint, pre-release | Integration + cross-browser + mobile + a11y + contract + perf | 1–3 hours | Automated + exploratory
Per release, post-deploy | Full critical path on prod + a11y + release notes | 30–60 minutes | Manual + smoke automation

"Scoped" in the per-PR row means only the parts that intersect with the diff. A CSS-only PR gets a scoped cross-browser and mobile pass; a serializer change gets a scoped contract pass.
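
One way to make that scoping mechanical is to map changed file paths to the extra layers they trigger. The sketch below is an assumption about how such a mapping could look: the glob patterns, layer names, and directory layout are hypothetical and would need to match your own repository.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Layers that run on every PR regardless of the diff.
ALWAYS = {"smoke", "critical-path"}

# Hypothetical path patterns mapped to the extra layers they trigger.
SCOPE_RULES = [
    ("*.css", {"cross-browser", "mobile"}),
    ("*.scss", {"cross-browser", "mobile"}),
    ("*serializers/*", {"api-contract"}),
    ("*payments/*", {"integration"}),
]


def layers_for_diff(changed_files: Iterable[str]) -> List[str]:
    """Return the sorted set of layers to run for a PR's changed files."""
    layers = set(ALWAYS)
    for path in changed_files:
        for pattern, extra in SCOPE_RULES:
            if fnmatchcase(path, pattern):
                layers |= extra
    return sorted(layers)
```

A CI job could feed this the output of a diff against main and use the result to decide which test groups to run.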

The per-sprint slice is where automation earns its keep. If your Playwright or Cypress suite cannot finish it in three hours, speed it up before adding more tests — a regression suite slower than the sprint cycle does not protect the sprint cycle.

Teams running continuous deployment compress further: per-PR expands to include cross-browser, mobile, and contract in under fifteen minutes of CI, per-sprint batches weekly, and per-release splits across every deploy. The trade-off is automation maturity — a red build has to mean a real bug, not flake.


Where Crosscheck Fits

Much of the overhead in this cycle is documentation. You find a regression in layer 5, say a button off-screen on a 390px viewport, and now you need a report the developer can act on without follow-up: reproduction steps, exact viewport, browser version, console errors, network calls, screenshot. Under sprint-end pressure, that report gets abbreviated.

Crosscheck captures that evidence automatically. It is a Chrome extension built for the bug-reporting moment in the QA loop: one click captures a screenshot or session recording, the full browser console, every network request with payloads and response codes, and the environment (OS, browser, viewport, URL). The report goes straight to Jira, Linear, ClickUp, GitHub, or Slack with the technical detail attached. For a richer report structure see the perfect bug report template.

Try Crosscheck free


FAQ

What is the difference between a regression test and a smoke test?

A smoke test is a fast check that the build is alive — login works, the app loads, the database is reachable. A regression test verifies that specific previously-working behaviour still works after a change. Smoke is layer 1; regression is layers 2 through 9. See smoke testing vs sanity testing for the fuller breakdown.

How often should agile teams run regression tests?

At three triggers: per PR (smoke + scoped critical path), per sprint (integration through performance smoke, plus a full critical-path sweep), and per release (a focused post-deploy pass on production). Frequency scales with deploy frequency.

What should be automated first?

The critical path, in this order: authentication, the most important workflow your product serves, payment or conversion actions, and the API contract for your top-ten endpoints. These are the tests you would least want to skip on a slow sprint.

How long should a sprint regression cycle take?

For a two-week sprint, the per-sprint slice should land in 1–3 hours of mixed automated and exploratory work. If it is taking a full day, either the automation is too slow, the manual scope is too wide, or per-release work has crept in.

Do I need a different checklist for hotfixes?

No — use the same nine layers, scoped to the change. A hotfix gets the smoke pass plus the layers touching the affected area, plus a critical-path check of surrounding functionality.


Start Tightening Your Regression Cycle

This is the checklist the Crosscheck team would hand a new QA hire on day one. Copy it into your repo, prune what does not apply, and add to it every time a regression slips through that the current version did not cover. The value is that it exists.
