Smoke Testing vs Sanity Testing (and Regression): What Each One Actually Does
Smoke testing is a broad, shallow check that runs on every new build to confirm the application is stable enough to test. Sanity testing is a narrow, deep check that runs on an already-stable build after a specific fix or minor change to confirm that fix worked. Regression testing is the comprehensive run that sits behind both — verifying that nothing else in the system broke. The three are often used interchangeably in standups, but they answer different questions, run at different points in the pipeline, and have very different automation profiles.
This guide pulls apart the three, with definitions aligned to the ISTQB Glossary, a comparison table, real CI/CD examples, and the automation tradeoffs that matter in 2026.
TL;DR — the three tests in one screen
- Smoke testing answers "is this build alive?" Runs in 15–30 minutes against the critical paths of the entire application, on every new build, as the first quality gate.
- Sanity testing answers "did this fix work, and did it break anything next to it?" Runs in 30–60 minutes against a single modified module, on an already-stable build, after a bug fix or minor change.
- Regression testing answers "did anything else break?" Runs for hours to days against the entire system, after smoke and sanity have passed, usually on a schedule or pre-release.
- Sequence matters. Smoke first, sanity second, regression third — inverting it wastes hours testing builds that were never going to ship.
- Automation fit differs. Smoke tests live in CI and run on every commit. Sanity tests are often manual or lightly scripted because the scope changes with the fix. Regression tests are heavily automated, scheduled, and frequently parallelised.
What is smoke testing?
Smoke testing — also called Build Verification Testing (BVT) or build acceptance testing — is a preliminary set of checks run on a new build to verify that the critical functions of the application work before deeper testing begins. The ISTQB Glossary defines it as "a test suite that covers the main functionality of a component or system to determine whether it works properly before planned testing begins."
The term has a literal hardware origin. When engineers powered up a new circuit board for the first time, they watched for smoke. Smoke meant the board was fundamentally broken and had to go back. Software smoke testing inherited the metaphor: if the basic functions fail, don't burn QA hours on deeper exploration. There's a competing origin from plumbing, where smoke is injected into pipes to find leaks before pressurising, but the modern usage is the same whichever story you prefer.
The goal is narrow: confirm the build is stable enough to invest further testing time in. Smoke testing does not aim to find every bug. It asks one question and accepts a binary answer.
What a smoke test typically covers
A smoke test sweeps across the critical user paths at a high level. For a SaaS dashboard, that usually means the app launches, a test user can log in and log out, the primary dashboard loads with its data widgets, core navigation between top-level pages works, and backend services respond on the critical endpoints. For an e-commerce platform the checklist is different — homepage, search, cart, checkout initiation — but the logic is the same. The discipline is to keep the suite at roughly 15 to 30 minutes of run time. A smoke test that takes two hours has become something else.
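As a rough illustration, the shape of such a suite can be sketched in a few lines of Python. This is a minimal sketch, not a real framework: `SmokeCheck` and `run_smoke` are hypothetical names, and the lambdas stand in for real browser or HTTP checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SmokeCheck:
    name: str                # human-readable critical path, e.g. "login and logout"
    run: Callable[[], bool]  # returns True when the path works


def run_smoke(checks: List[SmokeCheck]) -> Tuple[bool, List[str]]:
    """Run every check once and return (build_is_testable, failed_check_names)."""
    failed = []
    for check in checks:
        try:
            ok = bool(check.run())
        except Exception:
            ok = False  # a crashing check counts as a failure
        if not ok:
            failed.append(check.name)
    return (not failed, failed)


# Hypothetical stand-ins for the SaaS dashboard checklist above.
checks = [
    SmokeCheck("app launches", lambda: True),
    SmokeCheck("login and logout", lambda: True),
    SmokeCheck("dashboard widgets load", lambda: False),
    SmokeCheck("top-level navigation", lambda: True),
]

passed, failed = run_smoke(checks)
print("PASS" if passed else f"FAIL: {failed}")
```

The return value is deliberately binary: the list of failed names is there for the report, but the gate itself only needs the first element.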
Who runs smoke tests, and where
Smoke tests are scripted, repeatable, and almost always automated. They live in the CI/CD pipeline as the first quality gate after every new build, with the automated suite typically running on every push or merge to the integration branch. According to ThinkSys's 2026 survey, 89.1% of QA teams have adopted CI/CD pipelines.
In 2026, the framework choice is increasingly Playwright — independent benchmarks put it at 23% faster than Cypress in sequential execution and 35% faster in parallel mode, with free built-in parallelisation that keeps CI costs roughly 2.5x lower than Cypress Cloud equivalents for high-volume suites. For teams with smaller existing Cypress suites the developer experience often wins out, and there's no urgent reason to migrate.
What is sanity testing?
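Sanity testing is a narrow, deep check performed after a bug fix, minor code change, or feature update, focused on the modified area and the parts of the system most likely to be affected by that change. The ISTQB Glossary does not define it separately: it lists "sanity test" as a synonym of "smoke test". The distinction used in this guide follows common industry usage, which treats sanity testing as a focused check on the rationality of a specific modification, distinct from the build-wide check of smoke testing.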
It is, importantly, a subset of regression testing — not a sibling of smoke. The two answer different questions even when their checklists overlap. Smoke testing covers breadth on an unproven build. Sanity testing covers depth on a stable build that just had something patched.
The question is: did this change work, and did it break anything adjacent?
If sanity testing fails, the build is rejected and sent back to development before any regression time is wasted on it. If sanity passes, the change is cleared for the broader regression suite or for release, depending on how late in the cycle the team is.
What a sanity test typically covers
Unlike smoke testing, sanity testing goes deep — but only within a defined blast radius. A developer fixes a bug in discount-code logic, and sanity testing exercises every discount-related flow (code entry, cart recalculation, stacked discounts, expired codes, order summary) plus the adjacent areas most likely to be affected. A team ships fingerprint authentication, and sanity covers enrolment, login, fingerprint removal, fallback to PIN — and confirms that password-based login is untouched. A patch changes how failed payments are handled, and sanity walks every error path: declined card, network timeout, 3DS cancellation, retry behaviour, user-facing copy.
For the e-commerce platform from the smoke example, a post-fix sanity test on the cart might cover adding items, updating quantities, removing items, applying coupons, and verifying totals — running for 30 to 60 minutes focused entirely on that one module.
Who runs sanity tests
Sanity testing is almost always performed by QA engineers rather than developers, and it is often unscripted. The scope shifts with every change — a sanity suite relevant after last week's checkout fix is not the suite you want after today's login refactor. Teams that over-formalise sanity testing into a fixed automated suite usually end up with stale tests or what is effectively a small regression run.
The sustainable pattern is a lightweight checklist per module — a few critical happy paths plus the obvious edge cases — that the tester adapts based on what the developer says they changed.
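One way to picture that pattern is a lookup from "what changed" to "what to check". The sketch below is an assumption-laden illustration: the module names, the checklist contents, and the `ADJACENT` map are all hypothetical, and a real team would keep them wherever its QA notes live.

```python
from typing import Dict, List

# Hypothetical per-module checklists: a few happy paths plus obvious edge cases.
CHECKLISTS: Dict[str, List[str]] = {
    "discounts": ["code entry", "stacked discounts", "expired code"],
    "cart": ["add item", "update quantity", "remove item", "verify totals"],
    "login": ["password login", "logout", "wrong password is rejected"],
}

# Hypothetical map of each module to the neighbours a change is likely to affect.
ADJACENT: Dict[str, List[str]] = {
    "discounts": ["cart"],
}


def sanity_scope(changed_modules: List[str]) -> List[str]:
    """Return the checks for the changed modules plus their listed neighbours."""
    modules: List[str] = []
    for module in changed_modules:
        for name in [module] + ADJACENT.get(module, []):
            if name not in modules:
                modules.append(name)
    return [f"{name}: {step}" for name in modules for step in CHECKLISTS.get(name, [])]


for line in sanity_scope(["discounts"]):
    print(line)
```

The output is a starting point for the tester, not a fixed suite: the person running it still trims or extends the list based on the fix diff.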
Smoke vs sanity vs regression: a side-by-side comparison
| Attribute | Smoke testing | Sanity testing | Regression testing |
|---|---|---|---|
| Scope | Broad — entire application | Narrow — specific modified module | Comprehensive — entire system |
| Depth | Shallow | Deep within the focused area | Deep across all features |
| Build state | Run on a new, unproven build | Run on an already-stable build after a change | Run on a build that has passed earlier gates |
| Question answered | Is the build stable enough to test? | Did this change work, and did it break adjacent areas? | Did anything else in the system break? |
| Trigger | Every new build or release candidate | After a bug fix, patch, or minor change | After accepted changes, usually scheduled or pre-release |
| Performed by | Developers and QA, via CI | QA engineers | QA engineers, mostly via CI |
| Scripted? | Scripted and automated | Often unscripted or lightly scripted | Heavily automated |
| Part of | Acceptance testing | Regression testing (a subset) | Regression testing (the whole) |
| Outcome on fail | Build rejected for further testing | Fix returned to development | Bug logged, blocker decision per defect |
| Typical duration | 15–30 minutes | 30–60 minutes | Hours to days |
| Automation fit | Excellent — runs on every CI build | Moderate — scope changes per fix | High — but requires ongoing maintenance |
When to use each — and a real CI/CD walkthrough
Picture the three as a funnel. Smoke is the wide opening that filters out fundamentally broken builds fast. Sanity narrows the focus to recent changes. Regression provides full coverage once the build has cleared the earlier gates. Inverting that order is the most common structural mistake teams make — running sanity before smoke regularly wastes hours on builds that were never going to pass a basic stability check.
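The funnel can be sketched as an ordered list of gates where the first failure stops everything behind it. This is a minimal sketch with hypothetical names (`run_funnel`, the lambda stand-ins); real pipelines express the same idea through job dependencies in the CI tool.

```python
from typing import Callable, List, Tuple

Gate = Tuple[str, Callable[[], bool]]


def run_funnel(gates: List[Gate]) -> List[Tuple[str, str]]:
    """Run gates in order; once one fails, mark every later gate as skipped."""
    results = []
    blocked = False
    for name, run in gates:
        if blocked:
            results.append((name, "skipped"))
            continue
        passed = run()
        results.append((name, "passed" if passed else "failed"))
        blocked = not passed
    return results


# Stand-in gates: here smoke fails, so nothing after it is run.
gates = [
    ("smoke", lambda: False),
    ("sanity", lambda: True),
    ("regression", lambda: True),
]
print(run_funnel(gates))
# [('smoke', 'failed'), ('sanity', 'skipped'), ('regression', 'skipped')]
```

Putting the cheapest, broadest gate first is what saves the time: the expensive suites never start on a build that smoke has already rejected.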
A walked-through 2026 CI/CD example
Consider a Series B fintech with a 12-person QA team shipping a payments dashboard. Their pipeline looks roughly like this:
On every pull request: developers push code, CI builds the app, and an automated Playwright smoke suite runs against the artifact — login, dashboard render, transactions list, balance widget, primary navigation. Fifty critical-path assertions sharded across eight parallel workers finish in roughly six minutes. If smoke fails, the pipeline halts and the PR is blocked.
On every merge to staging: the smoke suite reruns against the staging deployment. If the PR fixed a known bug, the on-call QA engineer runs a sanity suite focused on the affected module — for a discount-code fix, the 18 manual and lightly scripted checks they keep in a Notion checklist for the cart. About 45 minutes of focused work.
On a nightly schedule: the regression suite runs against the latest accepted staging build. Three hundred tests finish in about 30 minutes of wall-clock time thanks to parallel execution. Failures are triaged the next morning.
On every release candidate: smoke and the full regression suite run one more time against production-like infrastructure. Sanity is skipped unless a last-minute hotfix went in.
This separation keeps the feedback loop tight during the day and reserves the expensive comprehensive runs for off-hours. According to SmartBear's 2024 State of Testing report, 73% of teams automate their regression suites, but only 45% have formalised smoke testing gates in CI/CD. Closing that gap is the single highest-impact improvement most teams can make.
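The trigger-to-suite mapping from the walkthrough can be written down as a small function. The event names and flags below are hypothetical labels chosen for this sketch, not the vocabulary of any particular CI system.

```python
from typing import List


def suites_for(event: str, is_bug_fix: bool = False, has_hotfix: bool = False) -> List[str]:
    """Map a pipeline event to the suites described in the walkthrough."""
    if event == "pull_request":
        return ["smoke"]
    if event == "merge_to_staging":
        # Sanity only runs when the merged change fixed a known bug.
        return ["smoke", "sanity"] if is_bug_fix else ["smoke"]
    if event == "nightly":
        return ["regression"]
    if event == "release_candidate":
        # Sanity is skipped unless a last-minute hotfix went in.
        return ["smoke", "sanity", "regression"] if has_hotfix else ["smoke", "regression"]
    raise ValueError(f"unknown pipeline event: {event}")


print(suites_for("merge_to_staging", is_bug_fix=True))
```

Keeping the mapping in one place makes it easy to see at a glance which events pay for the expensive suites.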
Automation suitability — what works, what doesn't
Not all three tests automate equally well, and treating them as if they do is an expensive mistake.
Smoke tests are the ideal automation candidate
They are scripted, deterministic, and they re-run identically on every build. They cover stable critical paths that change slowly, and they produce a clear binary outcome. Every property that makes a test automation-friendly applies to smoke.
The hard part is keeping the suite fast. Cap it at the time budget you've agreed — usually 10 to 15 minutes for the parallelised run — and ruthlessly cut tests that creep in over time. A smoke suite bloated into a mini-regression is one that no longer gates anything.
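A budget is easier to enforce when it is checked mechanically. The sketch below estimates parallel wall-clock time from per-test durations and lists the slowest tests when the estimate goes over budget. It uses a generic longest-first scheduling heuristic as an assumption; it does not model how Playwright or any other runner actually distributes tests, and the durations are made up.

```python
import heapq
from typing import Dict, List


def estimate_wall_clock(durations: Dict[str, float], workers: int) -> float:
    """Estimate parallel run time: hand each test, longest first,
    to the worker that currently has the least work."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    loads = [0.0] * workers
    heapq.heapify(loads)
    for seconds in sorted(durations.values(), reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + seconds)
    return max(loads)


def over_budget(durations: Dict[str, float], workers: int, budget_seconds: float) -> List[str]:
    """Return test names, slowest first, if the estimate exceeds the budget; else []."""
    if estimate_wall_clock(durations, workers) <= budget_seconds:
        return []
    return sorted(durations, key=lambda name: durations[name], reverse=True)


# Made-up durations in seconds for four smoke checks.
durations = {"login": 40, "dashboard": 90, "transactions": 60, "navigation": 30}
print(estimate_wall_clock(durations, workers=2))  # 120.0
print(over_budget(durations, workers=2, budget_seconds=100))
```

The slowest-first list is the pruning order: the tests at the top are the first candidates to move out of smoke and into regression.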
Sanity tests are awkward to automate
The scope shifts with every change, so a fixed automated sanity suite tends to either drift out of relevance or grow into module-level regression. The patterns that work in practice are a per-module checklist maintained by QA and executed manually with consistent steps, a library of small targeted scripts that QA picks from based on what changed, or exploratory testing by an engineer who knows the module and has the bug report and fix diff in hand. If your team insists on automating sanity, scope it tightly: the goal is "verify this specific fix", not "verify this entire module from scratch."
Regression tests are heavily automated — and heavily maintained
This is where the AI-assisted testing tools of 2026 earn their money. Mabl's Adaptive Auto-Healing claims to reduce test maintenance by 85%, and Testim's smart locators have cut flaky tests by up to 70% in customer reports. For a regression suite of a few hundred to a few thousand tests, that maintenance reduction is usually the deciding factor in tool selection. For framework choice see our Selenium vs Playwright vs Cypress 2026 comparison; for broader tooling, the best AI testing tools of 2026 round-up.
Common mistakes that cost teams real time
Treating smoke and sanity as the same test. Different scope, different depth, different owner, different position in the pipeline. Conflating them produces a checklist that does neither job well.
Running sanity before smoke on a new build. A clean sanity pass on a build that fails smoke means nothing, because the build is broken at a more fundamental level.
Bloating the smoke suite. A smoke test that grew from 30 critical-path checks to 300 every-edge-case checks no longer gates anything in a useful time window. Cut aggressively.
Skipping automation on smoke. Manual smoke testing on every CI build is a tax the team is paying for no reason.
Treating sanity as purely informal. Unscripted does not mean undocumented. A one-line note on what was verified — and what was not — is the difference between a sanity pass that has value and one that papers over a missed regression.
Letting regression grow without pruning. A regression suite that takes six hours to run is one that no one runs on PRs. Flake detection, deduplication, and tiered suites keep it usable.
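Flake detection in particular is simple to approximate. One common heuristic, shown here as a minimal sketch, is to flag any test that both passed and failed on the same commit, since the code did not change between those runs. The run-history format and the `find_flaky` name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Run = Tuple[str, str, bool]  # (test name, commit id, passed)


def find_flaky(runs: Iterable[Run]) -> List[str]:
    """Flag tests that both passed and failed on the same commit."""
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    flaky = {test for (test, _commit), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


# Made-up history: "checkout total" flips on one commit, "login" changes across commits.
history = [
    ("checkout total", "abc123", True),
    ("checkout total", "abc123", False),
    ("login", "abc123", False),
    ("login", "def456", True),
]
print(find_flaky(history))  # ['checkout total']
```

A test that fails on one commit and passes on the next is not flagged, because that pattern is what a genuine fix looks like.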
Where bug capture fits — and the new bottleneck
The more cleanly the three test types are run, the more clearly the actual bottleneck shows up: not the test that found the bug, but the time spent reporting it. Test execution speed has improved roughly 5x over the past decade (Playwright running a 100-test suite in 9–12 minutes is the headline example), while the time from "I see a bug" to "developer can reproduce it" has barely moved.
Smoke and sanity are where this pinch hurts most. Both fire early. Both produce findings that are blocking by definition — a failed smoke test stops the pipeline, a failed sanity test sends the build back. Slow bug reporting on those failures multiplies through every downstream activity.
Crosscheck is a free Chrome extension built for that pinch point — it captures screen recording, console logs, network requests, and user actions automatically and pushes a fully-formed ticket into Jira, Linear, ClickUp, GitHub, or Slack. For a deeper look at what makes a bug report reproducible, see the perfect bug report template or the best bug reporting tools of 2026 round-up.
FAQ
Is sanity testing the same as smoke testing?
No, at least not in day-to-day practice. Smoke testing is a broad, shallow check on a new build to confirm overall stability. Sanity testing is a narrow, deep check on an already-stable build after a specific change. The ISTQB Glossary lists "sanity test" as a synonym of "smoke test", but most teams use the two terms in the distinct senses described here.
Is sanity testing a part of regression testing?
Yes. Sanity testing is a subset of regression testing — both verify that changes haven't broken existing functionality, but sanity focuses on a narrow, recently-modified area while regression covers the entire application. Sanity is what teams run when they can't justify the full regression suite for a small, targeted change.
Should smoke tests be automated?
Almost always. Smoke tests are scripted, deterministic, and run on every build — every property that makes a test automation-friendly. Playwright and Cypress are the most common frameworks in 2026, with Playwright leading on cost-efficient parallelisation for high-frequency CI.
How long should a smoke test take?
Aim for 15–30 minutes of execution, ideally compressed to under 10 minutes with parallelisation. Once a smoke suite passes the half-hour mark, it has typically drifted into mini-regression territory and no longer serves as a fast gate.
Can the same test case appear in both smoke and sanity suites?
Yes, but the framing differs. A login-works check might appear in smoke as one of many breadth checks across the app, and again in sanity after a login bug fix as one of several depth checks on the login module. Same test step, different role in the pipeline.
What runs first — smoke, sanity, or regression?
Smoke first, sanity second, regression third. Smoke gates whether the build is worth testing at all. Sanity confirms specific changes work. Regression verifies the rest of the system still works around them.
Start capturing smoke and sanity bugs faster
Whichever framework runs your smoke suite and whichever checklist drives your sanity passes, the bugs both surface still have to land in Jira, Linear, or ClickUp with enough context for a developer to reproduce them. That handoff is where most release cycles quietly lose time. Crosscheck installs in seconds, captures the full context behind every bug into a single ticket, and stays out of the way. No paid tiers, no usage limits.



