Incrementality testing is not always the right move. Learn when geo holdouts fail, what real benchmark data shows, and how to get decision-grade results.

Incrementality testing is not automatically smart. Sometimes it is the cleanest way to find the truth. Sometimes it is just an expensive way to stay confused.
Here is the short version: without enough spend, conversion volume, and test control, a geo holdout wastes time, kills signal, and hands you a result you cannot act on.
That is the part most guides skip.
Incrementality testing measures what your marketing actually caused, not what your dashboard claimed.
Your platform says a campaign drove 400 purchases. Incrementality testing asks: how many of those would have happened anyway?
Google's geo-based Conversion Lift runs this as a controlled experiment. It splits comparable geographic groups into exposed and unexposed, then measures the difference in conversions between them. That difference is your true incremental impact.
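To make that concrete, here is a minimal sketch of the arithmetic in Python, with made-up numbers. It assumes two equal-sized, well-matched geo groups and a simple difference in conversions; real Conversion Lift analyses use more careful matching and modeling, but the logic is the same.

```python
# Minimal sketch of the geo-lift arithmetic with made-up numbers: compare
# matched treatment and control geo groups, then read the difference as
# the incremental impact.

treatment = {"conversions": 5200, "spend": 80_000.0}  # geos shown ads
control = {"conversions": 4800, "spend": 0.0}         # matched holdout geos

# Incremental conversions: what the exposed group did beyond the
# counterfactual the control group provides (equal-sized groups assumed).
incremental_conversions = treatment["conversions"] - control["conversions"]

avg_order_value = 90.0  # hypothetical AOV
incremental_revenue = incremental_conversions * avg_order_value
iroas = incremental_revenue / treatment["spend"]

print(f"incremental conversions: {incremental_conversions}")  # 400
print(f"iROAS: {iroas:.2f}x")                                 # 0.45x
```

Notice what happens if a platform dashboard claims, say, 2,000 of those 5,200 conversions: it reports a 2.25x ROAS on spend that actually returned 0.45x incrementally. That is the attribution gap in one line of arithmetic.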
Attribution tells you correlation. Incrementality tells you causation. They are not the same number, and they are rarely even close.
Click-based attribution has three structural problems.
It over-credits bottom-funnel channels. Retargeting and branded search capture people who were already going to buy. Attribution calls that a win. It is not.
It ignores upper-funnel entirely. Channels that create demand but do not get the last click look useless in most dashboards. They are not useless. They are just invisible to last-click models.
It cannot measure what it cannot see. Offline, OTT, and any channel without a clean click trail simply does not exist in last-click reporting.
Traditional media mix modeling helps, but has its own problem: it runs on historical correlations and takes weeks to update. By the time it tells you something useful, the media cycle has already moved on.
Incrementality testing fills the gap. It gives you a causal read, not a correlated one. Across 225 geo-based tests in the 2025 DTC Incrementality Benchmark Study, platform-reported ROAS and actual iROAS were consistently different numbers, often by a wide margin.
Attribution was already imperfect. Privacy changes made it significantly harder.
Apple's App Tracking Transparency framework, rolled out in 2021, removed the IDFA for users who opt out. Most do. That wiped out a large chunk of the signal iOS-based attribution depended on, particularly for mobile-heavy DTC brands.
Google's Privacy Sandbox is still in progress, but the direction is clear: third-party cookies are going away. The measurement infrastructure that last-click attribution was built on is being dismantled piece by piece.
What that means practically: the gap between what your platform reports and what actually happened is getting wider, not narrower. Channels look less effective than they are. Or more effective, depending on which signals survived. Either way, you are making budget decisions on incomplete data.
This is not a reason to panic. It is a reason to measure causally. Geo-based incrementality testing does not rely on user-level tracking. It measures outcomes at the population level, which makes it more durable as signal degrades. That is one of the reasons it has become the measurement method of choice for brands that want answers they can actually trust.
Incrementality testing fails when the cost of the answer is higher than the value of the answer.
That happens in three situations.
1. You do not have enough volume. Lewis and Rao's foundational study on ad measurement analyzed 25 large-scale field experiments across major U.S. retailers. The median confidence interval on ROI was over 100 percentage points wide. Informative experiments can require more than 10 million person-weeks of data. That is for large advertisers. Smaller brands should read that as a warning, not a challenge. A back-of-envelope sketch after this list shows why.
2. You cannot keep the business stable. A geo holdout needs a clean test environment. Promos, creative changes, inventory issues, or mid-test campaign adjustments contaminate the read. Once that happens, you are not answering one question. You are answering five noisy ones.
3. You have contamination risk. Google's documentation explains that contamination happens when someone is exposed to ads in a treatment region but converts in a control region, or vice versa. It lowers the estimated difference between groups and reduces reported incrementality. A contaminated test gives you a number. It does not give you the truth.
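Here is the back-of-envelope version of that volume math. This is a crude Poisson approximation with hypothetical numbers, not Lewis and Rao's method or Google's feasibility model, and it understates real requirements because geo-level variance, seasonality, and promo noise all dwarf pure counting noise. It is still enough to show how fast low volume punishes you.

```python
# Rough feasibility check (crude Poisson approximation, illustrative only):
# how many weeks of data until a given relative lift is detectable?

def weeks_needed(weekly_conversions_per_arm: float,
                 min_detectable_lift: float,      # e.g. 0.10 for a 10% lift
                 z_alpha: float = 1.96,           # two-sided 95% confidence
                 z_beta: float = 0.84) -> float:  # 80% power
    # Treating each arm's conversion count as Poisson, the total
    # conversions needed per arm to detect the lift is roughly:
    required_per_arm = 2 * (z_alpha + z_beta) ** 2 / min_detectable_lift ** 2
    return required_per_arm / weekly_conversions_per_arm

# A brand doing 300 conversions/week per arm, hunting a 10% lift:
print(f"{weeks_needed(300, 0.10):.1f} weeks")  # ~5 weeks, in theory
# The same target at 50 conversions/week per arm:
print(f"{weeks_needed(50, 0.10):.1f} weeks")   # ~31 weeks: not a real test
```

If even the optimistic version of the math says 31 weeks, the honest version says do not run the test.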
Before you spend a dollar on a live test, run a feasibility check.

Google labels geo test feasibility as high, medium, or low, and explicitly says low-feasibility tests should not be run. Medium means proceed with caution. Most teams ignore this and launch anyway. That is why they get directional noise instead of a decision.
Across 225 geo-based tests in Stella's 2025 DTC Incrementality Benchmark Study, pre-test fit quality was the strongest predictor of success. It outperformed budget size and test duration. Setup quality matters more than spending.
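What does checking fit quality look like? One simple version, sketched below with synthetic data: scale your control geos to treatment volume over a pre-period and measure how closely they track. The 0.9 correlation and low-single-digit MAPE thresholds in the comments are common rules of thumb, not the benchmark study's exact criteria.

```python
# A simple pre-test fit check (one approach among several): before
# launching, verify that control geos track treatment geos closely over
# a pre-period. Poor fit here predicts an unreadable test.

import numpy as np

# Hypothetical daily conversions over an 8-week pre-period.
rng = np.random.default_rng(7)
treatment_pre = 100 + 10 * np.sin(np.arange(56) / 7) + rng.normal(0, 2, 56)
control_pre = 98 + 10 * np.sin(np.arange(56) / 7) + rng.normal(0, 2, 56)

# Scale control to treatment volume, then measure how well it tracks.
scale = treatment_pre.sum() / control_pre.sum()
baseline = control_pre * scale

correlation = np.corrcoef(treatment_pre, baseline)[0, 1]
mape = np.mean(np.abs(treatment_pre - baseline) / treatment_pre)

print(f"pre-period correlation: {correlation:.2f}")  # want roughly 0.9+
print(f"pre-period MAPE: {mape:.1%}")                # want low single digits
```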
A good incrementality result is not just a lift number. It is a lift number you can trust enough to change a budget.
The 2025 DTC Incrementality Benchmark Study, which analyzed 225 geo-based tests run between August 2024 and December 2025, reinforces the point most teams miss: a well-designed test at modest spend beats a sloppy test at high spend every time.
Google's Conversion Lift reporting shows iROAS as a point estimate with a confidence interval, for example 2.2x with a range of 1.3x to 3.5x. If you cannot act on the full range of that interval, the test did not give you a decision. It gave you a number to argue about.
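The fix is to commit to a decision rule before launch. A minimal sketch, using the interval above and a hypothetical 1.5x bar:

```python
# Pre-committed decision rule: act only if the entire confidence interval
# clears the minimum iROAS that would justify a budget change. The 1.5x
# bar below is a made-up example.

def decision(ci_low: float, ci_high: float, min_actionable_iroas: float) -> str:
    if ci_low >= min_actionable_iroas:
        return "scale"      # even the pessimistic read clears the bar
    if ci_high < min_actionable_iroas:
        return "cut"        # even the optimistic read misses the bar
    return "inconclusive"   # the interval straddles the bar: no decision

print(decision(ci_low=1.3, ci_high=3.5, min_actionable_iroas=1.5))
# -> "inconclusive": a 2.2x point estimate sounds great, but the range
#    includes outcomes that would not justify the budget change.
```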
"Not ready for a geo holdout" does not mean not ready to measure. It means use the right tool for your stage.
Always-On Incrementality runs continuous measurement instead of one-off tests. Budget decisions are based on what is true right now, not what a study found last quarter. For most DTC brands, this is the right starting point. You build a baseline of signal over time instead of betting everything on one make-or-break experiment.
Bayesian MMM gives you cross-channel budget guidance calibrated with real holdout data. It is useful when you need to understand how channels work together, not just whether one channel works at all.
One more thing: fix the obvious problems before you test anything. Bad creative, a weak offer, broken tracking, and unstable campaigns do not become insightful because you wrapped them in an experiment. You will just get an expensive confirmation that something was broken.

Incrementality testing is powerful when the question is sharp, the setup is clean, and the result will actually change a decision. When those conditions are missing, a geo holdout is not sophistication. It is self-inflicted noise.
Is incrementality testing only for big brands? No. But smaller brands need to be more selective. Volume requirements are real, and the math gets punished fast at low scale. Always-On Incrementality is often the better entry point.
What is the difference between iROAS and ROAS? ROAS is what the platform reports. iROAS is the incremental return: how much revenue your spend actually generated beyond what would have happened without it. iROAS is almost always lower than platform ROAS. That gap is how much your attribution has been flattering you.
How long does a geo holdout test take? Long enough to reach statistical significance at your conversion volume. There is no universal timeline. Setting your minimum detectable iROAS before launch, as Google recommends, tells you whether your planned duration is realistic.
What is better than a one-time geo holdout? For most brands at most stages, Always-On Incrementality gives you continuous signal without the setup complexity and holdout revenue risk of a one-off study.
How do I know if a lift result actually matters? You decide that before the test starts. Define the minimum iROAS that would justify a budget change. If the result does not clear that bar, the test is not a win. Across Stella's benchmark data, the brands that got actionable results were the ones that defined "actionable" before they launched.
Want to know if your brand is actually ready for incrementality testing?
See how Stella's measurement stack works.