Weighted Synthetic Controls for Incrementality Testing: The Complete Marketing Guide (2025)

Stop guessing. Start proving. Here’s how weighted synthetic controls make incrementality testing results tight enough to take to your CFO.

Sep 19, 2025

What is Incrementality Testing? (And Why Most Marketers Get It Wrong)

Incrementality testing answers one question: Would these sales have happened without the ads?

Sounds obvious, but most marketers still rely on platform ROAS to answer it. That’s how you end up with a report showing a $5 ROAS in-platform, yet when you pause spend, only half of those sales disappear. The other half were baseline, so the incremental return is closer to $2.50.

That is not fraud. It is overlap, bad attribution windows, and the lack of a counterfactual. Incrementality testing fixes that by comparing a test group (ads on) to a comparable control group (ads off). The difference between the two is your incremental lift.
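The arithmetic behind that comparison can be sketched in a few lines. All figures and function names below are hypothetical, chosen only to echo the $5 platform ROAS example, and the sketch assumes the control revenue has already been scaled to be comparable to the test group.

```python
def incremental_lift(test_revenue, control_revenue):
    """Revenue in the ads-on group minus revenue in the comparable ads-off group."""
    return test_revenue - control_revenue


def iroas(test_revenue, control_revenue, spend):
    """Incremental revenue per dollar of ad spend."""
    return incremental_lift(test_revenue, control_revenue) / spend


# Hypothetical numbers for illustration only.
spend = 10_000.0
platform_attributed_revenue = 50_000.0  # what the ad platform claims credit for
test_revenue = 125_000.0                # ads on
control_revenue = 100_000.0             # ads off, scaled to the test group's size

platform_roas = platform_attributed_revenue / spend
incremental_roas = iroas(test_revenue, control_revenue, spend)

print(f"Platform ROAS:    {platform_roas:.2f}")
print(f"Incremental ROAS: {incremental_roas:.2f}")
```

In this made-up case the platform takes credit for $50k, but only $25k of revenue exists because of the ads, so the incremental figure is half the reported one.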

Here’s the catch: most incrementality studies don’t fail because there was no lift. They fail because the control group was wrong.

Why So Many Incrementality Tests Produce Useless Results

The problem is not the math. It is the setup.

Random Splits = Expensive Coin Toss

Too many brands still do this: pick half the states as test, half as control, and call it a study. That is not science. That is a coin toss with a media budget attached.

You cannot throw New York in one bucket, Nebraska in the other, and expect them to line up. Different consumer behavior, different economics, different demand curves. When the results swing wildly, you will have no idea if it was ads or just bad design.

The Matched-Market Myth

“Matched markets” sound smarter. Pick two places that look similar (say Denver and Austin) and run ads in one but not the other. The problem is that real life does not care about population size or median income.

What if one region runs a competitor promo mid-test?
What if weather spikes or tanks demand?
What if inventory gets tight in one market and not the other?

One random factor can blow up the entire study. You walk into the boardroom with a big iROAS swing, but you cannot defend it. That is how budget gets cut.

This is why weighted synthetic controls for incrementality testing exist. They fix the control group problem.

What Are Weighted Synthetic Controls? (And Why They Work Better)

A weighted synthetic control builds a “synthetic twin” of your test region. Instead of betting everything on one control market, you create a weighted blend of multiple untreated regions that moves almost identically to your test group in the pre-period.

Example (oversimplified):

40% Region A
30% Region B
30% Region C

Put together, that blend mirrors your test region’s historical revenue pattern almost perfectly. If your test region then outperforms its synthetic twin during the campaign, the gap is your estimate of what the ads drove, because the twin stands in for what would have happened without them.
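As a rough sketch of that idea, the 40/30/30 blend can be applied to donor-region revenue to build the twin, then compared with the test region during the campaign. The revenue figures are invented for illustration, and in practice the weights would be fitted on pre-period data rather than set by hand.

```python
import numpy as np

# Weights for Regions A, B, C from the oversimplified example.
weights = np.array([0.40, 0.30, 0.30])

# Hypothetical daily revenue during the campaign: rows are days, columns are regions A, B, C.
donors_campaign = np.array([
    [100.0,  80.0, 120.0],
    [110.0,  90.0, 110.0],
    [120.0, 100.0, 100.0],
])

# Hypothetical daily revenue in the test region (ads on) over the same days.
test_campaign = np.array([115.0, 120.0, 125.0])

# The synthetic twin is the weighted blend of the untreated donor regions.
synthetic_twin = donors_campaign @ weights

# Estimated incremental revenue is the gap between the test region and its twin.
daily_gap = test_campaign - synthetic_twin
estimated_lift = daily_gap.sum()

print("Synthetic twin:", synthetic_twin)
print("Estimated lift:", estimated_lift)
```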

It is the same principle as diversifying a stock portfolio. Would you bet your entire portfolio on one stock? Of course not. Traditional matched-market testing is exactly that. Weighted synthetic controls spread the risk. One market gets noisy, the others smooth it out.

Why Weighted Synthetic Controls for Incrementality Testing Beat Simple Holdouts

Simple “test minus control” math might work in a clean lab. It rarely works in the real world.

Competitor runs a flash sale in your control market → your lift looks inflated.
Weather tanks demand in your test region → your lift looks negative.
Out-of-stocks or promos hit unevenly → the comparison falls apart.

Weighted synthetic controls fix this by:

Averaging out noise across multiple donor markets
Tightening confidence intervals (so iROAS is not [0.01, 7.0])
Delivering ranges you can act on ([2.75, 4.25] instead of “could be zero, could be infinity”)

This is how you get results that survive CFO scrutiny.
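The noise-averaging claim can be checked with a small simulation. This is a sketch under simplifying assumptions: every market shares one demand shock, each market has independent noise of the same size, the blend uses equal weights, and no ads run, so the true lift is zero. Any spread in the measured gap is pure noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_donors = 5000, 5
noise_sd = 1.0

# A shared demand shock hits every market; each market also has its own noise.
shared = rng.normal(0.0, 3.0, size=n_sims)
test = shared + rng.normal(0.0, noise_sd, size=n_sims)
donors = shared[:, None] + rng.normal(0.0, noise_sd, size=(n_sims, n_donors))

# Gap between test and control when the true lift is zero.
single_control_gap = test - donors[:, 0]          # one matched market
blended_control_gap = test - donors.mean(axis=1)  # equal-weight blend of five

single_sd = single_control_gap.std()
blended_sd = blended_control_gap.std()

print(f"Noise with one control market: {single_sd:.2f}")
print(f"Noise with a five-market blend: {blended_sd:.2f}")
```

Under these assumptions the theoretical spread drops from sqrt(2) times the per-market noise to sqrt(1 + 1/5) times it, which is why the interval around the lift estimate narrows as more well-matched donors are blended in.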

When Should You Use Weighted Synthetic Controls?

Not every test needs them. But when the stakes are high, they are non-negotiable.

Perfect scenarios

Geo or inverse holdouts using states, DMAs, or ZIP codes
High-ticket products where every order swings results
Channels you cannot randomize (TV, CTV, retail pushes)
Seasonal businesses where baselines change fast
Long sales cycles where sample sizes stay small

Less critical scenarios

Simple creative A/B tests (randomization works fine)
Email or SMS campaigns where user-level holdouts are easy
Low-budget short runs (<$50k spend)

If you are moving millions in media spend, do not run a test without synthetic controls.

Stella’s Approach: High Correlation First, Then Synthetic Controls

Here is where most platforms mess up. They throw every region into a giant weighted blender and call it a control group. That creates more noise, not less.

At Stella, we do it differently:

Correlation first — we start by finding regions that already move together historically. If two places do not correlate, they are cut. Period.
Synthetic control second — once the correlated pool is set, we layer in weighted synthetic controls to tighten results further.
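A minimal sketch of a two-step flow like this is shown below. It is not Stella’s actual implementation: the 0.7 correlation cutoff, the function names, the simulated data, and the fitting method (projected gradient descent over non-negative weights that sum to 1) are all assumptions made for illustration.

```python
import numpy as np


def project_to_simplex(v):
    """Closest point to v whose entries are non-negative and sum to 1."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def filter_by_correlation(donors, target, names, min_corr=0.7):
    """Step 1: keep only donor regions whose pre-period series track the target."""
    keep = [j for j in range(donors.shape[1])
            if np.corrcoef(donors[:, j], target)[0, 1] >= min_corr]
    return donors[:, keep], [names[j] for j in keep]


def fit_weights(donors, target, n_iter=5000):
    """Step 2: find blend weights that minimize squared pre-period error."""
    step = 1.0 / (2.0 * np.linalg.norm(donors, 2) ** 2)
    w = np.full(donors.shape[1], 1.0 / donors.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * donors.T @ (donors @ w - target)
        w = project_to_simplex(w - step * grad)
    return w


# Simulated 60-day pre-period (hypothetical data).
rng = np.random.default_rng(42)
days = np.arange(60)
weekly = 10.0 * np.sin(2.0 * np.pi * days / 7.0)
region_a = 100.0 + weekly + rng.normal(0.0, 2.0, 60)
region_b = 80.0 + 0.8 * weekly + rng.normal(0.0, 2.0, 60)
region_c = 90.0 + rng.normal(0.0, 5.0, 60)      # does not follow the weekly pattern
test_region = 0.6 * region_a + 0.4 * region_b   # built so the true blend is known

donors = np.column_stack([region_a, region_b, region_c])
pool, kept = filter_by_correlation(donors, test_region, ["A", "B", "C"])
weights = fit_weights(pool, test_region)

print("Donors kept:", kept)
print("Fitted weights:", np.round(weights, 3))
```

Because the test series here is constructed as an exact blend of Regions A and B, the sketch should drop Region C at the correlation step and recover weights close to 0.6 and 0.4. Real data will never fit this cleanly.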

That two-step process is why our iROAS ranges are tight, often ±0.25 to ±0.75. Instead of “somewhere between 0 and 7,” you get [2.75, 4.25]. Results you can defend.

How to Get Started with Stella Today

Here is how easy it is to test this yourself:

What You Need

The last 120 days of revenue by region, by date
A CSV or Google Sheet
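If your export is in long format (one row per region per day), a short script can sanity-check it and reshape it into a date-by-region table before upload. The column names `date`, `region`, and `revenue` are assumptions for this sketch, not a documented Stella format, and filling missing region-days with 0.0 is a placeholder choice you may want to handle differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format export: one row per region per day.
raw = """date,region,revenue
2025-05-01,CA,1200.50
2025-05-01,TX,980.00
2025-05-02,CA,1310.25
2025-05-02,TX,1005.75
"""


def pivot_revenue(csv_text):
    """Reshape long-format rows into a date-by-region revenue matrix."""
    table = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        table[row["date"]][row["region"]] = float(row["revenue"])
    dates = sorted(table)  # ISO dates sort correctly as strings
    regions = sorted({region for day in table.values() for region in day})
    # Assumption: a missing region-day is treated as zero revenue.
    matrix = [[table[d].get(r, 0.0) for r in regions] for d in dates]
    return dates, regions, matrix


dates, regions, matrix = pivot_revenue(raw)
print(regions)
for d, row in zip(dates, matrix):
    print(d, row)
```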

The Process

Sign up for Stella’s free 7-day trial (no credit card required)
Open Stella’s Incremental Tool → Location Selection Artifact
Upload your data
Wait 10–20 minutes for the model to run
Review 3 split options:
- Minimum Investment → cheapest valid design
- Highest Confidence → tightest fit, strongest ranges
- Happy Medium → balance of both

Each option tells you:

Which regions to test and control
Minimum spend required
Expected confidence level

No guessing. No gambling. Just statistically valid test designs, built from your own data, that you can trust before you even launch.

Why This Approach Outperforms Traditional Tests

Traditional tests often deliver ranges like [0.5, 4.2]. Too wide to make a call.

By starting with correlated markets and layering weighted synthetic controls, Stella typically delivers ranges like [2.1, 3.1]. That is the difference between “we do not know” and “scale it with confidence.”

Case Studies: Real Brands, Real Results

These studies from Stella clients returned strong results as judged by MAPE, R-squared, statistical significance, and iROAS range, largely because of proper location selection and the use of weighted synthetic controls:

Ecom Fashion Brand: Found +2.1 iROAS on TikTok with a tight [1.7, 2.6] range. Scaled 40%, drove $800k incremental revenue.
Healthcare Company: TV incrementality test revealed 25% of pipeline was baseline, not lift. Reallocated $300k to higher-performing channels.
CPG Retail: Region-level synthetic controls showed promos worked 30% better in urban stores. Adjusted rollout, boosted promo iROAS by 18%.
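For readers unfamiliar with two of the fit metrics named above, MAPE and R-squared measure how closely the synthetic control tracks the test region in the pre-period. The sketch below uses their standard definitions on invented numbers. It does not reproduce any client data or Stella’s internal scoring.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Lower means a closer fit."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)


def r_squared(actual, predicted):
    """Share of the variance in actual revenue explained by the prediction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical pre-period revenue: test region vs. its synthetic control.
test_pre = [100.0, 110.0, 120.0, 130.0]
synthetic_pre = [98.0, 112.0, 118.0, 133.0]

fit_mape = mape(test_pre, synthetic_pre)
fit_r2 = r_squared(test_pre, synthetic_pre)

print(f"MAPE: {fit_mape:.2f}%")
print(f"R-squared: {fit_r2:.3f}")
```

A low MAPE and a high R-squared in the pre-period are what justify trusting the synthetic control as a counterfactual once the campaign starts.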

The Future of Incrementality Testing

The direction is clear:

Machine learning will improve donor pool selection and external factor adjustments
Cross-channel testing will measure interactions (for example, how CTV impacts search)
Privacy-first measurement will make geo-based synthetic controls even more critical as cookies disappear

Final Takeaway

Every marketing decision is an investment decision. If your incrementality test spits out a range so wide it is meaningless, you are not running a study. You are rolling dice with your budget.

Weighted synthetic controls for incrementality testing are how you turn inconclusive tests into confident, defensible results. And with Stella, you can set one up in minutes.

Try Stella free for 7 days (no credit card required). Pull your last 120 days of revenue by region, run the model, and get back clean test designs in 10 to 20 minutes.

Stop wasting money on studies you cannot defend. Start running incrementality tests that actually prove what is working.
