Google Ads Incrementality Test with Meridian GeoX: Setup Guide

To run a Google Ads incrementality test with Meridian GeoX, pick one campaign, choose a design (holdback, go dark, or heavy up), split your geographies into treatment and control markets, align Google Ads spend with backend revenue by region, run the test long enough to detect real lift, calculate iROAS, then feed the result into Meridian MMM as a Bayesian prior. GeoX is not available to everyone yet. Run the framework now and port it into GeoX when it ships.

Executive Summary

Google previewed Meridian GeoX on May 5, 2026, an open-source, publisher-agnostic geo experimentation tool that feeds Bayesian priors into Meridian MMM. Testing begins later in 2026 and Google hasn't said when it will be available to everyone.

GeoX is not the same as Google's Conversion Lift. Conversion Lift measures Google Ads inside Google. GeoX measures any channel against backend revenue you control. The methodology isn't even new: Google's trimmed_match and matched_markets repos have been on GitHub for years. What's new is the packaging, the name, and the explicit tie to Meridian.

Open source moves the cost from license to labor. Running GeoX in production still requires data pipelines, an analyst, and contamination monitoring. If your measurement team is one performance marketer, a managed platform is cheaper than self-hosting.

Across 225 geo tests on Stella's platform, the median iROAS was 2.31x with 88.4% reaching significance. Brands running tests today will plug into GeoX immediately when it ships. Brands waiting will still be learning the basics in 2027.

The rest of this post is the runbook.

What is Meridian GeoX?

GeoX is Google's open-source tool for running geographic incrementality experiments. It supports three test designs: holdback, go dark, and heavy up. Results convert directly into Bayesian priors that calibrate Meridian MMM. GeoX is publisher-agnostic, so it can measure Meta, TikTok, podcasts, or any channel by geography, not just Google Ads. Testing begins later in 2026.

The mechanic is the same as any geo holdout. Split geographies into a treatment group and a control group, change media exposure in one, and measure the gap. GeoX adds structure around three specific designs:

Holdback: keep ads active in most regions, pause them in a small set of holdout regions
Go dark: pause media entirely in selected regions and measure the drop
Heavy up: increase spend in selected regions and measure the gain

GeoX also runs multiple treatments against a shared control in the same study, which makes multi-cell tests cheaper. The output feeds into Meridian as priors, so the MMM learns from real causal evidence instead of inferring impact from correlation.

A useful frame: Conversion Lift is the platform handing you a lift number. GeoX is the methodology, open and inspectable, that lets anyone produce one.

Figure: iROAS distribution across 225 Stella geo tests (DTC advertising incrementality benchmarks, August 2024 to December 2025; vertical axis: test count). Total tests: 225. Median iROAS: 2.31x. Reached significance: 88.4%. Interquartile range: 1.36x to 3.24x.

Source: Stella 2025 DTC Digital Advertising Incrementality Benchmarks. 225 geo-based tests across DTC brands, August 2024 to December 2025.

Can you run GeoX today?

Not yet. Testing begins later in 2026 and Google hasn't published a launch date. In the meantime, you can read the docs, look at the underlying trimmed_match and matched_markets repos, and run geo holdouts using your current setup so you have baseline tests ready when GeoX ships.

The brands that get value from GeoX on day one are the brands already running disciplined geo tests today.

How to run a Google Ads incrementality test with Meridian GeoX

Nine steps. None are optional.

Step 1: Choose the campaign

Pick one campaign worth testing. The campaigns where platform ROAS and iROAS diverge most are usually:

Branded search. Most conversions would have happened anyway. iROAS often lands 60-80% below platform ROAS.
Performance Max. Mixes branded search, shopping, YouTube, and Display, so platform attribution is opaque. See the PMax-specific guide.
YouTube and Demand Gen. Long conversion windows, weak last-click attribution. Most likely to be undercredited.
Non-brand search. Cleanest test for whether the channel acquires new customers.

One test answers one question.

Step 2: Pick the test design

Design	What changes	Best for	Risk
Holdback	Pause ads in a subset of regions	Lowest revenue risk	Smaller signal, needs more geos
Go dark	Pause the channel entirely in selected regions	Cleanest signal	Forfeits revenue during the test
Heavy up	Increase spend in selected regions	Testing whether more spend produces more lift	Requires extra budget

Holdback is the default for most DTC brands. Go dark is right when you suspect a channel is barely incremental. Heavy up is right when you've validated a channel and want to test scaling.

Step 3: Select treatment and control markets

This is the single biggest predictor of significance. Across Stella's 225-test benchmark, the 11.6% of tests that didn't reach significance almost always failed on pre-period matching, not sample size.

Good matching means:

Similar revenue baseline in the 90 days before the test
Similar seasonality over the past 12 months
Similar media spend share across all channels
Similar daily conversion volume
Geographic separation (don't pair Manhattan and Brooklyn)

Most teams use DMAs as the geographic unit. Brands with retail footprints use trade areas. Pick what matches how your business runs.
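As a rough illustration of pre-period matching, the sketch below ranks candidate control geos by how closely their daily revenue tracks a treatment geo. It is a minimal sketch, not GeoX's matching method: the geo names, the revenue numbers, and the choice of Pearson correlation plus a baseline ratio are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length daily revenue series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_control_candidates(treatment, candidates):
    """Rank candidate control geos by pre-period correlation with the treatment geo."""
    scored = []
    for geo, series in candidates.items():
        corr = pearson(treatment, series)
        # Ratio of average daily revenue: 1.0 means the same baseline level.
        level_ratio = (sum(series) / len(series)) / (sum(treatment) / len(treatment))
        scored.append((geo, round(corr, 3), round(level_ratio, 2)))
    return sorted(scored, key=lambda row: row[1], reverse=True)


# Hypothetical pre-period daily revenue (7 days shown; use about 90 in practice).
treatment_geo = [100, 120, 110, 150, 170, 210, 190]
candidate_geos = {
    "geo_a": [95, 118, 108, 149, 165, 205, 188],   # tracks treatment closely
    "geo_b": [200, 180, 210, 190, 205, 185, 195],  # different pattern
}
ranking = rank_control_candidates(treatment_geo, candidate_geos)
print(ranking)
```

Correlation only covers the revenue baseline. Seasonality, media mix, and geographic separation from the list above still need their own checks.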

Step 4: Prepare the data

Before launch, you need this dataset joined and clean:

Daily revenue by geo (from Shopify or your CRM, not Google Ads)
Daily spend by geo across every paid channel
Daily orders by geo
Promo calendar with start and end dates
Inventory outages on revenue-driving SKUs
Major site changes during the test window
Campaign IDs and geographic targeting settings
Treatment vs control assignment per geo
Contribution margin

Critical: outcome data has to come from your backend, not Google Ads. Platform conversions are filtered by Google's attribution model, which is the thing you're testing against. Using them as your outcome variable defeats the experiment.
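One way to picture the joined dataset is a panel keyed by date and geo. The sketch below assumes hypothetical row shapes (the `date`, `geo`, `revenue`, `orders`, `spend`, and `channel` keys are invented for illustration) and shows backend revenue joined with spend summed across channels, with unassigned geos reported instead of silently dropped.

```python
def build_geo_panel(revenue_rows, spend_rows, assignment):
    """Join backend revenue with ad spend on (date, geo) and report gaps."""
    spend_by_key = {}
    for row in spend_rows:
        key = (row["date"], row["geo"])
        # Sum spend across every paid channel for that geo and day.
        spend_by_key[key] = spend_by_key.get(key, 0.0) + row["spend"]

    panel, problems = [], []
    for row in revenue_rows:
        group = assignment.get(row["geo"])
        if group is None:
            problems.append(f"{row['geo']}: no treatment/control assignment")
            continue
        key = (row["date"], row["geo"])
        panel.append({
            "date": row["date"],
            "geo": row["geo"],
            "group": group,
            "revenue": row["revenue"],  # backend revenue, not platform conversions
            "orders": row["orders"],
            "spend": spend_by_key.get(key, 0.0),
        })
    return panel, problems


# Hypothetical rows; in practice these come from your store backend and ad platforms.
revenue_rows = [
    {"date": "2026-01-05", "geo": "geo_a", "revenue": 4200.0, "orders": 61},
    {"date": "2026-01-05", "geo": "geo_b", "revenue": 3900.0, "orders": 55},
    {"date": "2026-01-05", "geo": "geo_c", "revenue": 1500.0, "orders": 20},
]
spend_rows = [
    {"date": "2026-01-05", "geo": "geo_a", "channel": "google_ads", "spend": 300.0},
    {"date": "2026-01-05", "geo": "geo_a", "channel": "meta", "spend": 150.0},
    {"date": "2026-01-05", "geo": "geo_b", "channel": "meta", "spend": 140.0},
]
assignment = {"geo_a": "treatment", "geo_b": "control"}

panel, problems = build_geo_panel(revenue_rows, spend_rows, assignment)
print(panel)
print(problems)
```

The promo calendar, inventory outages, and site changes would join onto the same panel by date. They are left out here to keep the sketch short.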

Step 5: Configure Google Ads location settings

This is where most geo holdouts get contaminated.

Google Ads has two location targeting modes:

"Presence or interest" (the default): shows ads to people in, regularly in, OR showing interest in a location. That last category leaks ads into control regions whenever someone searches "best running shoes Seattle" from a control market.
"Presence": shows ads only to people physically located in the targeted regions.

For a geo holdout, you almost always want Presence. See Google's docs on advanced location options for the mechanic.

Other contamination sources:

Search partners and Display Network can override geographic restrictions. Disable both.
PMax campaigns reallocate budget across geographies dynamically. See the PMax guide.
Commuting zones can leak exposure across DMA boundaries. Avoid pairing adjacent markets.
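The contamination checks above can be run as a small pre-launch audit. This is a sketch under assumptions: the `settings` dictionary and its keys are hypothetical stand-ins for whatever you export from the Google Ads UI or API, not real Google Ads field names.

```python
def audit_campaign(settings, control_geos):
    """Return a list of contamination risks found in one campaign's settings."""
    issues = []
    if settings["location_option"] != "PRESENCE":
        issues.append("location option is not Presence")
    if settings["search_partners_enabled"]:
        issues.append("search partners enabled")
    if settings["display_network_enabled"]:
        issues.append("Display Network enabled")
    # Any control geo that is still targeted will receive ads during the test.
    leaked = sorted(set(settings["targeted_geos"]) & set(control_geos))
    if leaked:
        issues.append("control geos still targeted: " + ", ".join(leaked))
    return issues


# Hypothetical campaign export with three problems.
campaign = {
    "location_option": "PRESENCE_OR_INTEREST",
    "search_partners_enabled": True,
    "display_network_enabled": False,
    "targeted_geos": ["geo_a", "geo_b", "geo_c"],
}
issues = audit_campaign(campaign, control_geos=["geo_c", "geo_d"])
for issue in issues:
    print(issue)
```

An empty list is the pass condition. Anything else should block launch.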

Step 6: Run the test

Brand size or channel	Recommended duration
High-volume DTC ($10M+/year)	2 to 4 weeks
Mid-market ($1M to $10M/year)	4 to 6 weeks
YouTube, Demand Gen, upper funnel	Add 2 weeks post-treatment

Run length should be driven by statistical power, not calendar habit.
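To see why duration follows from power, here is a back-of-envelope sketch using a standard normal-approximation sample size formula. It assumes daily treatment-minus-control revenue differences are independent and roughly normal, which real geo data usually violates (weekday cycles, autocorrelation), so treat the output as a floor and not a plan. The inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def days_needed(daily_diff_sd, expected_daily_lift, alpha=0.05, power=0.8):
    """Approximate test days needed to detect a given daily lift.

    daily_diff_sd: standard deviation of the daily treatment-minus-control
        revenue difference, measured in the pre-period.
    expected_daily_lift: incremental revenue per day you hope to detect.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) * daily_diff_sd / expected_daily_lift) ** 2)


# Hypothetical inputs: $3,000 daily noise, hoping to detect $1,600/day of lift.
print(days_needed(3000, 1600))  # 28
# Halving the detectable lift roughly quadruples the required duration.
print(days_needed(3000, 800))   # 111
```

The second call is the important one. Smaller brands and noisier channels need longer tests because the required duration grows with the square of noise over effect.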

Pre-launch QA checklist:

Treatment and control markets are statistically similar in the pre-period
Control regions are fully excluded in campaign settings
Location targeting is set to "Presence"
Other channels are stable (no concurrent test on Meta or TikTok)
Promo changes during the test window are documented
Backend revenue tracking confirmed working by geo

Step 7: Calculate lift and iROAS

Three formulas:

Incremental revenue = Treatment revenue − Expected revenue (what treatment would have earned without the ads)

Expected revenue is where synthetic controls come in. They use a weighted combination of control markets to model what treatment would have done without the campaign. See Stella's guide to weighted synthetic controls.
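A toy version of the idea, assuming only two control markets and a grid search over a single weight. Production synthetic controls fit many markets with constrained optimization. This sketch only shows the mechanic, and every number in it is made up.

```python
def fit_two_market_weight(treatment_pre, control_a_pre, control_b_pre, steps=100):
    """Grid-search the weight w on control A (1 - w on control B) that best
    reproduces treatment revenue in the pre-period."""
    best_w, best_sse = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        sse = sum(
            (t - (w * a + (1 - w) * b)) ** 2
            for t, a, b in zip(treatment_pre, control_a_pre, control_b_pre)
        )
        if sse < best_sse:
            best_w, best_sse = w, sse
    return best_w


def expected_revenue(w, control_a_test, control_b_test):
    """Counterfactual treatment revenue during the test, from weighted controls."""
    return sum(w * a + (1 - w) * b for a, b in zip(control_a_test, control_b_test))


# Hypothetical pre-period daily revenue.
treatment_pre = [130, 131, 150, 148]
control_a_pre = [100, 110, 120, 130]
control_b_pre = [200, 180, 220, 190]
w = fit_two_market_weight(treatment_pre, control_a_pre, control_b_pre)

# Hypothetical test-period revenue.
treatment_test = [150, 152]
expected = expected_revenue(w, [105, 115], [210, 190])
incremental = sum(treatment_test) - expected
print(w, expected, incremental)
```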

iROAS = Incremental revenue / Incremental ad spend

Incremental profit = (Incremental revenue × Contribution margin) − Media cost

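iROAS without contribution margin is a vanity number. A 2x iROAS is profitable at 60% margins (each $1 of spend returns $1.20 in contribution) and unprofitable at 30% (each $1 returns $0.60). Break-even for a 2x iROAS sits at a 50% margin. The decision is profit, not the ratio.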
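The three formulas, worked through in a short sketch. The figures are hypothetical, and the function treats "media cost" as the incremental ad spend for the test, which is an assumption about how the two terms line up.

```python
def incrementality_summary(treatment_revenue, expected_revenue,
                           incremental_spend, contribution_margin):
    """Apply the three formulas: incremental revenue, iROAS, incremental profit."""
    incremental_revenue = treatment_revenue - expected_revenue
    iroas = incremental_revenue / incremental_spend
    # Assumption: media cost equals the incremental ad spend for the test.
    incremental_profit = incremental_revenue * contribution_margin - incremental_spend
    return {
        "incremental_revenue": incremental_revenue,
        "iroas": iroas,
        "incremental_profit": incremental_profit,
    }


# Same test result, two different margin structures.
high_margin = incrementality_summary(500_000, 400_000, 50_000, contribution_margin=0.60)
low_margin = incrementality_summary(500_000, 400_000, 50_000, contribution_margin=0.30)
print(high_margin)
print(low_margin)
```

Both calls return a 2x iROAS. Only the profit line tells them apart.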

Step 8: Feed the result into Meridian MMM

This is what GeoX changes most. The output of a GeoX test becomes a Bayesian prior in Meridian MMM, which means the model learns from your real experiment instead of guessing from correlation.
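One common way to turn an experiment result into a prior is to fit a distribution to the test's confidence interval. The sketch below fits a lognormal, which is an assumption chosen because ROI is positive, and does not claim to be how GeoX or Meridian does the conversion. Passing the resulting parameters into Meridian is not shown, so check the Meridian documentation for the prior interface.

```python
from math import exp, log
from statistics import NormalDist


def lognormal_prior_from_interval(ci_low, ci_high, level=0.95):
    """Return (mu, sigma) of a lognormal whose central interval matches the
    experiment's iROAS confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    mu = (log(ci_low) + log(ci_high)) / 2
    sigma = (log(ci_high) - log(ci_low)) / (2 * z)
    return mu, sigma


# Hypothetical test result: iROAS 95% interval of 1.5x to 3.5x.
mu, sigma = lognormal_prior_from_interval(1.5, 3.5)
prior_median = exp(mu)
print(round(mu, 4), round(sigma, 4), round(prior_median, 2))
```

A tight interval from a well-powered test gives a small sigma, so the prior pulls the model strongly toward the experiment. A noisy test gives a wide prior that the MMM can override.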

Figure: How calibration changes MMM accuracy (Stella MMM forward-test accuracy, with and without iROAS calibration). Feeding real experiment results into the MMM as priors improves forward-test accuracy by 8 percentage points.

Source: Stella MMM benchmark. Forward tests measured on holdout periods the model has never seen, across Stella's customer base.

In plain English, the MMM stops pattern-matching on historical data and starts anchoring its estimates to causal evidence. Stella's MMM runs 87% accurate on average in forward tests, and 95% when calibrated with iROAS from real experiments.

The eight-point gain matters because budget decisions live downstream of the model. An MMM at 87% misallocates a meaningful share of every quarter's spend. An MMM at 95% misallocates less.

Most teams don't have a single causal experiment to calibrate their MMM with. GeoX makes that gap embarrassing instead of invisible. For more, see Getting Started with MMM Using Google Meridian.

Step 9: Make the budget decision

The test should lead to an action.

Result	Action
High platform ROAS + low iROAS	Cut or cap spend. The platform is taking credit for conversions that would have happened anyway.
Low platform ROAS + high iROAS	Scale. This is an undercredited growth channel and you've been under-investing.
High iROAS + high volume	Scale aggressively, then re-test at the higher spend level to check for diminishing returns.
Low iROAS + low volume	Reduce or kill. The channel isn't creating demand at scale.
Inconclusive	Rerun with better market design, longer duration, or a more sensitive KPI.
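The table can be compressed into a simple rule, sketched below. The thresholds are assumptions, not part of the table: "low iROAS" is read as below break-even (1 divided by contribution margin, following the profit formula in Step 7), and the volume dimension is left out to keep the example short.

```python
def recommend_action(platform_roas, iroas, significant, contribution_margin):
    """Map a test result to an action, using break-even iROAS as the cut line."""
    if not significant:
        return "rerun with a better design"
    break_even_iroas = 1 / contribution_margin  # below this, spend loses money
    if iroas < break_even_iroas:
        return "cut or cap spend"
    if iroas > platform_roas:
        return "scale: channel is undercredited"
    return "scale, then re-test at the higher spend level"


# Hypothetical results at a 50% contribution margin (break-even iROAS of 2.0).
print(recommend_action(6.0, 1.2, True, 0.5))
print(recommend_action(1.5, 3.0, True, 0.5))
print(recommend_action(4.0, 3.0, True, 0.5))
print(recommend_action(4.0, 3.0, False, 0.5))
```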

The worst outcome is running the test, sharing the result, and changing nothing.

GeoX vs Conversion Lift vs manual geo holdout

	Google Conversion Lift (Geo)	Manual or Platform Geo Holdout	Meridian GeoX
Who runs it	Google, via a rep	You or your vendor	You, with open-source code
Cost	Free, eligibility-gated	Variable	Free in license
Channels	Google Ads only	Any	Any (publisher-agnostic)
Conversion source	Google's tag, Firebase, DV360	Backend revenue	Whatever you feed it
Methodology	Black box	Vendor-dependent	Open source, auditable
MMM integration

Conversion Lift is fine for "is this Google Ads campaign incremental inside Google." It also has the structural awkwardness of Google grading Google. You wouldn't accept that arrangement from any other publisher.

For everything else, you want the open framework.

What does GeoX actually cost?

Free in license, expensive in labor. To run GeoX in production, a brand needs:

Clean data pipelines joining Google Ads spend, Shopify revenue, and other channels into a single geographic dataset
An analyst who can configure the model, interpret Bayesian posteriors, and translate the result for a CFO
Pre-test design discipline, because region selection is the strongest predictor of significance
Contamination monitoring, especially around Google Ads location defaults
Maintenance, because the codebase will evolve on GitHub

If a brand has a measurement team, GeoX is great. If "measurement team" is one performance marketer with a Looker license, the math goes the other way. Stella's Incrementality product handles design, contamination checks, and synthetic control matching, and the MMM runs at $3,000 per month flat instead of $15K-$80K consulting fees.

GeoX makes self-hosting cheaper. It doesn't make self-hosting easy.

What Stella has learned from 225 geo incrementality tests

Five patterns that repeat across the 225-test benchmark:

Region selection is the biggest predictor of significance. The 88.4% that reach significance almost always had clean pre-period matching.
Branded search and PMax show the widest platform-ROAS-to-iROAS gap. In both, platform ROAS routinely overstates true incremental return, with iROAS landing 60-80% below the platform number.
Geo contamination is the most common execution failure. Default Google Ads location targeting leaks ads into control markets.
Backend revenue beats platform-reported conversions every time. Platform conversions share the attribution bias the test is designed to measure against.
Tests become more useful when stored as priors. A single test calibrates one quarter. A library of tests calibrates an MMM permanently. This is the GeoX promise made concrete.

Common mistakes to avoid

Using platform-reported conversions as the outcome
Picking markets based on convenience instead of statistical similarity
Letting Google location targeting leak into control markets
Running the test too short
Ignoring contribution margin
Treating inconclusive as failed (inconclusive means redesign, not abandon)

Frequently asked questions

When will Meridian GeoX be available to everyone?

Google announced GeoX on May 5, 2026 and confirmed testing begins later in 2026. Google hasn't published a launch date. Sign up for updates on the official Meridian GeoX page.

Is Meridian GeoX free to use?

Yes, the code is open source under Google's Meridian project on GitHub. The license costs nothing. The engineering, analyst time, and data infrastructure required to run it in production is where the real cost lives.

Can GeoX measure channels other than Google Ads?

Yes. GeoX is publisher-agnostic. You can test Meta, TikTok, YouTube, podcasts, CTV, or offline media, as long as you can change media exposure by geography and pull a clean outcome signal.

How long should a Google Ads geo holdout run?

Two to four weeks for high-volume DTC brands. Four to six weeks for mid-market. Add a two-week post-treatment window for YouTube or any upper-funnel test. For setup detail, see the Google Ads incrementality guide.

Does GeoX replace MMM?

No. GeoX is the geographic experimentation layer. Meridian is the MMM. GeoX feeds causal evidence into the MMM as priors. Experiments validate causality, MMM allocates budget. You need both.

The bottom line

GeoX is the most important measurement announcement Google has made in years, and the most overrated.

It formalizes a methodology good measurement vendors have run for years, makes the open code easier to use, and gives CFOs a Google-branded reason to trust geographic experimentation.

It doesn't pick your markets, manage contamination, join your backend revenue, or interpret your results. Those are still the hard parts.

The brands that get value from GeoX when it ships are the brands running geo holdouts today.

Run your first Google Ads geo holdout before GeoX ships. Start a 7-day Stella trial and we'll match your markets, run the test, and have results ready before your next budget review.