What is a holdout test?

Why it matters

Platform dashboards rise and fall for reasons unrelated to your signal change: seasonality, creative refresh, audience exhaustion, or attribution window shifts. Without a holdout, teams often credit pLTV or value-based bidding for lifts that merely coincided with the change rather than resulted from it.

Holdouts matter most when economic value is delayed. You might see platform ROAS improve in week two while cohort LTV at D90 shows no quality gain. A holdout tied to an agreed maturity window separates signal impact from noise.

Performance marketing owns the test design upfront: which campaigns, what split, how long to run, and what success means for finance. Skipping holdout design is a common reason pLTV pilots fail to get budget renewal.

Holdout test

Holdout tests are how teams prove pLTV and value signals work in production:

Baseline: Document the business-as-usual (BAU) conversion setup (event type, value field, timing).
Treatment: Enable user-level pLTV from first-party data in your data warehouse; Churney sends values directly to ad networks for the test cell only.
Holdout: Withhold the new value signal (or send BAU values) for a matched control segment with sufficient volume.
Platform learning: Allow the learning phase to finish and signal volume thresholds to be met before judging delivery shifts.
Readout: Compare incremental ROAS, conversion volume, and cohort LTV by cell at pre-agreed maturity.

A clean holdout answers: "Did pLTV change who we acquired and how much they were worth?" Platform-reported ROAS alone cannot.
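
The readout step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the CellReadout fields, the relative_lift helper, and all numbers are hypothetical, and it assumes both cells spend budget so that cells are compared on cohort revenue at the agreed maturity rather than on platform-reported ROAS.

```python
from dataclasses import dataclass


@dataclass
class CellReadout:
    """Per-cell totals at the pre-agreed maturity window (hypothetical fields)."""
    spend: float            # ad spend in the cell over the test window
    conversions: int        # customers acquired in the cell
    cohort_revenue: float   # realized revenue of that cohort at maturity (e.g. D90)

    @property
    def cpa(self) -> float:
        return self.spend / self.conversions

    @property
    def cohort_roas(self) -> float:
        return self.cohort_revenue / self.spend

    @property
    def ltv_per_customer(self) -> float:
        return self.cohort_revenue / self.conversions


def relative_lift(treatment: float, control: float) -> float:
    """Relative difference of treatment over control (0.10 means +10%)."""
    return treatment / control - 1.0


# Illustrative numbers only, not results from any real test.
treatment = CellReadout(spend=50_000.0, conversions=1_000, cohort_revenue=120_000.0)
holdout = CellReadout(spend=50_000.0, conversions=1_100, cohort_revenue=110_000.0)

roas_lift = relative_lift(treatment.cohort_roas, holdout.cohort_roas)
ltv_lift = relative_lift(treatment.ltv_per_customer, holdout.ltv_per_customer)
cpa_change = relative_lift(treatment.cpa, holdout.cpa)

print(f"Cohort ROAS lift:  {roas_lift:+.1%}")
print(f"LTV per customer:  {ltv_lift:+.1%}")
print(f"CPA change:        {cpa_change:+.1%}")
```

In this made-up case the treatment cell acquires fewer customers at a higher CPA, yet each customer is worth more at maturity. That is the pattern a platform dashboard alone would tend to misread.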

Category variants

Model	How holdout tests show up
Ecommerce / DTC	Split by campaign or geo; compare repurchase and net revenue cohorts with vs without pLTV value events.
Subscription app	Holdout on trial-start campaigns; compare trial-to-paid and renewal at maturity, not just install CPA.
SaaS / PLG	Split on paid signup campaigns; compare expansion and retention with longer readout windows than ecommerce.

Common mistakes

No true holdout. Everyone gets the new signal; you only compare before/after time periods.
Contaminated control. Holdout users still receive treatment via overlapping campaigns or broad targeting.
Stopping early. Ending the test once the platform learning phase finishes, before cohorts reach maturity, misses delayed value.
Underpowered splits. Too little volume in either cell produces inconclusive readouts.
Moving goalposts. Changing success metrics mid-test after seeing interim dashboards.
Ignoring calibration. Treatment lifts platform metrics, but predicted values do not rank-order users by realized LTV.
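
One way to check the last point is a rank correlation between predicted and realized values for a matured cohort. The sketch below uses Spearman correlation, which is one reasonable choice rather than the only one; the function names and the sample values are hypothetical, and it uses only the standard library.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted, realized):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(realized))


# Illustrative values only: predicted pLTV vs realized LTV at maturity.
predicted_ltv = [10.0, 20.0, 30.0, 40.0, 50.0]
realized_ltv = [5.0, 25.0, 20.0, 60.0, 80.0]

rho = spearman(predicted_ltv, realized_ltv)
print(f"Spearman rank correlation: {rho:.2f}")
```

A value near 1 suggests the predictions order users roughly as realized LTV does. A value near 0 suggests the signal is not ranking customer quality, whatever the platform metrics show. What counts as acceptable is something to agree on before launch.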

Advertiser lens

Role	What they ask	What good looks like
Head of Performance / UA	Can we run this without killing volume?	Pre-registered split, minimum volume plan, and BAU fallback documented.
VP Growth / CMO	What proves this is worth scaling?	Holdout readout on incremental ROAS and customer quality at agreed maturity.
Marketing Analytics / Data Science	Is the design valid?	Power check, leakage audit, and analysis plan signed before launch.
Data Engineering	Can we route signals by cell?	Clear mapping of test vs holdout campaigns and monitoring for mis-routing.
Finance / Procurement	What triggers payment or renewal?	Success criteria tied to holdout outcomes, not platform dashboards alone.
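
The power check that analytics signs off on can start from a standard two-proportion sample size approximation. The sketch below is an assumption-laden starting point: users_per_cell is a hypothetical helper, the outcome is taken to be a binary rate such as trial-to-paid, and users are treated as independent. Campaign or geo splits have fewer effective units, so read the result as a lower bound.

```python
from math import ceil
from statistics import NormalDist


def users_per_cell(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed in each cell to detect a relative lift
    in a conversion rate with a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Illustrative inputs: 10% baseline trial-to-paid rate, 10% relative lift.
needed = users_per_cell(baseline_rate=0.10, relative_lift=0.10)
print(f"Users needed per cell: {needed}")
```

Small expected lifts on low baseline rates push the required volume up quickly, which is why the minimum volume plan belongs in the pre-launch design rather than in the post-mortem.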

FAQ

What is a holdout test?

A holdout test withholds a new treatment from a control group while applying it to a test group, then compares outcomes to estimate the causal lift relative to what would have happened under BAU.

Why use a holdout instead of before/after comparison?

Before/after is vulnerable to seasonality, budget changes, and creative cycles. A simultaneous holdout isolates the effect of the treatment if the groups are comparable and cleanly separated.
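
A small worked example makes the difference concrete. All numbers are invented: assume a seasonal uplift of 20% that hits both cells and a true treatment effect of 5%.

```python
# Invented inputs for illustration only.
seasonal_uplift = 0.20   # affects treatment and holdout alike
true_effect = 0.05       # what the new signal actually contributes
baseline = 100.0         # weekly cohort revenue per cell before the test

pre_treatment = baseline
post_treatment = baseline * (1 + seasonal_uplift) * (1 + true_effect)
post_holdout = baseline * (1 + seasonal_uplift)

# Before/after compares the treatment cell with its own past,
# so seasonality is folded into the estimate.
before_after_lift = post_treatment / pre_treatment - 1

# The simultaneous holdout compares cells over the same period,
# so the shared seasonal uplift cancels out.
holdout_lift = post_treatment / post_holdout - 1

print(f"Before/after estimate: {before_after_lift:+.0%}")
print(f"Holdout estimate:      {holdout_lift:+.0%}")
```

The before/after reading overstates the effect because it cannot tell the season from the signal. The holdout reading recovers the assumed 5% only because both cells experienced the same season, which is what "comparable and cleanly separated" buys you.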

What do you hold out in a pLTV pilot?

Typically the pLTV value signal (or enhanced value events) is withheld on the control side, while treatment receives calibrated predicted values via the Meta Conversions API (CAPI), Google Ads conversion uploads, or equivalent paths. BAU conversion events often remain on both sides.

How long should a holdout run?

Long enough for platform learning, signal volume stability, and your agreed cohort maturity window. Separate "signal live" from "experiment readout complete."
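
The gap between the two dates is easy to compute up front. The sketch below is one hypothetical way to lay out the timeline: readout_dates and its parameters are illustrative names, and it assumes acquisitions during the learning phase are excluded from the measured cohort, which is a design choice rather than a rule.

```python
from datetime import date, timedelta


def readout_dates(signal_live, learning_days, acquisition_days, maturity_days):
    """Return (cohort_start, cohort_end, readout_complete).

    Assumes the measured cohort starts after the learning phase and that
    the last acquired user must reach full maturity before readout.
    """
    cohort_start = signal_live + timedelta(days=learning_days)
    # acquisition_days counts cohort_start itself, hence the minus one.
    cohort_end = cohort_start + timedelta(days=acquisition_days - 1)
    readout_complete = cohort_end + timedelta(days=maturity_days)
    return cohort_start, cohort_end, readout_complete


# Illustrative plan: 14-day learning phase, 28-day cohort, D90 maturity.
start, end, readout = readout_dates(
    signal_live=date(2025, 1, 6),
    learning_days=14,
    acquisition_days=28,
    maturity_days=90,
)
print(f"Signal live:       {date(2025, 1, 6)}")
print(f"Cohort window:     {start} to {end}")
print(f"Readout complete:  {readout}")
```

Under these assumptions the readout lands more than four months after the signal goes live, a date worth sharing with finance before the pilot starts.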

Can you hold out at user level on Meta or Google?

Some ad platforms offer conversion lift studies that randomly withhold ads from a control audience, but eligibility and campaign-type coverage vary by account. Custom pLTV signal holdouts, which withhold value events on control campaigns, usually require campaign, geo, or audience splits that you control. Geo holdouts remain common when you need a cross-channel readout.
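
For splits you control, a deterministic hash keeps assignment stable and auditable. The sketch below is one possible approach, not a platform feature: assign_cell, the salt value, and the geo codes are all hypothetical. With only a handful of geos, a hash split can land on unbalanced cells, so matched pairs or a manual balance check may be the better choice.

```python
import hashlib


def assign_cell(unit_id, salt="pltv-holdout-v1", treatment_share=0.5):
    """Deterministically map a geo, campaign, or audience ID to a cell.

    The salt is a hypothetical experiment identifier; changing it
    reshuffles assignments for a new test.
    """
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform-ish value in [0, 1)
    return "treatment" if bucket < treatment_share else "holdout"


# Illustrative geo codes only.
geos = ["US-CA", "US-TX", "US-NY", "US-FL", "US-WA", "US-IL"]
assignment = {geo: assign_cell(geo) for geo in geos}

for geo, cell in assignment.items():
    print(f"{geo}: {cell}")
```

Because the mapping depends only on the ID and the salt, data engineering can recompute it at any time to audit whether value events were routed to the intended cell.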

What metrics should a holdout track?

Incremental ROAS, conversion volume, cost per acquisition, and cohort LTV or margin at maturity. Platform ROAS is supplementary, not sufficient.

What if the holdout shows no lift?

That is a valid outcome. It may indicate calibration issues, insufficient volume, the wrong campaigns, or that BAU was already near optimal. Document what you learned and fix the signal design before scaling.

Not the same as

Term	Difference
A/B test (creative)	Creative A/B tests copy or assets; holdout tests often withhold a measurement or value signal.
Conversion lift study	Platform-run lift studies measure the effect of ad exposure; the holdout here measures the effect of your value signal or bidding change.
Geo experiment	Geo holdouts use geography as the unit; campaign holdouts use traffic or campaign splits.
BAU comparison	BAU is the control definition; holdout is the experimental method that enforces it.
