A holdout group is a segment of a target audience deliberately withheld from a marketing campaign so that advertisers can measure the true incremental impact of that campaign. By comparing outcomes between the exposed group and the unexposed holdout, marketers isolate what their spending actually caused versus what would have happened anyway.
Without a holdout group, there is no reliable way to separate campaign-driven conversions from organic ones. Someone who would have purchased regardless of seeing an ad looks identical in the data to someone the ad genuinely persuaded. Holdout groups resolve this by preserving a clean baseline.
How a Holdout Group Works
Before a campaign launches, marketers pull a random sample of the eligible audience and exclude them from all campaign exposure. The remaining audience receives the campaign as normal. After the campaign period ends, marketers compare conversion rates, revenue, and other KPIs across the two groups.
The randomization step is critical. If the holdout group differs systematically from the exposed group (for example, skewing toward lower-intent users), the comparison produces a distorted read. True random assignment ensures the only meaningful difference between groups is whether they saw the campaign.
Holdout sizes typically range from 5% to 20% of the target audience. Smaller holdouts reduce opportunity cost but require larger overall audience pools to reach statistical significance. A 10% holdout is a common starting point for mid-scale campaigns.
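The split described above can be sketched in a few lines of Python. Here `user_ids` is a hypothetical list of eligible audience members; the fixed seed is an illustrative choice that keeps the assignment reproducible across runs.

```python
import random

def split_holdout(user_ids, holdout_fraction=0.10, seed=42):
    """Randomly assign users to an exposed group and a holdout group."""
    rng = random.Random(seed)          # fixed seed -> reproducible assignment
    shuffled = list(user_ids)
    rng.shuffle(shuffled)              # random order removes systematic bias
    cutoff = int(len(shuffled) * holdout_fraction)
    holdout = set(shuffled[:cutoff])   # withheld from all campaign exposure
    exposed = set(shuffled[cutoff:])   # receives the campaign as normal
    return exposed, holdout

exposed, holdout = split_holdout(range(100_000), holdout_fraction=0.10)
print(len(exposed), len(holdout))  # 90000 10000
```

Because assignment happens before launch and is purely random, any later difference in conversion rate between the two sets can be attributed to campaign exposure rather than audience composition.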
Incrementality Formula
The core metrics derived from a holdout test are incremental conversions, incremental lift, and incremental return on ad spend (iROAS):
| Metric | Formula |
|---|---|
| Incremental Conversions | Conversions (Exposed) − Conversions (Holdout) |
| Incremental Lift (%) | [(CVR Exposed − CVR Holdout) / CVR Holdout] × 100 |
| iROAS | Incremental Revenue / Ad Spend |
Example: A retailer runs a retargeting campaign. The exposed group converts at 4.2% and the holdout group at 3.1%. Incremental lift is [(4.2 − 3.1) / 3.1] × 100 = 35.5%. Without the holdout, the 3.1% baseline, conversions that would have happened anyway, would have been credited entirely to the campaign.
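The formulas in the table map directly to code. The sketch below reuses the retailer example (4.2% exposed CVR vs. 3.1% holdout CVR); the revenue and spend figures passed to `iroas` are hypothetical, added only to illustrate the calculation.

```python
def incremental_lift(cvr_exposed, cvr_holdout):
    """Incremental Lift (%) = [(CVR Exposed - CVR Holdout) / CVR Holdout] x 100."""
    return (cvr_exposed - cvr_holdout) / cvr_holdout * 100

def iroas(incremental_revenue, ad_spend):
    """Incremental return on ad spend."""
    return incremental_revenue / ad_spend

lift = incremental_lift(0.042, 0.031)
print(f"{lift:.1f}% incremental lift")  # 35.5% incremental lift

# Hypothetical revenue and spend figures, for illustration only:
print(iroas(incremental_revenue=55_000, ad_spend=20_000))  # 2.75
```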
Holdout Groups vs. A/B Testing
Holdout groups and A/B testing are related but answer different questions. An A/B test compares two versions of a creative, message, or offer. A holdout test compares campaign exposure against no exposure at all. A/B tests optimize; holdout tests validate whether the campaign is worth running in the first place.
The two methods are often run simultaneously. A campaign might A/B test two ad creatives across the exposed population while maintaining a holdout of unexposed users to measure overall incrementality.
Real-World Application
Meta’s Conversion Lift product, widely used by direct-to-consumer brands, builds holdout groups natively within its ad auction. Advertisers using this tool have repeatedly found that 20% to 40% of conversions attributed by last-click models were not incremental. They would have occurred without the ad.
Airbnb’s growth team, led by former VP of Engineering Mike Curtis, documented a similar finding when auditing its paid social spend in 2019. Retargeting campaigns showed strong attributed performance in standard reporting, but holdout tests revealed that true incremental conversions were a small fraction of what attribution reported. The team reallocated budget accordingly, reducing retargeting spend and reinvesting in upper-funnel channels with higher measured incrementality.
Peloton used holdout methodology during its 2020 demand surge to separate organic demand from paid campaign contribution. With massive organic tailwinds from lockdowns, accurate attribution required clean holdout comparisons to prevent over-investing in campaigns that were effectively riding the organic wave.
Setting Up a Valid Holdout Test
- Define the audience pool first. Segment before splitting. If the campaign targets cart abandoners in the last 30 days, pull that full list, then randomize into test and holdout.
- Randomize at the user level, not the session level. Session-level splitting creates contamination risk if the same user has multiple sessions.
- Suppress, do not just exclude. The holdout group should be actively prevented from seeing campaign ads, not just deprioritized in delivery. Most DSPs and paid social platforms offer suppression list functionality for this purpose.
- Pre-register your success metrics. Decide what constitutes lift before the test begins to avoid post-hoc metric selection.
- Run long enough to reach statistical significance. Short tests are vulnerable to day-of-week effects and noise. Two to four weeks is a common minimum for most mid-funnel campaigns.
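A standard way to check whether the gap between exposed and holdout conversion rates clears statistical significance is a two-proportion z-test. A minimal sketch using only the Python standard library, with the earlier example's rates (4.2% vs. 3.1%) applied to hypothetical group sizes from a 90/10 split:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)               # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))            # two-sided p-value
    return z, p_value

# Hypothetical group sizes; conversion rates match the retailer example.
z, p = two_proportion_z_test(conv_a=3_780, n_a=90_000, conv_b=310, n_b=10_000)
print(round(z, 2), p < 0.05)
```

With a p-value below the conventional 0.05 threshold, the observed lift is unlikely to be noise; with smaller audiences the same rate gap might not reach significance, which is why short or underpowered tests are unreliable.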
Common Mistakes
Audience contamination is the most frequent error. If holdout users are reached through another channel running simultaneously (email, organic social, influencer), the holdout is no longer clean. Marketers need to account for cross-channel exposure when designing the test.
Marketers also frequently confuse platform-level holdouts with cross-platform holdouts. A holdout built inside Meta measures Meta’s incrementality. It does not account for what those users see on Google, CTV, or elsewhere. Full incrementality measurement often requires a geo-based holdout, where entire markets are suppressed from a campaign rather than individual users.
Over-relying on short-term conversion windows can understate incrementality for high-consideration purchases. A holdout test for a mattress brand measured over 7 days will capture far less true lift than one measured over 30 to 60 days, given the category’s typical purchase cycle.
Connection to Marketing Attribution
Holdout groups are a foundational input for marketing attribution models that aim to measure true causal impact rather than correlation. Data-driven attribution models trained without holdout validation tend to overfit to channels that touch users near conversion, systematically undervaluing upper-funnel and brand activity.
Many measurement vendors now combine media mix modeling (MMM) with holdout-calibrated incrementality tests, using the holdout results to ground-truth the MMM outputs. This hybrid approach has become a standard recommendation from platforms including Google and Meta for advertisers spending above $1M annually on digital.
For marketers building toward more rigorous attribution, holdout groups are one of the few tools that provide a causal rather than correlational answer to the question: did this campaign actually drive results?
Frequently Asked Questions About Holdout Groups
What is a holdout group in marketing?
A holdout group is a segment of the target audience kept out of a marketing campaign to serve as a clean baseline. By comparing outcomes between users who saw the campaign and those who did not, marketers can measure the true incremental lift the campaign produced, separate from conversions that would have happened organically.
How large should a holdout group be?
Holdout groups typically run between 5% and 20% of the target audience. A 10% holdout is the most common starting point for mid-scale campaigns. Smaller holdouts reduce the opportunity cost of withholding audience but require a larger total audience pool to reach statistical significance.
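The trade-off described above, smaller holdouts needing larger total audiences, can be made concrete with the standard two-proportion sample-size approximation. A sketch, assuming a 3.1% baseline conversion rate, a target of detecting a lift to 4.2%, 5% significance, and 80% power (all illustrative numbers):

```python
from statistics import NormalDist

def required_per_group(p_base, p_lifted, alpha=0.05, power=0.80):
    """Approximate users needed in EACH group to detect p_base -> p_lifted."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_base * (1 - p_base) + p_lifted * (1 - p_lifted)
    return int((z_alpha + z_beta) ** 2 * variance / (p_base - p_lifted) ** 2) + 1

n = required_per_group(0.031, 0.042)
print(n)
```

This formula assumes equal group sizes; with an unbalanced split such as a 10% holdout, the smaller group is the binding constraint, so the total audience pool must grow until the holdout alone reaches the required size.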
What is the difference between a holdout group and A/B testing?
A/B testing compares two versions of a creative, message, or offer against each other. A holdout test compares campaign exposure against no exposure at all. A/B tests optimize what the campaign says; holdout tests validate whether the campaign should run in the first place.
How do you prevent holdout group contamination?
Contamination happens when holdout users are reached through another active channel, such as email, organic social, or influencer activity, during the test period. To prevent it, suppress holdout users across all active paid channels, account for cross-channel exposure in the test design, and randomize at the user level rather than the session level.
What is a good incremental lift result from a holdout test?
Incremental lift varies by channel, funnel stage, and category. Retargeting campaigns often show 20% to 40% lower incrementality than last-click attribution suggests, per Meta’s Conversion Lift data. Prospecting and upper-funnel campaigns frequently show higher true lift because users in those pools would not have converted organically. There is no universal benchmark; the useful comparison is against your own prior tests, not an industry average.
