Data readiness

Updated June 12, 2026

Why it matters

pLTV and value-based bidding fail quietly when data is almost good enough. A model trained on incomplete refunds learns the wrong economics. Missing platform-specific click identifiers hurt match quality on the network that needs them: gclid (or Google enhanced-conversion keys) for Google, fbc/fbp for Meta CAPI. Absent fbc on non-Meta traffic is normal; missing hashed email or phone on purchase events is often the bigger drag on Event Match Quality (EMQ). Stale nightly batches create signal freshness problems that look like "the model doesn't work."

Teams often jump to campaign settings before auditing readiness. The result is weeks of platform learning on noisy or biased inputs, followed by inconclusive holdouts and loss of stakeholder trust.

Data readiness is the gate between "we want pLTV" and "we can ship pLTV." Marketing analytics, data engineering, and performance marketing need a shared checklist before pilot kickoff.

Data readiness

Data readiness is the upstream prerequisite for the full activation chain:

  1. Inventory: Map events, revenue definitions, IDs, and attribution fields in your data warehouse (typically 3–12 months of history for modeling).
  2. Hygiene: Enforce append-only data, stable user ID, refund timing, and subscription state logic aligned to finance.
  3. Modeling: Train user-level pLTV only after leakage checks and anchor-event coverage meet thresholds.
  4. Signal design: Calibrate value scale, timing, and volume for platform learning.
  5. Activation: Churney reads from your data warehouse and sends values directly to ad networks via Meta CAPI, Google Ads Conversion API, or app measurement paths; monitor match rate and freshness in pilot.

Skipping readiness steps produces models that rank poorly, signals that do not match users, and experiments that cannot be defended to finance.
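
As an illustration, the inventory and hygiene steps above could be summarized in a small automated gate that runs before pilot kickoff. This is a minimal sketch, not Churney's checklist: the `ReadinessInputs` fields, the `readiness_report` function, and every threshold are hypothetical and should be replaced with values agreed between marketing and data teams.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ReadinessInputs:
    """Hypothetical summary statistics pulled from the data warehouse."""
    first_event: date
    last_event: date
    users_total: int
    users_with_stable_id: int
    anchor_events_total: int
    anchor_events_with_click_id: int


def _share(part: int, whole: int) -> float:
    # Avoid division by zero when a table is empty.
    return part / whole if whole else 0.0


def readiness_report(
    inp: ReadinessInputs,
    min_history_days: int = 90,           # roughly 3 months, the low end of the range above
    min_id_coverage: float = 0.95,        # assumed threshold, not a platform requirement
    min_click_id_coverage: float = 0.50,  # assumed threshold, not a platform requirement
) -> dict:
    """Return per-check pass/fail flags plus an overall 'ready' flag."""
    history_days = (inp.last_event - inp.first_event).days
    checks = {
        "history": history_days >= min_history_days,
        "stable_id": _share(inp.users_with_stable_id, inp.users_total) >= min_id_coverage,
        "click_id": _share(inp.anchor_events_with_click_id, inp.anchor_events_total)
        >= min_click_id_coverage,
    }
    return {"checks": checks, "ready": all(checks.values())}
```

A report like this gives each team one shared artifact to review: a failed `history` check points to inventory, while a failed `stable_id` or `click_id` check points to hygiene work before any modeling starts.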

Category variants

Model | Readiness priorities
Ecommerce / DTC | Order and refund events, net revenue definition, repeat purchase history, click IDs on web sessions.
Subscription app | Install and trial events, trial-to-paid transitions, renewal and churn, MMP postbacks plus data warehouse truth for revenue.
SaaS / PLG | Account-level IDs, product usage events, expansion revenue, longer sales cycles reflected in training windows.

Common mistakes

  1. Starting with ad platform setup before ID audit. Match rate fails before the model is tested.
  2. Inconsistent user ID across web and app. Breaks joins and duplicates customers in training data.
  3. Batch-only updates with no SLA. Models and live signals drift from operational reality.
  4. Training on gross revenue, activating on net. Calibration and finance readout disagree.
  5. Missing attribution fields at anchor event. Cannot connect pLTV scores to the ad touch that should receive credit.
  6. MMP-only for hybrid businesses. App postbacks complement but rarely replace data warehouse history for web and cross-device journeys.
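
Mistake 4 is easier to avoid when the training label and the activated value come from the same net-revenue function. The sketch below assumes a simplified schema, with orders as `(user_id, order_id, amount)` tuples and refunds as `(order_id, amount)` tuples; the function name and schema are illustrative, and the real definition of net revenue should come from finance.

```python
from collections import defaultdict


def net_revenue_by_user(orders, refunds):
    """Sum order amounts per user after subtracting refunds on each order.

    orders:  iterable of (user_id, order_id, amount)
    refunds: iterable of (order_id, amount); an order may have several refunds
    """
    refunded = defaultdict(float)
    for order_id, amount in refunds:
        refunded[order_id] += amount

    net = defaultdict(float)
    for user_id, order_id, amount in orders:
        # Orders without refunds subtract 0.0.
        net[user_id] += amount - refunded.get(order_id, 0.0)
    return dict(net)
```

Using one function for both the model label and the value sent to the platform keeps calibration and the finance readout on the same basis.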

Advertiser lens

Role | What they ask | What good looks like
Head of Performance / UA | Are we ready to pilot pLTV? | Signed checklist: IDs, volume, anchor event, and campaign consolidation plan.
VP Growth / CMO | What blocks go-live? | Dated readiness milestones with owners across marketing and data.
Marketing Analytics / Data Science | Is training data valid? | Leakage audit, label definition doc, and maturity window for evaluation.
Data Engineering | What do we pipe and when? | Daily append-only feeds, monitored pipelines, and identifier map documented.
Finance / Procurement | Is revenue data auditable? | Single source of truth for net revenue and refunds aligned to pilot success metrics.

FAQ

What is data readiness for pLTV?

Data readiness means your event, revenue, and identity data in the data warehouse are complete and reliable enough to train pLTV models and send calibrated value events to ad platforms with acceptable match rates and freshness.

How much history do you need?

Often 3–12 months of user-level events and revenue, depending on repeat cycles, subscription length, and seasonality. Shorter history can work for simple ecommerce; subscription and SaaS usually need more.

What identifiers matter most?

A stable user ID internal to your business, plus ad platform identifiers where available (for example GCLID, fbc/fbp, hashed email or phone for CAPI). Gaps directly affect match rate and model labels.
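
Hashed email and phone only match when they are normalized the same way before hashing. The sketch below shows a common pattern: trim and lowercase the email, reduce the phone to digits, then apply SHA-256. The helper names are illustrative, and the exact normalization rules (for example, country code or leading-zero handling for phone numbers) are an assumption here and should be checked against each platform's current specification.

```python
import hashlib
import re


def normalize_email(email: str) -> str:
    # Trim surrounding whitespace and lowercase before hashing.
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    # Keep digits only; assumes the input already carries its country code.
    return re.sub(r"\D", "", phone)


def sha256_hex(value: str) -> str:
    # Hex-encoded SHA-256 digest of the UTF-8 bytes.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def hashed_identifiers(email: str, phone: str) -> dict:
    """Return hashed identifiers under hypothetical key names."""
    return {
        "email_sha256": sha256_hex(normalize_email(email)),
        "phone_sha256": sha256_hex(normalize_phone(phone)),
    }
```

Two records for the same customer, such as `" Jane.Doe@Example.com "` and `"jane.doe@example.com"`, produce the same hash only because normalization runs first; skipping it is a frequent cause of avoidable match-rate loss.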

Is data readiness only a data engineering job?

No. Marketing defines anchor events and success metrics. Analytics validates labels and leakage. Data engineering owns pipelines. Performance marketing owns campaign eligibility and pilot design.

How is data readiness different from signal health?

Readiness is whether source data can support modeling and activation. Signal health is whether events arriving at platforms stay accurate, fresh, and matched over time.
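
On the signal-health side, freshness can be tracked as the share of events that reach the platform later than an agreed lag budget. This is a minimal sketch: the `stale_share` function, the `(event_time, sent_time)` input shape, and the 24-hour default are assumptions for illustration, not platform limits.

```python
from datetime import datetime, timedelta


def stale_share(deliveries, max_lag=timedelta(hours=24)):
    """Share of events sent later than max_lag after they occurred.

    deliveries: list of (event_time, sent_time) datetime pairs
    """
    if not deliveries:
        return 0.0
    stale = sum(1 for event_time, sent_time in deliveries
                if sent_time - event_time > max_lag)
    return stale / len(deliveries)
```

Tracking this number daily during the pilot separates pipeline problems from model problems when performance dips.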

What happens if we are not ready?

Delay the pilot, fix IDs and feeds, or scope a limited proof on one channel or geo. Launching on weak data usually wastes the learning phase and produces inconclusive holdouts.

Where is the full Churney checklist?

See "What data Churney needs" for the operational checklist and onboarding steps.

Not the same as

Term | Difference
Signal health | Ongoing quality of events at platforms; readiness is upstream source fitness.
Data warehouse | The system that stores data; readiness is a state of that data relative to pLTV use cases.
Calibration | How well predictions match outcomes after modeling; readiness must exist before calibration matters.
Match rate | Platform-side user matching; readiness includes having the IDs match rate depends on.