What to be aware of before onboarding Churney

August 7, 2024 · 3 min

The onboarding process to Churney is mostly painless, as we only require consistent user IDs, well-understood payment events, and reliable timestamps across events. However, we have identified some issues that can surface over time. We outline these issues in this document, both from a data engineering perspective and from the perspective of potential vulnerabilities in the ad platform setup.

From a Data Engineering perspective

You should be aware of the following:

  • Only daily data updates. We produce the Churney signal by incrementally updating converted user activity in the data warehouse. We rely on reading the user's progress throughout their first day, and we benefit from being confident in the user's value early on. As such, the relevant data must be refreshed at a comparable intra-day cadence. We do not require real-time updates, but updates that are only daily, or even less frequent, make it nearly impossible to support the Churney signal.

  • Backfilling. Key events cannot be used if they are only backfilled into the data warehouse long after they occur; by then it is too late for the signal to act on them.

  • Future leakage. If columns are updated in place without timestamps, we avoid using them to prevent future data from leaking into our training set. Our automated checks help detect this, but they may end up excluding otherwise useful data (one such check is sketched after this list).

  • Payment ambiguity. Payments are crucial for our predictions, so we must know exactly when a valid payment occurs. We provide exact queries for the targets and cross-reference them with internal analytics to ensure our understanding of revenue is accurate (a sketch of such a target query follows this list).

  • Firebase logs limit. Firebase is key for capturing user behavior and matching IDs, but there is a hard limit of 1,000,000 events per day for free replication to BigQuery. Exceeding this requires Google Analytics 360, which is costly. We usually subsample Firebase to include only relevant events (a query to check your daily volume is sketched after this list).

  • Unclear user mapping across different sources. We need to create a unified view of each user's journey, which requires mapping user IDs across sources. If such a mapping is unavailable, we cannot use all of the relevant data. Some advertisers do not store their internal user ID in Firebase, which limits the predictive models. We verify this early to allow fixes before experiments go live (a coverage check is sketched after this list).
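
To make the future-leakage point concrete, below is one example of the kind of automated check we run, written as a BigQuery-style query. The project and dataset names are placeholders, and your warehouse dialect may differ. It flags tables that carry no timestamp column at all, since rows in such tables cannot be replayed as they looked at prediction time.

    -- Placeholder dataset name; flags tables with no TIMESTAMP/DATETIME column,
    -- whose in-place updates could leak future information into training data.
    SELECT table_name
    FROM `your_project.your_dataset.INFORMATION_SCHEMA.COLUMNS`
    GROUP BY table_name
    HAVING COUNTIF(data_type IN ('TIMESTAMP', 'DATETIME')) = 0;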
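
For the payment-ambiguity point, the target query we agree on with each advertiser looks roughly like the sketch below. All table and column names, as well as the 'settled' status, are placeholder assumptions; the real definition excludes whatever your billing system treats as refunds, chargebacks, or trials, and is cross-referenced with internal analytics.

    -- A minimal sketch of a revenue target (first 30 days after signup).
    -- Names are placeholders; the exact definition is agreed per advertiser.
    SELECT
      u.user_id,
      SUM(p.amount_usd) AS revenue_d30
    FROM `your_project.backend.users` AS u
    JOIN `your_project.backend.payments` AS p
      ON p.user_id = u.user_id
    WHERE p.status = 'settled'  -- exclude refunds, chargebacks, trials
      AND p.paid_at < TIMESTAMP_ADD(u.signup_at, INTERVAL 30 DAY)
    GROUP BY u.user_id;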
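
For the Firebase export limit, a quick way to see how close you are to the 1,000,000-events-per-day threshold is to count events per daily export table, as in the BigQuery sketch below (the analytics dataset name is a placeholder).

    -- Count daily Firebase/GA4 events exported to BigQuery; volumes consistently
    -- near or above 1,000,000 mean the free-tier export will be capped.
    SELECT
      _TABLE_SUFFIX AS day,
      COUNT(*) AS events
    FROM `your_project.analytics_123456789.events_*`
    WHERE _TABLE_SUFFIX BETWEEN '20240701' AND '20240731'
    GROUP BY day
    ORDER BY day;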
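
For user mapping, one early coverage check is how many Firebase events actually carry the advertiser's internal user ID (set via the SDK's setUserId call). A sketch, again with a placeholder dataset name:

    -- Share of Firebase events carrying the internal user ID needed to join
    -- behavioral data to backend tables; a low share limits the predictive models.
    SELECT
      COUNTIF(user_id IS NOT NULL) / COUNT(*) AS share_with_internal_id
    FROM `your_project.analytics_123456789.events_*`
    WHERE _TABLE_SUFFIX = '20240801';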

From an Ad Platform perspective

You should be aware of the following:

  • Ad platform identifier issues. Always follow the official platform specifications. As an example, our predictions are sent to Meta primarily keyed on the hashed email identifier: sha2(lower(email)). A common mistake is to hash the email differently, for example sha2(upper(email)), which deviates from the official Meta specifications and breaks matching (see the hashing sketch after this list).

  • Early access to Betas. We exploit specific Beta products from Meta that require whitelisting. For advertisers with dedicated CSM support, registering for these Betas is straightforward. Be sure to request this early in our engagement.

  • User-level campaign attribution. Having access to user-level campaign attribution in the data warehouse helps demonstrate the long-term impact of Churney. While perfect attribution is rare in any company, it's valuable to understand how your company typically handles it and to identify any significant issues.

  • Non-consolidated campaigns. Ad platforms perform best with the broadest targeting possible. Since the ad platform learns the signal per campaign (or ad set), we must produce enough successful signals per campaign to exit the learning phase as strongly as possible. Experience shows that ad accounts with numerous small, low-volume campaigns split by geography and user properties are suboptimal for operating Churney's signal.
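
To illustrate the identifier point above, the sketch below follows the same SQL style as the inline example (Snowflake-style sha2, where passing 256 selects SHA-256; in BigQuery the equivalent would be TO_HEX(SHA256(...))). The table and column names are placeholders. Meta's specification expects the email to be trimmed and lowercased before hashing, so normalizing first is what keeps the hash consistent with what Meta matches against.

    -- Placeholder table and columns; Meta expects SHA-256 of the trimmed,
    -- lowercased email per its customer information parameter specifications.
    SELECT
      user_id,
      sha2(lower(trim(email)), 256) AS hashed_email  -- matches the spec
      -- sha2(upper(email), 256) would produce a different hash and break matching
    FROM users;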