Predicting churn - or predicting what you can do about it?

Brian Brost | September 26, 2022 · 4 min

Excessive churn is perhaps the greatest threat to the healthy growth and profitability of any recurring revenue company. More precisely, the real pain point occurs when the cost of acquiring a customer is greater than that customer’s lifetime value (LTV).

Roughly speaking, growth teams have two major responsibilities: (i) to acquire the best possible customers at the lowest possible cost, and (ii) to maximize the LTV of their existing customer base. In this blog post, we focus on the second problem, and on why predicting churn is a useful starting point but not enough to solve the problem on its own.

A study published in the Harvard Business Review found that a 5% increase in retention can be expected to lead to a 25%-95% increase in profit. Reducing churn has such an outsized impact on profit because the effects of an increase in retention compound over time. As a result, a natural starting point for any LTV maximization strategy is to focus on reducing churn.
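To see why retention compounds, here is a minimal sketch under a deliberately simple assumption: every customer pays the same margin each month and is retained with the same probability every month. The function names and the numbers are illustrative, and the toy model is not meant to reproduce the 25%-95% figure above.

```python
def expected_lifetime_months(monthly_retention: float) -> float:
    """Expected number of paid months under constant monthly retention.

    The customer pays in month 1, survives into month 2 with probability r,
    into month 3 with probability r**2, and so on. The sum of that
    geometric series is 1 / (1 - r).
    """
    if not 0.0 <= monthly_retention < 1.0:
        raise ValueError("monthly_retention must be in [0, 1)")
    return 1.0 / (1.0 - monthly_retention)


def lifetime_value(monthly_margin: float, monthly_retention: float) -> float:
    """LTV as margin per month times expected number of paid months."""
    return monthly_margin * expected_lifetime_months(monthly_retention)


# Assumed example: a margin of 10 per month.
ltv_before = lifetime_value(10.0, 0.90)  # about 100
ltv_after = lifetime_value(10.0, 0.95)   # about 200
print(round(ltv_before), round(ltv_after))
```

In this toy model, raising monthly retention from 90% to 95% doubles the expected customer lifetime, and with it the LTV, which is the compounding effect in miniature.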

If you work at a subscription company, one of the simplest things you can do to stop customers from churning is to offer them a discount or some other incentive when they attempt to cancel their subscription. However, this form of reactive churn prevention often comes too late to change the customer’s decision to stop using your service. Indeed, our A/B testing has shown that a discount offered after customers try to leave can be less than half as effective as the same discount offered proactively, before the customer has tried to cancel.

Of course, such a proactive approach isn’t possible if you can’t effectively predict who’s going to churn, at least not if you want to avoid offering unnecessary discounts to customers who are happily using your service. Strong predictive models are therefore a prerequisite for proactive churn prevention. They aren’t enough, however. To continue with our previous example, providing a discount to the users deemed most likely to churn can also be counterproductive.

Customers who are very likely to churn may be impossible to save. In the worst case, as illustrated above in the quadrant labelled 'harmful', a user who appears very likely to churn might be reminded by a discount offer to cancel their subscription earlier than they would have done otherwise. For a customer who is going to leave regardless of whether you offer them the discount, you will have wasted whatever resources you spent trying to get them to stay. Ideally, you would target only the customers from the 'desirable' quadrant: the ones who would leave if not offered the discount, but who would stay if offered it.
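The quadrants can be written down as a small lookup over the two possible outcomes for a customer. This is only a sketch: the labels 'harmful' and 'desirable' come from the text, while 'wasted' and 'unnecessary' are hypothetical names chosen here for the other two cases, and "stays" is a simplification of "stays longer".

```python
def quadrant(stays_without_offer: bool, stays_with_offer: bool) -> str:
    """Classify a customer by what would happen with and without the discount."""
    if not stays_without_offer and stays_with_offer:
        return "desirable"    # saved by the offer
    if stays_without_offer and not stays_with_offer:
        return "harmful"      # the offer prompts a cancellation
    if not stays_without_offer and not stays_with_offer:
        return "wasted"       # hypothetical label: leaves either way
    return "unnecessary"      # hypothetical label: stays either way


def treatment_effect(stays_without_offer: bool, stays_with_offer: bool) -> int:
    """Effect of the offer on retention for one customer: +1, 0, or -1."""
    return int(stays_with_offer) - int(stays_without_offer)


for without in (False, True):
    for with_offer in (False, True):
        print(without, with_offer, quadrant(without, with_offer),
              treatment_effect(without, with_offer))
```

For any real customer only one of the two outcomes is ever observed, which is why quadrant membership has to be estimated rather than looked up.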

Unfortunately, predicting which users belong to that 'desirable' quadrant is a difficult problem, requiring expertise in both machine learning and causal inference. The problem boils down to predicting so-called heterogeneous treatment effects, that is, how the effect of a treatment differs from one customer to another. While it may seem like magic to claim to know who will respond to a given treatment, it is possible with a bit of intelligent exploration.

The basic solution is to run an adaptive, targeted A/B test. Starting from an initially randomized A/B split, you use double machine learning to iteratively estimate which users the treatment worked for. These treatment effect estimates are then used to target the treatment more intelligently. Estimating treatment effects in complicated problem settings has only become feasible thanks to recent advances in causal machine learning based on double machine learning, which allow modern machine learning methods to be combined with rigorous and unbiased causal inference.
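To make the estimation step more concrete, here is a minimal sketch of cross-fitted residual-on-residual estimation, the core idea behind double machine learning, run on simulated data. Everything in it is an assumption made for illustration: the three segment names, their retention probabilities, the 50/50 randomization, the 0.05 targeting threshold, and the use of per-segment averages as the nuisance models, where a real system would use flexible machine learning models over many customer features. It is not the production method described in this post.

```python
import random
from collections import defaultdict

# Hypothetical segments with assumed retention probabilities
# (without discount, with discount). Illustrative values only.
TRUE_RETENTION = {
    "persuadable": (0.40, 0.70),
    "lost_cause": (0.20, 0.20),
    "sleeping_dog": (0.80, 0.65),
}


def simulate(n, seed=0):
    """Simulate a randomized A/B test: rows of (segment, treated, retained)."""
    rng = random.Random(seed)
    segments = list(TRUE_RETENTION)
    rows = []
    for _ in range(n):
        segment = rng.choice(segments)
        treated = 1 if rng.random() < 0.5 else 0
        retained = 1 if rng.random() < TRUE_RETENTION[segment][treated] else 0
        rows.append((segment, treated, retained))
    return rows


def fit_nuisance(rows):
    """Nuisance models: per-segment mean outcome and treatment rate."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for segment, treated, retained in rows:
        totals[segment][0] += retained
        totals[segment][1] += treated
        totals[segment][2] += 1
    return {s: (y / n, t / n) for s, (y, t, n) in totals.items()}


def dml_effects(rows, n_folds=2):
    """Cross-fitted residual-on-residual effect estimate per segment."""
    folds = [rows[i::n_folds] for i in range(n_folds)]
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for k, held_out in enumerate(folds):
        # Fit the nuisance models on the other folds only (cross-fitting).
        training = [row for j, fold in enumerate(folds) if j != k for row in fold]
        nuisance = fit_nuisance(training)
        for segment, treated, retained in held_out:
            outcome_mean, treatment_rate = nuisance[segment]
            y_residual = retained - outcome_mean
            t_residual = treated - treatment_rate
            numerator[segment] += y_residual * t_residual
            denominator[segment] += t_residual * t_residual
    return {s: numerator[s] / denominator[s] for s in numerator}


effects = dml_effects(simulate(30_000))
# Target the discount only where the estimated effect clears an assumed threshold.
targeted = sorted(s for s, effect in effects.items() if effect > 0.05)
print(effects, targeted)
```

In an adaptive test, the next round would give the discount mostly to the segments in `targeted` while keeping some randomization, so that the effect estimates can keep being updated.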

As an example of how well this can work, we worked with a client that had unsuccessfully tested a discounting strategy for users they believed were likely to churn. Their initial targeting resulted in only a negligible increase in retention, and actually reduced revenue. In less than four weeks of adopting our strategy, they had identified the users for whom the discounting actually worked, yielding a 30% increase in retention and a 17% increase in revenue per user.

Another advantage is that the framework of treatment effect prediction is more general than churn prediction. If you focus on the predicted treatment effect on lifetime value, you can optimally target many treatments that you would never consider through the narrow lens of churn prediction. Thus, rather than just trying to prevent churn, treatment effect predictions can be used to maximize the return on investment for upsell and cross-sell campaigns and, more generally, to maximize the impact of your retention marketing efforts.
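As a rough sketch of what targeting on lifetime value could look like, the function below picks, for one customer, the treatment with the highest predicted LTV uplift net of its cost, and picks nothing if no treatment pays for itself. The treatment names, uplift figures, and costs are all hypothetical, and the predicted uplifts are assumed to come from a treatment effect model like the one discussed above.

```python
from typing import Dict, Optional


def best_treatment(predicted_ltv_uplift: Dict[str, float],
                   cost: Dict[str, float]) -> Optional[str]:
    """Return the treatment with the largest positive net gain, or None."""
    net_gain = {
        name: predicted_ltv_uplift[name] - cost[name]
        for name in predicted_ltv_uplift
    }
    if not net_gain:
        return None
    best = max(net_gain, key=net_gain.get)
    return best if net_gain[best] > 0 else None


# Hypothetical predictions for a single customer.
uplift = {"discount": 12.0, "upsell_offer": 20.0, "cross_sell_email": 3.0}
cost = {"discount": 15.0, "upsell_offer": 5.0, "cross_sell_email": 0.5}
print(best_treatment(uplift, cost))
```

Here the discount loses money for this customer, so the upsell offer is chosen instead, even though a churn score alone would never have suggested it.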

We hope this article has convinced you that causal machine learning is the right framework for optimizing your retention marketing. In the coming weeks and months we’ll be sharing more material on how to execute in this challenging but rewarding domain.
