The Limitations of A/B Testing and the Rise of Causal AI

Paul Fagan, Noy Rotbart, Anna Bigaard | January 8, 2024 · 4 min

We highlight the problems with A/B testing for marketing purposes and share our experience of using causal AI to improve the process dramatically.

A/B testing and its issues

A/B testing is a widely used method to evaluate the effectiveness of different solutions. It is often employed in a continuous workflow consisting of three main steps: identifying a target segment together with potential treatments to drive a desired outcome, conducting an A/B test where the different treatments are given to randomly assigned users, and lastly, analyzing the results to identify the most effective treatment. If a new treatment is successful, it is implemented. If it isn't, the process is repeated, and the treatment designs are refined.


It is an unfortunate reality that 80-90% of A/B tests fail to demonstrate a significant improvement. An understudied cause of many of these failures is that A/B tests report only the average treatment effect, which bundles harmful effects with desirable ones. Harmful effects on a subset of the targeted users can mask or dilute the desirable effects in the final analysis, so good treatments are abandoned simply because they aren't the right choice for everyone. Even when this problem is understood, marketers often try to identify the sub-segments for whom a given treatment will work based on intuition and gut feeling.
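To make the masking effect concrete, here is a small arithmetic sketch. The segment names, shares, and conversion rates are hypothetical numbers chosen purely for illustration, not data from a real test.

```python
# Hypothetical segments: (share of users, control conversion, treated conversion).
# All numbers are illustrative assumptions.
segments = {
    "new_users": (0.5, 0.10, 0.14),        # treatment helps this group
    "returning_users": (0.5, 0.10, 0.06),  # treatment hurts this group
}


def segment_effects(segments):
    """Treatment effect within each segment (treated minus control)."""
    return {
        name: treated - control
        for name, (_, control, treated) in segments.items()
    }


def average_treatment_effect(segments):
    """Share-weighted average of the segment effects, as a plain A/B test reports."""
    return sum(
        share * (treated - control)
        for share, control, treated in segments.values()
    )


effects = segment_effects(segments)
ate = average_treatment_effect(segments)

for name, effect in effects.items():
    print(f"{name}: {effect:+.3f}")
print(f"average treatment effect: {ate:+.3f}")
```

With these assumed numbers the two segment effects cancel, so a standard A/B analysis would report an average effect of roughly zero and the treatment would likely be dropped, even though it clearly helps half of the users.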

Even when successful, A/B tests tend to optimize for the short rather than the long term. Marketers may be tempted to implement strategies that yield immediate results, even though their real objective is to ensure long-term customer retention and loyalty. A focus on short-term gains can sometimes lead to strategies that are detrimental in the long run.


The Rise of Causal AI

Causal AI offers a new approach to overcoming these limitations. This method makes it possible to personalize A/B testing, where the goal is not just to discover the best-performing variant on average but to identify the optimal variant for each user.

It works iteratively, starting with what looks like a traditional A/B test. As the first results roll in, the causal machine learning algorithm identifies patterns in how users respond to the various treatments and answers the question, “What would have happened if the user had received treatment X instead of Y?” From this, we can estimate the individual effect of each treatment for each user and change the treatment assignments accordingly.

With each iteration, the algorithm learns more about the optimal user assignments. Instead of relying on a one-size-fits-all mindset and intuition-based segmentation, you can rely on a continuously running AI system to decide who should receive which treatment.
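One common way to estimate individual treatment effects is a so-called T-learner: fit one outcome model per treatment arm, then compare the two predictions for each user. The sketch below illustrates a single iteration of that idea on synthetic data. It is not a description of Churney's algorithm; the single feature x, the simulated outcomes, and the helper names fit_linear and predict are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): one user feature and a random 50/50 assignment,
# as in the first, traditional-looking A/B round.
n = 4000
x = rng.uniform(-1.0, 1.0, size=n)
treated = rng.integers(0, 2, size=n).astype(bool)

# Assumed ground truth: the treatment helps when x > 0 and hurts when x < 0,
# so the average effect over all users is close to zero.
true_effect = 0.5 * x
outcome = 1.0 + 0.2 * x + treated * true_effect + rng.normal(0.0, 0.1, size=n)


def fit_linear(features, targets):
    """Least-squares fit of targets ~ intercept + slope * features."""
    design = np.column_stack([np.ones_like(features), features])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def predict(coef, features):
    """Predicted outcome for each user under a fitted linear model."""
    return coef[0] + coef[1] * features


# T-learner: one outcome model per arm, each fitted on that arm's users only.
coef_treated = fit_linear(x[treated], outcome[treated])
coef_control = fit_linear(x[~treated], outcome[~treated])

# "What would have happened under the other treatment?" is answered by
# predicting both outcomes for every user and taking the difference.
estimated_effect = predict(coef_treated, x) - predict(coef_control, x)

# Next round: give the treatment only to users expected to benefit.
assign_treatment = estimated_effect > 0

print(f"users assigned to treatment: {assign_treatment.mean():.1%}")
```

In a real system the linear models would typically be replaced by more flexible learners, and some random assignment would be kept in every round so the estimates can keep improving.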

The key to success with causal AI

Three conditions are vital for the success of automated, personalized A/B testing.

  1. As in standard A/B testing, there is a trade-off between the variance in the outcome, the size of the effect we want to be able to detect, and the required population size. Churney offers power calculations for each treatment to ensure that every treatment has a realistic chance of showing a detectable effect.

  2. From a business standpoint, individual treatment effects are worth pursuing only if you can accept running multiple treatments side by side in the long term. Conversely, there are many scenarios where you simply want to identify the single most effective variant, and here, A/B testing will usually be preferable.

  3. The availability of features that can predict the best option. If there are very few user features and the decision point occurs early in the user journey, such as right after app installation, personalizing the experience effectively may be challenging. Features based on user properties (e.g., age) are easier to work with, but those rooted in user behavior (e.g., first-session length) are often the most powerful.
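The trade-off in the first condition can be sketched with the textbook normal-approximation formula for comparing two conversion rates. This is a generic illustration, not Churney's power calculation; the function name users_per_arm and the example rates are assumptions.

```python
import math
from statistics import NormalDist


def users_per_arm(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion test.

    Uses the normal approximation:
        n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Outcome variance of a binary conversion in each arm.
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical scenario: 10% baseline conversion, hoping to detect a lift to 12%.
print(users_per_arm(0.10, 0.12))

# Halving the detectable lift roughly quadruples the required population.
print(users_per_arm(0.10, 0.11))
```

Because the required size grows with the inverse square of the effect, splitting the population into more treatments or smaller segments quickly becomes expensive, which is why it is worth running the calculation per treatment.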

To summarize, a successful implementation requires a substantial number of individuals to treat, a sufficiently high conversion rate in the data, and meaningful differences in treatment effects across users. Lastly, we recommend choosing an outcome that is revealed within a relatively short timeframe and performing a mediation analysis to check that improving the short-term outcome also improves the ultimate long-term objective.

Churney's specialization in causal AI provides companies such as Podimo, Beer52, and Mindvalley with double-digit increases in their return on investment. It also helps them achieve their business goals without wasting time and money on suboptimal, intuition-based A/B testing.