If you came here from the related talk, thanks for watching! If not, thanks all the same, and you’ll probably be able to use this as a starting point to do more reading about whatever part of AB testing piques your curiosity.
There was a lot of info I wasn’t able to squeeze into my talk for the 2019 AB Testing Summit, including proper reference or sourcing. I’ve supplied that info here, along with a few comments I hope will be useful to those looking to do further reading about Bayesian AB Testing in Marketing.
Also, if you have a question that won’t fit in Twitter, please leave a comment and we can have a more detailed discussion that will be easier for people with similar questions in the future to find.
You can find the slides here.
Introduction to Bayesian Methods in Marketing
As a first look at Bayes Theorem, you could do a lot worse than An Intuitive (and Short) Explanation of Bayes’ Theorem
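To make the theorem concrete in a marketing setting, here’s a minimal sketch with entirely made-up numbers: given a prior rate of “buyers” among visitors, and how often buyers versus non-buyers click a promo banner, Bayes’ theorem gives the probability that a visitor who clicked is a buyer.

```python
# Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
# Hypothetical numbers: 1% of visitors are "buyers" (prior),
# 60% of buyers click a promo banner, 5% of non-buyers do.

p_buyer = 0.01                      # prior P(H)
p_click_given_buyer = 0.60          # likelihood P(E|H)
p_click_given_nonbuyer = 0.05       # P(E|not H)

# total probability of observing a click, P(E)
p_click = (p_click_given_buyer * p_buyer
           + p_click_given_nonbuyer * (1 - p_buyer))

# posterior: probability a visitor is a buyer, given that they clicked
p_buyer_given_click = p_click_given_buyer * p_buyer / p_click
print(round(p_buyer_given_click, 3))
```

Even with a click signal that buyers produce twelve times more often, the posterior stays modest because the prior is so small, which is the core intuition the linked article builds.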
We almost immediately get into considering where Bayesian priors come from and the validity of those processes, which is hotly debated. Because of this, we need to consult several sources to get an informed perspective.
If you don’t read any other source I provide, 5 Reasons to Go Bayesian in AB Testing – Debunked is probably the fastest way to see a well-argued perspective that diverges from the generally accepted hype. If you want to learn more, the author, Georgi Georgiev, is knowledgeable and prolific. There are several other great articles and whitepapers that explain how Bayesian methods may not always live up to the accompanying spin, which were immensely useful in building my understanding and preparing this talk. We’ll mention more of his work later, but the blog post linked above and the two following whitepapers offer a good overview of the contention.
- Issues with Current Bayesian Approaches to A/B Testing in Conversion Rate Optimization
- The Google Optimize Statistical Engine and Approach
We dismissed the overwhelming majority of discussion about Bayesian methods in AB testing as coming from sources with a clear conflict of interest, an almost complete lack of transparency, and in some cases total incoherence. The pro-Bayesian materials we found useful came from refereed journals or technical whitepapers with a high degree of transparency, for which the vendors should be commended.
- Bayesian Statistics and Marketing – Published back in 2003, this lays a lot of the groundwork about how Bayesian methods work and why they are useful to marketers, even though it predates the prevalence of AB testing that online businesses enjoy today.
- Bayesian A/B Testing at VWO
- The New Stats Engine (at Optimizely)
- If you know where I can get my hands on a Google Optimize white paper let me know.
We also looked at a non-Bayesian modern CRO tool that claims to deliver advantages similar to those claimed by Bayesian tools. We wanted to establish whether those advantages are only possible with Bayesian tools, or whether non-Bayesian methods that are appropriately advanced and customized for CRO can deliver them as well. It may not be a huge surprise that this method was created by the aforementioned Georgi Georgiev.
Introducing Bayes Visually With Python
Introduction to Bayesian Inference is a great first look at how doing your own Bayesian analysis could work, and even if that isn’t your goal, it provides a very helpful visual interpretation of how Bayes works in marketing.
The 3d graphs were based on, like most of the code holding the internet together, a Stack Overflow answer.
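If you’d rather see the arithmetic than the graphs, the conjugate Beta-Binomial update at the heart of most of those conversion-rate visuals fits in a few lines. The conversion numbers below are hypothetical.

```python
# Conjugate Beta-Binomial update for a conversion rate: a minimal sketch.
# Hypothetical data: 120 conversions out of 1000 visitors,
# starting from a flat Beta(1, 1) prior.

prior_a, prior_b = 1, 1        # Beta(1, 1) is the uniform prior
conversions, visitors = 120, 1000

# Posterior is Beta(prior_a + conversions, prior_b + failures)
post_a = prior_a + conversions
post_b = prior_b + (visitors - conversions)

posterior_mean = post_a / (post_a + post_b)
print(round(posterior_mean, 4))  # close to the raw rate of 0.12
```

With this much data, a flat prior barely moves the estimate away from the observed rate; with a strongly informative prior, it would, and that is exactly where the debate over where priors come from starts to bite.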
Ease of Explanation
The Part to Ignore
Sorry, but I’m going to include a bit of a rant that I didn’t have time for in the talk. Skip to the TLDR if you’d like.
The whole “frequentist methods are ridiculous because [some variety of absurd claims about how frequentists aren’t allowed to discuss results or use their brains except under the strictest possible interpretation of the math they use]” arguments, such as those included by Google in their Optimize documentation, are a textbook example of the fallacious “straw man”.
Simply put, you make up a version of your enemy that doesn’t reflect reality, but completely supports your argument, then argue against that instead of what the actual enemy says.
This argument has been raised and conclusively shut down repeatedly, perhaps most devastatingly in A “Bayesian Bear” rejoinder practically writes itself…, where a mathematician gives a hilarious example of what happens if you build a Bayesian straw man to compare to the frequentist one made by unscrupulous Bayesians.
TLDR: If you hold mathematicians to the strictest possible requirements of their assumptions and definitions, the only people who will understand are other mathematicians, and even they will think you are a jerk who is just wasting people’s time. For Bayesians to pretend frequentist methods are the only part of statistics facing this issue reduces the credence we can give other claims made by this particular subset of Bayesian proponents.
The Part That Is Interesting, But Still Kind of Esoteric
Initially I didn’t fully absorb the importance of the debate, but the aforementioned Georgi Georgiev was kind enough to have an email chat with me and I was able to understand more of the difference in positions.
Bayesians assume that their best guess at the probability is the probability.
Frequentists figure out how unlikely an outcome would be if the experiment were repeated a very large number of times. It’s still up to the user to decide what this result means in their real world situation.
Using a Bayesian tool doesn’t just mean trusting that the priors are chosen in a way that will work with your data, it also means trusting that the method is generating an actual probability. For people who think about these things deeply, that can be a pretty big leap of faith.
To quote Georgi: “Frequentist methods give you data and uncertainty and make it crystal clear that the decision-making part is up to you, and cannot be purely data-driven.”
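To see the Bayesian half of that contrast in code, here’s a sketch of a Monte Carlo estimate of P(B > A) from Beta posteriors with flat priors. The conversion counts are made up, and the output is exactly the kind of number the quote warns against treating as “the” probability without examining the assumptions behind it.

```python
import random

random.seed(42)

# Hypothetical AB test data: A converts 100/1000, B converts 120/1000.
a_conv, a_n = 100, 1000
b_conv, b_n = 120, 1000

# Monte Carlo draws from each arm's Beta posterior (flat Beta(1, 1) priors).
# The fraction of draws where B beats A is the usual "probability to be best".
draws = 100_000
b_wins = sum(
    random.betavariate(1 + b_conv, 1 + b_n - b_conv)
    > random.betavariate(1 + a_conv, 1 + a_n - a_conv)
    for _ in range(draws)
)
print(f"P(B > A) is approximately {b_wins / draws:.2f}")
```

Whether that figure is a real-world probability or just a model summary depends entirely on the priors and the model assumptions, which is the leap of faith discussed above.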
Now, it’s still up to us as practitioners to decide what level of detail to share with stakeholders, and not everyone at a meeting about which creative to go with is going to be up for a talk about p-values. For people who are, though, it actually opens up a fascinating area of discussion and introspection.
We build human assumptions, often based on qualitative inputs, into the tools and methods we use to gather and report on data – but we often consider the results to be purely quantitative.
And we didn’t need to make a big AI project to run into that issue – even a little old t-test is an opportunity to see to what extent we separate our assumptions and interpretation from the data itself.
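For instance, even a bare-bones Welch’s t statistic, hand-rolled here on hypothetical daily conversion rates, quietly embeds human choices (what counts as an observation, independence between days, which variant is the baseline) that the printed number never discloses.

```python
import math
import statistics

# Hypothetical per-day conversion rates for two variants over two weeks
a = [0.100, 0.104, 0.098, 0.101, 0.097, 0.103, 0.099,
     0.102, 0.100, 0.096, 0.105, 0.098, 0.101, 0.100]
b = [0.112, 0.109, 0.115, 0.108, 0.111, 0.113, 0.110,
     0.114, 0.109, 0.112, 0.111, 0.108, 0.116, 0.110]

# Welch's t statistic: difference in means over its standard error.
# Turning t into a p-value, and a p-value into a decision, is on us.
mean_a, mean_b = statistics.mean(a), statistics.mean(b)
var_a, var_b = statistics.variance(a), statistics.variance(b)
se = math.sqrt(var_a / len(a) + var_b / len(b))
t = (mean_b - mean_a) / se
print(f"t = {t:.1f}")
```

The statistic itself is pure arithmetic; everything before it (data collection, grouping by day) and after it (significance thresholds, business interpretation) is assumption and judgment.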
Optional Stopping and Bayesian Suitability for CRO
There’s a lot of material out there about optional stopping, but here are some excellent reads on the topic that go deeper than your average blog post without turning into heavy math and coding.
- Bayesian AB Testing is Not Immune to Optional Stopping Issues
- How Not To Run an A/B Test
- Is Bayesian A/B Testing Immune to Peeking? Not Exactly
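The core issue in those reads can be demonstrated in a few lines: a simulated A/A test (no real difference between arms) where we peek at a naive two-proportion z-test after every batch of traffic and stop the moment it looks “significant”. All parameters below are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)

# A/A simulation: both arms truly convert at 10%. Peeking after every
# 100 visitors per arm and stopping at |z| > 1.96 inflates the nominal
# 5% false-positive rate.

def one_test(peeks=10, batch=100, rate=0.10):
    ca = cb = n = 0
    for _ in range(peeks):
        ca += sum(random.random() < rate for _ in range(batch))
        cb += sum(random.random() < rate for _ in range(batch))
        n += batch
        pa, pb = ca / n, cb / n
        pool = (ca + cb) / (2 * n)
        se = math.sqrt(2 * pool * (1 - pool) / n)
        if se > 0 and abs(pa - pb) / se > 1.96:
            return True          # declared "significant" - a false positive
    return False

runs = 1000
fp = sum(one_test() for _ in range(runs)) / runs
print(f"false-positive rate with peeking: {fp:.2f}")  # well above 0.05
```

The same optional-stopping pressure exists for Bayesian dashboards; as the articles above argue, the question is not whether a method is labelled Bayesian or frequentist but whether its error or loss guarantees actually hold under continuous monitoring.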
Faster CRO Results
Information about specific products comes from either the aforementioned whitepapers or website documentation accessed over the month of May 2019. Note that at some points while speaking (and I think even on one slide), I use loss control and error control interchangeably, which is inaccurate. Loss control and error control are very different things.
Recommended Non-Bayesian Products
Optimizely, VWO, and Google Optimize, even once the hype is taken away, all make compelling cases to any org without the data science resources to do their own modelling. There are some non-Bayesian alternatives, however, that allow users to compare Bayesian tool results to traditional or more advanced frequentist methods, for free or for very little investment.
Google Analytics users can take advantage of Stéphane Hamel’s DaVinci Tools. This Chrome extension, even in its free version, makes a million and one quality-of-life improvements to the Analytics UI, as well as adding some useful features. One of these is the ability to run a t-test with a few clicks on any Google Analytics report. If you also use Optimize, your experiment data is already in GA in a format that makes this very easy.
For a more sophisticated approach, have a look at the Agile A/B Testing Calculator included with Georgi Georgiev’s Analytics Toolkit. It’s very inexpensive, and Google users can connect it to Analytics/Optimize easily.
Sorry for the Google-centric view, but it’s what I spend most of my time on. If there are other tools out there you feel provide a good way to double-check what the Bayesian “Big Three” are recommending, I’d love to hear about them.
Please let me know in the comments or on Twitter, or reach out with any other questions or comments!