Stats table calcs + Timeline viz = Easy A/B Testing


(Fabio) #1

Disclaimer: The statistical methods in this post are Bayesian. I strongly prefer Bayesian statistics over frequentist statistics, and I think you should too. But that’s a post in itself, and there are plenty of resources on the internet that make the case better than I could, so I will leave it at this: if you do any statistics but aren’t yet familiar with Bayesian statistics, please take some time to read up on them!

Looker 4.22 ships with a ton of statistical functions, including inverse cumulative distribution functions for distributions commonly used as conjugate priors, such as beta_inv for the beta distribution. If that didn’t make any sense and you just want to run an A/B test, worry not and read on.

Let’s say you have randomly assigned visitors to one of several variants, and collected some data like this:

Variant    | Visitors | Converted
------------------------------------------
Control    | 519      | 34
Test-A     | 217      | 23
Test-B     | 224      | 19

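What can we conclude from this using Bayesian statistics? We can build credible intervals that tell us where each variant’s true conversion rate lies with a given probability (for example, 95%). The + 0.5 terms below correspond to a Beta(0.5, 0.5) (Jeffreys) prior, so after observing the data each variant’s conversion rate has the posterior Beta(converted + 0.5, total - converted + 0.5), where total is the number of visitors. The 2.5th and 97.5th percentiles of that posterior bound the 95% interval, and these table calculations compute them: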

beta_inv(
  0.025,
  ${...converted} + 0.5,
  ${...total} - ${...converted} + 0.5
)
beta_inv(
  0.975,
  ${...converted} + 0.5,
  ${...total} - ${...converted} + 0.5
)
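To sanity-check these numbers outside Looker, here is a minimal Python sketch using only the standard library. It assumes the same Beta(converted + 0.5, total - converted + 0.5) posterior as the table calculations, but estimates the percentiles from random draws (`random.betavariate`) instead of an exact inverse CDF, so the results only approximate `beta_inv`. The function name `credible_interval` is hypothetical.

```python
import random


def credible_interval(converted, total, level=0.95, draws=100_000, seed=42):
    """Approximate equal-tailed credible interval for a conversion rate.

    Uses the posterior Beta(converted + 0.5, total - converted + 0.5), i.e. a
    Jeffreys Beta(0.5, 0.5) prior updated with the observed conversions.
    Quantiles are estimated from Monte Carlo draws rather than computed
    exactly, so they only approximate Looker's beta_inv().
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    alpha = converted + 0.5
    beta = total - converted + 0.5
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1 - level) / 2
    lower = samples[round(tail * (draws - 1))]        # ~2.5th percentile
    upper = samples[round((1 - tail) * (draws - 1))]  # ~97.5th percentile
    return lower, upper


# Data from the table above: variant -> (converted, visitors)
data = {"Control": (34, 519), "Test-A": (23, 217), "Test-B": (19, 224)}
for variant, (converted, total) in data.items():
    lower, upper = credible_interval(converted, total)
    print(f"{variant:8} {lower:.2%} to {upper:.2%}")
```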

After adding these calculations, hiding the two raw-count columns we don’t want to visualize, and selecting the “Timeline” visualization, we get one bar per variant spanning its 95% credible interval.

From here, we can easily compare the probable conversion rates and decide to go forward with a given variant, to continue collecting more data, or to drop the experiment and try something else (for example, if the possible improvement isn’t substantial enough).
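One way to make that comparison more concrete, within the same Bayesian model, is to estimate the probability that a test variant’s true conversion rate exceeds the control’s. The table calculations above don’t compute this; the Python sketch below (standard library only; the function name `prob_beats` is hypothetical) approximates it by drawing paired samples from each posterior and counting how often the test draw is higher.

```python
import random


def prob_beats(test, control, draws=50_000, seed=7):
    """Monte Carlo estimate of P(rate_test > rate_control).

    Each argument is a (converted, total) pair; each rate's posterior is
    Beta(converted + 0.5, total - converted + 0.5), matching the table calcs.
    """
    rng = random.Random(seed)
    (c_t, n_t), (c_c, n_c) = test, control
    wins = 0
    for _ in range(draws):
        p_test = rng.betavariate(c_t + 0.5, n_t - c_t + 0.5)
        p_control = rng.betavariate(c_c + 0.5, n_c - c_c + 0.5)
        if p_test > p_control:
            wins += 1
    return wins / draws


control = (34, 519)
for name, test in {"Test-A": (23, 217), "Test-B": (19, 224)}.items():
    print(f"P({name} > Control) ~ {prob_beats(test, control):.1%}")
```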

That’s it, that easy! Go A/B test!

P.S. I now have another article that provides examples of comparing rates of events over time rather than proportions, and goes into a little more detail on the theory behind conjugate priors.


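Bonus: In theory, you could even use the Looker API to do Thompson sampling, which automatically weights how you assign new participants. Passing a uniform random number from rand() into beta_inv draws one random conversion rate from each variant’s posterior, so you can evaluate the following formula for each variant, sort the variants by it in descending order, and assign the next participant to the top one: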

beta_inv(
  rand(),
  ${...converted} + 0.5,
  ${...total} - ${...converted} + 0.5
)
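A minimal Python sketch of one such Thompson sampling step outside Looker: `random.betavariate` draws from the same Beta posterior that `beta_inv(rand(), ...)` samples, and the variant with the highest draw is chosen. The function name `thompson_pick` is hypothetical.

```python
import random


def thompson_pick(variants, rng=random):
    """Return the name of the variant with the highest posterior draw.

    `variants` maps name -> (converted, total). Each draw comes from
    Beta(converted + 0.5, total - converted + 0.5), the same posterior that
    beta_inv(rand(), ...) samples in the table calculation.
    """
    draws = {
        name: rng.betavariate(converted + 0.5, total - converted + 0.5)
        for name, (converted, total) in variants.items()
    }
    return max(draws, key=draws.get)


variants = {"Control": (34, 519), "Test-A": (23, 217), "Test-B": (19, 224)}
print(thompson_pick(variants))  # assign the next participant to this variant
```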

In practice, however, it would be better for performance to compute an allocation ratio once in a while and apply it to a whole batch of participants, rather than hitting the API for each participant :wink:
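One way to compute such a batch ratio (a sketch, not necessarily the approach intended here) is to repeat the Thompson draw many times and use each variant’s share of wins as its allocation weight for the next batch. The function name `allocation_ratio`, the 10,000-round default, and the batch size of 1,000 are illustrative assumptions.

```python
import random
from collections import Counter


def allocation_ratio(variants, rounds=10_000, seed=0):
    """Share of Thompson-sampling rounds won by each variant.

    `variants` maps name -> (converted, total); posteriors are
    Beta(converted + 0.5, total - converted + 0.5).
    """
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(rounds):
        draws = {
            name: rng.betavariate(c + 0.5, n - c + 0.5)
            for name, (c, n) in variants.items()
        }
        wins[max(draws, key=draws.get)] += 1  # this round's winner
    return {name: wins[name] / rounds for name in variants}


variants = {"Control": (34, 519), "Test-A": (23, 217), "Test-B": (19, 224)}
ratio = allocation_ratio(variants)
batch_size = 1_000  # hypothetical size of the next batch of participants
for name, share in ratio.items():
    print(f"{name}: {share:.1%} -> about {round(share * batch_size)} participants")
```

Recomputing the ratio only between batches keeps the API traffic low, at the cost of reacting to new data a little more slowly.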


Stats table calcs: comparing rates over time (Bonus: anomaly detection!)
(sara.leon) #2

To get % formatting on the visualization, you can enter an Excel-style format string in the visualization settings. For example, entering 0.00% displays values as percentages with two decimal places.