Based on two months of customer research on the usability of Stats Accelerator, we discovered that customers were unclear about how best to use the product. As a refresher, at the time there were two modes of Stats Accelerator: Accelerate Impact and Accelerate Learnings. If you aren’t familiar with them, see these previous blog posts: an overview of the two, and how we tackled Simpson’s Paradox in Accelerate Learnings.
Here’s what we learned via our research and how we set out to improve the products.
The handful of customer challenges we discovered can be distilled down to two things: misaligned expectations and unclear results.
Customers struggled to decide when to pick one mode or the other, often picking Accelerate Impact because it had the least friction to set up. Accelerate Learnings, in contrast, requires at least two variations aside from the baseline. Even after selecting Accelerate Impact, customers were surprised to see their results page lacking the statistical values they had become accustomed to.
In search of those values, they would switch to Manual distribution because it yielded something they could report back to their stakeholders. This activity suggested that their mental framework was tied to A/B testing even though Accelerate Impact is centered on optimization.
This revealed a gap in understanding between when to experiment and when to optimize, which is the fundamental difference between Accelerate Learnings and Accelerate Impact. Admittedly, how we designed the interface and named the two modes (they both start with “Accelerate”) likely contributed to this conflation of the two concepts.
Experimentation and Optimization
The decision to run one or the other comes down to the intent of your test. Do you want to optimize toward a primary metric? Or are you interested in discovering which variation improves your product, with statistical certainty that the change is indeed better and not just an anomaly?
Let’s consider when to run an A/B test. You have a hypothesis, create the experiment, and run it. If a variation reaches statistical significance, you analyze the results, create a new hypothesis with these learnings and run a new experiment with an updated baseline.
This is different from an optimization. With optimization you’re not concerned with learning insights; you want to maximize (or minimize) a metric. For example, a customer runs a promotion on their site with the intent of driving more sign-ups. Suppose this test has two variations, where Variation 1 has a 10% conversion rate and Variation 2 a 20% conversion rate. With uniform allocation and 100 visitors, we have Scenario 1: each variation receives 50 visitors, so Variation 1 converts 5 of them and Variation 2 converts 10, for 15 conversions in total.
Had they known that Variation 2 yields more conversions on average, they could have allocated all visitors to Variation 2, as in Scenario 2: all 100 visitors go to Variation 2, yielding 20 conversions.
During this promotion the customer wants the algorithm to drive visitors to the variation that yields the most sign-ups. Compared to Scenario 1, Scenario 2 yielded 5 additional conversions, a 33% improvement. Missing those 5 conversions is referred to in machine learning as regret, and that’s what they want to minimize. This is an optimization problem. (For a longer treatment of the benefits of machine learning, you can read this previous blog.)
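The arithmetic behind the two scenarios and the resulting regret can be sketched in a few lines. The variation names and conversion rates below come from the example above; the helper function is purely illustrative.

```python
# Conversion rates from the promotion example above:
# Variation 1 converts at 10%, Variation 2 at 20%, 100 visitors total.
rates = {"Variation 1": 0.10, "Variation 2": 0.20}

def expected_conversions(allocation):
    """Expected conversions given a {variation: visitor_count} allocation."""
    return sum(count * rates[name] for name, count in allocation.items())

# Scenario 1: uniform allocation, 50 visitors per variation.
uniform = expected_conversions({"Variation 1": 50, "Variation 2": 50})

# Scenario 2: all 100 visitors sent to the better variation.
best = expected_conversions({"Variation 1": 0, "Variation 2": 100})

# Regret: conversions lost by not sending everyone to the best arm.
regret = best - uniform
print(uniform, best, regret)  # 15.0 20.0 5.0 -> a 33% improvement
```

Minimizing this quantity over the life of the test is exactly what a regret-minimizing algorithm is built to do.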
The customer running the site was not concerned with finding actionable insights during the test. There’s no intent to implement the variation permanently in their product, since this is a promotion (check out our Knowledge Base article on when to experiment and when to optimize).
It is this fundamental difference between experimentation and optimization that guided our efforts to improve Stats Accelerator over the last year.
First, we changed the names to accurately reflect the intent of each feature, and we also placed additional guardrails in the creation flow and on the results page.
We renamed Accelerate Impact to Multi-Armed Bandit (MAB) and Accelerate Learnings to Stats Accelerator (SA). Though both use multi-armed bandits to dynamically allocate visitors, we chose the industry-standard term MAB for Accelerate Impact because a multi-armed bandit is most often associated with minimizing regret.
MAB Creation Flow
We’ve made MAB a first-class option when creating a test:
Prior to this change, users had to select “Accelerate Impact” within their A/B test’s Traffic Distribution, which made it seem as though statistical significance and confidence intervals were part of the results. In doing so, they were framing this optimization in the paradigm of an experiment.
The reasons to move this out of the A/B test as a separate test option are twofold. First, it emphasizes the distinction between an MAB optimization and an A/B test: an A/B test’s goal is to determine, with statistical rigor, whether a variation is better, whereas an MAB should be used for maximizing reward (or minimizing regret). Second, with its own creation flow, MAB is disassociated from the experimentation paradigm, encouraging our customers to decide which is appropriate for their business use case when selecting between these options. (For more on exactly the kind of results an MAB can yield, here’s a post on our use of it during the holiday shopping season.)
We’ve included guardrails to highlight that MAB does not use values typical to an A/B Test.
MAB Results Page
Before the improvements, a results page for an A/B test with Accelerate Impact enabled looked similar to that of an A/B test:
Customers who saw this were confused, still expecting the confidence intervals and statistical significance values familiar from the experimentation framework. Now, to emphasize that an optimization ignores statistical significance, we’ve removed those columns and focused on how the MAB is performing overall.
Users can get a summary of this performance, and observe a new estimate called “Improvement Over Equal Allocation”. This is the cumulative gain of running an MAB over running an A/B test with uniform allocation.
To calculate this, we separate the history of the test into a series of equal-length epochs, where an epoch is a period of time during which the traffic allocation and conversion rates are constant. Within each epoch we calculate the gain of running an MAB over allocating visitors equally to all arms (variations). This can be shown mathematically.
Improvement over equal allocation
Check out our Knowledge Base for a great explanation of this.
Moreover, we’ve included a tour to highlight the new aspects of the MAB results page:
Stats Accelerator Creation Flow
Aside from the renaming of Accelerate Learnings to Stats Accelerator, nothing else has changed. This mode is specific to A/B tests, and enabling it requires changing the Distribution Mode.
Stats Accelerator Results Page
There are a number of improvements to the Results Page of an A/B test with Stats Accelerator enabled. A badge now indicates when SA is enabled. More importantly, we’ve included a tour that focuses on Weighted Improvement, an estimate of the true lift obtained by filtering out bias within each epoch, and less on Conversion Rate, which isn’t used as an input to the algorithm.
Customer feedback helped us identify and address the core confusion in understanding the distinction between Accelerate Learnings and Accelerate Impact. We hope our improvements encourage users to decide whether to run an experiment or an optimization upfront, and adopt the right framework to understand their results.
As we continue to incorporate feedback, we aim to refine our product so that customers can clearly match their intent to the right feature. These latest improvements are a step in that direction.