
Like many, I learned about the principles and practicalities of A/B testing from online articles and resources. Since I was first introduced to split testing, I've made conversion optimization my full-time profession.

A year later, I noticed something alarming about the stories I was once so excited to read: success stories about A/B testing are bad for you.

Continue reading to learn why they're bad, how to reverse their damage, and how to multiply the value of your tests without the help of success stories.

The Problem with Success Stories

Wildly successful A/B test results are like lottery winners: They make the news, while the millions of losers are never mentioned.

Unlike playing the lottery, A/B testing is something many people are still learning about, and they're learning from online resources just as I did. Unfortunately, those resources and articles tend to boast of winning tests and variations, while rarely mentioning the neutral and non-winning tests that are a vital part of the process.

This leads to survivorship bias — the formation of expectations and impressions based solely on exceptional winners (the “survivors”) and not from the whole picture.

Unrealistically high expectations about A/B testing, whether yours, your company's, or your client's, can result in:

  • Getting discouraged and giving up on A/B testing too soon.
  • Ignoring neutral and negative tests, sweeping them under the rug and thereby missing opportunities to learn from those results.
  • Mistakenly evaluating the conversion optimization process only by the number of exceptional wins.

If not addressed early enough, these false expectations can stop conversion optimization efforts in their tracks, resulting in lost time, lost money, and missed opportunities.

What A/B Testing is Really Like

The first step of a successful conversion optimization effort is setting proper expectations.

  • A/B testing is a marathon, not a sprint. It’s a powerful and cost-effective way of improving conversion rates (and therefore revenue), but that improvement happens as an aggregate result of many tests, over many weeks, following a disciplined process.
  • Neutral and negative results happen just as often as positive results. Even with experience, you can't predict how a test will turn out, and that's okay because neutral and negative results are just as insightful as positive results.
  • Huge wins, such as the 20x and 50x improvements you read about, are uncommon. Judge wins by the value they add, not just by their headline numbers. For example, a +15% improvement in checkout rates might not make for an exciting blog post, but it could mean many thousands of dollars in additional annual revenue, as the sketch after this list illustrates.
  • An A/B test that worked for another company isn't always repeatable. Don't blindly copy tests from success stories expecting similar results. Instead, try to understand why the test worked for them and what lessons you can draw from it.
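
To make the +15% example above concrete, here is a minimal sketch of the arithmetic. The traffic, order value, and baseline checkout rate below are made-up placeholders you would replace with your own figures:

```python
# Back-of-the-envelope revenue impact of a +15% relative lift in checkout
# rate. Every input below is a hypothetical placeholder.

monthly_checkout_visitors = 40_000   # visitors who reach the checkout page
baseline_checkout_rate = 0.05        # 5% of them currently complete checkout
average_order_value = 60.00          # revenue per completed order
relative_lift = 0.15                 # the +15% improvement from the test

baseline_orders = monthly_checkout_visitors * baseline_checkout_rate
extra_orders_per_month = baseline_orders * relative_lift
extra_annual_revenue = extra_orders_per_month * average_order_value * 12

print(f"Extra orders per month: {extra_orders_per_month:.0f}")
print(f"Extra revenue per year: ${extra_annual_revenue:,.0f}")
# With these placeholder numbers: 300 extra orders/month, $216,000/year.
```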

Although most tests aren’t going to be massive wins, things aren’t so dire. You can get tremendous value from paying attention to tests with neutral or negative results…

Embrace Non-Winning Tests

If expectations are set properly, neutral or losing tests shouldn't be seen as failures or shortcomings. In fact, they can be a source of important information that will ultimately help your conversion optimization process.

Getting Value from Neutral Tests

When a test results in neither a winner nor a loser, it’s easy to dismiss it as a strike-out and try something else. Instead, you should stop for a moment to analyze the neutral test:

  • What hypothesis, if any, does the neutral result invalidate? The problem might not be what you thought.
  • Was the variation drastic enough to have an effect? You may need to be more creative or increase the scope of the experiment.
  • Should the test be targeting specific user segments to find significant results? Maybe the variation affected a particular user segment, but that change isn't visible in the overall average (see the sketch below).
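
Here is a minimal, hypothetical sketch of that kind of segment breakdown using pandas; the column names and toy data are assumptions, not from any real test:

```python
import pandas as pd

# Toy per-visitor results; in practice you would load your own export.
# Assumed columns: variant ("control"/"variation"), segment, converted (0/1).
df = pd.DataFrame({
    "variant": ["control", "variation"] * 4,
    "segment": ["guest", "guest", "guest", "guest",
                "member", "member", "member", "member"],
    "converted": [0, 0, 1, 0, 1, 1, 0, 1],
})

# Overall conversion rate per variant -- what the headline report shows.
print(df.groupby("variant")["converted"].mean())

# The same comparison broken down by segment. A lift that exists only in
# one segment becomes visible here even if the overall averages look flat.
print(df.groupby(["segment", "variant"])["converted"].mean().unstack())
```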

Case Study: A Neutral Test That Won

I recently ran an A/B test on Ruby Lane, one of the largest e-commerce sites for vintage and antique items. We tested a simplified checkout page with just the most essential information, and with a more prominent checkout button. The hypothesis was that giving the user an obvious next step and removing distractions would increase checkout rates.

[Image: Ruby Lane case study]

After a significant amount of traffic, the test remained inconclusive. To my surprise, the simplified variation seemed to have no effect.

If our expectation was only to see wins, then this test would’ve been quickly discarded and we’d be back at the drawing board. Fortunately, we treated the test as a learning opportunity and took a closer look at the targeted audience.

That scrutiny paid off: We found that due to a slight difference in checkout flows between existing users and guests, some guests never saw the new variation but were included in the results anyway. This skewed the results in favor of existing users, who were much less affected by a new checkout page.

After learning this, we restarted the test with a more precise activation method (using Optimizely's manual activation). On the second run, we found that the variation did indeed improve checkout rates by +5%. For an e-commerce site of their size, that's a significant increase in revenue.

If we only chased big wins, we would’ve overlooked the first inconclusive test and missed the opportunity to increase checkout rates and revenue.

Getting Value from Negative Tests

When a test variation results in fewer conversions, there's a temptation to stop it immediately and move on without ever mentioning it again, lest someone think you're doing something wrong.

Resist that temptation. Let the test run until you reach the sample size necessary for a significant conclusion (a rough way to estimate it is sketched after this list), then interpret your results:

  • Was your hypothesis wrong? Instead of creating more variations, you might need to reevaluate the factors you thought were affecting conversion rates.
  • Why did the original version perform better? Try to understand what made the original version more effective than the variation, and consider that for future tests.
  • What does this result teach you about your visitors? For example, if adding pricing information to your service’s homepage results in more bounces, what does that suggest about their expectations or understanding of your service?
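
For the sample-size step mentioned before this list, here is a minimal sketch using statsmodels; the baseline rate and minimum detectable lift are placeholder assumptions you would replace with your own figures:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Placeholder assumptions: a 5% baseline checkout rate, and we want the test
# to reliably detect at least a +15% relative lift (i.e. 5.75%) if it exists.
baseline_rate = 0.05
target_rate = baseline_rate * 1.15

# Cohen's h effect size for comparing the two proportions.
effect_size = proportion_effectsize(target_rate, baseline_rate)

# Visitors needed per variation for 80% power at a 5% significance level.
n_per_variation = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"Roughly {n_per_variation:,.0f} visitors per variation")
```

Keep both variations running until each has roughly that many visitors before drawing conclusions.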

Key Takeaways

  • Successful A/B testing begins with proper expectations.
  • Be careful with success stories. Don’t let them influence expectations about A/B testing and conversion optimization.
  • A/B testing is an effective but long-term process to grow business. It’s not a quick and cheap “growth hack.”
  • You can get far more value from A/B testing by analyzing your neutral and negative results.