
Optimizely Blog

Tips & Tricks for Building Your Experimentation Program

Data is great. It can inform not only a single A/B test hypothesis but also an entire A/B testing strategy. Yet the industry as a whole throws out platitudes about test strategy without backing them up with data. Why, in a field that preaches an unbiased, unassuming, data-driven approach, are we not applying the same principle to our own expertise?

We’re in a unique position to contribute to this conversation. Because we work with high-velocity testing teams, we have a wealth of data-backed insights as to what works, and what doesn’t, when it comes to testing.

You’ve gotten a sneak peek of some of our data before. We’ve talked about which eCommerce pages have the biggest win rate for tests and put our data behind the never-ending incremental vs. radical testing debate.

For this two-part series, we want to explore what baseline testing performance looks like for eCommerce companies and what you should expect from your testing program when it comes to wins, conversion lifts, and frequency of returns. We’ve analyzed hundreds of eCommerce tests, all with the same goal: improve purchase conversion rate. From this, we’ve learned a few things. In part one, we’ll be tackling the toughest one:

You better get used to ties.

eCommerce Test Results and Ties

Looking at test data, most results are flat. So get used to seeing a lot of photo-finish ties.

Average performance for eCommerce A/B testing programs

First, let’s look at things from a very high level. What can our data tell us about eCommerce testing programs?

Our overall data about eCommerce A/B tests tell us a few things:

  • The average test result is +6% (but not statistically significant)
  • Roughly half of tests drive a positive result and half drive a negative one (the median is -1%)

For the math nerds (hi friends!):

  • Standard deviation of the data was 50%
  • Minimum change: -70%
  • Maximum change: 290%

Figure: Probability of Outcome Between Two Values (a histogram of the probability of an outcome between two values).

The histogram and summary statistics show that the data roughly follows a normal distribution, though peaked and clustered around 0%. The majority of test results fall in an inconclusive range between -15% and 15%.
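
To see what "peaked" means in practice, here is a minimal sketch that assumes a pure normal curve with the mean (+6%) and standard deviation (50%) reported above. The function name `share_between` is ours, purely for illustration. Under that assumption, the curve would put less than a quarter of outcomes between -15% and 15%, so a majority landing in that range suggests results cluster far more tightly around zero than a textbook bell curve, with a handful of large outliers stretching the standard deviation.

```python
from statistics import NormalDist

# Summary statistics reported above, in percent change.
MEAN_LIFT = 6.0
STD_DEV = 50.0


def share_between(low, high, mean=MEAN_LIFT, sd=STD_DEV):
    """Share of outcomes a pure normal curve would place between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


# Assumption: a perfectly normal distribution, which the real data only roughly follows.
flat_share = share_between(-15.0, 15.0)
print(f"Normal-curve share between -15% and +15%: {flat_share:.0%}")
```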

This backs the anecdotal feedback that most tests are flat. It also means that one of those can’t-believe-it, incredible stories of increasing your conversion rate by 300% is a much less common occurrence than you’d be led to believe.

What does losing (or not knowing) mean for your testing program?

If most of what you can expect from a best-in-class testing program is inconclusive results, what do you do? The key to leveraging flat results is being more strategic with your tests. This means three things:

  1. Test for sensitivity
  2. Manage opportunity costs
  3. Test faster

Test for Sensitivity

If most of your tests are inconclusive, how do you gain any knowledge from the results? The key is to have a specific hypothesis behind each test and to identify which elements of the experience the variation changed. This seems like basic stuff, but many CROs come hot out of the gate with an “idea” and never actually outline (or record) the hypothesis behind it. If you stay disciplined, you can keep an organized record of which elements (and hypotheses) move the needle and which point to “insensitive” areas of the site (elements or psychological triggers that have little impact on purchasing behavior).

Once you have this record of sensitive and insensitive areas of the site, you can manage your testing effort more efficiently.
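
A record like this does not need to be fancy. Here is a minimal sketch of one possible shape for it. The `ExperimentRecord` fields, the `sensitivity_by_element` helper, and the sample entries are all made up for illustration and are not real test data or part of any Optimizely product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ExperimentRecord:
    element: str       # part of the experience the variation changed
    hypothesis: str    # the reasoning written down before launch
    lift_pct: float    # observed change in purchase conversion rate, in percent
    significant: bool  # whether the result reached statistical significance


def sensitivity_by_element(records):
    """Return, per element, the share of recorded tests that were significant."""
    counts = defaultdict(lambda: [0, 0])  # element -> [significant tests, all tests]
    for record in records:
        counts[record.element][0] += int(record.significant)
        counts[record.element][1] += 1
    return {element: sig / total for element, (sig, total) in counts.items()}


# Hypothetical entries, invented purely to show the shape of the log.
log = [
    ExperimentRecord("product page images", "Larger photos reduce purchase anxiety", 4.0, True),
    ExperimentRecord("product page images", "Lifestyle shots beat studio shots", 9.0, True),
    ExperimentRecord("footer links", "Fewer links reduce distraction", 0.5, False),
]
print(sensitivity_by_element(log))
```

Elements with a high share of significant results are candidates for "sensitive" areas; elements that keep coming back flat are candidates for "insensitive" ones. With only a few tests per element the shares are noisy, so treat them as a prompt for judgment rather than a verdict.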

Manage Opportunity Costs

Since many tests end up in the no-man’s-land cluster around 0% change, many eCommerce A/B tests can take a very long time to reach statistical significance when using something like Stats Engine (and leave you in unbearable Stats Engine purgatory). If an inconclusive test blocks other tests from running, it carries a significant opportunity cost for your testing program. We’ll explore this in more depth in Part 2, but it is important to know when to jump ship on a test that’s going nowhere.
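
To get a feel for why near-zero effects are so expensive, here is a rough sketch using the classic fixed-horizon sample size approximation for comparing two conversion rates. This is a swapped-in textbook formula, not how Stats Engine computes significance, and the 3% baseline conversion rate, the 5% significance level, and the 80% power are all assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variation for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # critical value for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical 3% baseline purchase conversion rate.
for lift in (0.20, 0.05, 0.01):
    needed = visitors_per_variation(0.03, lift)
    print(f"{lift:.0%} relative lift: {needed:,} visitors per variation")
```

The required traffic grows roughly with the inverse square of the lift you are trying to detect, so a true effect five times smaller needs about twenty-five times the visitors. That is the opportunity cost of letting a flat test keep running.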

Test Faster

It’s not terrible that most tests are flat. As we’ve said before, it’s a well-reported phenomenon that affects even the best testing teams. Now that we’ve published real data behind it, you can actually know whether you’re under- or over-performing in your win rate. It does mean, however, that you have a data-backed reason to always be testing, and fast. Live with the knowledge that most results will be flat, but test fast knowing that the big win is right around the corner.

We’ll be exploring that in more depth in Part 2: When You Win, You Win Big.
