
Optimizely Blog

Tips & Tricks for Building Your Experimentation Program

In Part 1 of this series, we explored why and how most A/B tests come back inconclusive. Part 2 continues with a deeper analysis of wins, returns, and how to maximize ROI from your test program.

As we explored in Part 1 of this series, most eCommerce tests are inconclusive, with results clustering around 0% change. Flat test results are extremely frustrating: there’s no win, and sometimes there’s not even much to learn from them (the opposite of a twofer).

Of course, conversion optimization would not be one of the most powerful ways to drive growth if tests never won. By tracking and analyzing hundreds of A/B tests, we were able to show that disciplined teams can predictably and continuously drive wins through their testing programs. There’s even some magic math that can help you maximize ROI on testing for your specific business.

So how do wins work, and what do they really get you?

eCommerce Test Wins: A quantified look at expected lift

In analyzing our data, we found that the variation outperformed the control in 40% of experiments, and of those tests with positive results, 80% produced a lift between 1% and 20%.

What does that mean? For eCommerce tests, you can expect a positive result in fewer than half of your tests, and the lifts will typically not exceed 20%.
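As a quick check on those figures, the sketch below multiplies the 40% win rate by the 80% share of wins that land between 1% and 20%. Following the text, it treats every remaining win as a lift above 20%; the function and variable names are purely illustrative.

```python
# Figures quoted above: share of tests where the variation beat the control,
# and the share of those wins that landed between 1% and 20% lift.
P_WIN = 0.40
P_SMALL_GIVEN_WIN = 0.80


def outcome_breakdown(p_win: float, p_small_given_win: float) -> dict[str, float]:
    """Split all tests into flat/negative, small wins (1-20%) and big wins (>20%)."""
    small = p_win * p_small_given_win       # wins between 1% and 20%
    big = p_win * (1 - p_small_given_win)   # remaining wins, treated as >20%
    return {"flat_or_negative": 1 - p_win, "small_win": small, "big_win": big}


breakdown = outcome_breakdown(P_WIN, P_SMALL_GIVEN_WIN)
for outcome, share in breakdown.items():
    print(f"{outcome:>16}: {share:.0%} of all tests")
```

It prints 60%, 32% and 8%: the 8% of all tests landing above 20% is the roughly 1-in-10 big-win rate this post keeps coming back to.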

But what about all the case studies touting 30%, 100%, or even 300% increases in conversion rate?

It’s true, extremely high lifts do happen in eCommerce conversion experiments. In fact, for any experiment increasing conversions by more than 20%, the fat tail takes over the data, showing that a result is equally likely to be a +40% lift as it is a +100% lift.

[Figure: eCommerce test wins distribution]

Since 80% of positive results fall between 1% and 20%, the remaining 1 out of every 5 positive experiments you see is going to drive a big, knock-your-socks-off increase in conversions.

Combining the probability of a win with the expected lift, you can determine that ⅓ of the lift generated by an eCommerce testing program will come from tests that generate a lift between 1% and 20%. A whopping ⅔ of the incremental revenue added by an eCommerce testing program comes from those case-study-worthy 20%+ wins.
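The ⅓ / ⅔ split comes from weighting each bucket's probability by its average lift. The sketch below shows that calculation. The source doesn't publish the average lift inside each bucket, so the 10% and 80% means used here are hypothetical values, picked only because they reproduce the stated split; plug in your own program's numbers.

```python
def lift_share(buckets: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return each bucket's share of total expected lift.

    `buckets` maps a name to (probability of landing in the bucket, mean lift in it).
    """
    contributions = {name: p * mean for name, (p, mean) in buckets.items()}
    total = sum(contributions.values())
    return {name: c / total for name, c in contributions.items()}


# Probabilities among winning tests come from the article (80% / 20%).
# HYPOTHETICAL mean lifts: 10% for the 1-20% bucket, 80% for the fat tail.
wins = {
    "1-20% lift": (0.80, 0.10),
    ">20% lift": (0.20, 0.80),
}
for name, share in lift_share(wins).items():
    print(f"{name}: {share:.0%} of expected lift")
```

Because the fat tail's mean lift is so much larger, a bucket holding only 1 in 5 wins can still dominate the total.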

[Figure: eCommerce test wins, ⅔ greater than 20%]

So even though only about 1 out of every 10 tests (20% of the 40% that win) will generate a lift greater than 20%, those wins will make up the majority of the revenue added by your testing program.

And before you use that stat to reopen the radical vs. incremental testing debate and claim these outsized wins are a reason to test big changes, consider this: 50% of the tests generating significant A/B test wins were simple, incremental tests, like updating a CTA or adding a short line of explanatory text. Only 16% of the “big wins” came from complex test variations.

What does this mean for my eCommerce testing strategy?

The data we’ve gathered so far can help you determine an optimal testing approach based on your traffic and conversion rate.

[Figure: Time to run tests versus lift expected]

The graph above shows the time, in weeks, required to run a test depending on the expected lift (you can use your favorite split test duration calculator to determine this for yourself).

The time (number of weeks) is shown on the left in green, while the lift is shown in blue on the right. As the stats experts among you know, the time needed to run a test is inversely related to the minimum lift you want to detect.
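If you'd rather compute durations than read them off a chart, the standard two-proportion sample-size approximation gives a rough estimate. This is a minimal sketch, not the calculator behind the graph: the 3% baseline conversion rate, 25,000 weekly visitors, two-sided α = 0.05, 80% power and an even 50/50 split are all assumptions to replace with your own figures.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline: float, min_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant to detect a relative lift
    of `min_lift` over `baseline` with a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + min_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def weeks_to_run(baseline: float, min_lift: float, weekly_visitors: int) -> float:
    """Weeks needed when traffic is split evenly between control and one variation."""
    return 2 * visitors_per_variant(baseline, min_lift) / weekly_visitors


# ASSUMED inputs: 3% baseline conversion rate, 25,000 visitors a week.
for lift in (0.10, 0.20):
    print(f"{lift:.0%} minimum lift: {weeks_to_run(0.03, lift, 25_000):.1f} weeks")
```

Under these assumptions the 10% test needs a little under 4x the sample of the 20% test, because the required sample grows roughly with the inverse square of the difference being detected; the exact ratio shifts with your baseline and settings.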

You can run a test in roughly a third to a quarter of the time if you increase your minimum detectable lift from 10% to 20%. We all know time is money, so why not set a high minimum lift for your tests and run a fast and furious testing program?

Well, as our data tells us, a shorter test comes with a lower probability of achieving the larger lift it is looking for. Two out of 10 tests had a conversion rate increase of 10% or more, but only 1 out of 10 tests had an increase of 20% or more. So you’ll reduce the time needed to run each test, but you’ll also see wins in fewer of your tests.
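One way to weigh that trade-off is to compare qualifying wins per week of testing rather than per test. The sketch below uses the hit rates quoted above (2 in 10 tests at a 10%+ lift, 1 in 10 at 20%+) and, as an assumption, the 3x duration ratio this post uses when comparing the 10% and 20% programs; substitute durations from your own calculator.

```python
# Hit rates from the data above: 2 in 10 tests reached a 10%+ lift, 1 in 10 a 20%+ lift.
# ASSUMED durations: a 10% minimum-lift test runs 3x as long as a 20% one.
programs = {
    "10% minimum lift": {"hit_rate": 0.20, "weeks_per_test": 3.0},
    "20% minimum lift": {"hit_rate": 0.10, "weeks_per_test": 1.0},
}


def wins_per_week(hit_rate: float, weeks_per_test: float) -> float:
    """Expected qualifying wins per week of back-to-back testing."""
    return hit_rate / weeks_per_test


for name, params in programs.items():
    rate = wins_per_week(params["hit_rate"], params["weeks_per_test"])
    print(f"{name}: {rate:.3f} wins per week")
```

This counts wins, not their size, so it is only half of the trade-off: the 20% program also forgoes every lift between 10% and 20%.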

How you act on this trade-off depends on the amount of traffic to your test pages. We recommend running all eCommerce A/B tests for at least one full week to account for day-of-week effects.

Given this one-week minimum, if a test page has at least 100k unique visitors a month, looking for a minimum lift of 10% makes the most sense: tests will still take only a week to run, and reaching for a higher lift won’t make them any shorter.

With a lower-traffic site, we recommend looking for a lift of 20% or more. By setting this higher bar, you will miss some of the smaller wins (about ⅓ of the expected value of your testing program), but the increased testing velocity will balance this out.
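The rule of thumb above can be written as a small decision helper: use a 10% minimum lift when a 10% test fits inside the one-week minimum, otherwise step up to 20%. This is our sketch of the heuristic, not a published Optimizely formula, and `choose_min_lift` is a hypothetical name. Whether 100k monthly visitors is enough for a one-week 10% test depends on the page's baseline conversion rate, which the data above doesn't specify, so the baselines in the example are assumptions.

```python
import math
from statistics import NormalDist

MIN_TEST_WEEKS = 1.0  # run every test for at least one full week


def visitors_per_variant(baseline: float, min_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Two-sided two-proportion z-test sample size per variant."""
    p1, p2 = baseline, baseline * (1 + min_lift)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance / (p2 - p1) ** 2)


def choose_min_lift(monthly_visitors: int, baseline: float) -> float:
    """Hypothetical helper: target 10% when a 10% test fits in the one-week
    minimum (a higher target wouldn't finish any sooner), otherwise 20%."""
    weekly_visitors = monthly_visitors * 12 / 52
    weeks_for_10 = 2 * visitors_per_variant(baseline, 0.10) / weekly_visitors
    return 0.10 if weeks_for_10 <= MIN_TEST_WEEKS else 0.20


# ASSUMED baseline conversion rates of 15% and 3%.
print(choose_min_lift(100_000, 0.15))  # high traffic, high baseline -> 0.1
print(choose_min_lift(50_000, 0.03))   # lower traffic -> 0.2
```

With a 3% baseline, even 100k monthly visitors would push this helper to 20%, which is why your own conversion rate belongs in the decision alongside traffic.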

[Figure: eCommerce test process]

Seriously, Always Be Testing

Test coverage (how much of your traffic and how many testable days you actually use) makes a big difference in program performance. With 52 weeks in a year, several of which are major holidays, there are only about 45 eCommerce testing opportunities per year. Any week spent waiting on an inconclusive result is essentially a win forgone. If you don’t have tons of traffic, it is critical to cut your losses fast, test diligently, and aim to make up the difference with outsized wins.

The chart below graphs these potential testing situations: the idealized expected-value increase from running a testing program on a page with 50k unique visitors a month. In this ideal case, the company runs tests continuously and never gets testing fatigue.

The blue line represents a highly disciplined approach: the program only runs tests designed to detect a 20% conversion rate increase, stops each test as soon as it hits the traffic threshold, and quickly starts the next one.

The red line is the same program, but designed to detect a lift of 10% or more instead. Its tests have to run about 3x as long, so it completes significantly fewer of them.

The yellow line represents most companies, which test sporadically and let tests run for weeks or months. By the end of the year, the difference in conversion rates is striking.

[Figure: eCommerce test program returns]
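The shape of these curves comes from compounding: each win raises the baseline for the next test, and slow programs simply get fewer turns. The sketch below models that idealized case under loudly stated assumptions: about 45 testing weeks a year, wins that stack multiplicatively, and HYPOTHETICAL per-test expected gains (2.5% for the 20% program, 3.5% for the 10% program, which catches more small wins). None of the per-test gains come from the data above; the point is the effect of test count, not the exact curve.

```python
TESTING_WEEKS_PER_YEAR = 45  # 52 weeks minus major holiday weeks, per the text


def year_end_multiplier(weeks_per_test: float, gain_per_test: float,
                        testing_weeks: int = TESTING_WEEKS_PER_YEAR) -> float:
    """Conversion-rate multiplier after a year of back-to-back tests,
    assuming each test's expected gain compounds on the previous baseline."""
    tests = int(testing_weeks // weeks_per_test)
    return (1 + gain_per_test) ** tests


# HYPOTHETICAL scenarios loosely mirroring the blue, red and yellow lines.
scenarios = {
    "blue: 20% min lift, 1-week tests": (1, 0.025),
    "red: 10% min lift, 3-week tests": (3, 0.035),
    "yellow: sporadic, ~8-week tests": (8, 0.035),
}
for name, (weeks, gain) in scenarios.items():
    print(f"{name}: x{year_end_multiplier(weeks, gain):.2f} conversion rate")
```

Even with a smaller assumed gain per test, the one-week program ends the year far ahead, simply because it runs 45 tests to the red line's 15.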

Given the expected returns from testing, teams need to be diligent about setting appropriate traffic limits and sticking to them. The opportunity cost of absent or infrequent testing is one of the biggest killers of conversion rate optimization program performance.

I promised some magic and here it is: running at least one experiment per week, every week, is the only guaranteed way to triple your eCommerce conversion rate through testing.
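As a rough sanity check on the word “triple,” assume (idealized, like the chart) that each weekly test adds an average expected gain g that compounds on the previous baseline, over the roughly 45 testing weeks in a year:

```latex
(1 + g)^{45} = 3
\quad\Longrightarrow\quad
g = 3^{1/45} - 1 = e^{(\ln 3)/45} - 1 \approx 0.0247
```

Under these assumptions, an average expected gain of only about 2.5% per test, counting flat and losing tests, compounds to a tripled conversion rate, which is why every skipped week is so costly.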


So here are the big lessons, summed up:

    1. Most tests are flat
    2. And most tests that win generate a lift of under 20%
    3. But if you win over 20%, you’ll get an outsized increase in conversions
    4. Your traffic levels should determine the minimum detectable lift you look for in tests
    5. Test at least once a week, every week

Doing this is hard, but not impossible. Most companies are not able to maximize test capacity because of a lack of resources and a lackadaisical approach to testing, such as letting tests run too long. But stick to the formula, and the data shows you can drive ROI. The returns from a disciplined testing program can be significant for your conversion rate and, in turn, for the growth of your eCommerce business.

