
Optimizely Blog

Tips & Tricks for Building Your Experimentation Program

Disclaimer: This post was written without consulting anyone at Express or WhichTestWon. We, at Experiment Engine, are making big assumptions about the test and the analysis that might have gone on before and after the test.

If your company is one of the smart ones that does hypothesis-driven testing, make sure your test conclusions aren’t leading you astray with confounding factors. A test might be set up to evaluate one hypothesis, but in practice it may test several hypotheses at once.

Let’s look at this Express experiment from WhichTestWon:


Original website (control) on the left, hypothesized test design (variation) on the right


Hypothesis: Gender segmentation on the homepage will increase purchase completions
Original: Page features an image of only a woman with focus on the product
New version: Page features images segmented by gender

Result: The A/B test failed and overall purchase completions dropped
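Before attributing a drop like this to anything, it’s worth confirming the drop is statistically real. Here’s a minimal sketch of a two-proportion z-test in Python; the conversion counts are hypothetical, since WhichTestWon did not publish Express’s actual numbers.

```python
# Hedged sketch: checking whether a drop in purchase completions is
# statistically significant via a two-proportion z-test.
# All traffic/conversion numbers below are made up for illustration.
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via erf
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical: control converts at 3.0%, variation at 2.6%
z, p = two_proportion_z(300, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these illustrative numbers the drop isn’t significant at the usual 5% level, which is exactly why "the test failed" deserves scrutiny before anyone asks *why* it failed.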

When analyzing the differences between versions, notice these changes:

Multiple Hypotheses

This experiment tests gender segmentation, but it also tests several other changes. Elements like product selection, sale items, style, and search all changed from the original to the variation. One can’t accurately conclude that gender segmentation caused the decrease in purchase rate when these other factors are in the mix.
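The only way to attribute a conversion change to a single factor is to vary factors independently. A full-factorial (multivariate) design enumerates every combination; this sketch uses illustrative factor names drawn from the changes noted in this post, not Express’s actual test plan.

```python
# Hedged sketch: a full-factorial design for the confounded factors.
# Factor names and levels are illustrative assumptions, not Express's setup.
from itertools import product

factors = {
    "segmentation":  ["single image", "men/women split"],
    "navigation":    ["full nav + search", "reduced nav"],
    "featured_deal": ["prominent", "absent"],
}

# Every combination of levels: 2 * 2 * 2 = 8 cells,
# versus the 2 cells (control vs. variation) in the original test.
variants = list(product(*factors.values()))
print(len(variants))  # 8
for combo in variants:
    print(dict(zip(factors, combo)))
```

Eight cells need far more traffic than two, which is the practical trade-off: a clean attribution per factor versus a faster (but confounded) single test.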

These are possible culprits for the drop in conversions:

1. Changes in Navigation

Check out the top navigation (or lack thereof) in the new version. The “Sale” and “Lookbooks” links, along with the search function, are either gone or less prominent. Altering or removing any of these could prevent customers from moving through the funnel!

2. De-emphasized Merchandising

This Express experiment also tests the importance of featuring a product. In this case, the prominent product and deal on the original page get replaced. Perhaps it’s not gender segmentation causing order drop-off, but the lack of focus on a product.

3. Different Calls-to-Action

Don’t forget that button style and CTA type also affect order rate. By removing the previous button and using “Women” and “Men” entrances, CTA elements are being tested too. Typeface, sizing, the lack of a button, and copy are all players in this experiment.

4. Altered Visual Presentation

Strong visuals can cut either way. Every aspect of the Express images, including style, models, clothing, poses, and background location, is part of the user experience. If these new images affected sales, Express didn’t just test gender segmentation; they tested visuals. Add these factors up and the list of hypotheses grows quite long.


When testing, make sure you know which hypothesis was actually tested. Although this test tries out gender segmentation, other simultaneous changes diluted the targeted hypothesis. It doesn’t matter whether a test won or failed; companies face this challenge in every experiment. When you’ve reached a test conclusion, just be aware that multiple hypotheses can skew your findings!
