Introduction
Conversion Rate Optimisation and/or Personalisation (CRO/P) usually delivers a huge return early on. If you then scale within a well-thought-through attribute & experience framework, that return can be maintained even as both team size & the number of tests balloon. Most websites could run dozens of tests every month without coming close to running out of good ideas.
But… if average runtime is, say, 2 weeks, and I have perhaps 4 priority page types, will I max out at around 9 tests/month?
Or can I run >1 test on the same page?
Perhaps tests on totally different pages can still “clash” with each other?
Considerations
There are three scenarios that you have to think about here, in order of seriousness:
- Strategic Clashes – where an experience from one test affects user expectation & influences the result of another (a.k.a “interaction effects”)
- Technical Clashes – those that “break something” & cause unacceptable UX
- Statistical Clashes – where one test’s control/ variant audiences overlap another’s, leading to “double counting” or other such obfuscations
Strategic Clashes – careful attention needed
In all well-known testing tools, the control and variant audiences are randomised fresh for each test, so test A’s control/variant split is drawn independently of test B’s, even if the two are running on the same page. Because of this you don’t generally have to worry about unpicking “which test actually made the difference”. The exception to this is when:
“when an experience from one test affects user expectation & influences the result of another (a.k.a. “interaction effects”)”
Or in other words – one test is influencing another, and although the two are statistically independent, the results are compromised.
An example of “Strategic Clash”:
- “Add search to navigation menu” test
- “Add search to product page” test
This is a problem because the product page test could lose BECAUSE people already have search in the topnav (due to test #1), and the topnav test could lose BECAUSE people already have search on the product detail page, or PDP (due to test #2).
Whereas if instead we’d run the product page test when search was not in the topnav, it might have won. In this example one experience has affected user expectation and distorted the result of a different experience.
An example of “No Strategic Clash”:
- “Add search to PDP” test
- “PDP image swap” test
While both of these tests run on the PDP, it’s unlikely that seeing a different PDP image would make someone more or less likely to want to search, and vice versa it’s unlikely that seeing search on the PDP would make someone more or less likely to prefer a different PDP hero image.
Strategic clash is inescapably subjective, so it’s important that clear rules are set at the organisation-level which are widely understood and adhered to.
Top tip: “no more than 1 test per page” is an unhelpfully crude solution to this problem and should be avoided – it both over- and under-corrects by wide margins.
Technical Clashes – a less common but serious risk
For test B to interact with test A in such a way as to cause a malfunction, one of two things must happen:
- Option 1) Test A and Test B are manipulating the same element – e.g. if they both relocate the “buy now” button and Test A runs first, then Test B’s code will reference an element that no longer exists in its expected form – potentially causing a malfunction (see the sketch after this list)
- Option 2) Test A and Test B manipulate different elements, but Test A’s manipulation has an effect on Test B’s element which causes the code to malfunction – e.g. Test A changes the width of the product image; then when Test B extends the product description copy there is no longer enough space to accommodate this.
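To make Option 1 concrete, here’s a minimal sketch – hypothetical element IDs and hand-rolled test functions, not any particular tool’s real API – of how Test A’s relocation of the “buy now” button leaves Test B’s selector pointing at nothing:

```typescript
// Illustrative sketch only – hypothetical element IDs and hand-rolled test code,
// not any testing tool's real API.

// Test A (already live): relocates the "buy now" button into a sticky footer bar.
function applyTestA(): void {
  const buyNow = document.querySelector<HTMLButtonElement>("#buy-now");
  const stickyBar = document.querySelector<HTMLElement>("#sticky-bar");
  if (buyNow && stickyBar) {
    stickyBar.appendChild(buyNow); // the button no longer sits inside .product-actions
  }
}

// Test B (newly launched): assumes the button still lives inside .product-actions –
// the assumption Test A has just broken.
function applyTestB(): void {
  const buyNowInPlace = document.querySelector<HTMLElement>(".product-actions #buy-now");
  if (!buyNowInPlace) {
    // With Test A live, this branch fires and Test B silently does nothing;
    // code without this guard would throw and could break the page.
    console.warn("Test B: expected element not found – possible technical clash with Test A");
    return;
  }
  buyNowInPlace.closest<HTMLElement>(".product-actions")!.classList.add("variant-b-layout");
}

applyTestA();
applyTestB();
```

A guard clause like the one in applyTestB at least fails quietly rather than throwing, but the variant still silently stops working – which is exactly the kind of malfunction the checks below are meant to catch before launch.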
Some AB testing tools do offer features for this, but usually they’re unhelpful:
- In Adobe Target there’s a “Collisions” tab that lists all activities targeted at the same page. This works at the page level (not the element level), so for most testing programs it will list a very large number of Activities, often impractically so.
- Optimizely’s preview tool tells you which other live experiences the audience will qualify for. This is also at the journey level rather than the element level, so it comes up against the same over-information challenge.
- AB Tasty offers a function you can plug into the console to return the current tests on the page, again really only helpful for small programs.
Solution
The only existing way to protect against technical clashes is human & manual. There needs to be a quick way of identifying all activities on a given page, filtering by planned live dates & filtering to only those that actually affect UX (some will just track data, for example). The developer then needs to check all the relevant activities & make sure none of them affect the same elements of the page as the newly planned test. This is why we must always maintain a global, always-up-to-date roadmap, ideally recording both page & “module”, so you can filter for “product page colour swatch” tests, for example. Here’s an example of one DMPG roadmap format.
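Separately from the roadmap document itself, the filtering step a developer runs against it is simple. Here’s a minimal sketch with hypothetical field names, assuming the roadmap can be exported as structured data (a spreadsheet filter achieves the same thing):

```typescript
// Minimal sketch – hypothetical field names, assuming the roadmap is exported as structured data.
interface RoadmapEntry {
  name: string;
  page: string;        // e.g. "product page"
  module: string;      // e.g. "colour swatch", "buy now CTA"
  affectsUX: boolean;  // false for data-collection-only activities
  start: Date;         // planned live date
  end: Date;           // planned end date
}

// Everything that is UX-affecting, overlaps the planned live dates, and sits on the
// same page & module as the newly planned test – i.e. the shortlist the developer
// must manually check for shared elements.
function potentialTechnicalClashes(
  roadmap: RoadmapEntry[],
  planned: RoadmapEntry
): RoadmapEntry[] {
  return roadmap.filter(existing =>
    existing.name !== planned.name &&
    existing.affectsUX &&
    existing.page === planned.page &&
    existing.module === planned.module &&
    existing.start.getTime() <= planned.end.getTime() &&
    existing.end.getTime() >= planned.start.getTime()
  );
}

// Hypothetical usage:
const roadmap: RoadmapEntry[] = [
  { name: "Swatch image zoom", page: "product page", module: "colour swatch",
    affectsUX: true, start: new Date("2024-06-01"), end: new Date("2024-06-21") },
  { name: "PDP pageview tracking", page: "product page", module: "colour swatch",
    affectsUX: false, start: new Date("2024-06-01"), end: new Date("2024-12-31") },
];

const planned: RoadmapEntry = {
  name: "Swatch reorder by popularity", page: "product page", module: "colour swatch",
  affectsUX: true, start: new Date("2024-06-10"), end: new Date("2024-06-24"),
};

console.log(potentialTechnicalClashes(roadmap, planned)); // -> the "Swatch image zoom" entry
```

The point isn’t the code – it’s that page, module, planned dates and a UX-affecting flag are the minimum fields the roadmap needs for this shortlist to be quick to produce.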
Top tip: just QA-ing (testing) the experience before launching is not a good solution, as many existing 50/50 tests will happen not to render during your checks. Even after 3 QA runs (assuming each run is freshly bucketed – e.g. a new incognito session), there’s still a 12.5% (0.5³) chance of never seeing the variant of any one already-live AB test, and on mature programs with 3+ tests per page the chance of never seeing all the live variants render together exceeds 50%.
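For the curious, here is the arithmetic behind those figures, assuming fair 50/50 splits and independently bucketed QA runs (both assumptions, not guarantees of any tool):

```typescript
// Sketch of the QA-miss arithmetic, assuming fair 50/50 tests and freshly bucketed
// (independent) QA runs.

// Chance that one already-live 50/50 test's variant never renders across n QA runs.
const missOneTest = (runs: number): number => Math.pow(0.5, runs);

// Chance that at least one of t live tests' variants is never seen across n runs.
const missAtLeastOne = (runs: number, tests: number): number =>
  1 - Math.pow(1 - missOneTest(runs), tests);

// Chance that all t variants are never seen rendering together in any of the n runs
// (the state you would need to observe to catch a clash involving all of them at once).
const missAllTogether = (runs: number, tests: number): number =>
  Math.pow(1 - Math.pow(0.5, tests), runs);

console.log(missOneTest(3));        // 0.125 -> the 12.5% figure
console.log(missAtLeastOne(3, 3));  // ~0.33 -> miss at least one of three live tests
console.log(missAllTogether(3, 3)); // ~0.67 -> never see all three together: well over 50%
```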
Statistical Clashes – nothing to worry about
A statistical clash is the situation where one test’s control/variant audiences overlap another’s, leading to “double counting” or other such obfuscations. “How do I know which test really drove the conversion?”, for example, is a “statistical clash” concern.
This is almost never a problem
All professional AB testing tools randomise audience allocations at the test-level – that is to say that rather than maintaining global “variant” and “control” audiences that many tests pull from, each test randomises its own control and variant audiences.
For any test with >100 total participants, this means that while there will be substantial audience overlap between a Test A and a Test B that run on overlapping journeys, that overlap will be split roughly evenly across each test’s control and variant.
Example:
- Test A: a highly successful “top product” badge that boosts conversion +10%
- Test B: an unimpactful colour change on the “buy now” CTA, from lime to mint.
Test A will indeed boost the conversion rate of Test B’s variant. It will also boost Test B’s control in just the same way, however, so Test B’s measured uplift will still be ~0%.
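A toy simulation makes this concrete. Everything here is made up for illustration – the 5% baseline conversion rate, the +10% relative lift from Test A’s badge, and the traffic volume:

```typescript
// Toy simulation: each visitor is bucketed independently for Test A and Test B,
// which is how professional tools behave, so Test A's uplift lands evenly on
// Test B's control and variant. All numbers are hypothetical.

function simulateTestB(visitors: number): { controlRate: number; variantRate: number } {
  const conversions = { control: 0, variant: 0 };
  const counts = { control: 0, variant: 0 };

  for (let i = 0; i < visitors; i++) {
    const inTestAVariant = Math.random() < 0.5;                       // Test A's own 50/50 split
    const testBBucket = Math.random() < 0.5 ? "variant" : "control";  // Test B's independent 50/50 split

    // 5% baseline conversion, lifted +10% (relative) by Test A's "top product" badge.
    // Test B's lime-to-mint colour change is assumed to do nothing.
    const conversionRate = 0.05 * (inTestAVariant ? 1.10 : 1.0);

    counts[testBBucket]++;
    if (Math.random() < conversionRate) conversions[testBBucket]++;
  }

  return {
    controlRate: conversions.control / counts.control,
    variantRate: conversions.variant / counts.variant,
  };
}

// With enough traffic both rates converge on ~5.25%, so Test B's measured uplift reads ~0%.
console.log(simulateTestB(1_000_000));
```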
To Summarise
Bullet-proofing against testing clashes is only possible with an up-to-date & comprehensive testing roadmap, and a set of rules for the subjective “strategic clash” decisions. Typically a central CRO/Personalisation program owner is the best person to control both practices.
What do you think?
Does this match up with your own practical experience?
Maybe something totally different is working great for you…
Let me know!