Can you trust your CRO test results? Rules for CRO test validity

Every CROer knows the mix of joy and relief that comes with any test being pushed live. Your hard planning and development work is finally translated into a tangible experience that real people get to interact with, for better or for worse.

But are you sure that your test has been set up in valid conditions to start with? Particularly if the test results you are seeing seem unlikely or plainly make no sense, your first move should be to review the test’s setup.

What is test validity?

Test validity is the assurance that your CRO test has been thought through and set up in conditions that allow it to yield valid results. The idea is to remove chance and unaccountability from your testing process, so that the results you see come from an objective place.

Why care?

If the validity of your test is compromised:

Results might not make sense: you might see unlikely impacts on various metrics that you struggle to make head or tail of. This can happen when the test introduces influences you weren't initially accounting for, which might completely invalidate your original hypothesis.

Results might look wrong: you might undertrack or overtrack certain metrics, or worse, display a broken experience to visitors under certain circumstances. This could be due to a defect in the technical setup of the test. In that case, it will be hard to get anything useful out of the test results as they might be far too compromised.

Either way, an invalid test is always going to be a waste of time, resources and, more crucially, trust. Especially if you had to advocate hard for CRO to become part of your marketing activity, you need to nurture that trust, and a useless test is not going to help you with that.

Where and when test validity matters

Concept/Planning phase

As you conceptualise the test and crystallise your thinking into a hypothesis and test plan, ask yourself:

What business outcomes do you expect from the test?
Is the change you’re proposing to make likely to generate these outcomes?
Will you be able to measure this?
Is the test technically achievable in the first place?
Are there any potential limiting factors to account for (e.g.: skills, resources, tools)?

Build/QA phase

As the test is being developed and QAed, ask yourself:

Is my CRO tool available in the right places across the website?
Is my test coded correctly, as per test plan specs?
Once live, are other tests likely to run at the same time?
Are there any changes scheduled to happen on the Control version during my test’s runtime?
Have we checked for unexpected scenarios as part of the QA process?

Analysis phase

While the test is live and you're trying to make sense of the results, ask yourself:

Do I have a big enough sample of visitors/converters to call it a day?
Regardless of sample size, has the test been running for long enough?
Have my results reached statistical significance?
Have I looked at full weeks of data?
Are there any external factors influencing the results (e.g.: seasonal event, marketing campaign, etc.)?
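To put a number on "big enough" before the test starts, one common approach is the normal-approximation sample size formula for comparing two conversion rates. The sketch below is a minimal Python illustration, assuming a two-sided test at a 5% significance level and 80% power; the function name `visitors_per_variant` and the example figures are hypothetical, and your CRO tool may use a different statistical method altogether.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline_rate: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate visitors needed in EACH variant to detect a relative lift
    in conversion rate with a two-sided, two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not 0 < p1 < 1 or not 0 < p2 < 1:
        raise ValueError("conversion rates must be between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2                          # average rate under the null
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)
```

For example, with a hypothetical 3% baseline conversion rate and a 10% relative lift to detect, `visitors_per_variant(0.03, 0.10)` comes out at roughly 53,000 visitors per variant. Smaller lifts or lower baseline rates push that figure up quickly, which is one reason a test can need to run well beyond its minimum duration.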

Logical vs. Technical validity

When assessing test validity, you can divide considerations into two main categories: logical and technical validity.

Logical validity

Experiment contents

Make changes that are noticeable/meaningful enough
If making multiple changes in one experiment, ensure they all work together towards a common objective
Use a researched hypothesis to back up your test

Controlled conditions

Avoid changes to the Control while the test is live
Define a device/browser segmentation
Be mindful of your promotional calendar and its influence
Ensure your test content is actually novel for users on your website

Mutual exclusion

If you run more than one test at once, ensure the tests are mutually exclusive so their users do not mix
This ensures you can understand the impact of purely one test at a time in the results
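Mutual exclusion is usually handled by your CRO tool, but the underlying idea can be sketched as deterministic bucketing: each visitor ID is hashed into a fixed bucket, and each bucket range belongs to only one test. The Python below is a minimal illustration under that assumption; the experiment names, the 100-bucket split and the helper functions are all hypothetical rather than any specific tool's API.

```python
import hashlib
from typing import Optional

# Hypothetical experiments, each owning a non-overlapping range of buckets,
# so a visitor can only ever fall into one of them.
EXCLUSION_GROUPS = {
    "checkout_copy_test": range(0, 50),
    "pdp_gallery_test": range(50, 100),
}


def bucket(key: str, buckets: int = 100) -> int:
    """Map a key to a stable bucket in [0, buckets) using a SHA-256 hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign(visitor_id: str) -> Optional[tuple[str, str]]:
    """Return (experiment, variant) for a visitor, or None if their bucket is unallocated."""
    b = bucket(visitor_id)
    for experiment, bucket_range in EXCLUSION_GROUPS.items():
        if b in bucket_range:
            # A second, independent hash splits this experiment's visitors 50/50.
            variant = "control" if bucket(f"{experiment}:{visitor_id}", 2) == 0 else "experiment"
            return experiment, variant
    return None
```

Because the same visitor ID always hashes to the same bucket, a returning visitor stays in the same test and variant, and no visitor can appear in both tests' results. The trade-off is that each test only receives a share of your traffic, so each one will take longer to reach the sample size it needs.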

Live test conditions

Run tests for a minimum of 14 full days
Have a minimum of 100 unique conversions per experiment for key metrics
Do not trust results that are not significant according to a recognised statistical methodology (e.g.: p-value, confidence level, chance to beat control, etc.)
Only look at full weeks of data
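The live test conditions above can be turned into a simple check to run before anyone reads the headline numbers. This minimal Python sketch assumes a frequentist, pooled two-proportion z-test (one of the p-value based approaches mentioned above; a Bayesian "chance to beat control" would need a different calculation), a 5% significance threshold, and applies the 100-conversion minimum to each variant. `VariantResult`, `two_proportion_p_value` and `validity_issues` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class VariantResult:
    visitors: int
    conversions: int


def two_proportion_p_value(control: VariantResult, experiment: VariantResult) -> float:
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_c = control.conversions / control.visitors
    p_e = experiment.conversions / experiment.visitors
    pooled = (control.conversions + experiment.conversions) / (
        control.visitors + experiment.visitors)
    se = sqrt(pooled * (1 - pooled) * (1 / control.visitors + 1 / experiment.visitors))
    if se == 0:  # no variation at all (0% or 100% conversion), nothing to test
        return 1.0
    z = (p_e - p_c) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def validity_issues(days_run: int, control: VariantResult, experiment: VariantResult,
                    min_days: int = 14, min_conversions: int = 100,
                    alpha: float = 0.05) -> list[str]:
    """List every live test condition the results fail; an empty list means none."""
    issues = []
    if days_run < min_days:
        issues.append(f"ran {days_run} days; minimum is {min_days}")
    if days_run % 7 != 0:
        issues.append("data does not cover full weeks")
    for name, variant in (("control", control), ("experiment", experiment)):
        if variant.conversions < min_conversions:
            issues.append(f"{name} has {variant.conversions} conversions; "
                          f"minimum is {min_conversions}")
    p = two_proportion_p_value(control, experiment)
    if p >= alpha:
        issues.append(f"not significant (p = {p:.3f}, threshold {alpha})")
    return issues
```

For example, a hypothetical 14-day test with 300 conversions from 10,000 Control visitors and 380 from 10,000 Experiment visitors returns no issues (p is roughly 0.002), whereas the same test stopped after 10 days with identical conversion rates in both groups would be flagged for its duration, its partial week and its lack of significance.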

Technical validity

Flicker

Deliver your CRO tool’s script synchronously
However, be ready to accept that delivering that script behind cookie acceptance will inevitably result in some flicker on the page
Make use of page hiding for a more graceful content delivery

Test entry

Test entry should only happen when the tested elements are present (Experiment) or would be present (Control)
Know the difference between a simple (e.g.: on pageload) and complex (e.g.: when a specific interaction has happened) test entry, and factor this into your test planning
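One way to picture the difference between simple and complex entry, and why Control needs the same entry rule as the Experiment, is as a condition evaluated before a visitor is counted. The Python below is a hypothetical sketch: `PageState`, the selector and the interaction names are made up for illustration, and in practice this logic lives in your CRO tool's targeting or activation settings.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PageState:
    """Hypothetical snapshot of what is on the visitor's page and what they have done."""
    elements: set[str] = field(default_factory=set)
    interactions: set[str] = field(default_factory=set)


EntryCondition = Callable[[PageState], bool]


def simple_entry(selector: str) -> EntryCondition:
    """Enter on page load, as long as the tested element is present."""
    return lambda page: selector in page.elements


def complex_entry(selector: str, interaction: str) -> EntryCondition:
    """Enter only once the element is present AND a specific interaction has happened."""
    return lambda page: selector in page.elements and interaction in page.interactions


def enter_test(page: PageState, variant: str, condition: EntryCondition) -> Optional[str]:
    """Count the visitor in `variant` only if the entry condition holds.

    The same condition is applied to Control and Experiment, so Control visitors
    are only counted where the change would have been shown to them.
    """
    return variant if condition(page) else None
```

The key design choice is that `enter_test` never looks at the variant when deciding entry: a Control visitor is only counted when the tested element would have been changed for them, which keeps both groups comparable.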

Unexpected scenarios

Identify ‘mechanical’ scenarios that increase the risk of breaking tests (e.g.: page refresh, navigating back/forth, being logged in…)
Identify other test-specific scenarios that could be influencing content delivery
Ensure you QA attentively for all these scenarios

Span of your CRO tool

Ensure the tool spans all the areas where changes need to be made and conversions tracked
Understand that unavoidable gaps might exist in your test delivery and data due to the tool not being present everywhere it should be

Thorough QA

Page rendition
Look & feel
Test entry
Scope of changes
Conversion tracking
Segmentation
Errors in console
Unexpected/test-specific scenarios

In conclusion

With the above ground rules, I have tried to help you define a checking process that will ensure the validity of your tests. But the process itself is only as valuable as its repeatability: if you follow the same rules test after test, you will ensure not only that each test is valid in isolation, but also that your tests add up to a reasoned CRO programme.


Related resources:

Blog: The CRO Process: Four Steps to Take Before Your Test Goes Live

Blog: There's no such thing as a perfect test idea: 7 key ideas for an experimentation mindset
