Every CROer knows the mix of joy and relief that comes with any test being pushed live. Your hard planning and development work finally translates into a tangible experience that real people get to interact with, for better or for worse.
But are you sure that your test has been set up in valid conditions to start with? Particularly if the test results you are seeing seem unlikely or plainly make no sense, your first move should be to review the test’s setup.
What is test validity?
Test validity is the assurance that your CRO test is well thought out and set up in conditions that allow it to yield trustworthy results. The idea is to remove chance and unaccounted-for influences from your testing process, so that the results you see come from an objective place.
Why care?
If the validity of your test is compromised:
- Results might not make sense: you might see unlikely impacts on various metrics that you struggle to make heads or tails of. This could be a result of the test introducing influences you weren’t accounting for initially, which might completely invalidate your initial hypothesis.
- Results might look wrong: you might undertrack or overtrack certain metrics, or worse, display a broken experience to visitors under certain circumstances. This could be due to a defect in the technical setup of the test. In that case, it will be hard to get anything useful out of the test results as they might be far too compromised.
Either way, an invalid test is always going to be a waste of time, resources, and more crucially, trust. Especially if you had to advocate hard for CRO to become part of your marketing activity, you need to nurture that trust, and a useless test is not going to help you do so.
Where and when test validity matters
Concept/Planning phase
As you conceptualise the test and crystallise your thinking into a hypothesis and test plan, ask yourself:
- What business outcomes do you expect from the test?
- Is the change you’re proposing to make likely to generate these outcomes?
- Will you be able to measure this?
- Is the test technically achievable in the first place?
- Will you have any potential limiting elements to account for (e.g.: skills, resources, tools, etc.)?
Build/QA phase
As the test is being developed and QAed, ask yourself:
- Is my CRO tool available in the right places across the website?
- Is my test coded correctly, as per test plan specs?
- Once live, are other tests likely to run at the same time?
- Are there any changes scheduled to happen on the Control version during my test’s runtime?
- Have I checked for unexpected scenarios as part of the QA process?
Analysis phase
As the test is still live and you’re trying to make sense of the results, ask yourself:
- Do I have a big enough sample of visitors/converters to call it a day?
- Regardless of sample size, has the test been running for long enough?
- Have my results reached statistical significance?
- Have I looked at full weeks of data?
- Are there any external factors influencing the results (e.g.: seasonal event, marketing campaign, etc.)?
Logical vs. Technical validity
When assessing test validity, you can divide considerations into two main categories: logical and technical validity.
Logical validity
Experiment contents
- Make changes that are noticeable/meaningful enough
- If making multiple changes in one experiment, ensure they all work together towards a common objective
- Use a researched hypothesis to back up your test
Controlled conditions
- Avoid Control changes when the test is live
- Define a device/browser segmentation
- Be mindful of your promotional calendar and its influence
- Ensure your test content is actually novel for users on your website
Mutual exclusion
- If you run more than one test at once, ensure the tests are mutually exclusive so their users do not mix
- This ensures you can isolate the impact of each individual test in the results (see the sketch below)
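To make this concrete, here is a minimal sketch of deterministic, mutually exclusive bucketing. The visitor identifier, the test names and the hashing approach are illustrative assumptions rather than the workings of any particular CRO tool, most of which will handle exclusion for you.

```typescript
// Minimal sketch: deterministically assign a visitor to exactly one of the
// concurrently running tests, so the two audiences never overlap.
// Assumes you already have a stable visitor identifier (e.g. from a cookie).

function hashToUnitInterval(input: string): number {
  // Simple 32-bit FNV-1a hash mapped onto [0, 1) - good enough for bucketing.
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) / 0x100000000;
}

function assignExclusiveTest(visitorId: string, tests: string[]): string {
  // The same visitor always lands in the same bucket across sessions.
  const bucket = Math.floor(hashToUnitInterval(visitorId) * tests.length);
  return tests[bucket];
}

// Example: a given visitor only ever enters one of these two experiments.
const activeTest = assignExclusiveTest("visitor-abc-123", ["pdp-gallery-test", "checkout-cta-test"]);
```

Because the assignment is derived from a stable visitor ID, each visitor is only ever exposed to one test, which is what lets you read each test's impact in isolation.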
Live test conditions
- Run tests for a minimum of 14 full days
- Have a minimum of 100 unique conversions per experiment for key metrics
- Do not trust results that are not significant according to a recognised statistical methodology (e.g.: p-value, confidence level, chance to beat control, etc.); a minimal check is sketched after this list
- Only look at full weeks of data
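As a rough illustration of what a recognised statistical methodology means in practice, here is a minimal two-proportion z-test sketch. The visitor and conversion counts are made-up numbers, and your CRO tool will normally run this kind of calculation for you.

```typescript
// Minimal sketch: two-proportion z-test comparing Control vs Experiment
// conversion rates. The figures below are illustrative only.

function twoProportionZ(convA: number, visitorsA: number, convB: number, visitorsB: number): number {
  const pA = convA / visitorsA;
  const pB = convB / visitorsB;
  const pooled = (convA + convB) / (visitorsA + visitorsB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB));
  return (pB - pA) / se; // z-score of the observed uplift
}

// |z| >= 1.96 corresponds to roughly 95% confidence (two-sided).
const z = twoProportionZ(120, 4800, 155, 4750);
console.log(`z = ${z.toFixed(2)}, significant at 95%: ${Math.abs(z) >= 1.96}`);
```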
Technical validity
Flicker
- Deliver your CRO tool’s script synchronously
- However, be ready to accept that delivering that script behind cookie acceptance will inevitably result in some flicker on the page
- Make use of page hiding for a more graceful content delivery
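To illustrate page hiding, here is a minimal sketch of the kind of anti-flicker approach many tools recommend. The class name, the 3-second failsafe and the 'croToolReady' event are hypothetical placeholders, so always prefer your own vendor's documented snippet.

```typescript
// Minimal page-hiding sketch: hide the page until the CRO tool has applied
// its changes, with a failsafe timeout so visitors are never left staring at
// a blank page. The 'cro-hide' class and 3-second timeout are assumptions.

const style = document.createElement("style");
style.textContent = ".cro-hide { opacity: 0 !important; }";
document.head.appendChild(style);
document.documentElement.classList.add("cro-hide");

function revealPage(): void {
  document.documentElement.classList.remove("cro-hide");
}

// Reveal as soon as the tool signals it is done (hypothetical event name),
// or after 3 seconds at the latest.
const failsafe = window.setTimeout(revealPage, 3000);
window.addEventListener("croToolReady", () => {
  window.clearTimeout(failsafe);
  revealPage();
});
```

The failsafe matters: if the tool fails to load for any reason, visitors still get the page rather than a blank screen.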
Test entry
- Test entry should only happen when the tested elements are present (Experiment) or would be present (Control)
- Know the difference between a simple (e.g.: on pageload) and complex (e.g.: when a specific interaction has happened) test entry, and factor this into your test planning
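To make the difference concrete, here is a minimal sketch of a complex entry condition that only activates the experience after a specific interaction. The '.size-guide-toggle' selector and the activateExperiment function are hypothetical placeholders rather than any specific tool's API.

```typescript
// Minimal sketch: complex test entry that waits for a specific interaction
// (clicking a hypothetical '.size-guide-toggle' button) before counting the
// visitor into the test and applying the change.

function activateExperiment(): void {
  // Placeholder for whatever your CRO tool does to bucket the visitor
  // and apply the variation once the entry condition is met.
  document.body.classList.add("experiment-size-guide-v1");
}

let entered = false;

document.addEventListener("click", (event) => {
  const target = event.target as Element | null;
  if (!entered && target?.closest(".size-guide-toggle")) {
    entered = true; // only enter the test once per pageview
    activateExperiment();
  }
});
```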
Unexpected scenarios
- Identify ‘mechanical’ scenarios that increase the risk of breaking tests (e.g.: page refresh, navigating back/forth, being logged in…)
- Identify other test-specific scenarios that could be influencing content delivery
- Ensure you QA attentively for all these scenarios
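One common mechanical scenario is the browser back/forward cache restoring a page without re-running your scripts. Below is a minimal sketch, under that assumption, of re-applying a variation when the page is restored; the reapplyVariation function is a hypothetical placeholder for your test's changes.

```typescript
// Minimal sketch: re-apply a variation when the browser restores the page
// from the back/forward cache (bfcache), where scripts do not re-run.
// 'reapplyVariation' is a hypothetical placeholder for your test's changes.

function reapplyVariation(): void {
  document.body.classList.add("experiment-size-guide-v1");
}

window.addEventListener("pageshow", (event: PageTransitionEvent) => {
  if (event.persisted) {
    // The page came back from the bfcache rather than a fresh load,
    // so make sure the tested changes are still in place.
    reapplyVariation();
  }
});
```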
Span of your CRO tool
- Ensure the tool spans all the areas where changes need to be made and conversions tracked
- Understand that unavoidable gaps might exist in your test delivery and data due to the tool not being present everywhere it should be
Thorough QA
- Page rendering
- Look & feel
- Test entry
- Scope of changes
- Conversion tracking
- Segmentation
- Errors in console
- Unexpected/test-specific scenarios
In conclusion
With the above ground rules, I have tried to help you define a checking process that will ensure the validity of your tests. But just as important as creating the process itself is its repeatability: if you follow the same rules test after test, you will ensure that your tests are valid not only in isolation, but also as part of a reasoned CRO programme.
Related resources:
Blog: The CRO Process: Four Steps to Take Before Your Test Goes Live
Blog: There's no such thing as a perfect test idea: 7 key ideas for an experimentation mindset