When Does Statistical Significance Apply to Your Research?

Sometimes asking about statistical significance isn't the right question. Here's how to know if you're on track.

Words by Nikki Anderson-Stanier, Visuals by Austin Smoldt-Sáenz

Qualitative user research is one of the best kinds of data.

You can get so much from conversations with your users—rich stories filled with insights and a deep understanding of how that person thinks. These interviews can help shape your product, innovate, and give solid ideas for valuable improvements.

As a researcher, it’s incredibly rewarding to present these insights to your team. There is a certain level of excitement that I've found with these types of reports. With this information, I can help teams make better decisions or uncover potential exciting and innovative projects.

So, what happens when that all comes to a grinding halt with:

"Are these results statistically significant?"

The significance of statistical significance

As a qualitative user researcher, I have encountered this question many times since my sample sizes tend to be small—especially in the first phase of my generative research.

And it tripped me up for years. I never felt like I had the best answer to this question. I never felt like I could squash the statistical significance bug. I would mumble about quantitative versus qualitative research, or tell them statistical significance didn't apply to this study.

No matter what avenue I turned down in response to this question, it just didn't feel good. I wasn't prepared or confident.

And then, at one point, I began sharing results that I was incredibly excited about. These were the kind of results that could change our product for the better. Right on cue came that question. Once someone asked the dreaded question, it seemed to permeate everyone's minds, invalidating everything I had just said.

At that moment, I vowed to get more clarity behind this question and build responses that made me feel more confident. So here are some of the main ways I approach this question.

What is statistical significance?

Shockingly, not a lot of people know what statistical significance even means. And I would argue that asking about the statistical significance of qualitative user research methods is a red flag. So I will attempt to explain statistical significance concisely, but may not hit it 100% spot on.

When it comes to statistical significance, we generally start with a null hypothesis, which means we assume there is NO relation, effect, or difference between the two things we measure. So, for instance, if we were doing a study investigating the relationship between drinking sparkling water and being able to run fast, we would start the study assuming there was no relation between the two.

We would then go about our research using a p-value to determine if the data is statistically significant. The p-value tells you how often you would expect to see a test statistic as extreme as, or more extreme than, the one calculated by your statistical test if the null hypothesis of that test were true. If the p-value falls below a threshold chosen in advance (conventionally 5%, or .05), you have reached statistical significance and can reject the null hypothesis.

So, if we investigated the relationship between drinking sparkling water and running faster, and our p-value was below .05, we would reject our null hypothesis. This means we would reject the assumption that there is no relationship between drinking sparkling water and running faster. Our data would be unlikely if no relationship existed, so we treat the result as evidence that there is a relationship between the two.
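To make the p-value idea concrete, here is a minimal sketch of an exact permutation test in Python. The sprint times are invented for illustration, and `permutation_p_value` is a hypothetical helper rather than anything from a real study. It relabels the runners in every possible way and counts how often the relabeled gap between group means is at least as large as the observed gap.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    positions = range(len(pooled))
    # Under the null hypothesis the group labels are interchangeable,
    # so every way of splitting the pooled data is equally likely.
    for chosen in combinations(positions, size_a):
        chosen_set = set(chosen)
        relabeled_a = [pooled[i] for i in chosen]
        relabeled_b = [pooled[i] for i in positions if i not in chosen_set]
        gap = abs(mean(relabeled_a) - mean(relabeled_b))
        # Small tolerance guards against floating-point rounding.
        if gap >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical 200 m sprint times in seconds (invented for illustration).
sparkling_water = [24.1, 23.8, 24.5, 23.9, 24.2]
still_water = [26.0, 25.7, 26.3, 25.9, 26.1]

p_value = permutation_p_value(sparkling_water, still_water)
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

A small p-value here says only that a gap this large would be surprising if sparkling water made no difference. It does not say how large or how important the difference is.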

Statistical significance can tell you whether or not your data are consistent with the null hypothesis (that there is NO relationship between drinking sparkling water and running faster). Significance testing starts by assuming the null is true. Therefore, a p-value can only provide evidence against the null hypothesis, never evidence in favor of it.

Statistical significance cannot tell you:

If any alternative hypotheses are true
The effect size of a result
The importance of a result
Why the null hypothesis is not supported, or why your alternative hypothesis might be true
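
Effect size has to be calculated separately from significance. As a rough sketch, one common measure is Cohen's d, the difference between two group means divided by their pooled standard deviation. The function name `cohens_d` and the task times below are assumptions made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical task-completion times in seconds (invented for illustration).
old_design = [62.0, 64.0, 66.0]
new_design = [61.0, 63.0, 65.0]

print(f"Cohen's d: {cohens_d(old_design, new_design):.2f}")
```

A p-value and an effect size answer different questions: the first asks whether the data would be surprising under the null, the second asks how big the difference is.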

Quantitative versus qualitative methods

Quantitative research is statistical: it produces numerical data, such as averages or percentages, that we can analyze with statistical methods.

Qualitative research uses non-statistical methods.

Quantitative methods allow us to measure variables and test specific hypotheses, while qualitative methods enable us to explore concepts and experiences in depth.

So, when it comes to determining and using statistical significance, it simply does not make sense in qualitative research. Qualitative research, as it exists, is a non-statistical approach. With qualitative research, we do not try to measure certain variables, nor do we try to test specific hypotheses. Never in my life have I used qualitative research to determine whether or not a null hypothesis is supported.

One of the other reasons that qualitative research is not statistical is that we are not trying to generalize or widen our scope of understanding. Instead, we are trying to understand what is taking place on a deep level.

With qualitative research, we are in the first phases of our study. We aren't yet looking to understand if our findings represent a broader population because we don't have findings yet. We first have to go deep before we go wide, or we will miss crucial insights that we can later test at large.

My first answer to the question about statistical significance is to explain precisely what statistical significance is, and the difference between quantitative and qualitative user research. I stress that qualitative research is a non-statistical approach.

Of course, that doesn't mean we can't follow up qualitative research with quantitative research (or vice versa). Still, standalone qualitative research is a non-statistical approach to understanding, and colleagues need to deeply understand the goals of qualitative research versus those of quantitative research.

Formative versus summative approaches

Whenever dealing with usability testing, I use the formative versus summative approach. Usability testing can be tricky because it can involve quantitative-looking data, but it is still a relatively qualitative approach with a smaller sample size.

Both formative and summative approaches fall under evaluative research. However, they evaluate designs, products, and apps differently.

Formative evaluations zero in on uncovering issues within a design or experience and why/how those issues occur for participants. Usability testing is a fantastic example of a formative approach. We test the design and experience with usability testing to understand problems and help inform the product development process.

With this iterative approach, we might test and iterate on the design several times until we feel we are on the right path. We conduct formative testing in the earlier stages of the product development process.

With formative testing, we are:

Uncovering the issues that occur
Understanding why those issues are occurring
Understanding how those issues are occurring

During formative evaluations, we are NOT looking at how many people encounter an issue. Therefore, statistical significance is irrelevant in qualitative usability testing (formative approach).
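
One way to show why a handful of sessions can be enough for uncovering issues is a back-of-the-envelope calculation. The sketch below assumes every participant independently has the same chance of hitting a given issue, which is a simplification, and the 20% figure is hypothetical. The helper name `chance_of_seeing_issue` is made up for this example.

```python
def chance_of_seeing_issue(share_of_users_affected, sessions):
    """Probability that at least one of `sessions` participants hits the issue."""
    if not 0 <= share_of_users_affected <= 1:
        raise ValueError("share_of_users_affected must be between 0 and 1")
    # Chance that nobody hits it is (1 - share) ** sessions; take the complement.
    return 1 - (1 - share_of_users_affected) ** sessions


# Hypothetical issue that affects 20% of users.
for sessions in (3, 5, 10):
    chance = chance_of_seeing_issue(0.2, sessions)
    print(f"{sessions} sessions: {chance:.0%} chance of seeing the issue at least once")
```

This tells us how likely we are to observe an issue at all, not how many people it affects, which matches the goal of a formative evaluation.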

Summative testing, also known as quantitative testing, answers those questions that stakeholders are asking. Summative testing looks at how many people are impacted by an issue and how much impact an issue has. With a summative approach, you use a larger sample size.

However, I would still argue that summative testing and statistical significance are only loosely related. As a reminder, statistical significance looks at whether or not the null hypothesis is supported.

Instead, summative testing shows us the effect size, which leads me to my last approach to the statistical significance question.
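
When a summative study reports how many people are impacted by an issue, it helps to show the uncertainty around that share. A minimal sketch, assuming a simple random sample, is the Wilson score interval for a proportion. The function name `wilson_interval` and the counts below are hypothetical.

```python
from math import sqrt


def wilson_interval(affected, sample_size, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% confidence)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    share = affected / sample_size
    denominator = 1 + z**2 / sample_size
    center = (share + z**2 / (2 * sample_size)) / denominator
    margin = (
        z * sqrt(share * (1 - share) / sample_size + z**2 / (4 * sample_size**2))
    ) / denominator
    # Clamp to the valid range for a proportion.
    return max(0.0, center - margin), min(1.0, center + margin)


# Hypothetical results: the same 30% failure rate at two sample sizes.
for affected, sample_size in ((3, 10), (60, 200)):
    low, high = wilson_interval(affected, sample_size)
    print(f"{affected}/{sample_size} affected: between {low:.0%} and {high:.0%}")
```

The same observed share gives a much wider range with 10 participants than with 200, which is why summative testing uses larger samples.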

Understand the question behind the question

My last approach to the significance question is to ask stakeholders about their actual concern. When they throw out the statistical significance question, especially regarding qualitative research, what are they actually worried about?

If you can put your research hat on and probe past the buzzword of statistical significance, you might find other concerns you can more easily address.

For example, stakeholders often confuse statistical significance with having a representative sample or with understanding effect size. You can uncover this quickly by asking questions:

Stakeholder: "Are these results statistically significant?"
Researcher: "What are you worried about with the results?"
Stakeholder: "You only spoke to 10 people. How do we know they are representative of our users?"

So, instead of looking for whether or not our null hypothesis is supported, stakeholders are concerned that a small sample size is not representative of the given population or doesn't tell us how many people are impacted by our findings.

By uncovering these deeper concerns, we can readily address the real problems: representative samples and effect sizes.

One answer to these concerns is theoretical saturation. Theoretical saturation means we no longer learn anything new during our research sessions, and aren't making new connections during analysis. When it comes to qualitative data, this is the type of "significance" we are looking for.
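
One rough way to track saturation is to count how many new codes (themes) each session adds during analysis. The sketch below is an assumption about how you might operationalize it, not a standard rule: it calls the study saturated once a chosen number of consecutive sessions add nothing new. The function names, the `quiet_run` parameter, and the example codes are all hypothetical.

```python
def new_codes_per_session(sessions):
    """Count the codes in each session that no earlier session produced."""
    seen = set()
    counts = []
    for codes in sessions:
        fresh = set(codes) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def saturation_session(sessions, quiet_run=2):
    """Return the 1-based session number that completes `quiet_run`
    consecutive sessions with no new codes, or None if that never happens."""
    run = 0
    for number, count in enumerate(new_codes_per_session(sessions), start=1):
        run = run + 1 if count == 0 else 0
        if run >= quiet_run:
            return number
    return None


# Hypothetical codes tagged in four interview sessions.
interviews = [
    {"pricing confusion", "slow onboarding"},
    {"slow onboarding", "missing export"},
    {"pricing confusion"},
    {"missing export", "slow onboarding"},
]

print(new_codes_per_session(interviews))
print(saturation_session(interviews))
```

How many quiet sessions count as "enough" is a judgment call for the research team, and a count like this supports that judgment rather than replacing it.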

Wrapping it up

The next step is to use a mixed-methods approach to help determine a representative sample and effect size. When we use mixed methods, we can go both deep and wide, which helps allay the fears and concerns of our stakeholders. As a result, we can uncover rich insights and better understand how many people are impacted by these concepts—precisely what our stakeholders are asking about.

And none of it has anything to do with statistical significance!

Nikki Anderson-Stanier is the founder of User Research Academy and a qualitative researcher with 9 years in the field. She loves solving human problems and petting all the dogs.

To get even more UXR nuggets, check out her user research membership, follow her on LinkedIn, or subscribe to her Substack.