7.1 Reading results in quantitative research

Learning Objectives

Learners will be able to…

  • Describe how statistical significance and confidence intervals help readers interpret quantitative results

Pre-awareness check (Knowledge)

What do you know about previously conducted research on your topic (e.g., statistical analyses, qualitative and quantitative results)?

If you recall, empirical journal articles are those that report the results of quantitative or qualitative data analyzed by the author. They follow a set structure—introduction, methods, results, discussion/conclusions. This chapter is about reading what is often the most challenging section: results.

Quantitative results

Quantitative articles often contain tables, and scanning them is a good way to begin reading the results. A table usually provides a quick, condensed summary of the report’s key findings, and tables are a concise way to report large amounts of data. Some tables present descriptive information about a researcher’s sample (often the first table in a results section). These tables will likely contain frequencies (n) and percentages (%). For example, if gender were an important variable in the researcher’s analysis, a descriptive table would show how many and what percent of all study participants are of a particular gender. Frequencies, or “how many,” will probably be listed as n, while the percent symbol (%) indicates percentages. Note that N is used for the entire sample size, while n is used for the size of a portion of the entire sample.
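
To see how the n, N, and % notation fits together, here is a minimal sketch in Python using a small, entirely hypothetical sample (the data and labels are made up for illustration):

```python
from collections import Counter

# Hypothetical data: the gender recorded for each participant in a
# small made-up study. N is the entire sample size; n is the size of
# each subgroup.
participants = ["woman", "woman", "man", "woman", "man", "woman",
                "woman", "man", "woman", "woman", "man", "woman"]

N = len(participants)           # N = 12, the entire sample
counts = Counter(participants)  # frequencies (n) for each subgroup

# Print a descriptive table of frequencies (n) and percentages (%).
print(f"{'Gender':<8}{'n':>4}{'%':>8}")
for group, n in counts.items():
    print(f"{group:<8}{n:>4}{n / N:>8.1%}")
```

Running this prints a small descriptive table like the first table in many results sections: each row shows a subgroup’s frequency (n) and its share of the whole sample (N).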

In a table presenting a causal relationship, two sets of variables are represented: the independent variable, or cause, and the dependent variable, or effect. We’ll go into more detail on variables in Chapter 8. Independent variable attributes are typically presented in the table’s columns, while dependent variable attributes are presented in its rows. This allows the reader to scan a table’s rows to see how values on the dependent variable change as the independent variable values change. Tables displaying the results of quantitative analysis will also likely include some information about which relationships are and are not significant. We will discuss the details of significance and p-values later in this section.

Let’s look at a specific example: Table 7.1 below.

Table 7.1 Percentage reporting harassing behaviors at work

Behavior experienced at work                    Women   Men    p-value
Subtle or obvious threats to your safety         2.9%   4.7%   .623
Being hit, pushed, or grabbed                    2.2%   4.7%   .480
Comments or behaviors that demean your gender    6.5%   2.3%   .184
Comments or behaviors that demean your age      13.8%   9.3%   .407
Staring or invasion of your personal space       9.4%   2.3%   .039

Note: Sample size was 138 for women and 43 for men.

Table 7.1 presents the association between gender and experiencing harassing behaviors at work. In this example, gender is the independent variable (the predictor) and the harassing behaviors listed are the dependent variables (the outcome).[1] Therefore, we place gender in the table’s columns and harassing behaviors in the table’s rows.

Reading across the table’s top row, we see that 2.9% of women in the sample reported experiencing subtle or obvious threats to their safety at work, while 4.7% of men in the sample reported the same. We can read across each of the rows of the table in this way. Reading across the bottom row, we see that 9.4% of women in the sample reported experiencing staring or invasion of their personal space at work while just 2.3% of men in the sample reported having the same experience. We’ll discuss p-values later in this section.

While you can certainly scan tables for key results, they are often difficult to understand without reading the text of the article. The table and the text are meant to complement each other, and the text should explain how the authors interpret their findings; the table is not redundant with the text of the results section. Additionally, the first table in most results sections summarizes the study’s sample, so it provides background on the study rather than information about hypotheses and findings. It is also a good idea to look back at the article’s methods section: the data analysis plan the authors outline there should walk you through the steps they took to analyze their data, which will inform how they report the results.

Statistical significance

The statistics reported in Table 7.1 represent what the researchers found in their sample. The purpose of statistical analysis is usually to generalize from the small number of people in a study’s sample to a larger population. Thus, the researchers intend to make causal arguments about harassing behaviors at workplaces beyond those covered in the sample.

Generalizing is key to understanding statistical significance. According to Cassidy et al. (2019),[2] 89% of introduction-to-psychology textbooks define statistical significance incorrectly. This includes an early draft of this textbook, which defined statistical significance as “the likelihood that the relationships we observe could be caused by something other than chance.” If you have previously had a research methods class, this might sound familiar to you. It certainly did to me!

But statistical significance is less about “random chance” and more about the null hypothesis. Basically, at the beginning of a study a researcher develops a hypothesis about what they expect to find, usually that there is a statistical relationship between two or more variables. The null hypothesis is the opposite: it is the hypothesis that there is no relationship between the variables in a research study. Researchers hope to reject the null hypothesis by finding a relationship between the variables.

For example, in Table 7.1 the researchers were examining whether gender impacts harassment. They hypothesized that women were more likely to experience harassment than men. The null hypothesis, then, would be that gender has no impact on harassment. Once we conduct the study, our results will hopefully lead us to reject the null hypothesis because we find that gender impacts harassment. We would then generalize from our study’s sample to the larger population of people in the workplace.

Statistical significance is calculated using a p-value, which is obtained by comparing the statistical results with a hypothetical set of results if the researchers re-ran their study a large number of times. Keeping with our example, imagine we re-ran our study with different men and women from different workplaces hundreds and hundreds of times, assuming the null hypothesis is true, that gender has no impact on harassment. If results like ours come up pretty often when the null hypothesis is true, our results probably don’t mean much. “The smaller the p-value, the greater the statistical incompatibility with the null hypothesis” (Wasserstein & Lazar, 2016, p. 131).[3] Generally, researchers in the social sciences set alpha at .05 as the value at which a result is significant (p is less than or equal to .05) or not significant (p is greater than .05). An alpha of .05 means a result is significant if fewer than 5% of those hypothetical re-runs of our study would show the same or more extreme relationships when the null hypothesis is true. Researchers may, however, choose a stricter standard such as .01, in which 1% or less of those hypothetical results are as or more extreme, or a more lenient standard like .1, in which 10% or less are as or more extreme than what was found in the study.
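
To make the “re-run the study” idea concrete, here is a minimal simulation sketch in Python. The sample sizes come from Table 7.1, but the event counts are hypothetical (the article reports only percentages), and this permutation approach is just one way to obtain a p-value, not necessarily the test the original authors used:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts (made-up numbers; Table 7.1 gives only
# percentages): 25 of 138 women and 2 of 43 men reported a behavior.
n_women, n_men = 138, 43
events_women, events_men = 25, 2
obs_diff = events_women / n_women - events_men / n_men

# Under the null hypothesis, gender has no effect, so pool all
# participants and randomly reshuffle the "reported" labels to
# simulate re-running the study many times.
pooled = np.zeros(n_women + n_men, dtype=int)
pooled[: events_women + events_men] = 1

n_sims = 10_000
diffs = np.empty(n_sims)
for i in range(n_sims):
    rng.shuffle(pooled)
    diffs[i] = pooled[:n_women].mean() - pooled[n_women:].mean()

# Two-sided p-value: the share of simulated "re-runs" that produce a
# difference at least as extreme as the one actually observed.
p_value = np.mean(np.abs(diffs) >= abs(obs_diff))
print(f"observed difference: {obs_diff:.3f}, simulated p = {p_value:.4f}")
```

Each shuffle stands in for one hypothetical re-run of the study under the null hypothesis; the p-value is simply the share of those re-runs that look at least as extreme as the observed data.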

Let’s look back at Table 7.1. Which one of the relationships between gender and harassing behaviors is statistically significant? It’s the last one in the table, “staring or invasion of personal space,” whose p-value is .039 (under the p<.05 standard to establish statistical significance). Again, this indicates that if we re-ran our study over and over again and gender did not impact staring/invasion of space (i.e., the null hypothesis was true), only 3.9% of the time would we find similar or more extreme differences between men and women than what we observed in our study. Thus, we conclude that for staring or invasion of space only, there is a statistically significant relationship.

For contrast, let’s look at “being pushed, hit, or grabbed” and run through the same analysis to see if it is statistically significant. If we re-ran our study over and over again and the null hypothesis was true, 48% of the time (p=.48) we would find similar or more extreme differences between men and women. That means these results are not statistically significant.
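
Mechanically, checking significance is just a comparison of each p-value against alpha. A minimal sketch in Python, using the p-values from Table 7.1 and the conventional alpha of .05:

```python
# p-values from Table 7.1, checked against the conventional alpha = .05.
alpha = 0.05
p_values = {
    "Threats to your safety": 0.623,
    "Being hit, pushed, or grabbed": 0.480,
    "Comments that demean your gender": 0.184,
    "Comments that demean your age": 0.407,
    "Staring or invasion of personal space": 0.039,
}

for behavior, p in p_values.items():
    flag = "significant" if p <= alpha else "not significant"
    print(f"{behavior}: p = {p:.3f} ({flag})")
```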

This discussion should also highlight a point we discussed previously: that it is important to read the full results section, rather than simply relying on the summary in the abstract. If the abstract stated that most tests revealed no statistically significant relationships between gender and harassment, you would have missed the detail on which behaviors were and were not associated with gender. Read the full results section! And don’t be afraid to ask for help from a professor in understanding what you are reading, as results sections are often not written to be easily understood.

Statistical significance and p-values have been critiqued recently for a number of reasons, including that they are misused and misinterpreted (Wasserstein & Lazar, 2016),[4] that researchers deliberately manipulate their analyses to obtain significant results (Head et al., 2015),[5] and that they factor into the difficulty scientists have today in reproducing many of the results of previous social science studies (Peng, 2015).[6] For this reason, we share these principles, adapted from those put forth by the American Statistical Association,[7] for understanding and using p-values in social science:

  1. p-values provide evidence against a null hypothesis.
  2. p-values do not indicate whether the results were produced by random chance alone or if the researcher’s hypothesis is true, though both are common misconceptions.
  3. Statistical significance can be detected in minuscule differences that have very little effect on the real world.
  4. Nuance is needed to interpret scientific findings, as a conclusion does not become true or false when the p-value passes from p=.051 to p=.049.
  5. Real-world decision-making must use more than reported p-values. It’s easy to run analyses of large datasets and only report the significant findings.
  6. Greater confidence can be placed in studies that pre-register their hypotheses and share their data and methods openly with the public.
  7. “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. For example, a p-value near .05 taken by itself offers only weak evidence against the null hypothesis. Likewise, a relatively large p-value does not imply evidence in favor of the null hypothesis; many other hypotheses may be equally or more consistent with the observed data” (Wasserstein & Lazar, 2016, p. 132).

Confidence intervals

Because of the limitations of p-values, scientists can use other methods to assess whether their models of the world are accurate. One common approach is to use a confidence interval, or a range of values in which the true value is likely to be found. Confidence intervals are helpful because, as principle #3 above points out, p-values do not measure the size of an effect (Greenland et al., 2016).[8] Remember, something that has very little impact on the world can be statistically significant; a confidence interval shows the plausible size of the effect. In our example from Table 7.1, imagine our analysis produced a confidence interval indicating that women are 1.2-3.4 times more likely to experience “staring or invasion of personal space” than men. As with p-values, the calculation of a confidence interval compares what was found in one study with a hypothetical set of results if we repeated the study over and over again. A 95% confidence interval means that if we re-ran the study hundreds and hundreds of times and calculated an interval each time, about 95% of those intervals would contain the true value.
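
As an illustration of where a “women are X times more likely” interval could come from, here is one standard approach (a normal-approximation confidence interval for a risk ratio) sketched in Python. The event counts are hypothetical, since the article reports only percentages, so the resulting interval will not match the 1.2-3.4 figure imagined above:

```python
import math

# Hypothetical counts: 25 of 138 women and 2 of 43 men reported the
# behavior (made-up numbers; Table 7.1 gives only percentages).
a, n1 = 25, 138   # women: events, sample size
b, n2 = 2, 43     # men: events, sample size

rr = (a / n1) / (b / n2)  # risk ratio: how many times more likely women are

# Standard error of log(RR), then a 95% interval on the log scale,
# transformed back. 1.96 is the normal critical value for 95%.
se_log = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)
low = math.exp(math.log(rr) - 1.96 * se_log)
high = math.exp(math.log(rr) + 1.96 * se_log)
print(f"RR = {rr:.2f}, 95% CI: ({low:.2f}, {high:.2f})")
```

With only two events among men, the interval comes out very wide, which is itself informative: the data are compatible with anything from roughly no difference to a very large one.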

Confidence intervals are pretty intuitive. As of this writing, my wife and I are expecting our second child. The doctor told us our due date was December 11th. But the doctor also told us that December 11th was only their best estimate. They were actually 95% sure our baby might be born any time in the 30-day period between November 27th and December 25th. Confidence intervals are often listed with a percentage, like 90% or 95%, and a range of values, such as between November 27th and December 25th. You can read that as: “based on data from hundreds of thousands of births, we are 95% sure your baby will be born between these two dates.”
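
Here is a minimal simulation sketch, with made-up numbers, of what the “95%” actually promises: if we re-ran a study many times and computed an interval each time, about 95% of those intervals should contain the true value:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: the true population proportion is 0.10, and each
# "re-run" of the study samples n = 181 people (made-up numbers).
true_p, n, n_studies = 0.10, 181, 10_000

covered = 0
for _ in range(n_studies):
    p_hat = rng.binomial(n, true_p) / n        # observed proportion
    se = np.sqrt(p_hat * (1 - p_hat) / n)      # standard error
    low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
    covered += low <= true_p <= high

# Should print a value near 95% (the normal-approximation interval is
# only approximate, so it can land slightly below).
print(f"{covered / n_studies:.1%} of intervals contained the true value")
```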

Notice that we’re hedging our bets here by using words like “best estimate.” When testing hypotheses, social scientists generally phrase their findings in a tentative way, talking about what results “indicate” or “support,” rather than making bold statements about what their results “prove.” Social scientists have humility because they understand the limitations of their knowledge. In a literature review, using a single study or fact to “prove” an argument right or wrong is often a signal to the person reading your literature review (usually your professor) that you may not have appreciated the limitations of that study or its place in the broader literature on the topic. Strong arguments in a literature review include multiple facts and ideas that span across multiple studies.

You can learn more about creating tables, reading tables, and tests of statistical significance in a class focused exclusively on statistical analysis. We provide links to many free and openly licensed resources on statistics in Chapter 16. For now, we hope this brief introduction to reading tables will improve your confidence in reading and understanding the results sections in quantitative empirical articles.

Key Takeaways

  • The results section of an empirical article is often the most difficult section to understand.
  • To understand a quantitative results section, look for results that were statistically significant and examine the confidence interval, if provided.

Post-awareness check (Emotional)

On a scale of 1-10 (10 being excellent), how would you rate your confidence level in your ability to understand a quantitative results section in empirical articles on your topic of interest?

Exercises

TRACK 1 (IF YOU ARE CREATING A RESEARCH PROPOSAL FOR THIS CLASS)

Select a quantitative empirical article related to your topic.

  • Write down the results the authors identify as statistically significant in the results section.
  • How do the authors interpret their results in the discussion section?
  • Do the authors provide enough information in the introduction for you to understand their results?

TRACK 2 (IF YOU AREN’T CREATING A RESEARCH PROPOSAL FOR THIS CLASS)

You are interested in researching the effects of race-based stress and burnout among social workers.

Select a quantitative empirical article related to this topic.

  • Write down the results the authors identify as statistically significant in the results section.
  • How do the authors interpret their results in the discussion section?
  • Do the authors provide enough information in the introduction for you to understand their results?

 


  1. It wouldn’t make any sense to say that people’s workplace experiences predict their gender, so in this example, the question of which is the independent variable and which are the dependent variables has a pretty obvious answer.
  2. Cassidy, S. A., Dimova, R., Giguère, B., Spence, J. R., & Stanley, D. J. (2019). Failing grade: 89% of introduction-to-psychology textbooks that define or explain statistical significance do so incorrectly. Advances in Methods and Practices in Psychological Science, 2(3), 233-239.
  3. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70, 129-133.
  4. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70, 129-133.
  5. Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLoS Biology, 13(3), e1002106.
  6. Peng, R. (2015). The reproducibility crisis in science: A statistical counterattack. Significance, 12, 30-32.
  7. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70, 129-133.
  8. Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350.