14.5 Threats to internal validity

THIS NEEDS ADDITIONAL CONTENT/REVISING

[ADD INTRO PARAGRAPH THAT REFERS BACK TO PREVIOUS SECTIONS]

A number of factors may influence a study’s internal validity. You might think of all of these threats as extraneous variables, as we discussed at the beginning of this section. Any factor other than the treatment (or intervention) that influences the outcome is a potential threat to internal validity. Threats to internal validity can introduce error and bias into the experiment.

Throughout this chapter, we reviewed the importance of experimental and control groups. These groups must be comparable in order for experimental design to work. Comparable groups are groups that are similar across factors important for the study. Researchers can help establish comparable groups by using probability sampling, random assignment, or matching techniques. Control or comparison groups give researchers an opportunity to explore what happens when people similar to those in the experimental group do not receive the intervention. But if the experimental and control groups are not comparable, then the differences in outcome may not be due to the intervention. No groups are ever perfectly comparable. What’s important is ensuring groups are as similar as possible along variables relevant to the research project.

When some factor related to selecting research participants prevents the groups from being comparable, then selection bias is introduced into the sample. This could happen if a researcher chooses clients from one agency to belong to the experimental group and those from another agency to be in the comparison group, when the agencies serve different types of people. Selection bias is a reason experimenters use random assignment, so conscious and unconscious bias do not influence to which group a participant is assigned. Sometimes, the groups are comparable at the start of the experiment, but people drop out of the experiment. Mortality is the term we use to describe when a group changes because of people dropping out of the study.
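Random assignment, as described above, can be sketched in a few lines of Python. This is a minimal illustration, not a full randomization protocol: the function name `randomly_assign`, the client IDs, and the two agencies (A and B) are all hypothetical, and the sketch assumes a simple two-group design with groups of near-equal size.

```python
import random

def randomly_assign(participant_ids, seed=None):
    """Shuffle participants and split them into two groups of near-equal size."""
    rng = random.Random(seed)  # a local generator makes the assignment reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {"experimental": shuffled[:midpoint], "control": shuffled[midpoint:]}

# Hypothetical client IDs from two agencies (A and B)
clients = [f"A{i}" for i in range(1, 11)] + [f"B{i}" for i in range(1, 11)]
groups = randomly_assign(clients, seed=42)

print("Experimental:", groups["experimental"])
print("Control:     ", groups["control"])
```

Because chance alone decides each placement, neither the researcher's preferences nor the agency a client comes from determines group membership. In any single small sample the groups can still differ by chance, so it remains worth checking that they are comparable on the variables relevant to the study.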

Experiments themselves are often the source of threats to validity. Experiments are different from participants’ normal routines. The novelty of a research environment or experimental treatment may cause them to expect to feel differently, independently of the actual intervention. Reactivity is a threat to internal validity that occurs because the participants realize they are being observed. In this case, being observed makes the difference in outcome, not the intervention. What if the people in the control group are aware that they aren’t receiving the potential benefits from the experimental treatment? Maybe they respond by increasing their efforts to improve in spite of not receiving the treatment. This introduces a threat to internal validity called compensatory rivalry. On the other hand, it might have the opposite effect. Resentful demoralization occurs when people in the control group decrease their efforts because they aren’t getting the treatment. These threats could be decreased by keeping the experimental and control groups completely separate, so the control group isn’t aware of what’s happening with the experimental group. An advantage to this is that it can help prevent diffusion of treatment, in which members of the control group learn about the experimental treatment from people in the experimental group and start implementing the intervention for themselves. This can occur if participants in the experimental group begin to behave differently or share insights from the intervention with individuals in the control group. Whether through social learning or conversation, participants in the control group may receive parts of the intervention of which they were supposed to be unaware.

Researchers may also introduce error. For example, researchers may expect the experimental group to feel better and may give off conscious or unconscious cues to participants that influence their outcomes. Control groups could be expected to fare worse, and research staff might cue participants that they should feel worse than they otherwise would. It is also possible that research staff administering treatment as usual to the control group might try to equalize treatment or engage in a rivalry with research staff administering the experimental treatment (Engel & Schutt, 2016). To prevent these threats, which arise when researchers or participants are aware of their role in the experiment, double-blind designs prevent both the research staff interacting with participants and the participants themselves from knowing who is assigned to which group.

You have likely heard of the placebo effect, in which a participant in the control group feels better because they think they are receiving treatment, despite not having received the experimental treatment at all. Blinding participants to their group assignment helps researchers separate this expectation-driven change from the effect of the intervention itself. There are some additional threats to internal validity that double-blind designs cannot reduce. Researchers may introduce a threat to internal validity called instrumentation when they choose measures that do not accurately measure participants or implement the measure in a way that biases participant responses. Testing is a threat to internal validity in which the fact that participants take a pretest, rather than the intervention, affects their score on the post-test. The Solomon Four Group and Post-test Only designs are used to reduce the testing threat to internal validity. Sometimes, the change in an experiment would have happened even without any intervention because of the natural passage of time. This is called maturation. Imagine researchers testing the effects of a parenting class on the beliefs and attitudes of adolescent fathers. Perhaps the changes in their beliefs and attitudes are based on growing older, not on the class. Having a control or comparison group helps with this threat. It also helps reduce the threat of history, in which something happens outside the experiment but affects its participants.

As you can see, there are several ways in which the internal validity of a study can be threatened. No study can eliminate all threats, but the best ones consider the threats and do their best to reduce them as much as is feasible based on the resources available. When you read and critique research articles, it is important to consider these threats so you can assess the validity of a study’s results.

==Content from Katherine Kitchens 2022; table made in 2023:

NEED TO (1) DECIDE WHERE TO PUT TABLE, REVIEW & IMPROVE CONTENT, ADD MISSING THREATS; DETERMINE ORGANIZATION

Table 14.X: Selected Threats to Internal Validity
| Threat to internal validity | Description | Mitigation |
|---|---|---|
| External circumstances | | |
| History | Any event that occurs while the experiment is in progress might be an alternate explanation for changes in the outcome | Using a control group |
| Statistical regression to the mean | The natural tendency for extreme scores to regress or move toward the mean | Using a control group |
| Instrumentation | Flawed measurement affects the results of the experiment | Select standardized instruments; develop strict protocols for pre- and post-testing, including duration of assessment; train researchers on assessment procedures |
| Testing effects | A pre-test may confound the influence of the experimental treatment or performance on the post-test | Using a control group; using Solomon Four Group Design; using post-test only designs; multiple time points for pre-testing |
| Selection bias | | |
| Immortal time bias | Can occur in studies where there is a delay in classifying participants as ‘treated’ until they first begin the intervention rather than when they enter the study and are assigned to the experimental group | Using data at time-zero (rather than upon beginning the intervention) from participants assigned to exposure groups |
| Participants | | |
| Apprehension bias | | |
| Contamination | Experimental and control groups interact | |
| Adherence bias | Differential rates of adherence affect the outcome | Adherence reminders; collect data on adherence and statistically control for it |
| Placebo effect | Knowledge and expectations about being treated affect the outcomes | Participants as well as researchers may remain uninformed (i.e., blinded) about the kind of treatment each participant is receiving, or which group they are in |
| Compensatory rivalry | Members of the comparison group realize they’re not getting the intervention and work extra hard (i.e., compensate) as a result | Keep the groups apart; add a qualitative component to understand if compensatory rivalry is occurring |
| Resentful demoralization | Participants in the control/comparison group develop feelings of resentment due to exclusion from the experimental condition, and this affects the outcome (the reverse of compensatory rivalry) | Keep the groups apart; add a qualitative component to understand if demoralization is occurring |
| Maturation | Normal changes over time (e.g., fatigue or aging) might affect the dependent variable | Using a control group or equivalently matched comparison group |
| Attrition/mortality | If groups lose participants (e.g., dropping out of the experiment), the outcome might be affected | |
| Hawthorne effect | Occurs when individuals alter a specific feature of their behavior in reaction to their awareness of being under observation | Covert observation techniques can help mitigate the Hawthorne effect |
| Researchers/Interventionists | | |
| Researcher effects | E.g., a researcher’s facial expression affects the group | |
| Treatment fidelity | Interventionists make adjustments to the intervention while it is under way | |
| Diffusion of treatment | An interventionist tells other researchers about the treatment, and they begin using it | |
| Performance bias | | |
| Allocation bias | Can occur when researchers know or can predict the intervention that the next eligible person is expected to receive | Use a rigorous trial design that incorporates allocation concealment, so that upcoming assignments cannot be known or predicted |
| Observer bias | A form of detection bias that can affect the evaluation process in both observational and intervention research | Keep outcome assessors blinded to which group participants are in |

>FROM BHATTACHERJEE

Threats to internal validity. Although experimental designs are considered more rigorous than other research methods in terms of the internal validity of their inferences (by virtue of their ability to control causes through treatment manipulation), they are not immune to internal validity threats. Some of these threats to internal validity are described below, within the context of a study of the impact of a special remedial math tutoring program for improving the math abilities of high school students.

  • History threat is the possibility that the observed effects (dependent variables) are caused by extraneous or historical events rather than by the experimental treatment. For instance, students’ post-remedial math score improvement may have been caused by their preparation for a math exam at their school, rather than the remedial math program.
  • Maturation threat refers to the possibility that observed effects are caused by natural maturation of subjects (e.g., a general improvement in their intellectual ability to understand complex concepts) rather than the experimental treatment.
  • Testing threat is a threat in pre-post designs where subjects’ posttest responses are conditioned by their pretest responses. For instance, if students remember their answers from the pretest evaluation, they may tend to repeat them in the posttest exam. Not conducting a pretest can help avoid this threat.
  • Instrumentation threat, which also occurs in pre-post designs, refers to the possibility that the difference between pretest and posttest scores is not due to the remedial math program, but due to changes in the administered test, such as the posttest having a higher or lower degree of difficulty than the pretest.
  • Mortality threat refers to the possibility that subjects may be dropping out of the study at differential rates between the treatment and control groups due to a systematic reason, such as when the dropouts are mostly students who scored low on the pretest. If the low-performing students drop out, the results of the posttest will be artificially inflated by the preponderance of high-performing students.
  • Regression threat, also called regression to the mean, refers to the statistical tendency of a group’s overall performance on a measure during a posttest to regress toward the mean of that measure rather than in the anticipated direction. For instance, if subjects scored high on a pretest, they will have a tendency to score lower on the posttest (closer to the mean) because their high scores (away from the mean) during the pretest were possibly a statistical aberration. This problem tends to be more prevalent in non-random samples and when the two measures are imperfectly correlated.
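The mortality threat in the list above can be made concrete with a little arithmetic. The sketch below uses made-up pretest scores and two deliberately simple assumptions, both labeled in the comments: the tutoring program has no effect at all, and every student who scored below 55 on the pretest drops out before the posttest.

```python
from statistics import mean

# Hypothetical pretest scores for the tutoring group (illustrative numbers only)
pretest = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

# Assumption: the program has no effect, so each posttest score equals the pretest score
posttest = list(pretest)

# Assumption: students who scored below 55 on the pretest drop out before the posttest
completers = [post for pre, post in zip(pretest, posttest) if pre >= 55]

pretest_mean = mean(pretest)              # everyone who started the study
posttest_mean = mean(completers)          # only those who stayed
apparent_gain = posttest_mean - pretest_mean

print("Pretest mean (all students):   ", pretest_mean)
print("Posttest mean (completers only):", posttest_mean)
print("Apparent gain:                  ", apparent_gain)
```

No individual student's score changed, yet the group mean rises because the lowest scorers are missing from the posttest. Comparing completers with their own pretest scores, rather than with the full starting group, is one way to check for this kind of inflation.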

 

> FROM RESEARCH METHODS IN PSYCH 1641401927

One alternative explanation goes under the name of history. Other things might have happened between the pretest and the posttest that caused a change from pretest to posttest. Perhaps an anti-drug program aired on television and many of the students watched it, or perhaps a celebrity died of a drug overdose and many of the students heard about it.

Another alternative explanation goes under the name of maturation. Participants might have changed between the pretest and the posttest in ways that they were going to anyway because they are growing and learning. If it were a year-long anti-drug program, participants might become less impulsive or better reasoners, and this might be responsible for the change in their attitudes toward illegal drugs.

Another threat to the internal validity of one-group pretest-posttest designs is testing, which refers to when the act of measuring the dependent variable during the pretest affects participants’ responses at posttest. For instance, completing the measure of attitudes towards illegal drugs may have had an effect on those attitudes. Simply completing this measure may have inspired further thinking and conversations about illegal drugs that then produced a change in posttest scores.

Similarly, instrumentation can be a threat to the internal validity of studies using this design. Instrumentation refers to when the basic characteristics of the measuring instrument change over time. When human observers are used to measure behavior, they may over time gain skill, become fatigued, or change the standards on which observations are based. So participants may have taken the measure of attitudes toward illegal drugs very seriously during the pretest when it was novel but then they may have become bored with the measure at posttest and been less careful in considering their responses.

Another alternative explanation for a change in the dependent variable in a pretest-posttest design is regression to the mean. This refers to the statistical fact that an individual who scores extremely high or extremely low on a variable on one occasion will tend to score less extremely on the next occasion. For example, a bowler with a long-term average of 150 who suddenly bowls a 220 will almost certainly score lower in the next game. Her score will “regress” toward her mean score of 150. Regression to the mean can be a problem when participants are selected for further study because of their extreme scores. Imagine, for example, that only students who scored especially high on the test of attitudes toward illegal drugs (those with extremely favorable attitudes toward drugs) were given the anti-drug program and then were retested. Regression to the mean all but guarantees that their scores will be lower at the posttest even if the training program has no effect.
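A small simulation can show regression to the mean at work. This is a sketch under stated assumptions rather than a model of any real study: each simulated participant has a stable "true" score, each test adds independent random noise, and no intervention occurs between the two tests. The means, standard deviations, sample size, and the 10% selection cutoff are arbitrary illustrative choices.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumption: a stable true score per participant, plus independent noise on each test
true_scores = [rng.gauss(50, 10) for _ in range(10_000)]
pretest = [t + rng.gauss(0, 10) for t in true_scores]
posttest = [t + rng.gauss(0, 10) for t in true_scores]  # no intervention in between

# Select only the participants with extreme pretest scores (roughly the top 10%)
cutoff = sorted(pretest)[int(0.9 * len(pretest))]
selected = [i for i, score in enumerate(pretest) if score >= cutoff]

pre_mean = mean(pretest[i] for i in selected)
post_mean = mean(posttest[i] for i in selected)

print(f"Selected group, pretest mean:  {pre_mean:.1f}")
print(f"Selected group, posttest mean: {post_mean:.1f}")
```

The selected group scores lower on the second test even though nothing was done to them, because part of what made their first scores extreme was noise that does not repeat. Their posttest mean still sits above the overall mean of 50, since their true scores really are somewhat higher than average.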

A closely related concept, and an extremely important one in psychological research, is spontaneous remission. This is the tendency for many medical and psychological problems to improve over time without any form of treatment. The common cold is a good example. If one were to measure symptom severity in 100 common cold sufferers today, give them a bowl of chicken soup every day, and then measure their symptom severity again in a week, they would probably be much improved. This does not mean that the chicken soup was responsible for the improvement, however, because they would have been much improved without any treatment at all. The same is true of many psychological problems. A group of severely depressed people today is likely to be less depressed on average in 6 months. In reviewing the results of several studies of treatments for depression, researchers Michael Posternak and Ivan Miller found that participants in waitlist control conditions improved an average of 10 to 15% before they received any treatment at all (Posternak & Miller, 2001). Thus one must generally be very cautious about inferring causality from pretest-posttest designs.

A common approach to ruling out the threats to internal validity described above is to revise the research design to include a control group, one that does not receive the treatment. A control group would be subject to the same threats from history, maturation, testing, instrumentation, regression to the mean, and spontaneous remission and so would allow the researcher to measure the actual effect of the treatment (if any). Of course, including a control group would mean that this is no longer a one-group design.
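The logic of using a control group to isolate the treatment effect can be sketched as simple arithmetic. The scores below are invented for illustration, the helper `mean_change` is a hypothetical name, and the subtraction assumes that the two groups are comparable and exposed to the same history, maturation, testing, and other threats. It is a sketch of the reasoning, not a substitute for a proper statistical analysis.

```python
from statistics import mean

def mean_change(pre, post):
    """Average posttest-minus-pretest change for one group."""
    return mean(after - before for before, after in zip(pre, post))

# Hypothetical scores (illustrative only)
treat_pre, treat_post = [50, 55, 60, 65], [60, 66, 69, 75]
ctrl_pre, ctrl_post = [51, 54, 61, 64], [55, 58, 64, 69]

treat_change = mean_change(treat_pre, treat_post)  # treatment effect plus shared threats
ctrl_change = mean_change(ctrl_pre, ctrl_post)     # shared threats only
estimated_effect = treat_change - ctrl_change

print("Change in treatment group:", treat_change)
print("Change in control group:  ", ctrl_change)
print("Estimated treatment effect:", estimated_effect)
```

In a one-group design, the full change in the treatment group would be credited to the program. With a control group, the part of that change that also appears without treatment is subtracted out.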

 


 

License

Doctoral Research Methods in Social Work Copyright © by Mavs Open Press. All Rights Reserved.
