8.3 The logic of experimental design
Learning Objectives
- Apply the criterion of causality to experimental design
- Define internal validity and external validity
- Identify and define threats to internal validity
As we discussed at the beginning of this chapter, experimental design is commonly understood and implemented informally in everyday life. Trying out a new restaurant, dating a new person—we often call these things “experiments.” As you’ve learned over the past two sections, in order for something to be a true experiment, or even a quasi- or pre-experiment, you must rigorously apply the various components of experimental design. A true experiment for trying a new restaurant would include recruitment of a large enough sample, random assignment to control and experimental groups, pretesting and posttesting, as well as using clearly and objectively defined measures of satisfaction with the restaurant.
Social scientists use this level of rigor and control because they are trying to maximize the internal validity of their research. Internal validity is the confidence researchers have that the independent variable truly produced a change in the dependent variable. In experimental design, the independent variable is the intervention or treatment. Experiments are attempts to establish causality between two variables: the treatment and its intended outcome.
As we talked about in Chapter 4, nomothetic causal explanations must establish four criteria: covariation, plausibility, temporality, and nonspuriousness. The logic and rigor of experimental design allow causality to be established. Experimenters can assess covariation on the dependent variable through pre- and post-tests. The use of experimental and control conditions ensures that some people receive the intervention and others do not, providing variation in the independent variable (i.e., receiving the treatment). Moreover, since the researcher controls when the intervention is administered, she can be assured that changes in the independent variable (the treatment) happened before changes in the dependent variable (the outcome). In this way, experiments assure temporality. In our restaurant experiment, assignment to experimental and control groups would tell us that people varied in the restaurant they attended. The pre- and post-tests would tell us whether their level of satisfaction changed. Because satisfaction was measured both before diners walked in and after they left, we would also know that any change in satisfaction occurred after the restaurant visit, not before it.
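To make the roles of the pre- and post-tests and the control group concrete, here is a minimal Python sketch for the restaurant example. The satisfaction scores are invented, and the function name `mean_change` is hypothetical. The sketch compares the average pre-to-post change in each group, assuming that any change the control group also shows is not caused by the restaurant.

```python
from statistics import mean

# Hypothetical 1-10 satisfaction ratings for the restaurant example.
# Each tuple is (pretest, post-test) for one diner; the numbers are invented.
experimental = [(5, 8), (6, 7), (4, 7), (5, 6)]
control = [(5, 5), (6, 6), (4, 5), (5, 5)]

def mean_change(group):
    """Average post-test-minus-pretest change for one group."""
    return mean(post - pre for pre, post in group)

exp_change = mean_change(experimental)
ctl_change = mean_change(control)

# Change in the control group reflects influences other than the treatment.
# Subtracting it leaves an estimate of the treatment's effect, assuming the
# two groups were comparable at the start.
effect_estimate = exp_change - ctl_change
print(f"experimental change: {exp_change}, control change: {ctl_change}, "
      f"estimated effect: {effect_estimate}")
```

Subtracting the control group's change is only meaningful if the groups were comparable before the experiment began, which is why the design also relies on random assignment.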
Experimenters also have a plausible reason why their intervention would cause changes in the dependent variable. Usually, a theory or previous empirical evidence should indicate the potential for a causal relationship. Perhaps we found a national poll showing that the type of food our experimental restaurant served, let’s say pizza, is the most popular food in America. Perhaps this restaurant has good reviews on Yelp or Google. This evidence would give us a plausible reason to expect the restaurant to cause satisfaction.
One of the most important features of experiments is that they allow researchers to eliminate spurious variables. True experiments are usually conducted under strictly controlled conditions. The intervention is given in the same way to each person, with a minimal number of other variables that might cause their post-test scores to change. In our restaurant example, this level of control might prove difficult. We cannot control how many people are waiting for a table, whether participants saw someone famous there, or if there is bad weather. Any of these factors might cause a diner to be less satisfied with their meal. These spurious variables may cause changes in satisfaction that have nothing to do with the restaurant itself, an important problem in real-world research. For this reason, experiments try to control as many aspects of the research process as possible: using control groups, having large enough sample sizes, standardizing the treatment, etc. Researchers in large experiments often employ clinicians or other research staff to help them. Researchers train their staff members exhaustively, provide pre-scripted responses to common questions, and control the physical environment of the experiment so each person who participates receives the exact same treatment.
Experimental researchers also document their procedures, so that others can review them and make changes in future research if they think it will improve on the ability to control for spurious variables. An interesting example is Bruce Alexander’s (2010) Rat Park experiments. Much of the early research on addictive drugs, like heroin and cocaine, was conducted on animals other than humans, usually mice or rats. The scientific consensus up until Alexander’s experiments was that cocaine and heroin were so addictive that rats, if offered the drugs, would consume them repeatedly until they perished. Researchers claimed this behavior explained how addiction worked in humans, but Alexander was not so sure. He knew rats were social animals and the procedure used in previous experiments did not allow them to socialize. Instead, rats were kept isolated in small cages with only food, water, and metal walls. To Alexander, social isolation was a spurious variable, causing changes in addictive behavior not due to the drug itself. Alexander created an experiment of his own, in which rats were allowed to run freely in an interesting environment, socialize and mate with other rats, and of course, drink from a solution that contained an addictive drug. In this environment, rats did not become hopelessly addicted to drugs. In fact, they had little interest in the substance. To Alexander, the results of his experiment demonstrated that social isolation was more of a causal factor for addiction than the drug itself.
One challenge with Alexander’s findings is that subsequent researchers have had mixed success replicating them (e.g., Petrie, 1996; Solinas, Thiriet, El Rawas, Lardeux, & Jaber, 2009). Replication involves conducting another researcher’s experiment in the same manner and seeing if it produces the same results. If the causal relationship is real, it should occur in all (or at least most) replications of the experiment.
One of the defining features of experiments is that researchers report their procedures diligently, which allows for easier replication. Recently, researchers at the Reproducibility Project caused a significant controversy in social science fields like psychology (Open Science Collaboration, 2015). In one study, researchers attempted to reproduce the results of 100 studies published in major psychology journals in 2008. What they found was shocking: the results of only 36% of the studies were reproduced. Despite coordinating closely with the original researchers, the Reproducibility Project found that nearly two-thirds of psychology studies published in respected journals could not be reproduced. The implications of the Reproducibility Project are staggering, and social scientists are developing new ways to ensure researchers do not cherry-pick data or change their hypotheses simply to get published.
Let’s return to Alexander’s Rat Park study and consider the implications of his experiment for substance use professionals. The conclusions he drew from his experiments on rats were meant to generalize to the population of people with substance use disorders. If this could be done, the experiment would have a high degree of external validity, which is the degree to which conclusions generalize to larger populations and different situations. Alexander argues his conclusions about addiction and social isolation help us understand why people living in deprived, isolated environments may become addicted to drugs more often than those in more enriching environments. Similarly, earlier rat researchers argued their results showed these drugs were instantly addictive to humans, often to the point of death.
Neither study’s results will match up perfectly with real life. There are clients in social work practice who may fit into Alexander’s social isolation model, but social isolation is complex. Clients can live in environments with other sociable humans, work jobs, and have romantic relationships; does this mean they are not socially isolated? On the other hand, clients may face structural racism, poverty, trauma, and other challenges that may contribute to their social environment. Alexander’s work helps us understand clients’ experiences, but the explanation is incomplete. Human existence is more complicated than the experimental conditions in Rat Park.
Social workers are especially attentive to how social context shapes social life, so we are likely to point out a specific disadvantage of experiments: they are rather artificial. How often do real-world social interactions occur in the same way that they do in a controlled experiment? Experiments that are conducted in community settings may not be as subject to artificiality as those in a research lab, but their conditions are less easily controlled. This demonstrates the tension between internal and external validity. Internal validity and external validity are conceptually linked. Internal validity refers to the degree to which the intervention causes its intended outcomes, and external validity refers to how well that relationship applies to different groups and circumstances. However, the more tightly researchers control the environment to ensure internal validity, the less they can claim external validity for generalizing their results to different populations and circumstances. Correspondingly, researchers whose settings are just like the real world will be less able to ensure internal validity, as there are many factors that could pollute the research process. This is not to suggest that experimental research cannot have external validity, but that experimental researchers must always be aware that external validity problems can occur and be forthcoming in their reports of findings about this potential weakness.
Threats to internal validity
There are a number of factors that may influence a study’s internal validity. You might consider these threats to all be spurious variables, as we discussed at the beginning of this section. Each threat proposes that something other than the treatment (or intervention) is changing the outcome. The threats introduce error and bias into the experiment.
Throughout this chapter, we reviewed the importance of experimental and control groups. These groups must be comparable in order for experimental design to work. Comparable groups are groups that are similar across factors important for the study. Researchers can help establish comparable groups by using probability sampling, random assignment, or matching techniques. Control or comparison groups give researchers an opportunity to explore what happens to similar people who do not receive the intervention. But if the experimental and control groups are not comparable, then the differences in outcome may not be due to the intervention. No groups are ever perfectly comparable. What’s important is ensuring groups are as similar as possible along variables relevant to the research project.
In our restaurant example, if one of the groups had far more vegetarians or people with gluten issues, it might influence how satisfied they were with the restaurant. The groups, in that case, would not be comparable. Researchers can account for this by measuring other variables, like dietary preference, and controlling for their effects statistically after the data are collected. We discussed control variables like these in Chapter 4. When some factor related to selecting research participants prevents the groups from being comparable, selection bias is introduced into the sample. This could happen if a researcher chooses clients from one agency to belong to the experimental group and those from another agency to be in the comparison group, when the agencies serve different types of people. Selection bias is a reason experimenters use random assignment, so that conscious and unconscious biases do not influence which group a participant is assigned to. Sometimes, the groups are comparable at the start of the experiment, but people drop out of the experiment. Mortality is the term we use to describe when a group changes because of people dropping out of the study. In our restaurant example, this could happen if vegetarians dropped out of the experimental group because the restaurant being tested didn’t have vegetarian options.
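Random assignment is simple to carry out, and a short Python sketch shows why it removes the researcher’s choices from the process. The function name `randomly_assign` and the diner labels are hypothetical. The sketch shuffles the participant list with a seeded generator and splits it in half, so that chance alone determines group membership.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of (nearly) equal size.

    Because chance alone decides group membership, neither conscious nor
    unconscious researcher preferences can steer who ends up in which group.
    """
    rng = random.Random(seed)  # seeded generator so the assignment can be reproduced and audited
    pool = list(participants)
    rng.shuffle(pool)
    half = len(pool) // 2
    # With an odd number of participants, the control group gets the extra person.
    return {"experimental": pool[:half], "control": pool[half:]}

# Hypothetical diners for the restaurant example.
diners = [f"diner_{i}" for i in range(1, 11)]
groups = randomly_assign(diners, seed=42)
print(groups["experimental"])
print(groups["control"])
```

Recording the seed is a design choice, not a requirement: it lets another researcher rerun the assignment and confirm nobody moved participants between groups afterward.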
Experiments themselves are often the source of threats to validity. Experiments are different from participants’ normal routines. The novelty of a research environment or experimental treatment may cause participants to expect to feel differently, independently of the actual intervention. Reactivity is a threat to internal validity that occurs because the participants realize they are being observed. In this case, being observed makes the difference in outcome, not the intervention.
What if the people in the control group are aware that they aren’t receiving the potential benefits of the experimental treatment? Maybe they respond by increasing their efforts to improve in spite of not receiving the treatment. This introduces a threat to internal validity called compensatory rivalry. On the other hand, it might have the opposite effect. Resentful demoralization occurs when people in the control group decrease their efforts because they aren’t getting the treatment. These threats could be decreased by keeping the experimental and control groups completely separate, so the control group isn’t aware of what’s happening with the experimental group. Keeping the groups apart can also help prevent diffusion of treatment, in which members of the control group learn about the experimental treatment from people in the experimental group and start implementing the intervention for themselves. This can occur if participants in the experimental group begin to behave differently or share insights from the intervention with individuals in the control group. Whether through social learning or conversation, participants in the control group may receive parts of the intervention of which they were supposed to be unaware.
Researchers may also introduce error. For example, researchers may expect the experimental group to feel better and may give off conscious or unconscious cues to participants that influence their outcomes. Control groups could be expected to fare worse, and research staff might cue participants that they should feel worse than they otherwise would. It is also possible that research staff administering treatment as usual to the control group might try to equalize treatment or engage in a rivalry with research staff administering the experimental treatment (Engel & Schutt, 2016). To prevent threats that arise when researchers or participants know their role in the experiment, double-blind designs keep both the research staff interacting with participants and the participants themselves from knowing who is assigned to which group.
There are some additional threats to internal validity that using double-blind designs cannot reduce. You have likely heard of the placebo effect, in which a participant in the control group feels better because they think they are receiving treatment, despite not having received the experimental treatment at all. Researchers may introduce a threat to internal validity called instrumentation when they choose measures that do not accurately measure participants or implement the measure in a way that biases participant responses. Testing is a threat to internal validity in which the fact that participants take a pretest–not the intervention–affects their score on the post-test. The Solomon Four Group and Post-test Only designs are used to reduce the testing threat to internal validity. Sometimes, the change in an experiment would have happened even without any intervention because of the natural passage of time. This is called maturation. Imagine researchers testing the effects of a parenting class on the beliefs and attitudes of adolescent fathers. Perhaps the changes in their beliefs and attitudes are based on growing older, not on the class. Having a control or comparison group helps with this threat. It also helps reduce the threat of history, when something happens outside the experiment but affects its participants.
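The Solomon Four Group design crosses two features: whether a group takes the pretest and whether it receives the treatment. A minimal Python sketch, assuming invented average post-test scores for each of the four groups, shows which comparisons separate the testing threat from the treatment effect. A real analysis would test these differences statistically rather than simply subtracting means.

```python
# Hypothetical average post-test scores for the four groups of a Solomon Four Group
# design, keyed by (took_pretest, received_treatment). The numbers are invented.
posttest_means = {
    (True, True): 7.5,    # group 1: pretest, treatment, post-test
    (True, False): 5.5,   # group 2: pretest, no treatment, post-test
    (False, True): 7.0,   # group 3: no pretest, treatment, post-test
    (False, False): 5.0,  # group 4: no pretest, no treatment, post-test
}

# Groups 2 and 4 both skip the treatment and differ only in whether they took
# the pretest, so the gap between them estimates the testing effect.
testing_effect = posttest_means[(True, False)] - posttest_means[(False, False)]

# Groups 3 and 4 both skip the pretest and differ only in treatment, so their gap
# estimates the treatment effect with no pretest involved (a post-test only comparison).
treatment_effect_no_pretest = posttest_means[(False, True)] - posttest_means[(False, False)]

print(f"Estimated testing effect: {testing_effect}")
print(f"Estimated treatment effect without a pretest: {treatment_effect_no_pretest}")
```

Groups 3 and 4 on their own form a post-test only design, which is why that design also avoids the testing threat.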
As you can see, there are several ways in which the internal validity of a study can be threatened. No study can eliminate all threats, but the best ones consider the threats and do their best to reduce them as much as is feasible based on the resources available. When you read and critique research articles, it is important to consider these threats so you can assess the validity of a study’s results.
Spotlight on UTA School of Social Work
Assessing a teen pregnancy and STI prevention program
Dr. Holli Slater and Dr. Diane Mitschke used an experimental design, a randomized two-group, cohort-based longitudinal study with repeated measures, to assess outcomes related to a teen pregnancy prevention program (Slater & Mitschke, 2015). Crossroads was a co-ed program targeting academically at-risk youth enrolled in a local school district. It was administered by trained facilitators in a large-group setting across three consecutive days for a total of 18.75 hours of program instruction. Each day had a separate focus, including building relationships, prevention of pregnancy and sexually transmitted infections (STIs), and identifying resources available within the community.
Potential participants were recruited on an ongoing basis and put into a pool of potential candidates to join the study. Prior to each intervention series, 60 youth were randomly assigned to either treatment or control groups. Youth assigned to the treatment group attended a three-day intervention and received ongoing support from assigned facilitators. Youth who were assigned to the control group did not attend the intervention and continued to receive services as usual. Services as usual consisted of being assigned a graduation coach who provided dropout prevention services and helped youth meet their academic goals. Graduation coach services were available to all at-risk students in the school district, regardless of their enrollment in the study or assignment to treatment or control groups.
The primary research aim of the study was to assess the impact of being offered participation in the intervention on condom use. Essentially, the researchers wanted to see if condom use increased more among sexually active youth following the intervention compared to youth who did not attend the intervention. In addition to this primary research aim, Drs. Mitschke and Slater explored whether this effect was sustained over time. They collected data through an online survey at four separate time points (baseline, 3, 6, and 12 months post-intervention). Because of the longitudinal nature of the study and the highly transient population, the researchers provided a $20 gift card as an incentive at each data collection point. Even so, retaining youth for the duration of the study remained a challenge.
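Retention problems like this one connect to the mortality threat: if youth drop out of one group faster than the other, the groups may stop being comparable. A minimal Python sketch can track retention by group at each survey wave. The counts below are invented and are not the study’s data; they assume, hypothetically, that one cohort of 60 was split evenly between the two groups.

```python
# Hypothetical numbers of youth completing each survey wave, by assigned group.
# These counts are invented for illustration; they are not the study's data.
completed = {
    "treatment": {"baseline": 30, "3 months": 25, "6 months": 22, "12 months": 18},
    "control":   {"baseline": 30, "3 months": 24, "6 months": 19, "12 months": 14},
}

def retention(group):
    """Share of a group's baseline sample still responding at each wave."""
    base = completed[group]["baseline"]
    return {wave: n / base for wave, n in completed[group].items()}

treat = retention("treatment")
ctrl = retention("control")

# A widening gap between groups signals differential attrition (mortality).
for wave in treat:
    gap = treat[wave] - ctrl[wave]
    print(f"{wave:>9}: treatment {treat[wave]:.0%}, control {ctrl[wave]:.0%}, gap {gap:+.0%}")
```

Overall attrition mainly shrinks the sample, but a gap between groups is the more serious problem for internal validity, because those who remain in each group may no longer resemble each other.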
An intent-to-treat framework was used to assess the impact of the program, meaning data analysis included all youth who were randomized regardless of their level of participation in the program. The researchers compared the outcomes between youth in the treatment and control groups. Significant differences between the treatment and control groups (p < .05) would support the argument that changes in behavior (e.g., an increase in condom use) could be attributed to participation in the intervention.
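The logic of an intent-to-treat analysis can be sketched in a few lines of Python. The `Youth` class, the records, and the outcomes below are hypothetical and are not the study’s data. The per-protocol comparison, which drops assigned youth who never attended, is included only as a contrast; the source does not say the researchers ran it.

```python
from dataclasses import dataclass

@dataclass
class Youth:
    assigned: str       # "treatment" or "control", fixed at randomization
    attended: bool      # whether the youth actually attended the intervention
    used_condom: bool   # hypothetical follow-up outcome

# Invented records for illustration only.
records = [
    Youth("treatment", True, True),
    Youth("treatment", True, True),
    Youth("treatment", False, False),  # assigned but never attended: still counted as treatment
    Youth("treatment", True, False),
    Youth("control", False, True),
    Youth("control", False, False),
    Youth("control", False, False),
    Youth("control", False, False),
]

def rate(group):
    """Proportion of youth in a group reporting the outcome."""
    return sum(y.used_condom for y in group) / len(group)

# Intent-to-treat: group strictly by random assignment, ignoring attendance.
itt_treatment = [y for y in records if y.assigned == "treatment"]
itt_control = [y for y in records if y.assigned == "control"]
itt_difference = rate(itt_treatment) - rate(itt_control)

# Contrast: a per-protocol comparison keeps only treatment youth who attended,
# which can reintroduce selection bias because attenders may differ from non-attenders.
per_protocol_treatment = [y for y in itt_treatment if y.attended]
pp_difference = rate(per_protocol_treatment) - rate(itt_control)

print(f"ITT difference: {itt_difference:.2f}")
print(f"Per-protocol difference: {pp_difference:.2f}")
```

Keeping non-attenders in the treatment group preserves the comparability that random assignment created, at the cost of estimating the effect of being offered the program rather than of completing it, which matches the study’s stated aim.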
The study did not find significant differences in condom use at 3 months or 12 months after the intervention. However, it did find significant results at 6 months, indicating that youth who participated in the intervention were less likely to engage in intercourse without a condom than youth in the control group. While it is disappointing not to find significant results in a large-scale study such as this, negative results can be just as powerful.
Dr. Slater and Dr. Mitschke explored reasons why the intervention may not have been as effective immediately following the intervention by talking with youth and their counselors to gain insight. One possible explanation is that youth enrolled in this study had already established their sexual norms prior to the intervention. The majority of youth in the study were already sexually active. If this is the case, then practitioners developing pregnancy prevention interventions should take this into consideration when developing programs. Perhaps implementing an intervention at an earlier age, when youth are not yet sexually active, would have a greater impact on behaviors than waiting until they are already engaging in risky sexual behaviors and trying to create a change.
It is interesting that behaviors did seem to change among youth at the six-month follow-up. It is possible this is a spurious result that should be explored more fully. Interviews with youth indicated that the repeated follow-up from the intervention team over time resulted in an increase in trust between the youth and their counselor. Some even suggested they changed their behaviors because a caring adult took time to continually follow up with them. This alternate explanation should also be further explored to better understand what components of the intervention have the greatest impact on the behavior of youth.
Key Takeaways
- Experimental design provides researchers with the ability to best establish causality between their variables.
- Experiments provide strong internal validity but may have trouble achieving external validity.
- Experimental designs should be reproducible by future researchers.
- Threats to validity come from both experimenter and participant reactivity.
Glossary
- Comparable groups – groups that are similar across factors important for the study
- Compensatory rivalry – a threat to internal validity in which participants in the control group increase their efforts to improve because they know they are not receiving the experimental treatment
- Diffusion of treatment – a threat to internal validity in which members of the control group learn about the experimental treatment from people in the experimental group and start implementing the intervention for themselves
- Double-blind – when the researchers who interact with participants, and the participants themselves, are unaware of who is in the control or experimental group
- External validity – the degree to which experimental conclusions generalize to larger populations and different situations
- Instrumentation – a threat to internal validity when measures do not accurately measure participants or are implemented in a way that biases participant responses
- Internal validity – the confidence researchers have about whether their intervention produced variation in their dependent variable
- Maturation – a threat to internal validity in which the change in an experiment would have happened even without any intervention because of the natural passage of time
- Mortality – a threat to internal validity caused when either the experimental or control group composition changes because of people dropping out of the study
- Placebo effect – when a participant feels better because they believe they are receiving treatment, despite having received no intervention at all
- Reactivity – a threat to internal validity that occurs because the participants realize they are being observed
- Replication – conducting another researcher’s experiment in the same manner and seeing if it produces the same results
- Resentful demoralization – a threat to internal validity that occurs when people in the control group decrease their efforts because they aren’t getting the experimental treatment
- Selection bias – when the elements selected for inclusion in a study do not represent the larger population from which they were drawn due to sampling method or thought processes of the researcher
Image attributions
One of Juno’s solar panels before illumination test by NASA/Jack Pfaller public domain