8.1 Experimental design: What is it and when should it be used?
Learning Objectives
- Define experiment
- Identify the core features of true experimental designs
- Describe the difference between an experimental group and a control group
- Identify and describe the various types of true experimental designs
Experiments are an excellent data collection strategy for social workers wishing to observe the effects of a clinical intervention or social welfare program. Understanding what experiments are and how they are conducted is useful for all social scientists, whether they actually plan to use this methodology or simply aim to understand findings from experimental studies. An experiment is a method of data collection designed to test hypotheses under controlled conditions. In social scientific research, the term experiment has a precise meaning and should not be used to describe all research methodologies.
Experiments have a long and important history in social science. Behaviorists such as John Watson, B. F. Skinner, Ivan Pavlov, and Albert Bandura used experimental design to demonstrate the various types of conditioning. Using strictly controlled environments, behaviorists were able to isolate a single stimulus as the cause of measurable differences in behavior or physiological responses. The foundations of social learning theory and behavior modification are found in experimental research projects. Moreover, behaviorist experiments brought psychology and social science away from the abstract world of Freudian analysis and towards empirical inquiry, grounded in real-world observations and objectively-defined variables. Experiments are used at all levels of social work inquiry, including agency-based experiments that test therapeutic interventions and policy experiments that test new programs.
Several kinds of experimental designs exist. In general, designs considered to be true experiments contain three key features:
- random assignment of participants into experimental and control groups
- a “treatment” (or intervention) provided to the experimental group
- measurement of the effects of the treatment in a post-test administered to both groups
Some true experiments are more complex. Their designs can also include a pre-test and can have more than two groups, but these are the minimum requirements for a design to be a true experiment.
Experimental and control groups
In a true experiment, the effect of an intervention is tested by comparing two groups: one that is exposed to the intervention (the experimental group, also known as the treatment group) and another that does not receive the intervention (the control group). Importantly, participants in a true experiment need to be randomly assigned to either the control or the experimental group. Random assignment uses a random number generator or some other random process to assign people into experimental and control groups. Random assignment is important in experimental research because it helps to ensure that the experimental group and control group are comparable and that any differences between them are due to random chance. We will address more of the logic behind random assignment in the next section.
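To make the mechanics concrete, here is a minimal Python sketch of one way random assignment might be implemented; the participant IDs and the function name are illustrative, not taken from the chapter.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)  # chance alone determines who lands in which group
    midpoint = len(pool) // 2
    return pool[:midpoint], pool[midpoint:]  # (experimental, control)

experimental, control = randomly_assign(["P01", "P02", "P03", "P04", "P05", "P06"])
print("Experimental group:", experimental)
print("Control group:", control)
```

Because chance alone drives the shuffle, pre-existing differences among participants tend to spread evenly across the two groups as the sample grows.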
Treatment or intervention
In an experiment, the independent variable is the intervention being tested—for example, a therapeutic technique, prevention program, or access to some service or support. Although it is less common in social work research, social science experiments may also use a stimulus, rather than an intervention, as the independent variable. For example, an electric shock or a reading about death might be used as a stimulus to provoke a response.
In some cases, it may be unethical to withhold treatment completely from a control group within an experiment. If you recruited two groups of people with severe addiction and only provided treatment to one group, the other group would likely suffer. For these cases, researchers use a control group that receives “treatment as usual.” Experimenters must clearly define what treatment as usual means. For example, a standard treatment in substance abuse recovery is attending Alcoholics Anonymous or Narcotics Anonymous meetings. A substance abuse researcher conducting an experiment may use twelve-step programs in their control group and use their experimental intervention in the experimental group. The results would show whether the experimental intervention worked better than treatment as usual, which is useful information.
The dependent variable is usually the intended effect the researcher wants the intervention to have. If the researcher is testing a new therapy for individuals with binge eating disorder, their dependent variable may be the number of binge eating episodes a participant reports. The researcher likely expects her intervention to decrease the number of binge eating episodes reported by participants. Thus, she must, at a minimum, measure the number of episodes that occur after the intervention, which is the post-test. In a classic experimental design, participants are also given a pretest to measure the dependent variable before the experimental treatment begins.
Types of experimental design
Let’s put these concepts in chronological order so we can better understand how an experiment runs from start to finish. Once you’ve collected your sample, you’ll need to randomly assign your participants to the experimental group and control group. In a common type of experimental design, you will then give both groups your pretest, which measures your dependent variable, to see what your participants are like before you start your intervention. Next, you will provide your intervention, or independent variable, to your experimental group, but not to your control group. Many interventions take a few weeks or months to complete, particularly therapeutic treatments. Finally, you will administer your post-test to both groups to observe any changes in your dependent variable. What we’ve just described is known as the classical experimental design and is the simplest type of true experimental design. All of the designs we review in this section are variations on this approach. Figure 8.1 visually represents these steps.
An interesting example of experimental research can be found in Shannon K. McCoy and Brenda Major’s (2003) study of people’s perceptions of prejudice. In one portion of this multifaceted study, all participants were given a pretest to assess their levels of depression. No significant differences in depression were found between the experimental and control groups during the pretest. Participants in the experimental group were then asked to read an article suggesting that prejudice against their own racial group is severe and pervasive, while participants in the control group were asked to read an article suggesting that prejudice against a racial group other than their own is severe and pervasive. Clearly, these were not meant to be interventions or treatments to help depression, but were stimuli designed to elicit changes in people’s depression levels. Upon measuring depression scores during the post-test period, the researchers discovered that those who had received the experimental stimulus (the article citing prejudice against their same racial group) reported greater depression than those in the control group. This is just one of many examples of social scientific experimental research.
In addition to classic experimental design, there are two other ways of designing experiments that are considered to fall within the purview of “true” experiments (Babbie, 2010; Campbell & Stanley, 1963). The posttest-only control group design is almost the same as classic experimental design, except it does not use a pretest. Researchers who use posttest-only designs want to eliminate testing effects, in which participants’ scores on a measure change because they have already been exposed to it. If you took multiple SAT or ACT practice exams before you took the real one you sent to colleges, you’ve taken advantage of testing effects to get a better score. Considering the previous example on racism and depression, participants who are given a pretest about depression before being exposed to the stimulus would likely assume that the intervention is designed to address depression. That knowledge could cause them to answer differently on the post-test than they otherwise would. In theory, as long as the control and experimental groups have been determined randomly and are therefore comparable, no pretest is needed. However, most researchers prefer to use pretests in case randomization did not result in equivalent groups and to help assess change over time within both the experimental and control groups.
Researchers wishing to account for testing effects but also gather pretest data can use a Solomon four-group design. In the Solomon four-group design, the researcher uses four groups. Two groups are treated as they would be in a classic experiment—pretest, experimental group intervention, and post-test. The other two groups do not receive the pretest, though one receives the intervention. All groups are given the post-test. Table 8.1 illustrates the features of each of the four groups in the Solomon four-group design. By having one set of experimental and control groups that complete the pretest (Groups 1 and 2) and another set that does not complete the pretest (Groups 3 and 4), researchers using the Solomon four-group design can account for testing effects in their analysis.
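The comparison logic of the design can be sketched in a few lines of Python; the post-test means below are hypothetical placeholders, used only to show which group comparisons isolate the testing effect.

```python
# Hypothetical post-test means (illustrative numbers, not study data).
groups = {
    "Group 1": {"pretest": True,  "treatment": True,  "posttest_mean": 12.0},
    "Group 2": {"pretest": True,  "treatment": False, "posttest_mean": 9.5},
    "Group 3": {"pretest": False, "treatment": True,  "posttest_mean": 11.0},
    "Group 4": {"pretest": False, "treatment": False, "posttest_mean": 8.0},
}

# Comparing the two control groups isolates the testing effect, since the
# only difference between them is exposure to the pretest.
testing_effect = groups["Group 2"]["posttest_mean"] - groups["Group 4"]["posttest_mean"]

# Comparing the two treated groups shows whether the pretest interacted
# with the intervention itself.
pretest_by_treatment = groups["Group 1"]["posttest_mean"] - groups["Group 3"]["posttest_mean"]

print(f"Estimated testing effect: {testing_effect:+.1f}")
print(f"Pretest-by-treatment difference: {pretest_by_treatment:+.1f}")
```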
Solomon four-group designs are challenging to implement in the real world because they are time- and resource-intensive. Researchers must recruit enough participants to create four groups and implement interventions in two of them.
Overall, true experimental designs are sometimes difficult to implement in a real-world practice environment. It may be impossible to withhold treatment from a control group or randomly assign participants in a study. In these cases, pre-experimental and quasi-experimental designs, which we will discuss in the next section, can be used. However, the differences in rigor from true experimental designs leave their conclusions more open to critique.
Experimental design in macro-level research
You can imagine that social work researchers may be limited in their ability to use random assignment when examining the effects of governmental policy on individuals. For example, it is unlikely that a researcher could randomly assign some states to implement decriminalization of recreational marijuana and some states not to in order to assess the effects of the policy change. There are, however, important examples of policy experiments that use random assignment, including the Oregon Medicaid experiment. Oregon’s Medicaid wait list was so long that state officials conducted a lottery to determine who from the wait list would receive Medicaid (Baicker et al., 2013). Researchers used the lottery as a natural experiment that included random assignment. People selected to receive Medicaid were the experimental group, and those who remained on the wait list were the control group. Macro-level experiments carry some practical complications, just as other experiments do. For example, the ethical concern with using people on a wait list as a control group exists in macro-level research just as it does in micro-level research.
Key Takeaways
- True experimental designs require random assignment.
- Control groups do not receive an intervention, and experimental groups receive an intervention.
- The basic components of a true experiment include a pretest, posttest, control group, and experimental group.
- Testing effects may cause researchers to use variations on the classic experimental design.
Glossary
- Classic experimental design: uses random assignment, an experimental and control group, as well as pre- and post-testing
- Control group: the group in an experiment that does not receive the intervention
- Experiment: a method of data collection designed to test hypotheses under controlled conditions
- Experimental group: the group in an experiment that receives the intervention
- Posttest: a measurement taken after the intervention
- Posttest-only control group design: a type of experimental design that uses random assignment and an experimental and control group, but does not use a pretest
- Pretest: a measurement taken prior to the intervention
- Random assignment: using a random process to assign people into experimental and control groups
- Solomon four-group design: uses random assignment, two experimental and two control groups, pretests for half of the groups, and posttests for all
- Testing effects: when a participant’s scores on a measure change because they have already been exposed to it
- True experiments: a group of experimental designs that contain independent and dependent variables, pretesting and post-testing, and experimental and control groups
Experimental Design – Types, Methods, Guide
Experimental design is a structured approach used to conduct scientific experiments. It enables researchers to explore cause-and-effect relationships by controlling variables and testing hypotheses. This guide explores the types of experimental designs, common methods, and best practices for planning and conducting experiments.
Experimental Design
Experimental design refers to the process of planning a study to test a hypothesis, where variables are manipulated to observe their effects on outcomes. By carefully controlling conditions, researchers can determine whether specific factors cause changes in a dependent variable.
Key Characteristics of Experimental Design:
- Manipulation of Variables: The researcher intentionally changes one or more independent variables.
- Control of Extraneous Factors: Other variables are kept constant to avoid interference.
- Randomization: Subjects are often randomly assigned to groups to reduce bias.
- Replication: Repeating the experiment or having multiple subjects helps verify results.
Purpose of Experimental Design
The primary purpose of experimental design is to establish causal relationships by controlling for extraneous factors and reducing bias. Experimental designs help:
- Test Hypotheses: Determine if there is a significant effect of independent variables on dependent variables.
- Control Confounding Variables: Minimize the impact of variables that could distort results.
- Generate Reproducible Results: Provide a structured approach that allows other researchers to replicate findings.
Types of Experimental Designs
Experimental designs can vary based on the number of variables, the assignment of participants, and the purpose of the experiment. Here are some common types:
1. Pre-Experimental Designs
These designs are exploratory and lack random assignment, often used when strict control is not feasible. They provide initial insights but are less rigorous in establishing causality.
- Example: A training program is provided, and participants’ knowledge is tested afterward, without a pretest.
- Example: A group is tested on reading skills, receives instruction, and is tested again to measure improvement.
2. True Experimental Designs
True experiments involve random assignment of participants to control or experimental groups, providing high levels of control over variables.
- Example: A new drug’s efficacy is tested with patients randomly assigned to receive the drug or a placebo.
- Example: Two groups are observed after one group receives a treatment, and the other receives no intervention.
3. Quasi-Experimental Designs
Quasi-experiments lack random assignment but still aim to determine causality by comparing groups or time periods. They are often used when randomization isn’t possible, such as in natural or field experiments.
- Example: Schools receive different curriculums, and students’ test scores are compared before and after implementation.
- Example: Traffic accident rates are recorded for a city before and after a new speed limit is enforced.
4. Factorial Designs
Factorial designs test the effects of multiple independent variables simultaneously. This design is useful for studying the interactions between variables.
- Example: Studying how caffeine (variable 1) and sleep deprivation (variable 2) affect memory performance.
- Example: An experiment studying the impact of age, gender, and education level on technology usage.
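As a rough illustration, the following sketch assigns participants evenly to the four cells of a hypothetical 2x2 caffeine-by-sleep factorial; the cell labels and participant IDs are invented for this example.

```python
import itertools
import random

# The four cells of a hypothetical 2x2 factorial design.
cells = list(itertools.product(["caffeine", "no caffeine"],
                               ["sleep deprived", "rested"]))

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 participants, 5 per cell
rng = random.Random(7)
rng.shuffle(participants)

# Deal the shuffled participants evenly into the four cells; the analysis
# would then compare cell means for main effects and an interaction.
assignment = {cell: participants[i::4] for i, cell in enumerate(cells)}
for cell, members in assignment.items():
    print(cell, "->", members)
```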
5. Repeated Measures Design
In repeated measures designs, the same participants are exposed to different conditions or treatments. This design is valuable for studying changes within subjects over time.
- Example : Measuring reaction time in participants before, during, and after caffeine consumption.
- Example : Testing two medications, with each participant receiving both but in a different sequence.
Methods for Implementing Experimental Designs
1. Randomization
- Purpose: Ensures each participant has an equal chance of being assigned to any group, reducing selection bias.
- Method: Use random number generators or assignment software to allocate participants randomly.
2. Blinding
- Purpose: Prevents participants or researchers from knowing which group (experimental or control) participants belong to, reducing bias.
- Method: Implement single-blind (participants unaware) or double-blind (both participants and researchers unaware) procedures.
3. Control Groups
- Purpose: Provides a baseline for comparison, showing what would happen without the intervention.
- Method: Include a group that does not receive the treatment but otherwise undergoes the same conditions.
4. Counterbalancing
- Purpose: Controls for order effects in repeated measures designs by varying the order of treatments (see the sketch after this list).
- Method: Assign different sequences to participants, ensuring that each condition appears equally across orders.
5. Replication
- Purpose: Ensures reliability by repeating the experiment or including multiple participants within groups.
- Method: Increase sample size or repeat studies with different samples or in different settings.
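As referenced above, counterbalancing can be generated mechanically. This minimal sketch enumerates every presentation order for three hypothetical conditions and cycles participants through them so each sequence is used equally often.

```python
from itertools import permutations

conditions = ["A", "B", "C"]             # hypothetical treatment conditions
orders = list(permutations(conditions))  # all 6 possible presentation orders

participants = [f"P{i:02d}" for i in range(1, 13)]
for i, participant in enumerate(participants):
    # Cycle through the orders so each sequence appears equally often.
    print(participant, "->", orders[i % len(orders)])
```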
Steps to Conduct an Experimental Design
1. Define the Research Question and Hypothesis
- Clearly state what you intend to discover or prove through the experiment. A strong hypothesis guides the experiment’s design and variable selection.
2. Identify the Variables
- Independent Variable (IV): The factor manipulated by the researcher (e.g., amount of sleep).
- Dependent Variable (DV): The outcome measured (e.g., reaction time).
- Control Variables: Factors kept constant to prevent interference with results (e.g., time of day for testing).
3. Select an Experimental Design
- Choose a design type that aligns with your research question, hypothesis, and available resources. For example, an RCT for a medical study or a factorial design for complex interactions.
4. Assign Participants to Groups
- Randomly assign participants to experimental or control groups. Ensure control groups are similar to experimental groups in all respects except for the treatment received.
- Randomize the assignment and, if possible, apply blinding to minimize potential bias.
5. Conduct the Experiment
- Follow a consistent procedure for each group, collecting data systematically. Record observations and manage any unexpected events or variables that may arise.
6. Analyze the Data
- Use appropriate statistical methods to test for significant differences between groups, such as t-tests, ANOVA, or regression analysis (a minimal sketch follows this list).
7. Interpret the Results
- Determine whether the results support your hypothesis and analyze any trends, patterns, or unexpected findings. Discuss possible limitations and implications of your results.
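For the analysis step, here is a minimal sketch using SciPy’s independent-samples t-test; the scores are hypothetical placeholders, and a real analysis would also check the test’s assumptions (normality, equal variances).

```python
from scipy import stats

# Hypothetical post-test scores (e.g., reaction times in milliseconds).
experimental = [310, 295, 330, 350, 315, 340, 325, 360]
control = [280, 270, 300, 290, 285, 295, 275, 305]

# Independent-samples t-test comparing the two group means.
t_stat, p_value = stats.ttest_ind(experimental, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```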
Examples of Experimental Design in Research
- Medicine: Testing a new drug’s effectiveness through a randomized controlled trial, where one group receives the drug and another receives a placebo.
- Psychology: Studying the effect of sleep deprivation on memory using a within-subject design, where participants are tested with different sleep conditions.
- Education: Comparing teaching methods in a quasi-experimental design by measuring students’ performance before and after implementing a new curriculum.
- Marketing: Using a factorial design to examine the effects of advertisement type and frequency on consumer purchase behavior.
- Environmental Science: Testing the impact of a pollution reduction policy through a time series design, recording pollution levels before and after implementation.
Experimental design is fundamental to conducting rigorous and reliable research, offering a systematic approach to exploring causal relationships. With various types of designs and methods, researchers can choose the most appropriate setup to answer their research questions effectively. By applying best practices, controlling variables, and selecting suitable statistical methods, experimental design supports meaningful insights across scientific, medical, and social research fields.
References
- Campbell, D. T., & Stanley, J. C. (1963). Experimental and Quasi-Experimental Designs for Research. Houghton Mifflin Company.
- Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin.
- Fisher, R. A. (1935). The Design of Experiments. Oliver and Boyd.
- Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics. Sage Publications.
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. Routledge.
Chapter 6: Experiments
Doubt the conventional wisdom unless you can verify it with reason and experiment
— Steve Albini [1]
Learning Objectives
After reading this chapter, students should be able to do the following:
- Describe the rationale underlying an experimental method.
- Identify the criteria needed to establish causality and explain which features of an experiment support the testing of cause–effect relationships.
- Differentiate between basic and classic experimental designs; explain how exposure to the independent variable differs in between-subjects versus within-subjects designs; and explain why some designs are classified as quasi-experimental.
- Define internal and external validity.
- Identify and describe potential threats to internal validity.
- Identify and describe potential threats to external validity.
INTRODUCTION
In chapter 4, you learned that a research design is a template for a study detailing who, what, where, when, why, and how an investigation will take place. In addition to distinguishing between designs based on “when” a study takes place, as in a cross-sectional study at one point in time versus a longitudinal design conducted at multiple points in time, researchers in the social sciences sometimes classify designs as “experimental” or “non-experimental” based on their ability to determine causality. Also recall from chapter 1 how explanatory research is conducted with the aim of answering the question of “why” something occurs the way it does or to clarify why there is variation between groups on some dimension of interest. Experimental methods provide a means for testing causal relationships through the manipulation and measurement of variables. In an experiment, at least one independent variable is manipulated by a researcher to measure its effects (if any) on a dependent variable. For example, a researcher interested in the effects of sleep deprivation on scholastic performance might manipulate the amount of sleep obtained by participants. Half of the participants in the experiment might be allowed only three hours of sleep. In other words, half of the participants experience the treatment (independent variable), which in this example is sleep deprivation. The other half might be allowed to experience a normal night’s sleep of about eight to nine hours, with no sleep deprivation. The independent variable is the presumed cause of some outcome. In this example, the researcher might hypothesize that sleep deprivation will lower academic performance on a memory-based word task. In this case, participants who experience sleep deprivation should remember fewer words than participants with normal amounts of sleep, since sleep deprivation is believed to cause impaired performance. The measured performance on the memory task is the dependent variable, or the outcome.
Research on the Net
Participating in Human Research and Clinical Trials
The U.S. Department of Health and Human Services has a site full of resources to help protect the well-being of humans who volunteer as participants in health-related research (Office of Human Research Protections, 2022). Here you can find informational videos to learn more about participating in research, questions to ask researchers as a prospective volunteer, and what kinds of regulations are in place to protect volunteers.
CAUSALITY, CONTROL, AND RANDOM ASSIGNMENT
The study of cause–effect relationships rests on the assumption that one variable of interest is the cause of another; that is, changes in one variable produce changes in another. To establish a causal relationship, three criteria must exist. First, the variables must be related in a logical way. For example, education and income are associated, as people with higher levels of education also tend to earn more. This is sometimes referred to as the “covariance rule” (Beins, 2018, p. 163). Second, the cause must precede the effect in time, establishing “temporal order.” A person acquires an education and then enters the workforce to earn an income. Finally, the presumed cause should be the most plausible one and rival explanations should be ruled out. Although education contributes to income, there are other factors that help to explain one’s income, including age, years of experience, family socioeconomic status (i.e., how well off a person’s family of origin is), and type of employment. These factors cannot be completely ruled out from this example. While we have established an association, we have yet to prove causation. This can only be done by conducting an experiment.
Experimental methods common to the natural sciences are also regularly employed by psychologists and used to a lesser extent by sociologists, such as social psychologists and criminologists. Although experiments often conjure up images of scientists wearing white lab coats working with beakers in research laboratories, they can be conducted anywhere a researcher has control over the environment. Just as an instructor can close a classroom door and lock it, thereby preventing people from entering the room during a lecture, a researcher can follow standardized procedures, use scripts, and take precautions to turn a classroom, office, and/or some other area on campus into a carefully controlled laboratory setting.
Experiments constitute the only method that can demonstrate causation due to the strict environmental control and the random assignment of cases to the treatment. Random assignment is a method for assigning cases to the experimental group (the one that receives the independent variable) based on chance alone. This is important because, going back to the original example on sleep deprivation, some individuals require more sleep than others, some have better working memories than others, and some have higher overall scholastic aptitude than others—all of which can influence performance on the word task, as can mood, time of the day, and whether the person has recently eaten. If participants are randomly assigned to a sleep deprivation group or a normal sleep group, then any existing individual differences will also be randomly assigned across the groups, making them equivalent at the onset. The group exposed to the independent variable is called the experimental group and the group that does not experience the independent variable is called a control group. The control group provides a measure of what would “normally” happen in the absence of the experimental manipulation. In the example used earlier, the control group tells us how many words on average people can recall during a word task. We can then compare the results for the sleep-deprived experimental group to the results for the control group to see if the experimental group fares worse, as hypothesized.
Therefore, with random assignment and strict control over the environment, where both groups receive identical instructions and undergo the exact same experience except for the independent variable, we can be reasonably sure that any differences found between the two groups on some measure result solely from the manipulation. Using the previous example, if the sleep-deprived group performs worse on the word task, we can attribute the difference to the independent variable.
Students routinely confuse random sampling (discussed in chapter 5) with random assignment. Try to remember that random sampling has to do with how a sample is selected from a population of interest. This process permits a researcher to generalize from the sample to the population. For example, one of the probability-based sampling methods is simple random sampling, where a sample of a given size is obtained using a random-numbers generator. This means chance alone determines who is selected to take part in a study. Random assignment, on the other hand, has to do with how participants are put into groups in experimental research. In this case, chance alone determines who from the collection of sample subjects ends up receiving the manipulation (see figure 6.1). Random assignment helps ensure the experimental and control groups are identical before the experimental manipulation.
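The difference between the two processes can be shown in a short sketch: random sampling determines who enters the study from the population, while random assignment determines who within the sample receives the manipulation. All names here are hypothetical.

```python
import random

rng = random.Random(42)

# Random SAMPLING: selecting who gets into the study from the population.
population = [f"Person{i:03d}" for i in range(1, 501)]
sample = rng.sample(population, 20)  # simple random sample of 20

# Random ASSIGNMENT: dividing the sample into experimental and control groups.
rng.shuffle(sample)
experimental, control = sample[:10], sample[10:]

print("Experimental group:", experimental)
print("Control group:", control)
```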
Since the hallmark of an experiment is the manipulation of an independent variable presumed to be the cause of a change in the dependent variable, researchers often incorporate a “manipulation check” into their study procedures to be certain the experimental group experienced the independent variable as intended. For example, participants might be asked to report on how much sleep they received. Those in the experimental group should say they received about 3.0 hours of sleep, while those in the control condition should indicate about 8.5 hours, on average. Similarly, an experimenter studying the effects of watching a video on subsequent attitudes might ask participants a question or two about the video to gauge whether they watched it. This is an important check to see that the procedures of the experiment unfolded as intended.
Activity: Components of Experimental Research
Test Yourself
- Which variable is manipulated in an experiment?
- What three criteria must exist to establish a causal relationship?
- How can random assignment be distinguished from random sampling?
TYPES OF EXPERIMENTAL DESIGNS
There are many different types of experimental designs. What makes any experiment a “true” experiment is the presence of four core features: (1) random assignment, (2) an experimental and a control group, (3) the manipulation of an independent variable experienced by the experimental group, and (4) the measurement of a dependent variable (i.e., the outcome) to see what (if any) effect the independent variable had. This is usually referred to as a “post-test.” An experimental design that includes these four features and only these four features is referred to as a basic experimental design (see figure 6.2).
Classic Experimental Design
A classic experimental design includes the four basic features from the basic design (random assignment, an experimental and a control group, the manipulation of an independent variable, and a post-test measurement of the dependent variable), along with a pre-test of the dependent measure. This design is also commonly called a pre-test–post-test design. Even when random assignment is used to place participants into groups, there may be differences between the groups starting out. For example, perhaps some of the participants who end up in the experimental group have exceptional memories or are great at word tasks. If a basic experimental design is used, then the dependent variable is measured only once following exposure to the independent variable, during the post-test. Due to the exceptional qualities of the participants in the experimental group, the researcher might find no differences in word recall between the experimental and control groups. It appears that the independent variable, sleep deprivation, had no impairing effect on performance. But what if the experimental group would have done even better had they not been sleep deprived? With only one measure of the dependent variable, an experimenter will never know the answer. However, if performance is measured before and after exposure to the independent variable, we can see how much performance is impaired by sleep deprivation. In a classic experimental design, participants are randomly assigned to an experimental or control group and then given a pre-test, where the dependent variable is measured prior to exposure to the manipulation (i.e., the independent variable). The participants in the experimental group then receive the manipulation (i.e., are sleep deprived), and both groups are reassessed (i.e., given a post-test) on the dependent variable, as outlined in figure 6.3.
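A minimal sketch, assuming hypothetical word-recall scores, of how a pre-test lets the researcher compare change scores rather than a single post-test snapshot:

```python
# Hypothetical pre-test and post-test word-recall scores (illustrative only).
experimental = {"pre": [14, 15, 13, 16, 14], "post": [10, 11, 9, 12, 10]}
control = {"pre": [14, 13, 15, 14, 15], "post": [14, 14, 15, 13, 15]}

def mean_change(group):
    """Average post-test score minus average pre-test score."""
    return (sum(group["post"]) / len(group["post"])
            - sum(group["pre"]) / len(group["pre"]))

# Comparing change scores reveals impairment even if the groups happened
# to differ somewhat on the dependent variable at the start.
print(f"Experimental change: {mean_change(experimental):+.1f}")
print(f"Control change: {mean_change(control):+.1f}")
```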
Between-Subjects and Within-Subjects Designs
In the designs discussed thus far, the experimental group is exposed to an independent variable and the control group is not. The experimental group can experience an independent variable in two ways: it can be exposed either to one level or to all levels of the independent variable. In a between-subjects design, participants in the experimental group are exposed to only one level of the independent variable. For example, in an experiment on the effects of music on personality, undergraduates at a Canadian university were randomly assigned to one of three possible conditions where they (1) listened to a classical song while reading the English translation of the lyrics, (2) listened to a classical song and followed along with the text provided in German, or (3) listened to an English translation of the lyrics (Djikic, 2011). This is called a between-subjects design because there are differences between participants (also known as subjects) in how they experience the independent variable. This type of design is also called an “independent groups design” since each fraction of the experimental group independently receives one treatment. One-third of the participants heard the song while reading the English translation (a music and lyrics condition), one-third listened to the song while looking at German lyrics (described as a music only condition since no one in the study understood German), and one-third listened to the English translation (lyrics only without the music). In case you are curious, Djikic (2011) found that listening to classical music changed the participants’ personalities for the better, leading to a self-reported enhanced variability in overall personality, while exposure to lyrics but no actual music produced a diminished variability in personality.
In a within-subjects design, the participants in the experimental group are exposed to all levels of the independent variable. For example, in an experiment on participants’ willingness to endure a painful exercise for money, Cabanac (1986) paid 10 participants varying amounts of money to endure an isometric sitting exercise. Isometric sitting is akin to sitting without a chair, with one’s legs at a 90-degree angle, causing lactic acid to build up in the thigh muscles to produce a painful sensation. Using a within-subjects design, all participants were exposed to the six different levels of the independent variable, wherein they were paid 0.2, 0.5, 1.25, 3.125, or 7.8125 French francs (FF) per 20 seconds of exercise or a lump sum, presented in a random order. For example, participant number one attended their first session and might receive 0.2 FF for every 20 seconds of exercise during that session, 1.25 FF at the next session, 7.8125 FF at the third session, 3.125 FF at the fourth session, a lump sum on the fifth session, and 0.5 FF at their last session. All participants experienced the same conditions; however, the order of presentation differed (e.g., participant number two might receive the lump sum during their first session and the highest pay amount during their second). In this case, the difference in how the independent variable occurs is within the participants themselves, who each receive the treatments in random order. Cabanac (1986) found that money motivated participants’ willingness to endure pain up to a certain point: participants lasted longer for increasing amounts of money, but eventually they reached physical limitations that outweighed their desire to withstand the pain for money.
A within-subjects design is also called a repeated measures design since participants are exposed to the independent variable repeatedly at multiple points in time. In a within-subjects/repeated measures design, participants may receive exposure to the same condition repeatedly as opposed to encountering different levels of the independent variable at different times. For example, McConville and Virk (2012) used a repeated measures design to examine the effectiveness of game-playing training using the Sony PlayStation 2 and EyeToy combination with the Harmonix AntiGrav game for improving postural balance. Participants were randomly assigned to either the control group (no training) or the experimental group (attended nine scheduled training sessions over a period of three weeks). During each 30-minute training session, the participants played the game four times using the controls and incorporating necessary head, body, and arm movements. Participants who underwent game training showed significant improvement on two different facets of balance (McConville & Virk, 2012).
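Because each participant in a within-subjects design serves as their own control, the natural analysis compares paired scores. Here is a minimal sketch with hypothetical balance scores, using SciPy’s paired-samples t-test.

```python
from scipy import stats

# Hypothetical balance scores for the same participants measured before
# and after training (within-subjects / repeated measures).
before = [62, 58, 65, 60, 63, 59, 61, 64]
after = [68, 63, 70, 66, 69, 64, 66, 71]

# Paired-samples t-test: each participant is compared with themselves.
t_stat, p_value = stats.ttest_rel(after, before)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```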
One potential drawback of this type of design is its tendency to create order effects. Order effects are differences in the dependent variable that result from the order in which the independent variable is presented (as opposed to the manipulation itself). For example, imagine you are interested in learning whether people state a preference for a brand of cola (e.g., Pepsi or Coke) in a blind taste test. If, as the researcher, you always gave participants Coke first, it could be that the taste of the initial brand lingered in a participant’s mouth, confounding or interfering with the taste of the next cola; that is, effects from trial number 1 have carried over into trial number 2. To control for this, the researcher could increase the time between trials 1 and 2 (to ensure the taste of the first cola has dissipated) or the experimenter could offer water in between trials to eliminate the traces left from trial 1. Alternatively, the researcher could employ a technique called “counterbalancing,” where all possible ways to order the independent variable are included in the design. For example, on half of the trials Coke would be presented first, and on half of the trials Pepsi would be presented first.
Quasi-Experimental Designs
Although you might be inclined to infer that a more complicated design is always a better one, there are many instances in which even a basic experimental design cannot be used.
For instance, a counsellor who works with clients in a Manitoba-based treatment program for adolescents, such as the Edgewood Program (for male youth who have committed a sexual offence) or the Mutchmor Program (for adolescent males with aggressive tendencies), might be interested in examining the effects of program completion on reoffending. It would be unethical and potentially unsafe to randomly offer treatment to certain offenders and not others to see if the treated group is less likely than the untreated group to reoffend at some point in the future. However, some offenders voluntarily enter treatment programs, while others do not. And, similarly, some offenders who enter treatment programs complete treatment, while others fail to complete the full course of the treatment (e.g., they fail to comply with the rules and are asked to leave the program or they decide to quit the program). While random assignment to the independent variable (treatment) is not possible, naturally occurring groups sometimes become available for studying, such as treatment completers and non-completers. These groups might be compared to see if those who complete treatment have lower reoffending rates (the dependent variable) than those who fail to complete the treatment program.
Experimental designs lacking one or more of the main features of a true experiment are commonly referred to as quasi-experimental designs (also called pre-experimental designs). Quasi-experimental designs are especially prevalent in research projects designed to examine the effectiveness of treatment programs in areas such as clinical psychology, sociology, and social work. One of the most common types of quasi-experimental designs is a static group comparison, where there are two groups and a post-test measure, but random assignment was not an option for the placement of participants into the two groups. Instead, participants typically end up in the two groups as a function of self-selection. For example, a static group comparison might be used to examine treatment completers versus non-completers in a rehabilitative program for sexual offenders (see figure 6.4). Allowing time for reoffending to occur following treatment (e.g., a period of up to five years), the research question of interest, then, could be “Is reoffending (called recidivism) lower in treatment completers compared to non-completers?” A static group comparison is also useful for examining the differences between groups in situations wherein one group receives a novel treatment or when one group receives a placebo (or simulated treatment) while the other does not (Thyer, 2012). In evaluating the merit of a static group comparison, it is important to note that this design is often used in research that is more of a starting place for determining if a program or intervention appears to be effective. While static group comparisons are frequently employed to compare groups to see if a treatment helps, without random assignment, causal inferences are difficult to establish. For example, there could be important differences between offenders who complete treatment and those who do not, which account for the lower recidivism, irrespective of what went on in a given treatment program.
Another common quasi-experimental design is a one-shot case study, where a group receives exposure to an independent variable and then is measured on a dependent variable. This design lacks a control group (see figure 6.5). One-shot case studies are commonly employed in social work and educational research. In his book on quasi-experimental designs, Bruce Thyer (2012) notes that one-shot case studies play an especially important role in answering questions concerning the effectiveness of various interventions and programs such as “What is the status of clients after they have received a given course of treatment?” and “Do clients improve after receiving a given course of treatment?”
Research in Action
Impression Formation
In a classic study on impression formation, Solomon Asch (1946) read a list of peripheral character traits used to describe someone’s general disposition—intelligent, skillful, practical, etc.—to a group of participants and then had participants describe the sort of person to whom the traits might apply. All participants heard the same list read except for one word. Half of the participants heard the word warm , while the other half heard the word cold included among the descriptors. Results showed that participants exposed to the word warm created much more favourable impressions than those who were exposed to the word cold . Asch (1946) explained that only certain words (such as warm and cold ) are “central traits” that have an overall effect on impression formation, as demonstrated in his early study. In this example, only an experimental group was exposed to the independent variable (the warm or cold descriptor) and then measured on the dependent variable (i.e., their impression). This allowed the researcher to examine for an overall main effect, the difference between the warm and cold condition. However, without a control group, it was not possible to say which condition accounts for the effect. It could be that the warm descriptor produced a favourable impression and this alone accounted for the difference between the warm and cold condition, with the cold perhaps not having any effect. Conversely, it could be that the cold descriptor negatively impacted the impression to create the difference between the cold and warm condition, and the warm condition on its own perhaps did not have any impact. Finally, it could be that both the warm and cold descriptors produced separate main effects.
Test Yourself
- What four features underlie a true experimental design?
- What main feature distinguishes a classic experimental design from a basic design?
- Which type of quasi-experimental design lacks a control group?
INTERNAL AND EXTERNAL VALIDITY
Recall how validity is an important consideration in evaluating whether a study is properly measuring/assessing the central concepts and constructs of interest. In other words, is a study properly examining what it is supposed to? In an experiment, validity takes on an even greater level of importance as a researcher tries to prove causation and generalize the findings beyond the confines of that study. Internal validity refers to the capacity to demonstrate an experimental effect and to rule out rival explanations for that effect. Campbell and Stanley (1963) coined the term internal validity, referring to it as “the basic minimum without which any experiment is uninterpretable: Did in fact the experimental treatments make a difference in this specific experimental instance?” (p. 5). In other words, internal validity pertains to whether a causal relationship has been established. A study has high internal validity if a researcher can demonstrate that it is highly likely an independent variable produced the outcome (i.e., the differences between groups observed on the dependent variable were due solely to the independent variable) and it is highly unlikely that alternative explanations can account for the effect. Random assignment to an experimental and a control group helps to establish internal validity by minimizing differences between the groups prior to the experimental manipulation.
External validity refers to the generalizability of the effect or outcome beyond an experiment. In other words, do the results generalize to other people in other settings at other times? Does sleep deprivation impair performance in general, such as other cognitive or behavioural areas of functioning in the real world for most people, or is it limited to the results found in this study measuring performance on a word task with these participants? Random sampling helps to establish external validity because it eliminates bias when generalizing from the sample to the population (Dorsten & Hotchkiss, 2005).
Internal and external validity are related in a manner that is considered a trade-off in experimental research. With high internal validity, a researcher can be sure the independent variable is the cause of any differences found between the experimental and control group on the dependent measure. However, the greater the control over the environment, which is high for most laboratory experiments, the more artificial and less lifelike the study becomes and the less likely the experiment will generalize to the real world (external validity). This can be countered by using a field experiment, where the experiment occurs naturally in a real-life context. For example, the Smithsonian Tropical Research Institute conducts field experiments in different locations on the effects of global climate change on tropical plants, such as by looking at plant responses to various levels of carbon dioxide concentrations. Similarly, social researchers examine variations in the behaviour of groups that take place in actual social situations, such as performance on a sports team, shopping behaviour in a supermarket, motorist responses at busy intersections, and student engagement in a classroom. Although field experiments have higher generalizability because they take place in the real world, there is very little control over that environment, and the isolation of variables can be quite problematic. As a result, high external validity corresponds to low internal validity and vice versa.
Test Yourself
- What is the term for the generalizability of an experimental effect?
- How are internal and external validity related?
THREATS TO INTERNAL VALIDITY
In their pioneering work on experimental design, Donald Campbell and Julian Stanley (1963) identify eight considerations they refer to as “classes of extraneous variables” that need to be carefully controlled for within an experimental design or ruled out as possibilities via careful consideration. Otherwise, these unintentional variables can confound or interfere with the effects of the independent variable and make it difficult to properly assess the findings. The eight classes of extraneous variables are now more commonly recognized as “threats to internal validity” and they include selection, history, maturation, selection by maturation interactions, experimental mortality, testing, instrumentation, and statistical regression.
Selection
As a threat to internal validity, selection refers to methods used to obtain groups that can result in differences prior to the experimental manipulation. Recall how true experimental designs rely upon random assignment to achieve identical groups at the onset of the study. The first threat to validity concerns any practices that can lead to a selection bias, where the two groups are not identical at the beginning of the study. For example, allowing participants to self-select into groups can produce differences between the experimental and control groups since certain individuals opted for the treatment, while others chose to avoid it (Mitchell & Jolley, 1996). Using my earlier example, if participants who normally need a lot of sleep opt to be in the control condition because they would prefer to avoid sleep deprivation and those who end up in the sleep deprivation treatment choose it because they do not feel they require the normal amount of sleep, the effects of the independent variable may be nullified. This is because those least impacted by sleep deprivation self-selected themselves into the study, while those likely to be most impacted opted out. A similar problem would occur if a researcher purposely assigned participants to the groups in a manner that appeared arbitrary but was in fact not, as would be the case if a teacher assigned students in the front of the class to one group and the back of the class to another (Mitchell & Jolley, 1996). To help assess whether selection is a threat to internal validity in a given study, ask yourself, “Can I be sure the experimental and control groups are identical at the onset of the study?” This threat can be prevented using random assignment.
History
Another potential threat to internal validity concerns what is called history, which refers to changes in the dependent measure that are attributed to external events other than the independent variable and that occurred between the first and second measurement. This threat pertains to classic experimental designs or other designs containing a pre-test and post-test measurement of the dependent variable. For example, suppose a researcher is interested in the effects of exposure to information on government cutbacks on students’ attitudes toward tuition hikes. If large-scale student tuition protests take place during the testing periods, it is unclear whether changes in students’ attitudes toward tuition from time one to time two result from learning about government cutbacks (the experimental manipulation) or the coinciding historical event. To help assess whether history is a threat to internal validity in a given study, ask yourself, “Is it possible that some other event outside of the study produced these findings?” This threat cannot be avoided, but random assignment will help ensure that both groups experience its effects similarly. In addition, the inclusion of a pre-test will indicate if a history effect may have occurred, since there will be a difference between the pre-test and post-test for both groups, even the control group that did not experience the independent variable. Note that the difference will only indicate that history is a potential problem. The difference could also be the result of maturation, as discussed next.
Maturation
A third potential threat to internal validity pertaining to experimental designs containing a pre-test and post-test is maturation. In the context of experimental research, maturation refers to changes in the dependent measure that result from processes within the research participants themselves over the period of treatment, such as growing older, gaining experience, or growing fatigued. For example, Palys and Atchison (2014) provide a simplistic case of a researcher interested in whether the administration of pills will help children learn to walk. If the participants are one-year-olds and none of them are walking at Time 1 but all of them are walking at Time 2, it is impossible to determine if the pill administered on a monthly basis for a year (as the treatment or independent variable) facilitated walking or if a natural biological process attributed to maturation resulted in all of the children walking at age two (p. 238).
Although none of the features of the designs discussed in this section protect against maturation, you can question whether this threat might be operating in studies showing a difference in the dependent measure at Time 1 and Time 2 for a control group. It cannot be the experimental manipulation that accounts for the unexpected change because this group is not exposed to the independent variable. To help assess whether maturation is a threat to internal validity in a given study, ask yourself, “Is it possible that naturally occurring changes within the participants are responsible for the findings?” Like history, this threat cannot be avoided, but random assignment will evenly distribute the effects across the experimental and control groups, and a pre-test will provide evidence of a difference that may be attributed to maturation.
Selection by Maturation Interaction
The influence of maturation can sometimes be driven by the initial selection of the experimental and control groups. In a selection by maturation interaction, there is a combined effect of initial differences in the groups at the onset of the study alongside maturation effects. For example, suppose a researcher assigned six-year-old boys to an experimental group and six-year-old girls to a control group for a study on spatial skills. The experimenter believes that they can design an exercise program that will improve spatial skills. Both groups are measured at time 1, then the researcher spends a few months designing their program and places the boys in the exercise program where they do special drills once a week for an hour over the course of three weeks. Both groups are measured about a month after completing the program, and the boys show marked improvement in spatial skills compared to the girls. In this example, several factors could account for the findings. First, there could be differences in spatial skills between boys and girls to begin with. In addition, even if there are no initial differences between boys and girls, it could be that over a course of several months, the boys engaged in a variety of activities that the girls did not that inadvertently led to improvements in their spatial skills, irrespective of the specialized exercise program. For example, some may have participated in organized sports such as baseball or soccer, while others perhaps played sports during their recess breaks and over the lunch hour.
To help assess whether a selection by maturation interaction is a threat to internal validity in a given study, ask yourself, “Is it possible that the two groups would have eventually become different, irrespective of the independent variable?” Again, random assignment helps to improve internal validity by eliminating biases that could otherwise be present at the onset, and a pre-test for the control group helps to establish whether maturation is a possibility that could otherwise be mistaken for a treatment effect.
Experimental Mortality
A fifth potential threat to internal validity is experimental mortality, referring to the loss of participants due to a discontinuation of voluntary participation over time. Simply stated, the longer a study carries on, the greater the odds are that participants will drop out of the study for any number of reasons (e.g., they move, they lose track of the study, they lose interest). Mortality, also called attrition, is an inherent problem in longitudinal studies, particularly ones conducted over many years and ones that involve time-consuming and even unpleasant contributions from participants. For example, although highly important to our eventual understanding of the development of cancer among Canadians, Alberta’s Tomorrow Project periodically requests that participants complete detailed documents on their food intake, activity levels, and body measurements. In addition, they are asked to go to a health clinic to provide saliva samples, urine samples, and blood samples (Alberta’s Tomorrow Project, 2024c). What if the participants who are most motivated or healthiest choose to remain in the study and those who are least motivated or least healthy drop out? To help assess whether mortality is a threat to internal validity in a given study, ask yourself, “Did more participants drop out of the experimental compared to the control group (or vice versa)?” While mortality cannot be prevented by an experimental design feature, it can be reduced, where possible, by limiting the overall time frame for a study or by taking the pre-test and post-test measures at close intervals in time. By examining the pre-test and post-test measures for the control group, a researcher can see if a change in the group size has potentially influenced the results of a study. Finally, it is especially important to monitor the number of participants in each group to see if more participants drop out of one group relative to the other.
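For readers who track attrition numerically, the comparison described above can be made concrete in a few lines of Python. This is a minimal sketch with invented enrollment and completion counts, not data from any actual study.

```python
# Hypothetical attrition check: did more participants drop out of one
# group than the other? (All counts below are invented for illustration.)
enrolled = {"experimental": 50, "control": 50}
completed_posttest = {"experimental": 42, "control": 29}

for group in enrolled:
    dropped = enrolled[group] - completed_posttest[group]
    rate = dropped / enrolled[group]
    print(f"{group}: {dropped} of {enrolled[group]} dropped out ({rate:.0%})")

# A markedly higher drop-out rate in one group (here, the control group)
# warns that mortality, not the treatment, may explain post-test differences.
```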
Testing

Although a pre-test is very important for indicating the presence of potential threats, such as mortality or maturation, testing itself can even pose a threat to the internal validity of an experimental design. Just as students in a class sometimes perform better on a second midterm once they have gained familiarity with how a professor designs and words the questions on exams, participants may improve from a first test to a second test, regardless of the experimental manipulation. In this case, a researcher would expect both the experimental and control group to show improvement. If the independent variable has an effect, the change should be greater for participants in the experimental condition. To help assess whether testing is a threat to internal validity in a given study, ask yourself, “Is it possible that participants’ test scores changed due to experience or familiarity gained from taking a pre-test?” The necessary feature for determining whether testing is a potential threat is the inclusion of a control group.
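The logic of using the control group to separate a testing effect from a treatment effect can be written out as simple arithmetic. The sketch below uses invented pre-test and post-test means: the control group's gain estimates the testing (practice) effect, and the experimental group's gain beyond that estimates the treatment effect.

```python
# Hypothetical pre-test/post-test means (all numbers invented).
pre_experimental, post_experimental = 60.0, 75.0
pre_control, post_control = 60.0, 66.0

testing_gain = post_control - pre_control          # improvement without treatment
total_gain = post_experimental - pre_experimental  # improvement with treatment
treatment_effect = total_gain - testing_gain       # gain beyond the testing effect

print(f"Control gain (testing effect): {testing_gain:.1f}")
print(f"Experimental gain: {total_gain:.1f}")
print(f"Estimated treatment effect: {treatment_effect:.1f}")
```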
Instrumentation
Instrumentation refers to any changes in the way the dependent variable is measured that can lead to differences between Time 1 and Time 2. For example, a researcher might change a measuring instrument such as a scale or index because a newer or more improved version becomes available between Time 1 and Time 2. Alternatively, an observer or rater might fall ill or otherwise be unable to obtain measurements at both time periods, and there may be differences between how the first and second observer interpret events that influence the results. To help assess whether instrumentation is a threat to internal validity in a given study, ask yourself, “Were there any changes to the way the dependent variable was measured between the pre-test and post-test that might account for the findings?”
Statistical Regression
Finally, as Mitchell and Jolley (1996) point out, “even if the measuring instrument is the same for both the pre-test and post-test, the amount of chance measurement error may not be” (p. 143). Extreme scores are often inflated by measurement error, so with repeated measurement (as in testing someone at two points in time), extreme scores or outliers tend to level off to more accurately reflect the construct under investigation. In statistical terms, this is a phenomenon known as regression toward the mean. For example, researchers interested in helping people overcome phobias, obsessions, or behavioural disorders such as attention deficits might try to include participants likely to display extreme scores on a pre-test, because extreme scores are indicative of those who are outside the range of normal and therefore have an actual disorder that needs to be managed. However, it is also likely that participants with extreme scores will show some improvement (their scores will go down from Time 1 to Time 2) regardless of the treatment: their scores cannot really go much higher (they are already extreme), and they are likely to have some good days and some bad days. There is a change in the dependent measure, but “the change is more illusory than real” (Palys & Atchison, 2014, p. 239). To help assess whether regression is a threat to internal validity in a given study, ask yourself, “Is it possible that participants were on the outlier or extreme end of scoring on the pre-test?” To help control for statistical regression, a researcher can avoid relying on participants with extreme scores.
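Regression toward the mean is easy to demonstrate with a small simulation. In the sketch below (all parameters invented), each person has a stable true score, every measurement adds independent random error, and no treatment is given; the extreme scorers at Time 1 nonetheless look “improved” at Time 2.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Stable true scores plus independent measurement error at each testing.
true_score = rng.normal(50, 10, n)
time1 = true_score + rng.normal(0, 5, n)
time2 = true_score + rng.normal(0, 5, n)

# Select only people with extreme Time 1 scores, as a phobia study might.
extreme = time1 > 70

print(f"Extreme group, Time 1 mean: {time1[extreme].mean():.1f}")
print(f"Extreme group, Time 2 mean: {time2[extreme].mean():.1f}")
# The Time 2 mean is lower even though nothing was done to anyone:
# part of each extreme Time 1 score was just chance error.
```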
- What is the name for the threat to internal validity that results from the methods used to obtain participants, which can create group differences prior to the experimental manipulation?
- Which threat to internal validity results from processes occurring within the participants?
- What is the name for the threat resulting from the combined effect of initial differences in the groups at the onset of the study alongside maturation effects?
- Which threat is assessed by asking “Were there any changes to the way the dependent variable was measured between the pre- and post-test that might account for the findings?”
- What question should be asked to see if regression toward the mean is a threat to validity?
THREATS TO EXTERNAL VALIDITY
Recall that external validity pertains to the ability to generalize beyond a given experiment. While a psychologist might rely upon a convenience sample of introductory psychology students who consent to participate in a study on jury deliberations, the psychologist is relying on the findings from that research to better understand how most Canadian jurors deliberate during trials. Just as there are threats to internal validity, there are features of experimental designs that can jeopardize the generalizability of findings beyond the experimental settings and participants on which they were based. In this section, I discuss three common threats to external validity: experimenter bias, participant reactivity, and unrepresentative samples.
Experimenter Bias
In an earlier chapter, you learned about sources of error in measurement and how a researcher might try to help along a study in order to get a desired outcome. Experimenter bias “exists when researchers inadvertently influence the behavior of research participants in a way that favors the outcomes they anticipate” (Marczyk et al., 2005, p. 69). For example, a researcher who expects sleep deprivation to hinder performance might distract participants in the experimental condition or fail to give them the full instructions for how to complete the task, whereas those in the control condition might receive additional cues that aid performance. Although experimenter bias is one of the most common and basic threats to external validity (Kintz et al., 1965), there are ways to minimize its occurrence. First, procedural control in an experiment can include the use of scripts to ensure a researcher reads the exact same instructions to each of the participants. Alternatively, control can even be removed from the researcher such that participants might receive typed instructions, watch a video clip, or hear an audio recording that describes how to carry out the task that is being measured in the study. Anything that can be done to standardize the procedures to help ensure identical treatment of the control and experimental groups (except for the independent variable) will reduce the likelihood of experimenter bias.
In addition to standardized instructions and procedures, where possible, experimenters should not be allowed access to the outcome measure while it is taking place. For example, in Symbaluk et al.’s (1997) pain experiment, the dependent variable was how long participants lasted at an isometric sitting exercise. Participants ended a session by sitting down on a box that contained a pressure plate that stopped a timer. The experimenter was not in the same room with the participants while they performed the exercise and therefore could not influence how long they lasted at the exercise. Further, a recording device (not the experimenter) indicated how long each participant lasted.
Finally, in some studies it is possible to keep an experimenter or research assistant “blind” to the important features of the study until after the dependent variable is measured. For example, in the pain experiment, an assistant was in the room with the participants while they performed the exercise. The assistant recorded pain ratings at regular intervals and noted when participants ended the exercise. The assistant was never informed about the hypothesis or the independent variable, or whether any given participant was in an experimental or control condition. As a result, the assistant had no reason to create an experimenter effect. It may even be possible to utilize a “double-blind” technique, where both the researchers/assistants and the participants are unaware of which participants are assigned to the experimental and control conditions. In the pain experiment discussed above, participants were not informed about the other conditions until the completion of the study, so they could not form an expectation about whether they should or should not do well based on features of the study, such as the amount of money they were being paid relative to others. Participant bias is discussed in the next section.
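A minimal way to implement this kind of blinding is to assign participants to opaque condition codes and keep the code key sealed until the dependent variable has been scored. The Python sketch below is illustrative only; the participant IDs, codes, and seed are invented, and this is not the procedure used in the pain experiment.

```python
import random

random.seed(7)  # seed chosen arbitrarily so the illustration is reproducible

# Hypothetical participant IDs; a real study would use its own enrollment list.
participants = [f"P{i:03d}" for i in range(1, 21)]
random.shuffle(participants)

# Half of the shuffled list receives code "A", half receives code "B".
half = len(participants) // 2
allocation = {pid: ("A" if rank < half else "B")
              for rank, pid in enumerate(participants)}

# Only the coordinator holds the sealed key; raters and participants see
# nothing but the letter codes until after the outcome is scored.
key = {"A": "experimental", "B": "control"}
```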
Basic Instincts, Part 5: The Milgram Experiment Revisited
Recall the now classic experiments on obedience conducted by Stanley Milgram in the 1960s that were discussed in detail in chapter 3 as an example of unethical research due to the prolonged psychological harm experienced by participants who believed they were giving painful electric shocks to a learner. Professor Jerry Burger, an emeritus social psychologist from Santa Clara University, partially replicated Milgram’s studies in an experiment that was broadcast on January 3, 2007, as part of an ABC News Primetime television program called Basic Instincts. Surprisingly, participants were almost as obedient in this study as they were in Milgram’s original versions. To learn how Burger created a “safer” version of Milgram’s procedures and to find out if there are differences in obedience between males and females, check out the video published by ABC News Productions. For more information on the video and Burger’s (2009) article summarizing this study, called “Replicating Milgram: Would People Still Obey Today?” published in American Psychologist, refer to Jerry Burger’s professional profile.
Participant Reactivity
A second source of bias rests with participants themselves. Participant reactivity refers to the tendency for research participants to act differently during a study simply because they are aware that they are participating in a research study. This sometimes occurs because participants try to “look good,” suggesting a social desirability bias, or they pick up on what the study is about and try to help the researchers prove their hypothesis by following demand characteristics, as discussed in chapter 4. One way to lessen participant reactivity is to withhold details regarding the hypothesis and/or experimental manipulation from the participants until after the dependent variable is measured. For example, suppose we were interested in whether students would help a fellow classmate get caught up on missed lecture notes. The dependent variable is whether students agree to loan their notes. Perhaps we hypothesize that students will be more willing to loan notes to someone who was sick from class versus someone who skipped class. We can manipulate the independent variable by sending one request to half of the class asking if anyone is willing to loan their notes to a fellow classmate who was recently ill, and a request to the other half of the class asking if anyone is willing to loan their notes to a student who missed class to attend a Stanley Cup playoff game. If we told the class ahead of time we were studying their willingness to help a classmate, they might react to the request in order to appear helpful, irrespective of the independent variable.
Participant effects are also sometimes controlled for with the use of deception, where participants are led to believe the experimenter is investigating something different from the true purpose of the study. In the example above, we might use a cover story in which we tell the students we are studying aspects of internet usage and have them complete a short questionnaire asking about their familiarity with and time spent on sites such as Facebook and X. As they complete the short survey, we might pass out the request for help. The request might be in the form of a handout with a spot at the bottom where they can check off a box if they are willing to lend notes and leave their contact information. We could then collect the handout along with the completed questionnaires.
In cases where participants are not informed about the hypothesis under investigation or are misled about it, at the completion of the study the investigators may ask participants if they can guess the hypothesis. A participant who accurately states the hypothesis despite the researcher’s attempts to conceal it would be considered “suspicious,” and the results for the experiment would be examined with and without suspicious cases as a further check to determine if reactivity was a problem. Because the practice goes against participants’ ethical right to informed consent, deception and the withholding of information are used only rarely in experimental research, typically for social psychological processes that would be negated by full disclosure (e.g., willingness to help). In all cases where information is withheld from participants or they are deceived by a cover story or misled in any way by procedures used in the experiment, a detailed debriefing must occur as soon as possible. The debriefing should include full disclosure of the nature and the purpose of any form of deception and allow the participant to seek further clarification on any aspect of the study.
Unrepresentative Samples
Researchers at universities across Canada regularly conduct studies using students enrolled in introductory psychology classes as a common pool of available research participants. In many cases, the students receive a small course credit as a direct incentive for their participation so that psychologists and graduate students can obtain the needed participants to further the interests of science, their own research agendas, and important degree requirements. To ensure voluntary participation from an ethical perspective, students who do agree to participate in research must be allowed to withdraw their participation at any time without penalty (i.e., they would still obtain credit or be able to complete a comparable project for course credit). While clearly a convenience sample, a group of psychology majors seeking an arts or science degree is unlikely to represent the broader university population enrolled in any number of other programs, such as commerce, communication studies, or contemporary popular music. Similarly, Canadian residents who volunteer as experimental research participants tend to differ in important ways from the general population. Rosenthal and Rosnow (1975), for example, found that the typical volunteer in experimental research was more intelligent, more sociable, and from a higher social class. More recently, Stahlmann et al. (2024) found that people with agreeable, extraverted, and open/intellectual personalities were more likely to engage in volunteerism and other forms of civic engagement.
Unrepresentative samples are especially problematic for claiming the effectiveness of programs such as drug treatment, since the participants who self-select into treatment tend to be the most motivated and the most likely to benefit from treatment. Campbell and Stanley (1963) refer to this threat as a selection by treatment interaction effect, since those most susceptible to the independent variable have placed themselves in the study. That is not to say that unwilling participants should be coerced into treatment just to balance out the sample. Research ethics aside, research has also shown that court-ordered participants are more resentful and less committed to the objectives of drug treatment programs (Sullivan, 2001).
- What is experimenter bias and how can this be minimized in an experiment?
- What is participant reactivity and how can this be controlled for in an experiment?
- Why is a volunteer sample unlikely to be representative of the larger population from which it was drawn?
Chapter Summary
- Describe the rationale underlying an experimental method. In an experiment, at least one independent variable is manipulated by a researcher to measure its effects (if any) on a dependent variable.
- Identify the criteria needed to establish causality and explain which features of an experiment support the testing of cause–effect relationships. To establish causality, two variables must be related in a logical way, the presumed cause must precede the effect in time, and the cause should be the most plausible, ruling out rival explanations. Strict control over the environment and random assignment to the experimental and control groups help to ensure that the only difference between the two groups results from the independent variable.
- Differentiate between basic and classic experimental designs; explain how exposure to the independent variable differs in between-subjects versus within-subjects designs; and explain why some designs are classified as quasi-experimental. A basic experimental design includes random assignment, an experimental group and a control group, the manipulation of an independent variable experienced by the experimental group, and the measurement of a dependent variable. A classic experiment includes these features along with a pre-test measure of the dependent variable prior to the manipulation of the independent variable. In a between-subjects design, participants in the experimental group are exposed to only one level of the independent variable. In a within-subjects design, participants in the experimental group are exposed to all levels of the independent variable. A quasi-experimental design lacks one of the features of a true experiment, such as random assignment or a control group.
- Define internal and external validity. Internal validity is the capacity to demonstrate an experimental effect and to rule out rival explanations for that effect. External validity refers to the generalizability of the effect beyond a given experiment to other people in other settings at other times.
- Identify and describe potential threats to internal validity. This chapter discusses eight threats to internal validity: (1) selection refers to methods used to obtain groups that can result in differences prior to the experimental manipulation; (2) history refers to changes in the dependent variable attributed to external events occurring between the first and second measurement; (3) maturation refers to changes in the dependent measure that result from processes within the research participants; (4) selection by maturation interaction refers to a combined effect of initial differences in the groups and maturation; (5) experimental mortality refers to the loss of participants over the course of a study; (6) testing refers to changes in the dependent variables that result from experience gained on the pre-test; (7) instrumentation refers to any changes in the way the dependent variable is measured; and (8) statistical regression refers to differences produced by the tendency for extreme scores to become less extreme.
- Identify and describe potential threats to external validity. This chapter discusses three threats to external validity: (1) experimenter bias exists when researchers influence the behaviour of research participants in a manner that favours the outcomes they anticipate; (2) participant reactivity refers to the tendency for research participants to act differently during a study simply because they are aware that they are participating in a research study; and (3) unrepresentative samples, such as introductory psychology students or other groups that self-select into experiments, are likely different in important ways from the population of interest.
RESEARCH REFLECTIONS
- List the name of the study and the affiliated primary researcher(s).
- Describe the purpose of the study as identified in the associated consent form.
- Outline the procedures for potential participants.
- Based on details provided about the study, describe one potential threat to internal or external validity discussed in this chapter that could impact the results of your selected study.
- Does the experimental design used by Amy fit the criteria for a true experimental design? Why or why not?
- Why is it impossible to prove causality in this instance?
- What recommendations would you make to Amy to improve upon her design, so she could be more confident that her manipulation is working?
- Why was a quasi-experimental design used in this study?
- What is the main independent variable and how was it manipulated?
- What is the main dependent variable and how was it measured?
- Does coaching improve academic performance? What features of this study increase your confidence in the findings?
- Are there any potential threats to internal or external validity relevant to this study? Explain your response.
LEARNING THROUGH PRACTICE
Objective: To design an experimental taste test
Directions:
- Pair up with someone else in class.
- Discuss whether you believe people can accurately identify their preferred food or beverage brands from among a sample of competitors’ brands.
- Come up with two testable hypotheses of interest related to specific taste preferences. For example, H1: Participants will be able to identify their stated cola preference in a blind taste test between Pepsi and Coke.
- Reflecting on threats to internal validity, identify factors that you think might influence taste and how you might control for these in your experimental design.
- What materials will you need to carry out this study?
- How will you order the presentation of the beverages or food items?
- What sort of instructions will you give participants?
- How will you measure preference?
RESEARCH RESOURCES
- For students and researchers with statistical proficiency who want to learn more about research designs for special circumstances and about more complex experimental designs, refer to chapters 10 and 11 in Cozby, P. C., et al. (2020). Methods in behavioural research (3rd Canadian ed.). McGraw-Hill Education.
- To learn about independent-groups, dependent-groups, and single-participant designs, see chapters 7 to 9 in Rooney, B. J., & Evans, A. N. (2019). Methods in psychological research (4th ed.). Sage.
- For advice on conducting experimental research online using open-source software, see Peirce, J., Hirst, R., & MacAskill, M. (2022). Building experiments in PsychoPy (2nd ed.). Sage.
- For a critique and re-evaluation of Zimbardo’s Stanford Prison study, check out Michael Stevens’ The Stanford Prison Experiment video (part of Mind Field Season 3, Episode 4), posted to YouTube on December 19, 2018.
- Opening quote retrieved from https://www.brainyquote.com/.
Experiment: A research method in which a researcher manipulates an independent variable to examine its effects on a dependent variable.
Independent variable: The variable that is manipulated in an experiment and is presumed to be the cause of some outcome.
Dependent variable: The variable that is measured in an experiment and is the outcome.
Random assignment: A method for assigning cases in which chance alone determines receipt of the experimental manipulation.
Experimental group: The group that experiences the independent variable in an experiment.
Control group: The group that does not experience the independent variable in an experiment.
Basic experimental design: An experimental design that includes random assignment, an experimental and a control group, the manipulation of an independent variable, and a post-test measurement of the dependent variable.
Classic experimental design: An experimental design that includes random assignment, an experimental and a control group, a pre-test measure of a dependent variable, the manipulation of an independent variable, and a post-test measure of the same dependent variable.
Between-subjects design: A type of design in which the experimental group is exposed to only one level of the independent variable.
Within-subjects design: A type of design in which the experimental group is exposed to all possible levels of the independent variable.
Order effects: Differences in the dependent variable that result from the order in which the independent variable is presented.
Quasi-experimental design: An experimental design that lacks one or more of the basic features of a true experiment, including random assignment or a control group.
Static group comparison: A quasi-experimental design lacking random assignment in which two groups are compared following a treatment.
One-shot case study: A quasi-experimental design lacking a control group, in which one group is examined following a treatment.
Internal validity: The capacity to demonstrate an experimental effect and to rule out rival explanations for that effect.
External validity: The generalizability of an experimental effect.
Natural experiment: A naturally occurring experiment that takes place in a real-life setting.
Selection: Methods used to obtain groups that can result in differences prior to the experimental manipulation.
History: Changes in the dependent measure attributed to external events outside of the experiment.
Maturation: Changes in the dependent measure that result from naturally occurring processes within the research participants themselves over the period of treatment.
Selection by maturation interaction: A combined effect of maturation and initial differences in the groups at the onset of the study.
Experimental mortality: The loss of participants over the course of a study.
Testing: Changes in the dependent measure that result from experience gained on the pre-test.
Instrumentation: Differences produced by changes in the way the dependent variable is measured.
Statistical regression: Differences produced by the tendency for extreme scores to become less extreme.
Experimenter bias: The tendency for researchers to influence the behaviour of research participants in a manner that favours the outcomes they anticipate.
Participant reactivity: The tendency for research participants to act differently during a study simply because they are aware that they are participating in a research study.
Selection by treatment interaction: A threat to external validity produced by the self-selection of participants susceptible to the independent variable.
Research Methods: Exploring the Social World in Canadian Context Copyright © 2024 by Diane Symbaluk & Robyn Hall is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
Experimental Design
Ethics, Integrity, and the Scientific Method
Jonathan Lewis
Experimental design is one aspect of a scientific method. A well-designed, properly conducted experiment aims to control variables in order to isolate and manipulate causal effects and thereby maximize internal validity, support causal inferences, and guarantee reliable results. Traditionally employed in the natural sciences, experimental design has become an important part of research in the social and behavioral sciences. Experimental methods are also endorsed as the most reliable guides to policy effectiveness. Through a discussion of some of the central concepts associated with experimental design, including controlled variation and randomization, this chapter will provide a summary of key ethical issues that tend to arise in experimental contexts. In addition, by exploring assumptions about the nature of causation and by analyzing features of causal relationships, systems, and inferences in social contexts, this chapter will summarize the ways in which experimental design can undermine the integrity of not only social and behavioral research but policies implemented on the basis of such research.
Lewis, J. (2020). Experimental Design. In: Iphofen, R. (ed.), Handbook of Research Ethics and Scientific Integrity. Springer, Cham. https://doi.org/10.1007/978-3-030-16759-2_19
Chapter 5: Experimental and Quasi-Experimental Designs
Case Study: The Impact of Teen Court
Research Study
An Experimental Evaluation of Teen Courts 1
Research Question
Is teen court more effective at reducing recidivism and improving attitudes than traditional juvenile justice processing?
Methodology
Researchers randomly assigned 168 juvenile offenders ages 11 to 17 from four different counties in Maryland to either teen court as experimental group members or to traditional juvenile justice processing as control group members. (Note: Discussion on the technical aspects of experimental designs, including random assignment, is found in detail later in this chapter.) Of the 168 offenders, 83 were assigned to teen court and 85 were assigned to regular juvenile justice processing through random assignment. Of the 83 offenders assigned to the teen court experimental group, only 56 (67%) agreed to participate in the study. Of the 85 youth randomly assigned to normal juvenile justice processing, only 51 (60%) agreed to participate in the study.
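As a side note, random assignment of this kind can be sketched in a few lines of Python. The IDs and seed below are invented for illustration; this is not the researchers' actual procedure, but it shows how chance alone can determine group membership for 168 cases.

```python
import random
from collections import Counter

random.seed(1)  # arbitrary seed so the illustration is reproducible

offenders = [f"offender_{i:03d}" for i in range(1, 169)]  # 168 hypothetical IDs
assignment = {o: random.choice(["teen court", "traditional processing"])
              for o in offenders}

print(Counter(assignment.values()))
# Coin-flip assignment yields roughly equal groups; a split such as the
# study's 83/85 is a typical outcome of chance rather than researcher choice.
```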
Upon assignment to teen court or regular juvenile justice processing, all offenders entered their respective sanction. Approximately four months later, offenders in both the experimental group (teen court) and the control group (regular juvenile justice processing) were asked to complete a post-test survey inquiring about a variety of behaviors (frequency of drug use, delinquent behavior, variety of drug use) and attitudinal measures (social skills, rebelliousness, neighborhood attachment, belief in conventional rules, and positive self-concept). The study researchers also collected official re-arrest data for 18 months starting at the time of offender referral to juvenile justice authorities.
Results

Teen court participants self-reported higher levels of delinquency than those processed through regular juvenile justice processing. According to official re-arrests, teen court youth were re-arrested at a higher rate and incurred a higher average number of total arrests than the control group. Teen court offenders also reported significantly lower scores on survey items designed to measure their "belief in conventional rules" compared to offenders processed through regular juvenile justice avenues. Other attitudinal and opinion measures did not differ significantly between the experimental and control group members based on their post-test responses. In sum, those youth randomly assigned to teen court fared worse than control group members who were not randomly assigned to teen court.
Limitations with the Study Procedure
Limitations are inherent in any research study and those research efforts that utilize experimental designs are no exception. It is important to consider the potential impact that a limitation of the study procedure could have on the results of the study.
In the current study, one potential limitation is that teen courts from four different counties in Maryland were utilized. Because of the diversity in teen court sites, it is possible that there were differences in procedure between the four teen courts and such differences could have impacted the outcomes of this study. For example, perhaps staff members at one teen court were more punishment-oriented than staff members at the other county teen courts. This philosophical difference may have affected treatment delivery and hence experimental group members' belief in conventional rules and recidivism. Although the researchers monitored each teen court to help ensure treatment consistency between study sites, it is possible that differences existed in the day-to-day operation of the teen courts that may have affected participant outcomes. This same limitation might also apply to control group members who were sanctioned with regular juvenile justice processing in four different counties.
A researcher must also consider the potential for differences between the experimental and control group members. Although the offenders were randomly assigned to the experimental or control group, and the assumption is that the groups were equivalent to each other prior to program participation, the researchers in this study were only able to compare the experimental and control groups on four variables: age, school grade, gender, and race. It is possible that the experimental and control group members differed by chance on one or more factors not measured or available to the researchers. For example, perhaps a large number of teen court members experienced problems at home that can explain their more dismal post-test results compared to control group members without such problems. A larger sample of juvenile offenders would likely have helped to minimize any differences between the experimental and control group members. The collection of additional information from study participants would have also allowed researchers to be more confident that the experimental and control group members were equivalent on key pieces of information that could have influenced recidivism and participant attitudes.
Finally, while 168 juvenile offenders were randomly assigned to either the experimental or control group, not all offenders agreed to participate in the evaluation. Remember that of the 83 offenders assigned to the teen court experimental group, only 56 (67%) agreed to participate in the study. Of the 85 youth randomly assigned to normal juvenile justice processing, only 51 (60%) agreed to participate in the study. While this limitation is unavoidable, it still could have influenced the study. Perhaps those 27 offenders who declined to participate in the teen court group differed significantly from the 56 who agreed to participate. If so, it is possible that the differences among those two groups could have impacted the results of the study. For example, perhaps the 27 youths who were randomly assigned to teen court but did not agree to be a part of the study were some of the least risky of potential teen court participants: less serious histories, better attitudes to begin with, and so on. In this case, perhaps the most risky teen court participants agreed to be a part of the study, and as a result of being more risky, this led to more dismal delinquency outcomes compared to the control group at the end of each respective program. Because parental consent was required for the study authors to be able to compare those who declined to participate in the study to those who agreed, it is unknown if the participants and nonparticipants differed significantly on any variables among either the experimental or control group. Moreover, of the resulting 107 offenders who took part in the study, only 75 offenders accurately completed the post-test survey measuring offending and attitudinal outcomes.
Again, despite the experimental nature of this study, such limitations could have impacted the study results and must be considered.
Impact on Criminal Justice
Teen courts are generally designed to deal with nonserious first time offenders before they escalate to more serious and chronic delinquency. Innovative programs such as "Scared Straight" and juvenile boot camps have inspired an increase in teen court programs across the country, although there is little evidence regarding their effectiveness compared to traditional sanctions for youthful offenders. This study provides more specific evidence as to the effectiveness of teen courts relative to normal juvenile justice processing. Researchers learned that teen court participants fared worse than those in the control group. The potential labeling effects of teen court, including stigma among peers, especially where the offense may have been very minor, may be more harmful than doing less or nothing. The real impact of this study lies in the recognition that teen courts and similar sanctions for minor offenders may do more harm than good.
One important impact of this study is that it utilized an experimental design to evaluate the effectiveness of a teen court compared to traditional juvenile justice processing. Despite the study's limitations, by using an experimental design it improved upon previous teen court evaluations by attempting to ensure any results were in fact due to the treatment, not some difference between the experimental and control group. This study also utilized both official and self-report measures of delinquency, in addition to self-report measures on such factors as self-concept and belief in conventional rules, which have been generally absent from teen court evaluations. The study authors also attempted to gauge the comparability of the experimental and control groups on factors such as age, gender, and race to help make sure study outcomes were attributable to the program, not the participants.
In This Chapter You Will Learn
The four components of experimental and quasi-experimental research designs and their function in answering a research question
The differences between experimental and quasi-experimental designs
The importance of randomization in an experimental design
The types of questions that can be answered with an experimental or quasi-experimental research design
About the three factors required for a causal relationship
That a relationship between two or more variables may appear causal, but may in fact be spurious, or explained by another factor
That experimental designs are relatively rare in criminal justice and why
About common threats to internal validity or alternative explanations to what may appear to be a causal relationship between variables
Why experimental designs are superior to quasi-experimental designs for eliminating or reducing the potential of alternative explanations
Introduction
The teen court evaluation that began this chapter is an example of an experimental design. The researchers of the study wanted to determine whether teen court was more effective at reducing recidivism and improving attitudes compared to regular juvenile justice case processing. In short, the researchers were interested in the relationship between variables: the relationship of teen court to future delinquency and other outcomes. When researchers are interested in whether a program, policy, practice, treatment, or other intervention impacts some outcome, they often utilize a specific type of research method/design called experimental design. Although there are many types of experimental designs, the foundation for all of them is the classic experimental design. This research design, and some typical variations of this experimental design, are the focus of this chapter.
Although the classic experiment may be appropriate to answer a particular research question, there are barriers that may prevent researchers from using this or another type of experimental design. In these situations, researchers may turn to quasi-experimental designs. Quasi-experiments include a group of research designs that are missing a key element found in the classic experiment and other experimental designs (hence the term "quasi-experiment"). Despite this missing part, quasi-experiments are similar in structure to experimental designs and are used to answer similar types of research questions. This chapter will also focus on quasi-experiments and how they are similar to and different from experimental designs.
Uncovering the relationship between variables, such as the impact of teen court on future delinquency, is important in criminal justice and criminology, just as it is in other scientific disciplines such as education, biology, and medicine. Indeed, whereas criminal justice researchers may be interested in whether a teen court reduces recidivism or improves attitudes, medical field researchers may be concerned with whether a new drug reduces cholesterol, or an education researcher may be focused on whether a new teaching style leads to greater academic gains. Across these disciplines and topics of interest, the experimental design is appropriate. In fact, experimental designs are used in all scientific disciplines; the only thing that changes is the topic. Specific to criminal justice, below is a brief sampling of the types of questions that can be addressed using an experimental design:
Does participation in a correctional boot camp reduce recidivism?
What is the impact of an in-cell integration policy on inmate-on-inmate assaults in prisons?
Does police officer presence in schools reduce bullying?
Do inmates who participate in faith-based programming while in prison have a lower recidivism rate upon their release from prison?
Do police sobriety checkpoints reduce drunken driving fatalities?
What is the impact of a no-smoking policy in prisons on inmate-on-inmate assaults?
Does participation in a domestic violence intervention program reduce repeat domestic violence arrests?
A focus on the classic experimental design will demonstrate the usefulness of this research design for addressing criminal justice questions interested in cause and effect relationships. Particular attention is paid to the classic experimental design because it serves as the foundation for all other experimental and quasi-experimental designs, some of which are covered in this chapter. As a result, a clear understanding of the components, organization, and logic of the classic experimental design will facilitate an understanding of other experimental and quasi-experimental designs examined in this chapter. It will also allow the reader to better understand the results produced from those various designs, and importantly, what those results mean. It is a truism that the results of a research study are only as "good" as the design or method used to produce them. Therefore, understanding the various experimental and quasi-experimental designs is the key to becoming an informed consumer of research.
The Challenge of Establishing Cause and Effect
Researchers interested in explaining the relationship between variables, such as whether a treatment program impacts recidivism, are interested in causation or causal relationships. In a simple example, a causal relationship exists when X (independent variable) causes Y (dependent variable), and there are no other factors (Z) that can explain that relationship. For example, offenders who participated in a domestic violence intervention program (X, the domestic violence intervention program) experienced fewer re-arrests (Y, re-arrests) than those who did not participate in the domestic violence program, and no other factor other than participation in the domestic violence program can explain these results. The classic experimental design is superior to other research designs in uncovering a causal relationship, if one exists. Before a causal relationship can be established, however, there are three conditions that must be met (see Figure 5.1). 2
FIGURE 5.1 | The Cause and Effect Relationship
Timing

The first condition for a causal relationship is timing. For a causal relationship to exist, it must be shown that the independent variable or cause (X) preceded the dependent variable or outcome (Y) in time. A decrease in domestic violence re-arrests (Y) cannot occur before participation in a domestic violence reduction program (X), if the domestic violence program is proposed to be the cause of fewer re-arrests. Ensuring that cause comes before effect is not sufficient to establish that a causal relationship exists, but it is one requirement that must be met for a causal relationship.
Association

In addition to timing, there must also be an observable association between X and Y, the second necessary condition for a causal relationship. Association is also commonly referred to as covariance or correlation. When an association or correlation exists, this means there is some pattern of relationship between X and Y: as X changes by increasing or decreasing, Y also changes by increasing or decreasing. Here, the notion of X and Y increasing or decreasing can mean an actual increase/decrease in the quantity of some factor, such as an increase/decrease in the number of prison terms or days in a program or re-arrests. It can also refer to an increase/decrease in a particular category, for example, from nonparticipation in a program to participation in a program. For instance, subjects who participated in a domestic violence reduction program (X) incurred fewer domestic violence re-arrests (Y) than those who did not participate in the program. In this example, X and Y are associated: as X changes or increases from nonparticipation to participation in the domestic violence program, Y, the number of re-arrests for domestic violence, decreases.
Associations between X and Y can occur in two different directions: positive or negative. A positive association means that as X increases, Y increases, or, as X decreases, Y decreases. A negative association means that as X increases, Y decreases, or, as X decreases, Y increases. In the example above, the association is negative: participation in the domestic violence program was associated with a reduction in re-arrests. This is also sometimes called an inverse relationship.
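To see how the direction of an association is read in practice, the short sketch below computes a correlation coefficient on invented data, with program participation coded 0/1 and re-arrest counts as the outcome; a negative coefficient corresponds to the inverse relationship described above.

```python
import numpy as np

# Invented data: participation in the program (0 = no, 1 = yes) and
# the number of domestic violence re-arrests for eight offenders.
participation = np.array([0, 0, 0, 0, 1, 1, 1, 1])
re_arrests = np.array([4, 3, 5, 2, 1, 0, 2, 0])

r = np.corrcoef(participation, re_arrests)[0, 1]
print(f"r = {r:.2f}")  # negative: as X moves from 0 to 1, Y tends to decrease
```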
Elimination of Alternative Explanations

Although participation in a domestic violence program may be associated with a reduction in re-arrests, this does not mean for certain that participation in the program was the cause of reduced re-arrests. Just as timing by itself does not imply a causal relationship, association by itself does not imply a causal relationship. For example, instead of the program being the cause of a reduction in re-arrests, perhaps several of the program participants died shortly after completion of the domestic violence program and thus were not able to engage in domestic violence (and their deaths were unknown to the researcher tracking re-arrests). Perhaps a number of the program participants moved out of state and domestic violence re-arrests occurred but were not able to be uncovered by the researcher. Perhaps those in the domestic violence program experienced some other event, such as the trauma of a natural disaster, and that experience led to a reduction in domestic violence, an event not connected to the domestic violence program. If any of these situations occurred, it might appear that the domestic violence program led to fewer re-arrests. However, the observed reduction in re-arrests can actually be attributed to a factor unrelated to the domestic violence program.
The previous discussion leads to the third and final necessary consideration in determining a causal relationship: the elimination of alternative explanations. This means that the researcher must rule out any other potential explanation of the results, except for the experimental condition, such as a program, policy, or practice. Accounting for or ruling out alternative explanations is much more difficult than ensuring timing and association. Ruling out all alternative explanations is difficult because there are so many other potential explanations that can wholly or partly explain the findings of a research study. This is especially true in the social sciences, where researchers are often interested in relationships explaining human behavior. Because of this difficulty, associations by themselves are sometimes mistaken for causal relationships when in fact they are spurious. A spurious relationship is one where it appears that X and Y are causally related, but the relationship is actually explained by something other than the independent variable, or X.
One only needs to go as far as the daily newspaper to find headlines and stories of mere associations being mistaken, assumed, or represented as causal relationships. For example, a newspaper headline recently proclaimed "Churchgoers live longer." 3 An uninformed consumer may interpret this headline as evidence of a causal relationship: that going to church by itself will lead to a longer life. The astute consumer, however, would note possible alternative explanations. For example, people who go to church may live longer because they tend to live healthier lifestyles and tend to avoid risky situations. These are two plausible alternative explanations for the relationship independent of simply going to church. In another example, researchers David Kalist and Daniel Lee explored the relationship between first names and delinquent behavior in their manuscript titled "First Names and Crime: Does Unpopularity Spell Trouble?" 4 Kalist and Lee (2009) found that unpopular names are associated with juvenile delinquency. In other words, those individuals with the most unpopular names were more likely to be delinquent than those with more popular names. According to the authors, it is not necessarily someone's name that leads to delinquent behavior; rather, the most unpopular names also tend to be correlated with individuals who come from disadvantaged home environments and experience low socioeconomic status. As the authors rightly note, these alternative explanations help to explain the link between someone's name and delinquent behavior, a link that is not causal.
A frequently cited example provides more insight into the claim that an association by itself is not sufficient to prove causality. In certain cities in the United States, for example, as ice cream sales increase on a particular day or in a particular month, so does the incidence of certain forms of crime. If this association were represented as a causal statement, it would be that ice cream or ice cream sales cause crime. There is an association, no doubt, and let us assume that ice cream sales rose before the increase in crime (timing). Surely, however, this relationship between ice cream sales and crime is spurious. The alternative explanation is that ice cream sales and crime are associated in certain parts of the country because of the weather. Ice cream sales tend to increase in warmer temperatures, and it just so happens that certain forms of crime tend to increase in warmer temperatures as well. This coincidence or association does not mean a causal relationship exists. Additionally, this does not mean that warm temperatures cause crime either. There are plenty of other alternative explanations for the association between certain forms of crime and warmer temperatures. 6 For another example of a study subject to alternative explanations, read the June 2011 news article titled "Less Crime in U.S. Thanks to Videogames." 7 Based on your reading, what are some other potential explanations for the crime drop other than videogames?
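The ice cream example can be illustrated with a small simulation. The numbers below are fabricated and the coefficients arbitrary; the point is only that two variables driven by a common third variable (temperature) will correlate even though neither causes the other.

```python
# Illustrative simulation of a spurious association driven by a confounder.
# All numbers are fabricated; this is not data from any actual study.
import random
from statistics import correlation

random.seed(42)
temps = [random.uniform(30, 100) for _ in range(365)]        # daily temperature
ice_cream = [0.8 * t + random.gauss(0, 5) for t in temps]    # sales rise with heat
crime = [0.5 * t + random.gauss(0, 5) for t in temps]        # offenses rise with heat

# Ice cream sales and crime are strongly correlated even though neither causes
# the other; both are driven by the third variable, temperature.
print(f"ice cream vs. crime: r = {correlation(ice_cream, crime):.2f}")
```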
The preceding examples demonstrate how timing and association can be present, but the final needed condition for a causal relationship is that all alternative explanations are ruled out. While this task is difficult, the classic experimental design helps to ensure these additional explanatory factors are minimized. When other designs are used, such as quasi-experimental designs, the chance that alternative explanations emerge is greater. This potential should become clearer as we explore the organization and logic of the classic experimental design.
CLASSICS IN CJ RESEARCH
Minneapolis Domestic Violence Experiment
The Minneapolis Domestic Violence Experiment (MDVE) 5
Which police action (arrest, separation, or mediation) is most effective at deterring future misdemeanor domestic violence?
The experiment began on March 17, 1981, and continued until August 1, 1982. The experiment was conducted in two of Minneapolis's four police precincts: the two with the highest number of domestic violence reports and arrests. A total of 314 reports of misdemeanor domestic violence were handled by the police during this time frame.
This study utilized an experimental design with the random assignment of police actions. Each police officer involved in the study was given a pad of report forms. Upon a misdemeanor domestic violence call, the officer's action (arrest, separation, or mediation) was predetermined by the order and color of report forms in the officer's notebook. Colored report forms were randomly ordered in the officer's notebook, and the color of the top form determined the officer's response once at the scene. For example, after receiving a call for domestic violence, an officer would turn to his or her report pad to determine the action. If the top form was pink, the action was arrest. If on the next call the top form was a different color, an action other than arrest would occur. All colored report forms were randomly ordered through a lottery assignment method. The result is that all police officer actions in response to misdemeanor domestic violence calls were randomly assigned. To ensure the lottery procedure was properly carried out, research staff rode along with officers to verify that they did not skip the order of the randomly ordered forms. Research staff also made sure the reports were received in the order they were randomly assigned in the pad of report forms.
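The lottery logic can be sketched in a few lines of code. The pad size and the equal allocation of the three actions are assumptions made for illustration, not details reported from the MDVE.

```python
# Sketch of an MDVE-style lottery assignment of report forms.
# Pad size and equal allocation are assumptions for illustration only.
import random

ACTIONS = ["arrest", "separation", "mediation"]

def build_report_pad(forms_per_action: int = 10) -> list[str]:
    """Return a randomly ordered pad of colored report forms."""
    pad = ACTIONS * forms_per_action
    random.shuffle(pad)  # the 'lottery': the order is determined by chance alone
    return pad

pad = build_report_pad()
print(pad[0])  # the top form dictates the officer's action on the next call
```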
To examine the relationship of different officer responses to future domestic violence, the researchers examined official arrests of the suspects in a 6-month follow-up period. For example, the researchers examined those initially arrested for misdemeanor domestic violence and how many were subsequently arrested for domestic violence within a 6-month time frame. They followed the same procedure for the police actions of separation and mediation. The researchers also interviewed the victim(s) of each incident and asked if a repeat domestic violence incident occurred with the same suspect in the 6-month follow-up period. This allowed researchers to examine domestic violence offenses that may have occurred but did not come to the official attention of police. The researchers then compared official arrests for domestic violence to self-reported domestic violence after the experiment.
Suspects arrested for misdemeanor domestic violence, as opposed to situations where separation or mediation was used, were significantly less likely to engage in repeat domestic violence as measured by official arrest records and victim interviews during the 6-month follow-up period. According to official police records, 10% of those initially arrested engaged in repeat domestic violence in the follow-up period, 19% of those who initially received mediation engaged in repeat domestic violence, and 24% of those who randomly received separation engaged in repeat domestic violence. According to victim interviews, 19% of those initially arrested engaged in repeat domestic violence, compared to 37% for separation and 33% for mediation. The general conclusion of the experiment was that arrest was preferable to separation or mediation in deterring repeat domestic violence across both official police records and victim interviews.
A few issues that affected the random assignment procedure occurred throughout the study. First, some officers did not follow the randomly assigned action (arrest, separation, or mediation) as a result of other circumstances that occurred at the scene. For example, if the randomly assigned action was separation, but the suspect assaulted the police officer during the call, the officer might arrest the suspect. Second, some officers simply ignored the assigned action if they felt a particular call for domestic violence required another action. For example, if the action was mediation as indicated by the randomly assigned report form, but the officer felt the suspect should be arrested, he or she may have simply ignored the randomly assigned response and substituted his or her own. Third, some officers forgot their report pads and did not know the randomly assigned course of action to take upon a call of domestic violence. Fourth and finally, the police chief also allowed officers to deviate from the randomly assigned action in certain circumstances. In all of these situations, the random assignment procedures broke down.
The results of the MDVE had a rapid and widespread impact on law enforcement practice throughout the United States. Just two years after the release of the study, a 1986 telephone survey of 176 urban police departments serving cities with populations of 100,000 or more found that 46 percent of the departments preferred to make arrests in cases of minor domestic violence, largely due to the effectiveness of this practice in the Minneapolis Domestic Violence Experiment. 8
In an attempt to replicate the findings of the Minneapolis Domestic Violence Experiment, the National Institute of Justice sponsored the Spouse Assault Replication Program. Replication studies were conducted in Omaha, Charlotte, Milwaukee, Miami, and Colorado Springs from 1986 to 1991. In three of the five replications, offenders randomly assigned to the arrest group had higher levels of continued domestic violence in comparison to other police actions during domestic violence situations. 9 Therefore, rather than providing results that were consistent with the Minneapolis Domestic Violence Experiment, the results from the five replication experiments produced inconsistent findings about whether arrest deters domestic violence. 10
Despite the findings of the replications, the push to arrest domestic violence offenders has continued in law enforcement. Today many police departments require officers to make arrests in domestic violence situations. In agencies that do not mandate arrest, department policy typically states a strong preference toward arrest. State legislatures have also enacted laws impacting police actions regarding domestic violence. Twenty-one states have mandatory arrest laws while eight have pro-arrest statutes for domestic violence. 11
The Classic Experimental Design
Table 5.1 provides an illustration of the classic experimental design. 12 It is important to become familiar with the specific notation and organization of the classic experiment before a full discussion of its components and their purpose.
Major Components of the Classic Experimental Design
The classic experimental design has four major components:
1. Treatment
2. Experimental Group and Control Group
3. Pre-Test and Post-Test
4. Random Assignment
Treatment The first component of the classic experimental design is the treatment, and it is denoted by X in the classic experimental design. The treatment can be a number of things: a program, a new drug, or the implementation of a new policy. In a classic experimental design, the primary goal is to determine what effect, if any, a particular treatment had on some outcome. In this way, the treatment can also be considered the independent variable.
TABLE 5.1 | The Classic Experimental Design
Experimental Group = Group that receives the treatment
Control Group = Group that does not receive the treatment
R = Random assignment
O1 = Observation before the treatment, or the pre-test
X = Treatment or the independent variable
O2 = Observation after the treatment, or the post-test
Experimental and Control Groups The second component of the classic experiment is an experimental group and a control group. The experimental group receives the treatment, and the control group does not receive the treatment. There will always be at least one group that receives the treatment in experimental and quasi-experimental designs. In some cases, experiments may have multiple experimental groups receiving multiple treatments.
Pre-Test and Post-Test The third component of the classic experiment is a pre-test and a post-test. A pre-test is a measure of the dependent variable or outcome before the treatment. The post-test is a measure of the dependent variable after the treatment is administered. It is important to note that the post-test is defined based on the stated goals of the program. For example, if the stated goal of a particular program is to reduce re-arrests, the post-test will be a measure of re-arrests after the program. The dependent variable also defines the pre-test. For example, if a researcher wanted to examine the impact of a domestic violence reduction program (treatment or X) on the goal of reducing re-arrests (dependent variable or Y), the pre-test would be the number of domestic violence arrests incurred before the program. Program goals may be numerous, and each goal can constitute a post-test and, in turn, a corresponding pre-test. For example, perhaps the goal of the domestic violence program is also that participants learn of different pro-social ways to handle domestic conflicts other than resorting to violence. If researchers wanted to examine this goal, the post-test might be subjects' level of knowledge about pro-social ways to handle domestic conflicts other than violence. The pre-test would then be subjects' level of knowledge about these pro-social alternatives to violence before they received the treatment program.
Although all designs have a post-test, it is not always the case that designs have a pre-test. This is because researchers may not have access to, or be able to collect, the information constituting the pre-test. For example, researchers may not be able to determine subjects' level of knowledge about alternatives to domestic violence before the intervention program if the subjects are already enrolled in the domestic violence intervention program. In other cases, there may be financial barriers to collecting pre-test information. In the teen court evaluation that started this chapter, for example, researchers were not able to collect pre-test information on study participants due to the financial strain it would have placed on the agencies involved in the study. 13 There are a number of potential reasons why a pre-test might not be available in a research study. The defining feature, however, is that the pre-test is determined by the post-test.
Random Assignment The fourth component of the classic experiment is random assignment. Random assignment refers to a process whereby members of the experimental group and control group are assigned to the two groups through a random and unbiased process. Random assignment should not be mistaken for random selection as discussed in Chapter 3. Random selection refers to selecting a smaller but representative sample from a larger population. For example, a researcher may randomly select a sample from a larger city population for the purposes of sending sample members a mail survey to determine their attitudes on crime. The goal of random selection in this example is to make sure the sample, although smaller in size than the population, accurately represents the larger population.
Random assignment, on the other hand, refers to the process of assigning subjects to either the experimental or control group with the goal that the groups are similar or equivalent to each other in every way (see Figure 5.2). The exception to this rule is that one group gets the treatment and the other does not (see discussion below on why equivalence is so important). Although the concept of random is similar in each, the goals are different between random selection and random assignment. 14 Experimental designs all feature random assignment, but this is not true of other research designs, in particular quasi-experimental designs.
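A short sketch may help keep the two procedures straight. The population size, sample size, and variable names below are hypothetical; the contrast is that random selection draws study subjects from a larger population, while random assignment splits subjects already in the study into groups.

```python
# Contrast between random selection and random assignment (illustrative names).
import random

population = [f"resident_{i}" for i in range(10_000)]  # a city population

# Random selection: draw a representative sample from the larger population.
sample = random.sample(population, k=500)

# Random assignment: split the people already in the study into two groups.
random.shuffle(sample)
experimental_group = sample[:250]
control_group = sample[250:]
```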
FIGURE 5.2 | Random Assignment
The classic experimental design is the foundation for all other experimental and quasi-experimental designs because it retains all of the major components discussed above. As mentioned, sometimes designs do not have a pre-test, a control group, or random assignment. Because the pre-test, control group, and random assignment are so critical to the goal of uncovering a causal relationship, if one exists, we explore them further below.
The Logic of the Classic Experimental Design
Consider a research study using the classic experimental design where the goal is to determine if a domestic violence treatment program has any effect on re-arrests for domestic violence. The randomly assigned experimental and control groups are composed of persons who had previously been arrested for domestic violence. The pre-test is a measure of the number of domestic violence arrests before the program. This is because the goal of the program is to determine whether re-arrests are impacted after the treatment. The post-test is the number of re-arrests following the treatment program.
Once randomly assigned, the experimental group members receive the domestic violence program, and the control group members do not. After the program, the researcher will compare the pre-test arrests for domestic violence of the experimental group to post-test arrests for domestic violence to determine if arrests increased, decreased, or remained constant since the start of the program. The researcher will also compare the post-test re-arrests for domestic violence between the experimental and control groups. With this example, we explore the usefulness of the classic experimental design, and the contribution of the pre-test, random assignment, and the control group to the goal of determining whether a domestic violence program reduces re-arrests.
The Pre-Test As a component of the classic experiment, the pre-test allows an examination of change in the dependent variable from before the domestic violence program to after the domestic violence program. In short, a pre-test allows the researcher to determine if re-arrests increased, decreased, or remained the same following the domestic violence program. Without a pre-test, researchers would not be able to determine the extent of change, if any, from before to after the program for either the experimental or control group.
Although the pre-test is a measure of the dependent variable before the treatment, it can also be thought of as a measure whereby the researcher can compare the experimental group to the control group before the treatment is administered. For example, the pre-test helps researchers to make sure both groups are similar or equivalent on previous arrests for domestic violence. The importance of equivalence between the experimental and control groups on previous arrests is discussed below with random assignment.
Random Assignment Random assignment helps to ensure that the experimental and control groups are equivalent before the introduction of the treatment. This is perhaps one of the most critical aspects of the classic experiment and all experimental designs. Although the experimental and control groups will be made up of different people with different characteristics, assigning them to groups via a random assignment process helps to ensure that any differences or biases between the groups are eliminated or minimized. By minimizing bias, we mean that the groups will balance each other out on all factors except the treatment. If they are balanced out on all factors prior to the administration of the treatment, any differences between the groups at the post-test must be due to the treatment, the only factor that differs between the experimental group and the control group. According to Shadish, Cook, and Campbell: "If implemented correctly, random assignment creates two or more groups of units that are probabilistically similar to each other on the average. Hence, any outcome differences that are observed between those groups at the end of a study are likely to be due to treatment, not to differences between the groups that already existed at the start of the study." 15 Considered in another way, if the experimental and control group differed significantly on any relevant factor other than the treatment, the researcher would not know if the results observed at the post-test are attributable to the treatment or to the differences between the groups.
Consider an example where 500 domestic abusers were randomly assigned to the experimental group and 500 were randomly assigned to the control group. Because they were randomly assigned, we would likely find both more frequent and less frequent domestic violence arrestees in both groups, older and younger arrestees in both groups, and so on. If random assignment was implemented correctly, it would be highly unlikely that all of the experimental group members were the most serious or frequent arrestees and all of the control group members were less serious and/or less frequent arrestees. While there are no guarantees, we know the chance of this happening is extremely small with random assignment because it is based on known probability theory. Thus, except for a chance occurrence, random assignment will result in equivalence between the experimental and control group in much the same way that flipping a coin multiple times will result in heads approximately 50% of the time and tails approximately 50% of the time. One thousand tosses of a coin, for example, should result in roughly 500 heads and 500 tails. While there is a chance that flipping a coin 1,000 times will result in heads 1,000 times, or some other major imbalance between heads and tails, this potential is small and would only occur by chance.
The same logic from above also applies to randomly assigning people to groups, and this can even be done by flipping a coin. By assigning people to groups through a random and unbiased process, like flipping a coin, only by chance (or researcher error) will one group have more of one characteristic than another, on average. If there are no major (also called statistically significant) differences between the experimental and control group before the treatment, the most plausible explanation for the results at the post-test is the treatment.
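The balancing tendency of coin-flip assignment can be demonstrated with a brief simulation. The prior-arrest counts below are fabricated; the point is that, with a large enough group, the two group means on a measured characteristic land close together by chance alone.

```python
# Simulation: coin-flip assignment tends to balance group characteristics.
# The 'prior_arrests' values are fabricated purely for illustration.
import random
from statistics import mean

random.seed(1)
subjects = [{"prior_arrests": random.randint(0, 10)} for _ in range(1000)]

experimental, control = [], []
for s in subjects:
    # A virtual coin flip decides each subject's group.
    (experimental if random.random() < 0.5 else control).append(s)

# The group means on prior arrests should come out close to each other.
print(f"experimental: n={len(experimental)}, "
      f"mean priors={mean(s['prior_arrests'] for s in experimental):.2f}")
print(f"control:      n={len(control)}, "
      f"mean priors={mean(s['prior_arrests'] for s in control):.2f}")
```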
As mentioned, it is possible by some chance occurrence that the experimental and control group members are significantly different on some characteristic prior to administration of the treatment. To confirm that the groups are in fact similar after they have been randomly assigned, the researcher can examine the pre-test if one is present. If the researcher has additional information on subjects before the treatment is administered, such as age, or any other factor that might influence post-test results at the end of the study, he or she can also compare the experimental and control group on those measures to confirm that the groups are equivalent. Thus, a researcher can confirm that the experimental and control groups are equivalent on information known to the researcher.
Being able to compare the groups on known measures is an important way to ensure the random assignment process "worked." However, perhaps most important is that randomization also helps to ensure similarity across unknown variables between the experimental and control group. Because random assignment is based on known probability theory, there is a much higher probability that all potential differences between the groups that could impact the post-test, known or unknown, should balance out with random assignment. Without random assignment, it is likely that the experimental and control group would differ on important but unknown factors, and such differences could emerge as alternative explanations for the results. For example, if a researcher did not utilize random assignment and instead took the first 500 domestic abusers from an ordered list and assigned them to the experimental group and then assigned the last 500 domestic abusers to the control group, one of the groups could be "lopsided" or imbalanced on some important characteristic that could impact the outcome of the study. With random assignment, there is a much higher likelihood that these important characteristics among the experimental and control groups will balance out because no individual has a different chance of being placed into one group versus the other. The probability of one or more characteristics being concentrated in one group and not the other is extremely small with random assignment.
To further illustrate the importance of random assignment to group equivalence, suppose the first 500 domestic violence abusers who were assigned to the experimental group from the ordered list had significantly fewer domestic violence arrests before the program than the last 500 domestic violence abusers on the list. Perhaps this is because the ordered list was organized from least to most chronic domestic abusers. In this instance, the control group would be lopsided concerning the number of pre-program domestic violence arrests: they would be more chronic than the experimental group. The arrest imbalance could then potentially explain the post-test results following the domestic violence program. For example, the "less risky" offenders in the experimental group might be less likely to be re-arrested regardless of their participation in the domestic violence program, especially compared to the more chronic domestic abusers in the control group. Because of imbalances between the experimental and control group on arrests before the program was implemented, it would not be known for certain whether an observed reduction in re-arrests after the program for the experimental group was due to the program or the natural result of having less risky offenders in the experimental group. In this instance, the results might be taken to suggest that the program significantly reduces re-arrests. This conclusion might be spurious, however, for the association may simply be due to the fact that the offenders in the experimental group were much different (less frequent offenders) from those in the control group. Here, the program may have had no effect: the experimental group members may have performed the same regardless of the treatment because they were low-level offenders.
The example above suggests that differences between the experimental and control groups based on previous arrest records could have a major impact on the results of a study. Such differences can arise in the absence of random assignment. If subjects were randomly assigned to the experimental and control group, however, there would be a much higher probability that less frequent and more frequent domestic violence arrestees would have been found in both the experimental and control groups and that the differences would have balanced out between the groups, leaving any differences between the groups at the post-test attributable to the treatment only.
In summary, random assignment helps to ensure that the experimental and control group members are balanced or equivalent on all factors that could impact the dependent variable or post-test, known or unknown. The only factor they are not balanced or equal on is the treatment. As such, random assignment helps to isolate the impact of the treatment, if any, on the post-test because it increases confidence that the only difference between the groups should be that one group gets the treatment and the other does not. If that is the only difference between the groups, any change in the dependent variable between the experimental and control group must be attributed to the treatment and not an alternative explanation, such as a significant arrest history imbalance between the groups (refer to Figure 5.2). This logic also suggests that if the experimental group and control group are imbalanced on any factor that may be relevant to the outcome, that factor then becomes a potential alternative explanation for the results, an explanation that reduces the researcher's ability to isolate the real impact of the treatment.
WHAT RESEARCH SHOWS: IMPACTING CRIMINAL JUSTICE OPERATIONS
Scared Straight
The 1978 documentary Scared Straight introduced to the public the "Lifer's Program" at Rahway State Prison in New Jersey. This program sought to decrease juvenile delinquency by bringing at-risk and delinquent juveniles into the prison, where they would be "scared straight" by inmates serving life sentences. Participants in the program were talked to and yelled at by the inmates in an effort to scare them. It was believed that the fear felt by the participants would lead to a discontinuation of their problematic behavior so that they would not end up in prison themselves. Although originally touted as a success based on anecdotal evidence, subsequent evaluations of the program and others like it proved otherwise.
Using a classic experimental design, Finckenauer evaluated the original "Lifer's Program" at Rahway State Prison. 16 Participating juveniles were randomly assigned to the experimental group or the control group. Results of the evaluation were not positive. Post-test measures revealed that juveniles who were assigned to the experimental group and participated in the program were actually more seriously delinquent afterwards than those who did not participate in the program. Also using an experimental design with random assignment, Yarborough evaluated the "Juvenile Offenders Learn Truth" (JOLT) program at the State Prison of Southern Michigan at Jackson. 17 This program was similar to the "Lifer's Program," only with fewer obscenities used by the inmates. Post-test measurements were taken at two intervals, 3 and 6 months after program completion. Again, results were not positive. Findings revealed no significant differences between those juveniles who attended the program and those who did not.
Other experiments conducted on Scared Straight-like programs further revealed their inability to deter juveniles from future criminality. 18 Despite the intuitive popularity of these programs, these evaluations proved that such programs were not successful. In fact, it is postulated that these programs may have actually done more harm than good.
The Control Group The presence of an equivalent control group (created through random assignment) also gives the researcher more confidence that the findings at the post-test are due to the treatment and not some other alternative explanation. This logic is perhaps best demonstrated by considering how the interpretation of results is affected without a control group. Absent an equivalent control group, it cannot be known whether the results of the study are due to the program or some other factor. This is because the control group provides a baseline of comparison or a "control." For example, without a control group, the researcher may find that domestic violence arrests declined from pre-test to post-test. But the researcher would not be able to definitively attribute that finding to the program without a control group. Perhaps the single experimental group incurred fewer arrests because its members naturally matured over their time in the program, independent of the program itself. Having a randomly assigned control group would allow this consideration to be eliminated, because the equivalent control group would also have naturally matured if that were the case.
Because the control group is meant to be similar to the experimental group on all factors, with the exception that the experimental group receives the treatment, the logic is that any differences between the experimental and control group after the treatment must then be attributable only to the treatment itself: everything else occurs equally in both the experimental and control groups and thus cannot be the cause of the results. The bottom line is that a control group allows the researcher more confidence in attributing any change in the dependent variable from pre- to post-test and between the experimental and control groups to the treatment, and not to another alternative explanation. Absent a control group, the researcher would have much less confidence in the results.
Knowledge about the major components of the classic experimental design and how they contribute to an understanding of cause and effect serves as an important foundation for studying different types of experimental and quasi-experimental designs and their organization. A useful way to become familiar with the components of the experimental design and their important roles is to consider the impact on the interpretation of results when one or more components are lacking. For example, what if a design lacked a pre-test? How could this impact the interpretation of post-test results and knowledge about the comparability of the experimental and control group? What if a design lacked random assignment? What are some potential problems that could occur, and how could those potential problems impact the interpretation of results? What if a design lacked a control group? How does the absence of an equivalent control group affect a researcher's ability to determine the unique effects of the treatment on the outcomes being measured? The ability to discuss the contributions of a pre-test, random assignment, and a control group, and the impact when one or more of those components is absent from a research design, is the key to understanding both the experimental and quasi-experimental designs that will be discussed in the remainder of this chapter. As designs lose these important parts and transform from a classic experiment to another experimental design or to a quasi-experiment, they become less useful in isolating the impact that a treatment has on the dependent variable and allow more room for alternative explanations of the results.
One more important point must be made before delving further into experimental and quasi-experimental designs. It is that rarely, if ever, will the average consumer of research be exposed to the symbols or specific language of the classic experiment, or of the other experimental and quasi-experimental designs examined in this chapter. In fact, it is unlikely that the average consumer will ever be exposed to the terms pre-test, post-test, experimental group, or random assignment in the popular media, among other terms related to experimental and quasi-experimental designs. Yet, consumers are exposed to research results produced from these and other research designs every day. For example, if a national news organization or your regional newspaper reported a story about the effectiveness of a new drug to reduce cholesterol or the effects of different diets on weight loss, it is doubtful that the results would be reported as produced through a classic experimental design that used a control group and random assignment. Rather, these media outlets would use generally nonscientific terminology such as "results of an experiment showed," "results of a scientific experiment indicated," or "results showed that subjects who received the new drug had greater cholesterol reductions than those who did not receive the new drug." Even students who regularly search and read academic articles for use in course papers and other projects will rarely come across such design notation in the research studies they utilize. The depiction of the classic experimental design here, including the discussion of its components and their functions, simply illustrates the organization and notation of the classic experimental design. Unfortunately, the average consumer has to read between the lines to determine what type of design was used to produce the reported results. Understanding the key components of the classic experimental design allows educated consumers of research to read between those lines.
RESEARCH IN THE NEWS
"Swearing Makes Pain More Tolerable" 19
In 2009, Richard Stephens, John Atkins, and Andrew Kingston of the School of Psychology at Keele University conducted a study with 67 undergraduate students to determine if swearing affects an individual's response to pain. Researchers asked participants to immerse their hand in a container filled with ice-cold water and repeat a preferred swear word. The researchers then asked the same participants to immerse their hand in ice-cold water while repeating a word used to describe a table (a non-swear word). The results showed that swearing increased pain tolerance compared to the non-swearing condition. Participants who used a swear word were able to hold their hand in ice-cold water longer than when they did not swear. Swearing also decreased participants' perception of pain.
1. This study is an example of a repeated measures design. In this form of experimental design, study participants are exposed to an experimental condition (swearing with hand in ice-cold water) and a control condition (non-swearing with hand in ice-cold water) while repeated outcome measures are taken under each condition, for example, the length of time a participant was able to keep his or her hand submerged in ice-cold water (a small sketch of this within-subject logic follows this box). Conduct an Internet search for "repeated measures design" and explore the various ways such a study could be conducted, including the potential benefits and drawbacks of this design.
2. After researching repeated measures designs, devise a hypothetical repeated measures study of your own.
3. Retrieve and read the full research study "Swearing as a Response to Pain" by Stephens, Atkins, and Kingston while paying attention to the design and methods (full citation information for this study is listed below). Has your opinion of the study results changed after reading the full study? Why or why not?
Full Study Source: Stephens, R., Atkins, J., and Kingston, A. (2009). "Swearing as a response to pain." NeuroReport 20, 1056-1060.
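As a rough illustration of the within-subject logic referenced in question 1, the sketch below compares fabricated submersion times for the same hypothetical subjects under both conditions. It is not the actual Keele University data, and a full analysis would use a paired-samples significance test rather than a simple mean difference.

```python
# Sketch of a repeated measures comparison: each subject contributes a
# measurement under both conditions, so we analyze within-person differences.
# All times are fabricated; this is not the actual study data.
from statistics import mean

swearing = [73, 61, 88, 54, 69, 95, 47, 82]   # seconds hand stayed submerged
neutral  = [52, 48, 70, 41, 55, 76, 39, 60]   # same subjects, neutral word

diffs = [s - n for s, n in zip(swearing, neutral)]
print(f"mean within-subject difference: {mean(diffs):.1f} seconds")
# A positive mean difference is consistent with swearing raising pain tolerance.
```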
Variations on the Experimental Design
The classic experimental design is the foundation upon which all experimental and quasi-experimental designs are based. As such, it can be modified in numerous ways to fit the goals (or constraints) of a particular research study. Below are two variations of the experimental design. Again, knowledge about the major components of the classic experiment, how they contribute to an explanation of results, and what the impact is when one or more components are missing provides an understanding of all other experimental designs.
Post-Test Only Experimental Design
The post-test only experimental design could be used to examine the impact of a treatment program on school disciplinary infractions as measured or operationalized by referrals to the principal's office (see Table 5.2). In this design, the researcher randomly assigns a group of discipline problem students to the experimental group and control group by flipping a coin: heads to the experimental group and tails to the control group. The experimental group then enters the 3-month treatment program. After the program, the researcher compares the number of referrals to the principal's office between the experimental and control groups over some period of time, for example, discipline referrals at 6 months after the program. The researcher finds that the experimental group has a much lower number of referrals to the principal's office in the 6-month follow-up period than the control group.
TABLE 5.2 | Post-Test Only Experimental Design
Several issues arise in this example study. The researcher would not know if discipline problems decreased, increased, or stayed the same from before to after the treatment program because the researcher did not have a count of disciplinary referrals prior to the treatment program (i.e., a pre-test). Although the groups were randomly assigned and are presumed equivalent, the absence of a pre-test means the researcher cannot confirm that the experimental and control groups were equivalent before the treatment was administered, particularly on the number of referrals to the principal's office. The groups could have differed by a chance occurrence even with random assignment, and any such differences between the groups could potentially explain the post-test difference in the number of referrals to the principal's office. For example, if the control group included much more serious or frequent discipline problem students than the experimental group by chance, this difference, and not the treatment, might explain the lower number of referrals for the experimental group.
Experimental Design with Two Treatments and a Control Group
This design could be used to determine the impact of boot camp versus juvenile detention on post-release recidivism (see Table 5.3). Recidivism in this study is operationalized as re-arrest for delinquent behavior. First, a population of known juvenile delinquents is randomly assigned to either boot camp, juvenile detention, or a control condition where they receive no sanction. To accomplish random assignment to groups, the researcher places the names of all youth into a hat and assigns the groups in order. For example, the first name pulled goes into experimental group 1, the next into experimental group 2, and the next into the control group, and so on. Once randomly assigned, the experimental group youth receive either boot camp or juvenile detention for a period of 3 months, whereas members of the control group are released on their own recognizance to their parents. At the end of the experiment, the researcher compares the re-arrest activity of boot camp participants, juvenile detention participants, and control group members during a 6-month follow-up period.
TABLE 5.3 | Experimental Design with Two Treatments and a Control Group
This design has several advantages. First, it includes all major components of the classic experimental design and simply adds an additional treatment for comparison purposes. Random assignment was utilized, which means the groups have a higher probability of being equivalent on all factors that could impact the post-test. Thus, random assignment in this example helps to ensure the only differences between the groups are the treatment conditions. Without random assignment, there is a greater chance that one group of youth was somehow different, and this difference could impact the post-test. For example, if the boot camp youth were much less serious and frequent delinquents than the juvenile detention youth or control group youth, the results might erroneously show that the boot camp reduced recidivism when in fact the youth in boot camp may have been the "best risks," unlikely to get re-arrested with or without boot camp. The pre-test in the example above allows the researcher to determine change in re-arrests from pre-test to post-test. Thus, the researcher can determine if delinquent behavior, as measured by re-arrest, increased, decreased, or remained constant from pre- to post-test. The pre-test also allows the researcher to confirm that the random assignment process resulted in equivalent groups based on the pre-test. Finally, the presence of a control group allows the researcher to have more confidence that any differences in the post-test are due to the treatment. For example, if the control group had more re-arrests than the boot camp or juvenile detention experimental groups 6 months after their release from those programs, the researcher would have more confidence that the programs produced fewer re-arrests because the control group members were the same as the experimental groups; the only difference was that they did not receive a treatment.
The one key feature shared by all experimental designs is that they retain random assignment. This is why they are considered "experimental" designs. Sometimes, however, experimental designs lack a pre-test. Knowledge of the usefulness of a pre-test demonstrates the potential problems with those designs where it is missing. For example, in the post-test only experimental design, a researcher would not be able to make a determination of change in the dependent variable from pre- to post-test. Perhaps most importantly, the researcher would not be able to confirm that the experimental and control groups were in fact equivalent on a pre-test measure before the introduction of the treatment. Even though both groups were randomly assigned, and probability theory suggests they should be equivalent, without a pre-test measure the researcher could not confirm similarity because differences could occur by chance even with random assignment. If there were any differences at the post-test between the experimental group and control group, the results might be due to some explanation other than the treatment, namely that the groups differed prior to the administration of the treatment. The same limitation applies to any form of experimental design that does not utilize a pre-test for confirmation purposes.
Understanding the contribution of a pre-test to an experimental design shows that it is a critical component. It provides a measure of change and also gives the researcher more confidence that the observed results are due to the treatment and not some difference between the experimental and control groups. Despite the usefulness of a pre-test, however, the most critical ingredient of any experimental design remains random assignment.
Experimental Designs Are Rare in Criminal Justice and Criminology
The classic experiment is the foundation for other types of experimental and quasi-experimental designs. The unfortunate reality, however, is that classic experiments, and experimental designs more generally, are few and far between in criminal justice. 20 Recall that one of the major components of an experimental design is random assignment. Achieving random assignment is often a barrier to experimental research in criminal justice. Achieving random assignment might, for example, require the approval of the chief (or city council or both) of a major metropolitan police agency to allow researchers to randomly assign patrol officers to certain areas of a city and/or randomly assign police officer actions. Recall the MDVE. This experiment required the full cooperation of the chief of police and other decision-makers to allow researchers to randomly assign police actions. In another example, achieving random assignment might require a judge to randomly assign a group of youthful offenders to a certain juvenile court sanction (experimental group), and another group of similar youthful offenders to no sanction or an alternative sanction as a control group. 21 In sum, random assignment typically requires the cooperation of a number of individuals, and sometimes that cooperation is difficult to obtain.
Even when random assignment can be accomplished, sometimes it is not implemented correctly and the random assignment procedure breaks down. This is another barrier to conducting experimental research. For example, in the MDVE, researchers randomly assigned officer responses, but the officers did not always follow the assigned course of action. Moreover, some believe that the random assignment of criminal justice programs, sentences, or officer responses may be unethical in certain circumstances, and even a violation of the rights of citizens. For example, some believe it is unfair when random assignment results in some delinquents being sentenced to boot camp while others get assigned to a control group with no sanction at all or a less restrictive sanction than boot camp. In the MDVE, some believe it is unfair that some suspects were arrested and received an official record whereas others were not arrested for the same type of behavior. In other cases, subjects in the experimental group may receive some benefit from the treatment that is essentially denied to the control group for a period of time, and this can become an issue as well.
There are other important reasons why random assignment is difficult to accomplish. Random assignment may, for example, involve a disruption of the normal procedures of agencies and their officers. In the MDVE, officers had to adjust their normal and established routine, and this was a barrier at times in that study. Shadish, Cook, and Campbell also note that random assignment may not always be feasible or desirable when quick answers are needed. 22 This is because experimental designs sometimes take a long time to produce results. In addition to the time required in planning and organizing the experiment, and treatment delivery, researchers may need several months if not years to collect and analyze the data before they have answers. This is particularly important because time is often of the essence in criminal justice research, especially in research efforts testing the effect of some policy or program where it is not feasible to wait years for answers. Waiting for the results of an experimental design means that many policy-makers may make decisions without the results.
Quasi-Experimental Designs
In general terms, quasi-experiments include a group of designs that lack random assignment. Quasi-experiments may also lack other parts, such as a pre-test or a control group, just like some experimental designs. The absence of random assignment, however, is the ingredient that transforms an otherwise experimental design into a quasi-experiment. Lacking random assignment is a major disadvantage because it increases the chances that the groups differ on relevant factors, both known and unknown, before the treatment; such differences may then emerge as alternative explanations for the outcomes.
Just like experimental designs, quasi-experimental designs can be organized in many different ways. This section will discuss three types of quasi-experiments: nonequivalent group design, one-group longitudinal design, and two-group longitudinal design.
Nonequivalent Group Design
The nonequivalent group design is perhaps the most common type of quasi-experiment. 23 Notice that it is very similar to the classic experimental design with the exception that it lacks random assignment (see Table 5.4). Additionally, what was labeled the experimental group in an experimental design is sometimes called the treatment group in the nonequivalent group design. What was labeled the control group in the experimental design is sometimes called the comparison group in the nonequivalent group design. This terminological distinction is an indicator that the groups were not created through random assignment.
TABLE 5.4 | Nonequivalent Group Design
NR = Not randomly assigned
One of the main problems with the nonequivalent group design is that it lacks random assignment, and without random assignment, there is a greater chance that the treatment and comparison groups differ in some way that can impact study results. Take, for example, a nonequivalent group design where a researcher is interested in whether an aggression-reduction treatment program can reduce inmate-on-inmate assaults in a prison setting. Assume that the researcher asked for inmates who had previously been involved in assaultive activity to volunteer for the aggression-reduction program. Suppose the researcher placed the first 50 volunteers into the treatment group and the next 50 volunteers into the comparison group. Note that this method of assignment is not random but rather first come, first served.
Because the study utilized volunteers and there was no random assignment, it is possible that the first 50 volunteers placed into the treatment group differed significantly from the last 50 volunteers, who were placed in the comparison group. This can lead to alternative explanations for the results. For example, if the treatment group was much younger than the comparison group, the researcher may find at the end of the program that the treatment group still maintained a higher rate of infractions than the comparison group, even after the aggression-reduction program. The conclusion might be that the aggression program actually increased the level of violence among the treatment group. This conclusion would likely be spurious and may be due to the age differential between the treatment and comparison groups. Indeed, research has revealed that younger inmates are significantly more likely to engage in prison assaults than older inmates. The fact that the treatment group incurred more assaults than the comparison group after the aggression-reduction program may simply reflect the age differential between the groups rather than indicating that the program had no effect or somehow increased aggression. The previous example highlights the importance of random assignment and the potential problems that can occur in its absence.
Although researchers who utilize a quasi-experimental design are not able to randomly assign their subjects to groups, they can employ other techniques in an attempt to make the groups as equivalent as possible on known or measured factors before the treatment is given. In the example above, it is likely that the researcher would have known the age of inmates, their prior assault records, and various other pieces of information (e.g., previous prison stays). Through a technique called matching, the researcher could make sure the treatment and comparison groups were "matched" on these important factors before administering the aggression-reduction program to the treatment group. This type of matching can be done individual to individual (e.g., subject #1 in the treatment group is matched to a selected subject #1 in the comparison group on age, previous arrests, and gender) or in the aggregate, such that the comparison group is similar to the treatment group overall (e.g., average ages between groups are similar, and there are equal proportions of males and females). Knowledge of these and other important variables, for example, would allow the researcher to make sure that the treatment group did not have heavier concentrations of younger or more frequent or serious offenders than the comparison group, factors that are related to assaultive activity independent of the treatment program. In short, matching allows the researcher some control over who goes into the treatment and comparison groups so as to balance these groups on important factors absent random assignment. If the groups are unbalanced on one or more factors, those factors could emerge as alternative explanations for the results. Figure 5.3 demonstrates the logic of matching both at the individual and aggregate level in a quasi-experimental design.
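A simplified sketch of individual matching follows. The inmate records and the distance rule are hypothetical; real matching procedures vary, but the core idea of pairing each treatment subject with the most similar unused comparison subject is the same.

```python
# Simplified individual matching: pair each treatment subject with the most
# similar unused comparison subject on measured factors (age, prior assaults).
# Records and the distance rule are hypothetical illustrations.

treatment = [{"id": 1, "age": 22, "priors": 3}, {"id": 2, "age": 35, "priors": 1}]
pool      = [{"id": 9, "age": 23, "priors": 3}, {"id": 8, "age": 40, "priors": 1},
             {"id": 7, "age": 36, "priors": 1}]

def distance(a: dict, b: dict) -> int:
    """Crude dissimilarity score: smaller means more alike on measured factors."""
    return abs(a["age"] - b["age"]) + abs(a["priors"] - b["priors"])

matches = []
for t in treatment:
    best = min(pool, key=lambda c: distance(t, c))
    pool.remove(best)  # match without replacement
    matches.append((t["id"], best["id"]))

print(matches)  # [(1, 9), (2, 7)]
```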
Matching is an important part of the nonequivalent group design. By matching, the researcher can approximate equivalence between the groups on important variables that may influence the post-test. However, it is important to note that a researcher can only match subjects on factors about which they have information: a researcher cannot match the treatment and comparison group members on factors that are unmeasured or otherwise unknown but which may still impact outcomes. For example, if the researcher has no knowledge about the number of previous incarcerations, the researcher cannot match the treatment and comparison groups on this factor. Matching also requires that the information used for matching is valid and reliable, which is not always the case. Agency records, for example, are notorious for inconsistencies, errors, omissions, and for being dated, but they are often utilized for matching purposes. Asking survey questions to generate information for matching (for example, how many times have you been incarcerated?) can also be problematic because some respondents may lie, forget, or exaggerate their behavior or experiences.
In addition to the above considerations, the more factors a researcher wishes to match the group members on, the more difficult it becomes to find appropriate matches. Matching on prior arrests or age is less complex than matching on several additional pieces of information. Finally, matching is never considered superior to random assignment when the goal is to construct equivalent groups. This is because there is a much higher likelihood of equivalence with random assignment on factors that are both measured and unknown to the researcher. Thus, the results produced from a nonequivalent group design, even with matching, are at greater risk of alternative explanations than an experimental design that features random assignment.
FIGURE 5.3 | (a) Individual Matching (b) Aggregate Matching
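To make the logic of Figure 5.3 concrete, the following minimal Python sketch pairs treatment subjects to comparison candidates on measured factors and then checks a group-level average. Everything here (the subjects, the factors, the greedy nearest-neighbor rule) is invented for illustration; it is one plausible way to implement matching, not the procedure of any study described in this chapter.

```python
def distance(a, b):
    """Crude dissimilarity score across the measured matching factors."""
    return (abs(a["age"] - b["age"])
            + abs(a["prior_assaults"] - b["prior_assaults"])
            + (0 if a["gender"] == b["gender"] else 100))  # heavily penalize gender mismatch

def individual_match(treatment, pool):
    """Greedily pair each treatment subject with the closest unused candidate."""
    matches, used = [], set()
    for t in treatment:
        i, best = min(((i, c) for i, c in enumerate(pool) if i not in used),
                      key=lambda pair: distance(t, pair[1]))
        used.add(i)
        matches.append((t, best))
    return matches

treatment = [{"age": 22, "prior_assaults": 3, "gender": "M"},
             {"age": 35, "prior_assaults": 0, "gender": "F"}]
pool = [{"age": 24, "prior_assaults": 2, "gender": "M"},
        {"age": 33, "prior_assaults": 1, "gender": "F"},
        {"age": 51, "prior_assaults": 0, "gender": "M"}]

for t, c in individual_match(treatment, pool):
    print(t, "->", c)

# Aggregate matching, by contrast, checks only group-level summaries:
avg = lambda group, key: sum(s[key] for s in group) / len(group)
print("avg age, treatment:", avg(treatment, "age"), "vs comparison pool:", avg(pool, "age"))
```

Note that the sketch can only balance on the three factors it measures; as the text emphasizes, unmeasured differences remain unbalanced no matter how careful the matching.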
The previous discussion is not to suggest that the nonequivalent group design cannot be useful in answering important research questions. Rather, it is to suggest that the nonequivalent group design, and hence any quasi-experiment, is more susceptible to alternative explanations than the classic experimental design because of the absence of random assignment. As a result, a researcher must be prepared to rule out potential alternative explanations. Quasi-experimental designs that lack a pre-test or a comparison group are even less desirable than the nonequivalent group design and are subject to additional alternative explanations because of these missing parts. Although a quasi-experiment may be all that is available, and can still serve as an important design in evaluating the impact of a particular treatment, it is not preferable to the classic experiment. Researchers (and consumers) must be attuned to the potential issues of this design so as to make informed conclusions about the results produced from such research studies.
The Effects of Red Light Camera (RLC) Enforcement
On March 15, 2009, an article appeared in the Santa Cruz Sentinel entitled “Tickets in the Mail: Red-Light Cameras Questioned.” The article stated that “while studies show fewer T-bone crashes at lights with cameras and fewer drivers running red lights, the number of rear-end crashes increases.” 24 The study mentioned in the newspaper, which showed fewer drivers running red lights with cameras, was conducted by Richard Retting, Susan Ferguson, and Charles Farmer of the Insurance Institute for Highway Safety (IIHS). 25 They completed a quasi-experimental study in Philadelphia to determine the impact of red light cameras (RLCs) on red light violations. The researchers selected nine intersections: six experimental sites that utilized RLCs, located in Philadelphia, Pennsylvania, and three comparison sites that did not, located in Atlantic County, New Jersey. The researchers chose the comparison sites based on their proximity to Philadelphia, the ability to collect data using the same methods as at the experimental intersections (e.g., the use of cameras for viewing red light traffic), and the fact that police officials in Atlantic County had offered assistance selecting and monitoring the intersections.
The authors collected three phases of information in the RLC study at the experimental and comparison sites:
Phase 1 Data Collection: Baseline (pre-test) data collection at the experimental and comparison sites consisting of the number of vehicles passing through each intersection, the number of red light violations, and the rate of red light violations per 10,000 vehicles.
Phase 2 Data Collection: Number of vehicles traveling through experimental and comparison intersections, number of red light violations after a 1-second yellow light increase at the experimental sites (treatment 1), number of red light violations at comparison sites without a 1-second yellow light increase, and red light violations per 10,000 vehicles at both experimental and comparison sites.
Phase 3 Data Collection: Red light violations after a 1-second yellow light increase and RLC enforcement at the experimental sites (treatment 2), red light violations at comparison sites without a 1-second yellow increase or RLC enforcement, number of vehicles passing through the experimental and comparison intersections, and the rate of red light violations per 10,000 vehicles.
The researchers operationalized “red light violations” as instances in which a vehicle entered the intersection one-half second or more after the onset of the red signal, with the vehicle’s rear tires positioned behind the crosswalk or stop line prior to entering on red. Vehicles already in the intersection at the onset of the red light, or those making a right turn on red (with or without stopping), were not considered red light violations.
The researchers collected video data at each of the experimental and comparison sites during Phases 1–3. This allowed them to examine red light violations before, during, and after the implementation of red light enforcement and yellow light time increases. The analysis revealed that the implementation of a 1-second yellow light increase led to reductions in the rate of red light violations from Phase 1 to Phase 2 at all of the experimental sites. At 2 of the 3 comparison sites, the rate of red light violations also decreased, despite no yellow light increase. From Phase 2 to Phase 3 (the enforcement of red light camera violations in addition to the 1-second yellow light increase at experimental sites), the authors noted decreases in the rate of red light violations at all experimental sites, and decreases at 2 of 3 comparison sites without red light enforcement in effect.
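The outcome in each phase is a simple rate, so a reader can reproduce the arithmetic easily. The short sketch below uses invented counts, not the study’s data, to show how violations per 10,000 vehicles are computed and compared across phases.

```python
# Hypothetical illustration of the outcome measure used in each phase:
# red light violations per 10,000 vehicles. All counts are invented.

def violation_rate(violations, vehicles):
    """Rate of red light violations per 10,000 vehicles entering the intersection."""
    return violations / vehicles * 10_000

phases = {  # (violations, vehicles) at one hypothetical experimental site
    "Phase 1 (baseline)":         (120, 180_000),
    "Phase 2 (+1s yellow)":       (60, 175_000),
    "Phase 3 (+1s yellow + RLC)": (25, 182_000),
}
for label, (v, n) in phases.items():
    print(f"{label}: {violation_rate(v, n):.1f} per 10,000 vehicles")
```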
Concluding their study, the researchers noted that the study “found large and highly significant incremental reductions in red light running associated with increased yellow signal timing followed by the introduction of red light cameras.” Despite these findings, the researchers noted a number of potential factors to consider in light of the findings: the follow-up time periods utilized when counting red light violations before and after the treatment conditions were instituted; publicity about red light camera enforcement; and the size of fines associated with red light camera enforcement (the fine in Philadelphia was $100, higher than in many other cities), among others.
After reading about the study cited in the newspaper article, has your impression of the newspaper headline and quote changed?
For more information and research on the effect of RLCs, visit the Insurance Institute for Highway Safety at http://www.iihs.org/research/topics/rlr.html.
One-Group Longitudinal Design
Like all experimental designs, the quasi-experimental design can come in a variety of forms. The second quasi-experimental design is the one-group longitudinal design (also called a simple interrupted time series design). 26 An examination of this design shows that it lacks both random assignment and a comparison group (see Table 5.5). A major difference between this design and others we have covered is that it includes multiple pre-test and post-test observations.
TABLE 5.5 | One-Group Longitudinal Design
The one-group longitudinal design is useful when researchers are interested in exploring longer-term patterns. Indeed, the term longitudinal generally means “over time”: repeated measurements of the pre-test and post-test over time. This is different from cross-sectional designs, which examine the pre-test and post-test at only one point in time (e.g., at a single point before the application of the treatment and at a single point after the treatment). The nonequivalent group design and the classic experimental design previously examined, for example, are both cross-sectional because pre-tests and post-tests are measured at one point in time (e.g., at a point 6 months after the treatment). Yet these designs could easily be considered longitudinal if researchers took repeated measures of the pre-test and post-test.
The logic of the one-group longitudinal design is to establish a baseline of several pre-test observations, introduce a treatment or intervention, and then examine the post-test at several different time intervals. As organized, this design is useful for gauging what impact a particular program, policy, or law has, if any, and how long that impact lasts. Consider an example whereby a researcher is interested in gauging the impact of a tobacco ban on inmate-on-inmate assaults in a prison setting. This is an important question, for recent years have witnessed correctional systems banning all tobacco products from prison facilities. Correctional administrators predicted that there would be a major increase in inmate-on-inmate violence once the bans took effect. The one-group longitudinal design would be one appropriate design for examining the impact of banning tobacco on inmate assaults.
To construct this study using the one-group longitudinal design, the researcher would first examine the rate of inmate-on-inmate assaults in the prison system (or at an individual prison, a particular cellblock, or whatever the unit of analysis) prior to the removal of tobacco. This is the pre-test, a baseline of assault activity before the ban goes into effect. In the design presented above, perhaps the researcher would measure the level of assaults in the four months prior to the tobacco ban. When establishing a pre-test baseline in a longitudinal design, the general rule is that more time, both in overall span and in number of intervals, is better. For example, the rate of assaults in the preceding month is not as useful as an entire year of data on inmate assaults prior to the tobacco ban. Once the tobacco ban is implemented, the researcher would then measure the rate of inmate assaults in the following months to determine what impact the ban had on inmate-on-inmate assaults. This is shown in Table 5.5 as the multiple post-test measures of assaults. Assaults may increase, decrease, or remain constant relative to the pre-test baseline over the term of the post-test.
If assaults increased at the same time as the ban went into effect, the researcher might conclude that the increase was due to the tobacco ban. But could there be alternative explanations? Yes; there may be other plausible explanations for the increase, even with several months of pre-test data. Unfortunately, without a comparison group there is no way for the researcher to be certain whether the increase in assaults was due to the tobacco ban or to some other factor that happened at the same time as the ban and spurred the increase. What if assaults decreased after the tobacco ban went into effect? In this scenario, because there is no comparison group, the researcher would still not know whether the results would have occurred anyway, without the tobacco ban. In these instances, the lack of a comparison group prevents the researcher from confidently attributing the results to the tobacco ban, and interpretation is subject to numerous alternative explanations.
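A minimal sketch, assuming invented monthly assault counts, makes the design’s arithmetic and its central weakness concrete:

```python
# One-group longitudinal (simple interrupted time series) logic:
# several pre-test observations, an intervention, then several
# post-test observations. Monthly assault counts below are invented.

pre_ban  = [14, 16, 15, 13]   # four months before the tobacco ban
post_ban = [21, 19, 22, 20]   # four months after the ban

mean = lambda xs: sum(xs) / len(xs)
print(f"pre-ban mean:  {mean(pre_ban):.1f} assaults/month")
print(f"post-ban mean: {mean(post_ban):.1f} assaults/month")
print(f"change:        {mean(post_ban) - mean(pre_ban):+.1f}")
# With no comparison group, this change cannot be confidently attributed
# to the ban rather than some co-occurring event (a history threat).
```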
Two-Group Longitudinal Design
A remedy for the previous situation would be to introduce a comparison group (see Table 5.6). Prior to the full tobacco ban, suppose prison administrators conducted a pilot program at one prison to provide insight into what would happen once the tobacco ban went into effect systemwide. To conduct this pilot, the researcher identified two different cellblocks at one prison, C-Block and D-Block. C-Block constitutes the treatment group, the cellblock of inmates who will have their tobacco taken away. D-Block is the comparison group; inmates in this cellblock will retain their tobacco privileges during the course of the study and during a determined follow-up period to measure post-test assaults (e.g., 12 months). This is a two-group longitudinal design (also sometimes called a multiple interrupted time series design), and adding a comparison group makes this design superior to the one-group longitudinal design.
TABLE 5.6 | Two-Group Longitudinal Design
Adding a comparison group to the study means that the researcher can have more confidence that the results at the post-test are due to the tobacco ban and not some alternative explanation. This is because any difference in assaults at the post-test between the treatment and comparison groups should be attributable to the only difference between them: the tobacco ban. For this interpretation to hold, however, the researcher must be sure that C-Block and D-Block are similar or equivalent on all factors that might influence the post-test. There are many potential factors to consider. For example, the researcher will want to make sure that the same types of inmates are housed in both cellblocks. If a chronically assaultive group of inmates is housed in C-Block but not D-Block, this differential, not the treatment, could explain the results.
The researcher might also want to make sure comparable numbers of tobacco and non-tobacco users are found in each cellblock. If very few inmates in C-Block are smokers, the real effect of removing tobacco may be hidden. The researcher might also examine other areas where potential differences might arise, for example, whether both cellblocks are staffed with equal numbers of officers, whether officers in each cellblock tend to resolve inmate disputes similarly, and other potential issues that could influence the post-test measure of assaults. Equivalence could also be assessed by comparing the groups on additional evidence before the ban takes effect: number of prior prison sentences, time served in prison, age, seriousness of conviction crime, and other factors that might relate to assaultive behavior regardless of the tobacco ban. Moreover, the researcher should ensure that inmates in C-Block do not know that their D-Block counterparts are still allowed tobacco during the pilot study, and vice versa. If either group knows the pilot program is an experiment, its members might act differently than normal, and this could become an explanation of the results. Additionally, the researchers might also try to make sure that C-Block inmates are completely tobacco free after the ban goes into effect: that they do not hoard, smuggle, or receive tobacco from officers or other inmates during the tobacco ban, in or outside of the cellblock. If these and other important differences are accounted for at the individual and cellblock level, the researcher will have more confidence that any differences in assaults at the post-test between the treatment and comparison groups are related to the tobacco ban, and not to some other difference between the two groups or the two cellblocks.
The addition of a comparison group aids the researcher’s ability to isolate the true impact of a tobacco ban on inmate-on-inmate assaults. All factors that influence the treatment group should also influence the comparison group, because the groups are made up of equivalent individuals in equivalent circumstances, with the exception of the tobacco ban. If this is the only difference, the results can be attributed to the ban. Although the addition of the comparison group in the two-group longitudinal design provides more confidence that the findings are attributable to the tobacco ban, the fact that this design lacks randomization means that alternative explanations cannot be completely ruled out; they can only be minimized. This example also suggests that the quasi-experiment in this instance may actually be preferable to an experimental design, given the realities of prison administration. For example, prison inmates are not typically randomly assigned to different cellblocks by prison officers. Moreover, it is highly unlikely that a prison would have two open cellblocks waiting for a researcher to randomly assign incoming inmates for a tobacco ban study. Therefore, it is likely there would be differences among the groups in the quasi-experiment.
Fortunately, if differences between the groups are present, the researcher can attempt to determine their potential impact before interpreting the results. The researcher can also use statistical models after the ban takes effect to determine the impact of any differences between the groups on the post-test. While the two-group longitudinal quasi-experiment just discussed could take the form of an experimental design if random assignment could somehow be accomplished, it illustrates a situation where an experimental design might be appropriate and desired for a particular research question but would not be realistic considering the many barriers.
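In code, the added leverage of the comparison group amounts to subtracting D-Block’s change from C-Block’s change. The sketch below uses invented monthly assault counts; subtracting the comparison group’s change is the standard “difference-in-differences” logic, a term the chapter does not use but which names exactly this comparison.

```python
# Two-group longitudinal logic with invented data. The comparison group's
# change estimates what would have happened without the ban.

c_block = {"pre": [14, 16, 15, 13], "post": [21, 19, 22, 20]}  # treatment: tobacco ban
d_block = {"pre": [15, 14, 16, 15], "post": [16, 15, 17, 16]}  # comparison: no ban

mean = lambda xs: sum(xs) / len(xs)
change = lambda g: mean(g["post"]) - mean(g["pre"])

print(f"C-Block change: {change(c_block):+.1f}")
print(f"D-Block change: {change(d_block):+.1f}")
print(f"estimated ban effect: {change(c_block) - change(d_block):+.1f}")
# This subtraction is only credible if the two cellblocks are equivalent
# on the factors discussed above; without random assignment, that
# equivalence is assumed, not guaranteed.
```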
The Threat of Alternative Explanations
Alternative explanations are those factors that could explain the post-test results, other than the treatment. Throughout this chapter, we have noted the potential for alternative explanations and have given several examples of explanations other than the treatment. It is important to know that potential alternative explanations can arise in any research design discussed in this chapter. However, alternative explanations often arise because some design part is missing, for example, random assignment, a pre-test, or a control or comparison group. This is especially true in criminal justice where researchers often conduct field studies and have less control over their study conditions than do researchers who conduct experiments under highly controlled laboratory conditions. A prime example of this is the tobacco ban study, where it would be difficult for researchers to ensure that C-Block inmates, the treatment group, were completely tobacco free during the course of the study.
Alternative explanations are typically referred to as threats to internal validity. In this context, if an experiment is internally valid, it means that alternative explanations have been ruled out and the treatment is the only factor that produced the results. If a study is not internally valid, this means that alternative explanations for the results exist or potentially exist. In this section, we focus on some common alternative explanations that may arise in experimental and quasi-experimental designs. 27
Selection Bias
One of the more common alternative explanations that may occur is selection bias. Selection bias generally indicates that the treatment group (or experimental group) is somehow different from the comparison group (or control group) on a factor that could influence the post-test results. Selection bias is more often a threat in quasi-experimental designs than experimental designs due to the lack of random assignment. Suppose in our study of the prison tobacco ban, members of C-Block were substantially younger than members of D-Block, the comparison group. Such an imbalance between the groups would mean the researcher would not know if the differences in assaults are real (meaning the result of the tobacco ban) or a result of the age differential. Recall that research shows that younger inmates are more assaultive than older inmates and so we would expect more assaults among the younger offenders independent of the tobacco ban.
In a quasi-experiment, selection bias is perhaps the most prevalent type of alternative explanation and can seriously compromise results. Indeed, many of the examples above have referred to potential situations where the groups are imbalanced or not equivalent on some important factor. Selection bias is a common threat in quasi-experimental designs because of the lack of random assignment, and it can arise in experimental designs when the groups differ by chance alone or when the practice of randomization was not maintained throughout the study (see Classics in CJ Research-MDVE above). A researcher may, however, be able to detect such differentials, for example, by comparing the groups on the pre-test or other types of information before the start of the study. If differences were found, the researcher could take measures to correct them. The researcher could also use a statistical model to account or control for differences between the groups and isolate the impact of the treatment, if any; a full discussion is beyond the scope of this text, but this would be a potential way to deal with selection bias and estimate its impact on study results. The researcher could also, if possible, attempt to re-match the groups in a quasi-experiment or randomly assign the groups a second time in an experimental design to ensure equivalence. At the least, the researcher could acknowledge the group differences and discuss their potential impact on the results. Without a pre-test or other pre-study information on study participants, however, such differences might go undetected, and it would be more difficult to determine how the differences resulting from selection bias influenced the results.
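As a rough sketch of the statistical-control idea mentioned above, one common approach is an ordinary least squares regression that includes both a treatment indicator and the imbalanced factor. The data are invented, and the sketch assumes the pandas and statsmodels libraries are available; it illustrates the general technique, not a procedure prescribed by the text.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented data: C-Block (treated) happens to be younger than D-Block,
# so a raw comparison of assaults would confound the ban with age.
df = pd.DataFrame({
    "assaults": [5, 4, 6, 3, 2, 1, 2, 1],
    "treated":  [1, 1, 1, 1, 0, 0, 0, 0],   # 1 = C-Block (tobacco ban)
    "age":      [21, 23, 22, 25, 34, 36, 33, 38],
})

# Regressing the outcome on the treatment indicator while holding age
# constant: the `treated` coefficient estimates the ban's association
# with assaults net of the measured age differential.
model = smf.ols("assaults ~ treated + age", data=df).fit()
print(model.params)
```

Note the limitation the text emphasizes: a model like this can only adjust for factors that were measured; unknown differences between the groups remain possible alternative explanations.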
History
Another potential alternative explanation is history. History refers to any event experienced differently by the treatment and comparison groups in the time between the pre-test and the post-test that could impact results. Suppose during the course of the tobacco ban study several riots occurred on D-Block, the comparison group. Because of the riots, prison officers “locked down” this cellblock numerous times, which could have affected the inmates’ ability to otherwise engage in assaults. At the end of the study, assaults in D-Block might have decreased from their pre-test levels because of the lockdowns, whereas in C-Block assaults may have occurred at their normal pace because there was no lockdown, or perhaps even increased from the pre-test because tobacco was also taken away. Even if the tobacco ban had no effect and assaults remained constant in C-Block from pre- to post-test, the lockdown in D-Block might make it appear that the tobacco ban led to increased assaults in C-Block. Thus, the researcher would not know whether the post-test results for the C-Block treatment group were attributable to the tobacco ban or to the simple fact that D-Block inmates were locked down and their assault activity was artificially reduced. In this instance, the comparison group becomes much less useful because the lockdown created a historical factor that imbalanced the groups during the treatment phase and nullified the comparison.
Maturation
Another potential alternative explanation is maturation. Maturation refers to the natural biological, psychological, or emotional processes we all experience as time passes: aging, becoming more or less intelligent, becoming bored, and so on. For example, if a researcher were interested in the effect of a boot camp on recidivism among juvenile offenders, it is possible that over the course of the boot camp program the delinquents naturally matured as they aged, and this, not the boot camp, produced the reduction in recidivism. This threat is particularly applicable in situations that deal with populations that change rapidly over a relatively short period of time or when a treatment lasts a considerable period of time. However, this threat can be addressed with a comparison group that is similar to the treatment group, because the maturation effects would then occur in both groups and the effect of the boot camp, if any, could be isolated. This assumes, however, that the groups are matched and equivalent on factors subject to the maturation process, such as age. If not, such differentials could be an alternative explanation of the results. For example, if the treatment and comparison groups differ by age, on average, one group may change or mature at a different rate than the other, and this differential rate of maturation, not the treatment, could explain the results. This example demonstrates how selection bias and maturation can interact as simultaneous alternative explanations, and it underscores the importance of an equivalent control or comparison group for eliminating or minimizing maturation as an alternative explanation.
Attrition or Subject Mortality
Attrition or subject mortality is another typical alternative explanation. Attrition refers to differential loss in the number or type of subjects between the treatment and comparison groups and can occur in both experimental and quasi-experimental designs. Suppose we wanted to conduct a study to determine who is the better research methods professor among the authors of this textbook. Let’s assume we have an experimental design where students were randomly assigned to professor 1, professor 2, or professor 3. By randomly assigning students to each respective professor, there is a greater probability that the groups are equivalent, and thus there are no differences between the three groups with one exception: the professor they receive and his or her particular teaching and delivery style. This is the treatment. Let’s also assume that the professors will administer the same tests and use the same textbook. After the group members are randomly assigned, a pre-treatment evaluation shows the groups are in fact equivalent on all important known factors that could influence post-test scores, such as grade point average, age, time in school, and exposure to research methods concepts. Additionally, all groups scored comparably on a pre-test of knowledge about research methods, so there is more confidence that the groups are in fact equivalent.
At the conclusion of the study, we find that professor 2’s group has the lowest final test scores of the three. However, because professor 2 is such an outstanding professor, the results appear odd. At first glance, the researcher suspects the results could have been influenced by students dropping out of the class; perhaps several of professor 2’s students dropped the course but none did from the classes of professor 1 or 3. It is revealed, however, that an equal number of students dropped out of all three courses before the post-test, so this could not be the reason for the low scores in professor 2’s course. Upon further investigation, the researcher finds that although an equal number of students dropped out of each class, the dropouts in professor 2’s class were some of his best students. In contrast, those who dropped out of professor 1’s and professor 3’s courses were some of their poorest students. In this example, professor 2 appears to be the least effective teacher, but this result is due to the fact that his best students dropped out, which heavily influenced the final test average for his group. Although there was no differential loss of subjects in terms of numbers (which can also be an attrition issue), there was differential loss in the types of students. This differential loss, not the teaching style, is an alternative explanation of the results.
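A tiny arithmetic illustration, with invented exam scores, shows how equal numbers of dropouts can still bias the comparison when the types of dropouts differ:

```python
# Differential attrition with invented scores: each class loses two students,
# but professor 2 loses strong students while professor 1 loses weak ones.

mean = lambda xs: sum(xs) / len(xs)

prof1 = [90, 88, 80, 62, 60]   # the two weakest students (62, 60) drop out
prof2 = [94, 92, 80, 78, 76]   # the two strongest students (94, 92) drop out

print(f"prof 1 mean among remaining students: {mean(prof1[:3]):.1f}")  # 86.0
print(f"prof 2 mean among remaining students: {mean(prof2[2:]):.1f}")  # 78.0
# Both classes lost two students, yet professor 2's mean is dragged down
# purely by WHO dropped out: an attrition effect, not a teaching effect.
```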
Testing or Testing Bias
Another potential alternative explanation is testing or testing bias. Suppose that after the pre-test of research methods knowledge, professor 1 and professor 3 reviewed the test with their students and gave them the correct answers, while professor 2 did not. The fact that professor 1’s and professor 3’s groups did better on the post-test final exam may be explained by students in those groups remembering the answers to the pre-test; they were thus biased at the pre-test, and this artificially inflated their post-test scores. Testing bias can explain the results because students in groups 1 and 3 may have simply remembered the answers from the pre-test review. In fact, the students in professor 1’s and 3’s courses might have scored high on the post-test without ever being exposed to the treatment because they were biased at the pre-test.
Instrumentation
Another alternative explanation that can arise is instrumentation. Instrumentation refers to changes in the measuring instrument from pre- to post-test. Using the previous example, suppose professors 1 and 3 changed the final exam, while professor 2 kept the final exam the same as the pre-test. Because professors 1 and 3 changed the exam, and perhaps made it easier or somehow different from the pre-test exam, results showing lower scores for professor 2’s students may be related only to instrumentation changes from pre- to post-test. To limit the influence of instrumentation, researchers should make sure that instruments remain consistent from pre- to post-test.
Reactivity
A final alternative explanation is reactivity. Reactivity occurs when members of the treatment or experimental group change their behavior simply as a result of being part of a study. This is akin to the finding that people tend to change their behavior when they are being watched or are aware they are being studied. If members of the experimental group know they are part of an experiment and are being studied and watched, it is possible that their behavior will change independent of the treatment. If this occurs, the researcher will not know whether the behavior change is the result of the treatment or simply a result of being part of a study. For example, suppose a researcher wants to determine whether a boot camp program impacts the recidivism of delinquent offenders. Members of the experimental group are sentenced to boot camp, and members of the control group are released on their own recognizance to their parents. Because members of the experimental group know they are part of the experiment, and hence are being watched closely after they exit boot camp, they may artificially change their behavior and avoid trouble. Their change of behavior may be totally unrelated to boot camp and instead reflect their knowledge of being part of an experiment.
Other Potential Alternative Explanations
The above discussion provided some typical alternative explanations that may arise with the designs discussed in this chapter. There are, however, other potential alternative explanations that may arise. These alternative explanations arise only when a control or comparison group is present.
One such alternative explanation is diffusion of treatment. Diffusion of treatment occurs when the control or comparison group learns about the treatment its members are being denied and attempts to mimic the behavior of the treatment group. If the control group is successful in mimicking the experimental group, the results at the end of the study may show similar outcomes between groups and cause the researcher to conclude that the program had no effect, when in fact the finding of no effect is explained by the comparison group mimicking the treatment group. 28 In reality, there may be no effect of the treatment, but the researcher cannot know this for sure because the control group effectively transformed into another experimental group; there is then no baseline of comparison. Consider a study where a researcher wants to determine the impact of a training program on class behavior and participation. In this study, the experimental group is exposed to several sessions of training on how to act appropriately in class and how to engage in class participation. The control group does not receive such training, but its members are aware that they are part of an experiment. Suppose after a few class sessions the control group starts to mimic the behavior of the experimental group, acting and participating in class the same way. At the conclusion of the study, the researcher might determine that the program had no impact because the comparison group, which did not receive the new program, showed similar progress.
In a related explanation, sometimes the comparison or control group learns about the experiment and attempts to compete with the experimental or treatment group. This alternative explanation is called compensatory rivalry. For example, suppose a police chief wants to determine if a new training program will increase the endurance of SWAT team officers. The chief randomly assigns SWAT members to either an experimental or a control group. The experimental group will receive the new endurance training program, and the control group will receive the normal program that has been used for years. During the course of the study, suppose the control group learns that the treatment group is receiving the new endurance program and starts to compete with the experimental group. Perhaps the control group runs five more miles per day and works out an extra hour in the weight room, in addition to their normal endurance program. At the end of the study, due to the control group’s extra and competing effort, the results might show no effect of the new endurance program; at worst, experimental group members may show a decline in endurance compared to the control group. The rivalry, not an ineffective or harmful endurance program, actually explains the results. Although the new endurance program may in reality have no effect, this cannot be known because of the actions of the control group, who learned about the treatment and competed with the experimental group.
Closely related to compensatory rivalry is the alternative explanation of comparison or control group demoralization. 29 In this instance, instead of competing with the experimental or treatment group, the control or comparison group simply gives up and changes its normal behavior. Using the SWAT example, perhaps the control group simply quits its normal endurance program upon learning that the treatment group is receiving the new endurance program. At the post-test, their endurance will likely have dropped considerably compared to the treatment group, and the new endurance program might emerge as a shining success. In reality, however, the researcher will not know whether any changes in endurance between the experimental and control groups are a result of the new endurance program or of the control group giving up. Because the control group gave up, there is no longer an equivalent comparison group, and the change in endurance among the treatment group members could be attributed to a number of alternative explanations, for example, maturation. If the comparison group behaves normally, the researcher is able to exclude maturation as a potential explanation, because any maturation effects will occur in both groups.
The previous discussion suggests that when the control or comparison group learns about the experiment and the treatment they are denied, potential alternative explanations can arise. Perhaps the best remedy to protect from the alternative explanations just discussed is to make sure the treatment and comparison groups do not have contact with one another. In laboratory experiments this can be ensured, but sometimes this is a problem in criminal justice studies, which are often conducted in the field.
The previous discussion also suggests that there are numerous alternative explanations that can impact the interpretation of results from a study. A careful researcher would know that alternative explanations must be ruled out before reaching a definitive conclusion about the impact of a particular program. The researcher must be attuned to these potential alternative explanations because they can influence results and how results are interpreted. Moreover, the discussion shows that several alternative explanations can occur at the same time. For example, it is possible that selection bias, maturation, attrition, and compensatory rivalry all emerge as alternative explanations in the same study. Knowing about these potential alternative explanations and how they can impact the results of a study is what distinguishes a consumer of research from an educated consumer of research.
Chapter Summary
The primary focus of this chapter was the classic experimental design, the foundation for other types of experimental and quasi-experimental designs. The classic experimental design is perhaps the most useful design when exploring causal relationships. Often, however, researchers cannot employ the classic experimental design to answer a research question. In fact, the classic experimental design is rare in criminal justice and criminology because it is often difficult to ensure random assignment for a variety of reasons. In circumstances where an experimental design is appropriate but not feasible, researchers may turn to one of many quasi-experimental designs. The most important difference between the two is that quasi-experimental designs do not feature random assignment. This can create potential problems for researchers. The main problem is that there is a greater chance the treatment and comparison groups may differ on important characteristics that could influence the results of a study. Although researchers can attempt to prevent imbalances between the groups by matching them on important known characteristics, it is still much more difficult to establish equivalence than it is in the classic experiment. As such, it becomes more difficult to determine what impact a treatment had, if any, as one moves from an experimental to a quasi-experimental design.
Perhaps the most important lesson to be learned in this chapter is that being an educated consumer of research results requires an understanding of the type of design that produced the results. There are numerous ways experimental and quasi-experimental designs can be structured. This is why much attention was paid to the classic experimental design. In reality, all experimental and quasi-experimental designs are variations of the classic experiment in some way, adding or deleting certain components. If the components, organization, and logic of the classic experimental design are understood, consumers of research will have a better understanding of the results produced from any sort of research design. For example, what problems in interpretation arise when a design lacks a pre-test, a control group, or random assignment? Having an answer to this question is a good start toward being an informed consumer of research results produced through experimental and quasi-experimental designs.
Critical Thinking Questions
1. Why is randomization/random assignment preferable to matching? Provide several reasons with explanation.
2. What are some potential reasons a researcher would not be able to utilize random assignment?
3. What is a major limitation of matching?
4. What is the difference between a longitudinal study and a cross-sectional study?
5. Describe a hypothetical study where maturation, and not the treatment, could explain the outcomes of the research.
association (or covariance or correlation): One of three conditions that must be met for establishing cause and effect, or a causal relationship. Association refers to the condition that X and Y must be related for a causal relationship to exist. Association is also referred to as covariance or correlation. Although two variables may be associated (or covary or be correlated), this does not automatically imply that they are causally related
attrition or subject mortality: A threat to internal validity, it refers to the differential loss of subjects between the experimental (treatment) and control (comparison) groups during the course of a study
cause and effect relationship: A cause and effect relationship occurs when one variable causes another, and no other explanation for that relationship exists
classic experimental design or experimental design: A design in a research study that features random assignment to an experimental or control group. Experimental designs can vary tremendously, but their constant features are random assignment, experimental and control groups, and a post-test. For example, a classic experimental design features random assignment, a treatment, experimental and control groups, and pre- and post-tests
comparison group: The group in a quasi-experimental design that does not receive the treatment. In an experimental design, the comparison group is referred to as the control group
compensatory rivalry: A threat to internal validity, it occurs when the control or comparison group attempts to compete with the experimental or treatment group
control group: In an experimental design, the control group does not receive the treatment. The control group serves as a baseline of comparison to the experimental group. It serves as an example of what happens when a group equivalent to the experimental group does not receive the treatment
cross-sectional designs: A measurement of the pre-test and post-test at one point in time (e.g., six months before and six months after the program)
demoralization: A threat to internal validity closely associated with compensatory rivalry, it occurs when the control or comparison group gives up and changes its normal behavior. While in compensatory rivalry the group members compete, in demoralization they simply quit. Neither is a normal behavioral reaction
dependent variable: Also known as the outcome in a research study. A post-test is a measure of the dependent variable
diffusion of treatment: A threat to internal validity, it occurs when the control or comparison group members learn that they are not getting the treatment and attempt to mimic the behavior of the experimental or treatment group. This mimicking may make it seem as if the treatment is having no effect when in fact it may be having one
elimination of alternative explanations: One of three conditions that must be met for establishing cause and effect. Elimination of alternative explanations means that the researcher has ruled out other explanations for an observed relationship between X and Y
experimental group: In an experimental design, the experimental group receives the treatment
history: A threat to internal validity, it refers to any event experienced differently by the treatment and comparison groups, an event that could explain the results other than the supposed cause
independent variable: Also called the cause
instrumentation: A threat to internal validity, it refers to changes in the measuring instrument from pre- to post-test
longitudinal: Refers to repeated measurements of the pre-test and post-test over time, typically for the same group of individuals. This is the opposite of cross-sectional
matching: A process sometimes utilized in some quasi-experimental designs that feature treatment and comparison groups. Matching is a process whereby the researcher attempts to ensure equivalence between the treatment and comparison groups on known information, in the absence of the ability to randomly assign the groups
maturation: A threat to internal validity, maturation refers to the natural biological, psychological, or emotional processes as time passes
negative association: Refers to a negative association between two variables. A negative association is demonstrated when X increases and Y decreases, or X decreases and Y increases. Also known as an inverse relationship, with the variables moving in opposite directions
operationalized or operationalization: Refers to the process of assigning a working definition to a concept. For example, the concept of intelligence can be operationalized or defined as grade point average or score on a standardized exam, among others
pilot program or test: Refers to a smaller test study or pilot to work out problems before a larger study and to anticipate changes needed for a larger study. Similar to a test run
positive association: Refers to a positive association between two variables. A positive association means as X increases, Y increases, or as X decreases, Y decreases
post-test: The post-test is a measure of the dependent variable after the treatment has been administered
pre-test: The pre-test is a measure of the dependent variable or outcome before a treatment is administered
quasi-experiment: A quasi-experiment refers to any number of research design configurations that resemble an experimental design but primarily lack random assignment. In the absence of random assignment, quasi-experimental designs feature matching to attempt equivalence
random assignment: Refers to a process whereby members of the experimental group and control group are assigned to each group through a random and unbiased process
random selection: Refers to selecting a smaller but representative subset from a population. Not to be confused with random assignment
reactivity: A threat to internal validity, it occurs when members of the experimental (treatment) or control (comparison) group change their behavior unnaturally as a result of being part of a study
selection bias: A threat to internal validity, selection bias occurs when the experimental (treatment) group and control (comparison) group are not equivalent. The difference between the groups can be a threat to internal validity, or, an alternative explanation to the findings
spurious: A spurious relationship is one where X and Y appear to be causally related, but in fact the relationship is actually explained by a variable or factor other than X
testing or testing bias: A threat to internal validity, it refers to the potential of study members being biased prior to a treatment, and this bias, rather than the treatment, may explain study results
threat to internal validity: Also known as alternative explanation to a relationship between X and Y. Threats to internal validity are factors that explain Y, or the dependent variable, and are not X, or the independent variable
timing: One of three conditions that must be met for establishing cause and effect. Timing refers to the condition that X must come before Y in time for X to be a cause of Y. While timing is necessary for a causal relationship, it is not sufficient, and considerations of association and eliminating other alternative explanations must be met
treatment: A component of a research design, it is typically denoted by the letter X. In a research study on the impact of teen court on juvenile recidivism, teen court is the treatment. In a classic experimental design, the treatment is given only to the experimental group, not the control group
treatment group: The group in a quasi-experimental design that receives the treatment. In an experimental design, this group is called the experimental group
unit of analysis: Refers to the focus of a research study as being individuals, groups, or other units of analysis, such as prisons or police agencies, and so on
variable(s): A variable is a concept that has been given a working definition and can take on different values. For example, intelligence can be defined as a person�s grade point average and can range from low to high or can be defined numerically by different values such as 3.5 or 4.0
1 Povitsky, W., N. Connell, D. Wilson, & D. Gottfredson. (2008). “An experimental evaluation of teen courts.” Journal of Experimental Criminology, 4, 137–163.
2 Hirschi, T., & H. Selvin (1966). “False criteria of causality in delinquency.” Social Problems, 13, 254–268.
3 Robert Roy Britt, “Churchgoers Live Longer.” April 3, 2006. http://www.livescience.com/health/060403_church_good.html. Retrieved on September 30, 2008.
4 Kalist, D., & D. Yee (2009). “First names and crime: Does unpopularity spell trouble?” Social Science Quarterly, 90(1), 39–48.
5 Sherman, L. (1992). Policing domestic violence. New York: The Free Press.
6 For historical and interesting reading on the effects of weather on crime and other disorder, see Dexter, E. (1899). “Influence of weather upon crime.” Popular Science Monthly, 55, 653–660, in Horton, D. (2000). Pioneering Perspectives in Criminology. Incline Village, NV: Copperhouse.
7 http://www.escapistmagazine.com/news/view/111191-Less-Crime-in-U-S-Thanks-to-Videogames, retrieved on September 13, 2011. This news article was in response to a study titled “Understanding the effects of violent videogames on violent crime.” See Cunningham, Scott, Engelstätter, Benjamin, and Ward (April 7, 2011). Available at SSRN: http://ssrn.com/abstract=1804959.
8 Cohn, E. G. (1987). “Changing the domestic violence policies of urban police departments: Impact of the Minneapolis experiment.” Response, 10(4), 22–24.
9 Schmidt, Janell D., & Lawrence W. Sherman (1993). “Does arrest deter domestic violence?” American Behavioral Scientist, 36(5), 601–610.
10 Maxwell, Christopher D., Joel H. Garner, & Jeffrey A. Fagan. (2001). The effects of arrest on intimate partner violence: New evidence for the spouse assault replication program. Washington, D.C.: National Institute of Justice.
11 Miller, N. (2005). What does research and evaluation say about domestic violence laws? A compendium of justice system laws and related research assessments. Alexandria, VA: Institute for Law and Justice.
12 The sections on experimental and quasi-experimental designs rely heavily on the seminal work of Campbell and Stanley (Campbell, D. T., & J. C. Stanley. (1963). Experimental and quasi-experimental designs for research. Chicago: Rand McNally) and, more recently, Shadish, W., T. Cook, & D. Campbell. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York: Houghton Mifflin.
13 Povitsky et al. (2008), p. 146, note 9.
14 Shadish, W., T. Cook, & D. Campbell. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York: Houghton Mifflin Company.
15 Ibid., 15.
16 Finckenauer, James O. (1982). Scared straight! and the panacea phenomenon. Englewood Cliffs, N.J.: Prentice Hall.
17 Yarborough, J.C. (1979). Evaluation of JOLT (Juvenile Offenders Learn Truth) as a deterrence program. Lansing, MI: Michigan Department of Corrections.
18 Petrosino, Anthony, Carolyn Turpin-Petrosino, & James O. Finckenauer. (2000). “Well-meaning programs can have harmful effects! Lessons from experiments of programs such as Scared Straight.” Crime and Delinquency, 46, 354–379.
19 “Swearing makes pain more tolerable,” retrieved at http://www.livescience.com/health/090712-swearing-pain.html (July 13, 2009). Also see “Bleep! My finger! Why swearing helps ease pain” by Tiffany Sharples, retrieved at http://www.time.com/time/health/article/0,8599,1910691,00.html?xid=rss-health (July 16, 2009).
20 For an excellent discussion of the value of controlled experiments and why they are so rare in the social sciences, see Sherman, L. (1992). Policing domestic violence. New York: The Free Press, 55�74.
21 For discussion, see Weisburd, D., T. Einat, & M. Kowalski. (2008). “The miracle of the cells: An experimental study of interventions to increase payment of court-ordered financial obligations.” Criminology and Public Policy, 7, 9–36.
22 Shadish, Cook, & Campbell. (2002).
24 Kelly, Cathy. (March 15, 2009). “Tickets in the mail: Red-light cameras questioned.” Santa Cruz Sentinel.
25 Retting, Richard, Susan Ferguson, & Charles Farmer. (January 2007). “Reducing red light running through longer yellow signal timing and red light camera enforcement: Results of a field investigation.” Arlington, VA: Insurance Institute for Highway Safety.
26 Shadish, Cook, & Campbell. (2002).
27 See Shadish, Cook, & Campbell. (2002), pp. 54–61, for an excellent discussion of threats to internal validity. Also see Chapter 2 for an extended discussion of all forms of validity considered in research design.
28 Trochim, W. (2001). The research methods knowledge base, 2nd ed. Cincinnati, OH: Atomic Dog.
Applied Research Methods in Criminal Justice and Criminology Copyright © 2022 by University of North Texas is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.