
What Is a Within-Subjects Design?

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


 James Lacy, MLS, is a fact-checker and researcher.



A within-subjects design is a type of experimental design in which all participants are exposed to every treatment or condition. It is also known as a repeated measures design.

The term "treatment" describes the different levels of the independent variable, the variable that the experimenter controls. In other words, all of the subjects in the study are treated with the critical variable in question.

At a Glance

A within-subject design involves having all participants exposed to the exact same treatments. It can be a helpful way for researchers to learn more about how changes happen over time. This type of design can be helpful when resources are limited or when investigating the real-world effects of treatments or programs.

This article discusses what a within-subjects design is, how this type of experimental design works, and how it compares to a between-subjects design.

When using a within-subjects design, it is important to make sure that all of the participants are exposed to the same treatment variables. By doing this, researchers can measure how each participant changes over time as a result of the treatment.

Example of a Within-Subjects Design

It can be helpful to look at some examples of how a within-subjects design might work. Let's imagine that you are doing an experiment on exercise and memory. For your independent variable, you decide to try two different types of exercise: yoga and jogging.

Instead of breaking participants up into two groups, you have all the participants try yoga before taking a memory test. Then, you have all the participants try jogging before taking a memory test. Next, you compare the test scores to determine which type of exercise had the most significant effect on performance on the memory tests.
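The comparison described above can be sketched in a few lines of Python. This is only an illustration: the scores are made up, and the helper name `mean_paired_difference` is an assumption rather than anything from a particular statistics package. The key point is that each participant's two scores are paired, so the analysis works on per-participant differences.

```python
from statistics import mean

def mean_paired_difference(scores_a, scores_b):
    """Average of per-participant differences (condition A minus condition B)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("each participant needs a score in both conditions")
    return mean(a - b for a, b in zip(scores_a, scores_b))

# Hypothetical memory-test scores; position i is the same person in both lists.
yoga_scores = [14, 11, 16, 12, 15]
jogging_scores = [12, 10, 15, 13, 12]

# A positive value means scores were higher after yoga in this made-up sample.
print(mean_paired_difference(yoga_scores, jogging_scores))
```

A real study would follow this with an inferential test on the differences rather than relying on the raw average.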

What Is a 2x2 Within-Subjects Design?

A within-subjects design can also be a factorial design. A factorial design is a type of experimental design that can look at the effects of two or more independent variables.

In a 2x2 design, researchers examine how two independent variables with two different levels impact a single dependent variable. For example, imagine a study where researchers wanted to see how the type and duration of therapy influence treatment outcomes.

In a 2x2 design, they would examine two types of therapy (cognitive-behavioral and psychodynamic) and two levels of each treatment (short- and long-term).
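To see where the four conditions of a 2x2 design come from, the short Python sketch below crosses every level of one variable with every level of the other. The function name `factorial_conditions` is a hypothetical helper used only for illustration.

```python
from itertools import product

def factorial_conditions(*factors):
    """Cross every level of each factor with every level of the others."""
    return list(product(*factors))

therapy_types = ["cognitive-behavioral", "psychodynamic"]
durations = ["short-term", "long-term"]

conditions = factorial_conditions(therapy_types, durations)
for therapy, duration in conditions:
    print(therapy, "/", duration)
```

In a fully within-subjects version of this design, every participant would experience all four of the listed combinations.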

This within-subjects design can be compared to what is known as a between-subjects design. In a between-subjects design, people are only assigned to a single treatment.

So one group of participants would receive one treatment, while another group would receive a different treatment. The differences between the two groups would then be compared.

Consider the earlier example of the experiment looking at exercise and memory. In a between-subjects design, one group of participants would do yoga and take a memory test.

A different group of participants would jog and then take the memory test. Afterward, the results of the memory tests would be compared to see how the type of exercise influenced memory.

In a within-subjects design, all participants receive every treatment. In a between-subjects design, participants only receive one treatment.

A within-subjects design can be a good option if participants or resources are limited. It can also be a good way to examine situations in real-world settings, such as to assess the effectiveness of educational programs.

When Not to Use a Within-Subjects Design

If researchers are concerned about the potential interference of practice effects, they may want to use a between-subjects design instead. Within-subjects designs can also take more time to administer in some cases, so it may be helpful to use a between-subjects design if many participants are available and data collection sessions need to be conducted quickly.

Advantages of Within-Subjects Design

There are a few different advantages to using a within-subject design when conducting a psychology experiment.

Uses a Smaller Sample Size

One of the most significant benefits of this type of experimental design is that it does not require a large pool of participants. Because every participant contributes data to every condition, a comparable between-subject experiment with two conditions needs about twice as many participants to collect the same number of observations per condition.

Reduces Errors Caused by Individual Differences

A within-subject design can also help reduce errors associated with individual differences. In a between-subject design where individuals are randomly assigned to the independent variable or treatment, there is still a possibility that there may be fundamental differences between the groups that could impact the experiment's results.

In a within-subject design, individuals are exposed to all levels of a treatment, so individual differences will not distort the results. Each participant serves as their own baseline.
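A small numerical sketch can show why using each participant as their own baseline helps. The data below are invented so that people differ a lot from one another but each person shifts by only a few points between conditions. The paired statistic is the standard paired t; for the unpaired comparison this sketch uses Welch's t, which is an assumption about how such data might be analyzed if the two sets of scores came from different people. Both function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(a, b):
    """t statistic computed on per-participant differences (within-subjects)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def independent_t(a, b):
    """Welch's t statistic, treating the two lists as separate groups."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Invented scores: large differences between people, small shift per person.
condition_a = [20, 35, 50, 65, 80]
condition_b = [18, 32, 48, 61, 77]

print("paired:", round(paired_t(condition_a, condition_b), 2))
print("independent:", round(independent_t(condition_a, condition_b), 2))
```

With these numbers the paired statistic is much larger than the independent one, because the differences between people no longer count as noise in the comparison.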

Disadvantages of Within-Subjects Design 

This type of experimental design can be advantageous in some cases, but there are some potential drawbacks to consider.

Carryover Effects

A major drawback of using a within-subject design is that the sheer act of having participants take part in one condition can affect their performance or behavior in later conditions, a problem known as a carryover effect.

So, for instance, in our earlier example, having participants take part in yoga might impact their later performance in jogging and may even affect their performance on later memory tests.

Participant Fatigue

Fatigue is another potential drawback of using a within-subject design. Participants may become exhausted, bored, or less motivated after taking part in multiple treatments or tests.

Practice Effects

Finally, performance on subsequent tests can also be affected by practice effects. Taking part in different levels of the treatment or taking the measurement tests several times might help the participants become more skilled.

This means they may be able to figure out how to game the results to do better in the experiment. This can skew the results and make it difficult to determine if any effect is due to the different levels of the treatment or simply a result of practice.

What This Means For You

Understanding how a within-subjects design works can give you a better idea of how psychology experiments are conducted. It can also help give you a better idea of the type of design you might want to use if you are conducting your own psychology experiment. A within-subjects design is a great option if participants and resources tend to be limited.

Salkind NJ, ed. Encyclopedia of Research Design. SAGE Publications, Inc. doi:10.4135/9781412961288

Haerling Adamson K, Prion S. Two-by-two factorial design. Clin Simul Nurs. 2020;49:90-91. doi:10.1016/j.ecns.2020.06.004

APA Dictionary of Psychology. Between-subjects design. American Psychological Association.

Steingrimsdottir HS, Arntzen E. On the utility of within-participant research design when working with patients with neurocognitive disorders. Clin Interv Aging. 2015;10:1189-1199. doi:10.2147/CIA.S81868

Montoya AK. Selecting a within- or between-subject design for mediation: Validity, causality, and statistical power. Multivariate Behav Res. 2023;58(3):616-636. doi:10.1080/00273171.2022.2077287

Cuttler C. Research Methods in Psychology. University of Washington.

Charness G, Gneezy U, Kuhn M. Experimental methods: Between-subject and within-subject design. J Econ Behav Org. 2012;81:1-8. doi:10.1016/j.jebo.2011.08.009


Between-Subjects vs. Within-Subjects Study Design

Julia Simkus

Editor at Simply Psychology

BA (Hons) Psychology, Princeton University

Julia Simkus is a graduate of Princeton University with a Bachelor of Arts in Psychology. She is currently studying for a Master's Degree in Counseling for Mental Health and Wellness in September 2023. Julia's research has been published in peer reviewed journals.


Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


In a within-subject design, each participant experiences all experimental conditions, whereas, in a between-subject design, different participants are assigned to each condition, with each experiencing only one condition.


Within-subjects (or repeated-measures) is an experimental design in which all study participants are exposed to the same treatments or independent variable conditions.

In within-subjects studies, each participant's results in one condition are compared with that same participant's results in the other conditions, so a separate control group is not needed. The data comparison occurs within the group of study participants, and each participant serves as their own baseline.

In a between-subjects design (or between-groups, independent measures), the study participants are divided into groups, and each group is exposed to one treatment or condition.

Each participant is only assigned to a single treatment. This should be done by random allocation, ensuring that each participant has an equal chance of being assigned to each group.

The outcomes of the treatment groups are then compared with one another and, typically, with a control group that does not receive any treatment. The groups that undergo a treatment or condition are typically called the experimental groups.

  • In a within-subjects design, all participants receive every treatment. 
  • In a between-subjects design, participants only receive one treatment.

Design Similarities

  • Both types of study designs tend to be used to assess the impact of a treatment or condition on a given study population.
  • The goal of both types of studies is to compare several test conditions in a single study.
  • Both within-subjects designs and between-subjects designs have a group of subjects that serve as study participants and are exposed to a given treatment.
  • Both experimental designs are utilized in quantitative studies and aim to result in findings that are statistically likely to generalize to a whole population.
  • Between-subjects and within-subjects design both have an independent variable that is manipulated or controlled by the study’s investigators and a dependent variable that is measured. 
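  • Randomization is essential for both types of designs: participants are randomly assigned to conditions in a between-subjects design and to the order of conditions in a within-subjects design.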

Design Differences

  • In a within-subjects design, all participants receive all treatments. In a between-subjects design, participants receive only one treatment.
  • In a within-subjects design, participants are compared to themselves across conditions, so no separate control group is needed. In a between-subjects design, there is typically a control group that doesn’t receive any treatment and serves as a source of comparison for the treatment groups.
  • Between-subjects designs require significantly more participants than within-subjects designs in order to detect a statistically significant difference between the two conditions. In within-subjects designs, on the other hand, fewer participants are required as each participant provides a data point for each level of the independent variable. A similar between-subjects experiment with two conditions would require about twice as many participants as a within-subjects design. As a result, between-subjects designs also require more resources and funding to recruit a larger sample, administer sessions, and cover costs.
  • Between-subjects designs tend to be easier and quicker to administer as each participant is only given one treatment. In contrast, within-subjects designs take longer to implement because every participant is given multiple treatments.
  • Within-subjects designs are vulnerable to fatigue and carryover effects. Participant fatigue occurs when participants become tired, bored, or unmotivated after taking part in multiple treatments in a row. Carryover effects occur when taking part in one condition affects participants' performance or behavior in later conditions. In a between-subjects design, though, each participant is exposed to only one condition, so fatigue and carryover effects are less of a challenge.
  • Because different participants provide data for each condition in between-subjects designs, individual differences among participants may threaten internal validity. Within-subjects designs are less affected by individual differences among participants because the participants are compared to themselves and thus higher statistical power can be achieved.

What is a 2×2 within-subjects design?

A 2×2 within-subjects design is one in which there are two independent variables each having two different levels. This design allows researchers to understand the effects of two independent variables (each with two levels) on a single dependent variable.

When would you use a within-subjects design?

You typically would use a within-subjects design when you want to investigate a causal or correlational relationship between variables with a relatively small sample.

The primary goal of a within-subjects design is to determine if one treatment condition is more effective than another.

Within-subjects designs are typically used for longitudinal studies or observational studies conducted over an extended period.

When should a within-subjects design not be used?

A within-subjects design should not be used if researchers are concerned about the potential interferences of practice effects. 

If the researcher is interested in treatment effects under minimum practice, the within-subjects design is inappropriate. In a study with three treatments, for example, each subject provides data for two of the three treatments under more than minimum practice, having already taken part in at least one earlier condition.

When should you use a between-subjects design?

Between-subjects designs can be used when you have multiple independent variables. This type of design enables researchers to determine if one treatment condition is superior to another.

A between-subjects design is also useful when you want to compare groups that differ on a key characteristic.

This key characteristic would be the independent variable, with varying levels of the characteristic differentiating the groups from each other.

When can a between-subjects design not be used?

Between-subjects designs are a poor fit for small samples because they are unlikely to be statistically powerful enough.

Between-subjects studies require at least twice as many participants as a within-subjects design, which also means roughly twice the cost and resources. When funding is limited, a between-subjects design may not be feasible.

Can I use a within- and between-subjects design in the same study?

Yes. Between-subjects and within-subjects designs can be combined in a single study when you have two or more independent variables (a factorial design).

Factorial designs are a type of experiment where multiple independent variables are tested.

Each level of one independent variable (a factor) is combined with each level of every other independent variable to produce different conditions.

Is between-subjects or within-subjects design more powerful?

Within-subjects designs have more statistical power because participants are compared to themselves, which removes variation between individuals from the comparison of conditions.

A between-subjects design would require a larger participant pool in order to reach a similar level of statistical power as a within-subjects design.


Within-Subjects Design | Explanation, Approaches, Examples

Published on 11 April 2022 by Pritha Bhandari.

In experiments, a different independent variable treatment or manipulation is used in each condition to assess whether there is a cause-and-effect relationship with a dependent variable.

In a within-subjects design, or a within-groups design, all participants take part in every condition. It’s the opposite of a between-subjects design, where each participant experiences only one condition.

A within-subjects design is also called a dependent groups or repeated measures design because researchers compare related measures from the same participants between different conditions.

All longitudinal studies use within-subjects designs to assess changes within the same individuals over time.


In a within-subjects design, all participants in the sample are exposed to the same treatments. The goal is to measure changes over time or changes resulting from different treatments for outcomes such as attitudes, learning, or performance.

When comparing different treatments within subjects, you should randomise or counterbalance the order in which every condition is presented across the group of participants. This prevents the effects of earlier treatments from spilling over onto later ones.

Randomisation means using many different possible sequences for treatments, while counterbalancing means using a limited number of sequences across the group.

Counterbalancing is sometimes more convenient for researchers because an even portion of the sample undergoes each sequence of conditions selected by researchers. Each treatment ideally appears equally often in each position (e.g., third) of the sequence. This helps balance out the effects of treatment sequence on the outcomes.

To randomise treatment order in a study that presents several short stories, for example, the order of the stories is completely randomised between participants using a computer program. Every possible sequence can be presented to participants across the group, but in complete randomisation, you can’t control how often each sequence is used in the participant group.
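The difference between counterbalancing and complete randomisation can be sketched in Python. This is an illustration under stated assumptions: the counterbalancing function uses a simple cyclic Latin square, which is only one of several possible schemes, and both function names and the `seed` parameter are hypothetical.

```python
import random

def latin_square_orders(conditions, n_participants):
    """Counterbalancing: rotate the condition list so that each condition
    appears equally often in each position across a full cycle."""
    conditions = list(conditions)
    k = len(conditions)
    rows = [conditions[i:] + conditions[:i] for i in range(k)]
    return [tuple(rows[p % k]) for p in range(n_participants)]

def randomised_orders(conditions, n_participants, seed=None):
    """Complete randomisation: every participant gets an independently
    shuffled order, so how often each sequence occurs is left to chance."""
    rng = random.Random(seed)
    conditions = list(conditions)
    return [tuple(rng.sample(conditions, k=len(conditions)))
            for _ in range(n_participants)]

stories = ["story A", "story B", "story C"]
for order in latin_square_orders(stories, 6):
    print(order)
```

A cyclic square balances position only. It does not balance which condition immediately follows which, so studies concerned about sequence effects may need a different scheme.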

In longitudinal studies, time is an independent variable. Because researchers can’t prevent the effects of time, longitudinal studies usually study correlations between time and other (dependent) variables.


The opposite of a within-subjects design is a between-subjects design, where each participant only experiences one condition, and different treatment groups are compared.

Between-subjects designs usually have a control group (e.g., no treatment) and an experimental group, or multiple groups that differ on a variable (e.g., gender, ethnicity, test score etc). Researchers compare the outcomes of different groups with each other.

In within-subjects designs, participants serve as their own control by providing baseline scores across different conditions.

The word ‘within’ means you’re comparing different conditions within the same group or individual, while the word ‘between’ means that you’re comparing different conditions between groups.

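For example, to compare taking a college course on campus with taking it online using a between-subjects design, you would split your sample into two groups: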

  • a control group that takes a college course on campus,
  • an experimental group that takes the same college course online.

You would administer the same test to all participants and compare test scores between the groups.

If you use a within-subjects design, everyone in your sample would take part in every condition:

  • Half of the college course is administered on campus before a test.
  • Half of the college course is given online before a comparable test.

In factorial designs, two or more independent variables are tested at the same time. Every level of one independent variable is combined with each level of every other independent variable to create different conditions.

In a mixed factorial design, one variable is altered between subjects and another is altered within subjects.

Some longitudinal studies can be experimental when they use a mixed design to study two or more independent variables. If you can directly manipulate one of the independent variables, and participant assignment to conditions, you’re using an experimental approach.

Each participant is randomly assigned to one of two groups:

  • A control group that receives standard teaching methods,
  • Another group that receives experimental teaching methods.

Smaller sample

Within-subjects designs help you detect causal or correlational relationships between variables with relatively small samples. It’s easier to recruit a sample for a within-subjects design than a between-subjects design because you need fewer participants. Every participant provides repeated measures, making the study more cost effective.

Removes effects of individual differences between conditions

In a between-subjects design, different participants take part in each condition, so participant characteristics (e.g., intelligence or memory capacity) often vary between groups. This means it’s hard to say whether the outcomes are truly the result of the independent variable or individual differences between groups.

In contrast, there are no variations in individual differences between conditions in a within-subjects design because the same individuals participate in all conditions. Participant characteristics are controlled for.

Statistically powerful

A within-subjects design is more statistically powerful than a between-subjects design, because individual variation is removed. To achieve the same level of power, a between-subjects design often requires double the number of participants (or more) that a within-subjects design does.

Time-related effects

There are many time-related threats to internal validity that only apply to within-subjects design because it’s hard to control the effects of time on the outcomes of the study.

Some examples:

  • History: an unrelated event (e.g., a lockdown) may influence the outcomes.
  • Maturation: the natural physical or psychological changes (e.g., growth or aging) in the participants over time may cause the outcomes.
  • Subject attrition: more participants drop out at every subsequent step of the study, leaving you with a potentially biased sample at the end because only participants with strong motivations stay in the study.

Carryover effects

Carryover effects are a broad category of internal validity threats that occur when an earlier treatment alters the outcomes of a later treatment.

  • Practice effects (learning): familiarity with the study based on earlier conditions leads to better performance in later conditions.
  • Order effects: the placement of a condition in a number of conditions changes the outcomes – for example, participants pay less attention in the last condition because of boredom and fatigue.
  • Sequence effects: the interaction between conditions (based on their sequence) affects the outcomes; for instance, participants in an ad rating survey may compare later ads to earlier ones and base their decisions on the sequence of items.

Randomisation and counterbalancing of the order of conditions can help reduce carryover effects.

In a between-subjects design, every participant experiences only one condition, and researchers assess group differences between participants in various conditions.

In a within-subjects design, each participant experiences all conditions, and researchers test the same participants repeatedly for differences between conditions.

The word ‘between’ means that you’re comparing different conditions between groups, while the word ‘within’ means you’re comparing different conditions within the same group.

Within-subjects designs have many potential threats to internal validity, but they are also very statistically powerful.

Advantages:

  • Only requires small samples
  • Removes the effects of individual differences on the outcomes

Disadvantages:

  • Internal validity threats reduce the likelihood of establishing a direct relationship between variables
  • Time-related effects, such as growth, can influence the outcomes
  • Carryover effects mean that the specific order of different treatments affects the outcomes

Yes. Between-subjects and within-subjects designs can be combined in a single study when you have two or more independent variables (a factorial design). In a mixed factorial design, one variable is altered between subjects and another is altered within subjects.

In a factorial design, multiple independent variables are tested.

If you test two variables, each level of one independent variable is combined with each level of the other independent variable to create different conditions.


Bhandari, P. (2022, April 11). Within-Subjects Design | Explanation, Approaches, Examples. Scribbr. Retrieved 18 June 2024, from https://www.scribbr.co.uk/research-methods/within-subjects/


Experimental Research

24 Experimental Design

Learning objectives.

  • Explain the difference between between-subjects and within-subjects experiments, list some of the pros and cons of each approach, and decide which approach to use to answer a particular research question.
  • Define random assignment, distinguish it from random sampling, explain its purpose in experimental research, and use some simple strategies to implement it.
  • Define several types of carryover effect, give examples of each, and explain how counterbalancing helps to deal with them.

In this section, we look at some different ways to design an experiment. The primary distinction we will make is between approaches in which each participant experiences one level of the independent variable and approaches in which each participant experiences all levels of the independent variable. The former are called between-subjects experiments and the latter are called within-subjects experiments.

Between-Subjects Experiments

In a between-subjects experiment, each participant is tested in only one condition. For example, a researcher with a sample of 100 university students might assign half of them to write about a traumatic event and the other half to write about a neutral event. Or a researcher with a sample of 60 people with severe agoraphobia (fear of open spaces) might assign 20 of them to receive each of three different treatments for that disorder. It is essential in a between-subjects experiment that the researcher assigns participants to conditions so that the different groups are, on average, highly similar to each other. Those in a trauma condition and a neutral condition, for example, should include a similar proportion of men and women, and they should have similar average IQs, similar average levels of motivation, similar average numbers of health problems, and so on. This matching is a matter of controlling these extraneous participant variables across conditions so that they do not become confounding variables.

Random Assignment

The primary way that researchers accomplish this kind of control of extraneous variables across conditions is called random assignment, which means using a random process to decide which participants are tested in which conditions. Do not confuse random assignment with random sampling. Random sampling is a method for selecting a sample from a population, and it is rarely used in psychological research. Random assignment is a method for assigning participants in a sample to the different conditions, and it is an important element of all experimental research in psychology and other fields too.

In its strictest sense, random assignment should meet two criteria. One is that each participant has an equal chance of being assigned to each condition (e.g., a 50% chance of being assigned to each of two conditions). The second is that each participant is assigned to a condition independently of other participants. Thus one way to assign participants to two conditions would be to flip a coin for each one. If the coin lands heads, the participant is assigned to Condition A, and if it lands tails, the participant is assigned to Condition B. For three conditions, one could use a computer to generate a random integer from 1 to 3 for each participant. If the integer is 1, the participant is assigned to Condition A; if it is 2, the participant is assigned to Condition B; and if it is 3, the participant is assigned to Condition C. In practice, a full sequence of conditions—one for each participant expected to be in the experiment—is usually created ahead of time, and each new participant is assigned to the next condition in the sequence as they are tested. When the procedure is computerized, the computer program often handles the random assignment.
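The strict procedure described above, where each participant is assigned independently and with equal probability, might look like this in Python. The function name `simple_random_assignment` and the `seed` parameter are hypothetical and included only so the sketch is reproducible.

```python
import random

def simple_random_assignment(participants, conditions, seed=None):
    """Assign each participant to a condition independently, with every
    condition equally likely. Group sizes may end up unequal."""
    rng = random.Random(seed)
    conditions = list(conditions)
    return {person: rng.choice(conditions) for person in participants}

# Hypothetical participant IDs 1 to 12 and three conditions.
assignment = simple_random_assignment(range(1, 13), ["A", "B", "C"], seed=1)
for person, condition in assignment.items():
    print(person, condition)
```

Generating the full assignment ahead of time, as here, mirrors the practice of preparing the sequence before any participants are tested.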

One problem with coin flipping and other strict procedures for random assignment is that they are likely to result in unequal sample sizes in the different conditions. Unequal sample sizes are generally not a serious problem, and you should never throw away data you have already collected to achieve equal sample sizes. However, for a fixed number of participants, it is statistically most efficient to divide them into equal-sized groups. It is standard practice, therefore, to use a kind of modified random assignment that keeps the number of participants in each group as similar as possible. One approach is block randomization. In block randomization, all the conditions occur once in the sequence before any of them is repeated. Then they all occur again before any of them is repeated again. Within each of these “blocks,” the conditions occur in a random order. Again, the sequence of conditions is usually generated before any participants are tested, and each new participant is assigned to the next condition in the sequence. Table 5.2 shows such a sequence for assigning nine participants to three conditions. The Research Randomizer website (http://www.randomizer.org) will generate block randomization sequences for any number of participants and conditions. Again, when the procedure is computerized, the computer program often handles the block randomization.

Table 5.2 (excerpt)
Participant  Condition
4            B
5            C
6            A
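A block randomization sequence like the one in Table 5.2 could be generated as follows. This is a minimal sketch under stated assumptions: the function name block_randomization and the seed parameter are hypothetical, and it is not the procedure used by the Research Randomizer website.

```python
import random

def block_randomization(n_participants, conditions, seed=None):
    """Build an assignment sequence in which every condition occurs once
    per block, in a random order within each block (illustrative helper)."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        block = list(conditions)
        rng.shuffle(block)          # random order inside this block
        sequence.extend(block)      # all conditions occur before any repeats
    # Trim in case the number of participants is not a multiple of the block size.
    return sequence[:n_participants]

# Nine participants, three conditions: three complete blocks.
for participant, condition in enumerate(block_randomization(9, ["A", "B", "C"], seed=0), start=1):
    print(participant, condition)
```

With nine participants and three conditions, each condition is used exactly three times, which is the equal-group-size property that motivates the method.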

Random assignment is not guaranteed to control all extraneous variables across conditions. The process is random, so it is always possible that just by chance, the participants in one condition might turn out to be substantially older, less tired, more motivated, or less depressed on average than the participants in another condition. However, there are some reasons that this possibility is not a major concern. One is that random assignment works better than one might expect, especially for large samples. Another is that the inferential statistics that researchers use to decide whether a difference between groups reflects a difference in the population take the “fallibility” of random assignment into account. Yet another reason is that even if random assignment does result in a confounding variable and therefore produces misleading results, this confound is likely to be detected when the experiment is replicated. The upshot is that random assignment to conditions, although not infallible in terms of controlling extraneous variables, is always considered a strength of a research design.

Matched Groups

An alternative to simple random assignment of participants to conditions is the use of a matched-groups design. Using this design, participants in the various conditions are matched on the dependent variable or on some extraneous variable(s) prior to the manipulation of the independent variable. This guarantees that these variables will not be confounded across the experimental conditions. For instance, if we want to determine whether expressive writing affects people’s health, then we could start by measuring various health-related variables in our prospective research participants. We could then use that information to rank-order participants according to how healthy or unhealthy they are. Next, the two healthiest participants would be randomly assigned to complete different conditions (one would be randomly assigned to the traumatic experiences writing condition and the other to the neutral writing condition). The next two healthiest participants would then be randomly assigned to complete different conditions, and so on until the two least healthy participants. This method would ensure that participants in the traumatic experiences writing condition are matched to participants in the neutral writing condition with respect to health at the beginning of the study. If at the end of the experiment, a difference in health was detected across the two conditions, then we would know that it is due to the writing manipulation and not to pre-existing differences in health.
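The rank-then-pair procedure in the expressive writing example might be sketched like this. The participant IDs, the health scores, and the function name matched_pairs_assignment are all hypothetical and serve only to illustrate the idea.

```python
import random

def matched_pairs_assignment(scores, conditions=("traumatic", "neutral"), seed=None):
    """Rank participants by a pre-measured score, pair neighbors in the ranking,
    and randomly split each pair across the two conditions (illustrative helper).

    scores: dict mapping participant ID -> health score (higher = healthier).
    With an odd number of participants, the lowest-ranked one is left unassigned.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)  # healthiest first
    assignment = {}
    for i in range(0, len(ranked) - 1, 2):
        pair = ranked[i:i + 2]
        order = list(conditions)
        rng.shuffle(order)  # random assignment within the matched pair
        for participant, condition in zip(pair, order):
            assignment[participant] = condition
    return assignment

# Hypothetical health scores, invented for this example.
health = {"p1": 90, "p2": 85, "p3": 70, "p4": 65, "p5": 40, "p6": 35}
print(matched_pairs_assignment(health, seed=2))
```

Each adjacent pair in the ranking ends up split across the two writing conditions, so the groups start out with similar health.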

Within-Subjects Experiments

In a  within-subjects experiment , each participant is tested under all conditions. Consider an experiment on the effect of a defendant’s physical attractiveness on judgments of his guilt. Again, in a between-subjects experiment, one group of participants would be shown an attractive defendant and asked to judge his guilt, and another group of participants would be shown an unattractive defendant and asked to judge his guilt. In a within-subjects experiment, however, the same group of participants would judge the guilt of both an attractive  and  an unattractive defendant.

The primary advantage of this approach is that it provides maximum control of extraneous participant variables. Participants in all conditions have the same mean IQ, same socioeconomic status, same number of siblings, and so on—because they are the very same people. Within-subjects experiments also make it possible to use statistical procedures that remove the effect of these extraneous participant variables on the dependent variable and therefore make the data less “noisy” and the effect of the independent variable easier to detect. We will look more closely at this idea later in the book .  However, not all experiments can use a within-subjects design nor would it be desirable to do so.

Carryover Effects and Counterbalancing

The primary disadvantage of within-subjects designs is that they can result in order effects. An order effect   occurs when participants’ responses in the various conditions are affected by the order of conditions to which they were exposed. One type of order effect is a carryover effect. A  carryover effect  is an effect of being tested in one condition on participants’ behavior in later conditions. One type of carryover effect is a  practice effect , where participants perform a task better in later conditions because they have had a chance to practice it. Another type is a fatigue effect , where participants perform a task worse in later conditions because they become tired or bored. Being tested in one condition can also change how participants perceive stimuli or interpret their task in later conditions. This  type of effect is called a  context effect (or contrast effect) . For example, an average-looking defendant might be judged more harshly when participants have just judged an attractive defendant than when they have just judged an unattractive defendant. Within-subjects experiments also make it easier for participants to guess the hypothesis. For example, a participant who is asked to judge the guilt of an attractive defendant and then is asked to judge the guilt of an unattractive defendant is likely to guess that the hypothesis is that defendant attractiveness affects judgments of guilt. This knowledge could lead the participant to judge the unattractive defendant more harshly because he thinks this is what he is expected to do. Or it could make participants judge the two defendants similarly in an effort to be “fair.”

Carryover effects can be interesting in their own right. (Does the attractiveness of one person depend on the attractiveness of other people that we have seen recently?) But when they are not the focus of the research, carryover effects can be problematic. Imagine, for example, that participants judge the guilt of an attractive defendant and then judge the guilt of an unattractive defendant. If they judge the unattractive defendant more harshly, this might be because of his unattractiveness. But it could be instead that they judge him more harshly because they are becoming bored or tired. In other words, the order of the conditions is a confounding variable. The attractive condition is always the first condition and the unattractive condition the second. Thus any difference between the conditions in terms of the dependent variable could be caused by the order of the conditions and not the independent variable itself.

There is a solution to the problem of order effects, however, that can be used in many situations. It is counterbalancing, which means testing different participants in different orders. The best method of counterbalancing is complete counterbalancing, in which an equal number of participants complete each possible order of conditions. For example, half of the participants would be tested in the attractive defendant condition followed by the unattractive defendant condition, and the other half would be tested in the unattractive condition followed by the attractive condition. With three conditions, there would be six different orders (ABC, ACB, BAC, BCA, CAB, and CBA), so some participants would be tested in each of the six orders. With four conditions, there would be 24 different orders; with five conditions there would be 120 possible orders. With counterbalancing, participants are assigned to orders randomly, using the techniques we have already discussed. Thus, random assignment plays an important role in within-subjects designs just as in between-subjects designs. Here, instead of being randomly assigned to conditions, participants are randomly assigned to different orders of conditions. In fact, it can safely be said that if a study does not involve random assignment in one form or another, it is not an experiment.
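The orders and counts mentioned above can be checked with a short Python sketch. The function name complete_counterbalancing and its parameters are assumptions made for illustration; the number of orders for n conditions is n factorial.

```python
import itertools
import math
import random

def complete_counterbalancing(conditions, participants_per_order, seed=None):
    """Return a randomly ordered schedule in which every possible order of
    the conditions is used equally often (illustrative helper)."""
    rng = random.Random(seed)
    orders = ["".join(p) for p in itertools.permutations(conditions)]
    schedule = orders * participants_per_order
    rng.shuffle(schedule)  # participants are assigned to orders at random
    return schedule

# All possible orders of three conditions.
print(["".join(p) for p in itertools.permutations("ABC")])

# The number of orders grows as n factorial.
for n in (2, 3, 4, 5):
    print(n, "conditions:", math.factorial(n), "orders")

# Twelve participants, two per order.
print(complete_counterbalancing("ABC", participants_per_order=2, seed=0))
```

The rapid growth of n factorial is why complete counterbalancing becomes impractical beyond a handful of conditions.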

A more efficient way of counterbalancing is through a Latin square design, in which the number of orders equals the number of conditions. For example, if you have four treatments, you must have four versions (orders). Like a Sudoku puzzle, no treatment can repeat in a row or column. For four versions of four treatments, the Latin square design would look like:

A B C D
B C D A
C D A B
D A B C

You can see in the diagram above that the square has been constructed to ensure that each condition appears at each ordinal position (A appears first once, second once, third once, and fourth once). Note that this simple cyclic square does not balance which condition comes immediately before which: A is immediately followed by B in three of the four rows. If each condition should precede and follow each other condition equally often, a balanced Latin square is needed instead. A Latin square for an experiment with 6 conditions would be 6 x 6 in dimension, one for an experiment with 8 conditions would be 8 x 8 in dimension, and so on. So while complete counterbalancing of 6 conditions would require 720 orders, a Latin square would only require 6 orders.
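A square like the one shown can be built by shifting the list of conditions one position per row. The sketch below, with hypothetical function names cyclic_latin_square and is_latin_square, constructs it and checks the row and column property; it is one simple construction, not the only way to obtain a Latin square.

```python
import math

def cyclic_latin_square(conditions):
    """Build an n x n square by rotating the condition list one step per row."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

def is_latin_square(square):
    """Check that every condition occurs exactly once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({row[c] for row in square} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok

square = cyclic_latin_square("ABCD")
for row in square:
    print(" ".join(row))
print("Latin square:", is_latin_square(square))

# Orders needed for six conditions: complete counterbalancing vs. Latin square.
print(math.factorial(6), "vs.", len(cyclic_latin_square("ABCDEF")))
```

Each row is one order of conditions, so the number of orders equals the number of conditions rather than its factorial.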

Finally, when the number of conditions is large, experiments can use random counterbalancing, in which the order of the conditions is randomly determined for each participant. Using this technique, every possible order of conditions is determined and then one of these orders is randomly selected for each participant. This is not as powerful a technique as complete counterbalancing or partial counterbalancing using a Latin square design. Use of random counterbalancing will result in more random error, but if order effects are likely to be small and the number of conditions is large, this is an option available to researchers.
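In code, random counterbalancing does not require listing every possible order first: shuffling the conditions separately for each participant selects one of the possible orders with equal probability. A minimal sketch, with a hypothetical function name and seed parameter:

```python
import random

def random_counterbalancing(conditions, n_participants, seed=None):
    """Give each participant an independently shuffled order of conditions."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_participants):
        order = list(conditions)
        rng.shuffle(order)  # equivalent to picking one of the n! orders at random
        orders.append(order)
    return orders

# Eight conditions would have 40,320 possible orders; five participants get five of them.
for participant, order in enumerate(random_counterbalancing("ABCDEFGH", 5, seed=3), start=1):
    print(participant, "".join(order))
```

Nothing guarantees that each condition appears equally often in each position, which is the source of the extra random error noted above.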

There are two ways to think about what counterbalancing accomplishes. One is that it controls the order of conditions so that it is no longer a confounding variable. Instead of the attractive condition always being first and the unattractive condition always being second, the attractive condition comes first for some participants and second for others. Likewise, the unattractive condition comes first for some participants and second for others. Thus any overall difference in the dependent variable between the two conditions cannot have been caused by the order of conditions. A second way to think about what counterbalancing accomplishes is that if there are carryover effects, it makes it possible to detect them. One can analyze the data separately for each order to see whether it had an effect.
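Analyzing the data separately for each order could look like the sketch below. The guilt ratings are invented for illustration and do not come from any study; the function name mean_difference_by_order is also hypothetical.

```python
from statistics import mean

# Hypothetical guilt ratings on a 1-10 scale, invented for this example:
# (order, rating of attractive defendant, rating of unattractive defendant)
data = [
    ("attractive_first", 4, 6),
    ("attractive_first", 3, 6),
    ("unattractive_first", 5, 6),
    ("unattractive_first", 4, 5),
]

def mean_difference_by_order(rows):
    """Mean (unattractive - attractive) rating difference within each order."""
    differences = {}
    for order, attractive, unattractive in rows:
        differences.setdefault(order, []).append(unattractive - attractive)
    return {order: mean(values) for order, values in differences.items()}

print(mean_difference_by_order(data))
```

If the difference between conditions is clearly larger in one order than in the other, as in these made-up numbers, that pattern suggests a carryover effect worth examining.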

When 9 Is “Larger” Than 221

Researcher Michael Birnbaum has argued that the lack of context provided by between-subjects designs is often a bigger problem than the context effects created by within-subjects designs. To demonstrate this problem, he asked participants to rate two numbers on how large they were on a scale of 1 to 10, where 1 was “very very small” and 10 was “very very large.” One group of participants was asked to rate the number 9 and another group was asked to rate the number 221 (Birnbaum, 1999) [1]. Participants in this between-subjects design gave the number 9 a mean rating of 5.13 and the number 221 a mean rating of 3.10. In other words, they rated 9 as larger than 221! According to Birnbaum, this difference is because participants spontaneously compared 9 with other one-digit numbers (in which case it is relatively large) and compared 221 with other three-digit numbers (in which case it is relatively small).

Simultaneous Within-Subjects Designs

So far, we have discussed an approach to within-subjects designs in which participants are tested in one condition at a time. There is another approach, however, that is often used when participants make multiple responses in each condition. Imagine, for example, that participants judge the guilt of 10 attractive defendants and 10 unattractive defendants. Instead of having people make judgments about all 10 defendants of one type followed by all 10 defendants of the other type, the researcher could present all 20 defendants in a sequence that mixed the two types. The researcher could then compute each participant’s mean rating for each type of defendant. Or imagine an experiment designed to see whether people with social anxiety disorder remember negative adjectives (e.g., “stupid,” “incompetent”) better than positive ones (e.g., “happy,” “productive”). The researcher could have participants study a single list that includes both kinds of words and then have them try to recall as many words as possible. The researcher could then count the number of each type of word that was recalled. 
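The mixed presentation of the 20 defendants, followed by a per-type mean for one participant, might be sketched as follows. The function names and the ratings are hypothetical and serve only to show the bookkeeping.

```python
import random
from statistics import mean

def mixed_sequence(n_per_type=10, types=("attractive", "unattractive"), seed=None):
    """Return all stimuli of both types in a single randomly mixed sequence."""
    rng = random.Random(seed)
    trials = [(stimulus_type, i) for stimulus_type in types for i in range(1, n_per_type + 1)]
    rng.shuffle(trials)
    return trials

def mean_rating_by_type(trials, ratings):
    """Compute one participant's mean rating for each type of stimulus."""
    by_type = {}
    for (stimulus_type, _), rating in zip(trials, ratings):
        by_type.setdefault(stimulus_type, []).append(rating)
    return {stimulus_type: mean(values) for stimulus_type, values in by_type.items()}

trials = mixed_sequence(seed=4)
# Made-up ratings for one participant, one per trial in presentation order.
ratings = [7 if stimulus_type == "unattractive" else 5 for stimulus_type, _ in trials]
print(mean_rating_by_type(trials, ratings))
```

Because the two types are interleaved, practice and fatigue are spread roughly evenly across both conditions instead of accumulating in whichever came second.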

Between-Subjects or Within-Subjects?

Almost every experiment can be conducted using either a between-subjects design or a within-subjects design. This possibility means that researchers must choose between the two approaches based on their relative merits for the particular situation.

Between-subjects experiments have the advantage of being conceptually simpler and requiring less testing time per participant. They also avoid carryover effects without the need for counterbalancing. Within-subjects experiments have the advantage of controlling extraneous participant variables, which generally reduces noise in the data and makes it easier to detect any effect of the independent variable upon the dependent variable. Within-subjects experiments also require fewer participants than between-subjects experiments to detect an effect of the same size.

A good rule of thumb, then, is that if it is possible to conduct a within-subjects experiment (with proper counterbalancing) in the time that is available per participant—and you have no serious concerns about carryover effects—this design is probably the best option. If a within-subjects design would be difficult or impossible to carry out, then you should consider a between-subjects design instead. For example, if you were testing participants in a doctor’s waiting room or shoppers in line at a grocery store, you might not have enough time to test each participant in all conditions and therefore would opt for a between-subjects design. Or imagine you were trying to reduce people’s level of prejudice by having them interact with someone of another race. A within-subjects design with counterbalancing would require testing some participants in the treatment condition first and then in a control condition. But if the treatment works and reduces people’s level of prejudice, then they would no longer be suitable for testing in the control condition. This difficulty is true for many designs that involve a treatment meant to produce long-term change in participants’ behavior (e.g., studies testing the effectiveness of psychotherapy). Clearly, a between-subjects design would be necessary here.

Remember also that using one type of design does not preclude using the other type in a different study. There is no reason that a researcher could not use both a between-subjects design and a within-subjects design to answer the same research question. In fact, professional researchers often take exactly this type of mixed methods approach.

  • Birnbaum, M. H. (1999). How to show that 9 > 221: Collect judgments in a between-subjects design. Psychological Methods, 4(3), 243-249.

Between-subjects experiment: An experiment in which each participant is tested in only one condition.

Random assignment: Using a random process to decide which participants are tested in which conditions.

Block randomization: A form of random assignment in which all the conditions occur once in the sequence before any of them is repeated.

Matched-groups design: An experimental design in which the participants in the various conditions are matched on the dependent variable or on some extraneous variable(s) prior to the manipulation of the independent variable.

Within-subjects experiment: An experiment in which each participant is tested under all conditions.

Order effect: An effect that occurs when participants' responses in the various conditions are affected by the order of conditions to which they were exposed.

Carryover effect: An effect of being tested in one condition on participants’ behavior in later conditions.

Practice effect: An effect where participants perform a task better in later conditions because they have had a chance to practice it.

Fatigue effect: An effect where participants perform a task worse in later conditions because they become tired or bored.

Context effect (or contrast effect): Unintended influences on respondents’ answers that are related not to the content of the item but to the context in which the item appears.

Counterbalancing: Varying the order of the conditions in which participants are tested, to help solve the problem of order effects in within-subjects experiments.

Complete counterbalancing: A method in which an equal number of participants complete each possible order of conditions.

Random counterbalancing: A method in which the order of the conditions is randomly determined for each participant.

Research Methods in Psychology Copyright © 2019 by Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Science Education (Experimental Psychology)

Within-subjects repeated-measures design.

Source: Laboratories of Gary Lewandowski , Dave Strohmetz, and Natalie Ciarocco—Monmouth University

A within-subjects, or repeated-measures, design is an experimental design where all the participants receive every level of the treatment, i.e., every level of the independent variable. For example, in a candy taste test, the researcher would want every participant to taste and rate each type of candy.

This video demonstrates a within-subjects experiment (i.e., one where there is an independent variable with several variations or levels) that examines how different motivational messages (e.g., hard work, self-affirmation, outcomes, and positive affect) influence willingness to exert physical effort. As a within-subjects design, the participant will read each of the four types of motivational messages and then lift weights to measure physical effort. By providing an overview of how a researcher conducts a repeated-measures experiment, this video allows viewers to see how to address order effects through counterbalancing, which involves a systematic approach to making sure all possible orders of the conditions occur in the study.

Psychological studies often use larger sample sizes than studies in other sciences. A large number of participants helps to ensure that the population under study is better represented and that the margin of error that accompanies the study of human behavior is sufficiently addressed. In this video, we demonstrate this experiment using just one participant. However, as represented in the results, we used a total of 72 participants to reach the experiment’s conclusions.

1. Define key variables.

  • For purposes of this experiment, effort is defined as the participant’s willingness to exert physical strength on a weight-lifting task.

2. Conduct the study.

  • Meet the student/participant at the lab.
  • Provide participants with informed consent, a brief description of the research (influences on physical behavior), a sense of the procedure, an indication of potential risks/benefits, the right of withdrawal at any time, and a manner to get help if they experience discomfort.
  • It is important to address order effects. If conditions were always in the same order, participants would likely perform worse in the later conditions because they would be tired.
  • Counterbalancing the conditions involves a systematic approach in which the researcher ensures that every order occurs during the study. Each condition occurs the same number of times in each of the spots in the order.
  • Each participant receives only one order.

  • Put each of the 24 possible orders on a slip of paper and place all slips in a bowl.
  • Select one slip and proceed to run the experiment in that order. This action randomly selects one of the counterbalanced orders. Do not return the slip to the bowl, so that every order is used once before any order is repeated.
  • Show the participant a page-sized printout of the image/quote while the participant sits at a table.
  • Tell the participant: “Please read this over and take 1 min to reflect on what it means to you.”
  • After a minute say: “Please stand and take this 10 lb dumbbell in your dominant hand. Complete as many curls as you’d like to in the next 30 s.” Demonstrate the curl motion to the participant, and ask the participant to count aloud as they complete each curl.
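The bowl of slips described in these steps amounts to sampling the 24 orders without replacement and refilling the bowl once it is empty. A minimal sketch, assuming a hypothetical class named OrderBowl and using the letters H, S, O, and P that the video narration uses for the four message types:

```python
import itertools
import random

class OrderBowl:
    """Simulates drawing order slips from a bowl without replacement."""

    def __init__(self, conditions, seed=None):
        self._all_orders = ["".join(p) for p in itertools.permutations(conditions)]
        self._rng = random.Random(seed)
        self._slips = []

    def draw(self):
        # Refill and mix the bowl only when every slip has been used.
        if not self._slips:
            self._slips = list(self._all_orders)
            self._rng.shuffle(self._slips)
        return self._slips.pop()

bowl = OrderBowl("HSOP", seed=0)
# 72 participants: each of the 24 orders is used exactly three times.
orders_used = [bowl.draw() for _ in range(72)]
print(orders_used[:5])
```

Drawing without replacement is what turns a random pick into complete counterbalancing: after every 24 participants, all orders have been run equally often.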

3. Debrief the participant.

  • “Thank you for participating. In this study, I was trying to determine if different types of motivational messages would increase the amount of physical effort participants were willing to exert. There were four types of messages: one emphasizing hard work, one emphasizing what a good person you are, one emphasizing successful outcomes, and one that was generally positive. We hypothesized that the message emphasizing hard work would result in exerting more physical effort.”
  • “We could not tell you about our hypotheses ahead of time because we wanted you to act as naturally as possible.”

Choosing the correct experimental design for the specific scientific question at hand is essential to obtain reliable results. A within-subjects design is an experimental paradigm where all the participants receive every level of the treatment; there is one independent variable with several variations, or conditions. In this way, the amount of error arising from natural variance between individuals—typical of a between-subjects design—is reduced.

One caveat of this type of experiment is that the order in which treatments are given can influence the results. To minimize order effects, counterbalancing is used to ensure all possible orders of the conditions occur in the study.

This video demonstrates a within-subjects experiment that examines how different motivational messages influence willingness to exert physical effort. As a within-subjects design, the participant will read each of four types of motivational messages and then have their physical effort measured via lifting weights.

First, several concepts to consider when designing a within-subjects experiment are introduced. The video will then go on to demonstrate how to conduct the experiment using proper counterbalancing. Finally, analysis of data from a large number of participants will be discussed.

To begin, create an operational definition of a motivational message. An operational definition is an unambiguous description of a variable for the purpose of the experiment.

For the purposes of this experiment, a motivational message is any combination of image and phrase designed to energize a person’s behavior. The person’s behavior is manipulated here by viewing a series of images accompanied by empowering quotes focusing on one of 4 areas: hard work, self-affirmation, outcomes and success, and general positive feelings and emotions.

Next, create an operational definition of effort. For purposes of this experiment, effort is defined as the participant’s willingness to exert physical strength on a weight-lifting task.

Then determine the order of conditions through counterbalancing. It is important to address order effects because if conditions were always in the same order, the participants would likely perform worse on the later conditions due to fatigue.

Counterbalancing the conditions involves a systematic approach by which the researcher ensures that every order occurs during the study. Determine all possible orders of the four conditions. Here, H equals Hard Work, S equals Self-Affirmation, O equals Outcomes, and P equals Positive Emotion.

To conduct the study, first meet the participant at the lab. Provide the participant with informed consent. This is a brief description of the research, a sense of the procedure, an indication of potential risks and benefits, the freedom of withdrawal at any time, and a manner to get help if they experience discomfort.

Next, write each of the 24 possible orders of the four conditions on a slip of paper. Then, place all slips in a bowl.

Select one slip and proceed to run the experiment in that order; this randomly selects one of the counterbalanced orders. Do not replace the order in the bowl so that every order gets done once before repeating any one order a second time.

To run the conditions, show the participant a page-sized printout of the quote while the participant sits at a table. Tell the participant to read the quote over and take a minute to reflect on what it means to them.

After a minute instruct the participant to stand and take a 10 pound dumbbell in his or her dominant hand. Tell the participant to complete as many curls as they would like to in the next 30 sec, and to count aloud as they complete each one.

After 30 sec, note the participant’s number of completed curls on a sheet as the participant takes a brief 10 to 15 sec rest.

Proceed to run all four conditions using the same steps, with the only difference being the image the participant views.

To debrief the participant, tell them the nature of the study.

Researcher: “Thank you for participating. In this study I was trying to determine if different types of motivational messages would increase the amount of physical effort participants were willing to exert. There were four types of messages: one emphasizing hard work, one emphasizing what a good person you are, one emphasizing successful outcomes, and one that was generally positive. We hypothesized that the message emphasizing hard work would result in exerting more physical effort.”

Then, explain explicitly why deception was necessary for the experiment.

Researcher: “We couldn’t tell you about our hypotheses ahead of time because we wanted you to act as naturally as possible.”

The procedure was run for 24 counterbalanced orders three times. Accordingly, data were collected from 72 total participants; a large number of participants is necessary to ensure that the results are reliable and reflective of the greater population.

The numbers shown here reflect the number of times participants in each condition lifted the weight. The results are the means for the 72 participants in each condition.

The outcome indicates that participants who read the hard work motivational message exerted more physical effort by doing more curls of the 10 pound weight in 30 sec.

Within-subjects designs are particularly common in functional magnetic resonance imaging, or fMRI, research, where participants lie in an fMRI machine while experiencing several conditions to see how the brain reacts to different experiences.

For example, an fMRI study was used to investigate which areas of the brain correlate with feelings of long-term intense romantic love. To test this, images were shown to participants that represented a highly familiar acquaintance, a close long-term friend, a low-familiar person, and their long-term romantic partner.

Analyses indicated that the long-term romantic partner activated areas of the brain associated with the dopamine reward system, and areas associated with emotional attachments.

You’ve just watched JoVE’s introduction to within-subjects experimental design. Now you should have a good understanding of the concept of counterbalancing to generate proper controls for this type of experiment, how to set up a within-subjects experiment through creating operational definitions, and how to conduct the study. You’ve also been introduced to research performed using fMRI as an application of this type of experiment.

Remember, proper counterbalancing and use of a large number of participants is critical to obtain reliable results when performing within-subjects experiments. 

The procedure was repeated three times in 24 counterbalanced orders, so data were collected from 72 total participants. A large number of participants is necessary to ensure that the results are reliable. If this research were conducted using just a few participants, it is likely that the results would have been much different and not reflective of the greater population. 

To determine if there were differences between the motivational messages on physical effort, we performed a repeated-measures analysis of variance (ANOVA). The results indicated that participants who read the hard work motivational message exerted more physical effort by doing more curls of the 10 lb weight in 30 s ( Figure 7 ). 
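For readers who want to see what a one-way repeated-measures ANOVA computes, here is a minimal sketch of the standard sum-of-squares decomposition. It is not the authors' analysis: the curl counts are invented, the function name is hypothetical, and the p-value lookup is omitted.

```python
def repeated_measures_anova(scores):
    """One-way repeated-measures ANOVA F statistic.

    scores: list of rows, one per participant, one column per condition.
    Returns (F, df_conditions, df_error).
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of conditions
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    condition_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_conditions = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Variability between participants is removed from the error term.
    ss_error = ss_total - ss_conditions - ss_subjects

    df_conditions = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_conditions / df_conditions) / (ss_error / df_error)
    return f_stat, df_conditions, df_error

# Made-up curl counts for four participants; columns are H, S, O, P.
curls = [
    [12, 10, 10, 9],
    [15, 13, 12, 12],
    [9, 8, 8, 7],
    [14, 11, 12, 11],
]
print(repeated_measures_anova(curls))
```

Because each participant's overall level is subtracted out, adding a constant to one participant's scores leaves F unchanged, which is the sense in which the design compares participants to themselves.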

This repeated-measures within-subject experiment shows how researchers use a study design to compare participants’ experiences in one context to their own experiences in another context. In other words, the study allowed researchers to compare participants to themselves.

Figure 7

Repeated-measures within-subjects designs are particularly common in functional magnetic resonance imaging (fMRI) research. Participants lie in an fMRI machine and experience several conditions to see how the brain reacts to different experiences.

For example, one fMRI study wanted to determine which areas of the brain correlate with feelings of long-term and intense romantic love [1]. To test this, participants saw each of the following images: a highly familiar acquaintance, a close long-term friend, a low-familiar person, and their long-term romantic partner. Analyses indicated that the long-term romantic partner activated areas of the brain (e.g., the ventral tegmental area and dorsal striatum) associated with the dopamine reward system, as well as areas (e.g., globus pallidus and substantia nigra) associated with emotional attachments.

  • Acevedo, B. P., Aron, A., Fisher, H. E., & Brown, L. L. Neural correlates of long-term intense romantic love. Social Cognitive And Affective Neuroscience. 8 (2), 145-159. doi:10.1093/scan/nsq092 (2012).

Choosing the correct experimental design for the specific scientific question at hand is essential to obtain reliable results. A within-subjects design is an experimental paradigm where all the participants receive every level of the treatment; there is one independent variable with several variations, or conditions. In this way, the amount of error arising from natural variance between individuals—typical of a between-subjects design—is reduced.

For the purposes of this experiment, a motivational message is any combination of image and phrase designed to energize a person’s behavior. The person’s behavior is manipulated here by viewing a series of images accompanied by empowering quotes focusing on one of 4 areas: hard work, self-affirmation, outcomes and success, and general positive feelings and emotions.

Next, create an operational definition of effort. For purposes of this experiment, effort is defined as the participant’s willingness to exert physical strength on a weight-lifting task.

After 30 sec, note the participant’s number of completed curls on a sheet as the participant takes a brief 10 to 15 sec rest.

Researcher: “Thank you for participating. In this study I was trying to determine if different types of motivational messages would increase the amount of physical effort participants were willing to exert. There were four types of messages: one emphasizing hard work, one emphasizing what a good person you are, one emphasizing successful outcomes, and one that was generally positive. We hypothesized that the message emphasizing hard work would result in exerting more physical effort.”

Researcher: “We couldn’t tell you about our hypotheses ahead of time because we wanted you to act as naturally as possible.”

You’ve just watched JoVE’s introduction to within-subjects experimental design. Now you should have a good understanding of the concept of counterbalancing to generate proper controls for this type of experiment, how to set up a within-subjects experiment through creating operational definitions, and how to conduct the study. You’ve also been introduced to research performed using fMRI as an application of this type of experiment.

Remember, proper counterbalancing and a large number of participants are critical to obtaining reliable results when performing within-subjects experiments.

JoVE Science Education Database. Experimental Psychology. Within-subjects Repeated-measures Design. JoVE, Cambridge, MA, (2024).


Within-Subjects Design: Psychology Definition, History & Examples

Within-subjects design, a cornerstone experimental approach in psychological research, involves repeatedly measuring the same subjects across different conditions or over time. This intrasubject comparison allows researchers to observe changes in behavior or responses under varied stimuli while controlling for individual differences that might confound results.

Historically, this design emerged as psychologists sought to refine experimental methods to yield more reliable and valid data. It has since become integral in fields ranging from cognitive psychology to clinical therapy evaluations.

Examples of within-subjects designs include crossover studies and other repeated-measures experiments, whose data are often analyzed with a repeated measures ANOVA. The design has been applied widely, making it a vital tool in the psychological researcher’s repertoire.

Understanding its definition, historical development, and practical examples provides essential insight into the evolution of experimental psychology.


A within-subjects design is a research method where participants experience all conditions or treatments being studied. This helps researchers compare how participants respond to different conditions and control for individual differences.

It allows for stronger statistical analysis and helps establish cause-and-effect relationships between variables. However, it can be affected by carryover effects, so counterbalancing is important.

The within-subjects design, also known as the repeated measures design, has a rich historical background that can be traced back to the early experiments in psychology. It originated as researchers recognized the need for controlling extraneous variables in order to isolate the effects of the independent variable. This approach emerged as a way to minimize variability among participants and obtain more precise measurements of change within individuals over time.

The within-subjects design played a crucial role in the evolution of experimental psychology. Pioneering psychologists such as Wilhelm Wundt and William James were key figures associated with its development. They recognized the importance of this design in systematically investigating the intricacies of human consciousness and behavior.

Throughout history, significant events and studies contributed to the evolution of the within-subjects design. One notable event was the establishment of the first experimental psychology laboratory by Wilhelm Wundt in 1879 at the University of Leipzig in Germany. This laboratory became a hub for conducting experiments using the within-subjects design and paved the way for future advancements in psychological research.

Another significant study that contributed to the development of the within-subjects design was the work of Ivan Pavlov and his experiments on classical conditioning in the early 20th century. Pavlov’s use of repeated measures and manipulation of independent variables showcased the effectiveness of within-subjects designs in studying behavioral responses.

Over time, the within-subjects design has been refined and improved. Methodological advancements in statistics and research design have allowed researchers to better control for confounding variables and enhance the reliability and validity of psychological research using this design.

Today, the within-subjects design remains a cornerstone in the methodological arsenal of psychologists. It continues to be used in a wide range of studies, from cognitive psychology to clinical psychology, exemplifying the rigorous and meticulous inquiry that characterizes the field.

Commonly used in everyday life, within-subjects designs can help us understand how individuals respond to different situations. For instance, imagine you’re trying to figure out which type of exercise helps you feel more energized throughout the day.

You decide to compare your energy levels after doing yoga and after going for a run. By doing the yoga one day and the run on another day, you can directly compare how you feel and determine which exercise is more effective for you personally.

This approach is particularly useful because you serve as your own comparison: your stable personal characteristics are the same in both conditions, so the difference you observe is more likely to reflect the exercise itself. It works best if you also keep other factors that might affect your energy levels, such as your diet or sleep, as similar as possible on both days. By paying attention to your own experiences, you can gain valuable insights into what works best for you.

Related Terms

In the context of experimental psychology, terms such as ‘repeated measures design’, ‘counterbalancing’, and ‘order effects’ are closely linked and complement each other in understanding within-subjects research methodologies.

A ‘repeated measures design’, also known as a within-subjects design, involves using the same participants for every condition of the experiment. This approach can increase statistical power by reducing individual differences. However, it can also introduce ‘order effects’, where the sequence of conditions influences outcomes regardless of the experimental variables.

To address this issue, researchers employ ‘counterbalancing’, which involves systematically varying the order of conditions across participants. This method ensures that potential sequence effects are evenly distributed and do not bias the results, thus enhancing the validity of the experimental findings.
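As a rough illustration of counterbalancing, the sketch below cycles participants through every possible ordering of the conditions so that each order is used equally often. The condition labels, the participant count, and the function name `counterbalanced_orders` are assumptions made for this example; full counterbalancing like this is only practical with a small number of conditions.

```python
from itertools import permutations

def counterbalanced_orders(conditions, n_participants):
    """Assign each participant one ordering, cycling through all permutations."""
    all_orders = list(permutations(conditions))
    return [all_orders[i % len(all_orders)] for i in range(n_participants)]

# Hypothetical conditions; with 3 conditions there are 3! = 6 possible orders,
# so 12 participants means every order is used exactly twice.
orders = counterbalanced_orders(["A", "B", "C"], n_participants=12)
for participant_id, order in enumerate(orders, start=1):
    print(participant_id, order)
```

Choosing a participant count that is a multiple of the number of orders keeps each sequence equally represented, so each condition appears in each position equally often.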

Other related terms include ‘carryover effects’, which refer to the lingering influence of a previous condition on subsequent ones, and ‘practice effects’, which denote improvements in performance due to repeated exposure to the same task.

These terms are important considerations when designing and interpreting within-subjects experiments, as they help researchers understand and control for potential confounding factors that may impact the results.


What is within-subjects study design?

Last updated: 14 February 2023

Reviewed by Jean Kaluza


Understanding the options available to you is the first step in choosing the right design. In this article, we'll be taking a detailed look at within-subjects design, and comparing it to between-subjects design.


  • What is within-subjects design?

Within-subjects design, also known as repeated measures design, is a type of experimental design in which the same participants are tested under multiple conditions or points in time. This allows researchers to directly compare the responses of each individual, rather than relying on group averages as in between-subjects designs.

In a within-subjects design, the same group of participants is tested under all conditions, so there's no need to worry about potential differences between groups that could confound the results. This makes it easier to control extraneous variables and increases the power of the study, since the same participants serve as their own controls.

For example, if a researcher is interested in the effect of a new medication on blood pressure, they could use a within-subjects design by measuring blood pressure in the same group of participants both before and after taking the medication. 

By comparing the blood pressure of each individual before and after taking the medication, the researcher can directly assess the effect of the medication on blood pressure without having to worry about differences between groups of participants.

  • How to use a within-subjects design

Although every experiment should be designed according to its own unique set of criteria, below are the basic steps involved in using a within-subjects design.

Define the research question - The first step in using a within-subjects design is to clearly define the research question and determine the specific variables of interest.

Select the participants - The next step is to select the participants for the study. It's important to carefully consider the inclusion and exclusion criteria and to ensure that the sample is representative of the population of interest.

Determine the conditions or time points - The next step is to determine the conditions or time points that will be tested in the study. Researchers must carefully consider the potential confounding and extraneous variables that may affect the results, and therefore design the study in a way that controls for these variables as much as possible.

Administer the measures - The next step is to administer the measures to the participants under each condition or time point.

Analyze the data - The final step is to analyze the data using appropriate statistical analyses to test the research hypotheses and draw conclusions about the results. 
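To make the analysis step concrete for a two-condition study such as the before-and-after blood pressure example, here is a minimal sketch of the paired t statistic, computed on each participant's difference score. The readings are made up for illustration, and the helper name `paired_t_statistic` is an assumption; a real analysis would normally use a statistics package and also report a p-value.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    if len(before) != len(after):
        raise ValueError("each participant needs one value per condition")
    # One difference score per participant: this is what makes the test "paired".
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = mean(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean_diff, mean_diff / standard_error, n - 1

# Made-up systolic readings (mmHg) for six hypothetical participants.
before = [142, 138, 150, 145, 160, 155]
after = [135, 136, 141, 140, 150, 149]
mean_diff, t_stat, df = paired_t_statistic(before, after)
print(f"mean change = {mean_diff:.2f} mmHg, t({df}) = {t_stat:.2f}")
```

Because the statistic is built from within-person differences, stable differences between participants (for example, one person simply having higher blood pressure than another) drop out of the comparison.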

  • Within-subjects versus between-subjects design

A specific UX example of the differences between within-subjects design and between-subjects design can be illustrated through a typical A/B testing scenario. For a within-subjects study, the same group of participants would be shown both A and B variations. For a between-subjects design study, the participants would be separated into two different groups with one being shown the A variation, while the other is shown the B variation.

Researchers often find themselves choosing between a between-subjects design and a within-subjects one. These two options are opposites in many ways, and making the correct choice means understanding the unique differences between the two, as well as their strengths and weaknesses.

  • What is between-subjects design?

Between-subjects design, also known as independent groups design, is a type of experimental design in which different groups of participants are tested under different conditions or at different time points. This means that each participant is only tested under one condition, and the results are compared across the different groups that have been tested.

The between-subjects equivalent to our previous blood-pressure study example looks like this: Researchers randomly assign participants to either a treatment group or a control group, and measure blood pressure in both groups before and after taking the medication.

By comparing the average blood pressure of the two groups, the effect of the medication on blood pressure can be assessed, but researchers cannot directly compare the blood pressure of individual participants.

  • Pros and cons of a between-subjects design

To help you better understand how between-subjects design compares to within-subjects design, let's take a look at the pros and cons of the former. Then, we'll take a closer look at how to choose between them.

Pros

Between-subjects design is generally more suitable for studying between-subjects differences, such as the effects of different treatments or the influence of individual characteristics on a response.

When participants are randomly assigned to groups, this design allows researchers to control for many extraneous variables and reduce the influence of individual differences on the results.

Between-subjects design is often more feasible and ethical than within-subjects design, especially when it is not possible or ethical to expose the same participants to more than one condition.

Cons

Between-subjects design does not allow researchers to directly compare the responses of individual participants, which may reduce the power of the study and limit the ability to detect small but meaningful changes.

This design is vulnerable to confounding variables, such as individual differences in age, gender, and background, which may influence the results and make it difficult to interpret the findings.

Between-subjects design may require a larger sample size to achieve the same level of statistical power as within-subjects design, which can be more time-consuming and expensive to implement.

  • Choosing a between-subjects or within-subjects design

When it comes time to choose the design that meets your study’s needs, a good rule of thumb is to determine whether the differences you're looking to study are between subjects or within subjects. 

Between-subjects design is generally more suitable for studying between-subjects differences, such as the effects of different treatments or the influence of individual characteristics on a response. This design is particularly useful when it is not feasible or ethical to expose the same participants to every condition; with random assignment to groups, researchers can still control for certain variables and reduce the influence of individual differences on the results.

Within-subjects design, on the other hand, is generally more suitable for studying within-subjects changes or differences, such as the effects of a treatment over time or the difference between two closely related conditions. This design is particularly useful when it is important to control for extraneous variables and eliminate between-subjects variability, allowing researchers to directly compare the responses of individual participants and increasing the power of the study.

  • The importance of randomization

Regardless of the design you choose, randomization is an important principle. It helps to control for extraneous variables and reduce the influence of individual differences on the results. 

In a between-subjects design, randomization helps to control for extraneous variables that may differ between the groups, such as age, gender, and background. By randomly assigning participants to different groups, researchers can reduce the risk of systematic bias and increase the validity of the study.

In a within-subjects design, randomization can be used to control for order effects, which refer to changes in the response of participants due to the order in which they are tested. For example, if a researcher is studying the effect of a treatment on anxiety, they could use a within-subjects design and randomly assign the order in which the treatment and control conditions are presented to each participant. This helps to control for potential order effects and reduces the risk of systematic bias.
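For a two-condition within-subjects study like the anxiety example, randomizing order can be sketched as follows. This version also keeps the two possible orders balanced across participants; the participant IDs, the `seed` parameter, and the function name `assign_orders` are assumptions introduced for the example.

```python
import random

def assign_orders(participant_ids, conditions=("treatment", "control"), seed=None):
    """Randomly give each participant an order, keeping the two orders balanced."""
    rng = random.Random(seed)
    first, second = conditions
    half = len(participant_ids) // 2
    # Build a pool with (as close as possible to) equal numbers of each order...
    orders = [(first, second)] * half + [(second, first)] * (len(participant_ids) - half)
    # ...then shuffle so that who gets which order is left to chance.
    rng.shuffle(orders)
    return dict(zip(participant_ids, orders))

# Hypothetical participant IDs; the seed is fixed only to make the sketch reproducible.
schedule = assign_orders([f"P{i:02d}" for i in range(1, 9)], seed=42)
for participant, order in schedule.items():
    print(participant, "->", " then ".join(order))
```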

  • The advantages of within-subjects designs

When deciding the design of your experiments, it's important to understand the strengths and weaknesses of the options available to you. The following advantages make within-subjects design a good option.

More statistical power – Because the same participants are used as their own controls, within-subjects designs have higher statistical power than between-subjects designs, which means they are more likely to detect real effects if they exist.

Less variability – By using the same participants in all conditions, within-subjects designs eliminate between-subjects variability, which can be a major source of noise in the data. This allows researchers to detect small but meaningful changes in the response of each individual.

Improved control – Within-subjects designs allow researchers to control for many extraneous variables, such as individual differences in age, gender, and background, which can confound the results in between-subjects designs.

Greater efficiency – Within-subjects designs are generally more efficient than between-subjects designs, as they require fewer participants to achieve the same level of statistical power. This can be particularly useful when studying rare or hard-to-recruit populations.

Increased feasibility – Within-subjects designs can be more feasible than between-subjects designs in some cases, especially when it is difficult or impossible to randomly assign participants to different conditions.

  • The disadvantages of within-subjects designs

Although the within-subjects design is a great choice for many types of experiments, it doesn't fit all of them. For those it does fit, there are also limitations that researchers should be aware of to improve the design of their study.

Order effects – Within-subjects design is vulnerable to order effects, which refer to changes in the response of participants due to the order in which they are tested. Order effects can be controlled through the use of randomization, but it is important to carefully consider their potential impact on the results.

Practice effects – Within-subjects design may also be vulnerable to practice effects, which refer to improvements in performance due to repeated testing. Practice effects can be controlled through the use of appropriate counterbalancing and statistical analyses, but it is important to carefully consider their potential impact on the results.

Fatigue – Within-subjects design may be more prone to participant fatigue than between-subjects design, as participants are being tested multiple times and may become tired or bored. This can affect the quality of the data and make it difficult to interpret the results.

Data analysis – Within-subjects design requires more sophisticated statistical analyses to account for the repeated measures and the potential within-subjects correlations in the data. This may require more specialized knowledge and expertise and may be more challenging to interpret than between-subjects design.

When should you use a within-subjects design?

Within-subjects design should be used when researchers are interested in studying within-subjects changes or differences, such as the effects of a marketing effort over time or the difference between two closely related screen layouts.

What test is used for a within-subjects design?

The appropriate statistical test for a within-subjects design depends on the specific research question and the type of data being collected. This may include paired t-tests, repeated measures ANOVA, or mixed-effects models.


Between-subjects vs. within-subjects study design.


July 10, 2023


When you want to compare several user interfaces in a single study, there are two ways of planning your study:

  • Between-subjects (or between-groups) study design: different people test each condition, so that each person is only exposed to a single user interface.
  • Within-subjects (or repeated-measures) study design: the same person tests all the conditions (i.e., all the user interfaces).

(Note that here we use the word “design” to refer to the design of the study, and not to website design. The design of the study is also called experimental design.)


For example, if we wanted to compare two car-rental sites A and B by looking at how participants book cars on each site, our study could be designed in two different ways:

  • Between-subjects: Each participant could test a single car-rental site and book a car only on that site.
  • Within-subjects: Each participant could test both car-rental sites and book a car on each.

Any type of user research that involves more than a single test condition must determine whether to be between-subjects or within-subjects. However, the distinction is particularly important for quantitative studies.

Unlike qualitative studies, quantitative usability studies aim to result in findings that are statistically likely to generalize to the whole user population. How the data from quantitative studies is analyzed depends on the study design.

Independent and Dependent Variables

Often, the main goal of quantitative usability studies is to compare: a site with its competitors, two different iterations of a design, or two different groups of users (such as experts vs. novices). As in any scientific experiment in which we want to detect causal relationships, a quantitative study involves two types of variables:

  • Independent variables, which are directly manipulated by the researcher
  • Dependent variables, which are measured (and expected to vary as a result of the independent-variable manipulation)

(If the study produces statistically significant results, then we can say that a change in the independent variable caused a change in the dependent variable.)

Let’s go back to our original car-rental example. If we wanted to measure which of the two sites, A or B, is better for the task of reserving a car, we could choose Site (with two possible values or levels: A and B) as the independent variable, and the time on task and the accuracy for booking a car could be the dependent variables. The goal of the study would be to see whether the dependent variables (time and accuracy) change when we vary the site or stay the same. (If they stay the same, then neither site is better than the other.)

Between-Subjects, Within-Subjects, or Both?

To plan our study, the next question is whether the study design should be between-subjects or within-subjects — that is, whether a participant in the study should be exposed to all the different conditions for the independent variable in our study (within-subjects) or only to one condition (between-subjects). The choice of experimental design will affect the type of statistical analysis that should be used on your data.

A study design can be both within-subjects and between-subjects. For example, assume that, in the case of our car-rental study, we were also interested in knowing how participants younger than 30 perform compared with older participants. In this case we would have two independent variables:

  • Age, with 2 levels: under 30, over 30
  • Site, with 2 levels: A and B

For the study, we will recruit an equal number of participants in each age group. Let’s assume that we decide that each participant, whether under or over 30, will make a car-rental reservation both on site A and on site B. In this case, the study is within-subjects with respect to the independent variable Site (because each person sees both levels of this variable, that is, both site A and site B). However, the study is between-subjects with respect to Age: one person can only be in a single age group (either under or over 30, not both). (Well, technically, you could pick a group of under-30-year-olds and wait until they turn 30 to have them test the sites again, but this setup is obviously highly impractical.)
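One way to picture this mixed design is through the shape of its data: every participant has one row per site (the within-subjects factor) but belongs to exactly one age group (the between-subjects factor). The sketch below uses invented time-on-task values and a hypothetical helper, `site_difference_by_age`, to compute each person's A minus B difference and then average it within each age group. It is a descriptive summary only, not a significance test.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical time-on-task records (seconds): each participant has one row per site.
records = [
    {"participant": "P1", "age_group": "under 30", "site": "A", "seconds": 95},
    {"participant": "P1", "age_group": "under 30", "site": "B", "seconds": 80},
    {"participant": "P2", "age_group": "under 30", "site": "A", "seconds": 110},
    {"participant": "P2", "age_group": "under 30", "site": "B", "seconds": 100},
    {"participant": "P3", "age_group": "over 30", "site": "A", "seconds": 130},
    {"participant": "P3", "age_group": "over 30", "site": "B", "seconds": 125},
    {"participant": "P4", "age_group": "over 30", "site": "A", "seconds": 120},
    {"participant": "P4", "age_group": "over 30", "site": "B", "seconds": 105},
]

def site_difference_by_age(rows):
    """Average each participant's (A - B) time, grouped by age group."""
    times = defaultdict(dict)   # participant -> {site: seconds}
    group_of = {}               # participant -> age group (one group per person)
    for row in rows:
        times[row["participant"]][row["site"]] = row["seconds"]
        group_of[row["participant"]] = row["age_group"]
    diffs = defaultdict(list)
    for participant, by_site in times.items():
        # Within-subjects comparison: both sites measured on the same person.
        diffs[group_of[participant]].append(by_site["A"] - by_site["B"])
    # Between-subjects comparison: one average difference per age group.
    return {group: mean(values) for group, values in diffs.items()}

summary = site_difference_by_age(records)
print(summary)
```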

Some independent variables may impose the choice of study design. Age is one of them, as seen above. Others are Expertise (if we want to compare experts and novices), User Type (if we want to compare different user groups or personas, for example, business traveler vs. leisure traveler), or Gender (assuming that a person cannot be of several genders at the same time). Outside usability, drug trials are one common case of between-subjects design: participants are exposed to only one treatment, either the drug being tested or a placebo, not both.

Sometimes the manipulation changes the state of the participant: for example, if you want to see which of two curricula is more effective for teaching reading, you could not have the same student be exposed to both, because once she’s learned how to read, she cannot unlearn it.

Which Is Better: Between-Subjects or Within-Subjects?

Unfortunately, there is no easy answer to this question. As seen above, sometimes your independent variables will dictate the experimental design. But in many situations, both designs may be possible. The table below summarizes the advantages of both.

| Between-Subjects | Within-Subjects |
| --- | --- |
| No transfer across conditions | Require fewer participants and are cheaper |
| Shorter study sessions | Minimize the noise in your data |
| Easy to set up, especially when you have multiple independent variables | |

Below we discuss each of these advantages.

Between-Subjects Minimizes the Learning and Transfer Across Conditions

After a person has completed a series of tasks on a car-rental site, they are more knowledgeable about the domain than they were before. For example, they may now know that car-rental sites charge an extra fee for drivers under 21, or what a collision-damage waiver is. That knowledge will likely help them become more efficient on a second car-rental site, even though that second site may be very different from the first.

With a between-subjects design, this transfer of knowledge is not an issue, because participants are never exposed to several levels of the same independent variable.

Between-Subjects Studies Have Shorter Sessions

A participant who tests a single car-rental site will have a shorter session than one who tests two. Shorter sessions are less tiring (or boring) for users and can also be more appropriate for remote unmoderated testing (especially since tools like UserZoom usually require a fairly short session length).

Between-Subject Studies Are Easier to Set Up

When the study is within-subjects, you will have to use randomization of your stimuli to make sure that there are no order effects.

For example, in our car-rental study, we need to make sure that participants don’t always start with site A and then move on to site B. The order of the sites needs to be random for each participant. This is easy with just two sites: randomly assign 50% of users to start with each site. But let’s say that you want to look at 4 sites and each site could be in dark or light mode. As you increase the number of independent variables and of levels for your independent variables, randomization becomes more difficult to implement within some of the existing platforms for quantitative usability testing.
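Outside a testing platform, the randomization itself is not hard to script. The sketch below enumerates the 4 sites by 2 modes from the example and draws an independent random order of all 8 conditions for each participant. The site labels, participant IDs, and the function name `random_order` are assumptions for illustration, and the sketch assumes every participant sees every combination.

```python
import random
from itertools import product

sites = ["A", "B", "C", "D"]          # hypothetical site labels
modes = ["dark", "light"]
conditions = list(product(sites, modes))   # 4 x 2 = 8 conditions

def random_order(conditions, rng):
    """Return an independent random ordering of all conditions for one participant."""
    order = list(conditions)   # copy, so the master list is not modified
    rng.shuffle(order)
    return order

rng = random.Random(2023)  # fixed seed only so the sketch is reproducible
plan = {f"P{i}": random_order(conditions, rng) for i in range(1, 6)}
for participant, order in plan.items():
    print(participant, order)
```

With 8 conditions there are 8! = 40,320 possible orders, so full counterbalancing is unrealistic; a random order per participant is one common compromise.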

Within-Subject Designs Require Fewer Participants

To detect a statistically significant difference between two conditions, you’ll often need a fairly large number of data points (often above 40) in each condition. If you have a within-subjects design, each participant will provide a data point for each level of the independent variable. For our car-rental study, 40 participants will provide data points for both sites. But if the study is between-subjects you will need twice as many to get the same number of data points. That means twice the cost. Within-subjects studies are, thus, more cost-effective than between-subjects ones.
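The arithmetic behind this comparison can be written out in a few lines. The function name `participants_needed` and its parameters are assumptions for the example; it only counts data points per condition and does not account for statistical power or dropouts.

```python
def participants_needed(points_per_condition, n_conditions, design):
    """Participants required so every condition gets the target number of data points."""
    if design == "within":
        # Each person contributes one data point to every condition.
        return points_per_condition
    if design == "between":
        # Each person contributes a data point to only one condition.
        return points_per_condition * n_conditions
    raise ValueError("design must be 'within' or 'between'")

within_n = participants_needed(40, 2, "within")
between_n = participants_needed(40, 2, "between")
print(within_n, between_n)
```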

Within-Subjects Designs Minimize the Noise in Your Data

Perhaps the most important advantage of within-subject designs is that they make it less likely that a real difference that exists between your conditions will stay undetected or be covered by random noise.

Individual participants bring to the test their own history, background knowledge, and context. One may be tired after a long night of partying, another one may be bored, yet another one may have received great news just before the study and be happy. If the same participant interacts with all levels of a variable, she will affect them in the same way. The happy person will be happy on both sites, the tired one will be tired on both. But if the study is between-subjects, the happy participant will only interact with one site and may affect the final results. You’ll have to make sure you get a similar happy participant in the other group to counteract her effects.

In practice, researchers won’t be able to assess such differences between participants — although they may match the gender, the experience, and the age across groups, it will be difficult to predict or detect other factors specific to each participant.
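A small simulation can make the noise argument tangible. In the sketch below, each simulated person has a stable personal baseline (mood, experience, and so on) plus some random measurement noise. All numbers, including the size of the effect and the spread of baselines, are invented assumptions; the point is only to compare how precisely each design estimates the difference between two sites.

```python
import random
from math import sqrt
from statistics import stdev

def simulate(n=40, true_effect=5.0, person_sd=20.0, noise_sd=3.0, seed=1):
    """Compare the standard error of the A-vs-B difference under both designs."""
    rng = random.Random(seed)

    def score(baseline, effect):
        return baseline + effect + rng.gauss(0, noise_sd)

    # Within-subjects: the same n people use both sites, so baselines cancel
    # when we take each person's difference.
    baselines = [rng.gauss(100, person_sd) for _ in range(n)]
    diffs = [score(b, true_effect) - score(b, 0.0) for b in baselines]
    within_se = stdev(diffs) / sqrt(n)

    # Between-subjects: two separate groups of n people, one site each,
    # so person-to-person variation stays in the comparison.
    group_a = [score(rng.gauss(100, person_sd), 0.0) for _ in range(n)]
    group_b = [score(rng.gauss(100, person_sd), true_effect) for _ in range(n)]
    between_se = sqrt(stdev(group_a) ** 2 / n + stdev(group_b) ** 2 / n)
    return within_se, between_se

within_se, between_se = simulate()
print(f"within-subjects SE: {within_se:.2f}, between-subjects SE: {between_se:.2f}")
```

Under these assumptions the within-subjects standard error comes out much smaller, which is why a real difference is less likely to be buried in noise. Note that the between-subjects arm here also uses twice as many people.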

Whether your experimental design is within-subjects or between-subjects, you will have to be concerned with randomization, although in slightly different ways.

Above, we discussed how randomization counteracts the possible order effects and minimizes transfer and learning across conditions in within-subjects design.

For between-subject designs, you must make sure that participants are allotted randomly to conditions, because you want to ensure that your participant assignment does not affect your study results (that is, it has to ensure that the study has internal validity). Thus, if a researcher decides that all the participants that he likes should interact with site A and then he finds that site A performed better than site B, he won’t know whether he’s discovered a true difference between the sites or whether the result simply reflects his assignment (for example, because people who sense that they are liked tend to return the favor, and may be more patient or have a positive mindset during the test). In this situation, the assignment is a confounding variable.

Even without such an obvious bias as your personal preferences, it’s easy to get randomization wrong. Say that you run a study across four days, Saturday through Tuesday. You might decide to have the first half of the test users start with site A and have the second half of the users start with site B. However, this is not a true randomization, because it’s very likely that certain types of people are more likely to agree to a study during the weekend and other types of people are more likely to sign up for your weekday testing slots. In this example, the day of the week is a confounding variable.
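One way to keep the starting site independent of sign-up order (and therefore of the testing day) is to shuffle the participant list before assigning conditions. The sketch below assumes 20 hypothetical participant IDs listed in sign-up order; the function name `assign_start_site` and the seed are illustrative assumptions.

```python
import random

def assign_start_site(participant_ids, sites=("A", "B"), seed=None):
    """Shuffle participants, then deal them out to sites so groups stay equal in size."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)   # breaks any link between sign-up order and condition
    return {pid: sites[i % len(sites)] for i, pid in enumerate(shuffled)}

sign_up_order = [f"p{i}" for i in range(1, 21)]   # hypothetical IDs, earliest sign-up first
assignment = assign_start_site(sign_up_order, seed=7)
for pid in sign_up_order:
    print(pid, assignment[pid])
```

Because the shuffle happens before the split, weekend and weekday participants have the same chance of starting with either site.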

User research can be between-subjects or within-subjects (or both), depending on whether each participant is exposed to only one condition or to all conditions that are varied within a study. Each of these types of experimental design has its own advantages and disadvantages; within-subjects design requires fewer participants and increases the chance of discovering a true difference among your conditions; between-subjects designs minimize the learning effects across conditions, lead to shorter sessions, and may be easier to set up and analyze.


Statistics LibreTexts

16.5: Experimental Designs


  • Rice University


Learning Objectives

  • Distinguish between between-subject and within-subject designs
  • State the advantages of within-subject designs
  • Define "multi-factor design" and "factorial design"
  • Identify the levels of a variable in an experimental design
  • Describe when counterbalancing is used

There are many ways an experiment can be designed. For example, subjects can all be tested under each of the treatment conditions or a different group of subjects can be used for each treatment. An experiment might have just one independent variable or it might have several. This section describes basic experimental designs and their advantages and disadvantages.

Between-Subjects Designs

In a between-subjects design, the various experimental treatments are given to different groups of subjects. For example, in the "Teacher Ratings" case study, subjects were randomly divided into two groups. Subjects were all told they were going to see a video of an instructor's lecture after which they would rate the quality of the lecture. The groups differed in that the subjects in one group were told that prior teaching evaluations indicated that the instructor was charismatic whereas subjects in the other group were told that the evaluations indicated the instructor was punitive. In this experiment, the independent variable is "Condition" and has two levels (charismatic teacher and punitive teacher). It is a between-subjects variable because different subjects were used for the two levels of the independent variable: subjects were in either the "charismatic teacher" or the "punitive teacher" condition. Thus the comparison of the charismatic-teacher condition with the punitive-teacher condition is a comparison between the subjects in one condition with the subjects in the other condition.

The two conditions were treated exactly the same except for the instructions they received. Therefore, it would appear that any difference between conditions should be attributed to the treatments themselves. However, this ignores the possibility of chance differences between the groups. That is, by chance, the raters in one condition might have, on average, been more lenient than the raters in the other condition. Randomly assigning subjects to treatments ensures that all differences between conditions are chance differences; it does not ensure there will be no differences. The key question, then, is how to distinguish real differences from chance differences. The field of inferential statistics answers just this question. The inferential statistics applicable to testing the difference between the means of the two conditions can be found here. Analyzing the data from this experiment reveals that the ratings in the charismatic-teacher condition were higher than those in the punitive-teacher condition. Using inferential statistics, it can be calculated that the probability of finding a difference as large or larger than the one obtained if the treatment had no effect is only \(0.018\). Therefore it seems likely that the treatment had an effect and it is not the case that all differences were chance differences.
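The idea of asking how often chance alone would produce a difference at least as large as the observed one can be illustrated with an exact permutation test. This is a swapped-in technique for illustration, not necessarily the test used in the case study, and the ratings below are hypothetical values, not the study's data. The function name `permutation_p_value` is also an assumption.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Share of all possible regroupings whose mean difference is at least the observed one."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    at_least_as_large = 0
    total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance so the observed split itself is always counted.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            at_least_as_large += 1
        total += 1
    return at_least_as_large / total

# Hypothetical ratings for two randomly assigned groups (illustrative only)
ratings_a = [2.8, 3.1, 2.5, 3.0, 2.9]
ratings_b = [2.3, 2.6, 2.0, 2.7, 2.4]
p_value = permutation_p_value(ratings_a, ratings_b)
print(p_value)
```

A small value means that few random regroupings of the same scores produce a difference as large as the one observed, which is the sense in which a result is unlikely to be a chance difference.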

Independent variables often have several levels. For example, in the "Smiles and Leniency" case study the independent variable is "type of smile" and there are four levels of this independent variable:

  • false smile
  • felt smile
  • miserable smile
  • a neutral control

Keep in mind that although there are four levels, there is only one independent variable. Designs with more than one independent variable are considered next.

Multi-Factor Between-Subject Designs

In the "Bias Against Associates of the Obese" experiment, the qualifications of potential job applicants were judged. Each applicant was accompanied by an associate. The experiment had two independent variables: the weight of the associate (obese or average) and the applicant's relationship to the associate (girlfriend or acquaintance). This design can be described as an Associate's Weight (\(2\)) x Associate's Relationship (\(2\)) factorial design. The numbers in parentheses represent the number of levels of each independent variable. The design was a factorial design because all four combinations of associate's weight and associate's relationship were included. The dependent variable was a rating of the applicant's qualifications (on a \(9\)-point scale).
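The cells of a factorial design are simply every combination of the factor levels. A minimal sketch, using the two factors named above (the dictionary keys are illustrative names, not the study's variable names):

```python
from itertools import product

# Factor names are illustrative; the levels come from the description above.
factors = {
    "associate_weight": ["obese", "average"],
    "associate_relationship": ["girlfriend", "acquaintance"],
}

cells = list(product(*factors.values()))
for weight, relationship in cells:
    print(weight, relationship)
print(len(cells))  # 2 x 2 = 4 cells
```

In a between-subjects version of this design, each of the four cells would need its own group of subjects.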

If two separate experiments had been conducted, one to test the effect of Associate's Weight and one to test the effect of Associate's Relationship, then there would be no way to assess whether the effect of Associate's Weight depended on the Associate's Relationship. One might imagine that the Associate's Weight would have a larger effect if the associate were a girlfriend rather than merely an acquaintance. A factorial design allows this question to be addressed. When the effect of one variable does differ depending on the level of the other variable, then it is said that there is an interaction between the variables.
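An interaction in a \(2\) x \(2\) design can be expressed as a difference of differences: the effect of one variable at one level of the other, minus its effect at the other level. The sketch below uses hypothetical cell means chosen only to illustrate the arithmetic; they are not the study's results, and the helper name `weight_effect` is an assumption.

```python
# Hypothetical cell means on a 9-point scale (illustrative only, not the study's data)
cell_means = {
    ("obese", "girlfriend"): 5.0,
    ("average", "girlfriend"): 7.0,
    ("obese", "acquaintance"): 6.0,
    ("average", "acquaintance"): 7.0,
}

def weight_effect(relationship):
    """Effect of associate's weight (average minus obese) at one relationship level."""
    return cell_means[("average", relationship)] - cell_means[("obese", relationship)]

interaction = weight_effect("girlfriend") - weight_effect("acquaintance")
print(weight_effect("girlfriend"))    # 2.0
print(weight_effect("acquaintance"))  # 1.0
print(interaction)                    # 1.0
```

A nonzero difference of differences in the sample is what an interaction looks like descriptively; whether it is more than a chance difference is a question for inferential statistics.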

Factorial designs can have three or more independent variables. In order to be a between-subjects design there must be a separate group of subjects for each combination of the levels of the independent variables.

Within-Subjects Designs

A within-subjects design differs from a between-subjects design in that the same subjects perform at all levels of the independent variable. For example consider the "ADHD Treatment" case study. In this experiment, subjects diagnosed as having attention deficit disorder were each tested on a delay of gratification task after receiving methylphenidate (MPH). All subjects were tested four times, once after receiving one of the four doses. Since each subject was tested under each of the four levels of the independent variable "dose," the design is a within-subjects design and dose is a within-subjects variable. Within-subjects designs are sometimes called repeated-measures designs.

Counterbalancing

In a within-subject design it is important not to confound the order in which a task is performed with the experimental treatment. For example, consider the problem that would have occurred if, in the ADHD study, every subject had received the doses in the same order starting with the lowest and continuing to the highest. It is not unlikely that experience with the delay of gratification task would have an effect. If practice on this task leads to better performance, then it would appear that higher doses caused the better performance when, in fact, it was the practice that caused the better performance.

One way to address this problem is to counterbalance the order of presentations. In other words, subjects would be given the doses in different orders in such a way that each dose was given in each sequential position an equal number of times. An example of counterbalancing is shown in Table \(\PageIndex{1}\).

Table \(\PageIndex{1}\):
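| Subject | 0 mg/kg | 0.15 mg/kg | 0.30 mg/kg | 0.60 mg/kg |
|---------|---------|------------|------------|------------|
| 1       | First   | Second     | Third      | Fourth     |
| 2       | Second  | Third      | Fourth     | First      |
| 3       | Third   | Fourth     | First      | Second     |
| 4       | Fourth  | First      | Second     | Third      |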
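The pattern in the table is a simple rotation: for each successive subject, every dose moves one position later, wrapping around from Fourth to First. A minimal sketch that reproduces it (the function name `counterbalanced_positions` is an assumption):

```python
DOSES = ["0 mg/kg", "0.15 mg/kg", "0.30 mg/kg", "0.60 mg/kg"]
ORDINALS = ["First", "Second", "Third", "Fourth"]

def counterbalanced_positions(n):
    """Row s gives, for each treatment j, its 0-based sequential position (s + j) mod n."""
    return [[(s + j) % n for j in range(n)] for s in range(n)]

positions = counterbalanced_positions(len(DOSES))
print("Subject", DOSES)
for subject, row in enumerate(positions, start=1):
    print(subject, [ORDINALS[p] for p in row])
```

Each row and each column contains every position exactly once, which is what guarantees that each dose is given in each sequential position an equal number of times.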

It should be kept in mind that counterbalancing is not a satisfactory solution if there are complex dependencies between which treatment precedes which and the dependent variable. In these cases, it is usually better to use a between-subjects design than a within-subjects design.

Advantage of Within-Subjects Designs

An advantage of within-subjects designs is that individual differences in subjects' overall levels of performance are controlled. This is important because subjects invariably will differ greatly from one another. In an experiment on problem solving, some subjects will be better than others regardless of the condition they are in. Similarly, in a study of blood pressure some subjects will have higher blood pressure than others regardless of the condition. Within-subjects designs control these individual differences by comparing the scores of a subject in one condition to the scores of the same subject in other conditions. In this sense each subject serves as his or her own control. This typically gives within-subjects designs considerably more power than between-subjects designs. That is, this makes within-subjects designs more able to detect an effect of the independent variable than are between-subjects designs.
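The way each subject serves as his or her own control can be seen in a small numerical sketch. The scores below are hypothetical and chosen so that subjects differ widely in overall level while the change between conditions is fairly consistent; they are not data from any study mentioned here.

```python
from statistics import mean, stdev

# Hypothetical scores for five subjects tested in both conditions (illustrative only)
condition_a = [51, 59, 70, 82, 88]
condition_b = [55, 66, 74, 83, 97]

# Subtracting within each subject removes that subject's overall level.
differences = [b - a for a, b in zip(condition_a, condition_b)]

print(mean(differences))    # average change from A to B
print(stdev(differences))   # spread of the within-subject changes
print(stdev(condition_a))   # spread between subjects within condition A
print(stdev(condition_b))   # spread between subjects within condition B
```

The spread of the within-subject differences is much smaller than the spread of raw scores across subjects, which is why the same average change is easier to detect when it is measured within subjects.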

Within-subjects designs are often called "repeated-measures" designs since repeated measurements are taken for each subject. Similarly, a within-subject variable can be called a repeated-measures factor.

Complex Designs

Designs can contain combinations of between-subject and within-subject variables. For example, the "Weapons and Aggression" case study has one between-subject variable (gender) and two within-subject variables (the type of priming word and the type of word to be responded to).


Social Sci LibreTexts

7.6: Within Subjects Design


  • Rajiv S. Jhangiani, I-Chant A. Chiang, Carrie Cuttler, & Dana C. Leighton
  • Kwantlen Polytechnic U., Washington State U., & Texas A&M U.—Texarkana


Learning Objective

  • Describe and give an example of a within-subjects design.

Within-Subjects Design

In a within-subjects design, each participant is tested under all conditions. To gather subjects for this type of experiment you would use a non-probability sample. Consider an experiment on the effect of a defendant’s physical attractiveness on judgments of his guilt. Again, in a between-subjects experiment, one group of participants would be shown an attractive defendant and asked to judge his guilt, and another group of participants would be shown an unattractive defendant and asked to judge his guilt. In a within-subjects experiment, however, the same group of participants would judge the guilt of both an attractive and an unattractive defendant.

Here is what this experiment type would look like.

X¹ O¹ X² O²

X¹: Experimental Stimulus

O¹: First post-test

X²: Control Stimulus or Comparison Stimulus

O²: Second post-test

The primary advantage of this approach is that it provides maximum control of extraneous participant variables. Participants in all conditions have the same mean IQ, same socioeconomic status, same number of siblings, and so on, because they are the very same people. Within-subjects experiments also make it possible to use statistical procedures that remove the effect of these extraneous participant variables on the dependent variable and therefore make the data less “noisy” and the effect of the independent variable easier to detect. We will look more closely at this idea later in the book. However, not all experiments can use a within-subjects design nor would it be desirable to do so.

KEY TAKEAWAY

  • This type of research is useful for comparisons.

Within-subjects vs. Between-subjects Designs: Which to Use?

I. Scott MacKenzie
Dept. of Computer Science
York University
Toronto, Ontario, Canada M3J 1P3

Last update: 29/3/2013

The information in this research note appears in greater detail, and with additional discussion on experiment design, in Chapter 5 in Human-Computer Interaction: An Empirical Research Perspective (MacKenzie, 2013).

Background

Most empirical evaluations of input devices or interaction techniques are comparative. A new device or technique is compared against alternative devices or techniques. One design for such experiments is the within-subjects design, also known as a repeated-measures design. In a within-subjects design, each participant is tested under each condition. The conditions are, for example, "device A", "device B", etc. So, for each participant, the measurements under one condition are repeated on the other conditions. The alternative to a within-subjects design is a between-subjects design. In this case, each participant is tested under one condition only. One group of participants is tested under condition A, a separate group is tested under condition B, and so on.

The test conditions (A, B, ...) are levels of the same factor. For example, the factor might be device and the levels might be mouse, trackball, and touchpad. In experiments with more than one factor, it is possible to use a within-subjects (viz. repeated-measures) assignment for the levels of one factor and a between-subjects assignment for the levels of another factor.

Considerations for the Design of an Experiment

There are a number of issues to consider in deciding whether an experimental factor should be assigned within-subjects or between-subjects. Sometimes there is no choice. Here is an example where a between-subjects design must be used.
If hand preference is a factor in an experiment, it must be assigned between-subjects, because a participant cannot be both left handed and right handed! Hand preference must be a between-subjects factor, with separate groups of left handers and right handers recruited for the experiment.

Conversely, here is an example where a within-subjects design must be used. If an experiment seeks to investigate the acquisition of skill over multiple sessions of practice, then the only option for the factor session is within-subjects. No two ways about it! The factor is session, it is within-subjects, and the levels are session #1, session #2, session #3, and so on.

However, in many other situations, there is a choice. If so, a within-subjects design is generally preferred. There are at least two reasons. First, fewer participants are needed in a within-subjects design since each participant is tested on all levels of a factor. Although more testing is required for each participant, there is an advantage in having fewer participants overall, since recruiting, scheduling, briefing, demonstrating, practicing, and so on, are easier if there are fewer participants.

Another advantage of a within-subjects design is that there is less variance due to participant disposition (since there are fewer participants). A participant who is predisposed to be meticulous (or reckless!) will likely exhibit such behaviour consistently across the experimental conditions. This is beneficial because the variability in measurements is more likely due to differences among conditions than to behavioural differences between participants (if a between-subjects design were used).

Interference Effects and Learning Effects

Despite the above-noted advantages of a within-subjects design, a between-subjects design is sometimes preferred in order to avoid interference between the conditions.
If the conditions under test involve conflicting motor skills, such as typing on keyboards with different arrangements of keys, then a within-subjects design is a poor choice, because the required skill to operate one keyboard tends to inhibit, block, or otherwise interfere with, the skill required for the other keyboard. Such a factor should be assigned between-subjects.

If interference is not anticipated, or if the effect is minimal and easily mitigated through a few minutes of practice when a participant changes conditions, then a within-subjects design should be considered. However, one additional effect must be accounted for: learning. Learning effects are due to the order of presentation. They are in some sense the opposite of interference. For example, if participants are tested under condition A first, then under condition B, they could potentially exhibit better performance under condition B simply due to prior practice under condition A. To compensate for this, a technique known as counterbalancing is used. Counterbalancing is performed by placing participants in groups and presenting conditions to each group in a different order. The order is given by a Latin Square.

Latin Squares

If the factor has two levels (e.g., A and B), participants are randomly assigned to groups of equal size: Group 1 is given condition A followed by condition B, while Group 2 is given condition B followed by condition A:

Group 1: A B
Group 2: B A
→ time

This is a trivial example of a Latin Square, known, in this case, as a 2 × 2 Latin Square. If three or more conditions are tested, then a bit more planning is required. Examples of 3 × 3, 4 × 4, and 5 × 5 Latin Squares follow:

[Tables: 3 × 3 Latin Square, 4 × 4 Latin Square, 5 × 5 Latin Square]

Look carefully, and the pattern is easily seen. The defining characteristic of a Latin Square is that a condition appears precisely once in each row and in each column.
Because the order of presentation is different for each group of participants, the learning effect noted earlier tends to balance out.

To ensure the groups are of equal size, the number of participants in the experiment should be a multiple of the number of conditions. For example, 16 participants could be used in an experiment comparing 4 devices. The participants are divided into four groups of four, and each group is assigned to one of the rows in the Latin Square. There are different arrangements of conditions for Latin Squares of sizes 4 × 4 or larger. An important variation is presented next.

Balanced Latin Squares

Counterbalancing conditions using a Latin Square does not fully eliminate the learning effect noted earlier. Note in the 3 × 3 design that condition B follows condition A for two of the three groups of participants. Similarly, in the 4 × 4 design, condition B follows condition A for three of the four groups of participants. Thus, there is a tendency for better performance on condition B simply because most participants benefited from practice on condition A prior to testing on condition B. This phenomenon is eliminated using a balanced Latin Square. A 4 × 4 balanced Latin Square follows:

[Table: 4 × 4 Balanced Latin Square]

Note that each condition appears precisely once in each row and column, as before. Furthermore, each condition appears before and after each other condition an equal number of times. For example, condition B follows condition A two times and it also precedes condition A two times. Thus, the imbalance noted earlier is eliminated.

Balanced Latin Squares do not exist for odd-order squares, such as 3 × 3, 5 × 5, etc. However, it is possible to construct a balanced Latin square for any even number of conditions. Here's the rubric: The 1st column is in order, starting at A. The top row has the sequence A, B, n, C, n - 1, D, n - 2, etc. Entries in the 2nd and subsequent columns are in order, with wrap around.
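The rubric can be sketched in a few lines of code. This is one reading of it, stated as an assumption: the top row is built by alternating from the low and high ends of the alphabet (A, B, n, C, n - 1, ...), and each later row shifts every entry of the row above by one letter, wrapping from the last condition back to A. The function name `balanced_latin_square` is hypothetical.

```python
def balanced_latin_square(n):
    """Build an n x n balanced Latin square (n even) as rows of condition letters."""
    if n < 2 or n % 2 != 0:
        raise ValueError("this construction requires an even number of conditions")
    # Top row as 0-based indices: 0, 1, n-1, 2, n-2, ... (A, B, n, C, n - 1, ...)
    top = [0]
    low, high = 1, n - 1
    for k in range(1, n):
        if k % 2 == 1:
            top.append(low)
            low += 1
        else:
            top.append(high)
            high -= 1
    # Each later row shifts every entry by one more, with wrap around.
    rows = [[(c + r) % n for c in top] for r in range(n)]
    return ["".join(chr(ord("A") + c) for c in row) for row in rows]

for row in balanced_latin_square(4):
    print(row)
# ABDC
# BCAD
# CDBA
# DACB
```

In the 4 × 4 output, A comes before B in two rows and after B in two rows, and every ordered pair of conditions occurs as immediate neighbors exactly once, which matches the balance property described above.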
The 6 × 6 case looks as follows:

[Table: 6 × 6 Balanced Latin Square]

In practice, within-subjects factors with more than four levels are rare in HCI experiments, so the 6 × 6 example is given just for curiosity. For a good discussion on counterbalancing, see Martin (1996).

Did Counterbalancing Work?

When counterbalancing is used to assign levels of a within-subjects factor to participants, it is a good idea to verify that the desired effect was achieved, that is, that learning was effectively canceled out (i.e., balanced). This is verified by including "group" as a between-subjects factor in the analysis of variance. The desired outcome is a non-significant "group effect". This is typically what occurs. However, sometimes the group effect is statistically significant. The interpretation, in this case, is that counterbalancing did not work! This is an unfortunate outcome. The effect arises because of a particularly insidious form of learning or interference known as asymmetric skill transfer (Poulton & Freeman, 1966). In this case, there is not only a learning or interference effect, but the effect is different depending on the order of testing. In other words, one condition tended to benefit or suffer more than others, depending on the preceding condition(s). If there is any sense that asymmetrical skill transfer might occur, then it is best to assign the factor between-subjects. An example of asymmetric skill transfer from a published HCI paper is described in detail in Chapter 5 in the MacKenzie reference below.

References

MacKenzie, I. S. (2013). Human-computer interaction: An empirical research perspective. Waltham, MA: Morgan Kaufmann.

Martin, D. W. (1996). Doing psychology experiments (4th ed.). Pacific Grove, CA: Brooks/Cole.

Poulton, E. C., & Freeman, P. R. (1966). Unwanted asymmetrical transfer effects with balanced experimental designs. Psychological Bulletin, 66 (1-8).

Frequently asked questions

What’s the difference between within-subjects and between-subjects designs?

In a between-subjects design, every participant experiences only one condition, and researchers assess group differences between participants in various conditions.

In a within-subjects design, each participant experiences all conditions, and researchers test the same participants repeatedly for differences between conditions.

The word “between” means that you’re comparing different conditions between groups, while the word “within” means you’re comparing different conditions within the same group.

Frequently asked questions: Methodology

Attrition refers to participants leaving a study. It always happens to some extent—for example, in randomized controlled trials for medical research.

Differential attrition occurs when attrition or dropout rates differ systematically between the intervention and the control group. As a result, the characteristics of the participants who drop out differ from the characteristics of those who stay in the study. Because of this, study results may be biased.

Action research is conducted in order to solve a particular issue immediately, while case studies are often conducted over a longer period of time and focus more on observing and analyzing a particular ongoing phenomenon.

Action research is focused on solving a problem or informing individual and community-based knowledge in a way that impacts teaching, learning, and other related processes. It is less focused on contributing theoretical input, instead producing actionable input.

Action research is particularly popular with educators as a form of systematic inquiry because it prioritizes reflection and bridges the gap between theory and practice. Educators are able to simultaneously investigate an issue as they solve it, and the method is very iterative and flexible.

A cycle of inquiry is another name for action research. It is usually visualized in a spiral shape following a series of steps, such as “planning → acting → observing → reflecting.”

To make quantitative observations, you need to use instruments that are capable of measuring the quantity you want to observe. For example, you might use a ruler to measure the length of an object or a thermometer to measure its temperature.

Criterion validity and construct validity are both types of measurement validity. In other words, they both show you how accurately a method measures something.

While construct validity is the degree to which a test or other measurement method measures what it claims to measure, criterion validity is the degree to which a test can predictively (in the future) or concurrently (in the present) measure something.

Construct validity is often considered the overarching type of measurement validity. You need to have face validity, content validity, and criterion validity in order to achieve construct validity.

Convergent validity and discriminant validity are both subtypes of construct validity. Together, they help you evaluate whether a test measures the concept it was designed to measure.

  • Convergent validity indicates whether a test that is designed to measure a particular construct correlates with other tests that assess the same or similar construct.
  • Discriminant validity indicates whether two tests that should not be highly related to each other are indeed not related. This type of validity is also called divergent validity.

You need to assess both in order to demonstrate construct validity. Neither one alone is sufficient for establishing construct validity.

Content validity shows you how accurately a test or other measurement method taps into the various aspects of the specific construct you are researching.

In other words, it helps you answer the question: “does the test measure all aspects of the construct I want to measure?” If it does, then the test has high content validity.

The higher the content validity, the more accurate the measurement of the construct.

If the test fails to include parts of the construct, or irrelevant parts are included, the validity of the instrument is threatened, which brings your results into question.

Face validity and content validity are similar in that they both evaluate how suitable the content of a test is. The difference is that face validity is subjective, and assesses content at surface level.

When a test has strong face validity, anyone would agree that the test’s questions appear to measure what they are intended to measure.

For example, looking at a 4th grade math test consisting of problems in which students have to add and multiply, most people would agree that it has strong face validity (i.e., it looks like a math test).

On the other hand, content validity evaluates how well a test represents all the aspects of a topic. Assessing content validity is more systematic and relies on expert evaluation of each question, analyzing whether each one covers the aspects that the test was designed to cover.

A 4th grade math test would have high content validity if it covered all the skills taught in that grade. Experts (in this case, math teachers) would have to evaluate the content validity by comparing the test to the learning objectives.

Snowball sampling is a non-probability sampling method . Unlike probability sampling (which involves some form of random selection ), the initial individuals selected to be studied are the ones who recruit new participants.

Because not every member of the target population has an equal chance of being recruited into the sample, selection in snowball sampling is non-random.

Snowball sampling is a non-probability sampling method , where there is not an equal chance for every member of the population to be included in the sample .

This means that you cannot use inferential statistics to make generalizations, which is often the goal of quantitative research. As such, a snowball sample is not representative of the target population and is usually a better fit for qualitative research.

Snowball sampling relies on the use of referrals. Here, the researcher recruits one or more initial participants, who then recruit the next ones.

Participants share similar characteristics and/or know each other. Because of this, not every member of the population has an equal chance of being included in the sample, giving rise to sampling bias .
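The referral process described above can be sketched in Python. This is a minimal illustration, not a recruitment tool: the `referrals` network, the participant names, and the `snowball_sample` function are all hypothetical. It shows why people outside the initial participants' networks have no chance of being recruited.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Recruit through referral chains until the target size is reached
    or the referral network is exhausted."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        # Each recruited participant refers the people they know.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network (who can refer whom).
referrals = {
    "ana": ["ben", "cho"],
    "ben": ["dev"],
    "cho": ["dev", "eli"],
    "fay": ["gus"],  # nobody in ana's network knows fay or gus
}

recruited = snowball_sample(referrals, seeds=["ana"], target_size=10)
print(recruited)
```

Here "fay" and "gus" can never enter the sample, however large the target size, because no recruited participant refers them.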

Snowball sampling is best used in the following cases:

  • If there is no sampling frame available (e.g., people with a rare disease)
  • If the population of interest is hard to access or locate (e.g., people experiencing homelessness)
  • If the research focuses on a sensitive topic (e.g., extramarital affairs)

The reproducibility and replicability of a study can be ensured by writing a transparent, detailed method section and using clear, unambiguous language.

Reproducibility and replicability are related terms.

  • Reproducing research entails reanalyzing the existing data in the same manner.
  • Replicating (or repeating) the research entails reconducting the entire analysis, including the collection of new data.
  • A successful reproduction shows that the data analyses were conducted in a fair and honest manner.
  • A successful replication shows that the reliability of the results is high.

Stratified sampling and quota sampling both involve dividing the population into subgroups and selecting units from each subgroup. The purpose in both cases is to select a representative sample and/or to allow comparisons between subgroups.

The main difference is that in stratified sampling, you draw a random sample from each subgroup ( probability sampling ). In quota sampling you select a predetermined number or proportion of units, in a non-random manner ( non-probability sampling ).
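The contrast can be sketched in Python. This is a minimal illustration under stated assumptions: the population of 100 units, the `group` field, and the function names `stratified_sample` and `quota_sample` are made up for the example and do not come from any library.

```python
import random

def stratified_sample(population, key, per_group, rng):
    """Probability sampling: draw a random sample from every subgroup."""
    strata = {}
    for unit in population:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for group in sorted(strata):
        sample.extend(rng.sample(strata[group], per_group))
    return sample

def quota_sample(arrivals, key, quotas):
    """Non-probability sampling: accept units in the order they happen
    to show up until each subgroup's quota is filled."""
    counts = {group: 0 for group in quotas}
    sample = []
    for unit in arrivals:
        group = key(unit)
        if counts.get(group, 0) < quotas.get(group, 0):
            sample.append(unit)
            counts[group] += 1
        if counts == quotas:
            break
    return sample

# Hypothetical population: 60 undergraduates followed by 40 graduates.
population = [
    {"id": i, "group": "undergrad" if i < 60 else "grad"} for i in range(100)
]
rng = random.Random(42)

strat = stratified_sample(population, lambda u: u["group"], per_group=5, rng=rng)
quota = quota_sample(population, lambda u: u["group"], {"undergrad": 5, "grad": 5})
```

Both samples contain five units per subgroup, but only the stratified one gives every unit within a subgroup the same chance of selection. The quota sample simply takes whoever comes first.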

Purposive and convenience sampling are both sampling methods that are typically used in qualitative data collection.

A convenience sample is drawn from a source that is conveniently accessible to the researcher. Convenience sampling does not distinguish characteristics among the participants. On the other hand, purposive sampling focuses on selecting participants possessing characteristics associated with the research study.

The findings of studies based on either convenience or purposive sampling can only be generalized to the (sub)population from which the sample is drawn, and not to the entire population.

Random sampling or probability sampling is based on random selection. This means that each unit has an equal chance (i.e., equal probability) of being included in the sample.

On the other hand, convenience sampling involves stopping people at random, which means that not everyone has an equal chance of being selected depending on the place, time, or day you are collecting your data.

Convenience sampling and quota sampling are both non-probability sampling methods. They both use non-random criteria like availability, geographical proximity, or expert knowledge to recruit study participants.

However, in convenience sampling, you continue to sample units or cases until you reach the required sample size.

In quota sampling, you first need to divide your population of interest into subgroups (strata) and estimate their proportions (quota) in the population. Then you can start your data collection, using convenience sampling to recruit participants, until the proportions in each subgroup coincide with the estimated proportions in the population.

A sampling frame is a list of every member in the entire population . It is important that the sampling frame is as complete as possible, so that your sample accurately reflects your population.

Stratified and cluster sampling may look similar, but bear in mind that groups created in cluster sampling are heterogeneous , so the individual characteristics in the cluster vary. In contrast, groups created in stratified sampling are homogeneous , as units share characteristics.

Relatedly, in cluster sampling you randomly select entire groups and include all units of each group in your sample. However, in stratified sampling, you select some units of all groups and include them in your sample. In this way, both methods can ensure that your sample is representative of the target population .
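Cluster sampling as described above (select entire groups at random, then keep every unit in them) can be sketched as follows. The school and student names and the `cluster_sample` helper are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical clusters: four schools with three students each.
schools = {
    "school_a": ["a1", "a2", "a3"],
    "school_b": ["b1", "b2", "b3"],
    "school_c": ["c1", "c2", "c3"],
    "school_d": ["d1", "d2", "d3"],
}

rng = random.Random(7)
chosen_schools, students = cluster_sample(schools, n_clusters=2, rng=rng)
```

The random step happens at the level of the school, not the student. In stratified sampling it would be the other way around: every school would be represented, with students drawn at random within each.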

A systematic review is secondary research because it uses existing research. You don’t collect new data yourself.

The key difference between observational studies and experimental designs is that a well-done observational study does not influence the responses of participants, while experiments do have some sort of treatment condition applied to at least some participants by random assignment .

An observational study is a great choice for you if your research question is based purely on observations. If there are ethical, logistical, or practical concerns that prevent you from conducting a traditional experiment , an observational study may be a good choice. In an observational study, there is no interference or manipulation of the research subjects, as well as no control or treatment groups .

It’s often best to ask a variety of people to review your measurements. You can ask experts, such as other researchers, or laypeople, such as potential participants, to judge the face validity of tests.

While experts have a deep understanding of research methods , the people you’re studying can provide you with valuable insights you may have missed otherwise.

Face validity is important because it’s a simple first step to measuring the overall validity of a test or technique. It’s a relatively intuitive, quick, and easy way to start checking whether a new measure seems useful at first glance.

Good face validity means that anyone who reviews your measure says that it seems to be measuring what it’s supposed to. With poor face validity, someone reviewing your measure may be left confused about what you’re measuring and why you’re using this method.

Face validity is about whether a test appears to measure what it’s supposed to measure. This type of validity is concerned with whether a measure seems relevant and appropriate for what it’s assessing only on the surface.

Statistical analyses are often applied to test validity with data from your measures. You test convergent validity and discriminant validity with correlations to see if results from your test are positively or negatively related to those of other established tests.

You can also use regression analyses to assess whether your measure is actually predictive of outcomes that you expect it to predict theoretically. A regression analysis that supports your expectations strengthens your claim of construct validity .
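The correlation check for convergent and discriminant validity might look like the sketch below. All scores are invented for illustration, and `pearson_r` is a small helper written here rather than a library function.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical scores for six participants.
new_anxiety_scale = [12, 15, 9, 20, 17, 11]
established_anxiety_scale = [30, 36, 25, 47, 41, 28]  # same construct
vocabulary_quiz = [43, 37, 40, 40, 41, 39]            # distinct construct

convergent_r = pearson_r(new_anxiety_scale, established_anxiety_scale)
discriminant_r = pearson_r(new_anxiety_scale, vocabulary_quiz)

print(f"convergent r = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

A strong correlation with the established measure, combined with a weak correlation with the unrelated measure, is the pattern you would hope to see. What counts as strong or weak depends on the field and is not fixed by this sketch.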

When designing or evaluating a measure, construct validity helps you ensure you’re actually measuring the construct you’re interested in. If you don’t have construct validity, you may inadvertently measure unrelated or distinct constructs and lose precision in your research.

Construct validity is often considered the overarching type of measurement validity, because it covers all of the other types. You need to have face validity, content validity, and criterion validity to achieve construct validity.

Construct validity is about how well a test measures the concept it was designed to evaluate. It’s one of four types of measurement validity, along with face validity, content validity, and criterion validity.

There are two subtypes of construct validity.

  • Convergent validity : The extent to which your measure corresponds to measures of related constructs
  • Discriminant validity : The extent to which your measure is unrelated or negatively related to measures of distinct constructs

Naturalistic observation is a valuable tool because of its flexibility, external validity , and suitability for topics that can’t be studied in a lab setting.

The downsides of naturalistic observation include its lack of scientific control , ethical considerations , and potential for bias from observers and subjects.

Naturalistic observation is a qualitative research method where you record the behaviors of your research subjects in real world settings. You avoid interfering or influencing anything in a naturalistic observation.

You can think of naturalistic observation as “people watching” with a purpose.

A dependent variable is what changes as a result of the independent variable manipulation in experiments . It’s what you’re interested in measuring, and it “depends” on your independent variable.

In statistics, dependent variables are also called:

  • Response variables (they respond to a change in another variable)
  • Outcome variables (they represent the outcome you want to measure)
  • Left-hand-side variables (they appear on the left-hand side of a regression equation)

An independent variable is the variable you manipulate, control, or vary in an experimental study to explore its effects. It’s called “independent” because it’s not influenced by any other variables in the study.

Independent variables are also called:

  • Explanatory variables (they explain an event or outcome)
  • Predictor variables (they can be used to predict the value of a dependent variable)
  • Right-hand-side variables (they appear on the right-hand side of a regression equation).

As a rule of thumb, questions related to thoughts, beliefs, and feelings work well in focus groups. Take your time formulating strong questions, paying special attention to phrasing. Be careful to avoid leading questions , which can bias your responses.

Overall, your focus group questions should be:

  • Open-ended and flexible
  • Impossible to answer with “yes” or “no” (questions that start with “why” or “how” are often best)
  • Unambiguous, getting straight to the point while still stimulating discussion
  • Unbiased and neutral

A structured interview is a data collection method that relies on asking questions in a set order to collect data on a topic. Structured interviews are often quantitative in nature and are best used when:

  • You already have a very clear understanding of your topic. Perhaps significant research has already been conducted, or you have done some prior research yourself. Either way, you already possess a baseline for designing strong structured questions.
  • You are constrained in terms of time or resources and need to analyze your data quickly and efficiently.
  • Your research question depends on strong parity between participants, with environmental conditions held constant.

More flexible interview options include semi-structured interviews , unstructured interviews , and focus groups .

Social desirability bias is the tendency for interview participants to give responses that will be viewed favorably by the interviewer or other participants. It occurs in all types of interviews and surveys , but is most common in semi-structured interviews , unstructured interviews , and focus groups .

Social desirability bias can be mitigated by ensuring participants feel at ease and comfortable sharing their views. Make sure to pay attention to your own body language and any physical or verbal cues, such as nodding or widening your eyes.

This type of bias can also occur in observations if the participants know they’re being observed. They might alter their behavior accordingly.

The interviewer effect is a type of bias that emerges when a characteristic of an interviewer (race, age, gender identity, etc.) influences the responses given by the interviewee.

There is a risk of an interviewer effect in all types of interviews, but it can be mitigated by writing high-quality interview questions.

A semi-structured interview is a blend of structured and unstructured types of interviews. Semi-structured interviews are best used when:

  • You have prior interview experience. Spontaneous questions are deceptively challenging, and it’s easy to accidentally ask a leading question or make a participant uncomfortable.
  • Your research question is exploratory in nature. Participant answers can guide future research questions and help you develop a more robust knowledge base for future research.

An unstructured interview is the most flexible type of interview, but it is not always the best fit for your research topic.

Unstructured interviews are best used when:

  • You are an experienced interviewer and have a very strong background in your research topic, since it is challenging to ask spontaneous, colloquial questions.
  • Your research question is exploratory in nature. While you may have developed hypotheses, you are open to discovering new or shifting viewpoints through the interview process.
  • You are seeking descriptive data, and are ready to ask questions that will deepen and contextualize your initial thoughts and hypotheses.
  • Your research depends on forming connections with your participants and making them feel comfortable revealing deeper emotions, lived experiences, or thoughts.

The four most common types of interviews are:

  • Structured interviews : The questions are predetermined in both topic and order. 
  • Semi-structured interviews : A few questions are predetermined, but other questions aren’t planned.
  • Unstructured interviews : None of the questions are predetermined.
  • Focus group interviews : The questions are presented to a group instead of one individual.

Deductive reasoning is commonly used in scientific research, and it’s especially associated with quantitative research .

In research, you might have come across something called the hypothetico-deductive method . It’s the scientific method of testing hypotheses to check whether your predictions are substantiated by real-world data.

Deductive reasoning is a logical approach where you progress from general ideas to specific conclusions. It’s often contrasted with inductive reasoning , where you start with specific observations and form general conclusions.

Deductive reasoning is also called deductive logic.

There are many different types of inductive reasoning that people use formally or informally.

Here are a few common types:

  • Inductive generalization : You use observations about a sample to come to a conclusion about the population it came from.
  • Statistical generalization: You use specific numbers about samples to make statements about populations.
  • Causal reasoning: You make cause-and-effect links between different things.
  • Sign reasoning: You make a conclusion about a correlational relationship between different things.
  • Analogical reasoning: You make a conclusion about something based on its similarities to something else.

Inductive reasoning is a bottom-up approach, while deductive reasoning is top-down.

Inductive reasoning takes you from the specific to the general, while in deductive reasoning, you make inferences by going from general premises to specific conclusions.

In inductive research , you start by making observations or gathering data. Then, you take a broad scan of your data and search for patterns. Finally, you make general conclusions that you might incorporate into theories.

Inductive reasoning is a method of drawing conclusions by going from the specific to the general. It’s usually contrasted with deductive reasoning, where you proceed from general information to specific conclusions.

Inductive reasoning is also called inductive logic or bottom-up reasoning.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Triangulation can help:

  • Reduce research bias that comes from using a single method, theory, or investigator
  • Enhance validity by approaching the same topic with different tools
  • Establish credibility by giving you a complete picture of the research problem

But triangulation can also pose problems:

  • It’s time-consuming and labor-intensive, often involving an interdisciplinary team.
  • Your results may be inconsistent or even contradictory.

There are four main types of triangulation :

  • Data triangulation : Using data from different times, spaces, and people
  • Investigator triangulation : Involving multiple researchers in collecting or analyzing data
  • Theory triangulation : Using varying theoretical perspectives in your research
  • Methodological triangulation : Using different methodologies to approach the same topic

Many academic fields use peer review , largely to determine whether a manuscript is suitable for publication. Peer review enhances the credibility of the published manuscript.

However, peer review is also common in non-academic settings. The United Nations, the European Union, and many individual nations use peer review to evaluate grant applications. It is also widely used in medical and health-related fields as a teaching or quality-of-care measure. 

Peer assessment is often used in the classroom as a pedagogical tool. Both receiving feedback and providing it are thought to enhance the learning process, helping students think critically and collaboratively.

Peer review can stop obviously problematic, falsified, or otherwise untrustworthy research from being published. It also represents an excellent opportunity to get feedback from renowned experts in your field. It acts as a first defense, helping you ensure your argument is clear and that there are no gaps, vague terms, or unanswered questions for readers who weren’t involved in the research process.

Peer-reviewed articles are considered a highly credible source due to this stringent process they go through before publication.

In general, the peer review process includes the following steps:

  • First, the author submits the manuscript to the editor.
  • Next, the editor decides whether to:
      • Reject the manuscript and send it back to the author, or
      • Send it onward to the selected peer reviewer(s)
  • Next, the peer review process occurs. The reviewer provides feedback, addressing any major or minor issues with the manuscript, and gives their advice regarding what edits should be made.
  • Lastly, the edited manuscript is sent back to the author. They input the edits and resubmit it to the editor for publication.

Exploratory research is often used when the issue you’re studying is new or when the data collection process is challenging for some reason.

You can use exploratory research if you have a general idea or a specific question that you want to study but there is no preexisting knowledge or paradigm with which to study it.

Exploratory research is a methodological approach that explores research questions that have not previously been studied in depth. It is often used when the issue you’re studying is new, or the data collection process is challenging in some way.

Explanatory research is used to investigate how or why a phenomenon occurs. Therefore, this type of research is often one of the first stages in the research process , serving as a jumping-off point for future research.

Exploratory research aims to explore the main aspects of an under-researched problem, while explanatory research aims to explain the causes and consequences of a well-defined problem.

Explanatory research is a research method used to investigate how or why something occurs when only a small amount of information is available pertaining to that topic. It can help you increase your understanding of a given topic.

Clean data are valid, accurate, complete, consistent, unique, and uniform. Dirty data include inconsistencies and errors.

Dirty data can come from any part of the research process, including poor research design , inappropriate measurement materials, or flawed data entry.

Data cleaning takes place between data collection and data analyses. But you can use some methods even before collecting data.

For clean data, you should start by designing measures that collect valid data. Data validation at the time of data entry or collection helps you minimize the amount of data cleaning you’ll need to do.

After data collection, you can use data standardization and data transformation to clean your data. You’ll also deal with any missing values, outliers, and duplicate values.

Every dataset requires different techniques to clean dirty data , but you need to address these issues in a systematic way. You focus on finding and resolving data points that don’t agree or fit with the rest of your dataset.

These data might include missing values, outliers, duplicate values, incorrectly formatted entries, or irrelevant data. You’ll start with screening and diagnosing your data. Then, you’ll often standardize and accept or remove data to make your dataset consistent and valid.
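A screening pass of this kind might look like the following Python sketch. The records, the `weight_kg` field, and the plausible range of 30 to 250 kg are assumptions chosen for illustration. A real project would set its own rules and decide whether flagged values are corrected, accepted, or removed.

```python
raw = [
    {"id": 1, "weight_kg": "70.5"},
    {"id": 2, "weight_kg": " 68 "},  # stray whitespace (formatting issue)
    {"id": 2, "weight_kg": " 68 "},  # duplicate record
    {"id": 3, "weight_kg": ""},      # missing value
    {"id": 4, "weight_kg": "702"},   # implausible value, likely an entry error
    {"id": 5, "weight_kg": "74.0"},
]

def clean(records, low=30.0, high=250.0):
    """Standardize values, drop duplicates, and flag missing or
    out-of-range entries instead of silently keeping them."""
    seen_ids = set()
    cleaned, flagged = [], []
    for rec in records:
        if rec["id"] in seen_ids:  # keep only the first copy of each record
            continue
        seen_ids.add(rec["id"])
        text = rec["weight_kg"].strip()  # standardize formatting
        if not text:
            flagged.append((rec["id"], "missing"))
            continue
        value = float(text)
        if not low <= value <= high:
            flagged.append((rec["id"], "out of range"))
            continue
        cleaned.append({"id": rec["id"], "weight_kg": value})
    return cleaned, flagged

cleaned, flagged = clean(raw)
print(cleaned)
print(flagged)
```

Keeping a separate `flagged` list documents what was set aside and why, which supports a systematic rather than ad hoc cleaning process.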

Data cleaning is necessary for valid and appropriate analyses. Dirty data contain inconsistencies or errors , but cleaning your data helps you minimize or resolve these.

Without data cleaning, you could end up with a Type I or II error in your conclusion. These types of erroneous conclusions can be practically significant with important consequences, because they lead to misplaced investments or missed opportunities.

Data cleaning involves spotting and resolving potential data inconsistencies or errors to improve your data quality. An error is any value (e.g., recorded weight) that doesn’t reflect the true value (e.g., actual weight) of something that’s being measured.

In this process, you review, analyze, detect, modify, or remove “dirty” data to make your dataset “clean.” Data cleaning is also called data cleansing or data scrubbing.

Research misconduct means making up or falsifying data, manipulating data analyses, or misrepresenting results in research reports. It’s a form of academic fraud.

These actions are committed intentionally and can have serious consequences; research misconduct is not a simple mistake or a point of disagreement but a serious ethical failure.

Anonymity means you don’t know who the participants are, while confidentiality means you know who they are but remove identifying information from your research report. Both are important ethical considerations .

You can only guarantee anonymity by not collecting any personally identifying information—for example, names, phone numbers, email addresses, IP addresses, physical characteristics, photos, or videos.

You can keep data confidential by using aggregate information in your research report, so that you only refer to groups of participants rather than individuals.

Research ethics matter for scientific integrity, human rights and dignity, and collaboration between science and society. These principles make sure that participation in studies is voluntary, informed, and safe.

Ethical considerations in research are a set of principles that guide your research designs and practices. These principles include voluntary participation, informed consent, anonymity, confidentiality, potential for harm, and results communication.

Scientists and researchers must always adhere to a certain code of conduct when collecting data from others .

These considerations protect the rights of research participants, enhance research validity , and maintain scientific integrity.

In multistage sampling , you can use probability or non-probability sampling methods .

For a probability sample, you have to conduct probability sampling at every stage.

You can mix it up by using simple random sampling , systematic sampling , or stratified sampling to select units at different stages, depending on what is applicable and relevant to your study.

Multistage sampling can simplify data collection when you have large, geographically spread samples, and you can obtain a probability sample without a complete sampling frame.

But multistage sampling may not lead to a representative sample, and larger samples are needed for multistage samples to achieve the statistical properties of simple random samples .

These are four of the most common mixed methods designs :

  • Convergent parallel: Quantitative and qualitative data are collected at the same time and analyzed separately. After both analyses are complete, compare your results to draw overall conclusions. 
  • Embedded: Quantitative and qualitative data are collected at the same time, but within a larger quantitative or qualitative design. One type of data is secondary to the other.
  • Explanatory sequential: Quantitative data is collected and analyzed first, followed by qualitative data. You can use this design if you think your qualitative data will explain and contextualize your quantitative findings.
  • Exploratory sequential: Qualitative data is collected and analyzed first, followed by quantitative data. You can use this design if you think the quantitative data will confirm or validate your qualitative findings.

Triangulation in research means using multiple datasets, methods, theories and/or investigators to address a research question. It’s a research strategy that can help you enhance the validity and credibility of your findings.

Triangulation is mainly used in qualitative research , but it’s also commonly applied in quantitative research . Mixed methods research always uses triangulation.

In multistage sampling , or multistage cluster sampling, you draw a sample from a population using smaller and smaller groups at each stage.

This method is often used to collect data from a large, geographically spread group of people in national surveys, for example. You take advantage of hierarchical groupings (e.g., from state to city to neighborhood) to create a sample that’s less expensive and time-consuming to collect data from.
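The staged narrowing can be sketched in Python. The three-level hierarchy (states, cities, neighborhoods), the generated names, and the `multistage_sample` function are hypothetical. Simple random sampling is used at every stage here, which is only one of the options available.

```python
import random

def multistage_sample(hierarchy, n_states, n_cities, n_neighborhoods, rng):
    """Sample states, then cities within them, then neighborhoods within those."""
    sample = []
    for state in rng.sample(sorted(hierarchy), n_states):
        cities = hierarchy[state]
        for city in rng.sample(sorted(cities), n_cities):
            picked = rng.sample(cities[city], n_neighborhoods)
            sample.extend((state, city, hood) for hood in picked)
    return sample

# Hypothetical hierarchy: 5 states x 4 cities x 10 neighborhoods.
hierarchy = {
    f"state_{s}": {
        f"city_{s}{c}": [f"hood_{s}{c}{h}" for h in range(10)]
        for c in range(4)
    }
    for s in range(5)
}

rng = random.Random(1)
sample = multistage_sample(hierarchy, n_states=2, n_cities=2, n_neighborhoods=3, rng=rng)
```

Only the neighborhoods inside the selected cities ever need to be listed, which is why a complete sampling frame for the whole population is not required.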

No, the steepness or slope of the line isn’t related to the correlation coefficient value. The correlation coefficient only tells you how closely your data fit on a line, so two datasets with the same correlation coefficient can have very different slopes.

To find the slope of the line, you’ll need to perform a regression analysis .
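A small Python sketch makes the point concrete. The data are invented, and `pearson_r` and `ols_slope` are helpers written for this example: the second dataset is the first one multiplied by 10, so the fit around the line is unchanged while the slope is ten times steeper.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def ols_slope(x, y):
    """Slope of the least-squares regression line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / sum((a - mean_x) ** 2 for a in x)

x = [1, 2, 3, 4, 5]
y_shallow = [1.1, 1.9, 3.2, 3.8, 5.0]
y_steep = [10 * v for v in y_shallow]  # same pattern, ten times steeper

r_shallow, r_steep = pearson_r(x, y_shallow), pearson_r(x, y_steep)
slope_shallow, slope_steep = ols_slope(x, y_shallow), ols_slope(x, y_steep)

print(f"r: {r_shallow:.3f} vs {r_steep:.3f}")
print(f"slope: {slope_shallow:.2f} vs {slope_steep:.2f}")
```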

Correlation coefficients always range between -1 and 1.

The sign of the coefficient tells you the direction of the relationship: a positive value means the variables change together in the same direction, while a negative value means they change together in opposite directions.

The absolute value of a number is equal to the number without its sign. The absolute value of a correlation coefficient tells you the magnitude of the correlation: the greater the absolute value, the stronger the correlation.

These are the assumptions your data must meet if you want to use Pearson’s r :

  • Both variables are on an interval or ratio level of measurement
  • Data from both variables follow normal distributions
  • Your data have no outliers
  • Your data are from a random or representative sample
  • You expect a linear relationship between the two variables

Quantitative research designs can be divided into two main categories:

  • Correlational and descriptive designs are used to investigate characteristics, averages, trends, and associations between variables.
  • Experimental and quasi-experimental designs are used to test causal relationships .

Qualitative research designs tend to be more flexible. Common types of qualitative design include case study , ethnography , and grounded theory designs.

A well-planned research design helps ensure that your methods match your research aims, that you collect high-quality data, and that you use the right kind of analysis to answer your questions, utilizing credible sources . This allows you to draw valid , trustworthy conclusions.

The priorities of a research design can vary depending on the field, but you usually have to specify:

  • Your research questions and/or hypotheses
  • Your overall approach (e.g., qualitative or quantitative )
  • The type of design you’re using (e.g., a survey , experiment , or case study )
  • Your sampling methods or criteria for selecting subjects
  • Your data collection methods (e.g., questionnaires , observations)
  • Your data collection procedures (e.g., operationalization , timing and data management)
  • Your data analysis methods (e.g., statistical tests or thematic analysis)

A research design is a strategy for answering your research question. It defines your overall approach and determines how you will collect and analyze data.

Questionnaires can be self-administered or researcher-administered.

Self-administered questionnaires can be delivered online or in paper-and-pen formats, in person or through mail. All questions are standardized so that all respondents receive the same questions with identical wording.

Researcher-administered questionnaires are interviews that take place by phone, in-person, or online between researchers and respondents. You can gain deeper insights by clarifying questions for respondents or asking follow-up questions.

You can organize the questions logically, with a clear progression from simple to complex, or randomly between respondents. A logical flow helps respondents process the questionnaire more easily and quickly, but it may lead to bias. Randomization can minimize the bias from order effects.
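Randomizing question order between respondents could be implemented along these lines. The question identifiers, the `study_seed` value, and the `question_order` function are hypothetical. Seeding by respondent is a design choice that makes each person's order reproducible for later analysis.

```python
import random

QUESTIONS = ["q1_age", "q2_income", "q3_satisfaction", "q4_recommend"]

def question_order(respondent_id, questions, study_seed=2024):
    """Return a shuffled copy of the questions for one respondent.
    The same respondent always gets the same order."""
    rng = random.Random(f"{study_seed}-{respondent_id}")
    order = list(questions)  # copy so the master list stays untouched
    rng.shuffle(order)
    return order

for respondent_id in range(3):
    print(respondent_id, question_order(respondent_id, QUESTIONS))
```

Because each respondent sees every question exactly once, any effect of a question's position is spread across respondents instead of always favoring the same items.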

Closed-ended, or restricted-choice, questions offer respondents a fixed set of choices to select from. These questions are easier to answer quickly.

Open-ended or long-form questions allow respondents to answer in their own words. Because there are no restrictions on their choices, respondents can answer in ways that researchers may not have otherwise considered.

A questionnaire is a data collection tool or instrument, while a survey is an overarching research method that involves collecting and analyzing data from people using questionnaires.

The third variable and directionality problems are two main reasons why correlation isn’t causation .

The third variable problem means that a confounding variable affects both variables to make them seem causally related when they are not.

The directionality problem is when two variables correlate and might actually have a causal relationship, but it’s impossible to conclude which variable causes changes in the other.

Correlation describes an association between variables : when one variable changes, so does the other. A correlation is a statistical indicator of the relationship between variables.

Causation means that changes in one variable bring about changes in the other (i.e., there is a cause-and-effect relationship between variables). The two variables are correlated with each other, and there’s also a causal link between them.

While causation and correlation can exist simultaneously, correlation does not imply causation. In other words, correlation is simply a relationship where A relates to B, but A doesn’t necessarily cause B to happen (or vice versa). Mistaking correlation for causation is a common error and can lead to the false cause fallacy.

Controlled experiments establish causality, whereas correlational studies only show associations between variables.

  • In an experimental design , you manipulate an independent variable and measure its effect on a dependent variable. Other variables are controlled so they can’t impact the results.
  • In a correlational design , you measure variables without manipulating any of them. You can test whether your variables change together, but you can’t be sure that one variable caused a change in another.

In general, correlational research is high in external validity while experimental research is high in internal validity .

A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.

A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.

Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions . The Pearson product-moment correlation coefficient (Pearson’s r ) is commonly used to assess a linear relationship between two quantitative variables.
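To make the coefficient concrete, here is a minimal sketch that computes Pearson's r from its definition (the covariance of the two variables divided by the product of their standard deviations). The data values and variable names are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r: shared variation of x and y, scaled to the range -1 to 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and exam scores for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 65, 70, 79]

print(f"r = {pearson_r(hours_studied, exam_scores):.3f}")
```

A value near +1 or -1 indicates a strong linear relationship, and the sign gives its direction.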

A correlational research design investigates relationships between two variables (or more) without the researcher controlling or manipulating any of them. It’s a non-experimental type of quantitative research .

A correlation reflects the strength and/or direction of the association between two or more variables.

  • A positive correlation means that both variables change in the same direction.
  • A negative correlation means that the variables change in opposite directions.
  • A zero correlation means there’s no relationship between the variables.

Random error is almost always present in scientific studies, even in highly controlled settings. While you can’t eradicate it completely, you can reduce random error by taking repeated measurements, using a large sample, and controlling extraneous variables.

You can avoid systematic error through careful design of your sampling , data collection , and analysis procedures. For example, use triangulation to measure your variables using multiple methods; regularly calibrate instruments or procedures; use random sampling and random assignment ; and apply masking (blinding) where possible.

Systematic error is generally a bigger problem in research.

With random error, multiple measurements will tend to cluster around the true value. When you’re collecting data from a large sample , the errors in different directions will cancel each other out.

Systematic errors are much more problematic because they can skew your data away from the true value. This can lead you to false conclusions ( Type I and II errors ) about the relationship between the variables you’re studying.

Random and systematic error are two types of measurement error.

Random error is a chance difference between the observed and true values of something (e.g., a researcher misreading a weighing scale records an incorrect measurement).

Systematic error is a consistent or proportional difference between the observed and true values of something (e.g., a miscalibrated scale consistently records weights as higher than they actually are).
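The difference between the two error types can be seen in a short simulation. This is a sketch under assumed values: the true weight, the size of the bias, and the noise level are all hypothetical.

```python
import random
import statistics

TRUE_WEIGHT = 70.0  # hypothetical true value, in kg

def simulate_readings(n, bias=0.0, noise_sd=0.5, seed=0):
    """Simulate n scale readings with random noise and an optional constant bias."""
    rng = random.Random(seed)
    return [TRUE_WEIGHT + bias + rng.gauss(0, noise_sd) for _ in range(n)]

# Random error only: readings scatter around the true value.
random_only = simulate_readings(10_000)

# Systematic error: a miscalibrated scale adds 2 kg to every reading.
miscalibrated = simulate_readings(10_000, bias=2.0)

print(f"mean with random error only: {statistics.mean(random_only):.2f}")
print(f"mean with systematic error:  {statistics.mean(miscalibrated):.2f}")
```

Averaging many readings brings the first mean close to the true value, while the second stays shifted by the bias no matter how many readings are taken.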

On graphs, the explanatory variable is conventionally placed on the x-axis, while the response variable is placed on the y-axis.

  • If you have quantitative variables , use a scatterplot or a line graph.
  • If your response variable is categorical, use a scatterplot or a line graph.
  • If your explanatory variable is categorical, use a bar graph.

The term “ explanatory variable ” is sometimes preferred over “ independent variable ” because, in real world contexts, independent variables are often influenced by other variables. This means they aren’t totally independent.

Multiple independent variables may also be correlated with each other, so “explanatory variables” is a more appropriate term.

The difference between explanatory and response variables is simple:

  • An explanatory variable is the expected cause, and it explains the results.
  • A response variable is the expected effect, and it responds to other variables.

In a controlled experiment , all extraneous variables are held constant so that they can’t influence the results. Controlled experiments require:

  • A control group that receives a standard treatment, a fake treatment, or no treatment.
  • Random assignment of participants to ensure the groups are equivalent.

Depending on your study topic, there are various other methods of controlling variables .

There are 4 main types of extraneous variables :

  • Demand characteristics : environmental cues that encourage participants to conform to researchers’ expectations.
  • Experimenter effects : unintentional actions by researchers that influence study outcomes.
  • Situational variables : environmental variables that alter participants’ behaviors.
  • Participant variables : any characteristic or aspect of a participant’s background that could affect study results.

An extraneous variable is any variable that you’re not investigating that can potentially affect the dependent variable of your research study.

A confounding variable is a type of extraneous variable that not only affects the dependent variable, but is also related to the independent variable.

In a factorial design, multiple independent variables are tested.

If you test two variables, each level of one independent variable is combined with each level of the other independent variable to create different conditions.
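As a small illustration, the conditions of a factorial design can be listed by crossing the levels of each independent variable. The variables and levels below (caffeine dose and hours of sleep) are hypothetical.

```python
from itertools import product

# Hypothetical independent variables and their levels.
caffeine = ["none", "low", "high"]
sleep = ["4 hours", "8 hours"]

# Every level of one variable is combined with every level of the other.
conditions = list(product(caffeine, sleep))

for number, (dose, hours) in enumerate(conditions, start=1):
    print(f"Condition {number}: caffeine={dose}, sleep={hours}")

print(f"{len(caffeine)} x {len(sleep)} design = {len(conditions)} conditions")
```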

Within-subjects designs have many potential threats to internal validity , but they are also very statistically powerful .

Advantages:

  • Only requires small samples
  • Statistically powerful
  • Removes the effects of individual differences on the outcomes

Disadvantages:

  • Internal validity threats reduce the likelihood of establishing a direct relationship between variables
  • Time-related effects, such as growth, can influence the outcomes
  • Carryover effects mean that the specific order of different treatments affects the outcomes

While a between-subjects design has fewer threats to internal validity , it also requires more participants for high statistical power than a within-subjects design .

Advantages:

  • Prevents carryover effects of learning and fatigue.
  • Shorter study duration.

Disadvantages:

  • Needs larger samples for high power.
  • Uses more resources to recruit participants, administer sessions, cover costs, etc.
  • Individual differences may be an alternative explanation for results.

Yes. Between-subjects and within-subjects designs can be combined in a single study when you have two or more independent variables (a factorial design). In a mixed factorial design, one variable is altered between subjects and another is altered within subjects.

Random assignment is used in experiments with a between-groups or independent measures design. In this research design, there’s usually a control group and one or more experimental groups. Random assignment helps ensure that the groups are comparable.

In general, you should always use random assignment in this type of experimental design when it is ethically possible and makes sense for your study topic.

To implement random assignment , assign a unique number to every member of your study’s sample .

Then, you can use a random number generator or a lottery method to randomly assign each number to a control or experimental group. You can also do so manually, by flipping a coin or rolling a die to randomly assign participants to groups.
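The procedure above can be sketched with a random number generator. This is one possible approach, not the only one: it shuffles the participant numbers and deals them out in turn, which also keeps group sizes equal. The function name, group labels, and seed are assumptions for illustration.

```python
import random

def randomly_assign(participant_ids, groups=("control", "experimental"), seed=None):
    """Shuffle participant numbers, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, participant in enumerate(shuffled):
        # Alternating through the groups keeps their sizes balanced.
        assignment[groups[index % len(groups)]].append(participant)
    return assignment

# Each member of a hypothetical sample of 20 has a unique number from 1 to 20.
assignment = randomly_assign(range(1, 21), seed=42)
for group, members in assignment.items():
    print(group, sorted(members))
```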

Random selection, or random sampling , is a way of selecting members of a population for your study’s sample.

In contrast, random assignment is a way of sorting the sample into control and experimental groups.

Random sampling enhances the external validity or generalizability of your results, while random assignment improves the internal validity of your study.

In experimental research, random assignment is a way of placing participants from your sample into different groups using randomization. With this method, every member of the sample has a known or equal chance of being placed in a control group or an experimental group.

“Controlling for a variable” means measuring extraneous variables and accounting for them statistically to remove their effects on other variables.

Researchers often model control variable data along with independent and dependent variable data in regression analyses and ANCOVAs . That way, you can isolate the control variable’s effects from the relationship between the variables of interest.

Control variables help you establish a correlational or causal relationship between variables by enhancing internal validity .

If you don’t control relevant extraneous variables , they may influence the outcomes of your study, and you may not be able to demonstrate that your results are really an effect of your independent variable .

A control variable is any variable that’s held constant in a research study. It’s not a variable of interest in the study, but it’s controlled because it could influence the outcomes.

Including mediators and moderators in your research helps you go beyond studying a simple relationship between two variables for a fuller picture of the real world. They are important to consider when studying complex correlational or causal relationships.

Mediators are part of the causal pathway of an effect, and they tell you how or why an effect takes place. Moderators usually help you judge the external validity of your study by identifying the limitations of when the relationship between variables holds.

If something is a mediating variable :

  • It’s caused by the independent variable .
  • It influences the dependent variable
  • When it’s taken into account, the statistical correlation between the independent and dependent variables is higher than when it isn’t considered.

A confounder is a third variable that affects variables of interest and makes them seem related when they are not. In contrast, a mediator is the mechanism of a relationship between two variables: it explains the process by which they are related.

A mediator variable explains the process through which two variables are related, while a moderator variable affects the strength and direction of that relationship.

There are three key steps in systematic sampling :

  • Define and list your population , ensuring that it is not ordered in a cyclical or periodic order.
  • Decide on your sample size and calculate your interval, k, by dividing your population size by your target sample size.
  • Choose every k th member of the population as your sample.
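The three steps can be sketched as follows. The random starting point within the first interval is an assumption added here (a common convention, not something the steps above specify), and the population is a hypothetical numbered list.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Select every k-th member, where k = population size // sample size."""
    k = len(population) // sample_size
    if k < 1:
        raise ValueError("sample_size cannot exceed the population size")
    # Assumption: begin at a random position within the first interval.
    start = random.Random(seed).randrange(k)
    return population[start::k][:sample_size]

# Hypothetical list of 1,500 people and a target sample of 100, so k = 15.
population = list(range(1, 1501))
sample = systematic_sample(population, 100, seed=3)

print(sample[:5])
print(len(sample))
```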

Systematic sampling is a probability sampling method where researchers select members of the population at a regular interval – for example, by selecting every 15th person on a list of the population. If the population is in a random order, this can imitate the benefits of simple random sampling .

Yes, you can create a stratified sample using multiple characteristics, but you must ensure that every participant in your study belongs to one and only one subgroup. In this case, you multiply the numbers of subgroups for each characteristic to get the total number of groups.

For example, if you were stratifying by location with three subgroups (urban, rural, or suburban) and marital status with five subgroups (single, divorced, widowed, married, or partnered), you would have 3 x 5 = 15 subgroups.
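The subgroup count in this example can be checked by crossing the two characteristics, as in this short sketch:

```python
from itertools import product

locations = ["urban", "rural", "suburban"]
marital_statuses = ["single", "divorced", "widowed", "married", "partnered"]

# Each stratum is one (location, marital status) pair, so every participant
# falls into exactly one subgroup.
strata = list(product(locations, marital_statuses))

print(len(strata))
print(strata[:3])
```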

You should use stratified sampling when your sample can be divided into mutually exclusive and exhaustive subgroups that you believe will take on different mean values for the variable that you’re studying.

Using stratified sampling will allow you to obtain more precise (with lower variance ) statistical estimates of whatever you are trying to measure.

For example, say you want to investigate how income differs based on educational attainment, but you know that this relationship can vary based on race. Using stratified sampling, you can ensure you obtain a large enough sample from each racial group, allowing you to draw more precise conclusions.

In stratified sampling , researchers divide subjects into subgroups called strata based on characteristics that they share (e.g., race, gender, educational attainment).

Once divided, each subgroup is randomly sampled using another probability sampling method.
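A minimal sketch of this two-step process is shown below. It assumes simple random sampling within each stratum and the same sampling fraction for every stratum; the records, field names, and fraction are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=None):
    """Group records into strata, then randomly sample the same fraction of each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical sampling frame: 100 people stratified by educational attainment.
people = (
    [{"id": i, "education": "high school"} for i in range(60)]
    + [{"id": i, "education": "bachelor"} for i in range(60, 90)]
    + [{"id": i, "education": "graduate"} for i in range(90, 100)]
)

sample = stratified_sample(people, lambda p: p["education"], fraction=0.2, seed=7)
print(len(sample))
```

Sampling the same fraction from each stratum keeps the sample proportional to the population; a study that needs more precision for small groups might use different fractions instead.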

Cluster sampling is more time- and cost-efficient than other probability sampling methods , particularly when it comes to large samples spread across a wide geographical area.

However, it provides less statistical certainty than other methods, such as simple random sampling , because it is difficult to ensure that your clusters properly represent the population as a whole.

There are three types of cluster sampling : single-stage, double-stage and multi-stage clustering. In all three types, you first divide the population into clusters, then randomly select clusters for use in your sample.

  • In single-stage sampling , you collect data from every unit within the selected clusters.
  • In double-stage sampling , you select a random sample of units from within the clusters.
  • In multi-stage sampling , you repeat the procedure of randomly sampling elements from within the clusters until you have reached a manageable sample.
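Single-stage and double-stage cluster sampling can be sketched as below. The function, the school and student identifiers, and the cluster counts are hypothetical, chosen only to illustrate the difference between the two.

```python
import random

def cluster_sample(clusters, n_clusters, units_per_cluster=None, seed=None):
    """Randomly select clusters, then take all units or a random subset of each.

    clusters: dict mapping a cluster name to its list of units.
    units_per_cluster=None gives single-stage sampling (every unit is kept);
    an integer gives double-stage sampling (a random sample within each cluster).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    selected = {}
    for name in chosen:
        units = clusters[name]
        if units_per_cluster is None:
            selected[name] = list(units)
        else:
            selected[name] = rng.sample(units, min(units_per_cluster, len(units)))
    return selected

# Hypothetical population: 10 schools with 30 students each.
schools = {f"school_{i}": [f"student_{i}_{j}" for j in range(30)] for i in range(10)}

single_stage = cluster_sample(schools, n_clusters=3, seed=1)
double_stage = cluster_sample(schools, n_clusters=3, units_per_cluster=5, seed=1)

print({name: len(units) for name, units in single_stage.items()})
print({name: len(units) for name, units in double_stage.items()})
```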

Cluster sampling is a probability sampling method in which you divide a population into clusters, such as districts or schools, and then randomly select some of these clusters as your sample.

The clusters should ideally each be mini-representations of the population as a whole.

If properly implemented, simple random sampling is usually the best sampling method for ensuring both internal and external validity. However, it can sometimes be impractical and expensive to implement, depending on the size of the population to be studied.

If you have a list of every member of the population and the ability to reach whichever members are selected, you can use simple random sampling.

The American Community Survey is an example of simple random sampling. In order to collect detailed data on the population of the US, Census Bureau officials randomly select 3.5 million households per year and use a variety of methods to convince them to fill out the survey.

Simple random sampling is a type of probability sampling in which the researcher randomly selects a subset of participants from a population . Each member of the population has an equal chance of being selected. Data is then collected from as large a percentage as possible of this random subset.

Quasi-experimental design is most useful in situations where it would be unethical or impractical to run a true experiment .

Quasi-experiments have lower internal validity than true experiments, but they often have higher external validity as they can use real-world interventions instead of artificial laboratory settings.

A quasi-experiment is a type of research design that attempts to establish a cause-and-effect relationship. The main difference with a true experiment is that the groups are not randomly assigned.

Blinding is important to reduce research bias (e.g., observer bias , demand characteristics ) and ensure a study’s internal validity .

If participants know whether they are in a control or treatment group , they may adjust their behavior in ways that affect the outcome that researchers are trying to measure. If the people administering the treatment are aware of group assignment, they may treat participants differently and thus directly or indirectly influence the final results.

  • In a single-blind study , only the participants are blinded.
  • In a double-blind study , both participants and experimenters are blinded.
  • In a triple-blind study , the assignment is hidden not only from participants and experimenters, but also from the researchers analyzing the data.

Blinding means hiding who is assigned to the treatment group and who is assigned to the control group in an experiment .

A true experiment (a.k.a. a controlled experiment) always includes at least one control group that doesn’t receive the experimental treatment.

However, some experiments use a within-subjects design to test treatments without a control group. In these designs, you usually compare one group’s outcomes before and after a treatment (instead of comparing outcomes between different groups).

For strong internal validity , it’s usually best to include a control group if possible. Without a control group, it’s harder to be certain that the outcome was caused by the experimental treatment and not by other variables.

An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.

Individual Likert-type questions are generally considered ordinal data, because the items have a clear rank order, but the distances between response options can’t be assumed to be equal.

Overall Likert scale scores are sometimes treated as interval data. These scores are considered to have directionality and even spacing between them.

The type of data determines what statistical tests you should use to analyze your data.

A Likert scale is a rating scale that quantitatively assesses opinions, attitudes, or behaviors. It is made up of 4 or more questions that measure a single attitude or trait when response scores are combined.

To use a Likert scale in a survey , you present participants with Likert-type questions or statements, and a continuum of items, usually with 5 or 7 possible responses, to capture their degree of agreement.
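As a rough sketch of how responses are combined into an overall score, the snippet below maps a 5-point agreement scale to the numbers 1 to 5 and sums one participant's answers. The response labels and the simple summing rule are assumptions for illustration; real scales may score items differently.

```python
# Hypothetical 5-point agreement scale.
AGREEMENT_SCORES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

def likert_scale_score(responses):
    """Sum the numeric scores of one participant's answers to all items."""
    return sum(AGREEMENT_SCORES[response] for response in responses)

# One participant's answers to four items that measure the same attitude.
answers = ["Agree", "Strongly agree", "Disagree", "Agree"]
print(likert_scale_score(answers))  # 4 + 5 + 2 + 4 = 15
```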

In scientific research, concepts are the abstract ideas or phenomena that are being studied (e.g., educational achievement). Variables are properties or characteristics of the concept (e.g., performance at school), while indicators are ways of measuring or quantifying variables (e.g., yearly grade reports).

The process of turning abstract concepts into measurable variables and indicators is called operationalization .

There are various approaches to qualitative data analysis , but they all share five steps in common:

  • Prepare and organize your data.
  • Review and explore your data.
  • Develop a data coding system.
  • Assign codes to the data.
  • Identify recurring themes.

The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis , thematic analysis , and discourse analysis .

There are five common approaches to qualitative research :

  • Grounded theory involves collecting data in order to develop new theories.
  • Ethnography involves immersing yourself in a group or organization to understand its culture.
  • Narrative research involves interpreting stories to understand how people make sense of their experiences and perceptions.
  • Phenomenological research involves investigating phenomena through people’s lived experiences.
  • Action research links theory and practice in several cycles to drive innovative changes.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
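One way to see what "could have arisen by chance" means is a permutation test, sketched below. This is just one of many possible tests, chosen here for illustration because it needs no distributional formulas; the function name and the data are hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate how often shuffled group labels give a mean difference
    at least as large as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign the group labels at random
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical test scores for a control group and a treatment group.
control = [10, 11, 12, 13, 14, 15, 16, 17]
treatment = [30, 31, 32, 33, 34, 35, 36, 37]

print(permutation_p_value(control, treatment))
```

A small value suggests that a difference this large would rarely appear if group membership had no effect.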

Operationalization means turning abstract conceptual ideas into measurable observations.

For example, the concept of social anxiety isn’t directly observable, but it can be operationally defined in terms of self-rating scores, behavioral avoidance of crowded places, or physical anxiety symptoms in social situations.

Before collecting data , it’s important to consider how you will operationalize the variables that you want to measure.

When conducting research, collecting original data has significant advantages:

  • You can tailor data collection to your specific research aims (e.g. understanding the needs of your consumers or user testing your website)
  • You can control and standardize the process for high reliability and validity (e.g. choosing appropriate measurements and sampling methods )

However, there are also some drawbacks: data collection can be time-consuming, labor-intensive and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

There are several methods you can use to decrease the impact of confounding variables on your research: restriction, matching, statistical control and randomization.

In restriction , you restrict your sample by only including certain subjects that have the same values of potential confounding variables.

In matching , you match each of the subjects in your treatment group with a counterpart in the comparison group. The matched subjects have the same values on any potential confounding variables, and only differ in the independent variable .

In statistical control , you include potential confounders as variables in your regression .

In randomization , you randomly assign the treatment (or independent variable) in your study to a sufficiently large number of subjects, which allows you to control for all potential confounding variables.

A confounding variable is closely related to both the independent and dependent variables in a study. An independent variable represents the supposed cause , while the dependent variable is the supposed effect . A confounding variable is a third variable that influences both the independent and dependent variables.

Failing to account for confounding variables can cause you to wrongly estimate the relationship between your independent and dependent variables.

To ensure the internal validity of your research, you must consider the impact of confounding variables. If you fail to account for them, you might over- or underestimate the causal relationship between your independent and dependent variables , or even find a causal relationship where none exists.

Yes, but including more than one of either type requires multiple research questions .

For example, if you are interested in the effect of a diet on health, you can use multiple measures of health: blood sugar, blood pressure, weight, pulse, and many more. Each of these is its own dependent variable with its own research question.

You could also choose to look at the effect of exercise levels as well as diet, or even the additional effect of the two combined. Each of these is a separate independent variable .

To ensure the internal validity of an experiment , you should only change one independent variable at a time.

No. The value of a dependent variable depends on an independent variable, so a variable cannot be both independent and dependent at the same time. It must be either the cause or the effect, not both!

You want to find out how blood sugar levels are affected by drinking diet soda and regular soda, so you conduct an experiment .

  • The type of soda – diet or regular – is the independent variable .
  • The level of blood sugar that you measure is the dependent variable – it changes depending on the type of soda.

Determining cause and effect is one of the most important parts of scientific research. It’s essential to know which is the cause – the independent variable – and which is the effect – the dependent variable.

In non-probability sampling , the sample is selected based on non-random criteria, and not every member of the population has a chance of being included.

Common non-probability sampling methods include convenience sampling , voluntary response sampling, purposive sampling , snowball sampling, and quota sampling .

Probability sampling means that every member of the target population has a known chance of being included in the sample.

Probability sampling methods include simple random sampling , systematic sampling , stratified sampling , and cluster sampling .

Using careful research design and sampling procedures can help you avoid sampling bias . Oversampling can be used to correct undercoverage bias .

Some common types of sampling bias include self-selection bias , nonresponse bias , undercoverage bias , survivorship bias , pre-screening or advertising bias, and healthy user bias.

Sampling bias is a threat to external validity – it limits the generalizability of your findings to a broader group of people.

A sampling error is the difference between a population parameter and a sample statistic .

A statistic refers to measures about the sample , while a parameter refers to measures about the population .
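The relationship between a parameter, a statistic, and sampling error can be sketched with a simulated population. All values here (a population of 100,000 heights, a sample of 500) are hypothetical.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: heights in cm for 100,000 people.
population = [rng.gauss(170, 10) for _ in range(100_000)]
parameter = statistics.mean(population)   # measure about the population

# Simple random sample of 500 people drawn from that population.
sample = rng.sample(population, 500)
statistic = statistics.mean(sample)       # measure about the sample

sampling_error = statistic - parameter
print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}, "
      f"sampling error = {sampling_error:.2f}")
```

Drawing a different random sample would give a slightly different statistic, and therefore a different sampling error.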

Populations are used when a research question requires data from every member of the population. This is usually only feasible when the population is small and easily accessible.

Samples are used to make inferences about populations . Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.

There are seven threats to external validity : selection bias , history, experimenter effect, Hawthorne effect , testing effect, aptitude-treatment and situation effect.

The two types of external validity are population validity (whether you can generalize to other groups of people) and ecological validity (whether you can generalize to other situations and settings).

The external validity of a study is the extent to which you can generalize your findings to different groups of people, situations, and measures.

Cross-sectional studies cannot establish a cause-and-effect relationship or analyze behavior over a period of time. To investigate cause and effect, you need to do a longitudinal study or an experimental study .

Cross-sectional studies are less expensive and time-consuming than many other types of study. They can provide useful insights into a population’s characteristics and identify correlations for further research.

Sometimes only cross-sectional data is available for analysis; other times your research question may only require a cross-sectional study to answer it.

Longitudinal studies can last anywhere from weeks to decades, although they tend to be at least a year long.

The 1970 British Cohort Study , which has collected data on the lives of 17,000 Brits since their births in 1970, is one well-known example of a longitudinal study .

Longitudinal studies are better for establishing the correct sequence of events, identifying changes over time, and providing insight into cause-and-effect relationships, but they also tend to be more expensive and time-consuming than other types of studies.

Longitudinal studies and cross-sectional studies are two different types of research design . In a cross-sectional study you collect data from a population at a specific point in time; in a longitudinal study you repeatedly collect data from the same sample over an extended period of time.

Longitudinal study | Cross-sectional study
Repeated observations | Observations at a single point in time
Observes the same sample multiple times | Observes different groups (a “cross-section”) in the population
Follows changes in participants over time | Provides a snapshot of society at a given point

There are eight threats to internal validity : history, maturation, instrumentation, testing, selection bias , regression to the mean, social interaction and attrition .

Internal validity is the extent to which you can be confident that a cause-and-effect relationship established in a study cannot be explained by other factors.

In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts and meanings, use qualitative methods .
  • If you want to analyze a large amount of readily-available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

A confounding variable , also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design , it’s important to identify potential confounding variables and plan how you will reduce their impact.

Discrete and continuous variables are two types of quantitative variables :

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .

You can think of independent and dependent variables in terms of cause and effect: an independent variable is the variable you think is the cause , while a dependent variable is the effect .

In an experiment, you manipulate the independent variable and measure the outcome in the dependent variable. For example, in an experiment about the effect of nutrients on crop growth:

  • The independent variable is the amount of nutrients added to the crop field.
  • The dependent variable is the biomass of the crops at harvest time.

Defining your variables, and deciding how you will manipulate and measure them, is an important part of experimental design .

Experimental design means planning a set of procedures to investigate a relationship between variables . To design a controlled experiment, you need:

  • A testable hypothesis
  • At least one independent variable that can be precisely manipulated
  • At least one dependent variable that can be precisely measured

When designing the experiment, you decide:

  • How you will manipulate the variable(s)
  • How you will control for any potential confounding variables
  • How many subjects or samples will be included in the study
  • How subjects will be assigned to treatment levels

Experimental design is essential to the internal and external validity of your experiment.

Internal validity is the degree of confidence that the causal relationship you are testing is not influenced by other factors or variables.

External validity is the extent to which your results can be generalized to other contexts.

The validity of your experiment depends on your experimental design.

Reliability and validity are both about how well a method measures something:

  • Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
  • Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure).

If you are doing experimental research, you also have to consider the internal and external validity of your experiment.

A sample is a subset of individuals from a larger population. Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

In statistics, sampling allows you to test a hypothesis about the characteristics of a population.

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses. Qualitative methods allow you to explore concepts and experiences in more detail.

Methodology refers to the overarching strategy and rationale of your research project. It involves studying the methods used in your field and the theories or principles behind them, in order to develop an approach that matches your objectives.

Methods are the specific tools and procedures you use to collect and analyze data (for example, experiments, surveys, and statistical tests).

In shorter scientific papers, where the aim is to report the findings of a specific study, you might simply describe what you did in a methods section.

In a longer or more complex research project, such as a thesis or dissertation, you will probably include a methodology section, where you explain your approach to answering the research questions and cite relevant sources to support your choice of methods.

Within-subjects designs

Up to now we have considered between-subjects experimental designs, that is, experimental set-ups where each subject contributes a single observation to the data set. However, for good reason many researchers adopt within-subjects experimental designs: designs where a single subject contributes more than one observation to the data. Within-subjects designs have two primary advantages for us:

  • A within-subjects design allows us to get more data from each subject. Practically speaking, this means greater power per subject run. This means to reach a given power level, you need to run fewer subjects, saving time, energy, and money.
  • A within-subjects design allows you to remove variance due to between participant differences from the error term in the ANOVA. This is another way in which within-subjects designs increase power: each subject essentially serves as his/her own control.

Let’s go back to our simple \(3\times1\) one way experiment from Vasishth and Broe. We recorded 3 observations in each of three treatments:

A B C
9 10 8
1 3 1
4 5 2

If this were a between-subjects design, then each observation would come from a different participant, meaning we would have run nine subjects in total. Let’s run the ANOVA on the between-subjects version of this data and see how it goes:

No significant differences between groups.
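The between-subjects calculation can be checked by hand. The following is a minimal sketch in plain Python (standard library only), not the R call the text refers to; the variable names are illustrative assumptions. It treats the nine scores as coming from nine different participants and computes the one-way F ratio.

```python
# Scores from the table above, one list per treatment.
groups = {"A": [9, 1, 4], "B": [10, 3, 5], "C": [8, 1, 2]}


def mean(values):
    return sum(values) / len(values)


all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Between-treatment SS: group size times squared deviation of each group mean.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Within-treatment SS: squared deviations of scores from their own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

df_between = len(groups) - 1               # a - 1 = 2
df_within = len(all_scores) - len(groups)  # N - a = 6

f_between = (ss_between / df_between) / (ss_within / df_within)
print(round(ss_between, 3), round(ss_within, 3), round(f_between, 3))
# prints 8.222 87.333 0.282
```

An F ratio well below 1 on 2 and 6 degrees of freedom is consistent with the absence of a significant difference reported above.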

Suppose, though, that we had a within-subjects design instead of a between-subjects design. A within-subjects version of this same experiment might involve measuring a subject’s response to each of the treatments A, B, and C:

Subj A B C
S1 9 10 8
S2 1 3 1
S3 4 5 2

What goes wrong if we analyze this with our between-subjects ANOVA model?

\(x_{ij} = \mu + \alpha_i + \epsilon_{ij}\)

Well, let’s look at the errors \(\epsilon_{ij}\) that would arise with this model, given this data:

We can see a pretty striking pattern here. Within any subject, the errors are strongly correlated: within subject one, for instance, the errors are always strongly positive. This makes sense: it’s plausible that a subject who gives high scores in one condition will give high scores in another.
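The pattern can be reproduced with a short sketch. Assuming the between-subjects model above, each error is simply the score minus its treatment mean; the Python below (standard library only, names chosen for illustration) prints those residuals one subject per row.

```python
subjects = ["S1", "S2", "S3"]
scores = {"A": [9, 1, 4], "B": [10, 3, 5], "C": [8, 1, 2]}

# Residual under x_ij = mu + alpha_i + e_ij: the score minus its treatment mean.
residuals = {
    t: [x - sum(xs) / len(xs) for x in xs] for t, xs in scores.items()
}

for j, subj in enumerate(subjects):
    row = [round(residuals[t][j], 2) for t in scores]
    print(subj, row)
# prints:
# S1 [4.33, 4.0, 4.33]
# S2 [-3.67, -3.0, -2.67]
# S3 [-0.67, -1.0, -1.67]
```

Every residual for S1 is positive and every residual for S2 and S3 is negative, which is the within-subject correlation described above.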

But this correlation of errors is not innocent. Remember that one of the core assumptions of ANOVA is that our errors are normally distributed and independent of each other. What’s worse, ANOVA is not robust to violations of the independence assumption. So the high correlation among the errors, due to the subject groupings in the data, is a substantial violation of our assumptions. We’re also losing power unnecessarily: our error term, which supplies the denominator of our F statistic, includes a source of variance, by-subject variance, that is not really unexplained. We don’t want that variance to be counted as part of our error term.

To deal with this situation, we’re going to expand our model to include the effect of subjects in the model:

\(x_{ij} = \mu + \alpha_i + \pi_j +\epsilon_{ij}\)

This looks very similar to our two-way ANOVA model from last class: there is an effect of treatment \(\alpha_i\) and an effect of subject \(\pi_j\). We are essentially treating the subject label as if it were a factor like any other.

Now, one thing to note is that the interaction of subject and treatment is essentially indistinguishable from the error term here, so we’ll just leave it out. (Though we will return below to the issue of treatment by subject interactions.)

As usual, our null hypothesis is that the \(\alpha_i\) are all equal to zero.

There is an important difference between these models, however. In the two-way between-subjects model, our two factors were fixed effects: they represented fixed levels of the factors that we were interested in, and we built a model to account specifically for these effects. If we re-ran the experiment, we’d include exactly those same effects. The same is true of our treatments here. However, our factor of subject is not a fixed effect: we are not interested in estimating the behavior of this particular group of subjects. Their selection was random, and they are simply meant to represent the population from which they were drawn. If we re-ran the experiment, we’d likely get different subjects. Subject, we will say, is a random factor. The model for our within-subjects design, then, is a mixed model, containing random and fixed factors.

So how do we test our null hypothesis with this model? Conceptually, we do it the same way as the between subjects design. We first partition the sum of squares into between and within treatment SS:

\(SS_{total} = SS_{between} + SS_{within}\)

The difference in our mixed model is that we can take the within group variance and break it down into two components:

\(SS_{within} = SS_{subjects} + SS_{error}\)

That is, there is within-group variance that we know to be the effect of subjects, and then there is random, unexplained within-group variance.

As before, we can calculate mean squares for our treatment, and now for our error term:

\(MS_{between} = \dfrac{SS_{between}}{a-1}\)

\(MS_{error} = \dfrac{SS_{error}}{(a-1)(n-1)}\)

Where a is the number of levels in our factor, and n is the number of subjects in the experiment. Our F -statistic is calculated as before:

\(F = \dfrac{MS_{between}}{MS_{error}}\)

To calculate this in R, we need to explicitly indicate to R that the error term needs to be split up:

The calculation of this ANOVA proceeds essentially the same as the one-way ANOVA. The only new term is:

\(SS_{subj} = a*\sum_j (\bar{x}_j - \bar{x})^2\)

That is, the sum of squares associated with subjects is the number of treatment levels a times the sum of the squared deviations of the subject means \(\bar{x}_j\) from the grand mean \(\bar{x}\).
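The whole partition can be traced on the three-subject data set. This is a hand computation sketched in plain Python, following the formulas above, rather than the R call the text refers to; the variable names are assumptions made for illustration.

```python
data = {  # rows are subjects, columns are treatments A, B, C
    "S1": [9, 10, 8],
    "S2": [1, 3, 1],
    "S3": [4, 5, 2],
}
rows = list(data.values())
n = len(rows)        # number of subjects
a = len(rows[0])     # number of treatment levels
grand_mean = sum(map(sum, rows)) / (n * a)

treatment_means = [sum(r[i] for r in rows) / n for i in range(a)]
subject_means = [sum(r) / a for r in rows]

ss_total = sum((x - grand_mean) ** 2 for r in rows for x in r)
ss_between = n * sum((m - grand_mean) ** 2 for m in treatment_means)
ss_within = ss_total - ss_between
# The new term: a times the squared deviations of the subject means.
ss_subjects = a * sum((m - grand_mean) ** 2 for m in subject_means)
ss_error = ss_within - ss_subjects

ms_between = ss_between / (a - 1)
ms_error = ss_error / ((a - 1) * (n - 1))
f_stat = ms_between / ms_error
print(round(ss_subjects, 3), round(ss_error, 3), round(f_stat, 2))
# prints 86.222 1.111 14.8
```

Almost all of the within-treatment SS (86.222 of 87.333) is subject variance, so removing it shrinks the error term and raises F from about 0.28 to 14.8 on 2 and 4 degrees of freedom.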

Earlier, I noted that the error term here was essentially the interaction of subject and treatment. This is in fact the case here, although it’s not explicit in the way we calculated it just now. We can check that this is the case, though. Remember how we calculated the SS for an interaction term in our two way ANOVA, given factors A and B:

\(SS_{A:B} = n\sum_i\sum_j (\bar{x}_{ij}-\bar{x}_j - \bar{x}_i+\bar{x})^2\)

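In the same fashion, we could calculate the interaction of subject and treatment in our data. Here each subject contributes one observation per treatment, so each cell mean \(\bar{x}_{ij}\) is just the single score \(x_{ij}\), and the multiplier n, which in the two-way formula counts the observations per cell, is 1:

\(SS_{Treatment:S} = \sum_i\sum_j (x_{ij}-\bar{x}_j - \bar{x}_i+\bar{x})^2\)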

We can confirm that this SS gives us the same value for the error term that our ANOVA did:
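A minimal check in plain Python (a hand computation, not the R code the text refers to), assuming one observation per subject-by-treatment cell so that each cell mean is the score itself:

```python
rows = [[9, 10, 8], [1, 3, 1], [4, 5, 2]]  # subjects x treatments
n, a = len(rows), len(rows[0])
grand_mean = sum(map(sum, rows)) / (n * a)
treatment_means = [sum(r[i] for r in rows) / n for i in range(a)]
subject_means = [sum(r) / a for r in rows]

# One observation per cell, so the cell mean is the score itself.
ss_interaction = sum(
    (rows[j][i] - treatment_means[i] - subject_means[j] + grand_mean) ** 2
    for j in range(n)
    for i in range(a)
)
print(round(ss_interaction, 3))  # prints 1.111
```

The result, 10/9, matches the error SS obtained by subtracting the subject SS from the within-treatment SS.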

A thing to note is that our usual post hoc comparison schemes don’t work currently with repeated measure ANOVAs in R. We’re limited, for the time being, to using paired t -tests to test for differences between treatment groups:
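As an illustration of what a paired t-test computes, here is a sketch in plain Python on the three-subject A, B, C data from earlier. The helper name `paired_t` is an assumption for this example, and the sketch stops at the t statistic because the standard library has no t distribution for p-values.

```python
from itertools import combinations
from math import sqrt

scores = {"A": [9, 1, 4], "B": [10, 3, 5], "C": [8, 1, 2]}


def paired_t(x, y):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / sqrt(var_d / n), n - 1


t_stats = {
    (p, q): paired_t(scores[p], scores[q])[0] for p, q in combinations(scores, 2)
}
for pair, t in t_stats.items():
    print(pair, round(t, 2))
# prints:
# ('A', 'B') -4.0
# ('A', 'C') 1.73
# ('B', 'C') 7.0
```

With 2 degrees of freedom the two-tailed critical value at the .05 level is about 4.303, so before any correction for multiple comparisons only the B versus C difference exceeds it.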

Let’s re-examine the NPI experiment from the other day, and now analyze it as a within-subjects experiment, using our repeated measures ANOVA. First, we might read in the raw data from the experiment:

This analysis allows us to establish that there is a reliable effect of our condition. Recall that we treated subject as a random factor in this model. Our significant F -test gives us license to generalize our effect to other levels of our random factor. In other words, if we drew some other random sample from the same population, we would expect that sample to exhibit the same response to our treatment.

Interestingly, since the 70’s psycholinguists have taken it that items in a psycholinguistic experiment of this sort can be treated as random factors. That is, we may want to think of the particular sentences that we tested in our experiment as random samples from a population of possible sentences, each instantiating some treatment of interest. From this point of view, we might be interested in asking whether our effect of condition would generalize to a new sample of items. In the previous example, we tested something similar for subjects by treating subject as a random factor in our model. We could in principle do the same thing for items, and run the analysis the same way: get the average RT by condition, for items, and then use item as a random factor in our ANOVA:

By running two separate analyses, one treating subjects as a random factor, and the other treating items as a random factor, we have some degree of confidence that the effect we observe will generalize to new items and subjects. In other words, we have reason to believe our effect isn’t due to our particular group of subjects or items. By-subject test stats are typically subscripted with 1, as in \(F_1\) or \(t_1\) . By-item stats are subscripted with 2: \(F_2\) or \(t_2\) .
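The aggregation step behind the two analyses can be sketched as follows. The trial records, subject and item labels, condition names, and reaction times below are all hypothetical, invented purely for illustration; they are not the NPI data. The helper `cell_means` is likewise an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical trial records (subject, item, condition, RT in ms), invented here.
trials = [
    ("s1", "i1", "c1", 400), ("s1", "i2", "c2", 460),
    ("s1", "i3", "c1", 420), ("s1", "i4", "c2", 480),
    ("s2", "i1", "c2", 500), ("s2", "i2", "c1", 440),
    ("s2", "i3", "c2", 520), ("s2", "i4", "c1", 430),
]


def cell_means(records, unit_index):
    """Average RT for each (unit, condition) pair; unit is subject or item."""
    sums = defaultdict(lambda: [0.0, 0])
    for rec in records:
        key = (rec[unit_index], rec[2])
        sums[key][0] += rec[3]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


by_subject = cell_means(trials, 0)  # input to the by-subject (F1) analysis
by_item = cell_means(trials, 1)     # input to the by-item (F2) analysis
print(by_subject[("s1", "c1")], by_item[("i1", "c2")])
# prints 410.0 500.0
```

The by-subject table averages over items within each subject and condition, and the by-item table averages over subjects within each item and condition; each table then feeds its own repeated measures ANOVA.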

At this point, we need to be clear about an additional statistical assumption that repeated measures ANOVAs make that our regular independent ANOVAs did not. Repeated measures ANOVAs assume that the variance of the differences between conditions is the same for all pairwise comparisons of levels within a factor. When this assumption is met, we say that sphericity holds. If you have three or more levels in a factor in a repeated measures ANOVA, it is customary to check that sphericity holds. If it does not, then we need to apply a correction to our F -statistic.
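To make the assumption concrete, the sketch below computes the variance of the difference scores for each pair of treatments in the small A, B, C data set from earlier. It is plain Python with an illustrative helper name, and with only three subjects the estimates are far too noisy to support any conclusion; the point is only to show which quantities sphericity compares.

```python
from itertools import combinations

scores = {"A": [9, 1, 4], "B": [10, 3, 5], "C": [8, 1, 2]}


def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


# One variance per pair of levels, computed on within-subject differences.
diff_variances = {
    f"{p}-{q}": sample_variance([x - y for x, y in zip(scores[p], scores[q])])
    for p, q in combinations(scores, 2)
}
print({k: round(v, 3) for k, v in diff_variances.items()})
# prints {'A-B': 0.333, 'A-C': 1.0, 'B-C': 0.333}
```

Sphericity holds in the population when these three variances are equal; a formal test such as Mauchly’s asks whether the observed differences among them are larger than chance would produce.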

We can check sphericity easily by using a new package called ez :

ezANOVA will run an ANOVA on the data contained in a data frame if we specify the dependent variable ( dv=.(...) ), within-groups factors ( within=.(...) ), and between-groups factors ( between=.(...) ). If there are within-groups factors, we should further specify which factor determines the groups ( wid=.(...) ). Thus, the call above says ‘run a repeated measures ANOVA with Condition as a within-group fixed effect and with Subject as a random effect’.

The first part of the printout has exactly the same information as we found in our regular aov analysis. It additionally gives us a generalized \(\eta^2\) measure ( ges ), which is a measure of effect size that is sometimes reported with the results of an ANOVA.

The second part has a statistical test called Mauchly’s test for sphericity. As the name suggests, this tests whether or not the sphericity assumption is met in our data. A significant effect at p < 0.05 indicates that sphericity is most likely not met in our data, meaning that we should be suspicious of the F -value we observe in our test.

Given that sphericity does not hold, we need to apply a correction to the degrees of freedom. On the third line, two different corrections are offered: the Greenhouse-Geisser ( GG ) correction and the Huynh-Feldt ( HF ) correction. Each provides a value \(\epsilon\), a factor by which the degrees of freedom are multiplied. Given one of these two \(\epsilon\) values, we can calculate corrected degrees of freedom, and so get an adjusted p -value:

The two corrections provide slightly different answers. A rough rule of thumb that lots of researchers use is to check the Greenhouse-Geisser \(\epsilon\) , and see if it is greater or less than .75. If it is greater than .75, then the Huynh-Feldt correction is recommended. Less than .75, and the Greenhouse-Geisser correction is recommended.
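The arithmetic of the correction and the rule of thumb can be sketched in a few lines of Python. The function names and the example values of \(\epsilon\), a, and n are hypothetical, chosen only for illustration, and the sketch stops at the corrected degrees of freedom: turning them into a p -value requires an F distribution, which the standard library does not provide.

```python
def corrected_df(epsilon, a, n):
    """Scale both repeated-measures degrees of freedom by epsilon."""
    return epsilon * (a - 1), epsilon * (a - 1) * (n - 1)


def recommended_correction(gg_epsilon, threshold=0.75):
    """Rule of thumb from the text: HF above the threshold, GG below it."""
    return "HF" if gg_epsilon > threshold else "GG"


# Hypothetical values for illustration only; they are not taken from any output.
gg_epsilon, a, n = 0.6, 3, 20
df1, df2 = corrected_df(gg_epsilon, a, n)
print(round(df1, 2), round(df2, 2), recommended_correction(gg_epsilon))
# prints 1.2 22.8 GG
```

The rule of thumb does not say what to do at exactly .75; this sketch arbitrarily falls back to Greenhouse-Geisser in that case.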

ezANOVA is a really useful tool. It can actually compute ANOVAs for you if you haven’t gone through the preprocessing steps to calculate cell means before analysis:

This makes it a snap to run by-subject and by-item ANOVAs:

It’s also possible to have ezANOVA explicitly calculate an aov object for you, and return it:

What about two-way, within-subjects designs?

Suppose we had two two-level factors, A and B, that we manipulated within subjects. As before, the strategy will be very much the same: factor out subject variance from the error term, and develop F tests based around the SS for each of our treatments and their interaction.

The output here looks a little different: what’s going on? When we break up the sum of squares for a two-way, within-subjects design, we make explicit something that was only implicit in the one-way ANOVA: our error term represents the interaction of the treatment factor with the subject factor. That is, the variance that goes into the error term in the one-way repeated measures analysis can be thought of as the difference between the predicted effect of treatment and the actual effect of treatment observed for any given subject. This makes intuitive sense of our F -statistic: for a repeated measures ANOVA, we want to know how consistent our treatment effect is across levels of our random factor. Remember, our statistic is the mean square for our treatment divided by the mean square of the treatment by subject interaction. So to the extent that there is little variation in the effect across subjects, the F statistic will be large, and we will reject the null hypothesis.

In a two-way, within-subjects design, we split up the error term into interactions with subjects for each factor and their interaction :

\(SS_{total} = SS_{subj}+SS_{A}+SS_{A:subj}+SS_{B}+SS_{B:subj}+SS_{A:B}+SS_{A:B:subj}\)

The SS for the interaction term A:subj, or the by-subject error associated with the A factor, is calculated just like the SS for an interaction in the two-way ANOVA:

\(SS_{A:subj} = b\sum_i\sum_j (\bar{x}_{ij}-\bar{x}_i-\bar{x}_j+\bar{x})^2\)

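Here i ranges over the levels of A and j over the levels of the subject factor; \(\bar{x}_{ij}\) is the mean for subject j at level i of A, averaged over the levels of B; and b is the number of levels in factor B.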

In order to calculate MS for each of these terms, we divide by the relevant degrees of freedom. For the main effects of A and B and their interaction, the d.f. are as in the two-way ANOVA: \((a-1)\), \((b-1)\), and \((a-1)(b-1)\), respectively. The d.f. for the interactions with subjects are these d.f. multiplied by \((n-1)\), where n is the number of subjects:

\(MS_{A:subj} = SS_{A:subj}/((n-1)(a-1))\)

\(MS_{B:subj} = SS_{B:subj}/((n-1)(b-1))\)

\(MS_{A:B:subj} = SS_{A:B:subj}/((n-1)(b-1)(a-1))\)

Now we’re closer to understanding the output of our two-way, repeated measures ANOVA table. The F tests for our main effects and our interaction are the mean squares associated with the factor divided by the mean squares of the interaction of that factor with subjects:

\(F = \dfrac{MS_{A}}{MS_{A:subj}}\)

\(F = \dfrac{MS_{B}}{MS_{B:subj}}\)

\(F = \dfrac{MS_{A:B}}{MS_{A:B:subj}}\)
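The full two-way partition and its three F ratios can be traced on a toy data set. Everything about the data below is hypothetical: four subjects and their scores were invented for this sketch and do not come from any experiment discussed here. The code is plain Python following the formulas above, with the three-way term obtained by subtraction.

```python
from itertools import product

# Hypothetical scores, invented for illustration: data[subject][(A level, B level)].
data = [
    {(0, 0): 5, (0, 1): 7, (1, 0): 6, (1, 1): 11},
    {(0, 0): 3, (0, 1): 4, (1, 0): 5, (1, 1): 8},
    {(0, 0): 6, (0, 1): 9, (1, 0): 8, (1, 1): 12},
    {(0, 0): 4, (0, 1): 5, (1, 0): 4, (1, 1): 9},
]
n, a, b = len(data), 2, 2
cells = list(product(range(a), range(b)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


grand = mean(s[c] for s in data for c in cells)
subj = [mean(s.values()) for s in data]
m_a = [mean(s[(i, k)] for s in data for k in range(b)) for i in range(a)]
m_b = [mean(s[(i, k)] for s in data for i in range(a)) for k in range(b)]
m_ab = {c: mean(s[c] for s in data) for c in cells}
# Per-subject means at each level of A (averaging over B) and of B (over A).
m_as = [[mean(s[(i, k)] for k in range(b)) for i in range(a)] for s in data]
m_bs = [[mean(s[(i, k)] for i in range(a)) for k in range(b)] for s in data]

ss_total = sum((s[c] - grand) ** 2 for s in data for c in cells)
ss_subj = a * b * sum((m - grand) ** 2 for m in subj)
ss_a = n * b * sum((m - grand) ** 2 for m in m_a)
ss_b = n * a * sum((m - grand) ** 2 for m in m_b)
ss_ab = n * sum((m_ab[(i, k)] - m_a[i] - m_b[k] + grand) ** 2 for i, k in cells)
ss_a_subj = b * sum(
    (m_as[j][i] - m_a[i] - subj[j] + grand) ** 2 for j in range(n) for i in range(a)
)
ss_b_subj = a * sum(
    (m_bs[j][k] - m_b[k] - subj[j] + grand) ** 2 for j in range(n) for k in range(b)
)
# The three-way term is what remains once every other term is removed.
ss_ab_subj = ss_total - (ss_subj + ss_a + ss_b + ss_ab + ss_a_subj + ss_b_subj)

f_a = (ss_a / (a - 1)) / (ss_a_subj / ((n - 1) * (a - 1)))
f_b = (ss_b / (b - 1)) / (ss_b_subj / ((n - 1) * (b - 1)))
f_ab = (ss_ab / ((a - 1) * (b - 1))) / (ss_ab_subj / ((n - 1) * (a - 1) * (b - 1)))
print(round(f_a, 2), round(f_b, 2), round(f_ab, 2))
# prints 150.0 72.0 15.0
```

Each effect is tested against its own interaction with subjects, which is why the three F ratios have three different denominators even though each has 1 and 3 degrees of freedom.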

We can use ezANOVA to do this for us as well:

And the results are printed in an ANOVA table that contains only the minimal information needed to report the ANOVA, along with the generalized \(\eta^2\) effect size.

We note that here we do not see any tests for sphericity; this is expected, because it is actually impossible to violate sphericity when there are only two levels in a factor (Question: Why is this so?).

