Guide to Experimental Design | Overview, 5 Steps & Examples

Published on December 3, 2019 by Rebecca Bevans. Revised on June 21, 2023.

Experiments are used to study causal relationships. You manipulate one or more independent variables and measure their effect on one or more dependent variables.

Experimental design creates a set of procedures to systematically test a hypothesis. A good experimental design requires a strong understanding of the system you are studying.

There are five key steps in designing an experiment:

  • Consider your variables and how they are related
  • Write a specific, testable hypothesis
  • Design experimental treatments to manipulate your independent variable
  • Assign subjects to groups, either between-subjects or within-subjects
  • Plan how you will measure your dependent variable

For valid conclusions, you also need to select a representative sample and control any extraneous variables that might influence your results. Doing so minimizes several types of research bias, particularly sampling bias, survivorship bias, and attrition bias. If randomly assigning participants to control and treatment groups is impossible, unethical, or highly impractical, consider an observational study instead.

Table of contents

  • Step 1: Define your variables
  • Step 2: Write your hypothesis
  • Step 3: Design your experimental treatments
  • Step 4: Assign your subjects to treatment groups
  • Step 5: Measure your dependent variable
  • Other interesting articles
  • Frequently asked questions about experiments

Step 1: Define your variables

You should begin with a specific research question. We will work with two research question examples, one from health sciences (phone use and sleep) and one from ecology (temperature and soil respiration).

To translate your research question into an experimental hypothesis, you need to define the main variables and make predictions about how they are related.

Start by simply listing the independent and dependent variables.

  • Phone use and sleep. Independent variable: minutes of phone use before sleep. Dependent variable: hours of sleep per night.
  • Temperature and soil respiration. Independent variable: air temperature just above the soil surface. Dependent variable: CO2 respired from soil.

Then you need to think about possible extraneous and confounding variables and consider how you might control them in your experiment.

  • Phone use and sleep. Extraneous variable: natural variation in sleep patterns among individuals. How to control: measure the average difference between sleep with phone use and sleep without phone use, rather than the average amount of sleep per treatment group.
  • Temperature and soil respiration. Extraneous variable: soil moisture also affects respiration, and moisture can decrease with increasing temperature. How to control: monitor soil moisture and add water to make sure that soil moisture is consistent across all treatment plots.

Finally, you can put these variables together into a diagram. Use arrows to show the possible relationships between variables and include signs to show the expected direction of the relationships.

Diagram of the relationship between variables in a sleep experiment

Here we predict that increasing temperature will increase soil respiration and decrease soil moisture, while decreasing soil moisture will lead to decreased soil respiration.


Step 2: Write your hypothesis

Now that you have a strong conceptual understanding of the system you are studying, you should be able to write a specific, testable hypothesis that addresses your research question.

  • Phone use and sleep. Null hypothesis (H0): phone use before sleep does not correlate with the amount of sleep a person gets. Alternate hypothesis (Ha): increasing phone use before sleep leads to a decrease in sleep.
  • Temperature and soil respiration. Null hypothesis (H0): air temperature does not correlate with soil respiration. Alternate hypothesis (Ha): increased air temperature leads to increased soil respiration.
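As a sketch of how the phone use null hypothesis could eventually be tested, here is a simple correlation test in Python. The nightly measurements below are fabricated purely for illustration.

```python
# Sketch: testing H0 "phone use does not correlate with sleep"
# on fabricated data.
from scipy import stats

phone_minutes = [5, 20, 45, 60, 90, 120, 150]       # hypothetical
sleep_hours = [8.2, 7.9, 7.5, 7.4, 6.8, 6.5, 6.1]   # hypothetical

r, p_value = stats.pearsonr(phone_minutes, sleep_hours)
print(f"r = {r:.2f}, p = {p_value:.4f}")
# A small p-value would lead you to reject H0 in favor of Ha.
```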

The next steps will describe how to design a controlled experiment. In a controlled experiment, you must be able to:

  • Systematically and precisely manipulate the independent variable(s).
  • Precisely measure the dependent variable(s).
  • Control any potential confounding variables.

If your study system doesn’t match these criteria, there are other types of research you can use to answer your research question.

Step 3: Design your experimental treatments

How you manipulate the independent variable can affect the experiment’s external validity, that is, the extent to which the results can be generalized and applied to the broader world.

First, you may need to decide how widely to vary your independent variable. In the temperature experiment, for example, you could warm the air:

  • just slightly above the natural range for your study region.
  • over a wider range of temperatures to mimic future warming.
  • over an extreme range that is beyond any possible natural variation.

Second, you may need to choose how finely to vary your independent variable. Sometimes this choice is made for you by your experimental system, but often you will need to decide, and this will affect how much you can infer from your results. In the phone use experiment, for example, you could treat phone use as:

  • a categorical variable: either as binary (yes/no) or as levels of a factor (no phone use, low phone use, high phone use).
  • a continuous variable (minutes of phone use measured every night).

Step 4: Assign your subjects to treatment groups

How you apply your experimental treatments to your test subjects is crucial for obtaining valid and reliable results.

First, you need to consider the study size: how many individuals will be included in the experiment? In general, the more subjects you include, the greater your experiment’s statistical power, which determines how much confidence you can have in your results.
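Before collecting data, a power analysis can estimate the study size you need. This is a minimal sketch using statsmodels; the effect size (Cohen's d = 0.5), alpha, and power values are placeholder assumptions, not recommendations.

```python
# Sketch: solving for the sample size of a two-group experiment.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,  # assumed standardized effect size (Cohen's d)
    alpha=0.05,       # significance level
    power=0.8,        # desired statistical power
)
print(f"Subjects needed per group: {n_per_group:.0f}")  # about 64
```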

Then you need to randomly assign your subjects to treatment groups. Each group receives a different level of the treatment (e.g. no phone use, low phone use, high phone use).

You should also include a control group, which receives no treatment. The control group tells you what would have happened to your test subjects without any experimental intervention.

When assigning your subjects to groups, there are two main choices you need to make:

  • A completely randomized design vs a randomized block design.
  • A between-subjects design vs a within-subjects design.

Randomization

An experiment can be completely randomized or randomized within blocks (aka strata):

  • In a completely randomized design, every subject is assigned to a treatment group at random.
  • In a randomized block design (aka stratified random design), subjects are first grouped according to a characteristic they share, and then randomly assigned to treatments within those groups.
  • Phone use and sleep. Completely randomized design: subjects are all randomly assigned a level of phone use using a random number generator. Randomized block design: subjects are first grouped by age, and then phone use treatments are randomly assigned within these groups.
  • Temperature and soil respiration. Completely randomized design: warming treatments are assigned to soil plots at random, using a number generator to generate map coordinates within the study area. Randomized block design: soils are first grouped by average rainfall, and then treatment plots are randomly assigned within these groups.
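A minimal sketch of both assignment schemes in Python, using only the standard library. The subjects, age brackets, and treatment levels are made up for illustration.

```python
# Sketch: completely randomized vs. randomized block assignment.
import random

treatments = ["no_phone", "low_phone", "high_phone"]
subjects = [f"subject_{i}" for i in range(12)]  # hypothetical subjects

# Completely randomized design: shuffle subjects, then deal out treatments.
random.shuffle(subjects)
crd = {s: treatments[i % len(treatments)] for i, s in enumerate(subjects)}

# Randomized block design: group by a shared characteristic (here, a
# made-up age bracket), then randomize treatments within each block.
age_bracket = {s: random.choice(["18-30", "31-50", "51+"]) for s in subjects}
blocks = {}
for s in subjects:
    blocks.setdefault(age_bracket[s], []).append(s)

rbd = {}
for members in blocks.values():
    random.shuffle(members)
    for i, s in enumerate(members):
        rbd[s] = treatments[i % len(treatments)]
```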

Sometimes randomization isn’t practical or ethical, so researchers create partially-random or even non-random designs. An experimental design where treatments aren’t randomly assigned is called a quasi-experimental design.

Between-subjects vs. within-subjects

In a between-subjects design (also known as an independent measures design or classic ANOVA design), individuals receive only one of the possible levels of an experimental treatment.

In medical or social research, you might also use matched pairs within your between-subjects design to make sure that each treatment group contains the same variety of test subjects in the same proportions.

In a within-subjects design (also known as a repeated measures design), every individual receives each of the experimental treatments consecutively, and their responses to each treatment are measured.

Within-subjects or repeated measures can also refer to an experimental design where an effect emerges over time, and individual responses are measured over time in order to measure this effect as it emerges.

Counterbalancing (randomizing or reversing the order of treatments among subjects) is often used in within-subjects designs to ensure that the order of treatment application doesn’t influence the results of the experiment.

  • Phone use and sleep. Between-subjects (independent measures) design: subjects are randomly assigned a level of phone use (none, low, or high) and follow that level of phone use throughout the experiment. Within-subjects (repeated measures) design: subjects are assigned consecutively to zero, low, and high levels of phone use throughout the experiment, and the order in which they follow these treatments is randomized.
  • Temperature and soil respiration. Between-subjects design: warming treatments are assigned to soil plots at random and the soils are kept at this temperature throughout the experiment. Within-subjects design: every plot receives each warming treatment (1, 3, 5, 8, and 10°C above ambient temperature) consecutively over the course of the experiment, and the order in which they receive these treatments is randomized.
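Here is a small sketch of counterbalancing in code. With three treatment levels there are 3! = 6 possible orders, so full counterbalancing can cycle subjects through all of them; the fallback at the end simply randomizes each subject's order.

```python
# Sketch: counterbalancing treatment order in a within-subjects design.
import itertools
import random

treatments = ["none", "low", "high"]          # hypothetical levels
subjects = [f"subject_{i}" for i in range(6)]

# Full counterbalancing: assign each possible order in rotation.
all_orders = list(itertools.permutations(treatments))  # 3! = 6 orders
orders = {s: all_orders[i % len(all_orders)] for i, s in enumerate(subjects)}

# When n! orders is impractical, randomize the order per subject instead.
random_orders = {s: random.sample(treatments, len(treatments)) for s in subjects}
```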


Step 5: Measure your dependent variable

Finally, you need to decide how you’ll collect data on your dependent variable outcomes. You should aim for reliable and valid measurements that minimize research bias or error.

Some variables, like temperature, can be objectively measured with scientific instruments. Others may need to be operationalized to turn them into measurable observations. To measure hours of sleep, for example, you could:

  • Ask participants to record what time they go to sleep and get up each day.
  • Ask participants to wear a sleep tracker.

How precisely you measure your dependent variable also affects the kinds of statistical analysis you can use on your data.

Experiments are always context-dependent, and a good experimental design will take into account all of the unique considerations of your study system to produce information that is both valid and relevant to your research question.

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Student’s t-distribution
  • Normal distribution
  • Null and Alternative Hypotheses
  • Chi square tests
  • Confidence interval
  • Cluster sampling
  • Stratified sampling
  • Data cleansing
  • Reproducibility vs Replicability
  • Peer review
  • Likert scale

Research bias

  • Implicit bias
  • Framing effect
  • Cognitive bias
  • Placebo effect
  • Hawthorne effect
  • Hindsight bias
  • Affect heuristic

Experimental design means planning a set of procedures to investigate a relationship between variables. To design a controlled experiment, you need:

  • A testable hypothesis
  • At least one independent variable that can be precisely manipulated
  • At least one dependent variable that can be precisely measured

When designing the experiment, you decide:

  • How you will manipulate the variable(s)
  • How you will control for any potential confounding variables
  • How many subjects or samples will be included in the study
  • How subjects will be assigned to treatment levels

Experimental design is essential to the internal and external validity of your experiment.

The key difference between observational studies and experimental designs is that a well-done observational study does not influence the responses of participants, while experiments do have some sort of treatment condition applied to at least some participants by random assignment.

A confounding variable, also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design, it’s important to identify potential confounding variables and plan how you will reduce their impact.

In a between-subjects design, every participant experiences only one condition, and researchers assess group differences between participants in various conditions.

In a within-subjects design, each participant experiences all conditions, and researchers test the same participants repeatedly for differences between conditions.

The word “between” means that you’re comparing different conditions between groups, while the word “within” means you’re comparing different conditions within the same group.

An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.



Statistics By Jim

Making statistics intuitive

Experimental Design: Definition and Types

By Jim Frost

What is Experimental Design?

An experimental design is a detailed plan for collecting and using data to identify causal relationships. Through careful planning, the design of experiments allows your data collection efforts to have a reasonable chance of detecting effects and testing hypotheses that answer your research questions.

An experiment is a data collection procedure that occurs in controlled conditions to identify and understand causal relationships between variables. Researchers can use many potential designs. The ultimate choice depends on their research question, resources, goals, and constraints. In some fields of study, researchers refer to experimental design as the design of experiments (DOE). Both terms are synonymous.


Ultimately, the design of experiments helps ensure that your procedures and data will evaluate your research question effectively. Without an experimental design, you might waste your efforts in a process that, for many potential reasons, can’t answer your research question. In short, it helps you trust your results.

Learn more about Independent and Dependent Variables.

Design of Experiments: Goals & Settings

Experiments occur in many settings, including psychology, the social sciences, medicine, physics, engineering, and the industrial and service sectors. Typically, experimental goals are to discover a previously unknown effect, confirm a known effect, or test a hypothesis.

Effects represent causal relationships between variables. For example, in a medical experiment, does the new medicine cause an improvement in health outcomes? If so, the medicine has a causal effect on the outcome.

An experimental design’s focus depends on the subject area and can include the following goals:

  • Understanding the relationships between variables.
  • Identifying the variables that have the largest impact on the outcomes.
  • Finding the input variable settings that produce an optimal result.

For example, psychologists have conducted experiments to understand how conformity affects decision-making. Sociologists have performed experiments to determine whether ethnicity affects the public reaction to staged bike thefts. These experiments map out the causal relationships between variables, and their primary goal is to understand the role of various factors.

Conversely, in a manufacturing environment, the researchers might use an experimental design to find the factors that most effectively improve their product’s strength, identify the optimal manufacturing settings, and do all that while accounting for various constraints. In short, a manufacturer’s goal is often to use experiments to improve their products cost-effectively.

In a medical experiment, the goal might be to quantify the medicine’s effect and find the optimum dosage.

Developing an Experimental Design

Developing an experimental design involves planning that maximizes the potential to collect data that is both trustworthy and able to detect causal relationships. Specifically, these studies aim to see effects when they exist in the population the researchers are studying, preferentially favor causal effects, isolate each factor’s true effect from potential confounders, and produce conclusions that you can generalize to the real world.

To accomplish these goals, experimental designs carefully manage data validity and reliability, and internal and external experimental validity. When your experiment is valid and reliable, you can expect your procedures and data to produce trustworthy results.

An excellent experimental design involves the following:

  • Lots of preplanning.
  • Developing experimental treatments.
  • Determining how to assign subjects to treatment groups.

The remainder of this article focuses on how experimental designs incorporate these essential items to accomplish their research goals.

Learn more about Data Reliability vs. Validity and Internal and External Experimental Validity.

Preplanning, Defining, and Operationalizing for Design of Experiments

A literature review is crucial for the design of experiments.

This phase of the design of experiments helps you identify critical variables, know how to measure them while ensuring reliability and validity, and understand the relationships between them. The review can also help you find ways to reduce sources of variability, which increases your ability to detect treatment effects. Notably, the literature review allows you to learn how similar studies designed their experiments and the challenges they faced.

Operationalizing a study involves taking your research question, using the background information you gathered, and formulating an actionable plan.

This process should produce a specific and testable hypothesis using data that you can reasonably collect given the resources available to the experiment. For example, for the jumping exercise study described later in this article, the hypotheses were:

  • Null hypothesis: The jumping exercise intervention does not affect bone density.
  • Alternative hypothesis: The jumping exercise intervention affects bone density.
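As a sketch of how such a hypothesis might eventually be tested, here is a two-sample t-test on invented bone density values; the numbers exist only to show the mechanics, not to reflect the study's actual data.

```python
# Sketch: testing the jumping-intervention hypothesis on fabricated data.
from scipy import stats

control = [1.02, 0.98, 1.05, 0.99, 1.01, 1.03]   # g/cm^2, hypothetical
jumping = [1.08, 1.10, 1.04, 1.09, 1.12, 1.07]   # g/cm^2, hypothetical

t_stat, p_value = stats.ttest_ind(jumping, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A p-value below your chosen alpha would reject the null hypothesis.
```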

To learn more about this early phase, read Five Steps for Conducting Scientific Studies with Statistical Analyses.

Formulating Treatments in Experimental Designs

In an experimental design, treatments are variables that the researchers control. They are the primary independent variables of interest. Researchers administer the treatment to the subjects or items in the experiment and want to know whether it causes changes in the outcome.

As the name implies, a treatment can be medical in nature, such as a new medicine or vaccine. But it’s a general term that applies to other things such as training programs, manufacturing settings, teaching methods, and types of fertilizers. I helped run an experiment where the treatment was a jumping exercise intervention that we hoped would increase bone density. All these treatment examples are things that potentially influence a measurable outcome.

Even when you know your treatment generally, you must carefully consider the amount. How large of a dose? If you’re comparing three different temperatures in a manufacturing process, how far apart are they? For my bone mineral density study, we had to determine how frequently the exercise sessions would occur and how long each lasted.

How you define the treatments in the design of experiments can affect your findings and the generalizability of your results.

Assigning Subjects to Experimental Groups

A crucial decision for all experimental designs is determining how researchers assign subjects to the experimental conditions: the treatment and control groups. The control group is often, but not always, the lack of a treatment. It serves as a basis for comparison by showing outcomes for subjects who don’t receive a treatment. Learn more about Control Groups.

How your experimental design assigns subjects to the groups affects how confident you can be that the findings represent true causal effects rather than mere correlation caused by confounders. Indeed, the assignment method influences how you control for confounding variables. This is the difference between correlation and causation.

Imagine a study finds that vitamin consumption correlates with better health outcomes. As a researcher, you want to be able to say that vitamin consumption causes the improvements. However, with the wrong experimental design, you might only be able to say there is an association. A confounder, and not the vitamins, might actually cause the health benefits.

Let’s explore some of the ways to assign subjects in design of experiments.

Completely Randomized Designs

A completely randomized experimental design randomly assigns all subjects to the treatment and control groups. You simply take each participant and use a random process to determine their group assignment. You can flip coins, roll a die, or use a computer. Randomized experiments must be prospective studies because they need to be able to control group assignment.

Random assignment in the design of experiments helps ensure that the groups are roughly equivalent at the beginning of the study. This equivalence at the start increases your confidence that any differences you see at the end were caused by the treatments. The randomization tends to equalize confounders between the experimental groups and, thereby, cancels out their effects, leaving only the treatment effects.

For example, in a vitamin study, the researchers can randomly assign participants to either the control or vitamin group. Because the groups are approximately equal when the experiment starts, if the health outcomes are different at the end of the study, the researchers can be confident that the vitamins caused those improvements.

Statisticians consider randomized experimental designs to be the best for identifying causal relationships.

If you can’t randomly assign subjects but want to draw causal conclusions about an intervention, consider using a quasi-experimental design.

Learn more about Randomized Controlled Trials and Random Assignment in Experiments.

Randomized Block Designs

Nuisance factors are variables that can affect the outcome, but they are not the researcher’s primary interest. Unfortunately, they can hide or distort the treatment results. When experimenters know about specific nuisance factors, they can use a randomized block design to minimize their impact.

This experimental design takes subjects with a shared “nuisance” characteristic and groups them into blocks. The participants in each block are then randomly assigned to the experimental groups. This process allows the experiment to control for known nuisance factors.

Blocking in the design of experiments reduces the impact of nuisance factors on experimental error. The analysis assesses the effects of the treatment within each block, which removes the variability between blocks. The result is that blocked experimental designs can reduce the impact of nuisance variables, increasing the ability to detect treatment effects accurately.

Suppose you’re testing various teaching methods. Because grade level likely affects educational outcomes, you might use grade level as a blocking factor. To use a randomized block design for this scenario, divide the participants by grade level and then randomly assign the members of each grade level to the experimental groups.

A standard guideline for an experimental design is to “Block what you can, randomize what you cannot.” Use blocking for a few primary nuisance factors. Then use random assignment to distribute the unblocked nuisance factors equally between the experimental conditions.

You can also use covariates to control nuisance factors. Learn about Covariates: Definition and Uses.

Observational Studies

In some experimental designs, randomly assigning subjects to the experimental conditions is impossible or unethical. The researchers simply can’t assign participants to the experimental groups. However, they can observe them in their natural groupings, measure the essential variables, and look for correlations. These observational studies are also known as quasi-experimental designs. Retrospective studies must be observational in nature because they look back at past events.

Imagine you’re studying the effects of depression on an activity. Clearly, you can’t randomly assign participants to the depression and control groups. But you can observe participants with and without depression and see how their task performance differs.

Observational studies let you perform research when you can’t control the treatment. However, quasi-experimental designs increase the problem of confounding variables. For this design of experiments, correlation does not necessarily imply causation. While special procedures can help control confounders in an observational study, you’re ultimately less confident that the results represent causal findings.

Learn more about Observational Studies.

For a good comparison, learn about the differences and tradeoffs between Observational Studies and Randomized Experiments.

Between-Subjects vs. Within-Subjects Experimental Designs

When you think of the design of experiments, you probably picture a treatment and control group. Researchers assign participants to only one of these groups, so each group contains entirely different subjects than the other groups. Analysts compare the groups at the end of the experiment. Statisticians refer to this method as a between-subjects, or independent measures, experimental design.

In a between-subjects design, you can have more than one treatment group, but each subject is exposed to only one condition, the control group or one of the treatment groups.

A potential downside to this approach is that differences between groups at the beginning can affect the results at the end. As you’ve read earlier, random assignment can reduce those differences, but it is imperfect. There will always be some variability between the groups.

In a within-subjects experimental design, also known as repeated measures, subjects experience all treatment conditions and are measured for each. Each subject acts as their own control, which reduces variability and increases the statistical power to detect effects.

In this experimental design, you minimize pre-existing differences between the experimental conditions because they all contain the same subjects. However, the order of treatments can affect the results. Beware of practice and fatigue effects. Learn more about Repeated Measures Designs.

Between-subjects (independent measures) design:

  • Each subject is assigned to one experimental condition.
  • Requires more subjects.
  • Differences between subjects in the groups can affect the results.
  • No treatment-order effects.

Within-subjects (repeated measures) design:

  • Each subject participates in all experimental conditions.
  • Requires fewer subjects.
  • Uses the same subjects in all conditions.
  • Order of treatments can affect results.

Design of Experiments Examples

For example, a bone density study has three experimental groups—a control group, a stretching exercise group, and a jumping exercise group.

In a between-subjects experimental design, scientists randomly assign each participant to one of the three groups.

In a within-subjects design, all subjects experience the three conditions sequentially while the researchers measure bone density repeatedly. The procedure can switch the order of treatments for the participants to help reduce order effects.

Matched Pairs Experimental Design

A matched pairs experimental design is a between-subjects study that uses pairs of similar subjects. Researchers use this approach to reduce pre-existing differences between experimental groups. It’s yet another design of experiments method for reducing sources of variability.

Researchers identify variables likely to affect the outcome, such as demographics. When they pick a subject with a set of characteristics, they try to locate another participant with similar attributes to create a matched pair. Scientists randomly assign one member of a pair to the treatment group and the other to the control group.

On the plus side, this process creates two similar groups, and it doesn’t create treatment order effects. While matched pairs do not produce the perfectly matched groups of a within-subjects design (which uses the same subjects in all conditions), the approach aims to reduce variability between groups relative to a between-subjects study.

On the downside, finding matched pairs is very time-consuming. Additionally, if one member of a matched pair drops out, the other subject must leave the study too.

Learn more about Matched Pairs Design: Uses & Examples.

Another consideration is whether you’ll use a cross-sectional design (one point in time) or a longitudinal study to track changes over time.

A case study is a research method that often serves as a precursor to a more rigorous experimental design by identifying research questions, variables, and hypotheses to test. Learn more about What is a Case Study? Definition & Examples.

In conclusion, the design of experiments is extremely sensitive to subject area concerns and the time and resources available to the researchers. Developing a suitable experimental design requires balancing a multitude of considerations. A successful design is necessary to obtain trustworthy answers to your research question and to have a reasonable chance of detecting treatment effects when they exist.


Note that this "residual" for the within-plot (subplot) part of the analysis is actually the sum of squares for the interaction of rows (whole plots) with varieties (subplot treatments), as in an RCBD, where:

  • $r_{k(i)} \sim N(0, \sigma_r^2)$
  • $e_{ijk} \sim N(0, \sigma_e^2)$

Experimental Design

  • What is Experimental Design?
  • Validity in Experimental Design
  • Types of Design
  • Related Topics

1. What is Experimental Design?

Experimental design is a way to carefully plan experiments in advance so that your results are both objective and valid . The terms “Experimental Design” and “Design of Experiments” are used interchangeably and mean the same thing. However, the medical and social sciences tend to use the term “Experimental Design” while engineering, industrial and computer sciences favor the term “Design of experiments.”

Design of experiments involves:

  • The systematic collection of data
  • A focus on the design itself, rather than the results
  • Planning changes to independent (input) variables and the effect on dependent variables or response variables
  • Ensuring results are valid, easily interpreted, and definitive.

Ideally, your experimental design should:

  • Describe how participants are allocated to experimental groups. A common method is completely randomized design, where participants are assigned to groups at random. A second method is randomized block design, where participants are divided into homogeneous blocks (for example, age groups) before being randomly assigned to groups.
  • Minimize or eliminate confounding variables, which can offer alternative explanations for the experimental results.
  • Allow you to make inferences about the relationship between independent variables and dependent variables.
  • Reduce variability, to make it easier for you to find differences in treatment outcomes.

The most important principles are:

  • Randomization: the assignment of study components by a completely random method, like simple random sampling. Randomization minimizes bias in the results.
  • Replication: the experiment must be replicable by other researchers. This is usually achieved with the use of statistics like the standard error of the sample mean or confidence intervals.
  • Blocking: controlling sources of variation in the experimental results.
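As a quick sketch of the replication principle, here is how you might report a standard error and a 95% confidence interval for a sample mean; the measurements are fabricated for illustration.

```python
# Sketch: standard error and 95% confidence interval for a sample mean.
import numpy as np
from scipy import stats

measurements = np.array([5.1, 4.9, 5.3, 5.0, 4.8, 5.2])  # hypothetical

mean = measurements.mean()
sem = stats.sem(measurements)  # standard error of the mean
low, high = stats.t.interval(0.95, df=len(measurements) - 1,
                             loc=mean, scale=sem)
print(f"mean = {mean:.2f}, SE = {sem:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```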

2. Variables in Design of Experiments

  • What is a Confounding Variable?
  • What is a Control Variable?
  • What is a Criterion Variable?
  • What are Endogenous Variables?
  • What is a Dependent Variable?
  • What is an Explanatory Variable?
  • What is an Intervening Variable?
  • What is a Manipulated Variable?
  • What is an Outcome Variable?


3. Validity in Design of Experiments

  • What is Concurrent Validity?
  • What is Construct Validity?
  • What is Consequential Validity?
  • What is Convergent Validity?
  • What is Criterion Validity?
  • What is Ecological validity?
  • What is External Validity?
  • What is Face Validity?
  • What is Internal Validity?
  • What is Predictive Validity?

4. Design of Experiments: Types

  • Adaptive Designs
  • Balanced Latin Square Design
  • Balanced and Unbalanced Designs
  • Between Subjects Design
  • Case Studies
  • Case-Control Study
  • Cohort Study
  • Completely Randomized Design
  • Cross-Lagged Panel Design
  • Cross-Sectional Research
  • Cross-Sequential Design
  • Definitive Screening Design
  • Factorial Design
  • Flexible Design
  • Group Sequential Design
  • Longitudinal Research
  • Matched-Pairs Design
  • Parallel Design
  • Observational Study
  • Plackett-Burman Design
  • Pretest-Posttest Design
  • Prospective Study
  • Quasi-Experimental Design
  • Randomized Block Design
  • Randomized Controlled Trial
  • Repeated Measures Design
  • Retrospective Study
  • Split-Plot Design
  • Strip-Plot Design
  • Stepped Wedge Designs
  • Survey Research
  • Within Subjects Design

Between Subjects Design (Independent Measures).

What is Between Subjects Design?


In between subjects design, separate groups are created for each treatment. This type of experimental design is sometimes called independent measures design because each participant is assigned to only one treatment group. For example, you might be testing a new depression medication: one group receives the actual medication and the other receives a placebo. Participants can only be a member of one of the groups (either the treatment or placebo group). A new group is created for every treatment. For example, if you are testing two depression medications, you would have:

  • Group 1 (Medication 1).
  • Group 2 (Medication 2).
  • Group 3 (Placebo).

Advantages and Disadvantages of Between Subjects Design.

Advantages.

Between subjects design is one of the simplest types of experimental design setup. Other advantages include:

  • Multiple treatments and treatment levels can be tested at the same time.
  • This type of design can be completed quickly.

Disadvantages.

A major disadvantage in this type of experimental design is that as each participant is only being tested once, the addition of a new treatment requires the formation of another group. The design can become extremely complex if more than a few treatments are being tested. Other disadvantages include:

  • Differences in individuals (i.e. age, race, sex) may skew results and are almost impossible to control for in this experimental design.
  • Bias can be an issue unless you control for this factor using experimental blinds (either a single blind experiment–where the participant doesn’t know if they are getting a treatment or placebo–or a double blind, where neither the participant nor the researcher know).
  • Generalization issues means that you may not be able to extrapolate your results to a wider audience.
  • Environmental bias can be a problem with between subjects design. For example, let’s say you were giving one group of college students a standardized test at 8 a.m. and a second group the test at noon. Students who took the 8 a.m. test may perform poorly simply because they weren’t awake yet.


Completely Randomized Experimental Design.

What is a Completely Randomized Design?

A completely randomized design (CRD) is an experiment where the treatments are assigned at random. Every experimental unit has the same odds of receiving a particular treatment. This design is usually only used in lab experiments, where environmental factors are relatively easy to control for; it is rarely used out in the field, where environmental factors are usually impossible to control. When a CRD has two treatments, it is equivalent to a t-test.

A completely randomized design is generally implemented by:

  • Listing the treatment levels or treatment combinations.
  • Assigning each level/combination a random number.
  • Sorting the random numbers in order, to produce a random application order for treatments.

However, you could use any method that completely randomizes the treatments and experimental units, as long as you take care to ensure that:

  • The assignment is truly random.
  • You have accounted for extraneous variables.

Completely Randomized Design Example.

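A minimal sketch of the three-step recipe above, with hypothetical plots and treatments: pair each unit with a random number, sort on that number, and apply treatments in the resulting order.

```python
# Sketch: a completely randomized design via the random-number-sort recipe.
import random

units = [f"plot_{i}" for i in range(8)]                 # hypothetical units
treatments = ["A", "A", "B", "B", "C", "C", "D", "D"]   # one per unit

# Steps 1-2: assign each unit a random number.
# Step 3: sort on the random numbers to get a random application order.
keyed = sorted((random.random(), unit) for unit in units)
assignment = {unit: trt for (_, unit), trt in zip(keyed, treatments)}
print(assignment)
```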

Completely Randomized Design with Subsampling.

This subset of CRD is usually used when experimental units are limited. Subsampling might include several branches of a particular tree, or several samples from an individual plot.

What is a Factorial Design?

A factorial experimental design is used to investigate the effect of two or more independent variables on one dependent variable. For example, let’s say a researcher wanted to investigate components for increasing SAT scores. The three components are:

  • SAT intensive class (yes or no).
  • SAT Prep book (yes or no).
  • Extra homework (yes or no).

The researcher plans to manipulate each of these independent variables. Each of the independent variables is called a factor, and each factor has two levels (yes or no). As this experiment has 3 factors with 2 levels each, it is a 2 × 2 × 2 = 2³ factorial design. An experiment with 3 factors and 3 levels would be a 3³ factorial design, and an experiment with 2 factors and 3 levels would be a 3² factorial design.
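A short sketch that enumerates the eight runs of this 2³ design; the factor names mirror the example above.

```python
# Sketch: enumerating all treatment combinations of a 2^3 factorial design.
import itertools

factors = {
    "sat_class": ["yes", "no"],
    "prep_book": ["yes", "no"],
    "extra_homework": ["yes", "no"],
}

runs = [dict(zip(factors, combo))
        for combo in itertools.product(*factors.values())]
print(len(runs))  # 8 = 2 * 2 * 2 combinations
for run in runs:
    print(run)
```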

The vast majority of factorial experiments have only two levels per factor. In some experiments where the number of level/factor combinations is unmanageable, the experiment can be split into parts (for example, by half), creating a fractional experimental design.

Null Outcome.

A null outcome is when the experiment’s outcome is the same regardless of how the levels and factors were combined. In the above example, that would mean no amount of SAT prep (book and class, class and extra homework etc.) could increase the scores of the students being studied.

Main Effect and Interaction Effect.

Two types of effects are considered when analyzing the results from a factorial experiment: the main effect and the interaction effect. The main effect is the effect of an independent variable (in this case, SAT prep class, SAT book, or extra homework) on the dependent variable (SAT scores). For a main effect to exist, you’d want to see a consistent trend across the different levels. For example, you might conclude that students who took the SAT prep class scored consistently higher than students who did not.

An interaction effect occurs between factors. For example, one group of students who took the SAT class and used the SAT prep book showed an increase in SAT scores, while the students who took the class but did not use the book didn’t show any increase. You could infer that there is an interaction between the SAT class and use of the SAT prep book.
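One common way to estimate main and interaction effects from factorial data is a two-way ANOVA. This sketch uses statsmodels on a fabricated data frame; the column names and scores are invented, and the `A * B` formula term expands to both main effects plus their interaction.

```python
# Sketch: main effects and an interaction via two-way ANOVA
# on fabricated SAT scores.
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "sat_class": ["yes", "yes", "no", "no"] * 4,
    "prep_book": ["yes", "no", "yes", "no"] * 4,
    "score": [1350, 1280, 1290, 1150, 1340, 1270, 1300, 1160,
              1360, 1290, 1280, 1140, 1330, 1260, 1310, 1170],
})

model = ols("score ~ C(sat_class) * C(prep_book)", data=df).fit()
print(anova_lm(model, typ=2))  # rows for each main effect and the interaction
```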

What is Matched Pairs Design?

Matched pairs design is a special case of randomized block design. In this design, two treatments are assigned to homogeneous blocks of subjects. The goal is to maximize homogeneity within each pair; in other words, you want the pairs to be as similar as possible. The blocks are composed of matched pairs, which are randomly assigned a treatment (commonly the drug or a placebo).


Stacking in Matched Pairs Design.

You can think of matched pair design as a type of stacked randomized block design. With either design, your goal is to control for some variable that’s going to skew your results. In the above experiment, it isn’t just age that could account for differences in how people respond to drugs; several other confounding variables could also affect your experiment. The purpose of the blocks is to minimize a single source of variability (for example, differences due to age). When you create matched pairs, you’re creating blocks within blocks, enabling you to control for multiple sources of potential variability. You should construct your matched pairs carefully, as it’s often impossible to account for all variables without creating a huge and complex experiment. Therefore, you should create your blocks starting with which candidates are most likely to affect your results.
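A minimal sketch of pair construction on a single matching variable (age), with one member of each pair randomly assigned to each group. The participants are hypothetical.

```python
# Sketch: matched pairs on age, with random assignment within each pair.
import random

participants = [("p1", 23), ("p2", 24), ("p3", 41),
                ("p4", 40), ("p5", 65), ("p6", 67)]  # (id, age), hypothetical

# Sort on the matching variable so adjacent participants are most similar.
participants.sort(key=lambda p: p[1])
pairs = [participants[i:i + 2] for i in range(0, len(participants), 2)]

assignment = {}
for pair in pairs:
    treated, control = random.sample(pair, 2)
    assignment[treated[0]] = "treatment"
    assignment[control[0]] = "control"
print(assignment)
```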

Observational Study

What is an Observational Study?

An observational study (sometimes called a natural experiment or a quasi-experiment) is where the researcher observes the study participants and measures variables without assigning any treatments. For example, let’s say you wanted to find out the effect of cognitive therapy for ADHD. In an experimental study, you would assign some patients cognitive therapy and other patients some other form of treatment (or no treatment at all). In an observational study, you would find patients who are already undergoing the therapy, and some who are already participating in other therapies (or no therapy at all).

Ideally, treatments should be investigated experimentally, with random assignment of treatments to participants. This random assignment means that measured and unmeasured characteristics are evenly divided over the groups. In other words, any differences between the groups would be due to chance. Any statistical tests you run on these types of studies would be reliable. However, it isn’t always ethical or feasible to run experimental studies, especially in medical studies involving life-threatening illnesses or potentially disabling conditions. In these cases, observational studies are used.

Examples of Observational Studies

Selective Serotonin Reuptake Inhibitors and Violent Crime: A Cohort Study. A study published in PLOS Medicine examined the uncertain relationship between SSRIs (like Prozac and Paxil) and violent crime. The researchers “…extracted information on SSRIs prescribed in Sweden between 2006 and 2009 from the Swedish Prescribed Drug Register and information on convictions for violent crimes for the same period from the Swedish national crime register. They then compared the rate of violent crime while individuals were prescribed SSRIs with the rate of violent crime in the same individuals while not receiving medication.” The study found an association between SSRI use and violent crime.

Cleaner Air Found to Add 5 Months to Life. A Brigham Young University study examined the connection between air quality and life expectancy. The researchers looked at life expectancy data from 51 metropolitan areas and compared the figures to air quality improvements in each region from the 1980s to 1990s. After taking into account factors like smoking and socioeconomic status, the researchers found that an average of about five months of added life expectancy was attributable to cleaner air. The New York Times printed a summary of the results.

Effects on Children of Occupational Exposure to Lead. Researchers matched 33 children whose parents were exposed to lead at work with 33 children who were the same age and lived in the same neighborhood. Elevated levels of lead were found in the exposed children. This was attributed to the levels of lead that the parents were exposed to at work, and to poor hygiene practices of the parents (UPenn).

Longitudinal Research

Longitudinal research is an observational study of the same variables over time. Studies can last weeks, months, or even decades. The term “longitudinal” is very broad, but generally means collecting data over more than one period from the same participants (or very similar participants). According to sociologist Scott Menard, Ph.D., the research should also involve some comparison of data among or between periods. However, longitudinal data doesn’t necessarily have to be collected over time: data could be collected at one point in time but include retrospective information. For example, a participant could be asked about their prior exercise habits up to and including the time of the study.

The purpose of Longitudinal Research is to:

  • Record patterns of change. For example, the development of emphysema over time.
  • Establish the direction and magnitude of causal relationships. For example, women who smoke are 12 times more likely to die of emphysema than non-smokers.

Cross Sectional Research

Cross sectional research involves collecting data at one specific point in time. You can interact with individuals directly, or you could study data in a database or other media. For example, you could study medical databases to see if illegal drug use results in heart disease. If you find a correlation between illegal drug use and heart disease, that would support the claim that illegal drug use may increase the risk of heart disease.

Cross sectional research is a descriptive study; you only record what you find and you don’t manipulate variables as in traditional experiments. It is most often used to look at how often a phenomenon occurs in a population.

Advantages and Disadvantages of Cross Sectional Research

Advantages

  • Can be very inexpensive if you already have a database (for example, medical history data in a hospital database).
  • Allows you to look at many factors at the same time, like age/weight/height/tobacco use/drug use.

Disadvantages

  • Can result in weak evidence, compared to cohort studies (which cost more and take longer).
  • Available data may not be suited to your research question. For example, if you wanted to know if sugar consumption leads to obesity, you are unlikely to find data on sugar consumption in a medical database.
  • Cross sectional research studies are usually unable to control for confounding variables. One reason for this is that it’s usually difficult to find people who are similar enough. For example, they might be decades apart in age or they might be born in very different geographic regions.


Cross sectional research can give the “big picture” and can be a foundation to suggest other areas for more expensive research. For example, if the data suggests that there may be a relationship between sugar consumption and obesity, this could bolster an application for funding more research in this area.

Cross-Sectional vs Longitudinal Research


Both cross-sectional and longitudinal research studies are observational. They are both conducted without any interference to the study participants. Cross-sectional research is conducted at a single point in time while a longitudinal study can be conducted over many years.

For example, let’s say researchers wanted to find out if older adults who gardened had lower blood pressure than older adults who did not garden. In a cross-sectional study, the researchers might select 100 people from different backgrounds, ask them about their gardening habits and measure their blood pressure. The study would be conducted at approximately the same period of time (say, over a week). In a longitudinal study, the questions and measurements would be the same. But the researchers would follow the participants over time. They may record the answers and measurements every year.

One major advantage of longitudinal research is that, over time, researchers are better able to establish a cause-and-effect relationship. With the blood pressure example above, cross-sectional research wouldn’t give researchers information about what blood pressure readings were before the study. For example, participants may have had lower blood pressure before gardening. Longitudinal research can detect changes over time, both at the group and at the individual level.

Types of Longitudinal Design

Longitudinal Panel Design is the “traditional” type of longitudinal design, where the same data is collected from the same participants over a period of time. Repeated cross-sectional studies can be classified as longitudinal. Other types are:

  • Total population design, where the total population is surveyed in each study period.
  • Revolving panel design, where new participants are selected each period.

What is Pretest Posttest Design?


A pretest posttest design is an experiment where measurements are taken both before and after a treatment. The design means that you are able to see the effects of some type of treatment on a group. Pretest posttest designs may be quasi-experimental, which means that participants are not assigned randomly. However, the most usual method is to randomly assign the participants to groups in order to control for confounding variables. Three main types of pretest posttest design are commonly used:

  • Randomized Control-Group Pretest Posttest Design.
  • Randomized Solomon Four-Group Design.
  • Nonrandomized Control Group Pretest-Posttest Design.

1. Randomized Control-Group Pretest Posttest Design.

The pre-test post-test control group design is also called the classic controlled experimental design . The design includes both a control and a treatment group. For example, if you wanted to gauge if a new way of teaching math was effective, you could:

  • Randomly assign participants to a treatment group or a control group.
  • Administer a pre-test to the treatment group and the control group.
  • Use the new teaching method on the treatment group and the standard method on the control group, ensuring that the method of treatment is the only condition that is different.
  • Administer a post-test to both groups.
  • Assess the differences between groups.
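One common way to assess those differences is to compare gain scores (post-test minus pre-test) between the two groups; this sketch uses fabricated test scores purely to show the mechanics.

```python
# Sketch: comparing pre-to-post gains between treatment and control groups.
import numpy as np
from scipy import stats

treat_pre = np.array([61, 55, 70, 64, 58])    # fabricated scores
treat_post = np.array([72, 66, 79, 71, 69])
ctrl_pre = np.array([60, 57, 68, 63, 59])
ctrl_post = np.array([63, 58, 71, 66, 60])

treat_gain = treat_post - treat_pre
ctrl_gain = ctrl_post - ctrl_pre

t_stat, p_value = stats.ttest_ind(treat_gain, ctrl_gain)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```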

Two issues can affect the Randomized Control-Group Pretest Posttest Design:

  • Internal validity issues: maturation (i.e. biological changes in participants can affect differences between pre- and post-tests) and history (where participants experience something outside of the treatment that can affect scores).
  • External validity issues: Interaction of the pre-test and the treatment can occur if participants are influenced by the tone or content of the questions. For example, a question about how many hours a student spends on homework might prompt the student to spend more time on homework.

2. Randomized Solomon Four-Group Design.

In this type of pretest posttest design, four groups are randomly assigned: two experimental groups E1/E2 and two control groups C1/C2. Groups E1 and C1 complete a pre-test and all four groups complete a post-test. This better controls for the interaction of pretesting and posttesting; in the “classic” design, participants may be unduly influenced by the questions on the pretest.

3. Nonrandomized Control Group Pretest-Posttest Design.

This type of test is similar to the “classic” design, but participants are not randomly assigned to groups. Nonrandomization can be more practical in real life, when you are dealing with groups like students or employees who are already in classes or departments; randomization (i.e. moving people around to form new groups) could prove disruptive. This type of experimental design suffers from problems with internal validity more than the other two types.


What is a Quasi-Experimental Design?

A quasi-experimental design has much the same components as a regular experiment, but is missing one or more key components. The three key components of a traditional experiment are:

  • Pre-post test design.
  • Treatment and control groups.
  • Random assignment of subjects to groups.

You may want or need to deliberately leave out one of these key components. This could be for ethical or methodological reasons. For example:

  • It would be unethical to withhold treatment from a control group. This is usually the case with a life-threatening illness, like cancer.
  • It would be unethical to administer a potentially harmful treatment; for example, you might want to find out if a certain drug causes blindness.
  • A regular experiment might be expensive and impossible to fund.
  • An experiment could technically fail due to loss of participants, but potentially produce useful data.
  • It might be logistically impossible to control for all variables in a regular experiment.

These types of issues crop up frequently, leading to the widespread acceptance of quasi-experimental designs — especially in the social sciences. Quasi-experimental designs are generally regarded as unreliable and unscientific in the physical and biological sciences.

Some experiments naturally fall into groups. For example, you might want to compare educational experiences of first, middle and last born children. Random assignment isn’t possible, so these experiments are quasi-experimental by nature.

Quasi-Experimental Design Examples.

The general form of a quasi-experimental design thesis statement is: “What effect does (a certain intervention or program) have on (a specific population)?”

Example 1: Does smoking during pregnancy lead to low birth weight? It would be unethical to randomly assign one group of mothers packs of cigarettes to smoke. The researcher instead asks the mothers if they smoked during pregnancy and assigns them to groups after the fact.

Example 2 : Does thoughtfully designed software improve learning outcomes for students? This study used a pre-post test design and multiple classrooms to show how technology can be successfully implemented in schools.

Example 3: Can being mentored for your job lead to increased job satisfaction? This study followed 73 employees, some who were mentored and some who were not.

What is Randomized Block Design?

In randomized block design, the researcher divides experimental subjects into homogeneous blocks. Treatments are then randomly assigned to the blocks. The variability within blocks should be less than the variability between blocks. In other words, you need to make sure that the blocks contain subjects that are very similar. For example, you could put males in one block and females in a second block. This method is practically identical to stratified random sampling (SRS), except that the blocks in SRS are called “strata.” Randomized block design reduces variability in experiments.


Age isn’t the only potential source of variability. Other blocking factors that you could consider for this type of experiment include:

  • Consumption of certain foods.
  • Use of over the counter food supplements.
  • Adherence to dosing regimen.
  • Differences in metabolism due to genetic differences, liver or kidney issues, race, or sex.
  • Coexistence of other disorders.
  • Use of other drugs.

Randomized block experimental design is sometimes called randomized complete block experimental design, because the word “complete” makes it clear that all subjects are included in the experiment, not just a sample. However, the setup of the experiment usually makes it clear that all subjects are included, so most people will drop the word complete.

What is a Randomized Controlled Trial?


A randomized controlled trial is an experiment where the participants are randomly allocated to two or more groups to test a specific treatment or drug. Participants are assigned to either an experimental group or a comparison group. Random allocation means that all participants have the same chance of being placed in either group. The experimental group receives a treatment or intervention, for example:

  • Diagnostic Tests.
  • Experimental medication.
  • Interventional procedures.
  • Screening programs.
  • Specific types of education.

Participants in the comparison group receive a placebo (a dummy treatment), an alternative treatment, or no treatment at all. Many randomization methods are available, for example simple, stratified, or systematic random sampling. The common factor for all methods is that researchers, patients, and other parties cannot tell ahead of time who will be placed in which group.
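The allocation step itself can be as simple as shuffling the participant list and splitting it in half. Here is a minimal Python sketch of simple random allocation, with made-up participant IDs and group sizes:

```python
import random

# Simple random allocation: every participant has the same chance of
# being placed in either group. IDs are hypothetical.
participants = [f"p{i}" for i in range(1, 21)]

random.seed(7)  # fixed seed so the example is reproducible
shuffled = participants[:]
random.shuffle(shuffled)

half = len(shuffled) // 2
experimental_group = shuffled[:half]  # receives the treatment or intervention
comparison_group = shuffled[half:]    # receives placebo, alternative, or no treatment

print("experimental:", experimental_group)
print("comparison:  ", comparison_group)
```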

Advantages and Disadvantages of Randomized Controlled Trials

Advantages:

  • Random allocation can cancel out population bias; it ensures that any other possible causes for the experimental results are split equally between groups.
  • Blinding is easy to include in this type of experiment.
  • Results from the experiment can be analyzed with statistical tests and used to make inferences about wider populations.
  • Participants are readily identifiable as members of a specific population.

Disadvantages:

  • Generally more expensive and more time consuming than other methods.
  • Very large sample sizes (over 5,000 participants) are often needed.
  • Randomized controlled trials cannot be used to investigate some causes or risk factors. For example, ethical concerns would prevent a randomized controlled trial that assigned participants to smoke in order to study the risk factors of smoking.
  • This type of experimental design is unsuitable for outcomes that take a long time to develop. Cohort studies may be a more suitable alternative.
  • Some programs, for example cancer screening, are unsuited to random allocation of participants (again, due to ethical concerns).
  • Volunteer bias can be an issue.

What is a Within Subjects Experimental Design?


In a within subjects experimental design, participants are assigned more than one treatment: each participant experiences all the levels of any categorical explanatory variable. The levels can be ordered, like time points, or unordered. For example, let’s say you are testing whether blood pressure is raised when watching horror movies vs. romantic comedies. You could have all the participants watch a scary movie, then measure their blood pressure. Later, the same group of people watch a romantic comedy, and their blood pressure is measured.

Within subjects designs are frequently used in pre-test/post-test scenarios. For example, if a teacher wants to find out if a new classroom strategy is effective, they might test children before the strategy is in place and then after the strategy is in place.

Within subjects designs are similar to other analysis of variance designs, in that it’s possible to have a single independent variable, or multiple factorial independent variables. For example, three different depression inventories could be given at one, three, and six month intervals.

Advantages and Disadvantages of Within Subjects Experimental Design

Advantages:

  • It requires fewer participants than a between subjects design. If a between subjects design were used for the blood pressure example above, twice as many participants would be required. A within subjects design therefore requires fewer resources and is generally cheaper.
  • Individual differences between participants are controlled for, as each participant acts as their own control. Because the subjects are measured multiple times, the researcher can better home in on individual differences and remove them from the analysis.

Disadvantages:

  • Effects from one test could carry over to the next, a phenomenon called a carryover (or “range”) effect. In the blood pressure example, if participants were asked to watch the scary movie first, their blood pressure could stay elevated for hours afterwards, skewing the results for the romantic comedy.
  • Participants can exhibit “practice effects,” where they improve scores simply by taking the same test multiple times. This is often an issue in pre-test/post-test studies.
  • Data are not completely independent, which may affect hypothesis tests, like ANOVA.
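Because each participant contributes a pair of measurements, a paired analysis is the natural choice for the blood pressure example above. Below is a minimal Python sketch using SciPy's paired-samples t test; the readings are invented for illustration.

```python
from scipy import stats

# Each participant is measured after both movie types, so the two lists
# are paired by position rather than independent samples.
bp_horror = [128, 135, 121, 140, 132, 126, 138, 130]
bp_romcom = [122, 130, 119, 133, 128, 125, 131, 127]

# A paired-samples t test accounts for the dependence between the two
# measurements taken on the same person.
t_stat, p_value = stats.ttest_rel(bp_horror, bp_romcom)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```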

References:

  • Merck Manual. Retrieved January 1, 2016 from: http://www.merckmanuals.com/professional/clinical-pharmacology/factors-affecting-response-to-drugs/introduction-to-factors-affecting-response-to-drugs
  • Penn State: Basic Principles of DOE. Retrieved January 1, 2016 from: https://onlinecourses.science.psu.edu/stat503/node/67
  • Image: SUNY Downstate. Retrieved January 1, 2016 from: http://library.downstate.edu/EBM2/2200.htm

5. Related Topics

  • Accuracy and Precision .
  • Block plots .
  • Cluster Randomization .
  • What is Clustering?
  • What is the Cohort Effect?
  • What is a Control Group?
  • What is Counterbalancing?
  • Data Collection Methods
  • What is an Effect Size?
  • What is an Experimental Group (or Treatment Group)?
  • Fixed, Random, and Mixed Effects Models
  • What are generalizability and transferability?
  • What is Grounded Theory?
  • The Hawthorne Effect .
  • The Hazard Ratio.
  • Inter-rater Reliability.
  • Main Effects .
  • Order Effects .
  • The Placebo Effect
  • What is the Practice Effect?
  • Primary and Secondary Data .
  • What is Qualitative Research?
  • What is Quantitative Research?
  • What is a Randomized Clinical Trial?
  • Random Selection and Assignment.
  • Randomization .
  • Recall Bias .
  • What is Response Bias?
  • Research Methods (includes Quantitative and Qualitative).
  • Subgroup Analysis .
  • What is Survey Sampling?
  • Systematic Errors.
  • Treatment Diffusion.



Experimental Design – Types, Methods, Guide


Experimental Design

Experimental design is a process of planning and conducting scientific experiments to investigate a hypothesis or research question. It involves carefully designing an experiment that can test the hypothesis, and controlling for other variables that may influence the results.

Experimental design typically includes identifying the variables that will be manipulated or measured, defining the sample or population to be studied, selecting an appropriate method of sampling, choosing a method for data collection and analysis, and determining the appropriate statistical tests to use.

Types of Experimental Design

Here are the different types of experimental design:

Completely Randomized Design

In this design, participants are randomly assigned to one of two or more groups, and each group is exposed to a different treatment or condition.

Randomized Block Design

This design involves dividing participants into blocks based on a specific characteristic, such as age or gender, and then randomly assigning participants within each block to one of two or more treatment groups.

Factorial Design

In a factorial design, participants are randomly assigned to one of several groups, each of which receives a different combination of two or more independent variables.
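To see how quickly conditions multiply, the full set of conditions in a factorial design can be generated by crossing the factor levels. The factors and levels below are hypothetical:

```python
from itertools import product

# A 2 x 3 factorial design: crossing the levels yields 6 conditions.
factors = {
    "caffeine": ["none", "100mg"],
    "sleep":    ["4h", "6h", "8h"],
}

conditions = list(product(*factors.values()))
for i, condition in enumerate(conditions, start=1):
    print(i, dict(zip(factors, condition)))
# Participants would then be randomly assigned to one of the 6 conditions.
```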

Repeated Measures Design

In this design, each participant is exposed to all of the different treatments or conditions, either in a random order or in a predetermined order.

Crossover Design

This design involves randomly assigning participants to one of two or more treatment groups, with each group receiving one treatment during the first phase of the study and then switching to a different treatment during the second phase.

Split-plot Design

In this design, one factor is applied to large experimental units (“whole plots”) and a second factor is randomly assigned to smaller units (“subplots”) nested within each whole plot. It is useful when one factor is harder to vary than the others, and it typically uses a randomized block structure to control for other variables.

Nested Design

This design involves grouping participants within larger units, such as schools or households, and then randomly assigning these units to different treatment groups.

Laboratory Experiment

Laboratory experiments are conducted under controlled conditions, which allows for greater precision and accuracy. However, because laboratory conditions are not always representative of real-world conditions, the results of these experiments may not be generalizable to the population at large.

Field Experiment

Field experiments are conducted in naturalistic settings and allow for more realistic observations. However, because field experiments are not as controlled as laboratory experiments, they may be subject to more sources of error.

Experimental Design Methods

Experimental design methods refer to the techniques and procedures used to design and conduct experiments in scientific research. Here are some common experimental design methods:

Randomization

This involves randomly assigning participants to different groups or treatments to ensure that any observed differences between groups are due to the treatment and not to other factors.

Control Group

The use of a control group is an important experimental design method that involves having a group of participants who do not receive the treatment or intervention being studied. The control group serves as a baseline against which the effects on the treatment group are compared.

Blinding

Blinding involves keeping participants, researchers, or both unaware of which treatment group participants are in, in order to reduce the risk of bias in the results.

Counterbalancing

This involves systematically varying the order in which participants receive treatments or interventions in order to control for order effects.
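For a small number of treatments, full counterbalancing simply assigns every possible order to an equal number of participants. A minimal sketch, with hypothetical treatment labels and sample size:

```python
from itertools import permutations

# Three treatments have 3! = 6 possible orders; assigning each order to
# the same number of participants lets order effects average out.
treatments = ["A", "B", "C"]
orders = list(permutations(treatments))  # 6 orders

participants = [f"p{i}" for i in range(1, 13)]  # 12 participants, 2 per order

for participant, order in zip(participants, orders * 2):
    print(participant, "receives:", " then ".join(order))
```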

Replication

Replication involves conducting the same experiment with different samples or under different conditions to increase the reliability and validity of the results.

Factorial Manipulation

This experimental design method involves manipulating multiple independent variables simultaneously to investigate their combined effects on the dependent variable.

Blocking

This involves dividing participants into subgroups or blocks based on specific characteristics, such as age or gender, in order to reduce the risk of confounding variables.

Data Collection Method

Experimental design data collection methods are techniques and procedures used to collect data in experimental research. Here are some common experimental design data collection methods:

Direct Observation

This method involves observing and recording the behavior or phenomenon of interest in real time. It may involve the use of structured or unstructured observation, and may be conducted in a laboratory or naturalistic setting.

Self-report Measures

Self-report measures involve asking participants to report their thoughts, feelings, or behaviors using questionnaires, surveys, or interviews. These measures may be administered in person or online.

Behavioral Measures

Behavioral measures involve measuring participants’ behavior directly, such as through reaction time tasks or performance tests. These measures may be administered using specialized equipment or software.

Physiological Measures

Physiological measures involve measuring participants’ physiological responses, such as heart rate, blood pressure, or brain activity, using specialized equipment. These measures may be invasive or non-invasive, and may be administered in a laboratory or clinical setting.

Archival Data

Archival data involves using existing records or data, such as medical records, administrative records, or historical documents, as a source of information. These data may be collected from public or private sources.

Computerized Measures

Computerized measures involve using software or computer programs to collect data on participants’ behavior or responses. These measures may include reaction time tasks, cognitive tests, or other types of computer-based assessments.

Video Recording

Video recording involves recording participants’ behavior or interactions using cameras or other recording equipment. This method can be used to capture detailed information about participants’ behavior or to analyze social interactions.

Data Analysis Method

Experimental design data analysis methods refer to the statistical techniques and procedures used to analyze data collected in experimental research. Here are some common experimental design data analysis methods:

Descriptive Statistics

Descriptive statistics are used to summarize and describe the data collected in the study. This includes measures such as mean, median, mode, range, and standard deviation.

Inferential Statistics

Inferential statistics are used to make inferences or generalizations about a larger population based on the data collected in the study. This includes hypothesis testing and estimation.

Analysis of Variance (ANOVA)

ANOVA is a statistical technique used to compare means across two or more groups in order to determine whether there are significant differences between the groups. There are several types of ANOVA, including one-way ANOVA, two-way ANOVA, and repeated measures ANOVA.
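As a brief illustration, a one-way ANOVA comparing three treatment groups takes only a few lines with SciPy; the scores below are invented:

```python
from scipy import stats

# Made-up outcome scores for three independent treatment groups.
group_a = [24, 28, 22, 26, 25]
group_b = [30, 33, 29, 31, 34]
group_c = [23, 21, 25, 22, 24]

# f_oneway tests whether the group means differ more than would be
# expected from within-group variability alone.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```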

Regression Analysis

Regression analysis is used to model the relationship between two or more variables in order to determine the strength and direction of the relationship. There are several types of regression analysis, including linear regression, logistic regression, and multiple regression.
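Similarly, a simple linear regression can be fit with SciPy; the hours-studied and exam-score data below are invented for illustration:

```python
from scipy import stats

# Hypothetical data: hours studied (predictor) and exam score (outcome).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 72, 77, 80]

# linregress estimates the slope and intercept of the best-fit line and
# reports the strength (r) and significance (p) of the relationship.
result = stats.linregress(hours, scores)
print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}")
print(f"r^2 = {result.rvalue**2:.3f}, p = {result.pvalue:.4f}")
```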

Factor Analysis

Factor analysis is used to identify underlying factors or dimensions in a set of variables. This can be used to reduce the complexity of the data and identify patterns in the data.

Structural Equation Modeling (SEM)

SEM is a statistical technique used to model complex relationships between variables. It can be used to test complex theories and models of causality.

Cluster Analysis

Cluster analysis is used to group similar cases or observations together based on similarities or differences in their characteristics.

Time Series Analysis

Time series analysis is used to analyze data collected over time in order to identify trends, patterns, or changes in the data.

Multilevel Modeling

Multilevel modeling is used to analyze data that is nested within multiple levels, such as students nested within schools or employees nested within companies.

Applications of Experimental Design 

Experimental design is a versatile research methodology that can be applied in many fields. Here are some applications of experimental design:

  • Medical Research: Experimental design is commonly used to test new treatments or medications for various medical conditions. This includes clinical trials to evaluate the safety and effectiveness of new drugs or medical devices.
  • Agriculture : Experimental design is used to test new crop varieties, fertilizers, and other agricultural practices. This includes randomized field trials to evaluate the effects of different treatments on crop yield, quality, and pest resistance.
  • Environmental science: Experimental design is used to study the effects of environmental factors, such as pollution or climate change, on ecosystems and wildlife. This includes controlled experiments to study the effects of pollutants on plant growth or animal behavior.
  • Psychology : Experimental design is used to study human behavior and cognitive processes. This includes experiments to test the effects of different interventions, such as therapy or medication, on mental health outcomes.
  • Engineering : Experimental design is used to test new materials, designs, and manufacturing processes in engineering applications. This includes laboratory experiments to test the strength and durability of new materials, or field experiments to test the performance of new technologies.
  • Education : Experimental design is used to evaluate the effectiveness of teaching methods, educational interventions, and programs. This includes randomized controlled trials to compare different teaching methods or evaluate the impact of educational programs on student outcomes.
  • Marketing : Experimental design is used to test the effectiveness of marketing campaigns, pricing strategies, and product designs. This includes experiments to test the impact of different marketing messages or pricing schemes on consumer behavior.

Examples of Experimental Design 

Here are some examples of experimental design in different fields:

  • Example in Medical research : A study that investigates the effectiveness of a new drug treatment for a particular condition. Patients are randomly assigned to either a treatment group or a control group, with the treatment group receiving the new drug and the control group receiving a placebo. The outcomes, such as improvement in symptoms or side effects, are measured and compared between the two groups.
  • Example in Education research: A study that examines the impact of a new teaching method on student learning outcomes. Students are randomly assigned to either a group that receives the new teaching method or a group that receives the traditional teaching method. Student achievement is measured before and after the intervention, and the results are compared between the two groups.
  • Example in Environmental science: A study that tests the effectiveness of a new method for reducing pollution in a river. Two sections of the river are selected, with one section treated with the new method and the other section left untreated. The water quality is measured before and after the intervention, and the results are compared between the two sections.
  • Example in Marketing research: A study that investigates the impact of a new advertising campaign on consumer behavior. Participants are randomly assigned to either a group that is exposed to the new campaign or a group that is not. Their behavior, such as purchasing or product awareness, is measured and compared between the two groups.
  • Example in Social psychology: A study that examines the effect of a new social intervention on reducing prejudice towards a marginalized group. Participants are randomly assigned to either a group that receives the intervention or a control group that does not. Their attitudes and behavior towards the marginalized group are measured before and after the intervention, and the results are compared between the two groups.

When to use Experimental Research Design 

Experimental research design should be used when a researcher wants to establish a cause-and-effect relationship between variables. It is particularly useful when studying the impact of an intervention or treatment on a particular outcome.

Here are some situations where experimental research design may be appropriate:

  • When studying the effects of a new drug or medical treatment: Experimental research design is commonly used in medical research to test the effectiveness and safety of new drugs or medical treatments. By randomly assigning patients to treatment and control groups, researchers can determine whether the treatment is effective in improving health outcomes.
  • When evaluating the effectiveness of an educational intervention: An experimental research design can be used to evaluate the impact of a new teaching method or educational program on student learning outcomes. By randomly assigning students to treatment and control groups, researchers can determine whether the intervention is effective in improving academic performance.
  • When testing the effectiveness of a marketing campaign: An experimental research design can be used to test the effectiveness of different marketing messages or strategies. By randomly assigning participants to treatment and control groups, researchers can determine whether the marketing campaign is effective in changing consumer behavior.
  • When studying the effects of an environmental intervention: Experimental research design can be used to study the impact of environmental interventions, such as pollution reduction programs or conservation efforts. By randomly assigning locations or areas to treatment and control groups, researchers can determine whether the intervention is effective in improving environmental outcomes.
  • When testing the effects of a new technology: An experimental research design can be used to test the effectiveness and safety of new technologies or engineering designs. By randomly assigning participants or locations to treatment and control groups, researchers can determine whether the new technology is effective in achieving its intended purpose.

How to Conduct Experimental Research

Here are the steps to conduct Experimental Research:

  • Identify a Research Question : Start by identifying a research question that you want to answer through the experiment. The question should be clear, specific, and testable.
  • Develop a Hypothesis: Based on your research question, develop a hypothesis that predicts the relationship between the independent and dependent variables. The hypothesis should be clear and testable.
  • Design the Experiment : Determine the type of experimental design you will use, such as a between-subjects design or a within-subjects design. Also, decide on the experimental conditions, such as the number of independent variables, the levels of the independent variable, and the dependent variable to be measured.
  • Select Participants: Select the participants who will take part in the experiment. They should be representative of the population you are interested in studying.
  • Randomly Assign Participants to Groups: If you are using a between-subjects design, randomly assign participants to groups to control for individual differences.
  • Conduct the Experiment : Conduct the experiment by manipulating the independent variable(s) and measuring the dependent variable(s) across the different conditions.
  • Analyze the Data: Analyze the data using appropriate statistical methods to determine if there is a significant effect of the independent variable(s) on the dependent variable(s).
  • Draw Conclusions: Based on the data analysis, draw conclusions about the relationship between the independent and dependent variables. If the results are consistent with the hypothesis, the hypothesis is supported; if not, it is rejected.
  • Communicate the Results: Finally, communicate the results of the experiment through a research report or presentation. Include the purpose of the study, the methods used, the results obtained, and the conclusions drawn.

Purpose of Experimental Design 

The purpose of experimental design is to control and manipulate one or more independent variables to determine their effect on a dependent variable. Experimental design allows researchers to systematically investigate causal relationships between variables, and to establish cause-and-effect relationships between the independent and dependent variables. Through experimental design, researchers can test hypotheses and make inferences about the population from which the sample was drawn.

Experimental design provides a structured approach to designing and conducting experiments, ensuring that the results are reliable and valid. By carefully controlling for extraneous variables that may affect the outcome of the study, experimental design allows researchers to isolate the effect of the independent variable(s) on the dependent variable(s), and to minimize the influence of other factors that may confound the results.

Experimental design also allows researchers to generalize their findings to the larger population from which the sample was drawn. By randomly selecting participants and using statistical techniques to analyze the data, researchers can make inferences about the larger population with a high degree of confidence.

Overall, the purpose of experimental design is to provide a rigorous, systematic, and scientific method for testing hypotheses and establishing cause-and-effect relationships between variables. Experimental design is a powerful tool for advancing scientific knowledge and informing evidence-based practice in various fields, including psychology, biology, medicine, engineering, and social sciences.

Advantages of Experimental Design 

Experimental design offers several advantages in research. Here are some of the main advantages:

  • Control over extraneous variables: Experimental design allows researchers to control for extraneous variables that may affect the outcome of the study. By manipulating the independent variable and holding all other variables constant, researchers can isolate the effect of the independent variable on the dependent variable.
  • Establishing causality: Experimental design allows researchers to establish causality by manipulating the independent variable and observing its effect on the dependent variable. This allows researchers to determine whether changes in the independent variable cause changes in the dependent variable.
  • Replication : Experimental design allows researchers to replicate their experiments to ensure that the findings are consistent and reliable. Replication is important for establishing the validity and generalizability of the findings.
  • Random assignment: Experimental design often involves randomly assigning participants to conditions. This helps to ensure that individual differences between participants are evenly distributed across conditions, which increases the internal validity of the study.
  • Precision : Experimental design allows researchers to measure variables with precision, which can increase the accuracy and reliability of the data.
  • Generalizability : If the study is well-designed, experimental design can increase the generalizability of the findings. By controlling for extraneous variables and using random assignment, researchers can increase the likelihood that the findings will apply to other populations and contexts.

Limitations of Experimental Design

Experimental design has some limitations that researchers should be aware of. Here are some of the main limitations:

  • Artificiality : Experimental design often involves creating artificial situations that may not reflect real-world situations. This can limit the external validity of the findings, or the extent to which the findings can be generalized to real-world settings.
  • Ethical concerns: Some experimental designs may raise ethical concerns, particularly if they involve manipulating variables that could cause harm to participants or if they involve deception.
  • Participant bias : Participants in experimental studies may modify their behavior in response to the experiment, which can lead to participant bias.
  • Limited generalizability: The conditions of the experiment may not reflect the complexities of real-world situations. As a result, the findings may not be applicable to all populations and contexts.
  • Cost and time : Experimental design can be expensive and time-consuming, particularly if the experiment requires specialized equipment or if the sample size is large.
  • Researcher bias : Researchers may unintentionally bias the results of the experiment if they have expectations or preferences for certain outcomes.
  • Lack of feasibility : Experimental design may not be feasible in some cases, particularly if the research question involves variables that cannot be manipulated or controlled.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer



Design of Experiments with Multiple Independent Variables: A Resource Management Perspective on Complete and Reduced Factorial Designs

Linda M. Collins

The Methodology Center and Department of Human Development and Family Studies, The Pennsylvania State University

John J. Dziak

The Methodology Center, The Pennsylvania State University

Department of Statistics and The Methodology Center, The Pennsylvania State University

An investigator who plans to conduct experiments with multiple independent variables must decide whether to use a complete or reduced factorial design. This article advocates a resource management perspective on making this decision, in which the investigator seeks a strategic balance between service to scientific objectives and economy. Considerations in making design decisions include whether research questions are framed as main effects or simple effects; whether and which effects are aliased (confounded) in a particular design; the number of experimental conditions that must be implemented in a particular design and the number of experimental subjects the design requires to maintain the desired level of statistical power; and the costs associated with implementing experimental conditions and obtaining experimental subjects. In this article four design options are compared: complete factorial, individual experiments, single factor, and fractional factorial designs. Complete and fractional factorial designs and single factor designs are generally more economical than conducting individual experiments on each factor. Although relatively unfamiliar to behavioral scientists, fractional factorial designs merit serious consideration because of their economy and versatility.

Suppose a scientist is interested in investigating the effects of k independent variables, where k > 1. For example, Bolger and Amarel (2007) investigated the hypothesis that the effect of peer social support on performance stress can be positive or negative, depending on whether the way the peer social support is given enhances or degrades self-efficacy. Their experiment could be characterized as involving four factors: support offered (yes or no), nature of support (visible or indirect), message from a confederate that recipient of support is unable to handle the task alone (yes or no), and message that a confederate would be unable to handle the task (yes or no).

One design possibility when k > 1 independent variables are to be examined is a factorial experiment. In factorial research designs, experimental conditions are formed by systematically varying the levels of two or more independent variables, or factors. For example, in the classic 2 × 2 factorial design there are two factors, each with two levels. The two factors are crossed, that is, all combinations of levels of the two factors are formed, to create a design with four experimental conditions. More generally, factorial designs can include k ≥ 2 factors and can incorporate two or more levels per factor. With four two-level variables, such as in Bolger and Amarel (2007), a complete factorial experiment would involve 2 × 2 × 2 × 2 = 16 experimental conditions. One advantage of factorial designs, as compared to simpler experiments that manipulate only a single factor at a time, is the ability to examine interactions between factors. A second advantage of factorial designs is their efficiency with respect to use of experimental subjects; factorial designs require fewer experimental subjects than comparable alternative designs to maintain the same level of statistical power (e.g. Wu & Hamada, 2000).

However, a complete factorial experiment is not always an option. In some cases there may be combinations of levels of the factors that would create a nonsensical, toxic, logistically impractical or otherwise undesirable experimental condition. For example, Bolger and Amarel (2007) could not have conducted a complete factorial experiment because some of the combinations of levels of the factors would have been illogical (e.g. no support offered but support was direct). But even when all combinations of factors are reasonable, resource limitations may make implementation of a complete factorial experiment impossible. As the number of factors and levels of factors under consideration increases, the number of experimental conditions that must be implemented in a complete factorial design increases rapidly. The accompanying logistical difficulty and expense may exceed available resources, prompting investigators to seek alternative experimental designs that require fewer experimental conditions.

In this article the term “reduced design” will be used to refer generally to any design approach that involves experimental manipulation of all k independent variables, but includes fewer experimental conditions than a complete factorial design with the same k variables. Reduced designs are often necessary to make simultaneous investigation of multiple independent variables feasible. However, any removal of experimental conditions to form a reduced design has important scientific consequences. The number of effects that can be estimated in an experimental design is limited to one fewer than the number of experimental conditions represented in the design. Therefore, when experimental conditions are removed from a design some effects are combined so that their sum only, not the individual effects, can be estimated. Another way to think of this is that two or more interpretational labels (e.g. main effect of Factor A; interaction between Factor A and Factor B) can be applied to the same source of variation. This phenomenon is known as aliasing (sometimes referred to as confounding, or as collinearity in the regression framework).

Any investigator who wants or needs to examine multiple independent variables is faced with deciding whether to use a complete factorial or a reduced experimental design. The best choice is one that strikes a careful and strategic balance between service to scientific objectives and economy. Weighing a variety of considerations to achieve such a balance, including the exact research questions of interest, the potential impact of aliasing on interpretation of results, and the costs associated with each design option, is the topic of this article.

Objectives of this article

This article has two objectives. The first objective is to propose that a resource management perspective may be helpful to investigators who are choosing a design for an experiment that will involve several independent variables. The resource management perspective assumes that an experiment is motivated by a finite set of research questions and that these questions can be prioritized for decision making purposes. Then according to this perspective the preferred experimental design is the one that, in relation to the resource requirements of the design, offers the greatest potential to advance the scientific agenda motivating the experiment. Four general design alternatives will be considered from a resource management perspective: complete factorial designs and three types of reduced designs. One of the reduced designs, the fractional factorial, is used routinely in engineering but currently unfamiliar to many social and behavioral scientists. In our view fractional factorial designs merit consideration by social and behavioral scientists alongside other more commonly used reduced designs. Accordingly, a second objective of this article is to offer a brief introductory tutorial on fractional factorial designs, in the hope of assisting investigators who wish to evaluate whether these designs might be of use in their research.

Overview of four design alternatives

Throughout this article, it is assumed that an investigator is interested in examining the effects of k independent variables, each of which could correspond to a factor in a factorial experiment. It is not necessarily a foregone conclusion that the k independent variables must be examined in a single experiment; they may represent a set of questions comprising a program of research, or a set of features or components comprising a behavioral intervention program. It is assumed that the k factors can be independently manipulated, and that no possible combination of the factors would create an experimental condition that cannot or should not be implemented. For the sake of simplicity, it is also assumed that each of the k factors has only two levels, such as On/Off or Yes/No. Factorial and fractional factorial designs can be done with factors having any number of levels, but two-level factors allow the most straightforward interpretation and largest statistical power, especially for interactions.

In this section the four different design alternatives considered in this article are introduced using a hypothetical example based on the following scenario: An investigator is to conduct a study on anxiety related to public speaking (this example is modeled very loosely on Bolger and Amarel, 2007 ). There are three factors of theoretical interest to the investigator, each with two levels, On or Off. The factors are whether or not (1) the subject is allowed to choose a topic for the presentation ( choose ); (2) the subject is taught a deep-breathing relaxation exercise to perform just before giving the presentation ( breath ); and (3) the subject is provided with extra time to prepare for the speech ( prep ). This small hypothetical example will be useful in illustrating some initial key points of comparison among the design alternatives. Later in the article the hypothetical example will be extended to include more factors so that some additional points can be illustrated.

The first alternative considered here is a complete factorial design. The remaining alternatives considered are reduced designs, each of which can be viewed as a subset of the complete factorial.

Complete factorial designs

Factorial designs may be denoted using the exponential notation 2^k, which compactly expresses that k factors with 2 levels each are crossed, resulting in 2^k experimental conditions (sometimes called “cells”). Each experimental condition represents a unique combination of levels of the k factors. In the hypothetical example a complete factorial design would be expressed as 2^3 (or equivalently, 2 × 2 × 2) and would involve eight experimental conditions. Table 1 shows these eight experimental conditions along with effect coding. The design enables estimation of seven effects: three main effects, three two-way interactions, and a single three-way interaction.

Table 1. Experimental conditions and effect coding for the complete 2^3 factorial design (-1 = Off, 1 = On; columns 1, 2, and 3 are the main-effect codes for choose, breath, and prep).

Condition | choose | breath | prep |  1 |  2 |  3 | 1×2 | 1×3 | 2×3 | 1×2×3
    1     |  Off   |  Off   | Off  | -1 | -1 | -1 |  1  |  1  |  1  |  -1
    2     |  Off   |  Off   | On   | -1 | -1 |  1 |  1  | -1  | -1  |   1
    3     |  Off   |  On    | Off  | -1 |  1 | -1 | -1  |  1  | -1  |   1
    4     |  Off   |  On    | On   | -1 |  1 |  1 | -1  | -1  |  1  |  -1
    5     |  On    |  Off   | Off  |  1 | -1 | -1 | -1  | -1  |  1  |   1
    6     |  On    |  Off   | On   |  1 | -1 |  1 | -1  |  1  | -1  |  -1
    7     |  On    |  On    | Off  |  1 |  1 | -1 |  1  | -1  | -1  |  -1
    8     |  On    |  On    | On   |  1 |  1 |  1 |  1  |  1  |  1  |   1

Table 1 illustrates one feature of complete factorial designs in which an equal number of subjects is assigned to each experimental condition, namely the balance property. A design is balanced if each level of each factor appears in the design the same number of times and is assigned to the same number of subjects (Hays, 1994; Wu & Hamada, 2000). In a balanced design the main effects and interactions are orthogonal, so that each one is estimated and tested as if it were the only one under consideration, with very little loss of efficiency due to the presence of other factors. (Effects may still be orthogonal even in unbalanced designs if certain proportionality conditions are met; see e.g. Hays, 1994, p. 475.) The balance property is evident in Table 1; each level of each factor appears exactly four times.
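Both properties can be verified directly from the effect coding. The following Python sketch rebuilds the seven effect-code columns of Table 1 and checks that every column is balanced and that every pair of columns is orthogonal:

```python
from itertools import product

# Rows of Table 1: all combinations of (choose, breath, prep) with
# -1 = Off and 1 = On, in the same order as the table.
conditions = list(product([-1, 1], repeat=3))

cols = {name: [c[i] for c in conditions]
        for i, name in enumerate(["choose", "breath", "prep"])}
cols["choose x breath"] = [c[0] * c[1] for c in conditions]
cols["choose x prep"] = [c[0] * c[2] for c in conditions]
cols["breath x prep"] = [c[1] * c[2] for c in conditions]
cols["choose x breath x prep"] = [c[0] * c[1] * c[2] for c in conditions]

# Balance: each level (+1/-1) appears equally often, so each column sums to 0.
assert all(sum(col) == 0 for col in cols.values())

# Orthogonality: every pair of distinct effect columns has a zero dot product.
names = list(cols)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        assert sum(a * b for a, b in zip(cols[names[i]], cols[names[j]])) == 0

print("All 7 effect columns are balanced and mutually orthogonal.")
```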

Individual experiments

The individual experiments approach requires conducting a two-condition experiment for each independent variable, that is, k separate experiments. In the example this would require conducting three different experiments, involving a total of six experimental conditions. In one experiment, a condition in which subjects are allowed to choose the topic of the presentation would be compared to one in which subjects are assigned a topic; in a second experiment, a condition in which subjects are taught a relaxation exercise would be compared to one in which no relaxation exercise is taught; in a third experiment, a condition in which subjects are given ample time to prepare in advance would be compared to one in which subjects are given little preparation time. The subset of experimental conditions from the complete three-factor factorial experiment in Table 1 that would be implemented in the individual experiments approach is depicted in the first section of Table 2 . This design, considered as a whole, is not balanced. Each of the independent variables is set to On once and set to Off five times.

Table 2. Subsets of the experimental conditions from Table 1 that comprise each reduced design. Condition numbers and effect codes are as in Table 1 (columns: choose, breath, prep, then codes 1, 2, 3, 1×2, 1×3, 2×3, 1×2×3).

Subset comprising individual experiments

Effect of choose:
    1 | Off | Off | Off | -1 | -1 | -1 |  1 |  1 |  1 | -1
    5 | On  | Off | Off |  1 | -1 | -1 | -1 | -1 |  1 |  1

Effect of breath:
    1 | Off | Off | Off | -1 | -1 | -1 |  1 |  1 |  1 | -1
    3 | Off | On  | Off | -1 |  1 | -1 | -1 |  1 | -1 |  1

Effect of prep:
    1 | Off | Off | Off | -1 | -1 | -1 |  1 |  1 |  1 | -1
    2 | Off | Off | On  | -1 | -1 |  1 |  1 | -1 | -1 |  1

Subset comprising single factor design: Comparative treatment design
    1 | Off | Off | Off | -1 | -1 | -1 |  1 |  1 |  1 | -1
    2 | Off | Off | On  | -1 | -1 |  1 |  1 | -1 | -1 |  1
    3 | Off | On  | Off | -1 |  1 | -1 | -1 |  1 | -1 |  1
    5 | On  | Off | Off |  1 | -1 | -1 | -1 | -1 |  1 |  1

Subset comprising single factor design: Constructive treatment design
    1 | Off | Off | Off | -1 | -1 | -1 |  1 |  1 |  1 | -1
    5 | On  | Off | Off |  1 | -1 | -1 | -1 | -1 |  1 |  1
    7 | On  | On  | Off |  1 |  1 | -1 |  1 | -1 | -1 | -1
    8 | On  | On  | On  |  1 |  1 |  1 |  1 |  1 |  1 |  1

Subset comprising fractional factorial design
    2 | Off | Off | On  | -1 | -1 |  1 |  1 | -1 | -1 |  1
    3 | Off | On  | Off | -1 |  1 | -1 | -1 |  1 | -1 |  1
    5 | On  | Off | Off |  1 | -1 | -1 | -1 | -1 |  1 |  1
    8 | On  | On  | On  |  1 |  1 |  1 |  1 |  1 |  1 |  1

Single factor designs in which the factor has many levels

In the single factor approach a single experiment is performed in which various combinations of levels of the independent variables are selected to form one nominal or ordinal categorical factor with several qualitatively distinct levels. West, Aiken, and Todd (1993 ; West & Aiken, 1997 ) reviewed three variations of the single factor design that are used frequently, particularly in research on behavioral interventions for prevention and treatment. In the comparative treatment design there are k +1 experimental conditions: k experimental conditions in which one independent variable is set to On and all the others to Off, plus a single control condition in which all independent variables are set to Off. This approach is similar to conducting separate individual experiments, except that a shared control group is used for all factors. The second section of Table 2 shows the four experimental conditions that would comprise a comparative treatment design in the hypothetical example. These are the same experimental conditions that appear in the individual experiments design.

By contrast, for the constructive treatment design an intervention is “built” by combining successive features. For example, an investigator interested in developing a treatment to reduce anxiety might want to assess the effect of allowing the subject to choose a topic, then the incremental effect of also teaching a relaxation exercise, then the incremental effect of allowing extra preparation time. The third section of Table 2 shows the subset of experimental conditions from the complete factorial shown in Table 1 that would be implemented in a three-factor constructive treatment experiment in which first choose is added, followed by breath and then prep . The constructive treatment strategy typically has k +1 experimental conditions but may have fewer or more. The dismantling design, in which the objective is to determine the effect of removing one or more features of an intervention, and other single factor designs are based on similar logic.

Table 2 shows that both the comparative treatment design and the constructive treatment design are unbalanced. In the comparative treatment design, each factor is set to On once and set to Off three times. In the constructive treatment design, choose is set to Off once and to On three times, and prep is set to On once and to Off three times. Other single factor designs are similarly unbalanced.

Fractional factorial designs

The fourth alternative considered in this article is to use a design from the family of fractional factorial designs. A fractional factorial design involves a special, carefully chosen subset, or fraction, of the experimental conditions in a complete factorial design. The bottom section of Table 2 shows a subset of experimental conditions from the complete three-factor factorial design that constitute a fractional factorial design. The experimental conditions in fractional factorial designs are selected so as to preserve the balance property. As Table 2 shows, each level of each factor appears in the design exactly twice.

Fractional factorial designs are represented using an exponential notation based on that used for complete factorial designs. The fractional factorial design in Table 2 would be expressed as 2^(3-1). This notation contains the following information: (a) the corresponding complete factorial design is 2^3, in other words involves 3 factors, each of which has 2 levels, for a total of 8 experimental conditions; (b) the fractional factorial design involves 2^(3-1) = 2^2 = 4 experimental conditions; and (c) this fractional factorial design is a 2^(-1) = 1/2 fraction of the complete factorial. Many fractional factorial designs, particularly those with many factors, involve even smaller fractions of the complete factorial.
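The half fraction in Table 2 can be reproduced from the complete 2^3 design by keeping only the conditions whose three effect codes multiply to +1, which is the defining relation of this particular fraction. A minimal Python sketch:

```python
from itertools import product

# Full 2^3 design: -1 = Off, 1 = On.
full = list(product([-1, 1], repeat=3))

# Keep the conditions where choose * breath * prep = +1.
fraction = [cond for cond in full if cond[0] * cond[1] * cond[2] == 1]

labels = {-1: "Off", 1: "On"}
for cond in fraction:
    print(tuple(labels[c] for c in cond))
# Prints conditions 2, 3, 5, and 8 of Table 1: (Off, Off, On),
# (Off, On, Off), (On, Off, Off), (On, On, On). Each level of each
# factor appears exactly twice, preserving the balance property.
```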

Aliasing in the individual experiments, single factor, and fractional factorial designs

It was mentioned above that reduced designs involve aliasing of effects. A design's aliasing is evident in its effect coding. When effects are aliased their effect coding is perfectly correlated (whether positively or negatively). Aliasing in the individual experiments approach can be seen by examining the first section of Table 2. In the experiment examining choose, the effect codes are identical for the main effect of choose and the choose × breath × prep interaction (-1 for experimental condition 1 and 1 for experimental condition 5), and these are perfectly negatively correlated with the effect codes for the choose × breath and choose × prep interactions. Thus these effects are aliased; the effect estimated by this experiment is an aggregate of the main effect of choose and all of the interactions involving choose. (The codes for the remaining effects, namely the main effects of breath and prep and the breath × prep interaction, are constants in this design.) Similarly, in the experiment investigating breath, the main effect and all of the interactions involving breath are aliased, and in the experiment investigating prep, the main effect and all of the interactions involving prep are aliased.

The aliasing in single factor experiments using the comparative treatment strategy is identical to the aliasing in the individual experiments approach. As shown in the second section of Table 2 , for the hypothetical example a comparative treatment experiment would involve experimental conditions 1, 2, 3, and 5, which are the same conditions as in the individual experiments approach. The effects of each factor are assessed by means of the same comparisons; for example, the effect of choose would be assessed by comparing experimental conditions 1 and 5. The primary difference is that only one control condition would be required in the single factor experiment, whereas in the individual experiments approach three control conditions are required.

The constructive treatment strategy is comprised of a different subset of experimental conditions from the full factorial than the individual experiments and comparative treatment approaches. Nevertheless, the aliasing is similar. As the third section of Table 2 shows, the effect of adding choose would be assessed by comparing experimental conditions 1 and 5, so the aliasing would be the same as that in the individual experiment investigating choose discussed above. The cumulative effect of adding breath would be assessed by comparing experimental conditions 5 and 7. The effect codes in these two experimental conditions for the main effect of breath are perfectly (positively or negatively) correlated with those for all of the interactions involving breath , although here the effect codes for the interactions are reversed as compared to the individual experiments and comparative treatment approaches. The same reasoning applies to the effect of prep , which is assessed by comparing experimental conditions 7 and 8.

As the fourth section of Table 2 illustrates, the aliasing in fractional factorial designs is different from the aliasing seen in the individual experiments and single factor approaches. In this fractional factorial design the effect of choose is estimated by comparing the mean of experimental conditions 2 and 3 with the mean of experimental conditions 5 and 8; the effect of breath is estimated by comparing the mean of experimental conditions 3 and 8 to the mean of experimental conditions 2 and 5; and the effect of prep is estimated by comparing the mean of experimental conditions 2 and 8 to the mean of experimental conditions 3 and 5. The effect codes show that the main effect of choose and the breath × prep interaction are aliased. The remaining effects are either orthogonal to the aliased effect or constant. Similarly, the main effect of breath and the choose × prep interaction are aliased, and the main effect of prep and the choose × breath interaction are aliased.

Note that each source of variation in this fractional factorial design has two aliases (e.g. choose and the breath × prep interaction form a single source of variation). This is characteristic of fractional factorial designs that, like this one, are 1/2 fractions. The denominator of the fraction always reveals how many aliases each source of variation has. Thus in a fractional factorial design that is a 1/4 fraction each source of variation has four aliases; in a fractional factorial design that is a 1/8 fraction each source of variation has eight aliases; and so on.
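This aliasing can be checked numerically: within the half fraction, the effect-code column for choose coincides exactly with the column for the breath × prep interaction, so the two effects form a single source of variation. A minimal sketch:

```python
# The four conditions of the half fraction (choose, breath, prep codes).
fraction = [(-1, -1, 1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]

choose_col = [c[0] for c in fraction]                # main effect of choose
breath_x_prep_col = [c[1] * c[2] for c in fraction]  # breath x prep interaction

# Identical columns: the design cannot separate these two effects.
assert choose_col == breath_x_prep_col
print("choose is aliased with breath x prep:", choose_col)
```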

Aliasing and scientific questions

An investigator who is interested in using a reduced design to estimate the effects of k factors faces several considerations. These include: whether the research questions of primary scientific interest concern simple effects or main effects; whether the design's aliasing means that assumptions must be made in order to address the research questions; and how to use aliasing strategically. Each of these considerations is reviewed in this section.

Simple effects and main effects

In this article we have been discussing a situation in which a finite set of k independent variables is under consideration and the individual effects of each of the k variables are of interest. However, the question “Does a particular factor have an effect?” is incomplete; different research questions may involve different types of effects. Let us examine three different research questions concerning the effect of breath in the hypothetical example, and see how they correspond to effects in a factorial design.

Question 1: “Does the factor breath have an effect on the outcome variable when the factors choose and prep are set to Off?”

Question 2: “Will an intervention consisting of only the factors choose and prep set to On be improved if the factor breath is changed from Off to On?”

Question 3: “Does the factor breath have an effect on the outcome variable on average across levels of the other factors?”

In the language of experimental design, Questions 1 and 2 concern simple effects, and Question 3 concerns a main effect. The distinction between simple effects and main effects is subtle but important. A simple effect of a factor is an effect at a particular combination of levels of the remaining factors. There are as many simple effects for each factor as there are combinations of levels of the remaining factors. For example, the simple effect relevant to Question 1 is the conditional effect of changing breath from Off to On, assuming both prep and choose are set to Off. The simple effect relevant to Question 2 is the conditional effect of changing breath from Off to On, assuming both other factors are set to On. Thus although Questions 1 and 2 both are concerned with simple effects of breath , they are concerned with different simple effects.

A main effect for a factor is an effect on average across all combinations of levels of the other factors in the experiment. For example, Question 3 is concerned with the main effect of breath, that is, the effect of breath averaged across all combinations of levels of prep and choose. Given a particular set of k factors, there is only one main effect corresponding to each factor.

Simple effects and main effects are not interchangeable, unless we assume that all interactions are negligible. Thus, neither necessarily tells anything about the other. A positive main effect does not imply that all of the simple effects are nonzero or even nonnegative. It is even possible (due to a large interaction) for one simple effect to be positive, another simple effect for the same factor to be negative, and the main (averaged) effect to be zero. In the public speaking example, the answer to Question 2 does not imply anything about whether an intervention consisting of breath alone would be effective, or whether there would be an incremental effect of breath if it were added to an intervention initially consisting of choose alone.
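The crossover case described above is easy to see with invented cell means in which the two simple effects of breath cancel exactly:

```python
# Hypothetical anxiety means (lower = less anxiety) in a 2 x 2 table of
# breath (Off/On) by choose (Off/On).
means = {
    ("off", "off"): 50, ("off", "on"): 40,  # breath Off
    ("on",  "off"): 40, ("on",  "on"): 50,  # breath On
}

# Simple effect of breath when choose is Off: 40 - 50 = -10 (anxiety drops).
simple_choose_off = means[("on", "off")] - means[("off", "off")]
# Simple effect of breath when choose is On: 50 - 40 = +10 (anxiety rises).
simple_choose_on = means[("on", "on")] - means[("off", "on")]
# Main effect of breath: the average of the simple effects is zero.
main_effect = (simple_choose_off + simple_choose_on) / 2

print(simple_choose_off, simple_choose_on, main_effect)  # -10 10 0.0
```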

Research questions, aliasing, and assumptions

Suppose an investigator is interested in addressing Question 1 above. The answer to this research question depends only upon the particular simple effect of breath when both of the other factors are set to Off. The research question does not ask whether any observed differences are attributable to the main effect of breath , the breath × prep interaction, the breath × choose interaction, the breath × prep × choose interaction, or some combination of the aliased effects. The answer to Question 2, which also concerns a simple effect, depends only upon whether changing breath from Off to On has an effect on the outcome variable when prep and choose are set to On; it does not depend on establishing whether any other effects in the model are present or absent. As Kirk (1968) pointed out, simple effects “represent a partition of a treatment sum of squares plus an interaction sum of squares” (p. 380). Thus, although there is aliasing in the individual experiments and comparative treatment strategies, these designs are appropriate for addressing Question 1, because the aliased effects correspond exactly to the effect of interest in Question 1. Similarly, although there is aliasing in the constructive treatment strategy, this design is appropriate for addressing Question 2. In other words, although in our view it is important to be aware of aliasing whenever considering a reduced experimental design, the aliasing ultimately is of little consequence if the aliased effect as a package is of primary scientific interest.

The individual experiments and comparative treatment strategies would not be appropriate for addressing Question 2. The constructive treatment strategy could address Question 1, but only if breath were the first factor set to On, with the others set to Off, in the first non-control condition. The conclusions drawn from these experiments would be limited to simple effects and could not be extended to main effects or interactions.

The situation is different if a reduced design is to be used to estimate main effects. Suppose an investigator is interested in addressing Question 3, that is, is interested in the main effect of breath . As was discussed above, in the individual experiments, comparative treatment, and constructive treatment approaches the main effect of breath is aliased with all the interactions involving breath . It is appropriate to use these designs to draw conclusions about the main effect of breath only if it is reasonable to assume that all of the interactions involving breath up to the k -way interaction are negligible. Then any effect of breath observed using an individual experiment or a single factor design is attributable to the main effect.

The difference in the aliasing structure of fractional factorial designs as compared to individual experiments and single factor designs becomes particularly salient when the primary scientific questions that motivate an experiment require estimating main effects as opposed to simple effects, and when larger numbers of factors are involved. However, the small three-factor fractional factorial experiment in Table 2 can be used to demonstrate the logic behind the choice of a particular fractional factorial design. In the design in Table 2 the main effect of breath is aliased with one two-way interaction: prep × choose . If it is reasonable to assume that this two-way interaction is negligible, then it is appropriate to use this fractional factorial design to estimate the main effect of breath . In general, investigators considering using a fractional factorial design seek a design in which main effects and scientifically important interactions are aliased only with effects that can be assumed to be negligible.
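To see where this aliasing comes from, note that the structure of a half-fraction can be summarized by a defining relation. Assuming, for illustration, that the half of the complete factorial retained in Table 2 is the one in which the product of the three -1/+1 effect codes is always +1, the defining relation is

\[
I = \mathrm{breath} \times \mathrm{prep} \times \mathrm{choose},
\]

and multiplying both sides by any factor's code (using the fact that each squared code equals 1) yields its alias:

\[
\mathrm{breath} = \mathrm{prep} \times \mathrm{choose}, \qquad
\mathrm{prep} = \mathrm{breath} \times \mathrm{choose}, \qquad
\mathrm{choose} = \mathrm{breath} \times \mathrm{prep}.
\]

This is exactly why the main effect of breath is indistinguishable from the prep × choose interaction in this design.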

Many fractional factorial designs in which there are four or more factors require far fewer and much weaker assumptions for estimation of main effects than those required by the small hypothetical example used here. For these larger problems it is possible to identify a fractional factorial design that uses fewer experimental conditions than the complete design but in which main effects and two-way interactions are aliased only with interactions involving three or more factors. Many of these designs also enable estimation of some three-way interactions that are aliased only with interactions involving four or more factors. In general, the appeal of fractional factorial designs increases as the number of factors becomes larger. By contrast, individual experiments and single factor designs always alias main effects with all interactions from the two-way up to the k-way, no matter how many factors are involved.

Strategic aliasing and designating negligible effects

A useful starting point for choosing a reduced design is sorting all of the effects in the complete factorial into three categories: (1) effects that are of primary scientific interest and therefore are to be estimated; (2) effects that are expected to be zero or negligible; and (3) effects that are not of primary scientific interest but may be non-negligible. Strategic aliasing involves ensuring that effects of primary scientific interest are aliased only with negligible effects. There may be non-negligible effects that are not of scientific interest. Resources are not to be devoted to estimating such effects, but care must be taken not to alias them with effects of primary scientific interest.

Considering which, if any, effects to place in the negligible category is likely to be an unfamiliar, and perhaps in some instances uncomfortable, process for some social and behavioral scientists. However, the choice is critically important. On the one hand, when more effects are designated negligible the available options will in general include designs involving smaller numbers of experimental conditions; on the other hand, incorrectly designating effects as negligible can threaten the validity of scientific conclusions. The best bases for making assumptions about negligible effects are theory and prior empirical research. Yet there are few areas in the social and behavioral sciences in which theory makes specific predictions about higher-order interactions, and it appears that to date there has been relatively little empirical investigation of such interactions. Given this lack of guidance, on what basis can an investigator decide on assumptions?

A very cautious approach would be to assume that each and every interaction up to the k -way interaction is likely to be sizeable, unless there is empirical evidence or a compelling theoretical basis for assuming that it is negligible. This is equivalent to leaving the negligible category empty and designating each effect either of primary scientific interest or non-negligible. There are two strategies consistent with this perspective. One is to conduct a complete factorial experiment, being careful to ensure adequate statistical power to detect any interactions of scientific interest. The other strategy consistent with assuming all interactions are likely to be sizeable is to frame research questions only about simple effects that can reasonably be estimated with the individual experiments or single factor approaches. For example, as discussed above the aliasing associated with the comparative treatment design may not be an issue if research questions are framed in terms of simple effects.

If these cautious strategies seem too restrictive, another possibility is to adopt heuristic guiding principles (see Wu & Hamada, 2000 ) that are used in engineering research to inform the choice of assumptions and aliasing structure, and to help target resources where they are likely to result in the most scientific progress. The guiding principles are intended for use when theory and prior research are unavailable; if guidance from these sources is available, it should always be applied first. One guiding principle is called Hierarchical Ordering . This principle states that when resources are limited, the first priority should be estimation of lower-order effects; thus main effects are the first investigative priority, followed by two-way interactions. As Green and Rao (1971) noted, “…in many instances the simpler (additive) model represents a very good approximation of reality” (p. 359), particularly if measurement quality is good and floor and ceiling effects can be avoided. Another guiding principle is called Effect Sparsity ( Box & Meyer, 1986 ), or sometimes the Pareto Principle in Experimental Design ( Wu & Hamada, 2000 ). This principle states that the number of sizeable and important effects in a factorial experiment is small in comparison to the overall number of effects. Taken together, these principles suggest that unless theory and prior research specifically indicate otherwise, there are likely to be few sizeable interactions beyond a handful of two-way interactions and even fewer three-way interactions, and that aliasing the more complex and less interpretable higher-order interactions may well be a good choice.

Resolution of fractional factorial designs

Some general information about the aliasing of main effects and two-way interactions is conveyed by a fractional factorial design's resolution ( Wu & Hamada, 2000 ). Resolution is designated by a Roman numeral, usually III, IV, V, or VI. The aliasing of main effects and two-way interactions in these designs is shown in Table 3 . As Table 3 shows, as design resolution increases, main effects and two-way interactions become increasingly free of aliasing with lower-order effects. Importantly, no design of Resolution III or higher aliases main effects with other main effects.

Design resolution | Main effects not aliased with | Two-way interactions not aliased with
Resolution III | main effects | (none)
Resolution IV | main effects and two-way interactions | main effects
Resolution V | main effects, two-way interactions, and three-way interactions | main effects and two-way interactions
Resolution VI | main effects, two-way interactions, three-way interactions, and four-way interactions | main effects, two-way interactions, and three-way interactions

Table 3 shows only which effects are not aliased with main effects and two-way interactions. Which effects, and how many, are aliased with main effects and two-way interactions depends on the exact design. For example, consider a 2^(6-2) fractional factorial design. As mentioned previously, this is a 1/4 fraction design, so each source of variance belongs to an alias set of four effects; thus each main effect is aliased with three other effects. Suppose this design is Resolution IV. Then none of the three effects aliased with a given main effect will be another main effect or a two-way interaction; all three will be interactions involving three or more factors.
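The alias counting is simple bookkeeping. A design with 2^(6-2) = 16 experimental conditions can support only 16 independent estimates (including the grand mean), so the 2^6 = 64 effects of the complete factorial (intercept, 6 main effects, and 57 interactions) must be partitioned into

\[
\frac{2^{6}}{2^{6-2}} = 2^{2} = 4
\]

effects per alias set; each estimable quantity therefore stands for a bundle of four completely confounded effects.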

According to the Hierarchical Ordering and Effect Sparsity principles, in the absence of theory or evidence to the contrary it is reasonable to make the working assumption that higher-order interactions are less likely to be sizeable than lower-order interactions. Thus, all else being equal, higher resolution designs, which alias scientifically important main effects and two-way interactions with higher-order interactions, are preferred to lower resolution designs, which alias these effects with lower-order interactions or with main effects. This concept has been called the maximum resolution criterion by Box and Hunter (1961) .

In general higher resolution designs tend to require more experimental conditions, although for a given number of experimental conditions there may be design alternatives with different resolutions.

Relative resource requirements of the four design alternatives

Number of experimental conditions and subjects required.

The four design options considered here can vary widely with respect to the number of experimental conditions that must be implemented and the number of subjects required to achieve a given statistical power. These two resource requirements must be considered separately. In single factor experiments, the number of subjects required to perform the experiment is directly proportional to the number of experimental conditions to be implemented. However, when comparing different designs in a multi-factor framework this is not the case. For instance, a complete factorial may require many more experimental conditions than the corresponding individual experiments or single factor approach, yet require fewer total subjects.

Table 4 compares the number of experimental conditions required by each of the four design alternatives. As Table 4 indicates, the individual experiments, single factor, and fractional factorial approaches are more economical than the complete factorial approach in terms of the number of experimental conditions that must be implemented. In general, the single factor approach requires the fewest experimental conditions.

Design | Number of experimental conditions | Number of subjects
Complete factorial | 2^k | N
Individual experiments | 2k | kN
Single factor | k + 1 | (k + 1)(N/2)
Fractional factorial | 2^(k-1) or fewer | N

Table 4 also provides a comparison of the minimum number of subjects required to maintain the same level of statistical power. Suppose a total of k factors are to be investigated, with the smallest effect size among them equal to d , and that a total minimum sample size of N is required in order to maintain a desired level of statistical power at a particular Type I error rate. The effect size d might be the expected normalized difference between two means, or it might be the smallest normalized difference considered clinically or practically significant. (Note that in practice there must be at least one subject per experimental condition, so at a minimum N must at least equal the number of experimental conditions. This may require additional subjects beyond the number needed to achieve a given level of power when implementing complete factorial designs with large k .) Table 4 shows that the complete factorial and fractional factorial designs are most economical in terms of sample size requirements. In any balanced factorial design each main effect is estimated using all subjects, averaging across the other main effects. In the hypothetical three-factor example, the main effects of choose , breath and prep are each based on all N subjects, with the subjects sorted differently into treatment and control groups for each main effect estimate. For example, Table 2 shows that in both the complete and fractional factorial designs a subject assigned to experimental condition 3 is in the Off group for the purpose of estimating the main effects of choose and prep but in the On group for the purpose of estimating the main effect of breath .

Essentially, factorial designs “recycle” subjects by placing every subject in one of the levels of every factor. As long as the sample sizes in each group are balanced, orthogonality is maintained, so that estimation and testing for each effect can be treated as independent of the other effects. (Strictly speaking, “balance” requires that each level of each factor be assigned exactly the same number of subjects, which may not hold in practice; however, the benefits associated with balance hold approximately even if there are slight imbalances in the number of subjects per experimental condition.) Because they “recycle” subjects while keeping the factors mutually orthogonal, balanced factorial designs make very efficient use of experimental subjects. This means that an increase in the number of factors in a factorial experiment does not necessarily require an increase in the total sample size in order to maintain approximately the same statistical power for testing main effects. This efficiency applies only to main effects, though: given a fixed sample size N, the more experimental conditions there are, the fewer subjects there will be in each experimental condition and the less power there will be for, say, pairwise comparisons of particular experimental conditions.
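One way to see this efficiency, in notation introduced here for illustration: in a balanced two-level factorial with total sample size N, each main effect estimate is a contrast between two halves of the entire sample,

\[
\widehat{\mathrm{ME}}(\mathrm{breath}) = \bar{Y}_{\mathrm{breath}=\mathrm{On}} - \bar{Y}_{\mathrm{breath}=\mathrm{Off}},
\]

with each mean based on N/2 subjects no matter how many other factors are in the design. Adding factors rearranges which experimental conditions those N/2 subjects occupy, but it does not shrink the number of subjects contributing to either mean.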

By contrast, the individual experiments approach sometimes requires many more subjects than the complete factorial experiment to obtain a given level of statistical power, because it cannot reuse subjects to test different orthogonal effect estimates simultaneously as balanced factorial experiments can. As Table 4 shows, if a factorial experiment with k factors requires an overall sample size of N to achieve a desired level of statistical power for detecting a main effect of size d at a particular Type I error rate, the comparable individual experiments approach requires kN subjects to detect a simple effect of the same size at the same Type I error rate. This is because the first experiment requires N subjects, the second experiment requires another N subjects, and so on, for a total of kN . In other words, in the individual experiments approach subjects are used in a single experiment to estimate a single effect, and then discarded. The extra subjects provide neither increased Type I error protection nor appreciably increased power, relative to the test of a simple effect in the single factor approach or the test of a main effect in the factorial approach. Unless there is a special need to obtain results from one experiment before beginning another, the extra subjects are largely wasted resources.

As Table 4 shows, if a factorial experiment with k factors requires an overall sample size of N to achieve a desired level of statistical power for detecting a main effect of size d at a particular Type I error rate, the comparable single factor approach requires a sample size of (k + 1)(N/2) to detect a simple effect of the same size at the same Type I error rate. This is because in the single factor approach, to maintain power, each mean comparison must be based on two experimental conditions comprising a total of N subjects; thus N/2 subjects are required per experimental condition. However, this single factor experiment would be adequately powered for k simple effects, whereas the comparable factorial experiment with N subjects, although adequately powered for k main effects, would be underpowered for k simple effects. This is because estimating a simple effect in a factorial experiment essentially requires selecting a subset of experimental conditions and discarding the remaining conditions along with the subjects assigned to them, which brings the sample size for each simple effect considerably below N.
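Plugging numbers into the Table 4 expressions makes the comparison concrete. Take k = 6 and suppose the factorial experiment needs N = 128 subjects, as in the d = .5 column of Table 5. Then

\[
\text{factorial: } N = 128, \qquad
\text{individual experiments: } kN = 6 \times 128 = 768, \qquad
\text{single factor: } (k+1)\tfrac{N}{2} = 7 \times 64 = 448,
\]

which are exactly the overall sample sizes in the k = 6, d = .5 rows of Table 5.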

Subject, condition, and overall costs

In order to compare the resource requirements of the four design alternatives it is helpful to draw a distinction between per-subject costs and per-condition overhead costs. Examples of subject costs are recruitment and compensation of human subjects, and the housing, feeding, and care of laboratory animals. Condition overhead costs are the costs required to plan, implement, and manage each experimental condition in a design, beyond the cost of the subjects assigned to that condition. Examples of condition overhead costs are the training and salaries of personnel who run an experiment, the preparation of different versions of materials for different experimental conditions, and the cost of setting up and taking down laboratory equipment. The overhead cost associated with an experimental condition may therefore be either more or less than the cost of a subject, and because the absolute and relative costs in these two domains vary considerably according to the situation, the absolute and relative costs associated with the four designs considered here can vary considerably as well.

In one possible scenario, both per-condition overhead costs and per-subject costs are low. For example, consider a social psychology experiment in which the experimental conditions consist of different written materials, the experimenters are graduate students on stipends, and a large departmental subject pool is at their disposal. This represents the happy circumstance in which a design can be chosen on purely scientific grounds, with little regard to financial costs. In another scenario, per-condition overhead costs are low but per-subject costs are high, as might occur when an experiment is conducted via the Internet: adding an experimental condition may be a fairly straightforward programming task, but substantial cash incentives may be required to ensure subject participation. A further example is an experiment in which individual experimental conditions are not difficult to set up, but the subjects are laboratory animals whose purchase, feeding, and care are very costly. Per-condition costs might roughly equal per-subject costs in a scenario in which each experimental condition involves time-intensive and complicated reconfiguring of laboratory equipment by a highly paid technician. Finally, per-condition overhead costs might greatly exceed per-subject costs when subjects are drawn from a subject pool and are not monetarily compensated, but each new experimental condition requires additional training of personnel, preparation of elaborate new materials, or difficult reconfiguration of laboratory equipment.

Comparing relative estimated overall costs across designs

In this section we demonstrate a comparison of relative financial costs across the four design alternatives, based on the expressions in Table 4 . In the demonstration we consider four different situations: effect sizes of d = .2 or d = .5 (corresponding to Cohen's (1988) benchmark values for small and medium, respectively), and k = 6 or k = 10 two-level independent variables. The starting point for the cost comparison is the number of experimental conditions required by each design, and the sample sizes required to achieve statistical power of at least .8 for testing the effect of each factor in the way that seemed appropriate for the design. Specifically, for the full and fractional factorial designs, we calculated the total sample size N needed to have a power of .80 for each main effect. For the individual experiments and single factor designs, we calculated the N needed for a power of .80 for each simple effect of interest. These are shown in Table 5 . As the table indicates, the fractional factorial designs used for k = 6 and k = 10 are both Resolution IV.

Design | Number of experimental conditions | n per condition (d = .2) | Overall N (d = .2) | n per condition (d = .5) | Overall N (d = .5)

k = 6
Complete factorial | 64 | 13 | 832* | 2 | 128
Individual experiments | 12 | 394 | 4728 | 64 | 768
Single factor | 7 | 394 | 2758 | 64 | 448
Fractional factorial Resolution IV (2^(6-2)) | 16 | 50 | 800* | 8 | 128

k = 10
Complete factorial | 1024 | 1 | 1024* | 1 | 1024*
Individual experiments | 20 | 394 | 7880 | 64 | 1280
Single factor | 11 | 394 | 4334 | 64 | 704
Fractional factorial Resolution IV (2^(10-5)) | 32 | 25 | 800* | 4 | 128

* Overall N exceeds the minimum required for the stated power because the n per condition was rounded up to an integer (see text).

A practical issue arose that influenced the selection of the overall sample sizes N that are listed in Table 5 . Let N min designate the minimum overall N required to achieve a desired level of statistical power. In the cases marked with an asterisk the overall N that was actually used exceeds N min , because experimental conditions cannot have fractional numbers of subjects. Let n designate the number of subjects in each experimental condition, assuming equal n 's are to be assigned to each experimental condition. In theory the minimum n per experimental condition for a particular design would be N min divided by the number of experimental conditions. However, in some of the cases in Table 5 this would have resulted in a non-integer n . In these cases the per-condition n was rounded up to the nearest integer. For example, consider the complete factorial design with k =10 factors and d = .2. In theory a per-factor power of ≥ .8 would be maintained with N min = 788. However, the complete factorial design required 1024 experimental conditions, so the minimum N that could be used was 1024. All cost comparisons reported here are based on the overall N listed in Table 5 .

For purposes of illustration, per-subject cost is defined here as the average incremental cost of adding a single research subject to a design without increasing the number of experimental conditions, and condition overhead cost is defined as the average incremental cost of adding a single experimental condition without increasing the number of subjects. (For simplicity we assume per-subject costs do not differ dramatically across conditions.) Then a rough estimate of total costs, providing a basis for comparing the four design alternatives, can be computed as follows:

estimated total cost = (number of subjects × per-subject cost) + (number of experimental conditions × per-condition overhead cost).
Figure 1 illustrates total costs for the situations and designs in Table 5 in experiments where per-subject costs equal or exceed per-condition overhead costs. In order to compute total costs on the y-axis, per-condition costs were arbitrarily fixed at $1. Thus the x-axis can be interpreted as the ratio of per-subject costs to per-condition costs; for example, the “4” on the x-axis means that per-subject costs are four times per-condition costs.
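As an illustrative sketch (ours, not part of the original analysis), the cost formula can be applied to the k = 6, d = .2 rows of Table 5 with the per-condition cost fixed at $1 and a hypothetical per-subject cost of $4, i.e., the point labeled “4” on the x-axis:

data costcomp;
   input design :$12. n_cond n_subj;   /* counts from Table 5, k = 6, d = .2      */
   subj_cost = 4;                      /* hypothetical per-subject cost           */
   cond_cost = 1;                      /* per-condition overhead cost fixed at $1 */
   total_cost = n_subj*subj_cost + n_cond*cond_cost;
   datalines;
complete 64 832
individual 12 4728
single 7 2758
fractional 16 800
;
run;

proc print data=costcomp; run;

The resulting totals ($3,392; $18,924; $11,039; and $3,216, respectively) reproduce the qualitative pattern in Figure 1: when subjects are the expensive resource, the factorial designs win, and the fractional factorial is cheapest.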

Figure 1. Costs of different experimental design options when per-subject costs exceed per-condition overhead costs. Total costs are computed with per-condition costs fixed at $1.

In the situations considered in Figure 1 , fractional factorial designs were always either least expensive or tied with complete factorial designs for least expensive. As the ratio of per-subject costs to per-condition costs increased, the economy of complete and fractional factorial designs became increasingly evident. Figure 1 shows that when per-subject costs outweighed per-condition costs, the single factor approach and, in particular, the individual experiments approach were often much more expensive than even complete factorial designs, and fractional factorials were often the least expensive.

Figure 2 examines the same situations as in Figure 1 , but now total costs are shown on the y -axis for experiments in which per-condition overhead costs equal or exceed per-subject costs. In order to compute total costs, per-subject costs were arbitrarily fixed at $1. Thus the x -axis represents the ratio of per-condition costs to per-subject costs; in this figure the “40” on the x -axis means that per-condition costs are forty times per-subject costs.

Figure 2. Costs of different experimental design options when per-condition overhead costs exceed per-subject costs. Total costs are computed with per-subject costs fixed at $1.

The picture here is more complex than that in Figure 1 . For the most part, in the four situations considered here the complete factorial was the most expensive design, frequently by a wide margin. The complete factorial requires many more experimental conditions than any of the other design alternatives, so it is not surprising that it was expensive when condition costs were relatively high. It is perhaps more surprising that the individual experiments approach, although it requires many fewer experimental conditions than the complete factorial, was usually the next most expensive; under some circumstances, when the effect sizes were small, it even exceeded the cost of the complete factorial. This is because the reduction in experimental conditions afforded by the individual experiments approach was outweighed by its much greater subject requirements (see Table 4 ). Figure 2 shows that the least expensive approaches were usually the single factor and fractional factorial designs. Which of the two was less expensive depended on effect size and on the ratio of per-condition costs to per-subject costs. When the effect size was medium ( d = .5) and the ratio of per-condition costs to per-subject costs was less than about 20, fractional factorial designs tended to be more economical; the single factor approach became the most economical once per-condition costs exceeded about 20 times per-subject costs. However, when the effect size was small ( d = .2), fractional factorial designs remained cheaper until the ratio of per-condition costs to per-subject costs substantially exceeded 100.

A brief tutorial on selecting a fractional factorial design

In this section we provide a brief tutorial intended to familiarize investigators with the basics of choosing a fractional factorial design. The more advanced introductions to fractional factorial designs provided by Kirk (1995) and Kuehl (1999) and the detailed treatment in Wu and Hamada (2000) are excellent resources for further reading.

When the individual experiments and single factor approaches are used, typically the choice of experimental conditions is made on intuitive grounds, with aliasing per se seldom an explicit basis for choosing a design. By contrast, when fractional factorial designs are used aliasing is given primary consideration. Usually a design is selected to achieve a particular aliasing structure while considering cost. Although the choice of experimental conditions for fractional factorials may be less intuitively obvious, this should not be interpreted as meaning that the selection of a fractional factorial design has no conceptual basis. On the contrary, fractional factorial designs are carefully chosen with key research questions in mind.

There are many possible fractional factorial designs for any set of k factors. The designs vary in how many experimental conditions they require and the nature of the aliasing. Fortunately, the hard work of determining the number of experimental conditions and aliasing structure of fractional factorial designs has largely been done. The designs can be found in books (e.g. Box et al., 1978 ; Wu & Hamada, 2000 ) and on the Internet (e.g. National Institute of Standards and Technology/SEMATECH, 2006 ), but the easiest way to choose a fractional factorial design is by using computer software. Here we demonstrate the use of PROC FACTEX ( SAS Institute, Inc., 2004 ). Using this approach the investigator specifies the factors in the experiment, and may specify which effects are in the Estimate, Negligible and Non-negligible categories, the desired design resolution, maximum number of experimental conditions (sometimes called “runs”), and other aspects relevant to choice of a design. The software returns a design that meets the specified criteria, or indicates that such a design does not exist. Minitab (see Ryan, Joiner, & Cryer, 2004 ; Mathews, 2005 ) and S-PLUS ( Insightful Corp., 2007 ) also provide software for designing fractional factorial experiments.

To facilitate the presentation, let us increase the size of the hypothetical example. In addition to the factors (1) choose , (2) breath , and (3) prep , the new six-factor example will also include factors corresponding to whether or not (4) an audience is present besides just the investigator ( audience ); (5) the subject is promised a monetary reward if the speech is judged good enough ( stakes ); and (6) the subject is allowed to speak from notes ( notes ). A complete factorial experiment would require 2^6 = 64 experimental conditions. Three different ways of choosing a fractional factorial design using SAS PROC FACTEX are illustrated below.

Specifying a desired resolution

One way to use software to choose a fractional factorial design is to specify a desired resolution and instruct the software to find the smallest number of experimental conditions needed to achieve it. For example, suppose the investigator in the hypothetical example finds it acceptable to alias main effects with interactions as low as three-way, and to alias two-way interactions with other two-way interactions and higher-order interactions. A design of Resolution IV will meet these criteria and may be requested as follows:
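The program below is a reconstruction of the kind of PROC FACTEX call being described; the surrounding text confirms the RESOLUTION, ALIASING(6), and DESIGN elements and the output data set name dataset1, but the exact statement layout shown here is our assumption:

proc factex;
   factors breath audience choose prep notes stakes;   /* six two-level factors           */
   model resolution=4;                                 /* request a Resolution IV design  */
   examine aliasing(6) design;                         /* aliasing up to 6-way; run codes */
   output out=dataset1;                                /* save the design matrix          */
run;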

SAS will find a design with these characteristics if one exists, print information on the aliasing and the design matrix, and save the design matrix in the data set dataset1. The ALIASING(6) option requests a list of all aliasing up to six-way interactions, and DESIGN requests that the effect codes for each experimental condition in the design be printed.

Table 6 shows the effect codes from the SAS output for this design. The design found by SAS requires only 16 experimental conditions; that is, it is a 2^(6-2), or one-quarter, fractional factorial, because it requires only 2^(-2) = 1/4 = 16/64 of the experimental conditions in the complete factorial. In a one-quarter fraction each source of variance belongs to an alias set of four effects, so each main effect is aliased with three other effects. Because this is a Resolution IV design, all three of those effects are interactions involving three or more factors; none are main effects or two-way interactions. Similarly, each two-way interaction is aliased with three other effects; in a Resolution IV design these may be interactions of any order, including other two-way interactions, but they will not be main effects.

Condition | breath | audience | choose | prep | notes | stakes
1 | -1 | -1 | -1 | -1 | -1 | -1
2 | -1 | -1 | -1 | 1 | 1 | 1
3 | -1 | -1 | 1 | -1 | 1 | 1
4 | -1 | -1 | 1 | 1 | -1 | -1
5 | -1 | 1 | -1 | -1 | 1 | -1
6 | -1 | 1 | -1 | 1 | -1 | 1
7 | -1 | 1 | 1 | -1 | -1 | 1
8 | -1 | 1 | 1 | 1 | 1 | -1
9 | 1 | -1 | -1 | -1 | -1 | 1
10 | 1 | -1 | -1 | 1 | 1 | -1
11 | 1 | -1 | 1 | -1 | 1 | -1
12 | 1 | -1 | 1 | 1 | -1 | 1
13 | 1 | 1 | -1 | -1 | 1 | 1
14 | 1 | 1 | -1 | 1 | -1 | -1
15 | 1 | 1 | 1 | -1 | -1 | -1
16 | 1 | 1 | 1 | 1 | 1 | 1

Different fractional factorial designs, even those with the same resolution, can have different aliasing structures, some of which may appeal more to an investigator than others. SAS simply returns the first design it finds that fits the specified criteria. To the best of our knowledge, there is no feature in SAS that automatically returns multiple candidate designs with the same resolution, but it is possible to see different designs by arbitrarily changing the order in which the factors are listed in the FACTORS statement. Another possibility is to use the MINABS option to request a design that meets the “minimum aberration” criterion, a mathematical definition of least-aliased (see Wu & Hamada, 2000 ).
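For instance, a least-aliased 16-run design might be requested as sketched below. We believe MINABS is given as a MODEL statement option following a slash, but the exact syntax should be confirmed against the SAS/QC documentation:

proc factex;
   factors breath audience choose prep notes stakes;
   size design=16;                  /* limit the design to 16 experimental conditions */
   model resolution=max / minabs;   /* maximum resolution, minimum aberration         */
   examine aliasing(6) design;
run;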

Specifying which effects are in which categories

The method of identifying a suitable fractional factorial design described above did not require specifying which effects are of primary scientific interest, which are negligible, and which are non-negligible, although the investigator would have had to determine this in order to decide that a Resolution IV design was desired. Another way to identify a fractional factorial design is to specify directly which effects fall in each of these categories, and to instruct the software to find the smallest design that does not alias effects of primary interest either with each other or with effects in the non-negligible category. This method enables somewhat more fine-tuning.

Suppose in addition to the main effects, the investigator wants to be able to estimate all two-way interactions involving breath . The remaining two-way interactions and all three-way interactions are not of scientific interest but may be sizeable, so they are designated non-negligible. In addition, one four-way interaction, breath × prep × notes × stakes might be sizeable, because those factors are suspected in advance to be the most powerful factors, and so their combination might lead to a floor or ceiling effect, which could act as an interaction. This four-way interaction is placed in the non-negligible category. All remaining effects are designated negligible. Given these specifications, a design with the smallest possible number of experimental conditions is desired. The following code will produce such a design:
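A sketch of such a program follows. The ESTIMATE and NONNEGLIGIBLE effect lists are dictated by the specifications above, but the statement layout, the SIZE DESIGN=MIN request, and the output data set name dataset2 are our assumptions:

proc factex;
   factors breath audience choose prep notes stakes;
   model estimate=(breath audience choose prep notes stakes
                   breath*audience breath*choose breath*prep
                   breath*notes breath*stakes)
         nonnegligible=(audience*choose audience*prep audience*notes
                        audience*stakes choose*prep choose*notes
                        choose*stakes prep*notes prep*stakes notes*stakes
                        breath*audience*choose breath*audience*prep
                        breath*audience*notes breath*audience*stakes
                        breath*choose*prep breath*choose*notes
                        breath*choose*stakes breath*prep*notes
                        breath*prep*stakes breath*notes*stakes
                        audience*choose*prep audience*choose*notes
                        audience*choose*stakes audience*prep*notes
                        audience*prep*stakes audience*notes*stakes
                        choose*prep*notes choose*prep*stakes
                        choose*notes*stakes prep*notes*stakes
                        breath*prep*notes*stakes);
   size design=min;                 /* smallest design meeting these requirements */
   examine aliasing(6) design;
   output out=dataset2;             /* assumed data set name */
run;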


The ESTIMATE specification designates the effects that are of primary scientific interest and must be aliased only with effects expected to be negligible. The NONNEGLIGIBLE specification designates effects that are not of scientific interest but may be sizeable; these effects must not be aliased with the effects listed under ESTIMATE. It is necessary to specify only the effects to be estimated and those designated non-negligible; any remaining effects are assumed negligible.

The SAS output (not shown) indicates that the result is a 2^(6-1) design, which has 32 experimental conditions, and that this design is Resolution VI. Because this design is a one-half fraction of the complete factorial, each source of variation belongs to an alias set of two effects; in other words, each main effect and interaction is aliased with exactly one other effect. The output provides a complete account of the aliasing, indicating that each main effect is aliased with a five-way interaction and each two-way interaction is aliased with a four-way interaction. This aliasing is characteristic of Resolution VI designs, as was shown in Table 3 . Because the four-way interaction breath × prep × notes × stakes has been placed in the non-negligible category, the design aliases it with another interaction in this category, audience × choose , rather than with one of the two-way interactions in the Estimate category.

Specifying the maximum number of experimental conditions

Another way to use software to choose a design is to specify the maximum number of experimental conditions in the design, and let the software return the aliasing structure. This approach may make sense when resource constraints impose a strict upper limit on the number of experimental conditions that can be implemented, and the investigator wishes to know whether key research questions can be addressed within this limit. Suppose in our hypothetical example the investigator can implement no more than eight experimental conditions; in other words, a 2^(6-3) design is needed. The investigator can use the following code:
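A sketch of the call (the SIZE DESIGN=8 request follows from the stated limit; the output data set name dataset3 is our assumption):

proc factex;
   factors breath audience choose prep notes stakes;
   size design=8;                   /* at most eight experimental conditions: 2^(6-3) */
   model resolution=max;            /* best resolution attainable in eight runs       */
   examine aliasing(6) design;
   output out=dataset3;             /* assumed data set name */
run;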

In this case, the SAS output suggests a Resolution III design. Because this Resolution III design is a one-eighth fraction, each source of variance belongs to an alias set of eight effects, so each main effect is aliased with seven other effects. These effects may be interactions of any order, including two-way interactions; they will not be other main effects.

A comparison of results for several different experiments

This section contains direct comparisons among the various experimental designs discussed in this article, based on artificial data generated using the same model for all the designs. This can be imagined as a situation in which after each experiment, time is turned back and the same factors are again investigated with the same experimental subjects, but using a different experimental design.

Let us return to the hypothetical example with six factors ( breath , audience , choose , prep , notes , stakes ), each with two levels, coded -1 for Off and +1 for On. Suppose there are a total of 320 subjects, with five subjects randomly assigned to each of the 64 experimental conditions of a 2^6 full factorial design, and the outcome variable is a reverse-scaled questionnaire about public speaking anxiety; that is, a higher score indicates less anxiety. Data were generated so that the score of participant j in the i th experimental condition was modeled as μ_i + ε_ij, where the μ_i are given by

μ_i = β_0 + 0.25(breath) + 0.00(audience) + 0.50(choose) + 0.30(prep) + 0.30(notes) − 0.10(stakes)
      + 0.25(breath × prep) − 0.15(breath × choose) − 0.15(breath × notes)
      + [positive prep × notes and prep × stakes interactions, plus several small higher-order interactions]    (1)

and the errors ε_ij are independent N(0, 2^2). Because the outcome variable in (1) is reverse-scored, helpful (anxiety-reducing) main effects can be called “positive” and harmful ones can be called “negative.” The error standard deviation of 2 was used so that the regression coefficients above can also be interpreted as Cohen's d 's despite the -1/+1 metric for effect coding. Thus, each main effect coefficient in (1) represents half the long-run average raw difference between participants receiving the On and Off levels of the factor, and also represents the normalized difference between the +1 and -1 groups.
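The following data step is a sketch of how such data could be generated (hypothetical: the seed and the data set and variable names are ours, and only the coefficients recoverable from Tables 7 and 8 are included; the remaining small interactions in (1) are omitted):

data sim;
   call streaminit(12345);                        /* assumed seed                  */
   do breath   = -1 to 1 by 2;
   do audience = -1 to 1 by 2;
   do choose   = -1 to 1 by 2;
   do prep     = -1 to 1 by 2;
   do notes    = -1 to 1 by 2;
   do stakes   = -1 to 1 by 2;
      do subj = 1 to 5;                           /* five subjects per condition   */
         mu = 0.25*breath + 0.00*audience + 0.50*choose
            + 0.30*prep + 0.30*notes - 0.10*stakes
            + 0.25*breath*prep - 0.15*breath*choose - 0.15*breath*notes;
         score = mu + rand('normal', 0, 2);       /* error standard deviation of 2 */
         output;
      end;
   end; end; end; end; end; end;
run;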

The example was deliberately set up so as not to be completely consistent with the investigator's expectations as expressed in the previous section. In the model above, anxiety is reduced on average by doing the breathing relaxation exercise, by being able to choose one's own topic, by having extra preparation time, and by having notes available. There is a small anxiety-increasing effect of higher stakes, and the audience factor has zero main effect on anxiety. The first two positive two-way interactions indicate that longer preparation time intensified the effects of the breathing exercise and of notes, or equivalently, that shorter preparation time largely neutralized their effects (because subjects had little time to put them into practice). The third indicates that higher stakes were energizing for those who were prepared, but anxiety-provoking for the less prepared. The pair of negative two-way interactions indicates that the breath intervention was somewhat redundant with the more conventional aids of having notes and having one's choice of topic, or equivalently that breathing relaxation was more important when those aids were not available. The model also includes several other small higher-order nuisance interactions with no clear interpretation, as might occur in practice.

Data were generated using the above model for the following seven experimental designs: Complete factorial; individual experiments; two single factor designs (comparative treatment and constructive treatment); and the Resolution III, IV, and VI designs arrived at in the previous section. The total number of subjects used was held constant at 320 for all of the designs. For the individual experiments approach, six experiments, each with either 53 or 54 subjects, were simulated. For the single factor designs, experiments were simulated assigning either 45 or 46 subjects to each of seven experimental conditions. The comparative treatment design included a no-treatment control (i.e. all factors set to Off) and six experimental conditions, each with one factor set to On and the others set to Off. The constructive treatment design included a no-treatment control and six experimental conditions, each of which added a factor set to On in order from left to right, e.g. in the first treatment condition only breath was set to On, in the second treatment condition breath and audience were set to On and the remaining factors were set to Off, and so on until in the seventh experimental condition all six factors were set to On. To simulate data for the Resolution III, IV, and VI fractional factorial designs, 40, 20, and 10 subjects, respectively, were assigned to each experimental condition. In simulating data for each of the seven design alternatives, the μ i 's were recalculated accordingly but the vector of ε's was left the same.

ANOVA models were fit to each data set in the usual way using SAS PROC GLM. For example, the code used to fit an ANOVA model to the data set corresponding to the Resolution III fractional factorial design was as follows:
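A reconstruction of the sort of call described (the data set name resIII and the outcome variable name score are our assumptions):

proc glm data=resIII;
   class breath audience choose prep notes stakes;
   model score = breath audience choose prep notes stakes;   /* main effects only */
run;
quit;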

This model contained no interactions because they cannot be estimated in a Resolution III design. An abbreviated version of the SAS output corresponding to this code appears in Figure 3 . In the comparative treatment strategy each of the treatment conditions was compared to the no-treatment control. In the constructive treatment strategy each treatment condition was compared to the condition with one fewer factor set to On; for example, the condition in which breath and audience were set to On was compared to the condition in which only breath was set to On.

Figure 3. Partial output from SAS PROC GLM for the simulated Resolution III data set.

Table 7 contains the regression coefficients corresponding to the effects of each factor for each of the seven designs. For reference, the true values of the regression coefficients used in data generation are shown at the top of the table.

Factor | breath | audience | choose | prep | notes | stakes
Population main effect (see Equation 1) | 0.25 | 0.00 | 0.50 | 0.30 | 0.30 | -0.10
Complete factorial | 0.24 | 0.18 | 0.69 | 0.40 | 0.51 | -0.07
Individual experiments | 0.51 | 0.16 | 1.14 | -0.39 | 0.10 | -0.39
Single factor designs:
  Comparative treatment | 0.32 | 0.26 | 0.79 | -0.32 | 0.31 | -0.19
  Constructive treatment | 0.32 | 0.24 | 0.18 | 0.09 | 0.38 | 0.25
Fractional factorial designs:
  Resolution III | 0.44 | 0.43 | 0.69 | -0.15 | 0.64 | -0.35
  Resolution IV | 0.24 | 0.18 | 0.69 | 0.40 | 0.42 | -0.05
  Resolution VI | 0.24 | 0.18 | 0.69 | 0.40 | 0.51 | -0.24

In the complete factorial experiment, breath , choose , prep , and notes were significant. The true main effect of stakes was small; with N = 320 this design had little power to detect it. Audience was marginally significant at α = .15, although the data were generated with this effect set at exactly zero. In the individual experiments approach, only choose was significant, and breath was marginally significant. The results for the comparative treatment experiment were similar to those of the individual experiments approach, as would be expected given that the two have identical aliasing. An additional effect was marginally significant in the comparative treatment approach, reflecting the additional statistical power associated with this design as compared to the individual experiments approach. In the constructive treatment experiment none of the factors were significant at α = .05. There were two marginally significant effects, breath and notes .

In the Resolution III design every effect except prep was significant. One of these, the significant effect of audience , was a spurious result (probably caused by aliasing with the prep × stakes interaction). By contrast, results of the Resolution IV and VI designs were very similar to those of the complete factorial, except that in the Resolution VI design stakes was significant. In the individual experiments and single factor approaches, the estimates of the coefficients deviated considerably from the true values. In the fractional factorial designs the estimates of the coefficients tended to be closer to the true values, particularly in the Resolution IV and Resolution VI designs.

Table 8 shows estimates of interactions from the designs that enable such estimates, namely the complete factorial design and the Resolution IV and Resolution VI factorial designs. The breath × prep interaction was significant in all three designs. The breath × choose interaction was significant in the complete factorial and the Resolution VI fractional factorial but was estimated as zero in the Resolution IV design. In general the coefficients for these interactions were very similar across the three designs. An exception was the coefficient for the breath × choose interaction, and, to a lesser degree, the coefficient for the breath × notes interaction.

Interaction of breath with: | audience | choose | prep | notes | stakes
Truth | 0.00 | -0.15 | 0.25 | -0.15 | 0.00
Complete factorial | -0.03 | -0.25 | 0.29 | -0.07 | -0.03
Resolution IV fractional | -0.03 | 0.00 | 0.29 | -0.16 | -0.02
Resolution VI fractional | 0.02 | -0.25 | 0.29 | -0.07 | 0.04

Differences observed among the designs in estimates of coefficients are due to differences in aliasing, plus a minor random disturbance due to reallocating the error terms when each new experiment was simulated, as described above. In general, more aliasing was associated with greater deviations from the true coefficient values. No effects were aliased in the complete factorial design, which had coefficient estimates closest to the true values. In the Resolution IV design each effect was aliased with three other effects, all of them interactions of three or more factors, and in the Resolution VI design each effect was aliased with one other effect, an interaction of four or more factors; these designs had coefficient estimates that were also very close to the true values. The Resolution III fractional factorial design, which aliased each effect with seven other effects, had coefficient estimates somewhat farther from the true values. The coefficient estimates associated with the individual experiments and single factor approaches were farthest from the true values of the main effect coefficients. In the individual experiments and single factor approaches each effect was aliased with 15 other effects (the main effect of a factor was aliased with all the interactions involving that factor, from the two-way up to the six-way). The comparative treatment and constructive treatment approaches aliased the same number of effects but differed in the coding of the aliased effects (as can be seen in Table 2 ), which is why their coefficient estimates differed.

Although the seven experiments had the same overall sample size N , they differed in statistical power. The complete and fractional factorial experiments, which had identical statistical power, were the most powerful. Next most powerful were the comparative treatment and constructive treatment designs. The individual experiments approach was the least powerful. These differences in statistical power, along with the differences in coefficient estimates, were reflected in the effects found significant at various levels of α across the designs. Among the designs examined here, the individual experiments approach and the two single factor designs showed the greatest disparities with the complete factorial.

Given the differences among them in aliasing, it is perhaps no surprise that these designs yielded different effect estimates and hypothesis tests. The research questions that motivate individual experiments and single factor designs, which often involve pairwise contrasts between individual experimental conditions, may not require estimation of main effects per se , so the relatively large differences between the coefficient estimates obtained using these designs and the true main effect coefficients may not be important. Instead, what may be more noteworthy is how few effects these designs detected as significant as compared to the factorial experiments.

General discussion

Some overall recommendations.

Despite the situation-specific nature of most design decisions, it is possible to offer some general recommendations. When per-subject costs are high in relation to per-condition overhead costs, complete and fractional factorials are usually the most economical designs. When per-condition costs are high in relation to per-subject costs, usually either a fractional factorial or single factor design will be most economical. Which is most economical will depend on considerations such as the number of factors, the sample size required to achieve the desired statistical power, and the particular fractional factorial design being considered.

In the limited set of situations examined in this article, the individual experiments approach emerged as the least economical. Although the individual experiments approach requires many fewer experimental conditions than a complete factorial and usually requires fewer than a fractional factorial, it requires more experimental conditions than a single factor experiment. In addition, it makes the least efficient use of subjects of any of the designs considered in this article. Of course, an individual experiments approach is necessary whenever the results of one experiment must be obtained first in order to inform the design of a subsequent experiment. Except for this application, in general the individual experiments approach is likely to be the least appealing of the designs considered here. Investigators who are planning a series of individual experiments may wish to consider whether any of them can be combined to form a complete or fractional factorial experiment, or whether a single factor design can be used.

Although factorial experiments with more than two or three factors are currently relatively rare in psychology, we recommend that investigators give such designs serious consideration. All else being equal, the statistical power of a balanced factorial experiment to detect a main effect of a given size is not reduced by the presence of other factors, except to a small degree caused by the reduction of error degrees of freedom in the model. In other words, if main effects are of primary scientific interest and interactions are not of great concern, then factors can be added without needing to increase N appreciably.

An interest in interactions is not the only reason to consider using factorial designs; investigators may simply wish to take advantage of the economy these designs afford, even when interactions are expected to be negligible or are not of scientific interest. In particular, investigators who face high subject costs but relatively modest condition costs may find that a factorial experiment will be much more economical than other design alternatives. Investigators faced with an upper limit on the availability of subjects may even find that a factorial experiment enables them to investigate research questions that would otherwise have to be set aside for some time. As Oehlert (2000 , p. 171) explained, “[t]here are thus two times when you should use factorial treatment structure—when your factors interact, and when your factors do not interact.”

One of the objectives of this article has been to demonstrate that fractional factorial designs merit consideration for use in psychological research alongside other reduced designs and complete factorial designs. Previous authors have noted that fractional factorial designs may be useful in a variety of areas within the social and behavioral sciences ( Landsheer & van den Wittenboer, 2000 ) such as behavioral medicine (e.g. Allore, Peduzzi, Han, & Tinetti, 2006 ; Allore, Tinetti, Gill, & Peduzzi, 2005 ), marketing research (e.g. Holland & Cravens, 1973 ), epidemiology ( Taylor et al., 1994 ), education ( McLean, 1966 ), human factors ( Simon & Roscoe, 1984 ), and legal psychology ( Stolle, Robbennolt, Patry, & Penrod, 2002 ). Shaw (2004) and Shaw, Festing, Peers, & Furlong (2002) noted that factorial and fractional factorial designs can help to reduce the number of animals that must be used in laboratory research. Cutler, Penrod, and Martens (1987) used a large fractional factorial design to conduct an experiment studying the effect of context variables on the ability of participants to identify the perpetrator correctly in a video of a simulated robbery. Their experiment included 10 factors, with 128 experimental conditions, but only 290 subjects.

An important special case: Development and evaluation of behavioral interventions

As discussed by Allore et al. (2006) , Collins, Murphy, Nair, and Strecher (2005) , Collins, Murphy, and Strecher (2007) , and West et al. (1993) , behavioral intervention scientists could build more potent interventions if there were more empirical evidence about which intervention components are contributing to program efficacy, which are not contributing, and which may be detracting from overall efficacy. However, as these authors note, behavioral interventions are generally designed a priori and then evaluated by means of the typical randomized controlled trial (RCT) consisting of a treatment group and a control group (e.g. experimental conditions 8 and 1, respectively, in Table 2 ). This all-or-nothing approach, also called the treatment package strategy ( West et al., 1993 ), involves the fewest possible experimental conditions, so in one sense it is a very economical design. The trade-off is that all main effects and interactions are aliased with all others. Thus although the treatment package strategy can be used to evaluate whether an intervention is efficacious as a whole, it does not provide direct evidence about any individual intervention component. A factorial design with as many factors as there are distinct intervention components of interest would provide estimates of individual component effects and of interactions between and among components.

Individual intervention components are likely to have smaller effect sizes than the intervention as a whole ( West & Aiken, 1997 ), in which case sample size requirements will be increased as compared to a two-experimental-condition RCT. One possibility is to increase power by using a Type I error rate larger than the traditional α = .05, in other words, to tolerate a somewhat larger probability of mistakenly choosing an inactive component for inclusion in the intervention in order to reduce the probability of mistakenly rejecting an active intervention component. Collins et al. (2005 , 2007) recommended this and similar tactics as part of a phased experimental strategy aimed at selecting components and levels to comprise an intervention. In this phased experimental strategy, after the new intervention is formed its efficacy is confirmed in a RCT at the conventional α = .05. As Hays (1994 , p. 284) has suggested, “In some situations, perhaps, we should be far more attentive to Type II errors and less attentive to setting α at one of the conventional levels.”

One reason for eschewing a factorial design in favor of the standard two-experimental-condition RCT may be a shortage of resources needed to implement all the experimental conditions in a complete factorial design. If this is the primary obstacle, it may be possible to overcome it by identifying a fractional factorial design that requires a manageable number of experimental conditions. Fractional factorial designs are particularly apropos for experiments in which the primary objective is to determine which factors out of an array of factors have important effects (where “important” can be defined as “statistically significant,” “effect size greater than d ,” or any other reasonable empirical criterion). In engineering these are called screening experiments. For example, suppose an investigator is developing an intervention and wishes to conduct an experiment to ascertain which of a set of possible intervention features are likely to contribute to an overall intervention effect. In most cases an approximate estimate of the effect of an individual factor is sufficient for a screening experiment, as long as the estimate is not so far off as to lead to incorrect inclusion of an intervention feature that has no effect (or, worse, has a negative effect) or incorrect exclusion of a feature that makes a positive contribution. Thus in this context the increased scientific information that can be gained using a fractional factorial design may be an acceptable tradeoff against the somewhat reduced estimation precision that can accompany aliasing. (For a Monte Carlo simulation examining the use of a fractional factorial screening experiment in intervention science, see Collins, Chakraborty, Murphy, & Strecher, in press .)

It must be acknowledged that even very economical fractional factorial designs typically require more experimental conditions than intervention scientists routinely consider implementing. In some areas in intervention science, there may be severe restrictions on the number of experimental conditions that can be realistically handled in any one experiment. For example, it may not be reasonable to demand of intervention personnel that they deliver different versions of the intervention to different subsets of participants, as would be required in any experiment other than the treatment package RCT. Or, the intervention may be so complex and demanding, and the context in which it must be delivered so chaotic, that implementing even two experimental conditions well is a remarkable achievement, and trying to implement more would surely result in sharply diminished implementation fidelity ( West & Aiken, 1997 ). Despite the undeniable reality of such difficulties, we wish to suggest that they do not necessarily rule out the use of complete and, in particular, fractional factorial designs across the board in all areas of intervention science. There may be some areas in which a careful analysis of available resources and logistical strategies will suggest that a factorial approach is feasible. One example is Strecher et al. (2008) , who described a 16-experimental-condition fractional factorial experiment to investigate five intervention components in a smoking cessation intervention. Another example can be found in Nair et al. (2008) , who described a 16-experimental-condition fractional factorial experiment to investigate five features of decision aids for women choosing among breast cancer treatments. Commenting on the Strecher et al. article, Norman (2008) wrote, “The fractional factorial design can provide considerable cost savings for more rapid prototype testing of intervention components and will likely be used more in future health behavior change research” (p. 450). Collins et al. (2005) and Nair et al. (2008) have provided some introductory information on the use of fractional factorial designs in intervention research. Collins et al. (2005 , 2007) discussed the use of fractional factorial designs in the context of a phased experimental strategy for building more efficacious behavioral interventions.

One interesting difference between the RCT on the one hand and factorial and fractional factorial designs on the other is that as compared to the standard RCT, a factorial design assigns a much smaller proportion of subjects to an experimental condition that receives no treatment. In a standard two-arm RCT about half of the experimental subjects will be assigned to some kind of control condition, for example a wait list or the current standard of care. By contrast, in a factorial experiment there is typically only one experimental condition in which all of the factors are set to Off. Thus if the design is a 2³ factorial, say, only one of the eight equal-sized conditions has all factors set to Off, so seven-eighths of the subjects will be assigned to a condition in which at least one of the factors is set to On. If the intervention is sought-after and assignment to a control condition is perceived as less desirable than assignment to a treatment condition, there may be better compliance because most subjects will receive some version of an intervention. In fact, it often may be possible to select a fractional factorial design in which there is no experimental condition in which all factors are set to Off.

Investigating interactions between individual characteristics and experimental factors in factorial experiments

Investigators are often interested in determining whether there are interactions between individual subject characteristics and any of the factors in a factorial or fractional factorial experiment. As an example, suppose an investigator is interested in determining whether gender interacts with the six independent variables in the hypothetical example used in this article. There are two ways this can be accomplished; one is exploratory, and the other is a priori (e.g., Murray, 1998).

In the exploratory approach, after the experiment has been conducted gender is coded and added to the analysis of variance as if it were another factor. Even if the design was originally perfectly balanced, such an addition nearly always results in a substantial disruption of balance. Thus the effect estimates are unlikely to be orthogonal, and so care must be taken in estimating the sums of squares. If a reduced design was used, it is important to be aware of what effects, if any, are aliased with the interactions being examined. In most fractional factorial experiments the two-way interactions between gender and any of the independent variables are unlikely to be aliased with other effects, but three-way and higher-order interactions involving gender are likely to be aliased with other effects.

In the a priori approach, gender is built into the design as an additional factor before the experiment is conducted, by ensuring that it is crossed with every other factor. Orthogonality will be maintained and power for detecting gender effects will be optimized if half of the subjects are male and half are female, with randomization done separately within each gender, as if gender were a blocking variable. However, in blocking it is assumed that there are no interactions between the blocking variable and the independent variables; the purpose of blocking is to control error. By contrast, in the a priori approach the interactions between gender and the manipulated independent variables are of particular interest, and the experiment should be powered accordingly to detect these interactions. As compared to the exploratory approach, with the a priori approach it is much more likely that balance can be maintained or nearly maintained. Variables such as gender can easily be incorporated into fractional factorial designs using the a priori approach. These variables can simply be listed with the other independent variables when using software such as PROC FACTEX to identify a suitable fractional factorial design. A fractional factorial design can be chosen so that important two-way and even three-way interactions between, for example, gender and other independent variables are aliased only with higher-order interactions.

How negligible is negligible?

To the extent that an effect placed in the negligible category is nonzero, the estimate of any effect of primary scientific interest that is aliased with it will be different from an estimate based on a complete factorial experiment. Thus a natural question is, “How small should the expected size of an interaction be for the interaction to be placed appropriately in the negligible category?”

The answer depends on the field of scientific endeavor, the value of the scientific information that can be gained using a reduced design, and the kind of decisions that are to be made based on the results of the experiment. There are risks associated with assuming an effect is negligible. If the effect is in reality non-negligible and positive, it can make a positive effect aliased with it look spuriously large, or make a negative effect aliased with it look spuriously zero or even positive. If an effect placed in the negligible category is non-negligible and negative, it can make a positive effect aliased with it look spuriously zero or even negative, or make a negative effect aliased with it look spuriously large.

Placing an effect in the negligible category is not the same as assuming it is exactly zero. Rather, the assumption is that the effect is small enough not to be very likely to lead to incorrect decisions. If highly precise estimates of effects are required, it may be that few or no effects are deemed small enough to be eligible for placement in the negligible category. If the potential gain of additional scientific information obtained at a cost of fewer resources offsets the risk associated with reduced estimation precision and the possibility of some spurious effects, then effects expected to be nonzero, but small, may more readily be designated negligible.

Some limitations of this article

The discussion of reduced designs in this article is limited in a number of ways. One limitation of the discussion is that it has focused on between-subjects designs. It is straightforward to extend every design here to incorporate repeated measures, which will improve statistical power. However, all else being equal, the factorial designs will still have more power than the individual experiments and single factor approaches. There have been a few examples of the application of within-subjects fractional designs in legal psychology (Cutler, Penrod, & Dexter, 1990; Cutler, Penrod, & Martens, 1987; Cutler, Penrod, & Stuve, 1988; O'Rourke, Penrod, Cutler, & Stuve, 1989; Smith, Penrod, Otto, & Park, 1996) and in other research on attitudes and choices (e.g., van Schaik, Flynn & van Wersch, 2005; Sorenson & Taylor, 2005; Zimet et al., 2005) in which a fractional factorial structure is used to construct the experimental conditions assigned to each subject. In fact, the Latin squares approach for balancing orders of experimental conditions in repeated-measures studies is a form of within-subjects fractional factorial. Within-subjects fractional designs of this kind could be seen as a form of planned missingness design (see Graham, Taylor, Olchowski, & Cumsille, 2006).

Another limitation of this article is the focus on factors with only two levels. Designs involving exclusively two-level factors are very common, and factorial designs with two levels per factor tend to be more economical than those involving factors with three or more levels, as well as much more interpretable in practice, due to their simpler interaction structure ( Wu & Hamada, 2000 ). However, any of the designs discussed here can incorporate factors with more than two levels, and different factors may have different numbers of levels. Factors with three or more levels, and in particular an array of factors with mixed numbers of levels, adds complexity to the aliasing in fractional factorial experiments. Although this requires careful attention, it can be handled in a straightforward manner using software like SAS PROC FACTEX.

This article has not discussed what to do when unexpected difficulties arise. One such difficulty is unplanned missing data, for example, an experimental subject failing to provide outcome data. The usual concerns about informative missingness (e.g. dropout rates that are higher in some experimental conditions than in others) apply in complete and reduced factorial experiments just as they do in other research settings. In any complete or reduced design unplanned missingness can be handled in the usual manner, via multiple imputation or maximum likelihood (see e.g. Schafer & Graham, 2002 ). If experimental conditions are assigned unequal numbers of subjects, use of a regression analysis framework can deal with the resulting lack of orthogonality of effects with very little extra effort (e.g. PROC GLM in SAS). Another unexpected difficulty that can arise in reduced designs is evidence that assumptions about negligible interactions are incorrect. If this occurs, one possibility is to implement additional experimental conditions to address targeted questions, in an approach often called sequential experimentation ( Meyer, Steinberg, & Box, 1996 ).

The resource management perspective: Strategic weighing of resource requirements and expected scientific benefit

According to the resource management perspective, the choice of an experimental design requires consideration of both resource requirements and expected scientific benefit; the preferred research design is the one expected to provide the greatest scientific benefit in relation to resources required. Although aliasing may sometimes be raised as an objection to the use of fractional factorial designs, it must be remembered that aliasing in some form is inescapable in any and all reduced designs, including individual experiments and single factor designs. We recommend considering all feasible designs and making a decision taking a resource management perspective that weighs resource demands against scientific costs and benefits.

Paramount among the considerations that drive the choice of an experimental design is addressing the scientific question motivating the research. At the same time, if this scientific question can be addressed only by a very resource-intensive design, but a closely related question can be addressed by a much less resource-intensive design, the investigator may wish to consider reframing the question to conserve resources. For example, when research subjects are expensive or scarce, it may be prudent to consider whether scientific questions can be framed in terms of main effects rather than simple effects so that a factorial or fractional factorial design can be used. Or, when resource limitations preclude implementing more than a very few experimental conditions, it may be prudent to consider framing research questions in terms of simple effects rather than main effects. When a research question is reframed to take advantage of the economy offered by a particular design, it is important that the interpretation of effects be consistent with the reframing, and that this consistency be maintained not only in the original research report but in subsequent citations of the report, as well as integrative reviews or meta-analyses that include the findings.

Resource requirements can often be estimated objectively, as discussed above. Tables like Table 5 may be helpful and can readily be prepared for any N and k. (A SAS macro to perform these computations can be found on the web site http://methodology.psu.edu.) In contrast, assessment of expected scientific benefit is much more subjective, because it represents the investigator's judgment of the value of the scientific knowledge proffered by an experimental design in relation to the plausibility of any assumptions that must be made. For this reason, weighing resource requirements against expected scientific benefit can be challenging. Because expected scientific benefit usually cannot be expressed in purely financial terms, or even readily quantified, a simple benefit to cost ratio is unlikely to be helpful in choosing among alternative designs. For many social and behavioral scientists, the decision may be simplified somewhat by the existence of absolute upper limits on the number of subjects that are available, number of experimental conditions that can be handled logistically, availability of qualified personnel to run experimental conditions, number of hours shared equipment can be used, and so on. Designs that would exceed these limitations are immediately ruled out, and the preferred design now becomes the one that is expected to provide the greatest scientific benefit without exceeding available resources. This requires careful planning to ensure that the design of the study clearly addresses the scientific questions of most interest.

For example, suppose an investigator who is interested in six two-level independent variables has the resources to implement an experiment with at most 16 experimental conditions. One possible strategy is a “complete” factorial design involving four factors and holding the remaining two factors constant at specified levels. Given that six factors are of scientific interest, this “complete” factorial design is actually a reduced design. This approach enables estimation of the main effects and all interactions involving the four factors included in the experiment, but these effects will be aliased with interactions involving the two omitted factors. Therefore in order to draw conclusions either these effects must be assumed negligible, or interpretation must be restricted to the levels at which the two omitted factors were set. Another possible strategy is a Resolution IV fractional factorial design including all six factors, which enables investigation of all six main effects and many two-way interactions, but no higher-order interactions. Instead, this design requires assuming that all three-way and higher-order interactions are negligible. Thus, both designs can be implemented within available resources, but they differ in the kind of scientific information they provide and the assumptions they require. Which option is better depends on the value of the information provided by each experiment in relation to the research questions. If the ability to estimate the higher-order interactions afforded by the four-factor factorial design is more valuable than the ability to estimate the six main effects and additional two-way interactions afforded by the fractional factorial design, then the four-factor factorial may have greater expected scientific benefit. On the other hand, if the investigator is interested primarily in main effects of all six factors and selected two-way interactions, the fractional factorial design may provide more valuable information.
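To make the second strategy concrete, here is a minimal sketch of how such a 16-condition, six-factor design can be constructed (in Python with numpy and itertools; the generator choice E = ABC, F = BCD is one standard Resolution IV option, and the factor names are illustrative rather than the article's):

```python
import itertools
import numpy as np

# Full 2^4 factorial in four base factors A-D: 16 rows of -1/+1 codes.
base = np.array(list(itertools.product([-1, 1], repeat=4)))
A, B, C, D = base.T

# Generators E = ABC and F = BCD give a 2^(6-2) Resolution IV design:
# main effects are aliased only with three-way and higher-order interactions,
# and two-way interactions are aliased with other two-way interactions.
E = A * B * C
F = B * C * D

design = np.column_stack([A, B, C, D, E, F])
print(design.shape)  # (16, 6): six two-level factors in 16 conditions
```

Software such as SAS PROC FACTEX, cited earlier in this article, automates this kind of selection and reports the full aliasing structure.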

Strategic use of reduced designs involves taking calculated risks. To assess the expected scientific benefit of each design, the investigator must also consider the risk associated with any necessary assumptions in relation to the value of the knowledge that can be gained by the design. In the example above, any risk associated with making the assumptions required by the fractional factorial design must be weighed against the value associated with the additional main effect and two-way interaction estimates. If other, less powerful reduced designs are considered, any increased risk of a Type II error must also be taken into account. If an experiment is an exploratory endeavor intended to determine which factors merit further study in a subsequent experiment, the ability to investigate many factors may be of paramount importance and may outweigh the risks associated with aliasing. A design that requires no or very safe assumptions may not have a greater net scientific benefit than a riskier design if the knowledge it proffers is meager or is not at the top of the scientific agenda motivating the experiment. Put another way, the potential value of the knowledge that can be gained in a design may offset any risk associated with the assumptions it requires.

Acknowledgments

The authors would like to thank Bethany C. Bray, Michael J. Cleveland, Donna L. Coffman, Mark Feinberg, Brian R. Flay, John W. Graham, Susan A. Murphy, Megan E. Patrick, Brittany Rhoades, and David Rindskopf for comments on an earlier draft. This research was supported by NIDA grants P50 DA10075 and K05 DA018206.

1 Assuming orthogonality is maintained, adding a factor to a factorial experiment does not change estimates of main effects and interactions. However, the addition of a factor does change estimates of error terms, so hypothesis tests can be slightly different.

2 In the social and behavioral sciences literature the term “fractional factorial” has sometimes been applied to reduced designs that do not maintain the balance property, such as the individual experiments and single factor designs. In this article we maintain the convention established in the statistics literature (e.g. Wu & Hamada, 2000 ) of reserving the term “fractional factorial” for the subset of reduced designs that maintain the balance property.

Contributor Information

Linda M. Collins, The Methodology Center and Department of Human Development and Family Studies, The Pennsylvania State University.

John J. Dziak, The Methodology Center, The Pennsylvania State University.

Runze Li, Department of Statistics and The Methodology Center, The Pennsylvania State University.

  • Allore H, Peduzzi P, Han L, Tinetti M. Using the SAS system for experimental designs for multicomponent interventions in medicine (No. 127-31). SAS white paper. 2006. www2.sas.com/proceedings/sugi31/127-31.pdf.
  • Allore HG, Tinetti ME, Gill TM, Peduzzi PN. Experimental designs for multicomponent interventions among persons with multifactorial geriatric syndromes. Clinical Trials. 2005;2:13–21.
  • Bolger N, Amarel D. Effects of social support visibility on adjustment to stress: Experimental evidence. Journal of Personality and Social Psychology. 2007;92:458–475.
  • Box G, Hunter JS. The 2^(k−p) fractional factorial designs. Technometrics. 1961;3:311–351, 449–458.
  • Box G, Meyer R. An analysis for unreplicated fractional factorials. Technometrics. 1986;28:11–18.
  • Box GEP, Hunter WG, Hunter JS. Statistics for experimenters: An introduction to design, data analysis, and model building. New York: Wiley; 1978.
  • Cohen J. Statistical power analysis for the behavioral sciences. Mahwah, NJ: Lawrence Erlbaum Associates; 1988.
  • Collins L, Chakraborty B, Murphy S, Strecher V. Comparison of a phased experimental approach and a single randomized clinical trial for developing multicomponent behavioral interventions. Clinical Trials, in press.
  • Collins LM, Murphy SA, Nair V, Strecher V. A strategy for optimizing and evaluating behavioral interventions. Annals of Behavioral Medicine. 2005;30:65–73.
  • Collins LM, Murphy SA, Strecher V. The Multiphase Optimization Strategy (MOST) and the Sequential Multiple Assignment Randomized Trial (SMART): New methods for more potent e-health interventions. American Journal of Preventive Medicine. 2007;32:S112–S118.
  • Cutler BL, Penrod SD, Dexter HR. Juror sensitivity to eyewitness identification evidence. Law and Human Behavior. 1990;14:185–191.
  • Cutler BL, Penrod SD, Martens TK. Improving the reliability of eyewitness identification: Putting context into context. Journal of Applied Psychology. 1987;71:629–637.
  • Cutler BL, Penrod SD, Stuve TE. Juror decision making in eyewitness identification cases. Law and Human Behavior. 1988;12:41–55.
  • Graham JW, Taylor BJ, Olchowski AE, Cumsille PE. Planned missing data designs in psychological research. Psychological Methods. 2006;11:323–343.
  • Green PE, Rao VR. Conjoint measurement for quantifying judgmental data. Journal of Marketing Research. 1971;8:355–363.
  • Hays WL. Statistics. Orlando, FL: Harcourt Brace & Company; 1994.
  • Holland CW, Cravens DW. Fractional factorial experimental designs in marketing research. Journal of Marketing Research. 1973;10:270–276.
  • Insightful Corporation. S-PLUS® 8 for Windows® user's guide. Seattle, WA: Insightful Corporation; 2007.
  • Kirk R. Experimental design: Procedures for the behavioral sciences. 3rd ed. Pacific Grove, CA: Brooks/Cole; 1995.
  • Kuehl RO. Design of experiments: Statistical principles of research design and analysis. 2nd ed. Pacific Grove, CA: Duxbury/Thomson; 1999.
  • Landsheer JA, van den Wittenboer G. Fractional designs: A simulation study of usefulness in the social sciences. Behavior Research Methods. 2000;32:528–536.
  • Mathews PG. Design of experiments with Minitab. Milwaukee, WI: Quality Press; 2005.
  • McLean LD. Phantom classrooms. The School Review. 1966;74:139–149.
  • Meyer RD, Steinberg DM, Box GEP. Follow-up designs to resolve confounding in multifactor experiments. Technometrics. 1996;38:303–313.
  • Murray DM. Design and analysis of group-randomized trials. New York: Oxford University Press; 1998.
  • Nair V, Strecher V, Fagerlin A, Ubel P, Resnicow K, Murphy S, et al. Screening experiments and the use of fractional factorial designs in behavioral intervention research. American Journal of Public Health. 2008;98(8):1354.
  • National Institute of Standards and Technology/SEMATECH. e-Handbook of statistical methods. 2006. Available from http://www.itl.nist.gov/div898/handbook/ (accessed July 17, 2007).
  • Norman GJ. Answering the "What works?" question in health behavior change. American Journal of Preventive Medicine. 2008;34:449–450.
  • Oehlert GW. A first course in design and analysis of experiments. New York: W. H. Freeman; 2000.
  • O'Rourke TE, Penrod SD, Cutler BL, Stuve TE. The external validity of eyewitness identification research: Generalizing across subject populations. Law and Human Behavior. 1989;13:385–398.
  • Ryan BF, Joiner BL, Cryer JD. Minitab handbook. 5th ed. Belmont, CA: Duxbury/Thomson; 2004.
  • SAS Institute Inc. SAS/QC® 9.1 user's guide. Cary, NC: Author; 2004.
  • Schafer JL, Graham JW. Missing data: Our view of the state of the art. Psychological Methods. 2002;7:147–177.
  • Shaw R. Reduction in laboratory animal use by factorial design. Alternatives to Laboratory Animals. 2004;32:49–51.
  • Shaw R, Festing MFW, Peers I, Furlong L. Use of factorial designs to optimize animal experiments and reduce animal use. Institute for Laboratory Animal Research Journal. 2002;43:223–232.
  • Simon CW, Roscoe SN. Application of a multifactor approach to transfer of training research. Human Factors. 1984;26:591–612.
  • Smith BC, Penrod SD, Otto AL, Park RC. Jurors' use of probabilistic evidence. Law and Human Behavior. 1996;20:49–82.
  • Sorenson SB, Taylor CA. Female aggression toward male intimate partners: An examination of social norms in a community-based sample. Psychology of Women Quarterly. 2005;29:78–96.
  • Stolle DP, Robbennolt JK, Patry M, Penrod SD. Fractional factorial designs for legal psychology. Behavioral Sciences and the Law. 2002;20:5–17.
  • Strecher VJ, McClure JB, Alexander GL, Chakraborty B, Nair VN, Konkel JM, et al. Web-based smoking-cessation programs: Results of a randomized trial. American Journal of Preventive Medicine. 2008;34:373–381.
  • Taylor PR, Li B, Dawsey SM, Li J, Yang CS, Gao W, et al. Prevention of esophageal cancer: The nutrition intervention trials in Linxian, China. Cancer Research. 1994;54:2029s–2031s.
  • van Schaik P, Flynn D, van Wersch A. Influence of illness script components and medical practice on medical decision making. Journal of Experimental Psychology: Applied. 2005;11:187–199.
  • West SG, Aiken LS. Toward understanding individual effects in multicomponent prevention programs: Design and analysis strategies. In: Bryant K, Windle M, West S, editors. The science of prevention: Methodological advances from alcohol and substance abuse research. Washington, DC: American Psychological Association; 1997. pp. 167–209.
  • West SG, Aiken LS, Todd M. Probing the effects of individual components in multiple component prevention programs. American Journal of Community Psychology. 1993;21:571–605.
  • Wu C, Hamada M. Experiments: Planning, analysis, and parameter design optimization. New York: Wiley; 2000.
  • Zimet GD, Mays RM, Sturm LA, Ravert AA, Perkins SM, Juliar BE. Parental attitudes about sexually transmitted infection vaccination for their adolescent children. Archives of Pediatrics and Adolescent Medicine. 2005;159:132–137.

5.8.5. Example: design and analysis of a three-factor experiment

Work through this example yourself. It is based on Question 19 in the exercises for Chapter 5 of Box, Hunter and Hunter (2nd edition).

The data are from a plastics molding factory that must treat its waste before discharge. The \(y\)-variable represents the average amount of pollutant discharged (lb per day), while the three factors that were varied were:

\(C\) = the chemical compound added (choose either chemical P or chemical Q)
\(T\) = the treatment temperature (72 °F or 100 °F)
\(S\) = the stirring speed (200 rpm or 400 rpm)
\(y\) = the amount of pollutant discharged (lb per day)

| Experiment | Order | \(C\) | \(T\) [°F] | \(S\) [rpm] | \(y\) [lb] |
|---|---|---|---|---|---|
| 1 | 5 | P | 72 | 200 | 5 |
| 2 | 6 | Q | 72 | 200 | 30 |
| 3 | 1 | P | 100 | 200 | 6 |
| 4 | 4 | Q | 100 | 200 | 33 |
| 5 | 2 | P | 72 | 400 | 4 |
| 6 | 7 | Q | 72 | 400 | 3 |
| 7 | 3 | P | 100 | 400 | 5 |
| 8 | 8 | Q | 100 | 400 | 4 |

Draw a geometric figure that illustrates the data from this experiment.

Calculate the main effect for each factor by hand.

For the C effect, there are four estimates of \(C\):

\[\displaystyle \frac{(+25) + (+27) + (-1) + (-1)}{4} = \frac{50}{4} = \mathbf{12.5}\]

For the T effect, there are four estimates of \(T\):

\[\displaystyle \frac{(+1) + (+3) + (+1) + (+1)}{4} = \frac{6}{4} = \mathbf{1.5}\]

For the S effect, there are four estimates of \(S\):

\[\displaystyle \frac{(-27) + (-1) + (-29) + (-1)}{4} = \frac{-58}{4} = \mathbf{-14.5}\]
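These hand calculations can be checked with a few lines of Python (a sketch; the coded arrays below are ours, written in the standard order of the table above):

```python
import numpy as np

# Coded design in standard order: -1 = low, +1 = high
# (C: P/Q, T: 72/100 F, S: 200/400 rpm), with the observed responses y.
C = np.array([-1,  1, -1,  1, -1,  1, -1,  1])
T = np.array([-1, -1,  1,  1, -1, -1,  1,  1])
S = np.array([-1, -1, -1, -1,  1,  1,  1,  1])
y = np.array([ 5, 30,  6, 33,  4,  3,  5,  4])

# Main effect = average response at the high level minus at the low level.
for name, x in [("C", C), ("T", T), ("S", S)]:
    print(name, y[x == 1].mean() - y[x == -1].mean())  # 12.5, 1.5, -14.5
```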

Calculate the 3 two-factor interactions (2fi) by hand, recalling that interactions are defined as the half difference going from high to low.

For the CT interaction, there are two estimates of \(CT\). Recall that interactions are calculated as the half difference going from high to low. Consider the change in \(C\) when:

\(T_\text{high}\) (at \(S\) high) = \(4 - 5 = -1\)
\(T_\text{low}\) (at \(S\) high) = \(3 - 4 = -1\)

This gives a first estimate of \([(-1) - (-1)]/2 = 0\). Similarly,

\(T_\text{high}\) (at \(S\) low) = \(33 - 6 = +27\)
\(T_\text{low}\) (at \(S\) low) = \(30 - 5 = +25\)

gives a second estimate of \([(+27) - (+25)]/2 = +1\). The average CT interaction is therefore \((0 + 1)/2 = \mathbf{0.5}\). You can interchange \(C\) and \(T\) and still get the same result.

For the CS interaction, there are two estimates of \(CS\). Consider the change in \(C\) when:

\(S_\text{high}\) (at \(T\) high) = \(4 - 5 = -1\)
\(S_\text{low}\) (at \(T\) high) = \(33 - 6 = +27\)

This gives a first estimate of \([(-1) - (+27)]/2 = -14\). Similarly,

\(S_\text{high}\) (at \(T\) low) = \(3 - 4 = -1\)
\(S_\text{low}\) (at \(T\) low) = \(30 - 5 = +25\)

gives a second estimate of \([(-1) - (+25)]/2 = -13\). The average CS interaction is therefore \((-14 - 13)/2 = \mathbf{-13.5}\). You can interchange \(C\) and \(S\) and still get the same result.

For the ST interaction, there are two estimates of \(ST\), calculated in the same way as above: \((-1 + 0)/2 = \mathbf{-0.5}\).

Calculate the single three-factor interaction (3fi).

There is only a single estimate of \(CTS\) . The \(CT\) effect at high \(S\) is 0, and the \(CT\) effect at low \(S\) is \(+1\) . The \(CTS\) interaction is then \([(0) - (+1)] / 2 = \mathbf{-0.5}\) . You can also calculate this by considering the \(CS\) effect at the two levels of \(T\) , or by considering the \(ST\) effect at the two levels of \(C\) . All three approaches give the same result.
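These interaction values can be checked numerically as well (continuing the sketch above; each interaction equals the contrast on the corresponding product column of the coded design):

```python
# Interaction = mean(y) where the product of the coded factors is +1,
# minus mean(y) where the product is -1.
for name, x in [("CT", C * T), ("CS", C * S), ("ST", S * T), ("CTS", C * T * S)]:
    print(name, y[x == 1].mean() - y[x == -1].mean())  # 0.5, -13.5, -0.5, -0.5
```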

Compute the main effects and interactions using matrix algebra and a least squares model.
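A sketch of the matrix-algebra route (reusing the coded columns from the snippet above; with \(-1/+1\) coding the columns of \(\mathbf{X}\) are orthogonal, so each coefficient comes out at exactly half the corresponding effect, as discussed in the note on effect magnitude below):

```python
# Least-squares fit of the saturated model: b = (X'X)^{-1} X'y.
X = np.column_stack([np.ones(8), C, T, S, C * T, C * S, T * S, C * T * S])
b = np.linalg.solve(X.T @ X, X.T @ y)
print(np.round(b, 2))  # [11.25  6.25  0.75 -7.25  0.25 -6.75 -0.25 -0.25]
```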

Use computer software to build the same saturated model and verify that the coefficients match the hand calculations above.

Learning notes:

  • The chemical compound could be coded either as (chemical P = \(-1\), chemical Q = \(+1\)) or (chemical P = \(+1\), chemical Q = \(-1\)). The interpretation of the \(x_C\) coefficient is the same, regardless of the coding.
  • Just the tabulation of the raw data gives us some interpretation of the results. Why? Since the variables are manipulated independently, we can just look at the relationship of each factor to \(y\), without considering the others.
  • It is expected that the chemical compound and speed have a strong effect on \(y\), but we can also see the chemical \(\times\) speed interaction. You can see this last interpretation by writing out the full \(\mathbf{X}\) design matrix and comparing the column associated with the \(b_\text{CS}\) term with the \(y\) column.

A note about magnitude of effects

In this text we quantify the effect as the change in response over half the range of the factor. For example, if the center point is 400 K, the lower level is 375 K and the upper level is 425 K, then an effect of "-5" represents a reduction in \(y\) of 5 units for every increase of 25 K in \(x\) .

We use this representation because it corresponds with the results calculated from least-squares software. Putting the matrix of \(-1\) and \(+1\) entries into the software as \(\mathbf{X}\), along with the corresponding vector of responses, \(\mathbf{y}\), you can calculate these effects as \(\mathbf{b} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}\).

Other textbooks, specifically Box, Hunter and Hunter, will report effects that are double ours. This is because they consider the effect to be the change from the lower level to the upper level (double the distance). The advantage of their representation is that binary factors (catalyst A or B; agitator on or off) can be readily interpreted, whereas in our notation, the effect is a little harder to describe (simply double it!).

The advantage of our methodology, though, is that the results calculated by hand would be the same as those from any computer software with respect to the magnitude of the coefficients and the standard errors, particularly in the case of duplicate runs and experiments with center points.

Remember: our effects are half those reported in Box, Hunter and Hunter, and in some other textbooks; our standard error would also be half of theirs. The conclusions drawn will always be the same, as long as one is consistent.


Statistics LibreTexts

3.1: Factorial Designs

Yang Lydia Yang, Kansas State University


Just as it is common for studies in education (or social sciences in general) to include multiple levels of a single independent variable (new teaching method, old teaching method), it is also common for them to include multiple independent variables. Just as including multiple levels of a single independent variable allows one to answer more sophisticated research questions, so too does including multiple independent variables in the same experiment. But including multiple independent variables also allows the researcher to answer questions about whether the effect of one independent variable depends on the level of another. This is referred to as an interaction between the independent variables. As we will see, interactions are often among the most interesting results in empirical research.

Factorial Designs

By far the most common approach to including multiple independent variables (which are also called factors or ways) in an experiment is the factorial design. In a between-subjects factorial design , each level of one independent variable is combined with each level of the others to produce all possible combinations. Each combination, then, becomes a condition in the experiment. Imagine, for example, an experiment on the effect of cell phone use (yes vs. no) and time of day (day vs. night) on driving ability. This is shown in the factorial design table in Figure \(\PageIndex{1}\). The columns of the table represent cell phone use, and the rows represent time of day. The four cells of the table represent the four possible combinations or conditions: using a cell phone during the day, not using a cell phone during the day, using a cell phone at night, and not using a cell phone at night. This particular design is referred to as a 2 × 2 (read “two-by-two”) factorial design because it combines two variables, each of which has two levels.

If one of the independent variables had a third level (e.g., using a handheld cell phone, using a hands-free cell phone, and not using a cell phone), then it would be a 3 × 2 factorial design, and there would be six distinct conditions. Notice that the number of possible conditions is the product of the numbers of levels. A 2 × 2 factorial design has four conditions, a 3 × 2 factorial design has six conditions, a 4 × 5 factorial design would have 20 conditions, and so on. Also notice that each number in the notation represents one factor, one independent variable. So by looking at how many numbers are in the notation, you can determine how many independent variables there are in the experiment. 2 × 2, 3 × 3, and 2 × 3 designs all have two numbers in the notation and therefore all have two independent variables; some people refer to these as two-way factorial ANOVAs. The numerical value of each of the numbers represents the number of levels of each independent variable. A 2 means that the independent variable has two levels, a 3 means that it has three levels, a 4 means it has four levels, and so on. To illustrate, a 3 × 3 design has two independent variables, each with three levels, while a 2 × 2 × 2 design has three independent variables, each with two levels.

[Figure \(\PageIndex{1}\): Factorial design table for the 2 × 2 cell phone use by time of day example]

In principle, factorial designs can include any number of independent variables with any number of levels. For example, an experiment could include the type of psychotherapy (cognitive vs. behavioral), the length of the psychotherapy (2 weeks vs. 2 months), and the sex of the psychotherapist (female vs. male). This would be a 2 × 2 × 2 factorial design and would have eight conditions. Figure \(\PageIndex{2}\) shows one way to represent this design. In practice, it is unusual for there to be more than three independent variables with more than two or three levels each. This is for at least two reasons: For one, the number of conditions can quickly become unmanageable. For example, adding a fourth independent variable with three levels (e.g., therapist experience: low vs. medium vs. high) to the current example would make it a 2 × 2 × 2 × 3 factorial design with 24 distinct conditions. Second, the number of participants required to populate all of these conditions (while maintaining a reasonable ability to detect a real underlying effect) can render the design unfeasible. As a result, in the remainder of this section, we will focus on designs with two independent variables. The general principles discussed here extend in a straightforward way to more complex factorial designs.

[Figure \(\PageIndex{2}\): One way to represent the 2 × 2 × 2 psychotherapy design]
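The conditions of such a design can be enumerated mechanically, which makes it easy to see how quickly they multiply; a quick Python sketch using the hypothetical factors above:

```python
import itertools

# All 2 x 2 x 2 = 8 conditions of the hypothetical psychotherapy design
therapy = ["cognitive", "behavioral"]
length = ["2 weeks", "2 months"]
therapist_sex = ["female", "male"]

for condition in itertools.product(therapy, length, therapist_sex):
    print(condition)  # eight combinations, one per experimental condition
```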

Assigning Participants to Conditions

Recall that in a between-subjects single factor design, each participant is tested in only one condition. In a between-subjects factorial design , all of the independent variables are manipulated between subjects. For example, all participants could be tested either while using a cell phone or while not using a cell phone and either during the day or during the night. This would mean that each participant would be tested in one and only one condition.

Since factorial designs have more than one independent variable, it is also possible to manipulate one independent variable between subjects and another within subjects. This is called a mixed factorial design . For example, a researcher might choose to treat cell phone use as a within-subjects factor by testing the same participants both while using a cell phone and while not using a cell phone. But they might choose to treat time of day as a between-subjects factor by testing each participant either during the day or during the night (perhaps because this only requires them to come in for testing once). Thus each participant in this mixed design would be tested in two of the four conditions. This is a complex design with complex statistical analyses. In the remainder of this section, we will focus on between-subjects factorial designs only. Also, regardless of the design, the actual assignment of participants to conditions is typically done randomly.

Non-Manipulated Independent Variables

In many factorial designs, one of the independent variables is a non-manipulated independent variable . The researcher measures it but does not manipulate it. An example is a study by Halle Brown and colleagues in which participants were exposed to several words that they were later asked to recall (Brown, Kosslyn, Delamater, Fama, & Barsky, 1999) [1] . The manipulated independent variable was the type of word. Some were negative health-related words (e.g., tumor, coronary ), and others were not health related (e.g., election, geometry ). The non-manipulated independent variable was whether participants were high or low in hypochondriasis (excessive concern with ordinary bodily symptoms). The result of this study was that the participants high in hypochondriasis were better than those low in hypochondriasis at recalling the health-related words, but they were no better at recalling the non-health-related words.

Such studies are extremely common, and there are several points worth making about them. First, non-manipulated independent variables are usually participant background variables (self-esteem, gender, and so on), and as such, they are by definition between-subjects factors. For example, people are either low in self-esteem or high in self-esteem; they cannot be tested in both of these conditions. Second, such studies are generally considered to be experiments as long as at least one independent variable is manipulated, regardless of how many non-manipulated independent variables are included. Third, it is important to remember that causal conclusions can only be drawn about the manipulated independent variable. Thus it is important to be aware of which variables in a study are manipulated and which are not.

Non-Experimental Studies With Factorial Designs

Thus far we have seen that factorial experiments can include manipulated independent variables or a combination of manipulated and non-manipulated independent variables. But factorial designs can also include only non-manipulated independent variables, in which case they are no longer experimental designs, but are instead non-experimental in nature. Consider a hypothetical study in which a researcher simply measures both the moods and the self-esteem of several participants (categorizing them as having either a positive or negative mood and as being either high or low in self-esteem) along with their willingness to have unprotected sex. This can be conceptualized as a 2 × 2 factorial design with mood (positive vs. negative) and self-esteem (high vs. low) as non-manipulated between-subjects factors. Willingness to have unprotected sex is the dependent variable.

Again, because neither independent variable in this example was manipulated, it is a non-experimental study rather than an experimental design. This is important because, as always, one must be cautious about inferring causality from non-experimental studies because of the threats of potential confounding variables. For example, an effect of participants’ moods on their willingness to have unprotected sex might be caused by any other variable that happens to be correlated with their moods.

  • Brown, H. D., Kosslyn, S. M., Delamater, B., Fama, A., & Barsky, A. J. (1999). Perceptual and memory biases for health-related information in hypochondriacal individuals. Journal of Psychosomatic Research, 47, 67–78.

Statology

A Complete Guide: The 2×3 Factorial Design

A 2×3 factorial design is a type of experimental design that allows researchers to understand the effects of two independent variables on a single dependent variable.

In this type of design, one independent variable has two levels and the other independent variable has three levels.


For example, suppose a botanist wants to understand the effects of sunlight (low vs. medium vs. high) and watering frequency (daily vs. weekly) on the growth of a certain species of plant.


This is an example of a 2×3 factorial design because there are two independent variables, one having two levels and the other having three levels:

  • Sunlight exposure levels: Low, Medium, High
  • Watering frequency levels: Daily, Weekly

And there is one dependent variable: Plant growth.

The Purpose of a 2×3 Factorial Design

A 2×3 factorial design allows you to analyze the following effects:

Main Effects: These are the effects that just one independent variable has on the dependent variable.

For example, in our previous scenario we could analyze the following main effects:

  • Mean growth of all plants that received low sunlight.
  • Mean growth of all plants that received medium sunlight.
  • Mean growth of all plants that received high sunlight.
  • Mean growth of all plants that were watered daily.
  • Mean growth of all plants that were watered weekly.

Interaction Effects: These occur when the effect that one independent variable has on the dependent variable depends on the level of the other independent variable.

For example, in our previous scenario we could analyze the following interaction effects:

  • Does the effect of sunlight on plant growth depend on watering frequency?
  • Does the effect of watering frequency on plant growth depend on the amount of sunlight?

How to Analyze a 2×3 Factorial Design

We can perform a two-way ANOVA to formally test whether or not the independent variables have a statistically significant relationship with the dependent variable.

For example, the following code shows how to perform a two-way ANOVA for our hypothetical plant scenario in R:
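A minimal version of that code (df, growth, sunlight, and water are assumed names for the data frame and its columns):

```r
# Two-way ANOVA with main effects and the sunlight-by-water interaction
model <- aov(growth ~ sunlight * water, data = df)
summary(model)
```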

Here’s how to interpret the output of the ANOVA:

  • The p-value associated with sunlight is <2e-16 . Since this is less than .05, this means sunlight exposure has a statistically significant effect on plant growth.
  • The p-value associated with water is  .0105 . Since this is less than .05, this means watering frequency also has a statistically significant effect on plant growth.
  • The p-value for the interaction between sunlight and water is  .2819 . Since this is not less than .05, this means there is no interaction effect between sunlight and water.

Additional Resources

The following tutorials provide additional information on experimental design and analysis:

  • A Complete Guide: The 2×2 Factorial Design
  • What Are Levels of an Independent Variable?
  • Independent vs. Dependent Variables
  • What is a Factorial ANOVA?


Six Sigma Study Guide

Factors in an Experiment

Posted by Ted Hessing

In most experiments, you’ll have a number of factors to deal with. These are elements that affect the outcomes of your experiment. They fall into a few basic categories:

  • Experimental factors are those that you can specify and set yourself. For example, the maximum temperature to which you can heat a solution.
  • Classification factors can’t be specified or set, but they can be recognized and your samples selected accordingly. For example, a person’s age or gender.
  • Treatment factors are those which are of interest to you in your experiment and that you’ll want to manipulate in order to test your hypothesis.
  • Nuisance factors aren’t of interest to you for the experiment but might affect your results regardless.

There are two basic types of treatment factors that you’ll use:

  • Quantitative factors can be set to any specific level required – for example, pH levels.
  • Qualitative factors contain a number of categories – for example, different plant species or a person's gender.

A popular example in explaining factors is the simple-sounding task of baking cookies. Most people would simply follow a recipe – or, let’s face it, buy the cookie dough pre-made and bake whatever we don’t eat raw. But how did the recipe come to be in the first place? Someone had to experiment with ingredients and baking methods for the right combination.

  • Flour: The ratios of flour to liquid and flour to fat are crucial to the texture of a cookie. Too much flour, and you end up with a dry, crumbly cookie. Too little, and you end up with an overly flat, crispy cookie.
  • Sugar: The type of sugar used can change the way a cookie reacts to the baking process because using granulated (white) sugar usually creates a crisper, flatter cookie. Using brown sugar creates a moister, chewier cookie.
  • Fat: Rubbing the fat into the flour creates a softer cookie. Using butter creates a flatter cookie than using margarine.
  • Eggs: Eggs create a less crumbly, chewier cookie.
  • Baking powder: Using baking powder causes a cookie to rise or spread, creating a ‘cakey’ texture or a more crisp cookie.
  • Temperature: Low-temperature baking gives a cookie more time to spread out while cooking, meaning it’s more likely to be flatter and crisper.

Think of each of these ingredients and the baking temperature as factors in an experiment. You can’t test each factor independently – you need to have all ingredients to produce the cookies. But you can modify the amount, type of ingredient, and temperature at which they’re baked, to find the combination that yields your perfect cookie.


A Two-Level, Three-Factor Full Factorial Design

Table of Contents

  • Introduction
  • Two-Level Three-Factor Full Factorial Design
  • Design of the Experiment
  • Inputs and Responses
  • Computing Main Effects
  • Analyzing Main Effects
  • Two Way Interactions
  • Analyzing Two Way Interactions
  • Three Way Interactions
  • Analyzing Three Way Interactions
  • Fitting a Polynomial Response Surface
  • The Impact of Uncertainty
  • Uncertainty Quantification: A Factory Example
  • Uncertainty Numbers
  • Uncertainty Measurements
  • Accounting for Uncertainty in the Model

Introduction

As with other notebooks in this repository, this notebook follows, more or less closely, content from Box and Draper's Empirical Model-Building and Response Surfaces (Wiley, 1984). This content is covered by Chapter 4 of Box and Draper.

In this notebook, we'll carry out an analysis of a full factorial design, and show how we can obtain information about a system and its responses, with a quantifiable range of certainty about those values. This is the fundamental idea behind empirical model-building and allows us to construct cheap and simple models to represent complex, nonlinear systems.

Once we've nailed this down for simple models and small numbers of inputs and responses, we can expand on it, use more complex models, and link this material with machine learning algorithms.

We'll start by importing numpy for numerical analysis, and pandas for convenient data containers.
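That cell is short:

```python
import numpy as np
import pandas as pd
```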

Box and Draper cover different experimental design methods in the book, but begin with the simplest type of factorial design in Chapter 4: a full factorial design with two levels. A factorial experimental design is appropriate for exploratory stages, when the effects of variables or their interactions on a system response are poorly understood or not quantifiable.

Two-Level Full Factorial Design

The analysis begins with a two-level, three-variable experimental design - also written $2^3$, with $n=2$ levels for each factor and $k=3$ different factors. We start by encoding each of the three variables to something generic: $(x_1,x_2,x_3)$. A dataframe with input variable values is then populated.
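One way to build the variable-definition dataframe shown below (a sketch; the column and index names are inferred from the printed output):

```python
variables_df = pd.DataFrame(
    {
        "low": [250, 8, 40],
        "high": [350, 10, 50],
        "label": [
            "Length of specimen (mm)",
            "Amplitude of load cycle (mm)",
            "Load (g)",
        ],
    },
    index=pd.Index(["x1", "x2", "x3"], name="index"),
)
```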

low high label
index
x1 250 350 Length of specimen (mm)
x2 8 10 Amplitude of load cycle (mm)
x3 40 50 Load (g)

Next, we encode the variable values. For an arbitrary variable value $\phi_i$, the value of the variable can be coded to be between -1 and 1 according to the formula:

$$x_i = \frac{\phi_i - \overline{\phi}_i}{\Delta \phi_i / 2}$$

where the average $\overline{\phi}_i$ and the span $\Delta \phi_i$ of the variable $\phi_i$ are defined as:

$$\overline{\phi}_i = \frac{\phi_{i,\text{high}} + \phi_{i,\text{low}}}{2}, \qquad \Delta \phi_i = \phi_{i,\text{high}} - \phi_{i,\text{low}}$$

low high label encoded_low encoded_high
index
x1 250 350 Length of specimen (mm) -1.0 1.0
x2 8 10 Amplitude of load cycle (mm) -1.0 1.0
x3 40 50 Load (g) -1.0 1.0

Design of the Experiment

While everything preceding this point is important to state, so that the problem statement and assumptions are consistent and clear, none of it is essential to understanding how experimental design works. It simply illustrates the process of translating a problem from its domain-specific space into a more general one.

Inputs and Responses

Box and Draper present the results (observed outcomes) of a $2^3$ factorial experiment. The $2^3$ comes from the fact that there are 2 levels for each variable (-1 and 1) and three variables (x1, x2, and x3). The observed, or output, variable is the number of cycles to failure for a particular piece of machinery; this variable is more conveniently cast as a logarithm, as it can be a very large number.

Each observation data point consists of three input variable values and an output variable value, $(x_1, x_2, x_3, y)$, and can be thought of as a point in 3D space $(x_1,x_2,x_3)$ with an associated point value of $y$. Alternatively, this might be thought of as a point in 4D space, where the first three coordinates give the location and the fourth is the response observed there.

The input variable values consist of all possible input value combinations, which we can produce using the itertools module:
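For example (a sketch; the product tuple is unpacked in reverse so that x1 varies fastest, matching the standard-order table below):

```python
import itertools

# All 2^3 combinations of the coded levels -1 and +1
rows = [(x1, x2, x3) for x3, x2, x1 in itertools.product([-1, 1], repeat=3)]
inputs_df = pd.DataFrame(rows, columns=["x1", "x2", "x3"])
```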

Now we implement the observed outcomes; as we mentioned, these numbers are large (hundreds or thousands of cycles), and are more conveniently scaled by taking $\log_{10}()$ (which rescales them to values between 1 and 4).

x1 x2 x3 y logy
0 -1 -1 -1 674 2.828660
1 1 -1 -1 3636 3.560624
2 -1 1 -1 170 2.230449
3 1 1 -1 1140 3.056905
4 -1 -1 1 292 2.465383
5 1 -1 1 2000 3.301030
6 -1 1 1 90 1.954243
7 1 1 1 360 2.556303

The variable inputs_df contains all input variables for the experiment design, and results_df contains the inputs and responses for the experiment design; both hold the encoded levels. To obtain the original, unscaled values (which lets us check which experiments must actually be run), we can always convert the dataframe back to the originals by un-applying the scaling equation. This is as simple as finding $\phi_i = \overline{\phi}_i + x_i \, \Delta \phi_i / 2$ for each coded value $x_i$.

Length of specimen (mm) Amplitude of load cycle (mm) Load (g)
0 250 8 40
1 350 8 40
2 250 10 40
3 350 10 40
4 250 8 50
5 350 8 50
6 250 10 50
7 350 10 50

Computing Main Effects

Now we compute the main effects of each variable using the results of the experimental design. We'll use some shorthand Pandas functions to compute these averages: the groupby function, which groups rows of a dataframe according to some condition (in this case, the value of our variable of interest $x_i$).
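A sketch of that computation (assuming the results_df dataframe assembled above, with columns x1, x2, x3, and logy):

```python
# Main effect = mean response at the high (+1) level minus mean response at
# the low (-1) level, averaged over all combinations of the other variables.
main_effects = {}
for var in ["x1", "x2", "x3"]:
    means = results_df.groupby(var)["logy"].mean()
    main_effects[var] = means.loc[1] - means.loc[-1]
print(main_effects)  # x1: about +0.75, x2: about -0.59, x3: about -0.35
```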

Analyzing Main Effects

The main effect of a given variable (as defined by Yates 1937) is the average difference in the level of response as the input variable moves from the low to the high level. If there are other variables, the change in the level of response is averaged over all combinations of the other variables.

Now that we've computed the main effects, we can analyze the results to glean some meaningful information about our system. The first variable x1 has a positive effect of 0.74 - this indicates that when x1 goes from its low level to its high level, it increases the value of the response (the lifetime of the equipment). This means x1 should be increased, if we want to make our equipment last longer. Furthermore, this effect was the largest, meaning it's the variable we should consider changing first.

This might be the case if, for example, changing the value of the input variables were capital-intensive. A company might decide that they can only afford to change one variable, x1 , x2 , or x3 . If this were the case, increasing x1 would be the way to go.

In contrast, increasing the variables x2 and x3 will result in a decrease in the lifespan of our equipment (makes the response smaller), since these have a negative main effect. These variables should be kept at their lower levels, or decreased, to increase the lifespan of the equipment.

Two-Way Interactions ¶

In addition to main effects, a factorial design will also reveal interaction effects between variables - both two-way interactions and three-way interactions. We can use the itertools library to compute the interaction effects using the results from the factorial design.

We'll use the Pandas groupby function again, grouping by two variables this time.

The original computation is a somewhat hairy one-liner: a list comprehension that performs a multi-step calculation. Its key piece is the prefix i*j, which determines whether the contribution to the interaction effect effects[i][j] is positive or negative. The comprehension also loops over one additional dimension (the settings of the remaining variable), multiplying by 1/2 for each additional dimension looped over; the terms are then summed to yield the final interaction effect for every combination of the input variables. An equivalent, and perhaps more transparent, formulation is sketched below.
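A minimal sketch of that equivalent computation (assuming results_df and the itertools import from earlier; the per-pair formula is the standard sign-weighted average for two-level designs):

```python
# With +/-1 coding, the two-way interaction effect of x_i and x_j is the
# sign-weighted average sum(x_i * x_j * y) / (N / 2).
two_way_effects = {}
for xi, xj in itertools.combinations(['x1', 'x2', 'x3'], 2):
    sign = results_df[xi] * results_df[xj]
    two_way_effects[(xi, xj)] = (sign * results_df['logy']).sum() / (len(results_df) / 2)
# For the data above: (x1,x2) ~ -0.035, (x1,x3) ~ -0.030, (x2,x3) ~ -0.038.
```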

If we were computing three-way interaction effects, we would have a similar-looking one-liner, but looping over i , j , and k .

Analyzing Two-Way Interactions ¶

As with main effects, we can analyze the results of the interaction effects analysis to come to some useful conclusions about our physical system. A two-way interaction is a measure of how the main effect of one variable changes as the level of another variable changes. A negative two-way interaction between $x_2$ and $x_3$ means that increasing $x_3$ makes the effect of $x_2$ on the response more negative; equivalently, increasing $x_2$ makes the effect of $x_3$ on the response more negative.

In this case, we see that the $x_2$-$x_3$ interaction effect is the largest in magnitude, and it is negative. Together with the negative main effects of $x_2$ and $x_3$, this means that keeping both variables at their low levels will increase our response - make the equipment last longer. In fact, all of the variable interactions are negative - increasing any pair of variables together lowers the lifetime of the equipment - which indicates that gains in equipment lifetime accomplished by increasing $x_1$ will be partially offset by increases to $x_2$ or $x_3$, since these variables interact.

Once again, if we are limited in the changes that we can actually make to the equipment and input levels, we would want to keep $x_2$ and $x_3$ both at their low levels to keep the response variable value as high as possible.

Three-Way Interactions ¶

Now let's compute the three-way effects (in this case, we can only have one three-way effect, since we only have three variables). We'll start by using the itertools library again, to create a tuple listing the three variables whose interactions we're computing. Then we'll use the Pandas groupby() feature to partition each output according to its inputs, and use it to compute the three-way effect.
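A sketch of the same computation in the sign-weighted form used above:

```python
# The single three-way effect, again as a sign-weighted average.
sign = results_df['x1'] * results_df['x2'] * results_df['x3']
three_way_effect = (sign * results_df['logy']).sum() / (len(results_df) / 2)
# Evaluates to about -0.082 for the data above.
```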

Analysis of Three-Way Effects ¶

While three-way interactions are relatively rare, typically small, and harder to interpret, a negative three-way interaction essentially means that increasing all three variables together produces an interaction that lowers the response (the log lifetime of the equipment); here the three-way effect is -0.082. However, this effect is very weak compared to the main and two-way interaction effects.

Fitting a Polynomial Response Surface ¶

While identifying general trends and the effects of different input variables on a system response is useful, it's more useful to have a mathematical model for the system. The factorial design we used is constructed to give us the coefficients of a linear model $\hat{y}$ that is a linear function of the input variables $x_i$ and that predicts the actual system response $y$:
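The equation itself is not reproduced in the source; for a two-level, three-factor design with all interaction terms, it takes the standard form

$$\hat{y} = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_{12} x_1 x_2 + a_{13} x_1 x_3 + a_{23} x_2 x_3 + a_{123} x_1 x_2 x_3$$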

To determine these coefficients, we can use the effects computed above. When we computed effects, we defined them as measuring the difference in the system response as a variable changes from -1 to +1. Because this quantifies the change per two units of x, and the coefficients of a polynomial quantify the change per one unit of x, each effect must be divided by two.
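Continuing the sketches above:

```python
# Coefficients are half of each effect; the constant term is the grand mean.
a0 = results_df['logy'].mean()   # about 2.744 for the data above
effects = {**main_effects, **two_way_effects,
           ('x1', 'x2', 'x3'): three_way_effect}
coeffs = {name: eff / 2 for name, eff in effects.items()}
```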

Thus, the final result of the experimental design matrix and the 8 experiments that were run is the following polynomial for $\hat{y}$, which is a model for $y$, the system response:
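The polynomial is not shown in the source; recomputing the effects from the response table above and halving them gives, approximately:

$$\hat{y} \approx 2.744 + 0.375\,x_1 - 0.295\,x_2 - 0.175\,x_3 - 0.017\,x_1 x_2 - 0.015\,x_1 x_3 - 0.019\,x_2 x_3 - 0.041\,x_1 x_2 x_3$$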

The Impact of Uncertainty ¶

The main and interaction effects give us a quantitative idea of which variables are important. They can also be important for identifying where a model can be improved: if an input is linked strongly to a system response, more effort should be spent understanding the nature of the relationship.

But there are still some practical considerations missing from the implementation above. Specifically, in the real world it is impossible to know the system response, $y$, perfectly. Rather, we may measure the response with an instrument whose uncertainty has been quantified, or we may measure a quantity multiple times (or both). How do we determine the impact of that uncertainty on the model?

Ultimately, factorial designs are based on the underlying assumption that the response $y$ is a linear function of the inputs $x_i$. Thus, for the three-factor full factorial experiment design, we are collecting data and running experiments in such a way that we obtain a model $\hat{y}$ for our system response $y$, where $\hat{y}$ is a linear function of each factor, as written above.

The experiment design allows us to obtain a value for each coefficient $a_0$, $a_1$, etc. that will fit $\hat{y}$ to $y$ to the best of its abilities.

Thus, uncertainty in the measured responses $y$ propagates into the linear model in the form of uncertainty in the coefficients $a_0$, $a_1$, etc.

Uncertainty Quantification: Factory Example ¶

For example, suppose we're dealing with a machine on a factory floor, and the system response $y$ we're measuring is the number of cycles to machine failure. How do we know when a machine has failed? Perhaps we can't see its internals, and it still makes noise; we might only find out that a machine has failed when it emits smoke. But some machines emit smoke before they fail, while others smoke only after they've failed. We don't know exactly how many life cycles a machine went through, but we can quantify what we know: by measuring the mean $\overline{y}$ and variance $\sigma^2$ in a controlled setting, we obtain a probability distribution assigning probabilities to different times of failure whenever a machine starts smoking (e.g., there is a 5% chance it failed more than 1 hour ago).

Once we obtain the variance $\sigma^2$, we can take its square root to obtain $\sigma$, which characterizes the spread of the uncertainty. If a $2\sigma$ interval is acceptable (it covers roughly 95% of cases), we can add and subtract $2\sigma$ from each parameter estimate.

Uncertainty Numbers ¶

To obtain an estimate of the uncertainty, the experimentalist will typically make several measurements at the center point, that is, where all parameter levels are 0. The more samples are taken at this condition, the better characterized the distribution of uncertainty becomes. These center point samples can be used to construct a Gaussian probability distribution function, which yields a variance, $\sigma^2$ (or, to be proper, an estimate $s^2$ of the real variance $\sigma^2$). This parameter is key for quantifying uncertainty.

Using Uncertainty Measurements ¶

Suppose we measure $s^2 = 0.0050$. Now what?

Now we can obtain the variance of all measurements, and the variance in the effects that we computed above. These are computed via:
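The formulas themselves are not reproduced here. For this design, each effect is a difference between two averages of $N/2 = 4$ responses each, so a standard form (a reconstruction, using the $s^2$ estimated above) is:

$$V(\text{effect}) = \left(\frac{1}{4} + \frac{1}{4}\right) s^2 = \frac{s^2}{2} = \frac{0.0050}{2} = 0.0025$$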

Alternatively, if the responses $y$ are actually averages of a given number $r$ of $y$-observations, $\overline{y}$, then the variance will shrink:
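Each averaged response then has variance $s^2/r$, so, under that assumption:

$$V(\text{effect}) = \left(\frac{1}{4} + \frac{1}{4}\right)\frac{s^2}{r} = \frac{s^2}{2r}$$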

The variance gives us an estimate of $\sigma^2$, and from $\sigma^2$ we can obtain $\sigma$. A band of $\pm 1\sigma$ around the model prediction $\hat{y}$ captures roughly 68% of the probable values of $y$; widening the band to $\pm 2\sigma$ captures roughly 95%.

Taking the square root of the variance gives $\sigma$: for instance, with $V(\text{effect}) = 0.0025$ from above, $\sigma_{effect} = \sqrt{0.0025} = 0.05$.

Accounting for Uncertainty in Model ¶

Now we can convert the values of the effects, and the values of $\sigma$, to values for the final linear model:

We begin with the case where each variable is at its middle point (all non-constant terms are 0), so the prediction reduces to the constant term $a_0$.

In this case, the standard error is $\pm \sigma_{mean}$ as computed for the mean (or overall) system response,

where $\sigma_{mean} = \sqrt{Var(\text{mean})}$ and, for $N$ experimental runs, $Var(\text{mean}) = s^2/N$.

The final polynomial model for our system response prediction $\hat{y}$ therefore becomes:
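The final model is not reproduced in the source. Combining the coefficients recomputed earlier with $s^2 = 0.0050$ (so that the coefficient standard errors are $\sqrt{s^2/2}/2 = 0.025$ and $\sigma_{mean} = \sqrt{s^2/8} = 0.025$), a sketch of the result is:

$$\hat{y} \approx (2.744 \pm 0.025) + (0.375 \pm 0.025)\,x_1 - (0.295 \pm 0.025)\,x_2 - (0.175 \pm 0.025)\,x_3 + \ldots$$

where the interaction terms carry the same $\pm 0.025$ standard error.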

Discussion ¶

At this point, we would usually dive deeper into the details of the actual problem of interest. By tying the empirical model to the system, we can draw conclusions about the physical system - for example, if we were analyzing a chemically reacting process, and we found the response to be particularly sensitive to temperature, it would indicate that the chemical reaction is sensitive to temperature, and that the reaction should be studied more deeply (in isolation from the more complicated system) to better understand the impact of temperature on the response.

It's also valuable to explore the linear model we obtained more deeply, by looking at contours of the response surface, taking first derivatives, and optimizing the input variable values to maximize or minimize the response. We'll leave those tasks for later notebooks, where we illustrate them in detail.

At this point we have accomplished the goal of illustrating the design, execution, and analysis of a two-level, three-factor full factorial experimental design, so we'll leave things at that.

Conclusion ¶

In this notebook, we've covered a 2-level, three-factor factorial design from start to finish, including incorporation of uncertainty information. The design of the experiment was made simple by using the itertools and pandas libraries, and we showed how to transform variables to have low and high levels, as well as demonstrating a system response transformation. The results were analyzed to obtain a linear polynomial model.

However, this process was a bit cumbersome. What we'll see in later notebooks is that we can use Python modules designed for statistical modeling to fit linear models to data using least squares and regression, and carry the analysis further.


ANOVA With Full Factorial Experiments

This lesson explains how to use analysis of variance (ANOVA) with balanced, completely randomized, full factorial experiments. The discussion covers general issues related to design, analysis, and interpretation with fixed factors and with random factors .

Future lessons expand on this discussion, using sample problems to demonstrate the analysis under the following scenarios:

  • Two-factor ANOVA: Fixed-effects model .
  • Two-factor ANOVA: Random-effects model .
  • Two-factor ANOVA: Mixed-effects model .
  • Two-factor ANOVA with Excel .

Design Considerations

Since this lesson is all about implementing analysis of variance with a balanced, completely randomized, full factorial experiment, we begin by answering four relevant questions:

  • What is a full factorial experiment?
  • What is a completely randomized design?
  • What are the data requirements for analysis of variance with a completely randomized, full factorial design?
  • What is a balanced design?

What is a Full Factorial Experiment?

A factorial experiment allows researchers to study the joint effect of two or more factors on a dependent variable .

With a full factorial design, the experiment includes a treatment group for every combination of factor levels. Therefore, the number of treatment groups is the product of factor levels. For example, consider the full factorial design shown below:

        C1      C2      C3      C4
A1 B1   Grp 1   Grp 2   Grp 3   Grp 4
A1 B2   Grp 5   Grp 6   Grp 7   Grp 8
A1 B3   Grp 9   Grp 10  Grp 11  Grp 12
A2 B1   Grp 13  Grp 14  Grp 15  Grp 16
A2 B2   Grp 17  Grp 18  Grp 19  Grp 20
A2 B3   Grp 21  Grp 22  Grp 23  Grp 24

Factor A has two levels, factor B has three levels, and factor C has four levels. Therefore, the full factorial design has 2 x 3 x 4 = 24 treatment groups.

Full factorial designs can be characterized by the number of treatment levels associated with each factor, or by the number of factors in the design. Thus, the design above could be described as a 2 x 3 x 4 design (number of treatment levels) or as a three-factor design (number of factors).

Note: Another type of factorial experiment is a fractional factorial. Unlike full factorial experiments, which include a treatment group for every combination of factor levels, fractional factorial experiments include only a subset of possible treatment groups. Our focus in this lesson is on full factorial experiments, rather than fractional factorial experiments.

Completely Randomized Design

With a full factorial experiment, a completely randomized design is distinguished by the following attributes:

  • The design has two or more factors (i.e., two or more independent variables ), each with two or more levels .
  • Treatment groups are defined by a unique combination of non-overlapping factor levels.
  • The number of treatment groups is the product of factor levels.
  • Experimental units are randomly selected from a known population .
  • Each experimental unit is randomly assigned to one, and only one, treatment group.
  • Each experimental unit provides one dependent variable score.

Data Requirements

Analysis of variance requires that the dependent variable be measured on an interval scale or a ratio scale . In addition, analysis of variance with a full factorial experiment makes three assumptions about dependent variable scores:

  • Independence . The dependent variable score for each experimental unit is independent of the score for any other unit.
  • Normality . In the population, dependent variable scores are normally distributed within treatment groups.
  • Equality of variance . In the population, the variance of dependent variable scores in each treatment group is equal. (Equality of variance is also known as homogeneity of variance or homoscedasticity.)

The assumption of independence is the most important assumption. When that assumption is violated, the resulting statistical tests can be misleading. This assumption is tenable when (a) experimental units are randomly sampled from the population and (b) sampled units are randomly assigned to treatments.

With respect to the other two assumptions, analysis of variance is more forgiving. Violations of normality are less problematic when the sample size is large. And violations of the equal variance assumption are less problematic when the sample size within groups is equal.

Before conducting an analysis of variance with data from a full factorial experiment, it is best practice to check for violations of normality and homogeneity assumptions. For further information, see:

  • How to Test for Normality: Three Simple Tests
  • How to Test for Homogeneity of Variance: Hartley's Fmax Test
  • How to Test for Homogeneity of Variance: Bartlett's Test

Balanced versus Unbalanced Design

A balanced design has an equal number of observations in all treatment groups. In contrast, an unbalanced design has an unequal number of observations in some treatment groups.

Balance is not required with one-way analysis of variance , but it is helpful with full-factorial designs because:

  • Balanced factorial designs are less vulnerable to violations of the equal variance assumption.
  • Balanced factorial designs have more statistical power .
  • Unbalanced factorial designs can produce confounded factors, making it hard to interpret results.
  • Unbalanced designs use special weights for data analysis, which complicates the analysis.

Note: Our focus in this lesson is on balanced designs.

Analytical Logic

To implement analysis of variance with a balanced, completely randomized, full factorial experiment, a researcher takes the following steps:

  • Specify a mathematical model to describe how main effects and interaction effects influence the dependent variable.
  • Write statistical hypotheses to be tested by experimental data.
  • Specify a significance level for a hypothesis test.
  • Compute the grand mean and the mean scores for each treatment group.
  • Compute sums of squares for each effect in the model.
  • Find the degrees of freedom associated with each effect in the model.
  • Based on sums of squares and degrees of freedom, compute mean squares for each effect in the model.
  • Find the expected value of the mean squares for each effect in the model.
  • Compute a test statistic for each effect, based on observed mean squares and their expected values.
  • Find the P value for each test statistic.
  • Accept or reject the null hypothesis for each effect, based on the P value and the significance level.
  • Assess the magnitude of effect, based on sums of squares.

If you are familiar with one-way analysis of variance (see One-Way Analysis of Variance ), you might notice that the analytical logic for a completely-randomized, single-factor experiment is very similar to the logic for a completely randomized, full factorial experiment. Here are the main differences:

  • Formulas for mean scores and sums of squares differ, depending on the number of factors in the experiment.
  • Expected mean squares differ, depending on whether the experiment tests fixed effects and/or random effects.

Below, we'll explain how to implement analysis of variance for fixed-effects models, random-effects models, and mixed models with a balanced, two-factor, completely randomized, full-factorial experiment.

Mathematical Model

For every experimental design, there is a mathematical model that accounts for all of the independent and extraneous variables that affect the dependent variable.

Fixed Effects

For example, here is the fixed-effects mathematical model for a two-factor, completely randomized, full-factorial experiment:

$X_{ijm} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{m(ij)}$

where $X_{ijm}$ is the dependent variable score for subject $m$ in treatment group $ij$, $\mu$ is the population mean, $\alpha_i$ is the main effect of Factor A at level $i$; $\beta_j$ is the main effect of Factor B at level $j$; $\alpha\beta_{ij}$ is the interaction effect of Factor A at level $i$ and Factor B at level $j$; and $\varepsilon_{m(ij)}$ is the effect of all other extraneous variables on subject $m$ in treatment group $ij$.

For this model, it is assumed that $\varepsilon_{m(ij)}$ is normally and independently distributed with a mean of zero and a variance of $\sigma^2_\varepsilon$. The mean ($\mu$) is constant.

Note: The parentheses in $\varepsilon_{m(ij)}$ indicate that subjects are nested under treatment groups. When a subject is assigned to only one treatment group, we say that the subject is nested under a treatment.

Random Effects

The random-effects mathematical model for a completely randomized full factorial experiment is similar to the fixed-effects mathematical model. It can also be expressed as:

$X_{ijm} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{m(ij)}$

Like the fixed-effects mathematical model, the random-effects model also assumes that (1) $\varepsilon_{m(ij)}$ is normally and independently distributed with a mean of zero and a variance of $\sigma^2_\varepsilon$ and (2) the mean ($\mu$) is constant.

Here's the difference between the two mathematical models. With a fixed-effects model, the experimenter includes all treatment levels of interest in the experiment. With a random-effects model, the experimenter includes a random sample of treatment levels in the experiment. Therefore, in the random-effects mathematical model, the following is true:

  • The main effect ($\alpha_i$) is a random variable with a mean of zero and a variance of $\sigma^2_\alpha$.
  • The main effect ($\beta_j$) is a random variable with a mean of zero and a variance of $\sigma^2_\beta$.
  • The interaction effect ($\alpha\beta_{ij}$) is a random variable with a mean of zero and a variance of $\sigma^2_{\alpha\beta}$.

All three effects are assumed to be normally and independently distributed (NID).

Statistical Hypotheses

With a full factorial experiment, it is possible to test all main effects and all interaction effects. For example, here are the null hypotheses (H 0 ) and alternative hypotheses (H 1 ) for each effect in a two-factor full factorial experiment.

For fixed-effects models, it is common practice to write statistical hypotheses in terms of treatment effects:

  • Factor A: $H_0$: $\alpha_i = 0$ for all $i$;  $H_1$: $\alpha_i \neq 0$ for some $i$
  • Factor B: $H_0$: $\beta_j = 0$ for all $j$;  $H_1$: $\beta_j \neq 0$ for some $j$
  • AB interaction: $H_0$: $\alpha\beta_{ij} = 0$ for all $i, j$;  $H_1$: $\alpha\beta_{ij} \neq 0$ for some $i, j$

For random-effects models, it is common practice to write statistical hypotheses in terms of the variance of treatment levels included in the experiment:

  • Factor A: $H_0$: $\sigma^2_\alpha = 0$;  $H_1$: $\sigma^2_\alpha \neq 0$
  • Factor B: $H_0$: $\sigma^2_\beta = 0$;  $H_1$: $\sigma^2_\beta \neq 0$
  • AB interaction: $H_0$: $\sigma^2_{\alpha\beta} = 0$;  $H_1$: $\sigma^2_{\alpha\beta} \neq 0$

Significance Level

The significance level (also known as alpha or α) is the probability of rejecting the null hypothesis when it is actually true. The significance level for an experiment is specified by the experimenter, before data collection begins. Experimenters often choose significance levels of 0.05 or 0.01.

A significance level of 0.05 means that there is a 5% chance of rejecting the null hypothesis when it is true. A significance level of 0.01 means that there is a 1% chance of rejecting the null hypothesis when it is true. The lower the significance level, the more persuasive the evidence needs to be before an experimenter can reject the null hypothesis.

Mean Scores

Analysis of variance for a full factorial experiment begins by computing a grand mean, marginal means , and group means. Here are formulas for computing the various means for a balanced, two-factor, full factorial experiment:

  • Grand mean. The grand mean ($\overline{X}$) is the mean of all observations. With $N = \sum_{i=1}^{p}\sum_{j=1}^{q} n = pqn$, it is computed as: $\overline{X} = \frac{1}{N}\sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{m=1}^{n} X_{ijm}$
  • Marginal means for Factor A. The mean for level $i$ of Factor A is computed as: $\overline{X}_{i} = \frac{1}{qn}\sum_{j=1}^{q}\sum_{m=1}^{n} X_{ijm}$
  • Marginal means for Factor B. The mean for level $j$ of Factor B is computed as: $\overline{X}_{j} = \frac{1}{pn}\sum_{i=1}^{p}\sum_{m=1}^{n} X_{ijm}$
  • Group means. The mean of all observations in group $ij$ is computed as: $\overline{X}_{ij} = \frac{1}{n}\sum_{m=1}^{n} X_{ijm}$

In the equations above, N is the total sample size across all treatment groups; n is the sample size in a single treatment group, p is the number of levels of Factor A, and q is the number of levels of Factor B.

Sums of Squares

A sum of squares is the sum of squared deviations from a mean score. Two-way analysis of variance makes use of five sums of squares:

  • Factor A sum of squares. The sum of squares for Factor A (SSA) measures variation of the marginal means of Factor A ($\overline{X}_i$) around the grand mean ($\overline{X}$): $SSA = nq \sum_{i=1}^{p} (\overline{X}_i - \overline{X})^2$
  • Factor B sum of squares. The sum of squares for Factor B (SSB) measures variation of the marginal means of Factor B ($\overline{X}_j$) around the grand mean ($\overline{X}$): $SSB = np \sum_{j=1}^{q} (\overline{X}_j - \overline{X})^2$
  • Interaction sum of squares. The sum of squares for the interaction between Factor A and Factor B (SSAB): $SSAB = n \sum_{i=1}^{p}\sum_{j=1}^{q} (\overline{X}_{ij} - \overline{X}_i - \overline{X}_j + \overline{X})^2$
  • Within-groups sum of squares. The within-groups sum of squares (SSW) measures variation of all scores ($X_{ijm}$) around their respective group means ($\overline{X}_{ij}$): $SSW = \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{m=1}^{n} (X_{ijm} - \overline{X}_{ij})^2$ Note: The within-groups sum of squares is also known as the error sum of squares (SSE).
  • Total sum of squares. The total sum of squares (SST) measures variation of all scores ($X_{ijm}$) around the grand mean ($\overline{X}$): $SST = \sum_{i=1}^{p}\sum_{j=1}^{q}\sum_{m=1}^{n} (X_{ijm} - \overline{X})^2$

In the formulas above, n is the sample size in each treatment group, p is the number of levels of Factor A, and q is the number of levels of Factor B.

It turns out that the total sum of squares is equal to the sum of the component sums of squares, as shown below:

SST = SSA + SSB + SSAB + SSW

As you'll see later on, this relationship will allow us to assess the relative magnitude of any effect (Factor A, Factor B, or the AB interaction) on the dependent variable.
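As an illustration (a sketch, not part of the original lesson), the decomposition can be computed for a balanced two-factor layout stored as a p x q x n NumPy array:

```python
import numpy as np

def two_way_ss(data):
    """Sums of squares for a balanced two-factor design.

    data has shape (p, q, n): p levels of Factor A, q levels of Factor B,
    and n observations per treatment group.
    """
    p, q, n = data.shape
    grand = data.mean()
    a_means = data.mean(axis=(1, 2))   # marginal means of Factor A
    b_means = data.mean(axis=(0, 2))   # marginal means of Factor B
    g_means = data.mean(axis=2)        # group means, shape (p, q)

    ssa = n * q * ((a_means - grand) ** 2).sum()
    ssb = n * p * ((b_means - grand) ** 2).sum()
    ssab = n * ((g_means - a_means[:, None] - b_means[None, :] + grand) ** 2).sum()
    ssw = ((data - g_means[..., None]) ** 2).sum()
    sst = ((data - grand) ** 2).sum()
    return ssa, ssb, ssab, ssw, sst

# The additive identity holds numerically for any balanced layout:
rng = np.random.default_rng(0)
ssa, ssb, ssab, ssw, sst = two_way_ss(rng.normal(size=(2, 3, 4)))
assert abs(sst - (ssa + ssb + ssab + ssw)) < 1e-9
```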

Degrees of Freedom

The term degrees of freedom (df) refers to the number of independent sample points used to compute a statistic minus the number of parameters estimated from the sample points.

The degrees of freedom used to compute the various sums of squares for a balanced, two-way factorial experiment are shown in the table below:

Sum of squares Degrees of freedom
Factor A p - 1
Factor B q - 1
AB interaction ( p - 1 )( q - 1)
Within groups pq( n - 1 )
Total npq - 1

Notice that there is an additive relationship between the various sums of squares. The degrees of freedom for total sum of squares (df TOT ) is equal to the degrees of freedom for the Factor A sum of squares (df A ) plus the degrees of freedom for the Factor B sum of squares (df B ) plus the degrees of freedom for the AB interaction sum of squares (df AB ) plus the degrees of freedom for within-groups sum of squares (df WG ). That is,

df TOT = df A + df B + df AB + df WG

Mean Squares

A mean square is an estimate of population variance. It is computed by dividing a sum of squares (SS) by its corresponding degrees of freedom (df), as shown below:

MS = SS / df

To conduct analysis of variance with a two-factor, full factorial experiment, we are interested in four mean squares:

MS A = SSA / df A

MS B = SSB / df B

MS AB = SSAB / df AB

MS WG = SSW / df WG

Expected Value

The expected value of a mean square is the average value of the mean square over a large number of experiments.

Statisticians have derived formulas for the expected value of mean squares for balanced, two-factor, full factorial experiments. The expected values differ, depending on whether the experiment uses all fixed factors, all random factors, or a mix of fixed and random factors.

Fixed-Effects Model

A fixed-effects model describes an experiment in which all factors are fixed factors. The table below shows the expected value of mean squares for a balanced, two-factor, full factorial experiment when both factors are fixed:

Mean square | Expected value
$MS_A$ | $\sigma^2_{WG} + nq\,\sigma^2_A$
$MS_B$ | $\sigma^2_{WG} + np\,\sigma^2_B$
$MS_{AB}$ | $\sigma^2_{WG} + n\,\sigma^2_{AB}$
$MS_{WG}$ | $\sigma^2_{WG}$

In the table above, $n$ is the sample size in each treatment group, $p$ is the number of levels for Factor A, $q$ is the number of levels for Factor B, $\sigma^2_A$ is the variance of main effects due to Factor A, $\sigma^2_B$ is the variance of main effects due to Factor B, $\sigma^2_{AB}$ is the variance due to interaction effects, and $\sigma^2_{WG}$ is the variance due to extraneous variables (also known as variance due to experimental error).

Random-Effects Model

A random-effects model describes an experiment in which all factors are random factors. The table below shows the expected value of mean squares for a balanced, two-factor, full factorial experiment when both factors are random:

Mean square | Expected value
$MS_A$ | $\sigma^2_{WG} + n\,\sigma^2_{AB} + nq\,\sigma^2_A$
$MS_B$ | $\sigma^2_{WG} + n\,\sigma^2_{AB} + np\,\sigma^2_B$
$MS_{AB}$ | $\sigma^2_{WG} + n\,\sigma^2_{AB}$
$MS_{WG}$ | $\sigma^2_{WG}$

Mixed Model

A mixed model describes an experiment in which at least one factor is a fixed factor, and at least one factor is a random factor. The table below shows the expected value of mean squares for a balanced, two-factor, full factorial experiment, when Factor A is a fixed factor and Factor B is a random factor:

Mean square | Expected value
$MS_A$ | $\sigma^2_{WG} + n\,\sigma^2_{AB} + nq\,\sigma^2_A$
$MS_B$ | $\sigma^2_{WG} + np\,\sigma^2_B$
$MS_{AB}$ | $\sigma^2_{WG} + n\,\sigma^2_{AB}$
$MS_{WG}$ | $\sigma^2_{WG}$

Note: The expected values shown in the tables are approximations. For all practical purposes, the values for the fixed-effects model will always be valid for computing test statistics (see below). The values for the random-effects model and the mixed model will be valid when random-effect levels in the experiment represent a small fraction of levels in the population.

Test Statistics

Suppose we want to test the significance of a main effect or the interaction effect in a two-factor, full factorial experiment. We can use the mean squares to define a test statistic F as follows:

$F(v_1, v_2) = MS_{EFFECT_1} / MS_{EFFECT_2}$

where $MS_{EFFECT_1}$ is the mean square for the effect we want to test; $MS_{EFFECT_2}$ is an appropriate mean square, based on the expected value of mean squares; $v_1$ is the degrees of freedom for $MS_{EFFECT_1}$; and $v_2$ is the degrees of freedom for $MS_{EFFECT_2}$.

How do you choose an appropriate mean square for the denominator in an F ratio? The expected value of the denominator should be identical to the expected value of the numerator, except for one thing: the numerator should have an extra term that includes the variance of the effect being tested ($\sigma^2_{EFFECT}$).

The table below shows how to construct F ratios when an experiment uses a fixed-effects model.

Table 1. Fixed-Effects Model

Effect | Expected mean square | F ratio
A | $\sigma^2_{WG} + nq\,\sigma^2_A$ | $MS_A / MS_{WG}$
B | $\sigma^2_{WG} + np\,\sigma^2_B$ | $MS_B / MS_{WG}$
AB | $\sigma^2_{WG} + n\,\sigma^2_{AB}$ | $MS_{AB} / MS_{WG}$
Error | $\sigma^2_{WG}$ |

The table below shows how to construct F ratios when an experiment uses a Random-effects model.

Table 2. Random-Effects Model

Effect | Expected mean square | F ratio
A | $\sigma^2_{WG} + n\,\sigma^2_{AB} + nq\,\sigma^2_A$ | $MS_A / MS_{AB}$
B | $\sigma^2_{WG} + n\,\sigma^2_{AB} + np\,\sigma^2_B$ | $MS_B / MS_{AB}$
AB | $\sigma^2_{WG} + n\,\sigma^2_{AB}$ | $MS_{AB} / MS_{WG}$
Error | $\sigma^2_{WG}$ |

The table below shows how to construct F ratios when an experiment uses a mixed model. Here, Factor A is a fixed effect, and Factor B is a random effect.

Table 3. Mixed Model

Effect | Expected mean square | F ratio
A (fixed) | $\sigma^2_{WG} + n\,\sigma^2_{AB} + nq\,\sigma^2_A$ | $MS_A / MS_{AB}$
B (random) | $\sigma^2_{WG} + np\,\sigma^2_B$ | $MS_B / MS_{WG}$
AB | $\sigma^2_{WG} + n\,\sigma^2_{AB}$ | $MS_{AB} / MS_{WG}$
Error | $\sigma^2_{WG}$ |

How to Interpret F Ratios

For each F ratio in the tables above, notice that the numerator should equal the denominator when the variation due to the source effect ($\sigma^2_{SOURCE}$) is zero (i.e., when the source does not affect the dependent variable), and the numerator should be bigger than the denominator when the variation due to the source effect is not zero (i.e., when the source does affect the dependent variable).

Defined in this way, each F ratio is a convenient measure that we can use to test the null hypothesis about the effect of a source (Factor A, Factor B, or the AB interaction) on the dependent variable. Here's how to conduct the test:

  • When the F ratio is close to one, the numerator of the F ratio is approximately equal to the denominator. This indicates that the source did not affect the dependent variable, so we cannot reject the null hypothesis.
  • When the F ratio is significantly greater than one, the numerator is bigger than the denominator. This indicates that the source did affect the dependent variable, so we must reject the null hypothesis.

What does it mean for the F ratio to be significantly greater than one? To answer that question, we need to talk about the P-value.

In an experiment, a P-value is the probability of obtaining a result more extreme than the observed experimental outcome, assuming the null hypothesis is true.

With analysis of variance for a full factorial experiment, the F ratios are the observed experimental outcomes that we are interested in. So, the P-value would be the probability that an F ratio would be more extreme (i.e., bigger) than the actual F ratio computed from experimental data.

How does an experimenter attach a probability to an observed F ratio? Luckily, the F ratio is a random variable that has an F distribution . The degrees of freedom (v 1 and v 2 ) for the F ratio are the degrees of freedom associated with the effects used to compute the F ratio.

For example, consider the F ratio for Factor A when Factor A is a fixed effect. That F ratio (F A ) is computed from the following formula:

F A = F(v 1 , v 2 ) = MS A / MS WG

MS A (the numerator in the formula) has degrees of freedom equal to df A  ; so for F A  , v 1 is equal to df A  . Similarly, MS WG (the denominator in the formula) has degrees of freedom equal to df WG  ; so for F A  , v 2 is equal to df WG  . Knowing the F ratio and its degrees of freedom, we can use an F table or an online calculator to find the probability that an F ratio will be bigger than the actual F ratio observed in the experiment.

F Distribution Calculator

To find the P-value associated with an F ratio, use Stat Trek's free F distribution calculator.

For examples that show how to find the P-value for an F ratio, see Problem 1 or Problem 2 at the end of this lesson.

Hypothesis Test

Recall that the experimenter specified a significance level early on - before the first data point was collected. Once you know the significance level and the P-values, the hypothesis tests are routine. Here's the decision rule for accepting or rejecting a null hypothesis:

  • If the P-value is bigger than the significance level, accept the null hypothesis.
  • If the P-value is equal to or smaller than the significance level, reject the null hypothesis.

A "big" P-value for a source of variation (Factor A, Factor B, or the AB interaction) indicates that the source did not have a statistically significant effect on the dependent variable. A "small" P-value indicates that the source did have a statistically significant effect on the dependent variable.

Magnitude of Effect

The hypothesis tests tell us whether sources of variation in our experiment had a statistically significant effect on the dependent variable, but the tests do not address the magnitude of the effect. Here's the issue:

  • When the sample size is large, you may find that even small effects (indicated by a small F ratio) are statistically significant.
  • When the sample size is small, you may find that even big effects are not statistically significant.

With this in mind, it is customary to supplement analysis of variance with an appropriate measure of effect size. Eta squared (η 2 ) is one such measure. Eta squared is the proportion of variance in the dependent variable that is explained by a treatment effect. The eta squared formula for a main effect or an interaction effect is:

η 2 = SS EFFECT / SST

where SS EFFECT is the sum of squares for a particular treatment effect (i.e., Factor A, Factor B, or the AB interaction) and SST is the total sum of squares.

ANOVA Summary Table

It is traditional to summarize ANOVA results in an analysis of variance table. Here, filled with hypothetical data, is an analysis of variance table for a 2 x 3 full factorial experiment.

Analysis of Variance Table

Source SS df MS F P
A 13,225 p - 1 = 1 13,225 9.45 0.004
B 2450 q - 1 = 2 1225 0.88 0.427
AB 9650 (p-1)(q-1) = 2 4825 3.45 0.045
WG 42,000 pq(n - 1) = 30 1400
Total 67,325 npq - 1 = 35

In this experiment, Factors A and B were fixed effects, so the F ratios were computed with that in mind. There were two levels of Factor A, so p equals two; three levels of Factor B, so q equals three; and each treatment group had six subjects, so n equals six. The table shows critical outputs for each main effect and for the AB interaction effect.

Many of the table entries are derived from the sum of squares (SS) and degrees of freedom (df), based on the following formulas:

MS A = SS A / df A = 13,225/1 = 13,225

MS B = SS B / df B = 2450/2 = 1225

MS AB = SS AB / df AB = 9650/2 = 4825

MS WG = SSW / df WG = 42,000/30 = 1400

F A = MS A / MS WG = 13,225/1400 = 9.45

F B = MS B / MS WG = 1225/1400 = 0.88

F AB = MS AB / MS WG = 4825/1400 = 3.45

where MS A is mean square for Factor A, MS B is mean square for Factor B, MS AB is mean square for the AB interaction, MS WG is the within-groups mean square, F A is the F ratio for Factor A, F B is the F ratio for Factor B, and F AB is the F ratio for the AB interaction.

An ANOVA table provides all the information an experimenter needs to (1) test hypotheses and (2) assess the magnitude of treatment effects.

Hypothesis Tests

The P-value (shown in the last column of the ANOVA table) is the probability that an F statistic would be more extreme (bigger) than the F ratio shown in the table, assuming the null hypothesis is true. When a P-value for a main effect or an interaction effect is bigger than the significance level, we accept the null hypothesis for the effect; when it is smaller, we reject the null hypothesis.

For example, based on the F ratios in the table above, we can draw the following conclusions:

  • The P-value for Factor A is 0.004. Since the P-value is smaller than the significance level (0.05), we reject the null hypothesis that Factor A has no effect on the dependent variable.
  • The P-value for Factor B is 0.427. Since the P-value is bigger than the significance level (0.05), we cannot reject the null hypothesis that Factor B has no effect on the dependent variable.
  • The P-value for the AB interaction is 0.045. Since the P-value is smaller than the significance level (0.05), we reject the null hypothesis of no significant interaction. That is, we conclude that the effect of each factor varies, depending on the level of the other factor.

Magnitude of Effects

To assess the strength of a treatment effect, an experimenter can compute eta squared (η 2 ). The computation is easy, using the sum of squares entries from an ANOVA table in the formula given earlier: η 2 = SS EFFECT / SST, where SS EFFECT is the sum of squares for the main or interaction effect being tested and SST is the total sum of squares.

To illustrate how this works, let's compute η 2 for the main effects and the interaction effect in the ANOVA table below:

Source SS df MS F P
A 100 2 50 2.5 0.09
B 180 3 60 3 0.04
AB 300 6 50 2.5 0.03
WG 960 48 20
Total 1540 59

Based on the table entries, here are the computations for eta squared (η 2 ):

η 2 A = SSA / SST = 100 / 1540 = 0.065

η 2 B = SSB / SST = 180 / 1540 = 0.117

η 2 AB = SSAB / SST = 300 / 1540 = 0.195

Conclusion: In this experiment, Factor A accounted for 6.5% of the variance in the dependent variable; Factor B, 11.7% of the variance; and the interaction effect, 19.5% of the variance.

Test Your Understanding

In the ANOVA table shown below, the P-value for Factor B is missing. Assuming Factors A and B are fixed effects , what is the correct entry for the missing P-value?

Source SS df MS F P
A 300 4 75 5.00 0.002
B 100 2 50 3.33 ???
AB 200 8 25 1.67 0.12
WG 900 60 15
Total 1500 74

Hint: Stat Trek's F Distribution Calculator may be helpful.

(A) 0.01 (B) 0.04 (C) 0.20 (D) 0.97 (E) 0.99

The correct answer is (B).

A P-value is the probability of obtaining a result more extreme (bigger) than the observed F ratio, assuming the null hypothesis is true. From the ANOVA table, we know the following:

  • The observed value of the F ratio for Factor B is 3.33.

F B = F(v 1 , v 2 ) = MS B / MS WG

  • The degrees of freedom (v 1 ) for the Factor B mean square (MS B ) is 2.
  • The degrees of freedom (v 2 ) for the within-groups mean square (MS WG ) is 60.

Therefore, the P-value we are looking for is the probability that an F with 2 and 60 degrees of freedom is greater than 3.33. We want to know:

P [ F(2, 60) > 3.33 ]

Now, we are ready to use the F Distribution Calculator . We enter the degrees of freedom (v1 = 2) for the Factor B mean square, the degrees of freedom (v2 = 60) for the within-groups mean square, and the F value (3.33) into the calculator; and hit the Calculate button.

The calculator reports that the probability that F is greater than 3.33 equals about 0.04. Hence, the correct P-value is 0.04.

In the ANOVA table shown below, the P-value for Factor B is missing. Assuming Factors A and B are random effects , what is the correct entry for the missing P-value?

Source SS df MS F P
A 300 4 75 3.00 0.09
B 100 2 50 2.00 ???
AB 200 8 25 1.67 0.12
WG 900 60 15
Total 1500 74

(A) 0.01 (B) 0.04 (C) 0.20 (D) 0.80 (E) 0.96

The correct answer is (C).

  • The observed value of the F ratio for Factor B is 2.0.

F B = F(v 1 , v 2 ) = MS B / MS AB

  • The degrees of freedom (v 1 ) for the Factor B mean square (MS B ) is 2.
  • The degrees of freedom (v 2 ) for the AB interaction mean square (MS AB ) is 8.

Therefore, the P-value we are looking for is the probability that an F with 2 and 8 degrees of freedom is greater than 2.0. We want to know:

P [ F(2, 8) > 2.0 ]

Now, we are ready to use the F Distribution Calculator . We enter the degrees of freedom (v1 = 2) for the Factor B mean square, the degrees of freedom (v2 = 8) for the AB interaction mean square, and the F value (2.0) into the calculator; and hit the Calculate button.

The calculator reports that the probability that F is greater than 2.0 equals about 0.20. Hence, the correct P-value is 0.20.
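The same two probabilities can also be computed without the online calculator; a short sketch using scipy (not part of the original lesson):

```python
from scipy.stats import f

# Problem 1: fixed effects, F_B = MS_B / MS_WG with (2, 60) degrees of freedom.
p1 = f.sf(3.33, 2, 60)   # survival function: P[F(2, 60) > 3.33], about 0.04

# Problem 2: random effects, F_B = MS_B / MS_AB with (2, 8) degrees of freedom.
p2 = f.sf(2.0, 2, 8)     # P[F(2, 8) > 2.0], about 0.20

print(round(p1, 2), round(p2, 2))  # -> 0.04 0.2
```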


Lesson 9: 3-level and mixed-level factorials and fractional factorials, overview section.

These designs are a generalization of the \(2^k\) designs. We will continue to talk about coded variables so we can describe designs in general terms, but in this case we will be assuming in the \(3^k\) designs that the factors are all quantitative. With \(2^k\) designs we weren't as strict about this because we could have either qualitative or quantitative factors. Most \(3^k\) designs are only useful where the factors are quantitative. With \(3^k\) designs we are moving from screening factors to analyzing them to understand what their actual response function looks like.

With 2-level designs, we had just two levels of each factor, which is fine for fitting a linear, straight-line relationship. With three levels of each factor we now have points at the middle, so we are able to fit curved response functions, i.e. quadratic response functions. In two dimensions with a square design space, a \(2^k\) design simply gave us corner points defining a square.

In three dimensions the design region becomes a cube and with four or more factors it is a hypercube which we can't draw.

We can label the design points similarly to what we did before – see the columns on the left. However, for these designs we prefer the other way of coding, using {0, 1, 2}, which is a generalization of the {0, 1} coding that we used in the \(2^k\) designs. This is shown in the columns on the right in the table below:

A B   A B
- -   0 0
0 -   1 0
+ -   2 0
- 0   0 1
0 0   1 1
+ 0   2 1
- +   0 2
0 +   1 2
+ +   2 2

For either method of coding, the treatment combinations represent the actual values of \(X_1\) and \(X_2\), where there is some high level, a middle level and some low level of each factor. Visually our region of experimentation or region of interest is highlighted in the figure below when \(k = 2\):

If we look at the analysis of variance for a \(k = 2\) experiment with n replicates, where we have three levels of both factors we would have the following:

AOV (degrees of freedom):

Source | d.f.
A | 2
B | 2
A x B | 4
Error | 9(n-1)
Total | 9n-1

Important idea used for confounding and taking fractions

How we handle three-level designs parallels what we did with two-level designs: we may confound the experiment in incomplete blocks or simply run a fraction of the design. In two-level designs, the interactions each have 1 d.f. and consist only of +/- components, so it is simple to see how to do the confounding. Things are more complicated in three-level designs, since a p-way interaction has \(2^p\) d.f. If we want to confound a main effect (2 d.f.) with a 2-way interaction (4 d.f.), we need to partition the interaction into 2 orthogonal pieces with 2 d.f. each; we then confound the main effect with one of the 2 pieces, so there are 2 choices. Similarly, if we want to confound a main effect with a 3-way interaction, we need to break the interaction into 4 pieces with 2 d.f. each. Each piece of the interaction is represented by a pseudo-factor with 3 levels. The method given using the Latin squares is quite simple. There is some clever modulus arithmetic in this section, but the details are not important. The important idea is that, just as with the \(2^k\) designs, we can purposefully confound to achieve designs that are efficient, either because they do not use the entire set of \(3^k\) runs or because they can be run in blocks which do not disturb our ability to estimate the effects of most interest.

Following the text, for the A*B interaction, we define the pseudo factors, which are called the AB component and the \(AB^2\) component. These components could be called pseudo-interaction effects. The two components will be defined as a linear combination as follows, where \(X_1\) is the level of factor A and \(X_2\) is the level of factor B using the {0,1,2} coding system. Let the \(AB\) component be defined as

\(L_{AB}=X_{1}+X_{2}\ (mod3)\)

and the \(AB^2\) component will be defined as:

\(L_{AB^2}=X_{1}+2X_{2}\ (mod3)\)

Using these definitions we can create the pseudo-interaction components. Below you see that the AB levels are defined by \(L_{AB}\) and the \(AB^2\) levels are defined by \(L_{AB^2}\).

\(A\) \(B\)   \(AB\) \(AB^2\)
0 0   0 0
1 0   1 1
2 0   2 2
0 1   1 2
1 1   2 0
2 1   0 1
0 2   2 1
1 2   0 2
2 2   1 0

This table has entries {0, 1, 2} which allow us to confound a main effect or either component of the interaction A*B. Each of these main effects or pseudo-interaction components has three levels and therefore 2 degrees of freedom.
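As an illustration (a sketch, not code from the lesson), the table above can be reproduced directly from the two linear combinations:

```python
import itertools

# Reproduce the AB and AB^2 pseudo-factor levels for the 3x3 design,
# using the {0, 1, 2} coding. x1 varies fastest, as in the table.
for x2, x1 in itertools.product(range(3), repeat=2):
    l_ab  = (x1 + x2) % 3        # AB component:   L = X1 + X2   (mod 3)
    l_ab2 = (x1 + 2 * x2) % 3    # AB^2 component: L = X1 + 2*X2 (mod 3)
    print(x1, x2, l_ab, l_ab2)
```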

This section also discusses partitioning the interaction sums of squares into 1-d.f. sums of squares associated with a polynomial; however, this is just polynomial regression, and the method does not seem readily applicable to creating interpretable confounding patterns.

  • Application of \(3^k\) factorial designs, the interaction components and relative degrees of freedom
  • How to perform blocking of \(3^k\) designs in \(3^p\) number of blocks and how to choose the effect(s) which should be confounded with blocks
  • Concept of “Partial Confounding” in replicated blocked designs and its advantages
  • How to generate reasonable \(3^{k-p}\) fractional factorial designs and understand the alias structure
  • The fact that Latin square and Graeco-Latin square designs are special cases of \(3^k\) fractional  factorial design
  • Mixed level factorial designs and their applications

  • Open access
  • Published: 17 June 2024

An explanatory study of factors influencing engagement in AI education at the K-12 Level: an extension of the classic TAM model

Xiaolin Zhang, Jing Li, Xiao Yang, Dong Li & Yantong Liu

Scientific Reports volume  14 , Article number:  13922 ( 2024 ) Cite this article

Subjects:

  • Computer science
  • Psychology and behaviour
  • Scientific data

Artificial intelligence (AI) holds immense promise for K-12 education, yet understanding the factors influencing students’ engagement with AI courses remains a challenge. This study addresses this gap by extending the technology acceptance model (TAM) to incorporate cognitive factors such as AI intrinsic motivation (AIIM), AI readiness (AIRD), AI confidence (AICF), and AI anxiety (AIAX), alongside human–computer interaction (HCI) elements like user interface (UI), content (C), and learner-interface interactivity (LINT) in the context of using generative AI (GenAI) tools. By including these factors, an expanded model is presented to capture the complexity of student engagement with AI education. To validate the model, 210 Chinese students spanning grades K7 to K9 participated in a 1 month artificial intelligence course. Survey data and structural equation modeling reveal significant relationships between cognitive and HCI factors and perceived usefulness (PU) and ease of use (PEOU). Specifically, AIIM, AIRD, AICF, UI, C, and LINT positively influence PU and PEOU, while AIAX negatively affects both. Furthermore, PU and PEOU significantly predict students’ attitudes toward AI curriculum learning. These findings underscore the importance of considering cognitive and HCI factors in the design and implementation of AI education initiatives. By providing a theoretical foundation and practical insights, this study informs curriculum development and aids educational institutions and businesses in evaluating and optimizing AI4K12 curriculum design and implementation strategies.


Introduction.

Artificial intelligence (AI) technologies, including blockchain, augmented reality, 3D printing, nanotechnology, and the internet of things, significantly impact various human life aspects 1 . AI’s promise to revolutionize education is evident, with countries like the United States and China actively promoting AI in K-12 education 2 , 3 , 4 . In May 2018, the association for the promotion of artificial intelligence (AAAI) and the association of computer science teachers (CSTA) formed a joint working group to develop national guidelines for K-12 AI education, establishing the K-12 AI education concept (AI4K12) 5 , 6 . A key advancement in AI research is generative AI (GenAI), which uses machine learning and deep learning to create new data 7 , 8 , 9 . GenAI applications include image generation, natural language processing, and music composition, with innovations like midjourney generated images and ChatGPT smart chat enhancing creativity and public engagement 10 , 11 , 12 . The rise of tools like ChatGPT has intensified GenAI’s role in educational research, drawing public and academic interest to its educational implications, challenges, and opportunities 13 , 14 , 15 .

Research on artificial intelligence in education (AIED) has examined learners' receptivity; technological, system-quality, cultural, self-efficacy, and trust factors are deemed crucial in e-learning systems 10, 11, 12. Studies in computer vision courses highlight the influence of prior knowledge, skills, learning styles, motivation, and self-efficacy; the usability of the system, observable results, and experimentation also affect the use of computer tools in the classroom 16, 17, 18, 19. Students' perspectives on employing ChatGPT in programming and programming education have also been examined 20. A scale was developed, based on the unified theory of acceptance and use of technology (UTAUT) model, to gauge students' acceptance of AI applications generated by artificial intelligence 21; this scale was tailored and crafted for individuals aged 18-60 in Turkey. The validity and reliability of the AI literacy scale were confirmed 22. Studies on the utilization of chatbots in training programs disclose that social expectations, effort, and influence are pivotal factors for engagement 23. Chai et al. explored the correlation between AI literacy, the AI curriculum framework (AICF), social welfare, and behavioral intention (BI) in K-12 students, finding positive correlations among these elements 24. Long and Magerko (2020) developed a framework for AI literacy in K-12, emphasizing design considerations like explainability and transparency 25. Green et al. (2019) proposed disciplinary literacy instruction in K-12 engineering to address diversity barriers in engineering careers 26. However, few studies have systematically examined K-12 students' acceptance of AI programs, particularly the causal relationships and direct impact factors. Students' perceptions and cognitive factors in AI learning 27, GenAI tools' interactivity, and human-computer interaction (HCI) 28 factors are all crucial in influencing acceptance of AI in education. Studying K-12 students' attitudes towards AI courses using GenAI tools is a promising research area, vital for understanding engagement, learning outcomes, and course optimization 29.

In this study, we adopt the technology acceptance model (TAM) as the theoretical framework to understand K-12 students' attitudes towards AI courses using generative AI (GenAI) tools. TAM has been widely utilized to assess users' acceptance and adoption of new technologies. The model posits that perceived usefulness (PU) and perceived ease of use (PEOU) significantly influence users' attitudes and behavioral intentions towards adopting a new technology. Building upon this foundation, our extended TAM incorporates cognitive factors related to AI learning, such as AI intrinsic motivation (AIIM), AI readiness (AIRD), AI confidence (AICF), and AI anxiety (AIAX), alongside human-computer interaction (HCI) elements like user interface (UI), content (C), and learner-interface interactivity (LINT) specific to GenAI tools. By integrating these additional constructs into the TAM framework, we aim to provide a more comprehensive understanding of the factors shaping K-12 students' acceptance of AI courses facilitated by GenAI tools.

Recent empirical studies have shed light on various aspects of AI in education, providing valuable insights into factors influencing students’ attitudes and behaviors. For instance, research by Almaiah and Almulhem (2018) identified key success factors for e-learning system implementation using the Delphi technique 10 . Similarly, Almaiah, Al-Khasawneh, and Althunibat (2020) explored critical challenges and factors influencing e-learning system usage during the COVID-19 pandemic 11 . Thematic analysis by Almaiah and Al Mulhem (2020) classified main challenges and factors influencing the successful implementation of e-learning systems using NVivo 12 . These studies underscore the importance of understanding the dynamics of technology acceptance and usage in educational settings, providing valuable insights that inform our research approach and contribute to the broader discourse on AI in education.

This study aims to analyze K-12 students' attitudes towards AI courses using GenAI tools. Employing a conceptual model based on the technology acceptance model (TAM), it includes cognitive and HCI factors as external variables in an extended model. The study involves designing and implementing GenAI-based AI courses. A group of 210 Chinese K7-K9 students participated, with their experiences evaluated through post-course questionnaires. Hypotheses were tested using structural equation modeling, resulting in an enhanced TAM version.

The study’s innovative contributions are:

Developing a comprehensive set of indicators for factors influencing K-12 students’ attitudes towards GenAI tool-based courses.

Experimentally deriving interrelationships among these influencing factors.

Proposing an improved experimental methodology based on the TAM model to validate these relationships.

As K-12 AI education advances, the significance and refinement of related models and frameworks are expected to grow.

This study proposes an extended TAM combining students’ cognitive learning of AI courses with GenAI’s HCI factors, potentially offering new directions for TAM in GenAI-based education in K-12.

Literature review and hypotheses

Artificial intelligence (AI) is increasingly pivotal in education 30. AI in education (AI-Ed) involves computers executing cognitive tasks akin to human thinking, particularly in learning and problem-solving. Over the past 30 years, AI-Ed has been integrated into the education sector through various means. The integration of intelligent educational methods, curriculum design, and course structure aims to imbue students with environmental and sustainable development (ESD) awareness while simultaneously incorporating cutting-edge technologies like artificial intelligence within the ESD framework 31. This integration includes AI monitoring student forums, intelligent assessments, serving as a learning companion, assisting or replacing educators, and functioning as private tutors. Moreover, AI-Ed serves as a research tool for advancing science education 32. Utilizing AI-Ed in computer science, machine learning, and deep learning can bridge the digital divide and foster AI literacy 13. Consequently, AI has become an integral subject in K-12 education, preparing students with digital-driven knowledge and problem-solving skills for the digital world 33.

Generative AI (GenAI) is a subset of AI that has garnered significant attention. It allows users to create new content, including text, images, audio, video, and 3D models, based on input requests. Recently, several GenAI platforms have emerged, such as ChatGPT, a large language model launched on November 30, 2022, which attracted a million users within five days of its release 34, 35. ChatGPT, as an AI chatbot, aids student learning by providing information and narration 36. In the realm of GenAI imagery, platforms like Disco Diffusion, DALL-E 2, Imagen, Midjourney, and Stable Diffusion are prominent. Midjourney, for example, creates artistic images based on user text inputs 37, impacting art and education. These GenAI applications generate outputs after learning from user requests. Applying GenAI to AI4K12 involves interpreting science and technology through engineering and art 38. For instance, using ChatGPT and Midjourney in AI education practice for K-12 involves designing courses with visual narratives 39. ChatGPT enhances students’ communication and narrative skills, while Midjourney can be used to create picture books 40. Incorporating tools like ChatGPT and Midjourney into curriculum design is feasible for improving AI literacy in K-12 education.

Previous research has delved into various facets of AI in Education (AI-Ed), exploring educators’ readiness to teach AI, attitudes towards using chatbots in education, and factors influencing students' continued interest in AI learning. However, despite these valuable insights, there remains a gap in understanding the factors specifically influencing K-12 students’ acceptance of AI courses facilitated by generative AI (GenAI) tools. Given the increasing integration of AI into K-12 education and the emergence of GenAI platforms, it is crucial to explore the unique dynamics shaping students’ attitudes towards these innovative learning tools. By addressing this research gap, our study aims to contribute to the existing literature by providing insights into the factors driving K-12 students’ acceptance of AI4K12 courses, ultimately informing the design and implementation of effective AI education programs for this demographic.

Previous studies have examined various perspectives in AIED. These include educators’ readiness and willingness to teach AI 18; work combining diffusion theory with technology adoption rates, which confirms that usability and user-friendliness affect the adoption of artificial intelligence tools in online learning 19; and attitudes towards using chatbots in education 41. Research has also focused on factors influencing students’ continued interest in AI learning 24, perceptions of AI coaching 23, and universities’ behavioral intention (BI) to use AI robots for education 42. To foster the widespread adoption of GenAI programs in AI4K12, understanding the factors influencing K-12 students’ acceptance of such courses is essential. This study aims to explore these influential factors.

The technology acceptance model (TAM), originally proposed by Davis, provides a robust theoretical framework for examining user acceptance and usage of new technologies. While TAM has been widely applied across various fields, including healthcare, management, and finance, its application in the context of AI in education (AIED) remains relatively underexplored. Specifically, the unique characteristics of GenAI tools and their implications for students’ perceptions of usefulness and ease of use have not been thoroughly investigated within the TAM framework. By applying TAM to the study of K-12 students’ attitudes towards AI courses with GenAI tools, our research seeks to elucidate the underlying factors driving students’ acceptance of these innovative learning platforms. This theoretical approach allows us to identify key determinants of students’ attitudes and intentions towards AI4K12 courses, providing valuable insights for educators, policymakers, and developers seeking to enhance AI literacy and engagement among K-12 students.

The technology acceptance model (TAM), proposed by Davis, addresses user acceptance and usage of new technologies 43 , 44 .

Based on the TAM, one study explored university students’ willingness to use a metaverse-based learning platform, finding perceived usefulness, personal innovativeness, and perceived enjoyment to be the key factors 45. Davis suggested that perceived usefulness (PU) and perceived ease of use (PEOU) are key to embracing and promoting technology use 43, 44. TAM has been applied across various fields, including healthcare, management, finance, and education, for example, in research on university students’ acceptance of mobile learning 46, 47. In AI device acceptance studies, a theoretical model called AI device usage acceptance (AIDUA) includes social influence, personification, performance expectations, emotional engagement, and hedonic motivation as antecedents to user attitudes 48. Other studies on AI acceptance have identified PU, performance expectations, attitude, trust, and effort expectation as influencing AI intention, willingness, and usage behavior 49. Research on students’ willingness to continue AI learning revealed that AI literacy and its impact on social welfare affect students’ BI 24. Applications of TAM in AIED show varying external variables influencing PU and PEOU from different research angles. The factors influencing K-12 students’ attitudes towards learning AI courses with GenAI tools remain unclear.

Materials and methods

External variables: learning cognition of AI

Based on the cognitive characteristics of students in AI education for K-12 (AI4K12), we propose the inclusion of four key variables to enhance the technology acceptance model (TAM) study: AI intrinsic motivation (AIIM), AI readiness (AIRD), AI confidence (AICF), and AI anxiety (AIAX).

This suggestion stems from a comprehensive review of teachers’ AI cognition and their willingness to teach AI, highlighting the importance of understanding AIAX and its impact on social welfare, attitude towards use (ATT) of AI, perceived teaching confidence (AICF), AI relevance, AIRD, and behavioral intentions (BI). Furthermore, in the context of AI4K12, students’ cognition plays a crucial role in influencing their learning process and outcomes. Our research on developing and evaluating AI courses confirms the significance of learning perception abilities, including motivation, confidence, attitude, readiness, and anxiety, in shaping effective AI education strategies.

AI intrinsic motivation (AIIM): Previous studies have shown that motivation can enhance students’ willingness to learn 50, 51, 52. Intrinsic motivation involves a psychological process of exploration, experimentation, curiosity, and manipulation, and is a natural manifestation of human learning and the integration of knowledge 53. Intrinsic motivation guides students to set learning goals and to participate continuously in classroom learning activities, which has a positive impact on academic performance 54. We therefore propose the following hypotheses.

H1a, students’ AIIM has a positive impact on their PU in learning AI courses through the GenAI tool.

H1b, students’ AIIM has a positive impact on their PEOU in learning AI courses through the GenAI tool.

AI readiness (AIRD): The technology readiness index (TRI) is used to measure people’s tendency to accept and use advanced information technology 55. Preparation grounded in positive expectations for the use of technology can predict learning behavior 56. AIRD measures students’ comfort with AI knowledge and technology in their learning and life, and influences their learning attitude towards AI courses 57. In behavioral research on teachers teaching AIED, AIRD is related to BI, and PU has a positive impact on BI. We therefore propose the following hypotheses.

H2a, students’AIRD has a positive impact on their PU in learning AI courses through the GenAI tool.

H2b, students’AIRD has a positive impact on their PEOU in learning AI courses through the GenAI tool.

AI confidence (AICF): In AIED, AICF represents students’ confidence in learning AI course content 58. AICF can affect students’ willingness to learn and other variables, and is an important factor in AI usage behavior 59, 60, 61. Research on students using mobile devices for learning has found that confidence in mobile device use has a positive impact on PEOU 62, 63. We therefore propose the following hypotheses.

H3a, students’ AICF has a positive impact on their PU in learning AI courses through the GenAI tool.

H3b, students’ AICF has a positive impact on their PEOU in learning AI courses through the GenAI tool.

AI anxiety (AIAX): Computer phobia is defined as fear and anxiety about advanced technology 64. When mobile devices are used for learning, mobile device anxiety can likewise affect learning behavior 63. Against the background of AI, AIAX can be traced back to technophobia and computer anxiety: AIAX is defined as a fear of AI and users’ concerns about the unknown impact of AI programs and related technological developments on humans and society 65, 66. In the use of ChatGPT, AIAX predicts learning behavior 67, 68, and unease about GenAI usage affects user behavior 69, 70. In e-learning environments, where learners interact with AI tools, anxiety and uneasiness affect usage, and AIAX has an impact on PU 71. For learning AI courses with the GenAI tool, we propose the following hypotheses.

H4a, students’ AIAX has a negative impact on their PU in learning AI courses through the GenAI tool.

H4b, students’ AIAX has a negative impact on their PEOU in learning AI courses through the GenAI tool.

HCI factors in AIED

HCI refers to the interaction between users and computers, that is, the computer-mediated dialogue in which users engage within a created environment. Interactivity in online educational programs refers to the relationship between students and computers in a human–computer interaction environment 72. When the GenAI tool is used for AIED, HCI affects students’ attitudes and behaviors 67. Based on the teaching characteristics of using the GenAI tool in AI4K12, we therefore consider HCI factors, using user interface design (UI), content (C), and learner interface interactivity (LINT) as variables to extend the TAM study.

User interface design (UI): In HCI, an interface is defined as the visible part of the information system that can be touched, heard, and seen by the user 72. UI is an important factor in the software development process, and user-demand-oriented design is the key to UI 73. User-centered UI principles provide a theoretical basis for designers, such as distinguishing the most important information, keeping button styles consistent, and actively providing feedback 74, 75. In research on online courses and mobile learning applications, following UI principles makes a system easier for students to use and operate, and UI also plays an important role in the system’s PU; based on the technology acceptance model, learning content quality, content design quality, interactivity, functionality, user interface design, accessibility, personalization, and responsiveness are the main factors influencing the acceptance of mobile learning 76, 77. When GenAI is used for teaching, the UI also affects PEOU. We therefore propose the following hypothesis.

H5: The UI of the GenAI tool used in AI course learning has a positive impact on PEOU.

Content (C): C relates to the course content. In the field of mobile devices, C is considered to have a significant impact on student satisfaction 78. In the computer context, the structure and capacity of C have a direct impact on PU, and C is an important influencing factor in user acceptance of a system 79. In investigations of the factors that affect BI to use mobile devices, C has a positive impact on PU 63. In evaluations of MOOC acceptance and use, C is positively correlated with PEOU 80. Based on previous research findings, we propose the following hypotheses.

H6a: The C of courses taught with GenAI tools has a positive impact on students’ PU in learning AI courses.

H6b: The C of courses taught with GenAI tools has a positive impact on students’ PEOU in learning AI courses.

Learner interface interactivity (LINT): LINT allows users to interact with the system through program elements such as the menu bar 81. Tests of how enhancing student interactivity improves e-learning acceptance have found relationships between LINT, PU, and PEOU 29. During the use of GenAI tools, LINT also affects students’ PU and PEOU, so we propose the following hypotheses.

H7a: LINT has a positive impact on students’ PU in learning AI courses through the GenAI tool.

H7b: LINT has a positive impact on students’ PEOU in learning AI courses through the GenAI tool.

Internal variables

Perceived usefulness (PU) is defined as the degree to which a user believes that using a specific system will improve their work performance, and perceived ease of use (PEOU) is defined as the degree to which a user believes that using the system will be free of effort 43, 44. The correlations between TAM constructs have been proven in many studies, and the relationship between PU and PEOU has also been confirmed in research in the field of education. Attitude towards use (ATT) is a person’s perception of technology, a psychological response of liking, enjoying, and being happy with technology 58. Perceived usability has a positive impact on the practical use of m-learning systems 82. Previous research on higher-education students’ intention to adopt meta-education 83 and on users’ continuance intention towards e-learning 84 has concluded that both PU and PEOU affect a person’s ATT towards using a system. Therefore, when studying the factors influencing students’ attitudes towards AI teaching using GenAI, we propose the following hypotheses:

H8: The PU of AI courses learned by students through the GenAI tool has a positive impact on ATT.

H9: The PEOU of students learning AI courses through the GenAI tool has a positive impact on PU.

H10: The PEOU of students learning AI courses through the GenAI tool has a positive impact on their ATT.

Research model

This study analyzed the learning cognition and human interaction factors that affect students’ attitudes. We extended Davis’ TAM with external variables drawn from the literature review and previous research findings. Using PU, PEOU, and ATT as basic variables, seven external variables were derived through the literature review and analysis of previous research. Figure 1 shows the proposed hypothesis model.

Figure 1. Proposed hypothesis model.

This study endeavors to delve into the determinants shaping K-12 students’ perceptions of AI courses facilitated by generative AI (GenAI) tools. To elucidate these factors, an analytical framework was formulated, drawing inspiration from Davis’ technology acceptance model (TAM) as its foundational underpinning. Building upon the core constructs of TAM—perceived usefulness (PU), perceived ease of use (PEOU), and attitude towards use (ATT)—the research extends the model by incorporating additional external variables gleaned from an exhaustive literature review and synthesis of prior research. Specifically, the model integrates cognitive factors associated with AI learning, including AI intrinsic motivation (AIIM), AI readiness (AIRD), AI confidence (AICF), and AI anxiety (AIAX), as well as human–computer interaction (HCI) elements such as user interface (UI), content (C), and learner interface interactivity (LINT). Figure 1 depicts the proposed hypothesis model, illustrating the interconnections among these variables. For participant selection, a convenience sampling approach was adopted to recruit a cohort of 210 Chinese K-12 students spanning grades K7–K9. This sampling method was chosen for its practicality and ease of access, facilitating the efficient enlistment of participants from the target demographic. Demographic details, encompassing age, gender, and grade level, were gathered to furnish insights into the profile of the sample, enabling a more nuanced analysis of the research outcomes.

In terms of tool development and validation, all research instruments utilized in this study were selected or adapted from established measures drawn from prior research endeavors. Rigorous attention was dedicated to ensuring the reliability and validity of these measures, with necessary adjustments made to align them with the study context. Validation procedures encompassed pilot testing and expert validation to affirm the appropriateness of the measures in assessing the intended constructs. Through this meticulous validation process, the research instruments were deemed apt for capturing the pertinent variables of interest.

Data analysis procedures entailed the utilization of structural equation modeling (SEM) techniques to analyze the quantitative data collected through surveys. This analytical approach facilitated the testing of the stipulated hypotheses and the exploration of the relationships between the variables delineated in the research model. Statistical software packages such as SPSS and AMOS were employed to conduct the analyses, enabling robust statistical testing and elucidation of the research findings. By organizing the methodology section in a cohesive narrative format, this study offers a lucid and transparent depiction of the research design, participant recruitment approach, measurement instruments, and data analysis protocols, ensuring rigor and validity in the study’s outcomes.

Participants and experimental procedures

The participants in this study were 210 students selected from two high schools in China. Among them, 97 were males (45.7%) and 114 were females (54.3%). The students’ grades ranged from K7 to K9. Students participated voluntarily in the research experiments and were aware of the research procedures. The experimental data are anonymous, and permission and recognition were obtained from the students’ parents and schools. Students took part in a one-month course focused mainly on AI knowledge learning using the GenAI tool. The main content of the course was the creation of AI visual narratives (story picture books). All students were undergoing systematic AIED for the first time. The experimental process is shown in Fig. 2.

Figure 2. Experimental process.

Sample population: The sample population for this study consisted of K-12 students from various schools in China. These students were chosen to represent a diverse demographic, including different grade levels and socioeconomic backgrounds, to ensure the findings were applicable across a broad range of contexts.

Sampling technique: A stratified random sampling technique was employed to select participants for the study. Schools were stratified based on geographic location, school type (public/private), and grade level. Within each stratum, a random sample of schools was selected, and then students within those schools were randomly chosen to participate in the study. This sampling technique helped ensure that the sample was representative of the target population and minimized selection bias.

Justification of sample size: The sample size of 210 Chinese K-12 students was determined based on power analysis and the requirements for structural equation modeling (SEM) analysis. Prior research suggests that a sample size of at least 200 participants is adequate for SEM analysis, particularly when examining complex relationships among variables. Additionally, power analysis was conducted to ensure that the sample size was sufficient to detect meaningful effects with a reasonable degree of confidence. This sample size also allowed for subgroup analyses based on demographic variables such as grade level and gender, providing further insights into potential variations within the sample population.
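The parameters of the power analysis are not reported in the text. Purely as an illustrative sketch (the effect size, predictor count, and alpha below are our assumptions, not the authors’ reported values), the power of the F test for R² in a multiple regression with seven external predictors can be computed from the noncentral F distribution:

```python
from scipy import stats

def regression_power(f2: float, n_predictors: int, n: int, alpha: float = 0.05) -> float:
    """Power of the F test for R^2 in multiple regression (Cohen's f^2 effect size)."""
    df1 = n_predictors
    df2 = n - n_predictors - 1
    nc = f2 * (df1 + df2 + 1)                  # noncentrality parameter, Cohen (1988)
    f_crit = stats.f.ppf(1 - alpha, df1, df2)  # critical F under the null
    return 1 - stats.ncf.cdf(f_crit, df1, df2, nc)

# Illustrative check: 7 external predictors, 197 valid cases, medium effect f^2 = 0.15
print(round(regression_power(0.15, 7, 197), 3))
```

Under these assumed inputs the computed power is well above the conventional 0.80 threshold, which is consistent with the claim that roughly 200 participants suffice.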

Experimental implementation and feedback

The course spans one month, comprising a total of eight classes, dedicated to the creation of AI picture books centered on “AI, Love, and the Future”. From sessions 3 to 7, students delve into utilizing ChatGPT, Midjourney, and AI translation software for crafting their picture books. Collaboratively, teachers and students explore the nexus between AI and our world, leveraging GenAI for acquiring novel knowledge. This encompasses mastering AI translation software for bilingual tasks and harnessing generative chat tools for narrative continuity. Additionally, understanding how generative image systems operate in image creation and story coherence is emphasized. The final session involves student presentations, fostering discussions and idea exchanges between teachers and students. Course content design adheres to input from five AIED experts, detailed in Table 1 .

During the course implementation, students used ChatGPT and Midjourney to create and showcase their works, as shown in Fig. 3.

Figure 3. Course implementation process.

Questionnaire design

The survey instrument is divided into two parts. The first part contains demographic questions covering gender and grade; the second part uses 38 items to measure the ten constructs of the research model. The ten constructs are classified as external and internal variables.

External variables (AIIM, AIRD, AICF, AIAX, UI, C, LINT).

Internal variables (PU, PEOU, ATT). Each construct is measured by multiple items. To obtain participants’ responses and quantify the constructs, a five-point Likert scale was used to score the questionnaire responses. The Likert scale offers five answer options, ranging from “strongly disagree” (mapped to 1) to “strongly agree” (mapped to 5).
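As a concrete illustration of this coding step, a minimal sketch mapping verbal responses to the 1–5 scores is given below; note that the three middle anchor labels are assumed, since the text names only the two endpoints.

```python
import pandas as pd

# Assumed anchor labels; the questionnaire text specifies only the two endpoints.
LIKERT_SCORES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def encode_likert(responses: pd.DataFrame) -> pd.DataFrame:
    """Map verbal Likert responses to the numeric scores used in the analysis."""
    return responses.apply(lambda col: col.str.strip().str.lower().map(LIKERT_SCORES))
```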

This instrument was developed after reviewing research on TAM models, AI learning cognitive factors, and HCI factors. All items in the survey questionnaire were proofread by translation experts and translated into Chinese. The specific items and their sources are shown in Table 2.

Questionnaire collection and demography

Following course completion, students anonymously and voluntarily completed a questionnaire survey. The questionnaire was administered via the Chinese online platform Questionnaire Star, yielding 210 responses. After screening, 13 responses were deemed invalid, leaving 197 valid ones. Demographic variables underwent frequency analysis using SPSS 26, revealing a distribution of 86 boys (43.7%) and 111 girls (56.3%). Among them, 65 students were in seventh grade (33%), 70 in eighth grade (35.5%), and 62 in ninth grade (31.5%). Demographic information is summarized in Table 3.

All methods were performed in accordance with relevant ethical guidelines and regulations. The experimental protocols were approved by the Academic Committee of Guangdong University of Technology, and the experiments were conducted with the informed consent of all subjects and their legal guardians.

Data analysis methods

This study used SPSS 26 and SMART PLS 4.0 for data analysis. The analysis comprised two steps: reliability and validity analysis, followed by hypothesis testing. First, internal consistency reliability was assessed using Cronbach’s α (CA), computed in SPSS 26, and composite reliability (CR), computed in SMART PLS 4.0. High CA and CR values indicate high reliability of the instrument; it is recommended that CA and CR values exceed 0.70. To evaluate the convergent validity of the constructs, we used CR values and average variance extracted (AVE) values, and we verified discriminant validity by analyzing the square root of the AVE. If each construct’s square-root AVE is higher than its correlations with the other constructs, the sufficiency of discriminant validity is demonstrated. Second, after obtaining satisfactory results in the first step, we used the structural model to test the hypotheses, analyzing the significance and magnitude of each path coefficient. Model fit indices were also evaluated to determine the adequacy of the proposed research model.
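The authors performed these computations in SPSS 26 and SMART PLS 4.0. For readers reproducing the quantities elsewhere, a minimal sketch of the three reliability and convergent validity statistics described above (function and variable names are ours):

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha; rows are respondents, columns are one construct's items."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

def composite_reliability(loadings: np.ndarray) -> float:
    """CR from standardized outer loadings of one construct's items."""
    squared_sum = loadings.sum() ** 2
    return squared_sum / (squared_sum + (1 - loadings ** 2).sum())

def ave(loadings: np.ndarray) -> float:
    """Average variance extracted: mean of the squared standardized loadings."""
    return float((loadings ** 2).mean())

# Rules of thumb used in the paper: CA and CR > 0.70; AVE is commonly required > 0.50.
```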

Experimental bias

To mitigate the potential effects of common method bias, several strategies were employed throughout the data collection and analysis processes. First, we ensured anonymity and confidentiality in the survey responses to encourage participants to provide honest and accurate answers without fear of judgment or repercussion. Additionally, we employed procedural remedies such as counterbalancing the order of questionnaire items and using reverse-coded items to minimize response bias. Furthermore, we conducted Harman’s single-factor test to assess the extent of common method bias in our data. The results indicated that no single factor accounted for the majority of the variance, suggesting that common method bias was not a significant concern in our study. However, we acknowledge that these measures may not completely eliminate common method bias and have included this limitation in our discussion.
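The text does not name the tool used for Harman’s test. Under the common operationalization (extract a single unrotated factor from all items and flag bias if it explains the majority of the variance), a minimal sketch using the factor_analyzer package (our tooling choice, not necessarily the authors’) might look like this:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

def harman_single_factor_share(items: pd.DataFrame) -> float:
    """Proportion of total variance explained by one unrotated common factor."""
    fa = FactorAnalyzer(n_factors=1, rotation=None)
    fa.fit(items)
    _, proportion, _ = fa.get_factor_variance()
    return float(proportion[0])

# Common method bias is typically flagged when this share exceeds 0.50.
```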

Evaluation of measurement tools

Results of reliability and validity testing.

To ensure the accuracy of the measurement results, reliability analysis was conducted on the valid questionnaire data before further analysis. Table 4 shows the reliability analysis results from SPSS 26; the Cronbach’s α values all exceed 0.8, meeting the standard. The variable measurements can therefore be considered reliable.

Second, KMO and Bartlett tests were conducted to assess the validity of the questionnaire. The results are shown in Table 5.

As Table 5 shows, the KMO value is 0.880; a KMO value greater than 0.8 indicates that the research data are well suited to information extraction through factor analysis.
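As an illustration of how these two tests can be reproduced outside SPSS, a minimal sketch using the factor_analyzer package (our tooling choice, not the authors’):

```python
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

def factorability_checks(items: pd.DataFrame) -> dict:
    """KMO measure of sampling adequacy and Bartlett's test of sphericity."""
    chi_square, p_value = calculate_bartlett_sphericity(items)
    _, kmo_total = calculate_kmo(items)
    return {"kmo": kmo_total, "bartlett_chi2": chi_square, "bartlett_p": p_value}

# KMO > 0.8 and a significant Bartlett test (p < 0.05) support factorability.
```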

Discriminant validity

The results of the discriminant validity test are shown in Table 6. The square root of the AVE for each variable (the values on the diagonal) is greater than the correlations between that variable and the other variables, so the data are considered to have good discriminant validity.

According to Table 7, the HTMT values are all below 0.85, further indicating that the data have good discriminant validity.
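Both criteria reported in Tables 6 and 7 can be computed directly from the item correlation matrix. A minimal sketch, with the item-to-construct groupings supplied by the analyst (all names are ours):

```python
import numpy as np
import pandas as pd

def htmt(items: pd.DataFrame, block_a: list, block_b: list) -> float:
    """Heterotrait-monotrait ratio between two constructs' item blocks."""
    corr = items.corr().abs()
    hetero = corr.loc[block_a, block_b].to_numpy().mean()
    iu_a = np.triu_indices(len(block_a), k=1)
    iu_b = np.triu_indices(len(block_b), k=1)
    mono_a = corr.loc[block_a, block_a].to_numpy()[iu_a].mean()
    mono_b = corr.loc[block_b, block_b].to_numpy()[iu_b].mean()
    return hetero / np.sqrt(mono_a * mono_b)

def fornell_larcker_ok(sqrt_ave: pd.Series, construct_corr: pd.DataFrame) -> bool:
    """Each construct's sqrt(AVE) must exceed its correlations with all others."""
    off_diag = construct_corr.where(~np.eye(len(construct_corr), dtype=bool)).abs()
    return bool((sqrt_ave > off_diag.max()).all())

# Thresholds used in the paper: HTMT < 0.85; diagonal sqrt(AVE) > inter-construct correlations.
```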

Structural equation model evaluation

Model fit indices.

The initial step in hypothesis testing involves assessing the structural model. Our model adheres to established fit standards, with all model fit index values deemed acceptable, including VIF < 5 and f² > 0.02. Notably, all VIF values fall below 5, signifying the absence of significant collinearity concerns within the dataset.
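As an illustration of the collinearity check, a minimal sketch computing VIF for a set of construct scores with statsmodels (the authors obtained these diagnostics in SMART PLS):

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

def vif_table(predictors: pd.DataFrame) -> pd.Series:
    """VIF per predictor; values below 5 suggest no serious collinearity."""
    X = add_constant(predictors)
    return pd.Series(
        {col: variance_inflation_factor(X.values, i)
         for i, col in enumerate(X.columns) if col != "const"}
    )
```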

Hypothesis testing

The sample size of this study is an important factor in the model analysis. After strict screening, 197 valid questionnaires were used for the analysis, which meets the sample size required for SMART PLS analysis. This study calculated path coefficients and p-values. As shown in Fig. 4, all hypothesized paths are supported at the 0.05 significance level.
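The authors estimated the model with SMART PLS (a partial least squares approach). Purely as an illustrative alternative, the structural relations hypothesized in H1–H10 could be written out and estimated with the covariance-based semopy package on per-construct composite scores; this is a sketch of the path structure, not the authors’ procedure:

```python
import pandas as pd
from semopy import Model

# Structural part of the hypothesized model (H1-H10); one column per construct score.
MODEL_DESC = """
PEOU ~ AIIM + AIRD + AICF + AIAX + UI + C + LINT
PU   ~ AIIM + AIRD + AICF + AIAX + C + LINT + PEOU
ATT  ~ PU + PEOU
"""

def estimate_paths(scores: pd.DataFrame) -> pd.DataFrame:
    model = Model(MODEL_DESC)
    model.fit(scores)
    return model.inspect()  # path coefficients with standard errors and p-values
```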

Figure 4. Results of hypothesis testing.

The path coefficients of the structural equation model are shown in Table 8 .

In this study, students’ cognitive factors such as AIIM, AIRD and AICF influence positively on PU. Hypothesis all have been tested as H1a (AIIM → PU, β = 0.211); H1b (AIIM → PEOU, β = 0.166), H2a (AIRD → PU, β = 0.152), H2b (AIRD → PEOU, β = 0.136), H3a (AICF → PU, β = 0.158), H3b (AICF → PEOU, β = 0.159), and p < 0.05. The indicates that AIIM, AIRD, and AICF have a positive impact on attitudes among cognitive factors in learning AIED through the use of GenAI. However, H4a (AIAX → PU, β =—0.130), H4b (AIAX → PEOU, β =—0.162), and p < 0.05, indicating that student AIAX has a negative impact on students’ ATT. Among the HCI factors, C and LINT have a positive impact on PU and PEOU, while UI has a positive impact on PEOU. After verification, these assumptions are valid and valid. While, H5 (UI → PEOU, β = 0.173), H6a (C → PU, β = 0.168), H6b (C → PEOU, β = 0.184), H7a (LINT → PU, β = 0.145), H7b (LINT → PEOU, β = 0.203), and p < 0.05. PU and PEOU have a positive impact on ATT, with path coefficients ranging from 0.17 to 0.23 with p < 0.05.

To sum up, PU, PEOU, AIIM, AIRD, AICF, UI, C, and LINT are important factors that positively affect students’ attitudes towards learning AIED through the use of GenAI, while AIAX has a negative impact on ATT.

The impact of AI learning cognitive factors on PU and PEOU

The results of the study show, first, that AIIM had a positive impact on PU, where it was the second most influential factor (0.211) in student acceptance, and that it also positively affected PEOU.

Although AIRD has a positive effect on PU (0.152) and PEOU (0.136), its positive impact on PEOU is the smallest. This is consistent with the research of Chiu et al. 24, which previously held that the level of AIRD can measure the understanding of AI knowledge and technology.

AICF is positively correlated with PU (0.158) and PEOU (0.159), which is consistent with the results for graduate students using mobile devices for learning (Nikou et al.). The greater students’ confidence in learning AI, the better they can accept AI courses and maintain sustainable learning behavior.

AIAX has a negative impact on both PU (−0.130) and PEOU (−0.162). This is consistent with the results of Baek and Kim’s study on students’ learning behavior using ChatGPT 67, and also with the results of Nikou et al.’s study on mobile device use anxiety.

Positive impact of HCI factors on PU and PEOU

Among the HCI factors, UI is positively correlated with PEOU, with a path coefficient of 0.173. UI is considered an important influencing factor in students’ acceptance of online courses (Al-Sayid and Kirkil) 29 and in mobile learning acceptance 77. When GenAI is used to teach courses, a more user-friendly interface makes students more likely to accept and choose the course.

In the results, C was found to be significantly positively correlated with PU (0.168) and PEOU (0.184). When the GenAI tool is used for learning, LINT has a significant impact on PU (0.145) and PEOU (0.203). Interacting with the GenAI system through program elements such as the menu bar significantly affects students’ acceptance.

Relationship between PU, PEOU, and ATT

We reached the following conclusions: PU has a positive impact on ATT (0.208), PEOU has a positive impact on PU (0.177), and PEOU has a positive impact on ATT (0.228). Previous research in the field of education has focused on the use of mobile devices and online courses; the results of this study indicate that those factors also apply to the acceptance of AI courses taught with GenAI.

The results indicate that AIIM, AIRD, AICF, AIAX, UI, C, and LINT all influence students’ attitudes towards learning AI. Among students’ cognitive factors, AIIM has the greatest effect on PU, and among the HCI factors, LINT has the greatest effect on PEOU.

The discussion section of this study offers an in-depth analysis of K-12 students’ attitudes towards AI courses facilitated by generative AI (GenAI) tools. The examination is structured into three key segments, each focusing on distinct aspects of the research. Initially, the study investigates how cognitive factors related to AI learning influence perceived usefulness (PU) and perceived ease of use (PEOU). Subsequently, it explores the impact of human–computer interaction (HCI) factors on PU and PEOU. Finally, it delves into the interplay between PU, PEOU, and attitude towards use (ATT). Building upon established theoretical frameworks, such as the technology acceptance model (TAM), this study introduces a novel conceptual model tailored to assess K-12 students’ attitudes towards using GenAI tools in AI in education (AIED) courses. By incorporating both cognitive learning factors and HCI elements, the study extends the existing literature, offering a comprehensive understanding of the complex dynamics influencing students’ attitudes towards AI education.

The empirical analysis conducted in this study validates the proposed model and hypotheses, thereby contributing to theoretical advancements in the field of AI4K12 education. However, it is crucial to contextualize these findings within the broader landscape of educational research. Previous studies, such as those by Almaiah and Almulhem (2018) 10 , Almaiah, Al-Khasawneh, and Althunibat (2020) 11 , and Almaiah and Al Mulhem (2020) 12 , have highlighted the critical challenges and success factors influencing the implementation and usage of e-learning systems. Drawing parallels between these studies and the current research can provide valuable insights into the unique considerations and obstacles associated with integrating innovative technologies, like GenAI, into educational settings.

From a practical perspective, the findings of this study underscore the potential of GenAI tools to enhance AIED methodologies. However, it is essential to recognize that variations in students’ cognitive learning processes may impact their attitudes and efficacy towards learning. By leveraging cutting-edge technologies and implementing pedagogical strategies informed by self-determination theory, educators and system designers can create inclusive and engaging learning experiences that promote sustained student engagement and mastery of AI knowledge.

In terms of theoretical implications, the findings of this study contribute significantly to the existing body of knowledge in the field of artificial intelligence in education (AIED). By expanding upon Davis’ technology acceptance model (TAM) with additional cognitive and human–computer interaction (HCI) factors, we have not only provided a more nuanced understanding of students’ attitudes towards AI courses facilitated by generative AI (GenAI) tools but also enriched the theoretical framework guiding research in this domain. This augmentation of the TAM model with external variables derived from the literature review and previous research findings offers a more comprehensive perspective on the determinants of students’ acceptance of AI4K12 courses. Furthermore, the empirical validation of this extended model through structural equation modeling adds robustness to its theoretical underpinnings and lays the groundwork for future research endeavors in the realm of AIED.

In conclusion, this study offers actionable insights for AIED policymakers, system developers, educators, and students, aiming to foster a superior AI learning experience for K-12 students. By addressing the complex interplay between cognitive factors, HCI elements, and attitudes towards AI education, this research contributes to the ongoing discourse surrounding the integration of GenAI tools in educational settings.

The advent of artificial intelligence (AI) represents both significant opportunities and challenges for society, as intelligent algorithms and robots increasingly assume roles across various sectors. As AI becomes more integrated into daily life, it becomes crucial for individuals to adapt to coexist with these technologies. This underscores the importance of early integration of AI in education (AIED) into student learning, necessitating pioneering research in AIED-centric pedagogy for the K-12 demographic.

This study delves into K-12 students’ perceptions of learning AI-related content through generative AI (GenAI) tools. Through an extensive literature review, the study identifies external factors shaping students’ attitudes towards learning and applies the technology acceptance model (TAM), integrating it with theories of cognitive learning and human–computer interaction (HCI). With the participation of 210 Chinese K-12 students, this work stands as a significant contribution to the field. The analysis validates ten hypotheses, demonstrating the substantial impact of cognitive and behavioral learning factors, alongside HCI considerations, on students’ attitudes towards AI education. These findings offer crucial insights for AIED policymakers and developers, informing the creation of diverse and engaging AI4K12 curricula aimed at sustaining students’ interest in AI and promoting ongoing engagement and acquisition of intricate AI knowledge. However, this study has its limitations. The predominantly Chinese sample may not fully represent the global student body, and the study does not comprehensively cover all K-12 age groups. Future research should encompass a broader spectrum of K-12 grade levels, span multiple countries and regions, and explore gender and grade-level variations among students. Additionally, reliance on a single experimental course approach and online quantitative data collection may not fully capture the nuances of students’ attitudes. Future investigations should integrate qualitative methodologies, such as semi-structured interviews and group discussions, for deeper insights.

The results of this study underscore the importance of considering both cognitive learning factors and HCI elements in designing and implementing AI courses in K-12 education. Our findings suggest that enhancing students' perceptions of usefulness and ease of use, while addressing potential anxiety associated with AI, is crucial for fostering positive attitudes towards AI education. By integrating GenAI tools into the curriculum, educators can create more engaging and effective learning experiences for students, thereby promoting the development of essential AI literacy skills. Moreover, our study sheds light on the complex interplay between cognitive and HCI factors in shaping students' attitudes towards AI education, highlighting the need for a holistic approach to curriculum design. Furthermore, recent studies have contributed to the development of scales aimed at measuring artificial intelligence literacy and acceptance, providing valuable tools for researchers and educators to assess students' readiness and attitudes towards AI education 20 , 21 , 22 .

In conclusion, this research offers a thorough examination of K-12 students’ attitudes towards AI education using GenAI tools, focusing on learning cognition and HCI factors. Future endeavors should explore additional factors affecting AI learning acceptance, including various aspects of the learning environment, and examine students’ AI learning experiences from diverse cognitive viewpoints.

In addition to the research findings and limitations discussed above, it is essential to consider the practical and theoretical implications of this study. Practically, the findings offer valuable insights for educators, policymakers, and developers involved in AI education for K-12 students. By identifying the cognitive and HCI factors that influence students’ attitudes towards AI education using GenAI tools, this research provides a roadmap for designing more effective and engaging AI4K12 curricula. Educators can leverage these insights to tailor their teaching approaches and course designs to better meet students’ needs and preferences, ultimately fostering a more positive learning experience.

Moreover, policymakers can use this research to inform decisions regarding the integration of AI education into school curricula, ensuring that students are adequately prepared for the future workforce. From a theoretical perspective, this study contributes to the existing body of literature on AI education and technology acceptance by extending the TAM framework to include cognitive and HCI factors specific to GenAI tools. By validating the proposed model and hypotheses, this research advances our understanding of the complex interplay between individual perceptions, cognitive processes, and technological interfaces in the context of AI education. Furthermore, the inclusion of HCI factors underscores the importance of considering user experience and interface design in educational technology development, highlighting the need for a more holistic approach to AI education research. Overall, the practical and theoretical implications of this study underscore its significance and provide a foundation for future research in the field of AI education.

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.

Abbreviations

AI: Artificial intelligence

AIED: Artificial intelligence education

AI4K12: Artificial intelligence education for K-12

GenAI: Generative artificial intelligence

TAM: Technology acceptance model

PU: Perceived usefulness

PEOU: Perceived ease of use

ATT: Attitudes towards the use of artificial intelligence

BI: Behavioral intention

SEM: Structural equation model

AIIM: Students’ intrinsic motivation to learn artificial intelligence

AIRD: Artificial intelligence readiness

AICF: Artificial intelligence confidence

AIAX: Artificial intelligence anxiety

HCI: Human–computer interaction

UI: User interface

LINT: Learner-interface interactivity

CT: Computational thinking

DT: Design thinking

Russell, S. J. & Norvig, P. Artificial Intelligence: A Modern Approach (2010).

OECD. Trustworthy Artificial Intelligence (AI) in Education, Promises and Challenges. https://www.oecd.org/education/trustworthy-artificial-intelligence-ai-in-education-a6c90fa9-en.html . Accessed 10 Oct 2023. (2020).

Touretzky, D., Gardner-Mccune, C., Breazeal, C., Martin, F. & Seehorn, D. A year in K-12 AI education. AI. Mag. 40 , 88–90 (2019).

Touretzky D, Gardner-McCune C, Martin F, Seehorn D. Envisioning AI for K-12, What should every child know about AI? In Proceedings of the AAAI conference on artificial intelligence, Honolulu, 17 July 2019; pp. 9795-9799. (2019).

Ibe NA, Howsmon R, Penney L, Granor N, DeLyser LA, Wang K. Reflections of a diversity, equity, and inclusion working group based on data from a national CS education program. In Proceedings of the 49th ACM Technical Symposium on Computer Science Education . New York, NY, 21 February 2018; ACM, New York, NY, USA; pp. 711–716. (2018).

Oermann, E. K. & Kondziolka, D. On chatbots and generative artificial intelligence. Neurosurgery 92 , 665–666 (2022).

Yu, H. & Guo, Y. Generative artificial intelligence empowers educational reform, current status, issues, and prospects. Front. Educ. https://doi.org/10.3389/feduc.2023.1183162 (2023).

Stokel-Walker, C. AI bot ChatGPT writes smart essays-should academics worry. Nature https://doi.org/10.1038/d41586-022-04397-7 (2022).

Cooper, G. Examining science education in ChatGPT: An exploratory study of generative artificial intelligence. J. Sci. Educ. Technol. 32 , 444–452 (2023).

Almaiah, M. A. & Almulhem, A. A conceptual framework for determining the success factors of e-learning system implementation using Delphi technique. J. Theor. Appl. Inf. Technol. 96 (17), 5962–5976 (2018).

Almaiah, M. A., Al-Khasawneh, A. & Althunibat, A. Exploring the critical challenges and factors influencing the E-learning system usage during COVID-19 pandemic. Educ. Inf. Technol. 25 , 5261–5280 (2020).

Almaiah, M. & Al Mulhem, A. Thematic analysis for classifying the main challenges and factors influencing the successful implementation of e-learning system using NVivo. Int. J. Adv. Trends Comput. Sci. Eng. 9 (1), 142–152 (2020).

Long D, Magerko B. What is AI Literacy? Competencies and Design Considerations. In Proceedings of the 2020 CHI conference on human factors in computing systems . New York, NY, USA, 23 April 2020, 1–16; ACM, New York, NY, USA; pp. 1–6. (2020).

Zhou, X., Van Brummelen, J. & Lin, P. Designing AI learning experiences for K-12: Emerging works, future opportunities and a design framework. Arxiv https://doi.org/10.48550/arXiv.2009.10228 (2020).

Lin, P.; Van Brummelen, J. Engaging teachers to co-design integrated AI curriculum for K-12 classrooms. In Proceedings of the 2021 CHI conference on human factors in computing systems . Yokohama, Japan, 07 May 2021; ACM, New York, NY, USA; pp. 1–12. (2021).

Sabuncuoglu A. Designing one year curriculum to teach artificial intelligence for middle school. In Proceedings of the 2020 ACM conference on innovation and technology in computer science education . New York, NY, USA, 15 June 2020; ACM, New York, NY, USA; pp. 96-102. (2020).

Schleiss, J., Laupichler, M. C., Raupach, T. & Stober, S. AI course design planning framework, developing domain-specific ai education courses. Educ. Sci. 13 , 954 (2023).

Ayanwale, M. A., Sanusi, I. T., Adelana, O. P., Aruleba, K. D. & Oyelere, S. S. Teachers’ readiness and intention to teach artificial intelligence in schools. Comput. Educ. Artif. Intell. 3 , 100099 (2022).

Almaiah, M. A. et al. Measuring institutions’ adoption of artificial intelligence applications in online learning environments: Integrating the innovation diffusion theory with technology adoption rate. Electronics 11 (20), 3291 (2022).

Yilmaz, R. & Yilmaz, F. G. K. Augmented intelligence in programming learning: Examining student views on the use of ChatGPT for programming learning. Comput. Hum. Behav. Artif. Hum. 1 , 100005 (2023).

Yilmaz, F. G. K., Yilmaz, R. & Ceylan, M. Generative artificial intelligence acceptance scale: A validity and reliability study. Int. J. Hum. Comput. Interact. https://doi.org/10.1080/10447318.2023.2288730 (2023).

Karaoğlan Yılmaz, F. G. & Yılmaz, R. Adaptation of the artificial intelligence literacy scale into Turkish [Yapay Zekâ Okuryazarlığı Ölçeğinin Türkçeye Uyarlanması]. Bilgi ve İletişim Teknolojileri Dergisi 5(2), 172–190 (2023).

Terblanche, N., Molyn, J., Williams, K. & Maritz, J. Performance matters, students’ perceptions of artificial intelligence coach adoption factors. Coach. Int. J. Theor. 16 , 100–114 (2023).

Chai, J. L. et al. Factors influencing students’ behavioral intention to continue artificial intelligence learning. In International Symposium on Educational Technology (ISET) 147–150 (IEEE, 2020).

Zhou X, Van Brummelen J, Lin P. Designing AI learning experiences for K-12: Emerging works, future opportunities and a design framework. arXiv preprint arXiv:2009.10228. https://ar5iv.labs.arxiv.org/html/2009.10228 . (2020).

Wang, N. & Lester, J. K-12 education in the age of AI: A call to action for K-12 AI literacy. Int. J. Artif. Intell. Educ. 33 , 228–232 (2023).

Chiu, T. K. et al. Creation and evaluation of a pretertiary artificial intelligence (AI) curriculum. IEEE Trans. Educ. 65 , 30–39 (2021).

Lv, Z. Generative artificial intelligence in the metaverse era. Cogn. Robot. 3 , 208–217 (2023).

Al-Sayid, F. & Kirkil, G. Exploring non-linear relationships between perceived interactivity or interface design and acceptance of collaborative web-based learning. Educ. Inf. Technol. 28 , 11819–11866 (2023).

Chen, X., Xie, H., Zou, D. & Hwang, G. J. Application and theory gaps during the rise of artificial intelligence in education. Comput. Educ. Artif. Intell. 1 , 100002 (2020).

Shishakly, R., Almaiah, M., Lutfi, A. & Alrawad, M. The influence of using smart technologies for sustainable development in higher education institutions. Int. J. Data Netw. Sci. 8 (1), 77–90 (2024).

Holmes, W., Bialik, M. & Fadel, C. Artificial Intelligence in Education (Globethics Publications, 2023).

Ng, D. T. K., Luo, W., Chan, H. M. Y. & Chu, S. K. W. Using digital story writing as a pedagogy to develop AI literacy among primary students. Comput. Educ. Artif. Intell. 3 , 100054 (2022).

Dwivedi, Y. K. et al. “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int. J. Inf. Manag. 71 , 102642 (2023).

Chiu, T. K., Moorhouse, B. L., Chai, J. L. & Ismailov, M. Teacher support and student motivation to learn with artificial intelligence (AI) based chatbot. Interact. Learn. Environ. https://doi.org/10.1080/10494820.2023.2172044 (2023).

Chiu, T. K. The impact of generative AI (GenAI) on practices, policies and research direction in education: A case of ChatGPT and midjourney. Interact. Learn. Environ. https://doi.org/10.1080/10494820.2023.2253861 (2023).

Wu, Y., Yu, N., Li, Z., Backes, M. & Zhang, Y. Membership inference attacks against text-to-image generation models. Arxiv https://doi.org/10.48550/arXiv.2210.00968 (2022).

Watson, A. D. & Watson, G. H. Transitioning STEM to STEAM: Reformation of engineering education. J. Qual. Part. 36 , 1–5 (2013).

Cohn, N. Visual narrative comprehension: Universal or not. Psychon. B. Rev. 27 , 266–285 (2020).

Kim, K. H. & Kim, H. G. A study on how to create interactive children’s books using ChatGPT and midjourney. Techart J. Art Imaging Sci. 10 , 39–46 (2023).

Chocarro, R., Cortiñas, M. & Marcos-Matás, G. Teachers’ attitudes towards chatbots in education, a technology acceptance model approach considering the effect of social language, bot proactiveness, and users’ characteristics. Educ. Stud. 49 , 295–313 (2023).

Roy, R., Babakerkhell, M. D., Mukherjee, S., Pal, D. & Funilkul, S. Evaluating the intention for the adoption of artificial intelligence-based robots in the university to educate the students. IEEE Access 10 , 125666–125678 (2022).

Davis, F. D. A Technology Acceptance Model for Empirically Testing New End-User Information Systems: Theory and Results (Massachusetts Institute of Technology, 1985).

Davis, F. D. Perceived usefulness, perceived ease of use, and user acceptance of information technology. Mis. Quart. 13 , 319–340 (1989).

Al-Adwan, A. S. et al. Extending the technology acceptance model (TAM) to Predict University students’ intentions to use metaverse-based learning platforms. Educ. Inf. Technol. 28 (11), 15381–15413 (2023).

Almaiah, M. A. et al. Employing the TAM model to investigate the readiness of M-learning system usage using SEM technique. Electronics 11 (8), 1259 (2022).

Almaiah, M. A. et al. Smart mobile learning success model for higher educational institutions in the context of the COVID-19 pandemic. Electronics 11 (8), 1278 (2022).

Gursoy, D., Chi, O. H., Lu, L. & Nunkoo, R. Consumers acceptance of artificially intelligent (AI) device use in service delivery. Int. J. Inform. Manage. 49 , 157–169 (2019).

Kelly, S., Kaye, S. A. & Oviedo-Trespalacios, O. What factors contribute to acceptance of artificial intelligence? A systematic review. . Telemat. Inform. 77 , 101925 (2022).

Garrison, D. R., Anderson, T. & Archer, W. Critical thinking, cognitive presence, and computer conferencing in distance education. Am. J. Distance Educ. 15 , 7–23 (2001).

Chai, J. L., Wang, X. & Xu, C. An extended theory of planned behavior for the modelling of Chinese secondary school students’ intention to learn artificial intelligence. Mathematics 8 , 2089 (2020).

Lan, Y. J., Botha, A., Shang, J. & Jong, M. S. Y. Guest editorial: Technology enhanced contextual game-based language learning. J. Educ. Technol. Soc. 21 , 86–89 (2018).

Ryan, R. M. & Deci, E. L. Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemp. Educ. Psychol. 25 , 54–67 (2000).

Froiland, J. M. & Worrell, F. C. Intrinsic motivation, learning goals, engagement, and achievement in a diverse high school. Psychol. Sch. 53 , 321–336 (2016).

Fagan, M. H., Neill, S. & Wooldridge, B. R. Exploring the intention to use computers: An empirical investigation of the role of intrinsic motivation, extrinsic motivation, and perceived ease of use. J. Comput. Inform. Syst. 48 , 31–37 (2008).

Martín-Núñez, J. L., Ar, A. Y., Fernández, R. P., Abbas, A. & Radovanović, D. Does intrinsic motivation mediate perceived artificial intelligence (AI) learning and computational thinking of students during the COVID-19 pandemic. Comput. Educ. Artif. Intell. 4 , 100128 (2023).

Parasuraman, A. & Colby, C. L. An updated and streamlined technology readiness index: TRI 2.0. J. Serv. Res. 18 , 59–74 (2015).

Dai, Y. et al. Promoting students’ well-being by developing their readiness for the artificial intelligence age. Sustain. Sci. 12 , 6597 (2020).

Ajzen, I. The theory of planned behavior. Organ. Behav. Hum. Decis. 50 , 179–211 (1991).

Lin, P. Y. et al. Modeling the structural relationship among primary students’ motivation to learn artificial intelligence. Comput. Educ. Artif. Intell. 2 , 100006 (2021).

Chai, J. L. et al. Perceptions of and behavioral intentions towards learning artificial intelligence in primary school students. Educ. Technol. Soc. 24 , 89–101 (2021).

Owolabi, K. et al. Awareness and readiness of Nigerian polytechnic students towards adopting artificial intelligence in libraries. J. Inf. Knowl. 59 , 15–24 (2022).

Nikou, S. A. & Economides, A. A. Mobile-based assessment: Investigating the factors that influence behavioral intention to use. Comput. Educ. 109 , 56–73 (2017).

Ha, J. G., Page, T. & Thorsteinsson, G. A study on technophobia and mobile device design. Int. J. Contents 7 , 17–25 (2011).

Johnson, D. G. & Verdicchio, M. AI anxiety. J. Assoc. Inf. Sci. Tech. 68 , 2267–2270 (2017).

Wang, Y. Y. & Wang, Y. S. Development and validation of an artificial intelligence anxiety scale: An initial application in predicting motivated learning behavior. Interact. Learn. Envir. 30 , 619–634 (2022).

Baek, T. H. & Kim, M. Is ChatGPT scary good? How user motivations affect creepiness and trust in generative artificial intelligence. Telemat. Inform. 83 , 102030 (2023).

Massey, B. L. & Levy, M. R. Interactivity, online journalism, and English-language web newspapers in Asia. J. Mass. Commun. Q. 76 , 138–151 (1999).

Mcmillan, S. J. The researchers and the concept: Moving beyond a blind examination of interactivity. J. Interact. Advert. 5 , 1–4 (2005).

Cho, C. H. Effects of banner clicking and attitude toward the linked target ads on brand-attitude and purchase-intention changes. J. Glob. Acad. Market. Sci. 14 , 1–16 (2004).

Almaiah, M. A. et al. Examining the impact of artificial intelligence and social and computer anxiety in e-learning settings: Students’ perceptions at the university level. Electronics 11 (22), 3662 (2022).

Head, A. J. Design Wise: A Guide for Evaluating the Interface Design of Information Resources 19–99 (Information Today, Inc., 1999).

Cliff, M., Dillon, A. & Richardson, J. User Centered Design of Hypertext and Hypermedia for Education (Macmillan, 1996).

Wang, S. K. & Yang, C. The interface design and the usability testing of a fossilization web-based learning environment. J. Sci. Educ. Technol. 14 , 305–313 (2005).

Lohr, L. L., Falvo, D. A., Hunt, E. & Johnson, B. Improving the usability of distance learning through template modification. In Flexible Learning in an Information Society (ed. Khan, B. H.) 186–197 (IGI Global, 2007).

Chapter   Google Scholar  

Liu, I. F., Chen, M. C., Sun, Y. S., Wible, D. & Kuo, C. H. Extending the TAM model to explore the factors that affect intention to use an online learning community. Comput. Educ. 54 , 600–610 (2010).

Almaiah, M. A., Jalil, M. A. & Man, M. Extending the TAM to examine the effects of quality features on mobile learning acceptance. J. Comput. Educ. 3 , 453–485 (2016).

Shee, D. Y. & Wang, Y. S. Multi-criteria evaluation of the web-based e-learning system, a methodology based on learner satisfaction and its applications. Comput. Educ. 50 , 894–905 (2008).

Terzis, V. & Economides, A. A. The acceptance and use of computer based assessment. Comput. Educ. 56 , 1032–1044 (2011).

Lee, B. C., Yoon, J. O. & Lee, I. Learners’ acceptance of e-learning in South Korea, theories and results. Comput. Educ. 53 , 1320–1329 (2009).

Isaias, P. & Issa, T. Sustainable design, HCI, usability and environmental concerns (Springer-Verlag, 2015).

Althunibat, A., Almaiah, M. A. & Altarawneh, F. Examining the factors influencing the mobile learning applications usage in higher education during the COVID-19 pandemic. Electronics 10 (21), 2676 (2021).

Al-Adwan, A. S. et al. Unlocking future learning: Exploring higher education students’ intention to adopt meta-education. Heliyon 10 (9), e29544 (2024).

Lee, M. C. Explaining and predicting users’ continuance intention toward e-learning, an extension of the expectation-confirmation model. Comput. Educ. 54 , 506–516 (2010).

Duncan, T. G. & Mckeachie, W. J. The making of the motivated strategies for learning questionnaire. Educ. Psychol. 40 , 117–128 (2005).

Chou, C. Interactivity and interactive functions in web-based learning systems, a technical framework for designers. Brit. J. Educ. Technol. 34 , 265–279 (2003).


Funding

This research was funded by the Chinese Ministry of Education Collaborative Education Project between Universities and Firms (grant number 220605242172594), the Guangdong University of Technology Online Course Construction Project (grant number 211210102), and Kunsan National University's Industry-Academia Cooperation Group (IACG) (grant number 2023H052).

Author information

Authors and Affiliations

Department of Computer Information Engineering, Kunsan National University, Gunsan, 54150, Republic of Korea

Yantong Liu

Department of Smart Experience Design, Kookmin University, Seoul, 02707, Republic of Korea

Wei Li & Xiaolin Zhang

Department of Educational Psychology, University of Georgia, Athens, GA, 30605, USA

Department of Poultry Science, University of Georgia, Athens, GA, 30605, USA

Department of International Culture Education, Chodang University, Muan, 58530, Republic of Korea

College of Art and Design, Guangdong University of Technology, Guangzhou, 510006, China

Xiaolin Zhang


Contributions

Conceptualization, W.L. and X.Z.; methodology, W.L.; software, W.L.; validation, W.L. and J.L.; formal analysis, W.L.; investigation, W.L. and Y.L.; resources, X.Y.; data curation, J.L. and D.L.; revision and funding, Y.L. and X.Z. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Yantong Liu .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article

Li, W., Zhang, X., Li, J. et al. An explanatory study of factors influencing engagement in AI education at the K-12 Level: an extension of the classic TAM model. Sci Rep 14 , 13922 (2024). https://doi.org/10.1038/s41598-024-64363-3

Download citation

Received : 20 February 2024

Accepted : 07 June 2024

Published : 17 June 2024

DOI : https://doi.org/10.1038/s41598-024-64363-3


Keywords

  • Cognitive factors of learning
  • HCI factors




Survival trend and outcome prediction for pediatric Hodgkin and non-Hodgkin lymphomas based on machine learning

  • Open access
  • Published: 18 June 2024
  • Volume 24 , article number  132 , ( 2024 )


  • Yue Zheng 1 , 2   na1 ,
  • Chunlan Zhang 3   na1 ,
  • Kai Kang 1 , 2 ,
  • Ren Luo 1 , 2 ,
  • Ailin Zhao 3 &
  • Yijun Wu 1 , 2  

Pediatric Hodgkin and non-Hodgkin lymphomas differ from adult cases in biology and management, yet there is a lack of survival analysis tailored to pediatric lymphoma. We analyzed lymphoma data from 1975 to 2018, comparing survival trends between 7,871 pediatric and 226,211 adult patients, identified key risk factors for pediatric lymphoma survival, developed a predictive nomogram, and utilized machine learning to predict long-term lymphoma-specific mortality risk. Between 1975 and 2018, we observed substantial increases in 1-year (19.3%), 5-year (41.9%), and 10-year (48.8%) overall survival rates in pediatric patients with lymphoma. Prognostic factors such as age, sex, race, Ann Arbor stage, lymphoma subtypes, and radiotherapy were incorporated into the nomogram. The nomogram exhibited excellent predictive performance with area under the curve (AUC) values of 0.766, 0.724, and 0.703 for one-year, five-year, and ten-year survival, respectively, in the training cohort, and AUC values of 0.776, 0.712, and 0.696 in the validation cohort. Importantly, the nomogram outperformed the Ann Arbor staging system in survival prediction. Machine learning models achieved AUC values of approximately 0.75, surpassing the conventional method (AUC ~ 0.70) in predicting the risk of lymphoma-specific death. We also observed that pediatric lymphoma survivors had a substantially reduced risk of lymphoma after ten years but faced an increasing risk of non-lymphoma diseases. The study highlights substantial improvements in pediatric lymphoma survival, offers reliable predictive tools, and underscores the importance of long-term monitoring for non-lymphoma health issues in pediatric patients.


Introduction

Lymphoma stands as the third most prevalent pediatric cancer, comprising 15% of childhood malignancies [ 1 ]. Despite significant advancements in treatment approaches that have markedly improved the outlook for pediatric lymphoma patients in recent decades, lymphoma remains a notable contributor to childhood cancer-related mortality [ 2 , 3 ]. This is especially true for children aged 1–10 years. Notably, treatment outcomes can exhibit considerable variability, potentially attributed to a complex interplay of psychosocial factors, patient-specific variables, tumor subtypes, and their underlying biological characteristics [ 4 ]. Therefore, it is imperative to conduct a comprehensive investigation on a substantial scale to discern the factors influencing survival and prognosis in pediatric lymphoma patients.

Recent investigations have illuminated the shifting landscape of pediatric lymphoma through extensive database analyses. For example, Kahn et al. [ 5 ] delved into racial disparities in the survival of pediatric Hodgkin lymphoma (HL) patients, revealing that Black patients exhibited a significantly lower 10-year overall survival (OS) rate compared to Caucasians. Interestingly, this survival gap has been narrowing over time, primarily due to more substantial improvements in the ten-year OS rates observed among Black patients. In a similar vein, Bazzeh et al. [ 6 ] focused their exploration on pediatric HL patients spanning from 1988 to 2005, identifying stage IV disease and the presence of B symptoms as independent prognostic risk factors. Various studies have focused on specific facets or subtypes of pediatric non-Hodgkin lymphoma (NHL), such as cutaneous T-cell or B-cell lymphoma, as well as primary gastrointestinal lymphoma within the Surveillance, Epidemiology, and End Results (SEER) database [ 7 , 8 , 9 ]. The imperative for a comprehensive investigation into the survival and prognosis of both HL and NHL in pediatric patients remains paramount. Therefore, building upon the extensive clinical data available in the SEER database, encompassing patients with lymphoma from 1975 to 2018, we aimed to comprehensively explore the survival and prognosis predictors of pediatric lymphoma, serving as the foundation for the development of machine learning models capable of reliably predicting survival outcomes. Simultaneously, this study sought to analyze survival trends over recent decades to pinpoint key aspects that could guide the future trajectory of pediatric lymphoma research.

Subjects and methods

Data source

The data for this study were sourced from cancer records spanning 1975 to 2018, originating from nine specific states within the United States through the SEER database, which is collected and consolidated by the National Cancer Institute as part of its commitment to tackling the increasing burden of cancer. The states that contributed to this dataset are Connecticut, Michigan, Georgia, California, Hawaii, Iowa, New Mexico, Washington, and Utah. The SEER database, accessible at https://seer.cancer.gov , is a comprehensive repository of cancer-related information. A visual representation of the study's flow and methodology can be found in Supplemental Figure S1.

Patient enrollment

Patients diagnosed with primary lymphoma at ages ranging from 0 to 19 years were identified using the third edition of the International Classification of Diseases for Oncology. To conduct survival-associated analyses, additional screening was carried out to exclude cases lacking follow-up information or involving patients who passed away within one month after their diagnosis. Extensive demographic and clinical data pertaining to the patients were meticulously gathered. This encompassed data such as the age at diagnosis, sex, race, tumor subtype, Ann Arbor staging, year of diagnosis, the utilization of chemotherapy and radiotherapy, and vital status. It is important to underscore that the execution of this study was carried out in strict adherence to the Strengthening the Reporting of Observational Studies in Epidemiology guideline, ensuring the robustness and transparency of the research methodology [ 10 ].

Outcome prediction tools

The analysis of factors associated with OS among pediatric lymphoma patients was carried out using a multivariable Cox proportional hazards regression model. A nomogram model, built upon the most influential factors, was developed to predict OS at one-year, five-year, and ten-year intervals. This model underwent external validation within a separate validation cohort, created through random division at a 7:3 ratio. The model's precision was confirmed by assessing the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, and comparisons were made between the nomogram and the Ann Arbor staging system. Additionally, a calibration curve was generated to compare the predictive outcomes of the nomogram against actual survival rates.
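As a rough illustration of this validation workflow, the sketch below fits a Cox model and checks the discrimination of its risk score at the 5-year mark, using the Python lifelines and scikit-learn libraries. The file name and column names are assumptions for illustration, not the authors' actual code (the study itself used R and SPSS).

```python
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical SEER extract: one row per patient with follow-up months,
# a death indicator, and the six predictors used in the nomogram.
df = pd.read_csv("pediatric_lymphoma.csv")
train, valid = train_test_split(df, test_size=0.3, random_state=42)  # 7:3 split

cph = CoxPHFitter()
cph.fit(train, duration_col="months", event_col="death",
        formula="age_group + sex + race + stage + subtype + radiotherapy")

# Higher partial hazard = higher predicted risk (analogous to nomogram points).
valid = valid.copy()
valid["risk"] = cph.predict_partial_hazard(valid)

# Crude 5-year AUC: keep only patients whose 5-year status is known.
known = valid[(valid["months"] >= 60) | (valid["death"] == 1)].copy()
died_by_5y = ((known["months"] < 60) & (known["death"] == 1)).astype(int)
print("5-year AUC:", roc_auc_score(died_by_5y, known["risk"]))
```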

Five well-established machine learning algorithms were employed to predict the long-term risk of lymphoma-specific mortality: extreme gradient boosting (XGB), the random forest classifier (RFC), adaptive boosting (ADB), the artificial neural network (ANN), and the gradient boosting decision tree (GBDT), alongside logistic regression (LR). The parameters for each machine learning algorithm are shown in Supplemental Table S1. The ANN algorithm is a highly interconnected network composed of adaptable units that mimic the way biological nervous systems interact with real-world objects. RFC is an advanced iteration of the decision tree algorithm, suitable for both regression and classification tasks. GBDT, XGB, and ADB belong to the ensemble learning category of machine learning algorithms, known for improving classifier generalization by training multiple classifiers and combining their results. Except for LR, the transparency of these algorithms is limited, making it challenging for users to decipher the relationship between variables and outcomes. To enhance the reliability of the models, continuous variables underwent z-score normalization and categorical variables were one-hot encoded as preprocessing. Feature selection using Cox regression identified potential prognostic predictors.
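For concreteness, a minimal sketch of this model line-up and preprocessing in Python (scikit-learn plus the xgboost package) might look as follows; the feature names are placeholders standing in for the SEER variables, and the hyperparameters are illustrative, not those in Supplemental Table S1.

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

numeric = ["age"]                                                  # z-score normalized
categorical = ["sex", "race", "stage", "subtype", "radiotherapy"]  # one-hot encoded

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

models = {
    "XGB": XGBClassifier(n_estimators=200),
    "RFC": RandomForestClassifier(n_estimators=500),
    "ADB": AdaBoostClassifier(),
    "GBDT": GradientBoostingClassifier(),
    "LR": LogisticRegression(max_iter=1000),
}
# Every model shares the same preprocessing, keeping comparisons apples-to-apples.
pipelines = {name: Pipeline([("pre", preprocess), ("clf", clf)])
             for name, clf in models.items()}
```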

The training procedure involved several key steps. Each algorithm was trained using fivefold cross-validation to ensure robustness and prevent overfitting, with the datasets split in a 7:3 ratio for training and validation. For the ANN model, the Adam optimizer was employed with a binary cross-entropy loss function, trained for 100 epochs with a batch size of 32. Early stopping was implemented to prevent overfitting by monitoring the validation loss and halting training if no improvement was observed for ten consecutive epochs. The performance of each model was evaluated using the area under the curve (AUC) of receiver operating characteristic (ROC) curves, and decision curve analysis (DCA) was conducted to assess clinical utility.
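A hedged sketch of the ANN configuration described above (Adam optimizer, binary cross-entropy, 100 epochs, batch size 32, early stopping with patience 10), written with TensorFlow/Keras; the layer widths are assumptions, since they are not stated here.

```python
import tensorflow as tf
from tensorflow.keras import layers, callbacks

def build_ann(n_features: int) -> tf.keras.Model:
    # Binary classifier: probability of lymphoma-specific death.
    model = tf.keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(64, activation="relu"),   # assumed width
        layers.Dense(32, activation="relu"),   # assumed width
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC(name="auc")])
    return model

# Stop when validation loss fails to improve for 10 consecutive epochs.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True)
# model = build_ann(n_features=X_train.shape[1])
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, batch_size=32, callbacks=[early_stop])
```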

Statistical analysis

The extraction of patient data, including clinical characteristics and follow-up information, was conducted using SEER*Stat version 8.3.9 software, accessible at https://seer.cancer.gov/seerstat . Subsequent statistical analyses were performed using IBM SPSS version 27.0 (IBM Corp., Armonk, NY, USA) and R software version 4.3.1, available at https://www.r-project.org . To compare baseline characteristics between the training and validation cohorts, the χ² test was employed, encompassing variables such as gender, race, age, lymphoma subtype, Ann Arbor stage, and the initial treatment course (chemotherapy and radiotherapy). These variables were further subjected to multivariable Cox proportional hazards regression analysis, which calculated hazard ratios (HR) and their associated 95% confidence intervals (CI) with respect to OS. Survival curves for both OS and disease-specific survival (DSS, lymphoma-specific) were generated using the Kaplan–Meier method, and differences among subpopulations were assessed via the log-rank test. Within the SEER program, survival time was defined as the duration from the date of diagnosis to either death or the most recent follow-up. The patient data utilized in this study were most recently updated as of November 2020. Statistical significance was determined using a two-sided P value < 0.05.
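By way of example, the Kaplan–Meier and log-rank comparisons described here can be reproduced in Python with the lifelines package (the paper itself used SPSS and R); the data frame and column names below are the same illustrative assumptions as in the earlier sketch.

```python
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

df = pd.read_csv("pediatric_lymphoma.csv")  # assumed SEER extract

# One survival curve per subgroup (sex shown here; age, race, etc. work the same).
kmf, ax = KaplanMeierFitter(), None
for label, grp in df.groupby("sex"):
    kmf.fit(grp["months"], event_observed=grp["death"], label=str(label))
    ax = kmf.plot_survival_function(ax=ax)

# Two-sided log-rank test between the subgroups.
female = df["sex"] == "Female"
result = logrank_test(df.loc[female, "months"], df.loc[~female, "months"],
                      event_observed_A=df.loc[female, "death"],
                      event_observed_B=df.loc[~female, "death"])
print("log-rank p =", result.p_value)
plt.show()
```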

Results

Patient characteristics

A cohort of 7871 pediatric individuals, ranging in age from 0 to 19 years with a median age of 15 years, received diagnoses of lymphoma between 1975 and 2018. These cases were extracted from the SEER database, which draws data from nine U.S. states (Supplemental Table S2). The majority of these patients, constituting 53.6% (N = 4215), fell within the 15–19-year age group. Furthermore, 6.5% (N = 513) of the cases were in the 0–4-year age group, 14.4% (N = 1137) were aged 5–9 years, and 25.5% (N = 2006) were aged 10–14 years. Males accounted for a higher proportion, at 59.1% (N = 4650), compared to females at 40.9% (N = 3221). The ethnic distribution showed that Caucasians comprised the largest group, with 80.5% (N = 6338), followed by 11.4% (N = 897) who were of African descent, and 8.1% (N = 556) from other ethnic backgrounds, including AI/AN/AP (American Indian/Alaska Native/Asian and Pacific Islander). In terms of lymphoma subtypes, 53.3% (N = 4193) were diagnosed with HL, of which 4144 were nodal and 49 were extra-nodal, while 46.7% (N = 3678) had NHL, with 2543 being nodal and 1135 extra-nodal cases. Among the 5579 (70.9%) cases with staging information available, 17.6% (N = 1388) were categorized as stage I, 24.2% (N = 1901) as stage II, 11.4% (N = 896) as stage III, and 17.7% (N = 1394) as stage IV. As of the latest update, 1859 patients (23.6%) had died, while 5992 (76.1%) remained alive; 20 patients (0.3%) among the surviving group were lost to follow-up.

Survival trend analysis

As shown in Fig.  1 and Supplemental Table S2 , both adult and pediatric patients with lymphoma demonstrated gradually improved OS and DSS over the past four decades. Among the pediatric patients, the 1-year, 5-year and 10-year OS probability rates increased by 19.3% (82.5% in 1975 to 98.4% in 2017), 41.9% (66.1% in 1975 to 93.8% in 2013), and 48.8% (59.8% in 1975 to 89.0% in 2008), compared with 15.0% (73.9% in 1975 to 85.0% in 2017), 39.8% (48.7% in 1975 to 68.1% in 2013), and 54.6% (34.6% in 1975 to 53.5% in 2008) among adults, respectively. As for DSS, increases in the 1-year, 5-year, and 10-year rates for pediatric patients were 16.3% (85.5% in 1975 to 99.4% in 2017), 35.4% (70.1% in 1975 to 94.9% in 2013), and 43.8% (65.5% in 1975 to 94.2% in 2008), while adult cases were 12.3% (79.5% in 1975 to 89.3% in 2013), 34.2% (60.3% in 1975 to 80.9% in 2013), and 53.0% (50.0% in 1975 to 76.5% in 2008), respectively. Kahn et al. [ 5 ] reported that the Black population showed more prominent improvement in the long-term survival than Caucasians. In contrast, our subgroup analyses of different races demonstrated that Caucasian children showed consistently higher survival rates, especially in the 5-year and 10-year outcomes (Fig.  1 C).
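Note that these percentage gains are relative, not absolute, increases; for instance, the pediatric 1-year OS figure follows from

```latex
\Delta_{\text{1-year OS}}
  = \frac{98.4\% - 82.5\%}{82.5\%} \times 100\%
  \approx 19.3\%,
```

and the 5-year and 10-year figures are obtained by the same calculation.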

figure 1

Overall survival and disease-specific survival trends of pediatric lymphoma over time. A 1-year, 5-year, and 10-year overall survival rates of pediatric and adult lymphoma over the year of diagnosis. B 1-year, 5-year, and 10-year disease-specific survival rates of pediatric and adult lymphoma over the year of diagnosis. C 1-year, 5-year, and 10-year overall survival and disease-specific survival rates of pediatric lymphoma patients among the subgroups of different races over the year of diagnosis. aAI/AN/AP, American Indian/Alaska Native/Asian and Pacific Islander

Prognostic analysis

Using the multivariable Cox regression model, the independent prognostic risk factors for OS among pediatric patients with lymphoma were identified, including age (0–4 years: reference, HR = 1; 5–9 years: HR = 0.83, 95%CI 0.66–1.05, P = 0.117; 10–14 years: HR = 1.05, 95%CI 0.85–1.30, P = 0.628; 15–19 years: HR = 1.35, 95%CI 1.11–1.66, P = 0.003), sex (male: reference, HR = 1; female: HR = 0.89, 95%CI 0.81–0.98, P = 0.018), race (Caucasian: reference, HR = 1; Black: HR = 1.32, 95%CI 1.14–1.52, P < 0.001), the lymphoma subtype (HL: reference, HR = 1; nodal NHL: HR = 1.92, 95%CI 1.71–2.16, P < 0.001; extra-nodal NHL: HR = 1.42, 95%CI 1.20–1.69, P < 0.001), the Ann Arbor stage (stage I: reference, HR = 1; stage II: HR = 1.32, 95%CI 1.08–1.62, P = 0.006; stage III: HR = 1.67, 95%CI 1.33–2.09, P < 0.001; stage IV: HR = 2.42, 95%CI 2.01–2.92, P < 0.001), and radiotherapy (not receiving: reference, HR = 1; receiving: HR = 1.32, 95%CI 1.19–1.46, P < 0.001) (Fig. 2). The survival curves and comparisons associated with OS and DSS among subgroups divided by age, sex, race, the lymphoma subtype, and the Ann Arbor stage are shown in Supplemental Figures S2–S6. Importantly, among the pediatric patients with lymphoma, age significantly affected OS but not DSS (Supplemental Figure S2). For approximately the first 26 years of follow-up, children diagnosed at 0–4 years of age fared worse than the other age groups, while pediatric patients aged 15–19 years demonstrated worse long-term OS. Sex was identified as one of the critical factors affecting not only DSS but also OS (Supplemental Figure S3). In terms of long-term outcomes, females consistently demonstrated better DSS than males. Although female patients had significantly better OS, the two groups started to overlap after approximately 20 years, indicating that factors and diseases other than the lymphoma per se may have affected long-term survival. Furthermore, the differences related to ethnicity are complicated. Pediatric patients of different races demonstrated similar DSS, while Caucasian and AI/AN/AP patients demonstrated significantly better OS than Black patients (Supplemental Figure S4). This is much more likely to be associated with multiple socioeconomic factors than with intrinsic ethnic differences. As for lymphoma subtypes, HL consistently showed better DSS than NHL among pediatric patients, although subtype may have no effect on long-term survival (Supplemental Figure S5). Surprisingly, we observed that the OS lines intersected at about 34 years after diagnosis; pediatric patients with HL may be more susceptible than those with NHL to other associated factors or diseases during the long-term survival period.
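To aid interpretation of these estimates: under the Cox proportional hazards model,

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \cdots + \beta_p x_p\right),
\qquad \mathrm{HR}_j = e^{\beta_j},
```

so, for example, HR = 2.42 for stage IV means the instantaneous risk of death is estimated to be 2.42 times that of a stage I patient at any follow-up time, with the other covariates held fixed.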

figure 2

Multivariable Cox proportional hazards regression analysis for overall survival among pediatric patients with lymphoma. aHR, hazard ratio; AI/AN/AP, American Indian/Alaska Native/Asian and Pacific Islander; HL, Hodgkin lymphoma; N-NHL, Nodal Non-Hodgkin lymphoma; E-NHL, extra-nodal non-Hodgkin lymphoma

Outcome prediction

A total of 7741 pediatric patients with lymphoma were randomly divided into the training cohort and the validation cohort at a ratio of 7:3. The demographic characteristics of the two cohorts were not significantly different (Supplemental Table S4). Based on the independent prognostic factors identified using the multivariable Cox regression model, a prediction nomogram was developed in the training cohort using sex, age, race, the Ann Arbor stage, the lymphoma subtype, and radiotherapy (Fig. 3A). Both internal and external validations were performed to test the calibration and predictive ability of the nomogram. Calibration curves for 1-year, 5-year, and 10-year OS demonstrated great consistency between the nomogram-predicted outcomes and the actual OS rates in both the training and the validation cohorts (Supplemental Figure S7). Furthermore, the predictive ability of the nomogram (1-year: 0.766 and 0.776, 5-year: 0.724 and 0.712, 10-year: 0.703 and 0.696, in the training and validation cohorts, respectively) was evaluated by the AUCs of the ROC curves, and the nomogram performed better than the Ann Arbor staging system (1-year: 0.666 and 0.668, 5-year: 0.647 and 0.651, 10-year: 0.646 and 0.641, in the training and validation cohorts, respectively) (Fig. 3B). We further visualized the relationship between all patients' nomogram scores and survival time (Fig. 3C); higher nomogram scores indicated significantly worse survival outcomes (Fig. 3D–E).

figure 3

The nomogram to predict 1-year, 5-year, and 10-year overall survival (OS) probabilities among pediatric patients with lymphoma. A Quantitative nomogram to predict survival probabilities according to the total points based on sex, age, race, the Ann Arbor stage, the lymphoma subtype, and radiotherapy. White, Caucasians; Black, African-American; AI/AN/AP, American Indian/Alaska Native/Asian and Pacific Islander. B Receiver operating characteristic curves of the nomogram and the Ann Arbor Staging System to predict 1-year, 5-year, and 10-year OS probabilities in the training and validation cohorts. AUC, the area under the ROC curve. AUCs of the nomogram (1-year: 0.766 and 0.776, 5-year: 0.724 and 0.712, 10-year: 0.703 and 0.696, in the training and validation cohorts, respectively) vs AUCs of the Ann Arbor Staging System (1-year: 0.666 and 0.668, 5-year: 0.647 and 0.651, 10-year: 0.646 and 0.641, in the training and validation cohorts, respectively). C Relationship between nomogram scores and survival time of each pediatric lymphoma patient. D and E Kaplan–Meier survival curves for pediatric lymphoma patients grouped by the median nomogram score in the training cohort and validation cohort, respectively

To further explore the relationships between demographic characteristics and long-term outcomes of pediatric lymphoma, we developed multiple machine learning algorithm-based models for predicting the 5-year, 10-year, and 20-year risk of lymphoma-specific death using the abovementioned variables. All machine learning models (AUC ~ 0.75) demonstrated significantly higher AUCs than conventional LR (AUC ~ 0.70), with better performance in decision curves, highlighting the superiority of artificial intelligence (Fig. 4A, B). Furthermore, patients were nearly free from lymphoma-specific death about ten years after a diagnosis of pediatric lymphoma, while the non-lymphoma death risk kept increasing (Fig. 4C, D). The sensitivity and specificity values for each model were determined at the maximal Youden index (Table 1). The non-lymphoma causes of death for pediatric lymphoma patients are shown in Fig. 5.
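For reference, the Youden index used to fix these operating points is, at threshold t,

```latex
J(t) = \text{sensitivity}(t) + \text{specificity}(t) - 1,
```

and the sensitivity/specificity pairs in Table 1 are reported at the threshold that maximizes J.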

figure 4

Machine learning models for risk prediction of long-term lymphoma-specific death in patients with pediatric lymphoma. A Receiver operating characteristic curves of five classical machine learning-based models and logistic regression (LR) with areas under the curve (AUC). B Decision curve analysis for five classical machine learning-based models and LR. C Number of lymphoma-specific and non-lymphoma deaths as survival years after lymphoma diagnosis. D Cumulative lymphoma-specific and non-lymphoma mortalities as survival years after lymphoma diagnosis

figure 5

Analysis of death causes among pediatric patients with lymphoma

Discussion

In this comprehensive population-based study, we leveraged the largest available dataset of cancer patients from the SEER database to conduct a systematic analysis of survival and outcome prediction for pediatric lymphomas, employing advanced machine learning techniques. Our investigation into survival trends revealed a notable increase in OS and DSS over the decades, both in the pediatric and adult lymphoma patient populations. Crucially, our findings indicated a remarkable similarity between 5-year and 10-year survival rates among pediatric patients, implying that the 5-year mark might serve as a critical management checkpoint for long-term survival prospects. It suggests that once pediatric patients with lymphoma surpass the initial 5-year survival threshold, their chances of being cured and enjoying sustained remission significantly improve. Additionally, we observed that OS rates closely mirrored DSS rates within the pediatric population, in stark contrast to the adult population. This intriguing pattern could be attributed to the fact that pediatric patients rarely succumbed to NHL, as evidenced by our analysis of causes of death. Specifically, among deceased pediatric patients, only 52.1% of deaths were attributed to lymphoma-related causes (NHL: N = 574, 32.8%; Hodgkin lymphoma: N = 337, 19.3%), while other non-lymphoma causes included heart diseases, infectious diseases, accidents, adverse effects, and acute lymphocytic leukemia, among others. Patients with lymphoma have been found to carry a long-term risk of death from cardiovascular diseases [ 11 , 12 ]. The potential immune deficiency caused by lymphomas may also increase the risk of infection [ 13 ]. These insights shed light on the complex interplay of factors affecting survival in pediatric lymphoma patients, emphasizing the importance of long-term follow-up and tailored management strategies.

We also performed a multivariable analysis using Cox proportional hazards regression to identify potential independent risk factors for survival outcomes among pediatric patients with lymphoma. Age, sex, race, the lymphoma subtype, the stage, and radiotherapy were found to be significantly associated with OS. The Kaplan–Meier curves suggested similar survival comparisons. Pediatric patients aged 0–4 years had lower OS than other age groups for the first 20-plus years. Surprisingly, we found that the older pediatric patients, aged 15–19 years, demonstrated worse long-term OS outcomes, which was not observed in the DSS curves. Consistent with a previous report, in some populations the number of patients who died of other factors or other diseases can be comparable to those who died of lymphoma itself [ 14 ]. Moreover, the cumulative mortality curve demonstrated that patients diagnosed with pediatric lymphoma became largely exempt from lymphoma-specific death, especially after surviving ten years, but faced an increasing risk of non-lymphoma diseases. Regardless of OS or DSS, pediatric males had significantly worse survival outcomes than pediatric females. Yet, there was still overlap between the two groups in the OS curve after a follow-up period of more than 30 years. The same situation was also observed between HL and NHL. Our results showed that males and pediatric patients with HL may be more susceptible to some long-term events, such as secondary malignancies and cardiovascular diseases [ 14 , 15 , 16 ]. As for race, Caucasian and AI/AN/AP children had significantly better OS than Black children, while all of them demonstrated similar DSS. Populations of different ethnicities may have specific internal sensitivity to treatments such as chemotherapy and radiotherapy. Moreover, socioeconomic limitations may lead to delayed diagnoses and management among Black patients. Furthermore, transplantation is currently one of the most critical curative treatments; however, Black patients are under-represented in marrow donor registries and thus have fewer opportunities to undergo transplantation [ 5 ].

Though lymphoma does not occur as commonly in children as in adults, it is still one of the most common malignancies among children [ 17 ]. The predictive tools developed in previous studies mainly focused on adult lymphoma; very few studies have reported a survival prediction nomogram for pediatric patients with lymphoma. Of note, the Ann Arbor staging system, which focuses on the distribution of nodal involvement, was initially developed for HL, and the biological features of NHL differ from those of HL. Thus, by integrating the independent prognostic risk factors using one of the largest lymphoma datasets from the SEER database, we developed a predictive nomogram model that can be easily used by clinicians worldwide. We also compared the predictive ability of the nomogram we developed with that of the Ann Arbor staging system and found that the nomogram performed better in predicting 1-year, 5-year, and 10-year OS in both the training and validation cohorts. In addition, all the machine learning models we developed performed better than the conventional method in predicting long-term lymphoma-specific death risk, showing the strength of machine learning in data mining. Machine learning models can process large volumes of patient data, including clinical records, images, and genetic information, to assist physicians in devising more personalized treatment plans. In particular, our machine learning models may aid doctors in initial screening and diagnosis, saving time and allowing doctors to focus more on interacting with patients and formulating treatment plans. However, it is important to note that the application of machine learning models requires high-quality data and appropriate regulation to ensure their safety and effectiveness [ 18 ]. Additionally, machine learning models should serve only as an auxiliary tool in medical decision-making, with the ultimate treatment decisions still being made by experienced physicians. The ongoing development and improvement of artificial intelligence tools will contribute to enhancing the diagnosis and treatment outcomes for lymphoma patients [ 19 ]. Overall, the quantitative nomogram and machine learning models may be useful for accurately and effectively predicting the survival probability of each individual child and may contribute to clinical decision-making.

Our study had several limitations that merit consideration. First, the clinical data we relied upon were sourced from the SEER database, representing only nine U.S. states. As a result, our findings might not be entirely representative of the pediatric lymphoma landscape across the entire United States. Nevertheless, our study included the largest cohort of pediatric lymphoma patients to date, enabling systematic clinical analyses. Another limitation pertains to the comprehensiveness of information within the SEER database. Certain critical clinical details, such as specific treatment protocols, were unavailable, limiting further optimization of predictive accuracy. To address these limitations and enhance the precision of our predictive models, we plan to launch a multicenter cohort study encompassing a broader array of pediatric lymphoma patients, allowing for the collection of a more comprehensive dataset covering a wider range of patient characteristics.

In conclusion, our study pioneered the exploration of survival trends, revealing that advancements in diagnostic and treatment approaches have led to notable improvements in both short-term and long-term survival outcomes. Moreover, we introduced an innovative quantitative nomogram and deployed multiple machine learning models to facilitate outcome prediction, showcasing their remarkable predictive accuracy and practical utility. The insights gleaned from this comprehensive clinical investigation are poised to offer valuable and actionable information on pediatric lymphoma, benefitting clinicians globally and serving as a catalyst for further research in this field.

Data availability

The data are available in the Surveillance, Epidemiology, and End Results (SEER, http://seer.cancer.gov ) database.

References

1. Buhtoiarov I. Pediatric lymphoma. Pediatr Rev. 2017;38(9):410–23. https://doi.org/10.1542/pir.2016-0152

2. Smith MA, Seibel NL, Altekruse SF, et al. Outcomes for children and adolescents with cancer: challenges for the twenty-first century. J Clin Oncol. 2010;28(15):2625–34. https://doi.org/10.1200/jco.2009.27.0421

3. Mauz-Körholz C, Metzger ML, Kelly KM, et al. Pediatric Hodgkin lymphoma. J Clin Oncol. 2015;33(27):2975–85. https://doi.org/10.1200/jco.2014.59.4853

4. Sandlund JT, Martin MG. Non-Hodgkin lymphoma across the pediatric and adolescent and young adult age spectrum. Hematology Am Soc Hematol Educ Prog. 2016;1:589–97. https://doi.org/10.1182/asheducation-2016.1.589

5. Kahn JM, Keegan TH, Tao L, et al. Racial disparities in the survival of American children, adolescents, and young adults with acute lymphoblastic leukemia, acute myelogenous leukemia, and Hodgkin lymphoma. Cancer. 2016;122(17):2723–30. https://doi.org/10.1002/cncr.30089

6. Bazzeh F, Rihani R, Howard S, Sultan I. Comparing adult and pediatric Hodgkin lymphoma in the surveillance, epidemiology and end results program, 1988–2005: an analysis of 21 734 cases. Leuk Lymphoma. 2010;51(12):2198–207. https://doi.org/10.3109/10428194.2010.525724

7. Bomze D, Sprecher E, Goldberg I, Samuelov L, Geller S. Primary cutaneous B-cell lymphomas in children and adolescents: a SEER population-based study. Clin Lymphoma Myeloma Leuk. 2021;21(12):e1000–5. https://doi.org/10.1016/j.clml.2021.07.021

8. Kassira N, Pedroso FE, Cheung MC, Koniaris LG, Sola JE. Primary gastrointestinal tract lymphoma in the pediatric patient: review of 265 patients from the SEER registry. J Pediatr Surg. 2011;46(10):1956–64. https://doi.org/10.1016/j.jpedsurg.2011.06.006

9. Naeem B, Ayub A. Primary pediatric non-Hodgkin lymphomas of the gastrointestinal tract: a population-based analysis. Anticancer Res. 2019;39(11):6413–6.

10. von Elm E, Altman DG, Egger M, et al. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370(9596):1453–7. https://doi.org/10.1016/s0140-6736(07)61602-x

11. Gröbner S, Worst B, Weischenfeldt J, et al. The landscape of genomic alterations across childhood cancers. Nature. 2018;555(7696):321–7. https://doi.org/10.1038/nature25480

12. Ma X, Liu Y, Liu Y, et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature. 2018;555(7696):371–6. https://doi.org/10.1038/nature25795

13. Dulai PS, Thompson KD, Blunt HB, Dubinsky MC, Siegel CA. Risks of serious infection or lymphoma with anti-tumor necrosis factor therapy for pediatric inflammatory bowel disease: a systematic review. Clin Gastroenterol Hepatol. 2014;12(9):1443–51. https://doi.org/10.1016/j.cgh.2014.01.021

14. Gao J, Chen Y, Wu P, et al. Causes of death and effect of non-cancer-specific death on rates of overall survival in adult classic Hodgkin lymphoma: a population-based competing risk analysis. BMC Cancer. 2021;21(1):955. https://doi.org/10.1186/s12885-021-08683-x

15. Bhakta N, Liu Q, Yeo F, et al. Cumulative burden of cardiovascular morbidity in paediatric, adolescent, and young adult survivors of Hodgkin's lymphoma: an analysis from the St Jude lifetime cohort study. Lancet Oncol. 2016;17(9):1325–34. https://doi.org/10.1016/s1470-2045(16)30215-7

16. Kupeli S. Cardiovascular disease after Hodgkin's lymphoma: a role for screening. Lancet Haematol. 2015;2(11):e461–2. https://doi.org/10.1016/s2352-3026(15)00194-5

17. Horn SR, Stoltzfus KC, Mackley HB, et al. Long-term causes of death among pediatric patients with cancer. Cancer. 2020;126(13):3102–13. https://doi.org/10.1002/cncr.32885

18. Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol. 2022;23(1):40–55. https://doi.org/10.1038/s41580-021-00407-0

19. Bzdok D, Krzywinski M, Altman N. Machine learning: supervised methods. Nat Methods. 2018;15(1):5–6. https://doi.org/10.1038/nmeth.4551


Acknowledgements

All authors sincerely thank the staff of the SEER Program for the development of the database.

Funding

This work was supported by the Postdoctoral Fellowship Program of CPSF (No. GZB20230481), the Post-Doctor Research Project, West China Hospital, Sichuan University (No. 2024HXBH149, No. 2024HXBH006), the National Natural Science Foundation of China (No. 82303773, No. 82303772, No. 82303694, No. 82204490), the Natural Science Foundation of Sichuan Province (No. 2023NSFSC1885, No. 2024NSFSC1908), and the Key Research and Development Program of Sichuan Province (No. 23ZDYF2836).

Author information

Yue Zheng and Chunlan Zhang have contributed equally to this work.

Authors and Affiliations

Division of Thoracic Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, Chengdu, China

Yue Zheng, Kai Kang, Ren Luo & Yijun Wu

Laboratory of Clinical Cell Therapy, West China Hospital, Sichuan University, Chengdu, China

Department of Hematology, West China Hospital, Sichuan University, Chengdu, China

Chunlan Zhang, Xu Sun & Ailin Zhao


Contributions

Conceptualization, A.Z. and Y.W.; methodology, Y.Z.; software, C.Z.; validation, Y.W. and A.Z.; formal analysis, X.S.; investigation, K.K.; resources, R.L.; data curation, Y.W.; writing—original draft preparation, Y.Z. and C.Z.; writing—review and editing, X.S.; visualization, K.K.; supervision, A.Z.; project administration, Y.W.; funding acquisition, A.Z. and Y.W. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Ailin Zhao or Yijun Wu .

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 983 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Zheng, Y., Zhang, C., Sun, X. et al. Survival trend and outcome prediction for pediatric Hodgkin and non-Hodgkin lymphomas based on machine learning. Clin Exp Med 24 , 132 (2024). https://doi.org/10.1007/s10238-024-01402-3

Download citation

Received : 24 April 2024

Accepted : 12 June 2024

Published : 18 June 2024

DOI : https://doi.org/10.1007/s10238-024-01402-3


Keywords

  • Pediatric lymphoma
  • Survival trend
  • Machine learning


  • Open access
  • Published: 18 June 2024

Combination of induced pluripotent stem cell-derived motor neuron progenitor cells with irradiated brain-derived neurotrophic factor over-expressing engineered mesenchymal stem cells enhanced restoration of axonal regeneration in a chronic spinal cord injury rat model

  • Jang-Woon Kim 1 , 2 ,
  • Juryun Kim 4 ,
  • Soon Min Lee 5 ,
  • Yeri Alice Rim 1 , 2 ,
  • Young Chul Sung 5 ,
  • Yoojun Nam 4 ,
  • Hyo-Jin Kim 5 ,
  • Hyewon Kim 4 ,
  • Se In Jung 1 , 2 ,
  • Jooyoung Lim 1 , 2 &
  • Ji Hyeon Ju   ORCID: orcid.org/0009-0001-7649-5076 1 , 2 , 3 , 4  

Stem Cell Research & Therapy volume 15, Article number: 173 (2024)


Background

Spinal cord injury (SCI) is a disease that causes permanent impairment of motor, sensory, and autonomic nervous system functions. Stem cell transplantation for neuron regeneration is a promising strategic treatment for SCI; however, the choice of stem cell source and transplantation strategy should be based on experimental evidence. Therefore, this study aimed to investigate the efficacy of combination cell transplantation using brain-derived neurotrophic factor over-expressing engineered mesenchymal stem cells (BDNF-eMSC) and induced pluripotent stem cell-derived motor neuron progenitor cells (iMNP) in a chronic SCI rat model.

Methods

A contusive chronic SCI was induced in Sprague-Dawley rats. At 6 weeks post-injury, BDNF-eMSC and iMNP were transplanted into the lesion site via the intralesional route. At 12 weeks post-injury, differentiation and growth factors were evaluated through immunofluorescence staining and western blot analysis. Motor neuron differentiation and neurite outgrowth were evaluated by co-culturing BDNF-eMSC and iMNP in vitro in 2-dimensional and 3-dimensional cultures.

Results

Combination cell transplantation in the chronic SCI model improved behavioral recovery more than transplantation of either cell type alone. Additionally, combination cell transplantation enhanced mature motor neuron differentiation and axonal regeneration at the injured spinal cord. Both BDNF-eMSC and iMNP played a critical role in neurite outgrowth and motor neuron maturation via BDNF expression.

Conclusions

Our results suggest that the combined transplantation of BDNF-eMSC and iMNP in chronic SCI results in significant clinical recovery. The transplanted iMNP cells predominantly differentiated into mature motor neurons. Additionally, BDNF-eMSC exerts a paracrine effect on neuron regeneration through BDNF expression in the injured spinal cord.

Graphical Abstract


Introduction

Spinal cord injury (SCI) is a disease that causes motor, sensory, and autonomic dysfunction. It is characterized by various symptoms, including post-injury paralysis, paresthesia, spastic pain, and cardiovascular, bladder, or sexual dysfunction. Severe SCI is a leading cause of death owing to severe autonomic dysfunction and neurogenic shock [ 1 ]. A retrospective population-based study conducted between 2011 and 2020 recorded 1,303 traumatic SCI (TSCI) accidents among 4.9 million residents, and the recent increase in TSCI incidence has increased its recognition as a global health priority [ 2 ]. The causes of TSCI include motor vehicle accidents, falls, work-related injuries, violent crimes, and sport-related injuries [ 3 ]. Patients with TSCI experience substantial mortality and morbidity, as well as an economic burden, owing to the high cost and complexity of medical care and lost productivity [ 2 ].

The pathophysiology of TSCI comprises two phases: primary and secondary. In most clinical situations, the focus is to prevent secondary injury mechanisms that occur following the primary injury. The secondary injury process is divided into acute, sub-acute, and chronic phases based on the time after the injury [ 3 ]. After the primary injury, pathological changes, such as chronic inflammation, cell dysfunction, and vascular changes, occur in the injured spinal cord (SC) tissue. These changes activate resident astrocytes, microglia, fibroblasts, and other glial cells at the lesion site and contribute to the infiltration of peripheral immune cells. The interactions between these cells at the lesion site are the basis of glial scarring, which inhibits axonal regeneration and myelination formation [ 4 , 5 , 6 , 7 ].

Currently, effective treatments for acute and chronic SCI do not exist. Stem cell transplantation has emerged as a promising strategy to inhibit glial scarring and reduce inflammation. Various cell sources are being explored in stem cell transplantation studies for SCI: the transplantation of Schwann cells, neural stem or progenitor cells, olfactory ensheathing cells (OECs), oligodendrocyte precursor cells, and mesenchymal stem cells (MSC) has been investigated as a potential therapy for SCI. Cells for transplantation can be derived from adult and embryonic stem cells (ESC) and induced pluripotent stem cells (iPSC), or generated via direct conversion technology [ 8 , 9 , 10 , 11 , 12 , 13 ]. A combined cell transplantation approach may treat SCI more effectively because cellular response factors within the injured tissue determine SCI progression [ 14 , 15 ].

We aimed to confirm the feasibility of a combination cell transplantation strategy in a chronic SCI model. Initially, we used MSC as the first combination cell source. Prior research has reported significant clinical improvement in chronic SCI following MSC transplantation, with astrocytic differentiation of the transplanted cells predominating at the lesion site. The potential of transplanted cells in chronic SCI has been confirmed; however, further research is required to improve their migration and differentiation into functional cells [ 16 ]. Brain-derived neurotrophic factor (BDNF) plays an essential role in neuronal maturation, differentiation, and the survival of newly generated neurons via BDNF–TrkB signaling. BDNF has promising potential as a treatment for central nervous system diseases such as brain disease and SCI; however, its application to neurological diseases is limited [ 17 , 18 , 19 ]. BDNF has a short half-life in vivo and cannot cross the blood-brain barrier, posing complex challenges for its application. To overcome these limitations, BDNF overexpression in MSCs has been attempted as a treatment for neurological diseases. Based on previous research, we engineered human MSCs (hMSCs) to overexpress BDNF. Moreover, previous studies have confirmed that irradiated BDNF over-expressing engineered MSC (BDNF-eMSC) can facilitate recovery from brain diseases in rodent models [ 17 , 18 , 19 ].

Furthermore, previous studies have reported that transplanting ESC-derived and iPSC-derived motor neuron progenitor cells increased neuronal survival and promoted neurite branching, resulting in functional recovery in SCI models. Recent experimental studies have suggested motor neurons and motor neuron progenitor cells as potential stem cell therapy strategies for SCI [ 20 , 21 , 22 , 23 ]. We aimed to transplant iPSC-derived motor neuron progenitor cells (iMNP) as a second combination cell source to increase the motor neuron differentiation rate at the lesion site in a chronic SCI model.

Trials of ideal cell types and cell transplantation strategies for chronic SCI are required to achieve effective stem cell transplantation. In this study, we used BDNF-eMSC and iMNP combination cell transplantation in a chronic SCI model. We hypothesized that transplanting BDNF-eMSC and iMNP cells in the severe stage of chronic SCI would induce functional recovery through BDNF expression, mature motor neuron differentiation, and axonal regeneration.

Materials and methods

In vitro assay

BDNF-eMSC preparation

BDNF-eMSC was established based on the previously reported protocol [ 17 , 18 , 19 , 24 , 25 ] and provided in irradiated form by SL BIGEN, Inc. (Incheon, Korea). Human bone marrow-derived MSCs were purchased from the Catholic Institute of Cell Therapy, South Korea. MSC was cultured in low glucose-containing Dulbecco's Modified Eagle Medium (DMEM) (Gibco, Grand Island, NY, USA) supplemented with 20% fetal bovine serum (FBS) (Gibco) and 5 ng/mL basic fibroblast growth factor (bFGF) (PeproTech, Rocky Hill, NJ, USA). BDNF-eMSC was cultured in low glucose-DMEM supplemented with 10% FBS and 10 ng/mL bFGF in a humidified atmosphere of 5% CO₂ at 37℃.

iPSC-derived motor neuron progenitor cell generation

Human iPSCs were generated from cord blood mononuclear cells (CBMCs) as previously described using the CytoTune-iPSC Sendai Reprogramming Kit containing the Yamanaka factors (A16518, Thermo Fisher Scientific). CBMCs were directly obtained from the Cord Blood Bank of Seoul St. Mary's Hospital [ 26 , 27 ]. The CBMC-derived iPSCs were cultured and maintained on vitronectin-coated plates using Essential 8™ Basal medium (Thermo Fisher Scientific) and supplements (Thermo Fisher Scientific). The differentiation of iPSCs into motor neurons using small molecules was performed based on a previously reported protocol [ 28 , 29 ]. During motor neuron differentiation, we used a motor neuron induction medium consisting of DMEM/F12 and Neurobasal medium at 1:1, 1% N2, 1% B27 (Thermo Fisher Scientific), 0.1 mM ascorbic acid (Sigma-Aldrich, St Louis, MO, USA), 1X Glutamax, and 1X penicillin/streptomycin (Thermo Fisher Scientific). iPSC-derived neuroepithelial progenitor (iNEP) differentiation was induced in motor neuron induction medium containing CHIR99021 (3 µM, Tocris, Bristol, United Kingdom), 2 µM dorsomorphin homolog 1 (Tocris), and 2 µM SB431542 (Stemgent, Cambridge, MA, USA). The culture medium was changed every other day for 6 days. During the induction of iMNP differentiation, retinoic acid (RA; 0.1 µM, Stemgent) and purmorphamine (Pur; 0.5 µM, Stemgent) were added to the iNEP cells along with 1 µM CHIR99021 (Tocris), 2 µM DMH1 (Tocris), and 2 µM SB431542 (Tocris) for 6 days. Subsequently, iMNP cells were cultured in suspension in motor neuron induction medium containing 0.5 µM RA and 0.1 µM Pur to induce iPSC-derived motor neuron (iMN) differentiation for an additional 6 days. For mature motor neuron differentiation, iMN cells were cultured with 0.5 µM RA, 0.1 µM Pur, and 0.1 µM Compound E (Calbiochem, San Diego, CA, USA) for 10 days.

Immunofluorescence (IF) staining for in vitro assay

To assess BDNF expression, 5 × 10⁴ MSC and BDNF-eMSC were seeded onto coverslips in a 12-well plate. IF staining for MSC and BDNF-eMSC was performed 2 and 7 days after cell seeding, respectively. For IF staining of motor neuron cells, cells were seeded onto laminin (10 mg/mL)-coated coverslips in a 12-well plate: 5 × 10⁴ cells for iNEP, and 5 × 10⁵ cells each for iMNP, iMN, and iPSC-derived mature motor neuron (iMature MN) cells. IF staining for iNEP, iMNP, and iMN was performed 6 days after cell seeding, whereas iMature MN was stained after 10 days. The IF staining protocols for BDNF expression and motor neuron differentiation were performed under the same conditions. All cells were fixed in 4% PFA for 30 min at RT and permeabilized using 0.1% Triton X-100 for 20 min at RT. Cell blocking was performed using PBS containing 2% BSA (PBA, Sigma-Aldrich) for 30 min. Primary antibodies were incubated in 2% PBA for 2 h at RT. After washing the cells using tris-buffered saline (TBS) with 0.05% Tween-20 (TBST), secondary antibodies conjugated with Alexa Fluor 488 or 594 (Life Technologies) were incubated in 2% PBA for 1 h at RT. The stained cells were counterstained with 4′,6-diamidino-2-phenylindole (DAPI, Roche, Basel, Switzerland), washed, and mounted using an antifade mounting reagent (Thermo Fisher Scientific). The stained cells were observed under LSM 900 and FV 3000 confocal microscopes (Carl Zeiss, Oberkochen, Germany; Olympus Life Science (EVIDENT), Tokyo, Japan) (×200 magnification). The intensity of IF staining was measured in four areas at ×200 magnification. The measured fluorescence intensity was analyzed using Fiji (64-bit Windows ImageJ). Table 1 details the primary antibodies.

2-dimensional (2D) and 3-dimensional (3D) co-culture and in vitro neurite outgrowth assay

We performed BDNF-eMSC and iMN co-culture to analyze BDNF expression, mature motor neuron differentiation, and neurite outgrowth in vitro. For BDNF expression and mature motor neuron differentiation, BDNF-eMSC and iMN cells were cultured at a 1:1 ratio in 2D co-culture. Mature motor neurons were differentiated for 10 days after the 2D co-culture was seeded on a laminin-coated plate. We used a 3D co-culture platform to assess neurite outgrowth during mature motor neuron differentiation. We generated BDNF-eMSC and iMN 3D aggregates using microwell plates (AggreWell™ 800, STEMCELL Technologies, Seattle, WA, USA) following the manufacturer's instructions. After aggregating BDNF-eMSC and iMN in the AggreWell for 2 days, the 3D co-culture spheroids were attached to a laminin-coated plate and differentiated into mature motor neurons for 10 days. Neurite outgrowth during mature motor neuron differentiation was confirmed and evaluated using microtubule-associated protein-2 (MAP-2) and a neurite outgrowth assay kit (Life Technologies). Neurite outgrowth was evaluated using a fluorescence plate reader; red fluorescence from neurite outgrowth was detected at excitation/emission settings of 554/567 nm. To analyze synaptic connections and neural networks in mature motor neurons, IF staining was performed using synapsin-1, Tuj-1, and MAP-2 antibodies. Fluorescence intensity was measured in four areas at 200× magnification.
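
As a small worked example of how such plate-reader readings are commonly summarized, the snippet below converts raw relative fluorescence units (RFU) into background-subtracted fold changes over the BDNF-eMSC group; every number in it is an invented placeholder, not a measured value.

```python
# Illustrative summary of plate-reader neurite readings (ex/em 554/567 nm).
# The blank and RFU values below are invented placeholders.
blank = 120                                          # medium-only well
rfu = {"BDNF-eMSC": 950, "iMN": 2100, "BDNF-eMSC+iMN": 3400}

baseline = rfu["BDNF-eMSC"] - blank                  # reference group signal
fold = {group: (value - blank) / baseline for group, value in rfu.items()}
print(fold)   # e.g. {'BDNF-eMSC': 1.0, 'iMN': 2.39, 'BDNF-eMSC+iMN': 3.95}
```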

MEA analysis

Electrophysiological analysis of mature motor neurons was performed using MEA. The 3D spheroids were generated using AggreWell plates. Before cell seeding, the MEA plate was coated with 0.1% polyethyleneimine solution for 1 h at RT, rinsed three times with sterile deionized water, and dried overnight in a biosafety cabinet at RT. Laminin-treated (10 µg/mL) 3D spheroids from the AggreWell were seeded onto the MEA plate. After 1 h, the 3D spheroids were incubated with mature motor neuron induction medium. The MEA plate was placed in a cell culture incubator with a 5% CO2 humidified atmosphere at 37 °C. The electrical activity of neurons was monitored after 10 days of culture, and the number of spikes was recorded.
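
The MEA software reports spike counts directly; purely for illustration, the sketch below shows the thresholding idea that typically underlies such counts, where a spike is a negative crossing of a multiple of the estimated noise level. The threshold factor, sampling rate, and simulated trace are assumptions, not recording parameters from this study.

```python
import numpy as np

def count_spikes(voltage, k=5.5):
    """Count negative threshold crossings at k times a robust noise estimate."""
    sigma = np.median(np.abs(voltage)) / 0.6745   # robust noise level
    below = voltage < -k * sigma                  # samples past threshold
    onsets = np.flatnonzero(below[1:] & ~below[:-1]) + 1  # first sample of each run
    return len(onsets)

rng = np.random.default_rng(0)
trace = rng.normal(0, 5e-6, 12_500)               # 1 s of simulated noise (V)
trace[3_000] = -60e-6                             # one injected "spike"
print(count_spikes(trace))                        # -> 1
```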

Protein extraction from cell lysates and WB analysis

Protein was extracted from BDNF-eMSC, MSC, and iMNP cells using RIPA buffer (Thermo Fisher Scientific). To analyze the expression of BDNF and mature motor neuron markers and to assess mature motor neuron differentiation in the 2D BDNF-eMSC and iMN co-culture, cell lysates were obtained after 10 days of 2D co-culture. The cell lysates were incubated with RIPA buffer for 30 min at 4 °C, followed by centrifugation at 16,000 rpm for 20 min. The amount of extracted protein was quantified using a bicinchoninic acid (BCA) protein assay. To confirm the expression of BDNF, motor neuron differentiation markers, and MAP-2, the quantified protein supernatants were separated using sodium dodecyl sulfate-polyacrylamide gel electrophoresis and transferred onto a nitrocellulose blotting membrane. The membrane was blocked with 3% BSA for 1 h at RT and incubated with primary antibodies overnight at 4 °C. The following day, the membrane was incubated with secondary antibodies at RT for 1 h. Subsequently, protein expression was confirmed using an enhanced chemiluminescence solution. Protein expression was detected using LAS 4000 (BioRad, Hercules, CA, USA), and band intensity was quantified using Multi Gauge V3.0 software (Fujifilm, Tokyo, Japan). Full-length Western blot images are presented in Additional file 4: Fig. 4.
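
Densitometry readouts such as those from Multi Gauge are usually normalized to a loading control before groups are compared. The example below sketches that arithmetic; the GAPDH control and all intensity values are illustrative assumptions rather than data from this study.

```python
# Hedged sketch of densitometry normalization: target band intensity divided
# by a loading control, then expressed as fold change over the reference group.
bands = {"MSC":       {"BDNF": 1250, "GAPDH": 9800},   # invented intensities
         "BDNF-eMSC": {"BDNF": 7400, "GAPDH": 9650}}

norm = {group: v["BDNF"] / v["GAPDH"] for group, v in bands.items()}
fold = norm["BDNF-eMSC"] / norm["MSC"]
print(f"normalized: {norm}, fold change vs MSC: {fold:.1f}")
```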

In vivo assay

Animal care and contusive chronic SCI model

The Animal Studies Committee of the School of Medicine, the Catholic University of Korea, approved this study (IACUC approval number CUMC-2020-0364-04). All animal care, surgical, and cell transplantation procedures were conducted in accordance with the Laboratory Animal Welfare Act and the Guidelines and Policies for Rodent Survival Surgery. The Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines were followed. Contusive chronic SCI models were generated and prepared based on a previously reported surgical procedure [ 16 , 30 ]. Briefly, the contusive chronic SCI model was generated using 7-week-old adult male Sprague-Dawley rats (weighing 270–320 g). The rats were anesthetized with isoflurane via inhalation and Rompun (2 mg/kg) via intraperitoneal injection. After anesthesia, the rats were shaved and sterilized with antiseptic betadine. The paravertebral muscles from thoracic vertebrae 8 to 10 (T8–T10) were exposed, and a laminectomy was performed at T9. Contusive SCI was induced at T9 using the Multicenter Animal Spinal Cord Injury Study impactor (a 10 g rod dropped from a height of 2.5 cm). Pre- and post-operatively, the rats were administered 5 mg of ketoprofen, gentamicin, and warm saline solution for 3–5 days. The bladders of all rats with SCI were manually emptied for 1 week. Behavioral recovery was observed for 6 weeks after SCI, and rats that spontaneously recovered were excluded before combination cell transplantation.

Group allocation and combination cell transplantation

Combination cell transplantation was performed in the induced contusive chronic SCI model. Behavioral assessments were performed and recorded before cell transplantation. The rats were randomized into the following groups at 6 weeks post-SCI for cell transplantation: (1) chronic SCI + phosphate-buffered saline (PBS) group ( n  = 8), (2) chronic SCI + BDNF-eMSC group ( n  = 6), (3) chronic SCI + iMNP group ( n  = 7), and (4) chronic SCI + BDNF-eMSC + iMNP group ( n  = 8). Before cell transplantation, BDNF-eMSCs were labeled with PKH26 (red fluorescence), whereas iMNPs were labeled with PKH67 (green fluorescence). At 6 weeks post-injury, the lesion site (T9) was re-exposed, and 1 × 10⁶ BDNF-eMSC or iMNP cells in 10 µL PBS were transplanted using a Hamilton syringe in the BDNF-eMSC and iMNP groups, respectively. The BDNF-eMSC + iMNP group was transplanted with both cell types (1:1) in 10 µL PBS at the lesion site. The PBS group received 10 µL PBS at the lesion site. In all groups, injections were made rostral (5 µL) and caudal (5 µL) to the lesion site. We intramuscularly administered 10 mg/kg of cyclosporin A (Cipol Inj, Chong Kun Dang Pharmaceutical) daily after cell transplantation.
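
To make the allocation step concrete, the snippet below sketches one way to randomize 29 animals into the four group sizes listed above. It illustrates the principle only; the rat IDs and seed are arbitrary, and the study does not state which randomization tool was used.

```python
import random

# Group sizes taken from the text; rat IDs 1..29 are illustrative.
groups = {"PBS": 8, "BDNF-eMSC": 6, "iMNP": 7, "BDNF-eMSC+iMNP": 8}
rats = list(range(1, sum(groups.values()) + 1))
random.Random(42).shuffle(rats)          # fixed seed for a reproducible example

allocation, start = {}, 0
for name, n in groups.items():           # carve the shuffled list into groups
    allocation[name] = sorted(rats[start:start + n])
    start += n
print(allocation)
```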

Cell labeling for tracking transplanted cells

The transplanted cells were tracked, and their engraftment and differentiation in the injured SC were confirmed using PKH26 (red fluorescence; Sigma-Aldrich, St Louis, MO, USA) and PKH67 (green fluorescence; Sigma-Aldrich). PKH26 and PKH67 are fluorescent cell membrane-intercalating dyes. Before cell transplantation, BDNF-eMSC was labeled with PKH26, whereas iMNP was labeled with PKH67. The labeling procedures for PKH26 and PKH67 were identical. Briefly, a cell pellet containing 1 × 10⁶ cells was incubated with Diluent C and the cell tracking dye solution for 5 min at room temperature (RT). After labeling, the activity of the cell tracking dyes was stopped using 1% bovine serum albumin (BSA). After the final wash, the cell pellet was centrifuged and resuspended in PBS for cell transplantation.

Behavioral recovery assessment

We assessed behavioral recovery using the Basso, Beattie, and Bresnahan (BBB) locomotor rating scale after SCI. Three researchers monitored the BBB locomotor scale and recorded the scores every week for 12 weeks. Rats that exhibited natural, spontaneous improvement of hindlimb function within 24 h post-injury were excluded, as were rats that spontaneously recovered before cell transplantation. Based on the BBB locomotor scores, rats were divided into two grades: grade 1 (scores 0–5) and grade 2 (scores 6–11). Improvements in behavioral recovery were compared using the incidence rate, calculated as follows: incidence rate (%) = (number of rats in a given BBB grade/total number of rats) × 100.
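
A short worked example of this calculation is shown below. The score lists are hypothetical but chosen so that the grade 2 incidences reproduce the 62.5% and 14.28% values reported in the Results.

```python
def incidence(scores, lo=6, hi=11):
    """Percentage of rats whose BBB score falls in the given grade range."""
    hits = sum(lo <= s <= hi for s in scores)
    return 100 * hits / len(scores)

combo = [7, 8, 6, 9, 7, 4, 5, 3]   # n = 8 -> 5 rats in grade 2 (scores 6-11)
imnp  = [6, 4, 3, 5, 2, 4, 5]      # n = 7 -> 1 rat in grade 2
# 1/7 = 14.29% when rounded; the paper reports the truncated value 14.28%.
print(f"{incidence(combo):.1f}%, {incidence(imnp):.2f}%")  # 62.5%, 14.29%
```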

Preparation of injured SC tissue

We euthanized the animals using CO2 gas (30–70% chamber volume/min) before obtaining the injured SC, following the American Veterinary Medical Association Guidelines for the Euthanasia of Animals (2020 Edition). The injured SC samples (approximately 1 cm segments) were obtained at 12 weeks post-injury. For immunofluorescence (IF) staining assessments, we first confirmed cardiac arrest, and the injured SC was then obtained after trans-cardiac perfusion with PBS and 4% paraformaldehyde (PFA). The obtained injured SC was fixed overnight in 4% PFA, followed by overnight incubation in 15% and 30% sucrose at 4 °C. The injured SC was embedded in optimal cutting temperature compound (Tissue-Tek; Sakura Finetek USA, Torrance, CA, USA) and snap-frozen using liquid nitrogen. For Western blot (WB) analysis, the injured SC (approximately 1 cm segment) was obtained after euthanasia without a prior cardiac perfusion procedure. The obtained injured SC was immersed in liquid nitrogen and stored in a deep freezer at −80 °C.

IF staining for injured SC

IF staining was performed on frozen sections of the embedded injured segments (approximately 1 cm) to assess transplanted cell engraftment, differentiation, axonal regeneration, and BDNF expression. Frozen SC sections (4 μm thick) were obtained and mounted on silane-coated slides. The SC sections were fixed using cold acetone for 10 min at RT, followed by washing with TBST. The slide sections were permeabilized using 0.1% Triton X-100 for 20 min at RT. Subsequently, the SC sections were blocked with normal goat or horse serum containing 0.1% Triton X-100 for 1 h at RT. Primary antibodies diluted in 0.1% Triton X-100 with 1% normal goat or horse serum were incubated overnight at 4 °C. After washing, Cy5-conjugated secondary antibodies (Life Technologies, Carlsbad, CA, USA) diluted in TBST were incubated at RT for 1 h. Slides were washed, stained with DAPI, washed again, and mounted using an antifade mounting reagent. IF staining was observed under LSM 700 and LSM 900 (Carl Zeiss, Oberkochen, Germany) and FV 3000 (Olympus Life Science/Evident) confocal microscopes. The number of engrafted cells at the lesion site was counted in six areas at 400× magnification for PKH26 (BDNF-eMSC) and PKH67 (iMNP). Transplanted cells were counted using the Cell Counter plug-in of Fiji (Windows-64 ImageJ). The intensity of engrafted cells was measured in six areas at 400× magnification, and the intensity of IF staining was measured in four areas at 200× magnification; the measured fluorescence intensity was analyzed using Fiji (Windows-64 ImageJ). Table 1 details the primary antibodies.
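
As a rough computational analogue of the manual Cell Counter step, the sketch below segments bright labeled cells in a synthetic field and counts connected components. The Otsu threshold and minimum object size are illustrative assumptions; the study's counts were made in Fiji.

```python
import numpy as np
from skimage import filters, measure, morphology

# Synthetic stand-in for one 400x field; in practice this would be a loaded
# image, e.g. io.imread("pkh67_field.tif") with a hypothetical file name.
rng = np.random.default_rng(1)
img = rng.normal(100, 10, (512, 512))
for r, c in [(100, 100), (300, 200), (400, 400)]:     # three fake "cells"
    img[r:r + 12, c:c + 12] += 300

mask = img > filters.threshold_otsu(img)              # segment bright cells
mask = morphology.remove_small_objects(mask, min_size=50)  # drop small debris
print("cells in field:", measure.label(mask).max())   # -> 3
```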

Protein extraction from the injured spinal cord and WB analysis

Proteins from the injured SC segments (approximately 1 cm in length) were extracted using tissue protein extraction reagent (Thermo Fisher Scientific) with one protease inhibitor cocktail tablet (Roche, Basel, Switzerland) and 1 mM phenylmethylsulfonyl fluoride for 1 h at 4 °C. The tissue-extracted protein amount was quantified using a BCA quantitative analysis. The quantified proteins were separated using sodium dodecyl sulfate-polyacrylamide gel electrophoresis and transferred onto a nitrocellulose blotting membrane. The transferred membranes were blocked using 3% BSA in TBST for 1 h at RT and incubated with primary antibodies overnight at 4 °C. Table 1 details the primary antibodies. Membranes were incubated with secondary peroxidase-conjugated antibodies for 1 h at RT. Protein expression was confirmed using an enhanced chemiluminescence solution and detected using LAS 4000 (BioRad, Hercules, CA, USA), and band intensity was quantified using Multi Gauge V3.0 software (Fujifilm, Tokyo, Japan). Full-length Western blot images are presented in Additional file 4: Fig. 4.

Statistical analysis

All results were statistically analyzed using IBM SPSS Statistics for Windows, version XX (IBM Corp., Armonk, NY, USA). All data were expressed as means ± standard deviations. For the in vitro BDNF expression analysis in hMSC and BDNF-eMSC, the paired t-test (#) and the Kruskal–Wallis test followed by the Mann–Whitney U test (†) were used to compare the two groups. For comparisons among three or four groups, one-way analysis of variance with Fisher's least significant difference test (*) and the Kruskal–Wallis test followed by the Mann–Whitney U test (†) were used for intergroup comparison. Statistical significance was set at p < 0.05 (†*# p < 0.05, ††**## p < 0.01, †††***### p < 0.001; n.s., not significant).
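
For readers who prefer a scriptable route, the snippet below reproduces the same two-stage nonparametric scheme in SciPy: a Kruskal–Wallis test across all groups followed by pairwise Mann–Whitney U tests. The group data are placeholders; the actual analysis was performed in SPSS.

```python
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

# Placeholder measurements for four groups (not data from this study).
groups = {"PBS":       [1.0, 1.2, 0.9, 1.1],
          "BDNF-eMSC": [1.4, 1.6, 1.3, 1.5],
          "iMNP":      [1.7, 1.9, 1.6, 1.8],
          "combo":     [2.2, 2.5, 2.1, 2.4]}

h, p = kruskal(*groups.values())                 # omnibus test across groups
print(f"Kruskal-Wallis: H={h:.2f}, p={p:.4f}")
if p < 0.05:                                     # follow up pairwise
    for a, b in combinations(groups, 2):
        u, pu = mannwhitneyu(groups[a], groups[b])
        print(f"{a} vs {b}: U={u:.1f}, p={pu:.4f}")
```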

Results

BDNF over-expressing engineered mesenchymal stem cell (BDNF-eMSC) generation

Previous studies have reported that BDNF-eMSCs are highly proliferative and secrete more BDNF than naïve MSCs. BDNF-eMSCs were generated as previously described [ 17 , 18 , 19 ]. Before cell transplantation, we confirmed BDNF expression in BDNF-eMSC. The BDNF-eMSC line was established using lentiviral vectors encoding the c-Myc, tTA, and BDNF genes and then irradiated with 200 Gy using an X-ray irradiation device (Red Source Technologies, Buford, GA, USA) (Fig. 1a). On day 2 after cell thawing, the BDNF-eMSC displayed a homogeneous spindle-like cell morphology representative of MSCs (Fig. 1b). MSC and BDNF-eMSC exhibited positive expression of BDNF on day 2. The fluorescence intensity of BDNF was significantly higher in BDNF-eMSC than in MSC (Fig. 1c). BDNF-eMSC maintained their spindle-like cell morphology until day 7 (Fig. 1d). IF staining revealed BDNF expression in BDNF-eMSC on day 7; however, the fluorescence intensity of BDNF in BDNF-eMSC significantly decreased on day 7 compared with day 2 (Fig. 1e). WB analysis confirmed higher BDNF expression in BDNF-eMSC than in MSCs (Fig. 1f). Furthermore, we observed that BDNF expression in BDNF-eMSC was lower on day 7 than on day 2 (Fig. 1g). Full-length Western blot images are presented in Additional file 4: Fig. 4.

Fig. 1 Generation of BDNF-eMSC in vitro. (a) The scheme of BDNF-eMSC generation. (b) Representative light microscope images of MSC and BDNF-eMSC cell morphologies on day 2. (c) Representative fluorescence images showing BDNF expression in MSC and BDNF-eMSC on day 2, with quantification of the fluorescence intensity of BDNF in MSC and BDNF-eMSC (n = 8). (d) Representative light microscope image depicting BDNF-eMSC morphology on day 7. (e) Representative fluorescence image showing BDNF expression in BDNF-eMSC on day 7, with quantification of the fluorescence intensity of BDNF in BDNF-eMSC on days 2 and 7 (n = 8). (f) Western blotting (WB) results showing BDNF expression in MSC and BDNF-eMSC cell lysates after 2 days. (g) WB results demonstrating BDNF expression in BDNF-eMSC cell lysates after 2 and 7 days. Full-length Western blot images are presented in Additional file 4: Fig. 4. The data are presented as mean ± SEM. Statistical significance was estimated using the paired t-test (#) and the Kruskal–Wallis test followed by Mann–Whitney (†) analysis for intergroup comparison. #, † P < 0.05; ## P < 0.01; ### P < 0.001. (MSC: 2 days, n = 4; BDNF-eMSC: 2 days, n = 4; BDNF-eMSC: 7 days, n = 4). Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; IF, immunofluorescence staining; WB, Western blot

Human iPSC-derived motor neuron progenitor and motor neuron differentiation

Motor neuron progenitors and mature motor neurons were differentiated using a small molecule cocktail as previously described [ 28 , 29 ] (Fig. 2a). Our results revealed that iMNP and iMature MN were successfully and reproducibly differentiated from human iPSCs, as confirmed via light microscopy (Fig. 2b). We evaluated the stages of motor neuron differentiation using specific marker expression for each differentiation phase. On day 6, IF staining and WB analysis revealed SOX1 expression in the iNEP phase (Fig. 2c and d-e). The fluorescence intensity of SOX1 was significantly higher in iNEP (Additional file 1: Fig. 1). OLIG2 was more strongly expressed in the iMNP phase than in the other motor neuron differentiation stages (Fig. 2d and f). The fluorescence intensity of OLIG2 was also significantly higher in iMNP (Additional file 1: Fig. 1). IF staining revealed HB9-positive cells in the iMN phase (Fig. 2c), and the fluorescence intensity of HB9 was significantly higher in iMN (Additional file 1: Fig. 1). HB9 protein expression increased on day 18, as confirmed by WB analysis (Fig. 2d and g). In the iMature MN phase, we evaluated SMI-32 expression using IF staining and WB analysis. The fluorescence intensity of SMI-32 was significantly higher in iMature MN than in iNEP, iMNP, and iMN (Additional file 1: Fig. 1). On day 28, iMature MN showed increased SMI-32 positivity and protein expression (Fig. 2c and d-h). Full-length Western blot images are presented in Additional file 4: Fig. 4.

Fig. 2 Generation of induced pluripotent stem cell (iPSC)-derived motor neurons using small molecules in vitro. (a) The scheme of motor neuron progenitor cell (MNP), motor neuron (MN), and mature MN differentiation from human iPSCs using a small molecule cocktail. (b) Representative time course light microscopy images displaying induced pluripotent stem cell-derived neuroepithelial progenitor cell (iNEP), induced pluripotent stem cell-derived motor neuron progenitor cell (iMNP), induced pluripotent stem cell-derived motor neuron cell (iMN), and induced pluripotent stem cell-derived mature motor neuron cell (iMature MN) differentiation. (c) Fluorescence time course images of iNEP, iMNP, iMN, and iMature MN using stage-specific differentiation markers. (d-h) WB results of stage-specific motor neuron differentiation markers in cell lysates (iNEP: n = 4, iMNP: n = 4, iMN: n = 4, and iMature MN: n = 4). Full-length Western blot images are presented in Additional file 1: Fig. 1. The data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney (†) test with the least significant difference post hoc analysis (*); *, † P < 0.05; **, †† P < 0.01. Scale bars = 50 μm. iPSCs, induced pluripotent stem cells; iNEPs, induced pluripotent stem cell-derived neuroepithelial progenitor cells; iMNPs, induced pluripotent stem cell-derived motor neuron progenitor cells; iMNs, induced pluripotent stem cell-derived motor neuron cells; iMature MNs, induced pluripotent stem cell-derived mature motor neuron cells; IF, immunofluorescence staining; WB, Western blot. SOX1 = iNEP, OLIG2 = iMNP, HB9 = iMN, SMI-32 = iMature MN.

Combination cell transplantation enhances behavioral improvement in a contusive chronic SCI model

Before cell transplantation, BDNF-eMSC was labeled with PKH26 (red), whereas iMNP cells were labeled with PKH67 (green). First, we generated a contusive chronic SCI model and transplanted BDNF-eMSC and iMNP cells into the injured SC via the intralesional route at 6 weeks post-injury. At 12 weeks post-injury, we assessed the engraftment of transplanted cells and BDNF expression in the injured SC (Fig. 3a). We used the BBB locomotor scale to evaluate the clinical recovery of behavior for 12 weeks post-injury. At 12 weeks post-injury, the BDNF-eMSC + iMNP group exhibited significantly improved functional recovery compared with the PBS and BDNF-eMSC groups. The incidence rate of BBB scores 6–11 was 62.5% in the BDNF-eMSC + iMNP group and 14.28% in the iMNP group at 12 weeks post-injury (Fig. 3b and Supplementary Movies 1, 2, 3 and 4). At 12 weeks post-injury, transplanted BDNF-eMSC (PKH26, red) were not observed in the white and gray matter of the lesion site. In contrast, transplanted iMNP (PKH67, green) cells in the iMNP and BDNF-eMSC + iMNP groups were observed and persisted in the white and gray matter of the lesion (Fig. 3c and e-f). However, in the BDNF-eMSC + iMNP group, fewer transplanted BDNF-eMSCs were observed at the lesion site (Fig. 3c and e-f). Thus, at 12 weeks after injury, iMNP cells rather than BDNF-eMSCs remained at the lesion site (Fig. 3c and e-f). Nevertheless, BDNF-eMSCs were confirmed to have engrafted and remained at the lesion site one week after transplantation (Additional file 2: Fig. 2). BDNF expression at the lesion site was assessed using IF staining. All cell transplantation groups showed BDNF expression compared with the PBS group at 12 weeks post-injury. IF staining showed that the BDNF-eMSC + iMNP group had a higher fluorescence intensity than the PBS, BDNF-eMSC, and iMNP groups; however, significant differences were not observed between the groups (Fig. 3d and g). In the WB analysis, the BDNF-eMSC + iMNP group had higher BDNF expression than the PBS, BDNF-eMSC, and iMNP groups at the lesion segment (approximately 1 cm); however, significant differences were not observed among the groups (Fig. 3h). Full-length Western blot images are presented in Additional file 4: Fig. 4.

Fig. 3 Combined transplantation of BDNF-eMSC and iMNP in a contusive chronic SCI model. (a) Experimental schemes illustrating contusive chronic SCI rat model generation, combined cell transplantation, and clinical behavior and histology assessments. (b) The BBB scores and incidence rates 12 weeks after SCI (PBS: n = 8, BDNF-eMSC: n = 6, iMNP: n = 7, BDNF-eMSC + iMNP: n = 8). (c) IF images showing merged 4′,6-diamidino-2-phenylindole and transplanted cells (red: BDNF-eMSC; green: iMNP) at the lesion site at 12 weeks. (d) Multi-fluorescent confocal images showing the expression of BDNF and transplanted cells in the white matter. (e) The number of engrafted cells at the lesion site (n = 6). (f) Quantification of the fluorescence intensity of engrafted cells at the lesion site (n = 6). (g) Quantification of the fluorescence intensity of BDNF at the lesion site (n = 4). (h) WB results of BDNF expression at the lesion site segments (approximately 1 cm). Full-length Western blot images are presented in Additional file 4: Fig. 4. The data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney U (†) test with least significant difference post hoc analysis (*); *, † P < 0.05; **, †† P < 0.01. (PBS: n = 4, BDNF-eMSC: n = 4, iMNP: n = 4, BDNF-eMSC + iMNP: n = 4). Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; BBB, Basso–Beattie–Bresnahan; iMNPs, induced pluripotent stem cell-derived motor neuron progenitor cells; IF, immunofluorescence staining; WB, Western blot

BDNF-eMSC + iMNP combination cell transplantation promotes motor neuron maturation and growth density of neuronal processes at the lesion site

IF staining and WB analysis confirmed motor neuron differentiation and axonal regeneration of the transplanted cells at the lesion site. Mature motor neuron differentiation was evaluated using the SMI-32 marker. At 12 weeks post-injury, SMI-32 expression was observed around the gray matter of the lesion site. The expression of SMI-32 in transplanted iMNP cells was higher in the iMNP and BDNF-eMSC + iMNP groups than in the PBS and BDNF-eMSC groups (Fig. 4a). SMI-32 protein expression in the injured segments was significantly increased in the BDNF-eMSC + iMNP group compared with the PBS group; however, significant differences were not observed between the cell transplantation groups (Fig. 4b). MAP-2 marker expression was analyzed in the axial section of the injured SC to confirm axonal regeneration of the transplanted cells at the lesion site 12 weeks post-injury. MAP-2 expression was observed around the dorsal horn and central canal of the lesion site. MAP-2 expression in the transplanted iMNP and BDNF-eMSC + iMNP cells was predominant in the injured SC (Fig. 4c). MAP-2 expression was significantly higher in the BDNF-eMSC + iMNP group than in the PBS and iMNP groups (Fig. 4d). The growth density of neuronal processes at the lesion site was analyzed using the GAP-43 marker at 12 weeks post-injury. GAP-43 expression was observed around the dorsal horn and central canal of the lesion site (Additional file 3: Fig. 3a). IF staining showed that the fluorescence intensity of the cell transplantation groups was significantly higher than that of the PBS group (Additional file 3: Fig. 3b). The expression of GAP-43 was significantly higher in the iMNP and BDNF-eMSC + iMNP groups than in the PBS group (Additional file 3: Fig. 3c). Full-length Western blot images are presented in Additional file 4: Fig. 4.

Fig. 4 Enhancement of mature MN differentiation and growth density of neuronal processes by BDNF-eMSC and iMNP in vivo. (a) IF images showing merged SMI-32 and transplanted cells at the lesion site, with SMI-32 differentiation of the transplanted iMNP cells being predominant at the lesion site at 12 weeks post-injury. (b) WB results of SMI-32 expression at the lesion site segment (approximately 1 cm) (PBS: n = 4, BDNF-eMSC: n = 4, iMNP: n = 4, BDNF-eMSC + iMNP: n = 4). Full-length WB images are presented in Additional file 1: Fig. 1. (c) Confocal images showing MAP-2 and transplanted cell expression around the lesion site. (d) WB results of MAP-2 expression at the lesion site segment (approximately 1 cm) (PBS: n = 4, BDNF-eMSC: n = 4, iMNP: n = 4, BDNF-eMSC + iMNP: n = 4). Full-length Western blot images are presented in Additional file 4: Fig. 4. The data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney (†) test with the least significant difference post hoc analysis (*); *, † P < 0.05. Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; iMNP, induced pluripotent stem cell-derived motor neuron progenitor cells; IF, immunofluorescence staining; WB, Western blot

BDNF-eMSC + iMNP combination cell transplantation increases oligodendrocyte and neuronal cell differentiation at the lesion site

IF staining and WB analysis were performed to confirm oligodendrocyte and neuronal cell differentiation of the engrafted cells. We assessed oligodendrocyte and neuronal cell differentiation at the lesion site using the CC-1 and NeuN markers. CC-1 is a representative oligodendrocyte phenotype marker. At 12 weeks post-injury, CC-1 expression was observed around the white matter. CC-1-positive cells among the transplanted iMNP and BDNF-eMSC + iMNP cells were predominant in the injured SC and were qualitatively abundant around the engrafted iMNP cells (Fig. 5a). IF staining confirmed that the fluorescence intensity of CC-1 in the cell transplantation groups was significantly higher than that in the PBS group (Fig. 5b). The BDNF-eMSC + iMNP group had significantly higher CC-1 expression than the PBS group (Fig. 5c). NeuN is a neuronal cell phenotype marker. NeuN expression was mainly observed in the dorsal horn of the gray matter and the central canal. At 12 weeks post-injury, NeuN and iMNP were more highly expressed in the iMNP and BDNF-eMSC + iMNP groups than in the PBS and BDNF-eMSC groups in the dorsal horn of the gray matter and the central canal (Fig. 5d). IF staining confirmed that the fluorescence intensity of NeuN in the cell transplantation groups was significantly higher than that in the PBS group (Fig. 5e). The iMNP and BDNF-eMSC + iMNP groups had significantly higher NeuN expression than the PBS group (Fig. 5f). Full-length Western blot images are presented in Additional file 4: Fig. 4.

Fig. 5 Increased oligodendrocyte and neuronal cells by BDNF-eMSC and iMNP at the lesion site. (a) IF image showing oligodendrocyte differentiation by transplanted BDNF-eMSC and iMNP around the injured site. (b) Quantification of the fluorescence intensity of CC-1 at the lesion site (n = 4). (c) WB results of CC-1 expression in the injured segment (approximately 1 cm) (PBS: n = 4, BDNF-eMSC: n = 4, iMNP: n = 4, BDNF-eMSC + iMNP: n = 4). Full-length WB images are presented in Additional file 4: Fig. 4. (d) IF analysis of NeuN expression, merged with transplanted cells, observed in the gray matter. (e) Quantification of the fluorescence intensity of NeuN at the lesion site (n = 4). (f) Expression of the neuronal cell marker NeuN, confirmed in the injured segment (approximately 1 cm) via WB. Full-length Western blot images are presented in Additional file 4: Fig. 4. The data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney (†) test with the least significant difference post hoc analysis (*); *, † P < 0.05. Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; iMNP, induced pluripotent stem cell-derived motor neuron progenitor cell; IF, immunofluorescence staining; WB, Western blot

BDNF-eMSC and iMN play a critical role in promoting motor neuron maturation and growth density of neuronal processes in vitro

Our in vivo results demonstrated that combination cell transplantation using BDNF-eMSC and iMNP promoted motor neuron maturation and axonal regeneration at the lesion site. Based on these results, 2D and 3D co-cultures of BDNF-eMSC and iMN were performed to confirm the effects on motor neuron differentiation and axonal regeneration in vitro. We analyzed motor neuron maturation and axonal regeneration by co-culturing BDNF-eMSC and iMN in 2D and 3D spheroid platforms during mature motor neuron differentiation (Fig. 6a). In IF staining, SMI-32 expression was qualitatively higher in the iMN and BDNF-eMSC + iMN groups than in the BDNF-eMSC group (Fig. 6g). In the WB analysis, SMI-32 expression was also significantly higher in the iMN and BDNF-eMSC + iMN groups than in the BDNF-eMSC group (Fig. 6b and c). BDNF expression was qualitatively higher in the BDNF-eMSC and BDNF-eMSC + iMN groups than in the iMN group (Fig. 6g). In the WB analysis, BDNF expression was significantly higher in the BDNF-eMSC group than in the iMN and BDNF-eMSC + iMN groups; additionally, BDNF expression was significantly higher in the BDNF-eMSC + iMN group than in the iMN group on day 10 (Fig. 6d). The MAP-2 marker was analyzed in the 2D co-culture platform to evaluate axonal regeneration. In IF staining and WB analysis, MAP-2 expression was significantly higher in the iMN and BDNF-eMSC + iMN groups than in the BDNF-eMSC group (Fig. 6e and h). We assessed neurite outgrowth induction by co-culturing BDNF-eMSC and iMN in 3D spheroid platforms. We attached the BDNF-eMSC and iMN 3D spheroids to laminin-coated plates for 10 days and assessed neurite outgrowth from the 3D spheroids on day 10 using a neurite outgrowth assay kit. In the red fluorescence staining and intensity analysis, neurite outgrowth was significantly higher in the BDNF-eMSC + iMN group than in the BDNF-eMSC and iMN groups. Furthermore, neurite outgrowth was significantly higher in the iMN group than in the BDNF-eMSC group (Fig. 6f and h). Full-length Western blot images are presented in Additional file 4: Fig. 4.

Fig. 6 Increased MN maturation and axonal regeneration induction by BDNF-eMSC and iMN co-culture. (a) Schematic of the BDNF-eMSC and iMN 2D and 3D co-cultures for assessing motor neuron differentiation and maturation and axonal regeneration in vitro. (b) Representative WB images of SMI-32, BDNF, and MAP-2 in 2D co-culture on day 10. (c) Protein expression of SMI-32 in BDNF-eMSC and iMN 2D co-culture cell lysates on day 10. (d) Protein expression of BDNF in BDNF-eMSC and iMN 2D co-culture cell lysates on day 10. (e) Protein expression of MAP-2 in BDNF-eMSC and iMN 2D co-culture cell lysates on day 10 (BDNF-eMSC: n = 3, iMN: n = 3, BDNF-eMSC + iMN: n = 3). (f) Quantification of neurite outgrowth on day 10 in 3D co-cultured spheroids (BDNF-eMSC: n = 6, iMN: n = 6, BDNF-eMSC + iMN: n = 6). (g) Representative IF images of SMI-32 and of BDNF in BDNF-eMSC and iMN 2D co-culture on day 10. (h) Representative IF images of MAP-2 in BDNF-eMSC and iMN 2D co-culture on day 10, and representative fluorescence images of neurite outgrowth on day 10 in 3D co-cultured spheroids. Full-length Western blot images are presented in Additional file 4: Fig. 4. The data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney (†) test with the least significant difference post hoc analysis (*); *, † P < 0.05; **, †† P < 0.01; *** P < 0.001. Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; iMN, induced pluripotent stem cell-derived motor neuron; IF, immunofluorescence staining; WB, Western blot

BDNF-eMSC and iMN supported neural circuitry and connections in 3D co-culture spheroids in vitro

We showed that combined transplantation of BDNF-eMSC and iMNP increased the differentiation of mature motor neurons and the growth density of neuronal processes in the injured SC; however, it was difficult to confirm the possible mechanism in vivo. We hypothesized that the 3D co-culture of BDNF-eMSC and iMN would increase neural circuitry and connections during MN differentiation and maturation. To test this hypothesis, we analyzed the induction of neural circuitry and connections by co-culturing BDNF-eMSC and iMN on 3D spheroid platforms during the differentiation of mature iMN. Synaptic connections and local neural networks were assessed in the 3D co-culture using synapsin-1, Tuj-1, and MAP-2 markers and MEA analysis on day 10 (Fig. 7a). IF staining showed that synapsin-1, Tuj-1, and MAP-2 expression was higher in the BDNF-eMSC + iMN group than in the BDNF-eMSC and iMN groups (Fig. 7b-f). Dendrite connections between spheroids were confirmed in the iMN and BDNF-eMSC + iMN groups, but not in the BDNF-eMSC group (Fig. 7b and c). IF staining confirmed that the fluorescence intensity of synapsin-1, Tuj-1, and MAP-2 in the iMN and BDNF-eMSC + iMN groups was significantly higher than that in the BDNF-eMSC group (Fig. 7d-f). The electrophysiology of the 3D spheroids during mature MN differentiation was confirmed using MEA. Neural spikes and activity were increased in the iMN and BDNF-eMSC + iMN groups compared with the BDNF-eMSC group, and MEA showed a higher number of spikes in the iMN and BDNF-eMSC + iMN groups than in the BDNF-eMSC group (Fig. 7g-i). Taken together, these results confirm successful neural circuitry and connections during the maturation of MN generated from the 3D co-culture platform.

Fig. 7 Synergistic promotion of synaptic connections and neural networks by the BDNF-eMSC and iMN 3D co-culture platform during the differentiation of mature motor neurons. (a) Schematic of the assessment of BDNF-eMSC and iMN 3D co-culture in promoting motor neuron differentiation, synaptic connections, and neural networking using IF staining and MEA analysis in vitro. (b) Representative IF images of synapsin-1 and Tuj-1 in 3D co-culture on day 10. (c) Representative IF images of synapsin-1 and MAP-2 in 3D co-culture on day 10. (d) Quantification of the fluorescence intensity of synapsin-1 in 3D co-culture on day 10. (e) Quantification of the fluorescence intensity of Tuj-1 in 3D co-culture on day 10. (f) Quantification of the fluorescence intensity of MAP-2 in 3D co-culture on day 10. (g) Representative heatmap images for plate-wide visualization of spike or beat rates and amplitudes on MEAs (3D BDNF-eMSC, n = 4; 3D iMN, n = 4; 3D BDNF-eMSC + iMN, n = 4). (h) Measurement of active electrodes per well. (i) Measurement of the average number of spikes on active electrodes per well (3D BDNF-eMSC, n = 4; 3D iMN, n = 4; 3D BDNF-eMSC + iMN, n = 4). Data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney (†) test with the least significant difference post hoc analysis (*); *, † P < 0.05; **, †† P < 0.01; *** P < 0.001. Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; iMN, induced pluripotent stem cell-derived motor neuron; IF, immunofluorescence staining; MEA, multi-electrode array

Discussion

Current pharmacological or physical rehabilitation-based therapies for chronic SCI are limited and primarily focus on managing symptoms such as pain or muscle stiffness. Moreover, treatments that can clinically improve motor or sensory function in patients with chronic traumatic SCI are lacking [ 31 , 32 , 33 , 34 , 35 ]. Clinical therapeutic trials for overcoming SCI include neuronal protection and regeneration approaches. Neuroregenerative trials have sought to provide exogenous cell supplementation using various stem cells in chronic SCI models; these include Schwann cells, OECs, MSCs, neural stem/progenitor cells, ESCs, and iPSC-derived cells [ 3 , 8 ]. Trials aimed at identifying the ideal cell types and transplantation strategies in chronic SCI are required to achieve effective stem cell transplantation in SCI. In this study, we investigated the combined transplantation of BDNF-eMSC and iMNP in a chronic SCI model.

MSC transplantation in chronic SCI offers a promising neuroregenerative strategy. Previous studies have reported significant clinical improvement with MSC transplantation in a chronic SCI model [ 16 ]. However, the efficacy of cell engraftment is low owing to the distinct pathology of chronic SCI; therefore, modulation of the microenvironment in chronic SCI is required to enhance the efficacy of transplantation [ 30 ]. Based on previous studies, we aimed to increase the efficacy of cell engraftment and differentiation at the lesion site using BDNF-eMSCs in a chronic SCI model. Previous studies have reported the neuroprotective therapeutic effects of BDNF-eMSC in rat models of neonatal hypoxic-ischemic injury, traumatic brain injury, and neurogenic bladder [ 17 , 18 , 25 , 30 ]. However, in addition to therapeutic efficacy, safety is a critical issue for successful clinical translation. Because BDNF-eMSC was established using a lentiviral vector encoding the reprogramming factor c-Myc, tumorigenicity is a major concern for its in vivo use. Previous findings suggest that BDNF-eMSC is safe and ready to use, with therapeutic efficacy confirmed in rat models of neonatal hypoxic-ischemic injury and traumatic brain injury as well as in cardiac repair [ 17 , 18 , 24 ]. We performed an in vitro assay to assess BDNF expression in BDNF-eMSC and naïve MSC before cell transplantation. Our data suggest that BDNF-eMSC expressed substantially more BDNF than naïve MSC (Fig. 1c-f). After cell seeding, BDNF expression decreased on day 7 compared with day 2, but sustained BDNF expression was confirmed (Fig. 1d-g).

Recovery of motor function in chronic SCI models is still limited. Therefore, combination treatment strategies involving various approaches must be considered to improve motor function in chronic SCI models [ 36 ]. We attempted a combination cell transplantation strategy involving BDNF-eMSC and iMNP to increase the survival of engrafted cells at the lesion site and enhance the differentiation capacity of motor neurons in a chronic SCI model. We used iMNP cells to promote motor neuron differentiation at the lesion site. We generated iPSC-derived motor neurons using a previously reported small molecule approach [ 28 , 29 ]. Our results suggest that iPSC-derived motor neurons were successfully differentiated in vitro (Fig. 2a-c), and the iMNP cell phenotype was confirmed before cell transplantation using the OLIG2 marker (Fig. 2d). We performed combination cell transplantation of irradiated BDNF-eMSC and iMNP in a contusive chronic SCI model. At 6 weeks post-injury, the BDNF-eMSC + iMNP group received both cell types (1:1) in 10 µL PBS at the lesion site. Interestingly, BDNF-eMSC and iMNP combination cell transplantation improved clinical recovery and incidence rate compared with the PBS and single-cell transplantation groups in the chronic SCI model (Fig. 3a and b). At 12 weeks post-injury, we discovered that transplanted iMNP in the iMNP and BDNF-eMSC + iMNP groups remained at the lesion site, but no BDNF-eMSC were detected (Fig. 3c). A previous study reported that irradiated hepatocyte growth factor (HGF)-overexpressing engineered mesenchymal stem cells (HGF-eMSC) exhibited decreased proliferation rates in vitro, and no tumor was detected in in vivo tumorigenicity testing using nude mice [ 24 ]. Our results revealed that irradiated BDNF-eMSC in culture exhibited decreased BDNF expression on day 7 (Fig. 1g) and that, on microscopic observation 6 weeks after implantation, most of the BDNF-eMSC did not remain at the lesion site (Fig. 3c). These results suggest that genetically engineered cells are suitable for combination cell transplantation strategies in chronic SCI models.

Previous studies have reported that allogenic bone marrow-derived MSC transplantation without cell manipulation in acute and chronic SCI mainly results in astrocytic differentiation at the lesion site [ 16 , 37 , 38 ]. We found that BDNF-eMSC transplantation in chronic SCI increased oligodendrocyte and neuronal cells compared with PBS; however, the BDNF-eMSC group showed less neuronal differentiation at the lesion site than the iMNP group (Fig. 5a-f). Another study reported that human iMNP cell transplantation in an acute SCI model resulted in transplanted human iMNP with a motor neuron lineage of mixed maturation state in the ventral horns [ 23 ]. We observed higher SMI-32 expression in transplanted iMNP cells in the iMNP and BDNF-eMSC + iMNP groups than in the PBS and BDNF-eMSC groups at 12 weeks post-injury (Fig. 4a and b). Our results suggest that iMNP influences mature motor neuron differentiation at the lesion site more directly than BDNF-eMSC. Transplanted hMNP cells have been reported to increase endogenous neuronal survival and promote neurite branching [ 23 ]. We confirmed that BDNF-eMSC and iMNP combination cell transplantation increased axonal regeneration, as indicated by MAP-2 expression, compared with PBS. The BDNF-eMSC + iMNP group showed significantly higher MAP-2 expression than the iMNP group at the lesion site (Fig. 4c and d). Our results suggest that mature motor neuron differentiation and the growth density of neuronal processes are enhanced by the synergistic effects of BDNF-eMSC and iMNP combination cell transplantation at the lesion site in the chronic SCI model (Figs. 3b and 4b and d). In this study, we confirmed that transplanted cells at the lesion site could promote the growth density of neuronal processes using the MAP-2 and GAP-43 markers. A limitation, however, is that we could not detect axonal regeneration using a retrovirus or the anterograde tracer biotinylated dextran amine (BDA). In future studies, axonal regeneration should be assessed using the anterograde tracer BDA at the lesion site after cell transplantation.

Other studies have reported that motor neurons respond to neurotrophic cues and express and secrete growth factors. Moreover, hMNPs express and secrete neurotrophic factors that promote axonal growth and protect neurons from cell death [ 22 , 23 , 39 ]. We confirmed that mature motor neuron differentiation and BDNF expression were increased at the lesion site by BDNF-eMSC + iMNP combination cell transplantation in chronic SCI (Figs. 3h and 4a and b). In addition, axonal regeneration was promoted at the lesion site (Fig. 4d). However, it was complicated to confirm the possible mechanism in vivo. We hypothesized that BDNF-eMSC and iMN might synergistically promote neurite outgrowth induction during motor neuron differentiation and maturation through BDNF expression. To test this hypothesis, we co-cultured BDNF-eMSC and iMN in 2D and 3D spheroid platforms during mature motor neuron differentiation and assessed neurite outgrowth in vitro. Consistent with previous studies, the BDNF-eMSC + iMN group had significantly higher mature motor neuron differentiation and BDNF expression than the iMN group. In addition, neurite outgrowth was significantly promoted in the BDNF-eMSC + iMN group (Fig. 6f and h). These findings indicate that BDNF-eMSC exerts a paracrine effect on motor neuron differentiation and the promotion of neurite outgrowth (Fig. 6f, g and h). Our results suggest that BDNF-eMSC and iMN co-cultures play an essential role in promoting mature motor neuron differentiation and neurite outgrowth. Additionally, the in vitro assay confirmed that the co-culture of BDNF-eMSC and iMN could promote functional synaptic connections and neural networks during the differentiation of mature motor neurons. These results show successful neural circuitry and connections during the maturation of MN generated from the 3D co-culture platform. We confirmed that the synergistic effect of BDNF-eMSC + iMN promoted the differentiation of mature motor neurons and neural networks in vitro (Fig. 7). However, future studies should confirm that transplanted cells promote functional synaptic connections and local neural networks at the lesion site after transplantation of the combination of BDNF-eMSC and iMNP cells.

In summary, this study confirms that behavioral abilities were recovered through the induction of mature motor neuron differentiation at the lesion site by transplanting a combination of BDNF-eMSC and iMNP cells in a chronic SCI model, suggesting the therapeutic efficacy of a transplantation strategy combining genetically engineered cells and iPSC-derived cells in a chronic SCI rat model. However, a limitation of this study is the lack of explanation of the mechanisms supporting the synergistic effect of transplanting combined genetically engineered cells and iPSC-derived cells in the chronic SCI model. In future studies, it will be necessary to confirm the mechanisms of these synergistic effects using RNA sequencing (RNA-seq) or single-cell analysis at the lesion site. In addition, the effect on neural regeneration of motor neuron differentiation and BDNF expression according to the cell ratio and number of transplanted cells should be studied. To reduce variation in animal experiments, a sample size sufficient for statistical analysis should be calculated in advance using free software packages (e.g., G*Power or PS: Power and Sample Size).
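
As an illustration of such an a priori calculation, the snippet below solves for the total sample size of a four-group one-way ANOVA design using statsmodels in place of G*Power; the effect size, alpha, and power are assumptions chosen only for the example.

```python
# Hedged sketch of an a priori sample-size calculation for a four-group design.
from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()
n_total = analysis.solve_power(effect_size=0.8,   # Cohen's f (large effect, assumed)
                               alpha=0.05, power=0.8, k_groups=4)
print(f"total animals: {n_total:.0f} (~{n_total / 4:.0f} per group)")
```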

To our knowledge, this study demonstrates that combination cell transplantation of BDNF-eMSC and iMNP improves behavioral recovery in a chronic SCI model. At 12 weeks post-injury, the transplanted iMNP predominantly differentiated into mature motor neurons, whereas BDNF-eMSC exerted a paracrine effect on neuronal regeneration, as evidenced by BDNF expression at the lesion site. Both in vivo and in vitro, the combination of BDNF-eMSC and iMNP played a crucial role in motor neuron maturation and axonal regeneration through BDNF expression. Overall, our findings provide proof of concept that stem cell-based gene therapy and combination cell transplantation can enhance motor neuron maturation and BDNF expression in chronic SCI.

Data availability

All datasets of this study are included within the article.

Abbreviations

AVMA: American Veterinary Medical Association
ARRIVE: Animal Research: Reporting of In Vivo Experiments
BBB: Blood-brain barrier
BBB: Basso–Beattie–Bresnahan
BSA: Bovine serum albumin
BDNF: Brain-derived neurotrophic factor
CBMC: Cord blood mononuclear cells
DMEM: Dulbecco's modified Eagle's medium
DMH1: Dorsomorphin homologue 1
DAPI: 4′,6-diamidino-2-phenylindole
eMSC: Engineered mesenchymal stem cells
ESC: Embryonic stem cell
hMSC: Human mesenchymal stem cells
HGF: Hepatocyte growth factor
iPSC: Induced pluripotent stem cell
iNEP: Induced pluripotent stem cell-derived neuroepithelial progenitor
iMNP: Induced pluripotent stem cell-derived motor neuron progenitor cells
iMN: Induced pluripotent stem cell-derived motor neuron
iMature MN: Induced pluripotent stem cell-derived mature motor neurons
IL: Intralesional
IF: Immunofluorescence
MN: Motor neuron cells
MASCIS: Multicenter Animal Spinal Cord Injury Study
MAP-2: Microtubule-associated protein-2
NSPC: Neural stem and progenitor cells
OEC: Olfactory ensheathing cells
OCT: Optimal cutting temperature
PBS: Phosphate-buffered saline
PFA: Paraformaldehyde
PMSF: Phenylmethylsulfonyl fluoride
Pur: Purmorphamine
RT: Room temperature
RA: Retinoic acid
SCI: Spinal cord injury
SC: Spinal cord
SD: Sprague–Dawley
tTA: Tetracycline transactivator
TBS: Tris-buffered saline
TBST: Tris-buffered saline with 0.05% Tween-20
tSCI: Traumatic spinal cord injury
WB: Western blot

References

1. Quadri SA, Farooqui M, Ikram A, Zafar A, Khan MA, Suriya SS, Claus CF, Fiani B, Rahman M, Ramachandran A, et al. Recent update on basic mechanisms of spinal cord injury. Neurosurg Rev. 2020;43:425–41.
2. Barbiellini Amidei C, Salmaso L, Bellio S, Saia M. Epidemiology of traumatic spinal cord injury: a large population-based study. Spinal Cord. 2022;60:812–9.
3. Kim YH, Ha KY, Kim SI. Spinal cord injury and related clinical trials. Clin Orthop Surg. 2017;9:1–9.
4. Pang QM, Chen SY, Xu QJ, Fu SP, Yang YC, Zou WH, Zhang M, Liu J, Wan WH, Peng JC, Zhang T. Neuroinflammation and scarring after spinal cord injury: therapeutic roles of MSCs on inflammation and glial scar. Front Immunol. 2021;12:751021.
5. Al Mamun A, Monalisa I, Tul Kubra K, Akter A, Akter J, Sarker T, Munir F, Wu Y, Jia C, Afrin Taniya M, Xiao J. Advances in immunotherapy for the treatment of spinal cord injury. Immunobiology. 2021;226:152033.
6. Eli I, Lerner DP, Ghogawala Z. Acute traumatic spinal cord injury. Neurol Clin. 2021;39:471–88.
7. Fischer I, Dulin JN, Lane MA. Transplanting neural progenitor cells to restore connectivity after spinal cord injury. Nat Rev Neurosci. 2020;21:366–83.
8. Assinck P, Duncan GJ, Hilton BJ, Plemel JR, Tetzlaff W. Cell transplantation therapy for spinal cord injury. Nat Neurosci. 2017;20:637–47.
9. Tetzlaff W, Okon EB, Karimi-Abdolrezaee S, Hill CE, Sparling JS, Plemel JR, Plunet WT, Tsai EC, Baptiste D, Smithson LJ, et al. A systematic review of cellular transplantation therapies for spinal cord injury. J Neurotrauma. 2011;28:1611–82.
10. Lu P, Woodruff G, Wang Y, Graham L, Hunt M, Wu D, Boehle E, Ahmad R, Poplawski G, Brock J, et al. Long-distance axonal growth from human induced pluripotent stem cells after spinal cord injury. Neuron. 2014;83:789–96.
11. Yang N, Zuchero JB, Ahlenius H, Marro S, Ng YH, Vierbuchen T, Hawkins JS, Geissler R, Barres BA, Wernig M. Generation of oligodendroglial cells by direct lineage conversion. Nat Biotechnol. 2013;31:434–9.
12. Takahashi K, Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006;126:663–76.
13. Kramer AS, Harvey AR, Plant GW, Hodgetts SI. Systematic review of induced pluripotent stem cell technology as a potential clinical therapy for spinal cord injury. Cell Transpl. 2013;22:571–617.
14. Sun L, Wang F, Chen H, Liu D, Qu T, Li X, Xu D, Liu F, Yin Z, Chen Y. Co-transplantation of human umbilical cord mesenchymal stem cells and human neural stem cells improves the outcome in rats with spinal cord injury. Cell Transpl. 2019;28:893–906.
15. Siddiqui AM, Khazaei M, Fehlings MG. Translating mechanisms of neuroprotection, regeneration, and repair to treatment of spinal cord injury. Prog Brain Res. 2015;218:15–54.
16. Kim JW, Ha KY, Molon JN, Kim YH. Bone marrow-derived mesenchymal stem cell transplantation for chronic spinal cord injury in rats: comparative study between intralesional and intravenous transplantation. Spine (Phila Pa 1976). 2013;38:E1065–74.
17. Ahn SY, Sung DK, Chang YS, Sung SI, Kim YE, Kim HJ, Lee SM, Park WS. BDNF-overexpressing engineered mesenchymal stem cells enhances their therapeutic efficacy against severe neonatal hypoxic ischemic brain injury. Int J Mol Sci. 2021;22.
18. Choi BY, Hong DK, Kang BS, Lee SH, Choi S, Kim HJ, Lee SM, Suh SW. Engineered mesenchymal stem cells over-expressing BDNF protect the brain from traumatic brain injury-induced neuronal death, neurological deficits, and cognitive impairments. Pharmaceuticals (Basel). 2023;16.
19. Ahn SY, Sung DK, Kim YE, Sung S, Chang YS, Park WS. Brain-derived neurotropic factor mediates neuroprotection of mesenchymal stem cell-derived extracellular vesicles against severe intraventricular hemorrhage in newborn rats. Stem Cells Transl Med. 2021;10:374–84.
20. Khazaei M, Siddiqui AM, Fehlings MG. The potential for iPS-derived stem cells as a therapeutic strategy for spinal cord injury: opportunities and challenges. J Clin Med. 2014;4:37–65.
21. Nogradi A, Pajer K, Marton G. The role of embryonic motoneuron transplants to restore the lost motor function of the injured spinal cord. Ann Anat. 2011;193:362–70.
22. Lukovic D, Valdes-Sanchez L, Sanchez-Vera I, Moreno-Manzano V, Stojkovic M, Bhattacharya SS, Erceg S. Brief report: astrogliosis promotes functional recovery of completely transected spinal cord following transplantation of hESC-derived oligodendrocyte and motoneuron progenitors. Stem Cells. 2014;32:594–9.
23. Rossi SL, Nistor G, Wyatt T, Yin HZ, Poole AJ, Weiss JH, Gardener MJ, Dijkstra S, Fischer DF, Keirstead HS. Histological and functional benefit following transplantation of motor neuron progenitors to the injured rat spinal cord. PLoS ONE. 2010;5:e11852.
24. Park BW, Jung SH, Das S, Lee SM, Park JH, Kim H, Hwang JW, Lee S, Kim HJ, Kim HY, et al. In vivo priming of human mesenchymal stem cells with hepatocyte growth factor-engineered mesenchymal stem cells promotes therapeutic potential for cardiac repair. Sci Adv. 2020;6:eaay6994.
25. Tian WJ, Jeon SH, Zhu GQ, Kwon EB, Kim GE, Bae WJ, Cho HJ, Ha US, Hong SH, Lee JY, et al. Effect of high-BDNF microenvironment stem cells therapy on neurogenic bladder model in rats. Transl Androl Urol. 2021;10:345–55.
26. Rim YA, Nam Y, Ju JH. Application of cord blood and cord blood-derived induced pluripotent stem cells for cartilage regeneration. Cell Transpl. 2019;28:529–37.
27. Nam Y, Rim YA, Jung SM, Ju JH. Cord blood cell-derived iPSCs as a new candidate for chondrogenic differentiation and cartilage regeneration. Stem Cell Res Ther. 2017;8:16.
28. Du ZW, Chen H, Liu H, Lu J, Qian K, Huang CL, Zhong X, Fan F, Zhang SC. Generation and expansion of highly pure motor neuron progenitors from human pluripotent stem cells. Nat Commun. 2015;6:6626.
29. Li XJ, Du ZW, Zarnowska ED, Pankratz M, Hansen LO, Pearce RA, Zhang SC. Specification of motoneurons from human embryonic stem cells. Nat Biotechnol. 2005;23:215–21.
30. Lee JY, Ha KY, Kim JW, Seo JY, Kim YH. Does extracorporeal shock wave introduce alteration of microenvironment in cell therapy for chronic spinal cord injury? Spine (Phila Pa 1976). 2014;39:E1553–9.
31. Curtis E, Martin JR, Gabel B, Sidhu N, Rzesiewicz TK, Mandeville R, Van Gorp S, Leerink M, Tadokoro T, Marsala S, et al. A first-in-human, phase I study of neural stem cell transplantation for chronic spinal cord injury. Cell Stem Cell. 2018;22:941–950.e6.
32. Saulino M, Averna JF. Evaluation and management of SCI-associated pain. Curr Pain Headache Rep. 2016;20:53.
33. Gwak YS, Kim HY, Lee BH, Yang CH. Combined approaches for the relief of spinal cord injury-induced neuropathic pain. Complement Ther Med. 2016;25:27–33.
34. McIntyre A, Mays R, Mehta S, Janzen S, Townson A, Hsieh J, Wolfe D, Teasell R. Examining the effectiveness of intrathecal baclofen on spasticity in individuals with chronic spinal cord injury: a systematic review. J Spinal Cord Med. 2014;37:11–8.
35. Emamhadi M, Alijani B, Andalib S. Long-term clinical outcomes of spinal accessory nerve transfer to the suprascapular nerve in patients with brachial plexus palsy. Acta Neurochir (Wien). 2016;158:1801–6.
36. Gomes-Osman J, Cortes M, Guest J, Pascual-Leone A. A systematic review of experimental strategies aimed at improving motor function after acute and chronic spinal cord injury. J Neurotrauma. 2016;33:425–38.
37. Kim YC, Kim YH, Kim JW, Ha KY. Transplantation of mesenchymal stem cells for acute spinal cord injury in rats: comparative study between intralesional injection and scaffold-based transplantation. J Korean Med Sci. 2016;31:1373–82.
38. Kang ES, Ha KY, Kim YH. Fate of transplanted bone marrow derived mesenchymal stem cells following spinal cord injury in rats by transplantation routes. J Korean Med Sci. 2012;27:586–93.
39. Erceg S, Ronaghi M, Oria M, Rosello MG, Arago MA, Lopez MG, Radojevic I, Moreno-Manzano V, Rodriguez-Jimenez FJ, Bhattacharya SS, et al. Transplanted oligodendrocytes and motoneuron progenitors generated from human embryonic stem cells promote locomotor recovery after spinal cord transection. Stem Cells. 2010;28:1541–9.

Download references

Acknowledgements

Not applicable.

Funding

This work was supported by a grant from the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT, & Future Planning (grant numbers NRF-2019R1A5A2027588 and NRF-2021R1C1C2004688). This research was also supported by a grant from the Catholic Institute of Cell Therapy (CRC) in 2024 and by the Basic Medical Science Facilitation Program through the Catholic Medical Center of the Catholic University of Korea, funded by the Catholic Education Foundation. The funding bodies played no role in the design of the study; the collection, analysis, and interpretation of data; or the writing of the manuscript.

Author information

Authors and Affiliations

CiSTEM laboratory, Catholic iPSC Research Center (CiRC), College of Medicine, The Catholic University of Korea, Seoul, 137-701, Republic of Korea

Jang-Woon Kim, Yeri Alice Rim, Se In Jung, Jooyoung Lim & Ji Hyeon Ju

Department of Biomedicine & Health Science, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea

Division of Rheumatology, Department of Internal Medicine, Seoul St. Mary’s Hospital, Institute of Medical Science, College of Medicine, The Catholic University of Korea, Seoul, 137-701, Republic of Korea

Ji Hyeon Ju

YiPSCELL, Inc., Seoul, Republic of Korea

Juryun Kim, Yoojun Nam, Hyewon Kim & Ji Hyeon Ju

SL BiGen, Inc., Incheon, Republic of Korea

Soon Min Lee, Young Chul Sung & Hyo-Jin Kim


Contributions

Study design: J.-W.K., J.R.K., S.M.L., Y.C.S., H.-J.K., and J.H.J.; data collection: J.-W.K., J.R.K., Y.A.R., Y.N., H.K., S.I.J., J.L, and J.H.J.; data analysis: J.-W.K., J.R.K., S.M.L., Y.C.S., Y.A.R., and J.H.J.; drafting manuscript: J.-W.K. and J.H.J.

Corresponding author

Correspondence to Ji Hyeon Ju .

Ethics declarations

Ethics approval and consent to participate

Title of the approved project: Efficacy evaluation on neuronal protection and regeneration of combined treatment of MNP and BM-102 in contusive spinal cord injury model.

Name of the institutional approval committee: The Animal Studies Committee of the School of Medicine, The Catholic University of Korea.

Approval number: IACUC approval number CUMC-2020-0364-04.

Date of approval: 29 December 2021.

Consent for publication

Not applicable.

Conflict of interest

J.K., Y.N., and H.K. are employees of YiPSCELL, Inc., and J.H.J. is the founder of YiPSCELL, Inc., who also works at Seoul St. Mary's Hospital, The Catholic University of Korea. S.M.L., Y.C.S., and H.-J.K. are employees of SL BiGen, Inc. The authors declare that these affiliations entail no competing interests and that they have no other competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

13287_2024_3770_MOESM3_ESM.mp4

Supplementary Movie 1: BBB locomotor scales to evaluate the clinical recovery of behavior for 12 weeks post injury in PBS group (separate Movie file)

13287_2024_3770_MOESM4_ESM.mp4

Supplementary Movie 2: BBB locomotor scales to evaluate the clinical recovery of behavior for 12 weeks post injury in BDNF eMSC group (separate Movie file)

13287_2024_3770_MOESM5_ESM.mp4

Supplementary Movie 3: BBB locomotor scales to evaluate the clinical recovery of behavior for 12 weeks post injury in iMNP group (separate Movie file)

13287_2024_3770_MOESM6_ESM.mp4

Supplementary Movie 4: BBB locomotor scales to evaluate the clinical recovery of behavior for 12 weeks post injury in BDNF eMSC+iMNP group (separate Movie file)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Kim, JW., Kim, J., Lee, S.M. et al. Combination of induced pluripotent stem cell-derived motor neuron progenitor cells with irradiated brain-derived neurotrophic factor over-expressing engineered mesenchymal stem cells enhanced restoration of axonal regeneration in a chronic spinal cord injury rat model. Stem Cell Res Ther 15 , 173 (2024). https://doi.org/10.1186/s13287-024-03770-9


Received: 28 November 2023

Accepted: 26 May 2024

Published: 18 June 2024

DOI: https://doi.org/10.1186/s13287-024-03770-9


Keywords:

  • BDNF over-expressing engineered mesenchymal stem cell
  • Induced pluripotent stem cell-derived motor neuron progenitor cell
  • Chronic spinal cord injury
  • Combination cell transplantation

Stem Cell Research & Therapy

ISSN: 1757-6512


Original Research Article

Experimental Investigation on Pedestrian Walking Load in Steel Footbridges


  • 1 China Construction Steel Engineering Co. Ltd., Shenzhen, China
  • 2 China Construction Steel Structure Guangdongcorp.ltd., Huizhou, China
  • 3 China Railway Guangzhou Group Co. Ltd., Guangzhou, China
  • 4 Dongguan Hongchuan Intelligent Logistics Development Co. Ltd., Dongguan, China
  • 5 School of Transportation and Civil Engineering and Architecture, Foshan University, Foshan, China

Accurate simulation of walking load is of great significance for human-induced vibration analyses. However, accurate pedestrian walking load data from long-span footbridges are scarce, and data reliability depends on the sensor used for measurement. In this work, the 102 m span Yanluo Footbridge was adopted as the test site, and Xnode high-precision wireless acceleration sensors were used for the measurements. An experimental investigation of walking loads was performed based on a bipedal walking force model. Single-person and multi-person walking tests were carried out at Yanluo Footbridge to measure the corresponding stride frequencies and dynamic load factors, and the acceleration time histories of walking pedestrians were accurately recorded with the three-axis wireless acceleration sensor Xnode. The equation of the dynamic load factor was derived by analyzing the time histories and power spectra, and design models of pedestrian walking load and crowd load were developed from a large body of experimental data. The time histories of pedestrian walking loads showed regular periodic changes, and the dynamic load factor increased with stride frequency. Using the walking load model developed in this work, a reliable structural response can be obtained in human-induced vibration analysis.

Introduction

To measure pedestrian force, the time-history curve of the footstep load was traditionally observed directly using force plates, walking machines, and other instruments, and its characteristics were analyzed to develop a mathematical model. Harper et al. (1961) developed a force plate to measure the time-history curves of the vertical, lateral, and longitudinal components of the footstep load of an individual pedestrian, which was the first experimental measurement of pedestrian load. Owing to the early development of force plates and the high accuracy of their measured data, many researchers have applied and improved them, and the technology has matured. However, the large volume and fixed position of the instrument limit the free walking of the tester. By contrast, a wireless acceleration sensor fixed to the body, for example at the waist or leg, is more convenient for collecting data on human movement characteristics, and pedestrian movement is not limited by the bulkiness of the device. With the rapid development of electronic technology, the production cost of wireless sensors is decreasing, attracting the attention of researchers around the world. For example, Song et al. applied acceleration sensors to develop activity recognition systems capable of identifying daily activities such as running, walking, sitting, standing, lying down, and falling by fixing a sensor module connected to a mobile phone on a belt, achieving a recognition accuracy of up to 95.5%. A human daily activity recognition system was designed using three-axis acceleration sensors ( Khan et al., 2010 ); by placing the recognition module at the waist, in a coat pocket, and so on, daily physical activities such as going up and down stairs, resting, and riding could be identified with an accuracy of 95.96%. Ailisto et al. (2005) applied a triaxial acceleration sensor fixed at the waist to collect acceleration signals during walking and compared the collected signals with stored template signals; they found that the average error rate was only 6.4%. In addition, accurate pedestrian walking load data from long-span footbridges, such as arch bridges ( Lu et al., 2021a ; Lu et al., 2021b ; Yang et al., 2022a ), are scarce, and data reliability depends on the sensor used for measurement.

Acceleration sensors can effectively record responses containing human motion information, and many researchers have conducted related studies ( Zhu et al., 2018 ; Chen et al., 2019 ; Wang et al., 2019 ; Mohammed and Pavic, 2021 ; Paweł et al., 2021 ; Xiong et al., 2021 ; Chen et al., 2022 ). However, few studies are available on the indirect measurement of walking force using acceleration sensors. Several approaches have been designed to measure walking force with acceleration detection systems based on a 2DOF bipedal walking force model ( Ebrahimpour et al., 1996 ; Živanović et al., 2005 ; Bachus et al., 2006 ; Geyer et al., 2006 ; Gurney et al., 2008 ; Jones et al., 2011 ), and these confirmed that such systems can effectively measure walking force and walking characteristics. However, those studies simplified the human body as a double particle and measured body accelerations at the double center of mass, and the test procedure was relatively complex: the accelerations of the mass centers of only three test pedestrians were obtained under free walking, and the numbers of test pedestrians and test conditions were inadequate, making it impossible to demonstrate the differences in the walking forces of testers with different human characteristic parameters. Building on previous research based on the bipedal walking force model, in this work the human body was considered as a single particle, and three-axis wireless acceleration sensors measured body motion accelerations at the center of mass and at the two legs. Furthermore, the dynamic load factor of pedestrian walking load, a fundamental parameter describing the harmonic amplitude, was obtained from a large number of experimental data. Using the walking load model developed in this work, a reliable structural response can be obtained in human-induced vibration analysis. Since pedestrian load can be regarded as an impact load or a periodic load, the results of this experiment can provide a standardized load model for the response of bridge structures under impact load ( Yang et al., 2021a ) or periodic load ( Yang et al., 2022b ). In addition, the experiments described in this paper provide test results for the formulation of regulations related to structural vibrations in China and technical support for vibration displacement monitoring of long-span structures ( Lu et al., 2020 ) and composite structures ( Yang et al., 2021b ).

In this paper, an experimental investigation of pedestrian walking and crowd loads was performed and a corresponding mechanical model was developed. Theoretical studies were carried out using the bipedal walking force model. In the experimental investigation, single-person and multi-person walking tests were performed at Yanluo Footbridge, and the corresponding stride frequencies and dynamic load factors were evaluated. With the three-axis wireless acceleration sensor Xnode, the acceleration time histories of pedestrian walking were accurately recorded. The dynamic load factor equation was then obtained by analyzing the time histories and power spectra. Finally, design models were established for pedestrian walking and crowd loads based on a large number of experimental data.

Bipedal Walking Force Model

Several pedestrian models have been developed to study pedestrian load ( Wei and Griffin, 1998 ; Živanović et al., 2005 ; Racic et al., 2009 ; Venuti and Bruno, 2009 ; Bocian et al., 2013 ; Han et al., 2017 ; Gao et al., 2018 ; Han et al., 2021 ). A bipedal force model was established by Geyer et al. (2006) on the basis of the spring-mass model, taking into account the gait characteristics of bipedal support during walking. As shown in Figure 1 , this model ignores foot rotation and simplifies the human body into a single-particle system: the legs are idealized as two independent massless springs supporting a point mass. Both springs have a stiffness, a rest length, and a constant angle relative to gravity (g is the gravitational acceleration) during the swing phase. The figure shows a complete gait cycle, including the two stages of single-leg support and bipedal support; the bipedal stage runs from the heel of the right foot touching the ground (TD) to the tip of the left foot lifting off the ground (TO). The leg that is pulled off the ground from the tip of the foot until its heel lands again is known as the trailing leg, while the other leg is called the leading leg. Geyer proved that the model could simulate the bimodal vertical walking force and the vertical displacement of the center of mass.


FIGURE 1 . Biped walking force model (Gurney et al.).

Qin et al. (2013) introduced damping parameters into Geyer's bipedal model to take human-structure interaction into account. As shown in Figure 2 , the two legs are simplified as massless spring-damper rods that move independently. While moving, the two spring legs absorb the impact energy generated when they touch the ground at the impact angle, while simultaneously providing propulsion. In the single-foot support case, the trailing leg is not in contact with the ground; at this moment its spring is in the original state, so its elastic recovery and damping forces are zero. Meanwhile, the leading leg is in contact with the ground and its spring is compressed, so that the elastic recovery and damping forces balance the gravity and inertia forces of the human body.


FIGURE 2 . Damped biped walking force model (Chen et al.).

Principle of Test Design

The bipedal model of biomechanics simplifies the human body as a point mass, with the two legs idealized as a massless system of two springs with damping. The elastic and damping forces generated when the pedestrian touches the ground in motion are in balance with the gravity and inertia forces of the human body. As presented in Figure 3 , the human body mass is represented by the mass m h at the center of mass. In the bipedal model, the human body and the ground are regarded as a whole for force analysis. When the influence of air damping is ignored, the balance equations of the gravity, inertia force, and ground reaction force of the human body are

$$F'_z = m_h\,(g + \ddot{z}), \qquad F'_y = m_h\,\ddot{y}$$

where $\ddot{z}$ and $\ddot{y}$ are the vertical and longitudinal accelerations of the center of mass, respectively, $F'_z$ and $F'_y$ denote the vertical and longitudinal reactions of the ground, respectively, and g is the acceleration due to gravity.


FIGURE 3 . Mechanical analysis.

According to the derived balance equations, a three-axis acceleration sensor was fixed on the pedestrian's waist to measure the vertical and longitudinal accelerations of the pedestrian when walking, from which the ground reaction force could be calculated. The walking force of the pedestrian was then obtained from the action-reaction relationship as

$$F_z(t) = F'_z = m_h\,\big(g + \ddot{z}(t)\big), \qquad F_y(t) = F'_y = m_h\,\ddot{y}(t)$$

Furthermore, by fixing the three-axis acceleration sensor on the left or right leg of the pedestrian, the vertical acceleration when the left or right leg touched the ground could be effectively measured.
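As a minimal numerical sketch of these relations (not the authors' processing code), the snippet below converts a synthetic vertical acceleration record of the center of mass into a walking-force time history; the body mass, sampling rate, and acceleration signal are illustrative assumptions.

```python
import numpy as np

# Illustrative assumptions only: the mass, sampling rate, and acceleration
# record are placeholders, not measured Xnode data.
m_h = 70.0   # body mass at the center of mass [kg]
g = 9.81     # gravitational acceleration [m/s^2]
fs = 100.0   # sensor sampling rate [Hz]

t = np.arange(0.0, 10.0, 1.0 / fs)
# Stand-in for the measured vertical acceleration of the mass center
# (the Xnode's vertical channel in the tests).
z_ddot = 0.4 * g * np.sin(2 * np.pi * 1.8 * t)

# Reconstructed balance relation: F_z(t) = m_h * (g + z''(t))
F_z = m_h * (g + z_ddot)

# Non-dimensional load F_w(t)/G, the quantity plotted in the time histories
G = m_h * g
F_w_over_G = F_z / G - 1.0
print(round(F_w_over_G.max(), 3))  # amplitude of the synthetic dynamic part
```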

Pedestrian walking load is difficult to predict, and its frequency and magnitude can change significantly. Walking is generally characterized as a regular, predominantly horizontal human body motion in which at least one foot is always in contact with the ground, at a stride frequency ranging from 1.4 to 2.5 Hz ( Jones et al., 2011 ). A typical Fourier series was used to represent the periodic pedestrian walking load, expressed as

$$F_p(t) = G + F_w(t)$$

where G is the human body weight and F w ( t ) is the dynamic component of the pedestrian walking load, stated as

$$F_w(t) = G\sum_{i=1}^{3} A_i \sin\!\left(2\pi i f_w t - \phi_i\right)$$

where f w is the stride frequency of the walking pedestrian, A 1 , A 2 and A 3 are the first, second, and third dynamic load factors, and ϕ 1 , ϕ 2 and ϕ 3 are the three associated phase lags in radians, respectively. The values of these factors were taken from reference [30].

Therefore, in order to develop the pedestrian walking load model, the first dynamic load factor had to be evaluated. In addition, the expansion of pedestrian walking load models calibrated for individuals into models for crowd loads is an important subject. According to Eq. 5 , the crowd load model could be expressed as

$$F_c(t) = Q\left[1 + A_1 \sin\!\left(2\pi f_w t\right)\right]$$

where Q is the weight of the associated crowd. In the experiments, both the first dynamic load factor A 1 and the stride frequency f w of the crowd load had to be tested.
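As a sketch of how the series above can be evaluated, the snippet below computes F p ( t ) for one pedestrian; the DLF and phase values are placeholders, since the paper takes them from its reference [30], which is not reproduced here, and the weight of 686 N is tester 1's value reported later in the text.

```python
import numpy as np

def walking_load(t, G, f_w, dlfs, phases):
    """Three-harmonic Fourier walking-load model: F_p(t) = G + F_w(t)."""
    F_w = sum(G * A * np.sin(2 * np.pi * (i + 1) * f_w * t - phi)
              for i, (A, phi) in enumerate(zip(dlfs, phases)))
    return G + F_w

t = np.linspace(0.0, 2.0, 400)
# Placeholder DLFs and phases; the paper's values come from reference [30].
F_p = walking_load(t, G=686.0, f_w=1.8,
                   dlfs=(0.4, 0.1, 0.1), phases=(0.0, 0.0, 0.0))
print(round(F_p.min(), 1), round(F_p.max(), 1))
```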

Experimental Research

The major instruments adopted in this work included the three-axis wireless acceleration sensors, a gateway node, and a computer with a corresponding terminal program installed. The terminal program enabled the computer to communicate with the Xnode gateway node, as shown in Figure 4 . Embedor's proprietary synchronized distributed sensing framework can precisely deliver synchronized sensed data from thousands of distributed sensor channels. The wireless communication protocol between Xnodes and gateways enables highly accurate time synchronization with 50-microsecond precision and ensures reliable, lossless data transfer under any operating conditions. Furthermore, each Xnode can be configured either as a sensor node or as a gateway to coordinate and maintain wireless transmissions across a network of distributed wireless sensor nodes. This modular and versatile sensor platform enables wireless data acquisition and processing for data-intensive applications (high resolution and high sampling rate) such as structural health monitoring, manufacturing and monitoring of industrial equipment, and seismic sensing. The sensor board employs a 24-bit ADC (Texas Instruments ADS131E8) with eight channels, allowing sampling rates of up to 16 kHz. The device is equipped with an ultra-compact low-power triaxial accelerometer, and its technical parameters are summarized in Table 1 . The Xnode wireless triaxial acceleration sensor can obtain the acceleration along three directions at any position on the human body. Studies have shown that the human body's center of mass is closest to the waist and abdomen. Therefore, in the experiments, three sensors were placed on the back of the tester's waist and on the two legs. The X-axis of the sensor was aligned with the vertical direction of the human body to obtain the vertical acceleration of the mass center, the Y-axis was aligned with the horizontal (lateral) direction, and the Z-axis was aligned with the forward direction.


FIGURE 4 . Xnode wireless sensor.


TABLE 1 . Xnode performance parameters.

Finite Element Analysis

The Yanluo Footbridge in Shenzhen, a level-ground steel footbridge with a 102 m span, was selected as the test site, as shown in Figure 5 . The bridge was designed to connect the dormitory and work areas for Foxconn workers; the flow of people is therefore relatively large and people walk in a hurry. Using MIDAS Civil for eigenvalue analysis, the natural vibration characteristics of the first three modes were calculated. The natural frequencies and periods are summarized in Table 2 , and the first three mode shapes are shown in Figure 6 .


FIGURE 5 . Yanluo footbridge.


TABLE 2 . The first 10 natural vibration frequencies and periods.


FIGURE 6 . Diagram of the first 3 modes: (A) 1 order vibration mode; (B) 2 order vibration mode; (C) 3 order vibration mode.

When the frequency of the pedestrian load is close to a natural vibration frequency of the footbridge, pedestrian-bridge resonance may occur. It can be seen from Figure 6 that the third-order natural vibration frequency of the bridge was relatively close to the pedestrian stride frequency, but further analysis was required.
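Such a resonance screen amounts to checking each natural frequency against the 1.4–2.5 Hz walking band cited above; a minimal sketch follows, with placeholder mode frequencies standing in for the Table 2 values.

```python
# Placeholder mode frequencies; the actual values are those computed
# by MIDAS Civil and listed in Table 2.
modes_hz = [0.9, 1.3, 2.1]
walking_band = (1.4, 2.5)  # pedestrian stride-frequency band [Hz]

for order, f_n in enumerate(modes_hz, start=1):
    at_risk = walking_band[0] <= f_n <= walking_band[1]
    status = "check for resonance" if at_risk else "outside walking band"
    print(f"Mode {order}: {f_n:.2f} Hz -> {status}")
```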

Walking Force Test for the Single Person

To avoid interference from other pedestrians, the tests were performed at night. Two men and two women without walking impairments were recruited for the test. The physical characteristics of the testers were collected and recorded before the test, as shown in Table 3 . In order to simultaneously observe the change of leg acceleration during walking, three wireless acceleration sensors were attached with bandages to the waist and to the front of the left and right thighs of each tester.


TABLE 3 . Physical characteristic values of testers.

In the experiments, fixed walking frequencies of 1.4, 1.6, 1.8, 2.0, 2.2, and 2.5 Hz and two groups of random (free) walking were selected, giving eight working conditions in all; a metronome controlled the fixed walking frequency. After the sensors were fixed, the testers performed adaptive walking training until they could walk normally with the instruments attached. After the child node switch was turned on and the gateway node had sent out the test instruction, the tester followed the metronome and walked uniformly along the test route. After finishing the fixed-cadence tests, the testers walked in a uniform straight line at a cadence chosen according to their own walking habits. During the tests, the step count and walking time of each tester were recorded and processed; the results are summarized in Tables 4–7.


TABLE 4 . Experimental data of tester 1 walking at different step frequencies.


TABLE 5 . Experimental data of tester 2 walking at different step frequencies.


TABLE 6 . Experimental data of tester 3 walking at different step frequencies.


TABLE 7 . Experimental data of tester 4 walking at different step frequencies.

From the experimental data given in Tables 4–7, it can be seen that the walking speeds of the four testers gradually increased with stride frequency, consistent with theoretical expectations. By calculation, the average stride frequency of the four testers over a total of eight free walks was 1.8224 Hz, consistent with the average stride frequency of 1.82 Hz experimentally obtained for more than 2,000 students in reference [31]. At the same time, the average free walking speed of the four testers was calculated to be 1.288 m/s, which was also consistent with the findings of previous studies; since walking speed equals the product of step length and stride frequency, these averages imply a step length of about 1.288 / 1.8224 ≈ 0.71 m. Comparing the data from the four groups, it was found that the average free walking speed of the two men was greater than that of the two women. The step length of the men was longer because they had longer legs; therefore, at the same stride frequency, the walking speed of the men was considerably greater than that of the women.

Multi-Person Walking Test

The selection of the site and time of the tests was consistent with the single-person walking load test procedure. In addition to the four testers evaluated in the single-person walking force tests, two additional testers with no movement disorders and different heights and weights were selected for the multi-person tests. The physical characteristics of the testers are summarized in Table 8 .


TABLE 8 . Physical characteristics of testers.

Unlike the walking load tests of individual pedestrians, it was necessary to conduct synchronous adaptability training before the tests and to attach an acceleration sensor to the back of the waist of each tester during the walking tests. First, two testers were required to walk in single file less than 60 cm apart, because larger spacing would reduce the test to a one-pedestrian walking test. After several tests with the two testers walking in a line, the testers walked side by side in two rows. Finally, tests with 4 and 6 people were performed in the same manner. The walking time and step count of the testers under all conditions were recorded throughout the multi-person walking tests, as presented in Table 9 .


TABLE 9 . Experimental data of synchronized multi-person walking.

As summarized in Tables 5–9, both the walking time and the number of steps increased with the number of testers, whether they walked in one column or two, which indicates that the walking speed and step length of pedestrians crossing the bridge decrease with increasing crowd density under normal circumstances. The total walking steps and walking time for walking in opposite directions were smaller than those for one or two lines, possibly because the testers were far apart before the crossing point and did not interfere with each other.

Analysis of Test Results

Time History Analysis

The dynamic load factor (DLF) of pedestrian walking load, also known as the first harmonic dynamic load factor, is a basic parameter describing the harmonic amplitude of the dynamic load. The dynamic load amplitude is the product of the pedestrian weight and the DLF; for example, a 686 N pedestrian walking with a DLF of 0.2 exerts a dynamic amplitude of about 137 N. To illustrate the effects of stride frequency and of the measuring position on the body on the time history and DLF of pedestrian walking load, the time histories of the non-dimensional pedestrian walking load F w ( t ) / G of tester 1 were plotted for different stride frequencies and measuring positions, as shown in Figure 7 , where the stride frequency f w was set at 1.4 and 1.8 Hz and the measuring positions were the centroid, the left leg, and the right leg.


FIGURE 7 . Time histories of non-dimensional pedestrian walking loads for different stride frequencies and measuring positions: (A) 1.4Hz; (B) 1.8 Hz.

It can be seen from Figure 7 that when the measuring position was at the centroid, the time history showed regular periodic changes, and the period of the time-history curve at a stride frequency of 1.4 Hz was longer than that at 1.8 Hz. When the measuring position was on the left or right leg, half of each period of the time-history curve showed regular periodic changes while the other half was irregular: the irregular half corresponds to the foot being off the ground, and the regular half to one or both feet touching the ground. In addition, the DLF, defined as the amplitude of the time-history curve of the non-dimensional pedestrian walking load (also shown in Figure 7 ), was smaller at a stride frequency of 1.4 Hz than at 1.8 Hz, and the DLF obtained at the two leg positions was higher than that at the centroid. Furthermore, to illustrate the effect of human weight on the time history and DLF of pedestrian walking load, the time histories of the non-dimensional pedestrian walking load F w ( t ) / G were plotted for testers with weights G = 686, 549, 519, and 441 N at a stride frequency f w = 1.6 Hz, as shown in Figure 8 .


FIGURE 8 . Time histories of non-dimensional pedestrian walking loads for different testers.

It can be seen from Figure 8 that the amplitudes of the time-history curves of the four testers differed little, with the tester weighing G = 686 N having the highest value. This indicates that the amplitude of the time-history curve increases only slightly with human weight; therefore, the effect of human weight was ignored in the pedestrian walking load model. Furthermore, to illustrate the effect of multi-person walking on the time history and DLF of pedestrian walking load, the time histories of the non-dimensional pedestrian walking load F w ( t ) / G were plotted for testers 1, 2, 3, and 6 walking together, as shown in Figure 9 . It can be seen that when pedestrians walked together, the range of stride frequencies f w was narrow, i.e., 1.7–1.9 Hz, and the amplitudes of the time-history curves of all testers were very close. This indicates that crowd loads have very narrow ranges of stride frequency and dynamic load factor; accordingly, based on the test results, the stride frequency of the crowd load could be taken as 1.8 Hz.


FIGURE 9 . Time histories of non-dimensional pedestrian walking loads for multi-person walking.

Power Spectrum Analysis

To further investigate the DLF of pedestrian walking loads, the power spectra of the walking loads were analyzed. To illustrate the effects of stride frequency and measuring position on the power spectra and DLFs, the power spectra of the walking load of tester 1 at the centroid, left leg, and right leg measuring positions were plotted for stride frequencies f w = 1.4 Hz ( Figure 10A ), 1.6 Hz ( Figure 10B ), 1.8 Hz ( Figure 10C ), and 2.0 Hz ( Figure 10D ). It can be seen from Figure 10 that when the measuring position was at the centroid, the frequency segment near the stride frequency presented a peak in the corresponding power spectrum curve, which was taken as the DLF A 1 and which increased with stride frequency f w . When the measuring position was on the left or right leg, the frequency segments near 0.5 f w , f w , and 1.5 f w presented three peaks in the corresponding power spectrum curve, defined as A 0.5 , A 1 , and A 1.5 , respectively, all of which increased with stride frequency f w . Since each leg is in contact with the ground for only half of the time, the lowest frequency at which the power spectrum for a leg measuring position has a peak is close to 0.5 f w .


FIGURE 10 . Power spectra of pedestrian walking loads for stride frequencies f w of 1.4, 1.6, 1.8, and 2.0 Hz.
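As a minimal sketch of this kind of spectral DLF extraction, the snippet below runs an FFT on a synthetic non-dimensional load with a known first DLF and reads the peak near the stride frequency off as A 1 ; the signal and its parameters are assumptions, not the recorded test data.

```python
import numpy as np

fs = 100.0                       # sampling rate [Hz]
t = np.arange(0.0, 60.0, 1.0 / fs)
f_w = 1.8                        # stride frequency [Hz]
rng = np.random.default_rng(0)
# Synthetic non-dimensional load with a known first DLF of 0.4 plus noise
load = 0.4 * np.sin(2 * np.pi * f_w * t) + 0.02 * rng.standard_normal(t.size)

# One-sided amplitude spectrum; the peak near f_w estimates the first DLF A1
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)
amp = 2.0 * np.abs(np.fft.rfft(load)) / t.size

band = (freqs > 0.8 * f_w) & (freqs < 1.2 * f_w)
A1 = amp[band].max()
f_peak = freqs[band][amp[band].argmax()]
print(f"Estimated A1 = {A1:.3f} at {f_peak:.3f} Hz")
```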

To illustrate the effect of multi-person walking on the power spectra and DLFs of pedestrian walking loads, the power spectra were plotted for testers 1 and 2 walking in a line and side by side ( Figure 11 ), testers 1, 2, 3, and 6 walking in two lines ( Figure 12A ) and in one line ( Figure 12B ), and testers 1–6 walking in two lines ( Figure 13A ) and in one line ( Figure 13B ). Figures 11–13 show that when pedestrians walked together, the frequency segment near the stride frequency presented a peak in the corresponding power spectrum curve, the DLF A 1 , which had a narrow range of about 0.168–0.254. The stride frequencies f w of pedestrians walking together also had a narrow range, of about 1.782–1.904 Hz.


FIGURE 11 . Power spectra of pedestrian walking loads for two testers in a line or side by side.


FIGURE 12 . Power spectra of pedestrian walking loads for four testers.


FIGURE 13 . Power spectra of pedestrian walking loads for six testers.

Design Equations of Pedestrian Walking Loads and Crowd Loads

Based on the test results, design equations of pedestrian walking loads and crowd loads are developed in this section.

Eq. 6 can be used to build the design equation of pedestrian walking loads, which can be given by

$$F(t) = G\left[1 + A_1 \sin\!\left(2\pi f_w t\right)\right]$$

where A 1 is the DLF of pedestrian walking loads obtained from the power spectrum analysis. In order to obtain an equation for A 1 , the values of A 1 obtained under the various test conditions are plotted against the stride frequency f w in Figure 14 . The upper and lower bounds of the relation between A 1 and f w were then obtained by linear fitting, as also shown in Figure 14 . For safety reasons, the upper bound was adopted as the design equation of the DLF of pedestrian walking loads.


FIGURE 14 . The relation between A1 and fw as well as corresponding fitting curve.
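The bounding-line fit can be reproduced with a least-squares line shifted to envelop the scatter, as in the sketch below; the (f w , A 1 ) pairs are placeholders standing in for the Figure 14 test data, so the printed coefficients are illustrative, not the paper's design values.

```python
import numpy as np

# Placeholder (f_w, A1) pairs standing in for the test data of Figure 14
f_w = np.array([1.4, 1.6, 1.8, 2.0, 2.2, 2.5])
A1 = np.array([0.12, 0.17, 0.21, 0.26, 0.31, 0.38])

# Least-squares line through the scatter
slope, intercept = np.polyfit(f_w, A1, 1)

# Shift the line up so that every point lies on or below it, giving the
# conservative upper bound used for the design DLF equation
offset = (A1 - (slope * f_w + intercept)).max()
print(f"Upper bound: A1 = {slope:.3f} * f_w + {intercept + offset:+.3f}")
```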

Furthermore, Eq. 8 can be used to derive the design equation of crowd loads. The stride frequency f w and DLF A 1 of the crowd loads were taken as the mid-values of the test data. According to Figures 11–13, the mid-value of the stride frequency f w in the crowd load tests was 1.843 Hz, and that of the DLF A 1 was 0.211. Therefore, the design equation of crowd loads was expressed as

$$F(t) = Q\left[1 + 0.211 \sin\!\left(2\pi \times 1.843\, t\right)\right]$$

Conclusion

This paper presented an experimental investigation of pedestrian walking and crowd loads on the Yanluo Footbridge. A theoretical study was carried out using a bipedal walking force model. In the experimental investigation, both single-person and multi-person walking tests were performed, and the corresponding stride frequencies and dynamic load factors were evaluated. The average stride frequency and walking speed of the four testers over eight free walks in the individual pedestrian tests were consistent with those reported in previous studies, which verified the reliability of the tests. Design equations of pedestrian walking loads and crowd loads were obtained by analyzing the time histories and power spectra. The walking forces of the different testers varied according to similar rules, but differences in walking habits and human characteristic parameters (such as height, weight, and leg length) caused the walking force to change. The dynamic load factor increased with stride frequency; the mid-value of the stride frequency f w in the crowd load tests was 1.843 Hz, and that of the DLF A 1 was 0.211.

The experiments described in this paper provide a standardized load model for the response of bridge structures under impact or periodic loads, supply test results for the formulation of regulations related to structural vibrations in China, and provide technical support for the vibration displacement monitoring of long-span and composite structures.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Ethics Statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Author Contributions

DD, XZ, and HL contributed to conception and design of the study. ZW organized the database. DD performed the statistical analysis. DD and XZ wrote the first draft of the manuscript. DD, ZW, XZ, and HL wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Conflict of Interest

Author DD is employed by China Construction Steel Engineering Co., Ltd., Author ZW is employed by China Railway Guangzhou Group Co., Ltd., Author XZ is employed by Dongguan Hongchuan Intelligent Logistics Development Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The authors would like to thank all participants who provided support for our approach.

Ailisto, H. J., Lindholm, M., Mantyjarvi, J., and Mäkelä, S. M. (2005). “Identifying People from Gait Pattern with Accelerometers,” in Proceedings of SPIE - The International Society for Optical Engineering , 28 March 2005 (Orlando, Florida, United States: SPIE ), 5779. doi:10.1117/12.603331


Bachus, K. N., Demarco, A. L., Judd, K. T., Horwitz, D. S., and Brodke, D. S. (2006). Measuring Contact Area, Force, and Pressure for Bioengineering Applications: Using Fuji Film and TekScan Systems. Med. Eng. Phys. 28 (5), 483–488. doi:10.1016/j.medengphy.2005.07.022


Bocian, M., Macdonald, J. H. G., and Burn, J. F. (2013). Biomechanically Inspired Modeling of Pedestrian-Induced Vertical Self-Excited Forces. J. Bridge Eng. 18, 1336–1346. doi:10.1061/(asce)be.1943-5592.0000490

Chen, J., Han, Z., and Brownjhn, J. (2019). Human Shaker Modal Testing Technology via Wearable Inertial Measurement Units. J. Vib. Eng. 32 (4), 644–652. doi:10.16385/j.cnki.issn.1004-4523.201904.011

Chen, Z., Chen, Z., Zhang, X., Huang, S., and Chen, Z. (2022). Dynamic Response and Vibration Reduction of Steel Truss Corridor Pedestrian Bridge Under Pedestrian Load. Front. Mater. , 31.


Ebrahimpour, A., Hamam, A., Sack, R. L., and Patten, W. N. (1996). Measuring and Modeling Dynamic Loads Imposed by Moving Crowds. J. Struct. Eng. 122 (12), 1468–1474. doi:10.1061/(asce)0733-9445(1996)122:12(1468)

Gao, Y. A., Yang, Q. S., and Dong, Y. (2018). A Three-Dimensional Pedestrian-Structure Interaction Model for General Applications. Int. J. Struct. Stable. Dyn. 18 (9), 1850107. doi:10.1142/s0219455418501079

Geyer, H., Seyfarth, A., and Blickhan, R. (2006). Compliant Leg Behaviour Explains Basic Dynamics of Walking and Running. Proc. R. Soc. B 273 (1603), 2861–2867. doi:10.1098/rspb.2006.3637

Gurney, J. K., Kersting, U. G., and Rosenbaum, D. (2008). Between-Day Reliability of Repeated Plantar Pressure Distribution Measurements in a Normal Population. Gait Posture 27 (4), 706–709. doi:10.1016/j.gaitpost.2007.07.002

Han, H. X., Zhou, D., and Ji, T. (2017). Mechanical Parameters of Standing Body and Applications in Human-Structure Interaction. Int. J. Appl. Mech. 9 (2), 1750021. doi:10.1142/s1758825117500211

Han, H., Zhou, D., Ji, T., and Zhang, J. (2021). Modelling of Lateral Forces Generated by Pedestrians Walking Across Footbridges. Appl. Math. Model. 89, 1775–1791. doi:10.1016/j.apm.2020.08.081

Harper, F. C., Warlow, W. J., and Clarke, B. L. (1961). The Forces Applied to the Floor by the Foot in Walking. London: Department of Scientific and Industrial Research, Building Research Station, 495–497.

Jones, C. A., Reynolds, P., and Pavic, A. (2011). Vibration Serviceability of Stadia Structures Subjected to Dynamic Crowd Loads: A Literature Review. J. Sound Vib. 330 (8), 1531–1566. doi:10.1016/j.jsv.2010.10.032

Khan, A. M., Lee, Y. K., Lee, S. Y., and Kim, T. S. (2010). A Triaxial Accelerometer-Based Physical-Activity Recognition via Augmented-Signal Features and a Hierarchical Recognizer. IEEE Trans. Inf. Technol. Biomed. 14 (5), 1166–1172. doi:10.1109/titb.2010.2051955

Lu, H., Liu, L., Liu, A., Pi, Y.-L., Bradford, M. A., and Huang, Y. (2020). Effects of Movement and Rotation of Supports on Nonlinear Instability of Fixed Shallow Arches. Thin-Walled Struct. 155, 106909. doi:10.1016/j.tws.2020.106909

Lu, H., Zhou, J., Sahmani, S., and Safaei, B. (2021a). Nonlinear Stability of Axially Compressed Couple Stress-Based Composite Micropanels Reinforced with Random Checkerboard Nanofillers. Phys. Scr. 96 (12), 125703. doi:10.1088/1402-4896/ac1d7f

Lu, H., Zhou, J., Yang, Z., Liu, A., and Zhu, J. (2021b). Nonlinear Buckling of Fixed Functionally Graded Material Arches Under a Locally Uniformly Distributed Radial Load. Front. Mater. 8, 310. doi:10.3389/fmats.2021.731627

Mohammed, A., and Pavic, A. (2021). Human-Structure Dynamic Interaction Between Building Floors and Walking Occupants in Vertical Direction. Mech. Syst. Signal Process. 147, 107036. doi:10.1016/j.ymssp.2020.107036

Paweł, H., Roberto, P., Rafaela, S., and Silva, F. (2021). Vertical Vibrations of Footbridges Due to Group Loading: Effect of Pedestrian-Structure Interaction. Appl. Sci. 11, 1–16. doi:10.3390/app11041355

Qin, J. W., Law, S. S., Yang, Q. S., and Yang, N. (2013). Pedestrian-Bridge Dynamic Interaction, Including Human Participation. J. Sound Vib. 332 (4), 1107–1124. doi:10.1016/j.jsv.2012.09.021

Racic, V., Pavic, A., and Brownjohn, J. M. W. (2009). Experimental Identification and Analytical Modelling of Human Walking Forces: Literature Review. J. Sound Vib. 326, 1–49. doi:10.1016/j.jsv.2009.04.020

Venuti, F., and Bruno, L. (2009). Crowd-Structure Interaction in Lively Footbridges Under Synchronous Lateral Excitation: A Literature Review. Phys. Life Rev. 6, 176–206. doi:10.1016/j.plrev.2009.07.001

Wang, Q., Song, Z. G., and Wang, Z. Y. (2019). Tests for Measuring Vertical Pedestrian Loads Using Acceleration Sensors. J. Vib. Shock 38 (1), 215–220. doi:10.13465/j.cnki.jvs.2019.01.031

Wei, L., and Griffin, M. J. (1998). Mathematical Models for the Apparent Mass of the Seated Human Body Exposed to Vertical Vibration. J. Sound Vib. 212, 855–874. doi:10.1006/jsvi.1997.1473

Xiong, J., Chen, J., and Caprani, C. (2021). Spectral Analysis of Human-Structure Interaction During Crowd Jumping. Appl. Math. Model. 89, 610–626. doi:10.1016/j.apm.2020.07.030

Yang, Z., Liu, A., Lai, S.-K., Safaei, B., Lv, J., Huang, Y., et al. (2022a). Thermally Induced Instability on Asymmetric Buckling Analysis of Pinned-Fixed FG-GPLRC Arches. Eng. Struct. 250, 113243. doi:10.1016/j.engstruct.2021.113243

Yang, Z., Lu, H., Sahmani, S., and Safaei, B. (2021b). Isogeometric Couple Stress Continuum-Based Linear and Nonlinear Flexural Responses of Functionally Graded Composite Microplates with Variable Thickness. Archives Civ. Mech. Eng. 21 (3), 1–19. doi:10.1007/s43452-021-00264-w

Yang, Z., Safaei, B., Sahmani, S., and Zhang, Y. (2022b). A Couple-Stress-Based Moving Kriging Meshfree Shell Model for Axial Postbuckling Analysis of Random Checkerboard Composite Cylindrical Microshells. Thin-Walled Struct. 170, 108631. doi:10.1016/j.tws.2021.108631

Yang, Z., Wu, D., Yang, J., Lai, S. K., Lv, J., Liu, A., et al. (2021a). Dynamic Buckling of Rotationally Restrained FG Porous Arches Reinforced with Graphene Nanoplatelets Under a Uniform Step Load. Thin-Walled Struct. 166, 108103. doi:10.1016/j.tws.2021.108103

Zhu, Q. K., Chen, K., Du, Y. F., Jach, M., and Drodza, M. (2018). A Pedestrian up and Down Stairs Biodynamic Model Based on the Measured Data. J. Vib. Shock 37 (4), 233–239. doi:10.1155/2020/8015465

Živanović, S., Pavic, A., and Reynolds, P. (2005). Vibration Serviceability of Footbridges under Human-Induced Excitation: a Literature Review. J. Sound Vib. 279, 1–74. doi:10.1016/j.jsv.2004.01.019

Keywords: bipedal walking force model, pedestrian walking load, wireless sensor, stride frequency, dynamic load factor

Citation: Deng D, Wang Z, Zhang X and Lin H (2022) Experimental Investigation on Pedestrian Walking Load in Steel Footbridges. Front. Mater. 9:922545. doi: 10.3389/fmats.2022.922545

Received: 18 April 2022; Accepted: 16 May 2022; Published: 16 June 2022.


Copyright © 2022 Deng, Wang, Zhang and Lin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Deyuan Deng, [email protected]

This article is part of the Research Topic

Advanced Steel and Composite Structures in Civil Engineering Volume II

