In the complete factorial experiment, breath, choose, prep, and notes were significant. The true main effect of stakes was small; with N = 320 this design had little power to detect it. Audience was marginally significant at α = .15, although the data were generated with this effect set at exactly zero. In the individual experiments approach, only choose was significant, and breath was marginally significant. The results for the comparative treatment experiment were similar to those of the individual experiments approach, as would be expected given that the two have identical aliasing. An additional effect was marginally significant in the comparative treatment approach, reflecting the additional statistical power associated with this design as compared to the individual experiments approach. In the constructive treatment experiment none of the factors were significant at α = .05. There were two marginally significant effects, breath and notes.
In the Resolution III design every effect except prep was significant. One of these, the significant effect of audience, was a spurious result (probably caused by aliasing with the prep × stakes interaction). By contrast, the results of the Resolution IV and VI designs were very similar to those of the complete factorial, except that in the Resolution VI design stakes was significant. In the individual experiments and single factor approaches, the estimates of the coefficients varied considerably from the true values. In the fractional factorial designs the estimates of the coefficients tended to be closer to the true values, particularly in the Resolution IV and Resolution VI designs.
Table 8 shows estimates of interactions from the designs that enable such estimates, namely the complete factorial design and the Resolution IV and Resolution VI factorial designs. The breath × prep interaction was significant in all three designs. The breath × choose interaction was significant in the complete factorial and the Resolution VI fractional factorial but was estimated as zero in the Resolution IV design. In general the coefficients for these interactions were very similar across the three designs. An exception was the coefficient for the breath × choose interaction, and, to a lesser degree, the coefficient for the breath × notes interaction.
| Design | breath × audience | breath × choose | breath × prep | breath × notes | breath × stakes |
|---|---|---|---|---|---|
| Truth | 0.00 | -0.15 | 0.25 | -0.15 | 0.00 |
| Complete factorial | -0.03 | -0.25 | 0.29 | -0.07 | -0.03 |
| Res. IV fractional | -0.03 | 0.00 | 0.29 | -0.16 | -0.02 |
| Res. VI fractional | 0.02 | -0.25 | 0.29 | -0.07 | 0.04 |
Differences observed among the designs in estimates of coefficients are due to differences in aliasing plus a minor random disturbance due to reallocating the error terms when each new experiment was simulated, as described above. In general, more aliasing was associated with greater deviations from the true coefficient values. No effects were aliased in the complete factorial design, which had coefficient estimates closest to the true values. In the Resolution IV design each effect was aliased with three other effects, all of them interactions of three or more factors, and in the Resolution VI design each effect was aliased with one other effect, an interaction of four or more factors. These designs also had coefficient estimates very close to the true values. The Resolution III fractional factorial design, which aliased each effect with seven other effects, had coefficient estimates somewhat farther from the true values. The coefficient estimates associated with the individual experiments and single factor approaches were farthest from the true values of the main effect coefficients. In the individual experiments and single factor approaches each effect was aliased with 31 other effects (the main effect of a factor was aliased with all the interactions involving that factor, from the two-way up to the six-way). The comparative treatment and constructive treatment approaches aliased the same number of effects but differed in the coding of the aliased effects (as can be seen in Table 2), which is why their coefficient estimates differed.
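The alias counts above follow from the defining relation of each fraction. The sketch below is a hypothetical illustration, not the article's own computation: it labels the six factors A through F and assumes common textbook generators for each resolution (D = AB, E = AC, F = BC for Resolution III; E = ABC, F = BCD for Resolution IV; F = ABCDE for Resolution VI). The designs used in the article may have been generated differently. Multiplying two effects cancels any squared letter, which is the symmetric difference of their letter sets.

```python
from itertools import combinations
from math import comb


def defining_words(generators):
    """Nonidentity words of the defining relation, as frozensets of factor letters.

    A generator ("E", "ABC") means E = ABC, i.e. the word ABCE. Multiplying two
    words cancels squared letters, which is the symmetric difference of the sets.
    """
    base = [frozenset(rhs) | {lhs} for lhs, rhs in generators]
    words = set()
    for r in range(1, len(base) + 1):
        for combo in combinations(base, r):
            word = frozenset()
            for g in combo:
                word ^= g
            words.add(word)
    return words


def summarize(generators, effect="A"):
    """Return (resolution, number of aliases of `effect`, lowest order among them)."""
    words = defining_words(generators)
    aliases = {frozenset(effect) ^ w for w in words}
    return min(len(w) for w in words), len(aliases), min(len(a) for a in aliases)


# Hypothetical generators, one common choice per resolution for six factors A-F
DESIGNS = {
    "Resolution III, 8 conditions": [("D", "AB"), ("E", "AC"), ("F", "BC")],
    "Resolution IV, 16 conditions": [("E", "ABC"), ("F", "BCD")],
    "Resolution VI, 32 conditions": [("F", "ABCDE")],
}

for name, gens in DESIGNS.items():
    res, n_alias, lowest = summarize(gens)
    print(f"{name}: resolution {res}, A aliased with {n_alias} effects (lowest order {lowest})")

# Per the text, an Off-vs-On comparison for factor A is aliased with every
# interaction containing A: two-way through six-way among six factors.
print("interactions involving A:", sum(comb(5, j) for j in range(1, 6)))
```

Each extra generator doubles the size of every alias set, which is why the counts step from 1 to 3 to 7 as the fraction shrinks from 32 to 16 to 8 conditions.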
Although the seven experiments had the same overall sample size N, they differed in statistical power. The complete and fractional factorial experiments, which had identical statistical power, were the most powerful. Next most powerful were the comparative treatment and constructive treatment designs. The individual experiments approach was the least powerful. These differences in statistical power, along with the differences in coefficient estimates, were reflected in the effects found significant at various levels of α across the designs. Among the designs examined here, the individual experiments approach and the two single factor designs showed the greatest disparities with the complete factorial.
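A rough way to see why the designs rank as they do is to count how many subjects sit at each level of a single main-effect comparison when every design uses N = 320 and k = 6. The sketch assumes equal allocation to conditions, a normal approximation to the test, and a hypothetical standardized effect size d = 0.2; it ignores error degrees of freedom and is not the analysis used in the article.

```python
from statistics import NormalDist


def approx_power(d, n_per_arm, alpha=0.05):
    """Normal-approximation power of a two-sided comparison of two equal arms."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    delta = d * (n_per_arm / 2) ** 0.5  # standardized difference / its standard error
    return z.cdf(delta - crit) + z.cdf(-delta - crit)


def n_per_arm(design, total_n, k):
    """Subjects at each level in one main-effect (Off vs On) comparison, equal allocation."""
    if design == "factorial":  # complete or fractional: every subject informs every main effect
        return total_n / 2
    if design == "single factor":  # k + 1 conditions; each comparison uses two of them
        return total_n / (k + 1)
    if design == "individual experiments":  # k separate two-condition experiments
        return total_n / (2 * k)
    raise ValueError(f"unknown design: {design}")


N, K, D = 320, 6, 0.2  # D is a hypothetical standardized effect size
for design in ("factorial", "single factor", "individual experiments"):
    print(f"{design}: power approx {approx_power(D, n_per_arm(design, N, K)):.2f}")
```

Under these assumptions the approximate powers come out near .43, .16, and .11, in the same order as the ranking described above.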
Given the differences among them in aliasing, it is perhaps no surprise that these designs yielded different effect estimates and hypothesis tests. The research questions that motivate individual experiments and single factor designs, which often involve pairwise contrasts between individual experimental conditions, may not require estimation of main effects per se, so the relatively large differences between the coefficient estimates obtained using these designs and the true main effect coefficients may not be important. Instead, what may be more noteworthy is how few effects these designs detected as significant as compared to the factorial experiments.
Some overall recommendations.
Despite the situation-specific nature of most design decisions, it is possible to offer some general recommendations. When per-subject costs are high in relation to per-condition overhead costs, complete and fractional factorials are usually the most economical designs. When per-condition costs are high in relation to per-subject costs, usually either a fractional factorial or single factor design will be most economical. Which is most economical will depend on considerations such as the number of factors, the sample size required to achieve the desired statistical power, and the particular fractional factorial design being considered.
In the limited set of situations examined in this article, the individual experiments approach emerged as the least economical. Although the individual experiments approach requires many fewer experimental conditions than a complete factorial and usually requires fewer than a fractional factorial, it requires more experimental conditions than a single factor experiment. In addition, it makes the least efficient use of subjects of any of the designs considered in this article. Of course, an individual experiments approach is necessary whenever the results of one experiment must be obtained first in order to inform the design of a subsequent experiment. Except for this application, in general the individual experiments approach is likely to be the least appealing of the designs considered here. Investigators who are planning a series of individual experiments may wish to consider whether any of them can be combined to form a complete or fractional factorial experiment, or whether a single factor design can be used.
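The trade-offs in the two preceding paragraphs can be sketched with a linear cost model: total cost = (conditions × per-condition cost) + (subjects × per-subject cost). The subject counts below are an assumption stated for illustration. Each reduced design is given enough subjects that every Off-vs-On comparison has n/2 subjects per arm, matching the power of a complete factorial with total N = n. The cost figures are hypothetical units, not values from the article.

```python
def requirements(k, n, p):
    """(conditions, subjects) each design needs; a planning sketch, not the article's Table 5.

    k: number of two-level factors; n: total N a complete factorial needs for the
    target power on each main effect; p: fraction exponent of a 2^(k-p) design.
    Assumes each Off-vs-On comparison in a reduced design needs n/2 subjects per
    arm to match the factorial's power (small error-df differences ignored).
    """
    return {
        "complete factorial": (2 ** k, n),
        "fractional factorial": (2 ** (k - p), n),
        "individual experiments": (2 * k, k * n),  # k two-condition experiments of n each
        "single factor": (k + 1, (k + 1) * n // 2),  # n/2 in each of k + 1 conditions
    }


def total_costs(k, n, p, per_subject, per_condition):
    """Total cost of each design when costs are linear in subjects and conditions."""
    return {
        name: conditions * per_condition + subjects * per_subject
        for name, (conditions, subjects) in requirements(k, n, p).items()
    }


# Hypothetical cost units, chosen only to contrast the two regimes in the text
subject_heavy = total_costs(6, 320, 2, per_subject=100, per_condition=50)
condition_heavy = total_costs(6, 320, 2, per_subject=1, per_condition=10_000)
print(min(subject_heavy, key=subject_heavy.get))      # cheapest when subjects are costly
print(min(condition_heavy, key=condition_heavy.get))  # cheapest when conditions are costly
```

With subjects expensive, the 16-condition fractional factorial is cheapest; with conditions expensive, the seven-condition single factor design is. Under these assumptions the individual experiments approach needs more conditions (12 versus 7) and more subjects (1,920 versus 1,120) than the single factor design, so for k ≥ 2 it is dominated on both counts.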
Although factorial experiments with more than two or three factors are currently relatively rare in psychology, we recommend that investigators give such designs serious consideration. All else being equal, the statistical power of a balanced factorial experiment to detect a main effect of a given size is not reduced by the presence of other factors, except to a small degree caused by the reduction of error degrees of freedom in the model. In other words, if main effects are of primary scientific interest and interactions are not of great concern, then factors can be added without needing to increase N appreciably.
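A small check of this claim, assuming effect (-1/+1) coding, equal cell sizes, and a model containing all 2^k main effects and interactions. The sketch builds the columns of the design matrix X, verifies that X'X equals N times the identity, and reports the variance multiplier for a main-effect coefficient (Var = σ²/N) along with the error degrees of freedom, N - 2^k.

```python
from itertools import combinations, product
from math import prod


def main_effect_precision(k, n_total):
    """Check a balanced, replicated 2^k factorial fit with all 2^k effect-coded terms.

    Returns whether X'X = N * I, the variance multiplier for a main-effect
    coefficient (Var = sigma^2 * multiplier), and the error degrees of freedom.
    """
    reps, rem = divmod(n_total, 2 ** k)
    if rem:
        raise ValueError("n_total must be a multiple of 2**k to stay balanced")
    rows = [cell for cell in product((-1, 1), repeat=k) for _ in range(reps)]
    # Intercept, main-effect, and every interaction column (products of -1/+1 codes)
    cols = [
        [prod(row[j] for j in subset) for row in rows]
        for order in range(k + 1)
        for subset in combinations(range(k), order)
    ]
    n = len(rows)
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    identity_times_n = all(
        gram[i][j] == (n if i == j else 0)
        for i in range(len(cols)) for j in range(len(cols))
    )
    return identity_times_n, 1 / gram[1][1], n - len(cols)


for k in range(1, 7):
    ok, var_mult, df = main_effect_precision(k, 320)
    print(f"k={k}: orthogonal={ok}, Var(main effect) = sigma^2 * {var_mult:.6f}, error df={df}")
```

With N = 320 the multiplier stays at 1/320 for k = 1 through 6, while the error degrees of freedom fall from 318 to 256; only that loss of degrees of freedom costs power.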
An interest in interactions is not the only reason to consider using factorial designs; investigators may simply wish to take advantage of the economy these designs afford, even when interactions are expected to be negligible or are not of scientific interest. In particular, investigators who incur high subject costs but relatively modest condition costs may find that a factorial experiment is much more economical than other design alternatives. Investigators faced with an upper limit on the availability of subjects may even find that a factorial experiment enables them to investigate research questions that would otherwise have to be set aside for some time. As Oehlert (2000, p. 171) explained, “[t]here are thus two times when you should use factorial treatment structure—when your factors interact, and when your factors do not interact.”
One of the objectives of this article has been to demonstrate that fractional factorial designs merit consideration for use in psychological research alongside other reduced designs and complete factorial designs. Previous authors have noted that fractional factorial designs may be useful in a variety of areas within the social and behavioral sciences (Landsheer & van den Wittenboer, 2000), such as behavioral medicine (e.g. Allore, Peduzzi, Han, & Tinetti, 2006; Allore, Tinetti, Gill, & Peduzzi, 2005), marketing research (e.g. Holland & Cravens, 1973), epidemiology (Taylor et al., 1994), education (McLean, 1966), human factors (Simon & Roscoe, 1984), and legal psychology (Stolle, Robbennolt, Patry, & Penrod, 2002). Shaw (2004) and Shaw, Festing, Peers, and Furlong (2002) noted that factorial and fractional factorial designs can help to reduce the number of animals that must be used in laboratory research. Cutler, Penrod, and Martens (1987) used a large fractional factorial design to study the effect of context variables on the ability of participants to identify the perpetrator correctly in a video of a simulated robbery. Their experiment included 10 factors and 128 experimental conditions, but only 290 subjects.
As discussed by Allore et al. (2006), Collins, Murphy, Nair, and Strecher (2005), Collins, Murphy, and Strecher (2007), and West et al. (1993), behavioral intervention scientists could build more potent interventions if there were more empirical evidence about which intervention components are contributing to program efficacy, which are not contributing, and which may be detracting from overall efficacy. However, as these authors note, behavioral interventions are generally designed a priori and then evaluated by means of the typical randomized controlled trial (RCT) consisting of a treatment group and a control group (e.g. experimental conditions 8 and 1, respectively, in Table 2). This all-or-nothing approach, also called the treatment package strategy (West et al., 1993), involves the fewest possible experimental conditions, so in one sense it is a very economical design. The trade-off is that all main effects and interactions are aliased with all others. Thus although the treatment package strategy can be used to evaluate whether an intervention is efficacious as a whole, it does not provide direct evidence about any individual intervention component. A factorial design with as many factors as there are distinct intervention components of interest would provide estimates of individual component effects and of interactions among components.
Individual intervention components are likely to have smaller effect sizes than the intervention as a whole (West & Aiken, 1997), in which case sample size requirements will be larger than for a two-experimental-condition RCT. One possibility is to increase power by using a Type I error rate larger than the traditional α = .05, in other words, to tolerate a somewhat larger probability of mistakenly choosing an inactive component for inclusion in the intervention in order to reduce the probability of mistakenly rejecting an active intervention component. Collins et al. (2005, 2007) recommended this and similar tactics as part of a phased experimental strategy aimed at selecting components and levels to comprise an intervention. In this phased experimental strategy, after the new intervention is formed its efficacy is confirmed in an RCT at the conventional α = .05. As Hays (1994, p. 284) has suggested, “In some situations, perhaps, we should be far more attentive to Type II errors and less attentive to setting α at one of the conventional levels.”
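To give a sense of the magnitude, the sketch below estimates the total N a balanced factorial needs for 80% power on one main effect at several values of α. It relies on a normal approximation and a hypothetical standardized effect size d = 0.2 chosen only for illustration; exact t-based calculations give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def total_n_required(d, alpha, power=0.80):
    """Approximate total N so a two-level contrast with N/2 per level reaches `power`.

    Normal approximation for a two-sided test with standardized effect size d;
    the far rejection tail is ignored. In a balanced factorial each main effect
    compares N/2 subjects per level, so the same N serves every factor.
    """
    z = NormalDist()
    n_per_level = 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2
    return 2 * ceil(n_per_level)


for alpha in (0.05, 0.10, 0.15):
    print(f"alpha = {alpha:.2f}: N approx {total_n_required(0.2, alpha)}")  # d = 0.2 is hypothetical
```

Under these assumptions, moving from α = .05 to α = .15 lowers the required N from 786 to 522, roughly a one-third reduction.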
One reason for eschewing a factorial design in favor of the standard two-experimental-condition RCT may be a shortage of the resources needed to implement all the experimental conditions in a complete factorial design. If this is the primary obstacle, it may be possible to overcome it by identifying a fractional factorial design requiring a manageable number of experimental conditions. Fractional factorial designs are particularly well suited to experiments in which the primary objective is to determine which factors out of an array of factors have important effects (where “important” can be defined as “statistically significant,” “effect size greater than d,” or any other reasonable empirical criterion). In engineering these are called screening experiments. For example, suppose an investigator is developing an intervention and wishes to conduct an experiment to ascertain which of a set of possible intervention features are likely to contribute to an overall intervention effect. In most cases an approximate estimate of the effect of an individual factor is sufficient for a screening experiment, as long as the estimate is not so far off as to lead to incorrect inclusion of an intervention feature that has no effect (or, worse, has a negative effect) or incorrect exclusion of a feature that makes a positive contribution. Thus in this context the increased scientific information that can be gained using a fractional factorial design may be an acceptable trade-off against the somewhat reduced estimation precision that can accompany aliasing. (For a Monte Carlo simulation examining the use of a fractional factorial screening experiment in intervention science, see Collins, Chakraborty, Murphy, & Strecher, in press.)
It must be acknowledged that even very economical fractional factorial designs typically require more experimental conditions than intervention scientists routinely consider implementing. In some areas of intervention science, there may be severe restrictions on the number of experimental conditions that can realistically be handled in any one experiment. For example, it may not be reasonable to demand that intervention personnel deliver different versions of the intervention to different subsets of participants, as would be required in any experiment other than the treatment package RCT. Or the intervention may be so complex and demanding, and the context in which it must be delivered so chaotic, that implementing even two experimental conditions well is a remarkable achievement, and trying to implement more would surely result in sharply diminished implementation fidelity (West & Aiken, 1997). Despite the undeniable reality of such difficulties, we wish to suggest that they do not necessarily rule out the use of complete and, in particular, fractional factorial designs across the board in all areas of intervention science. There may be some areas in which a careful analysis of available resources and logistical strategies will suggest that a factorial approach is feasible. One example is Strecher et al. (2008), who described a 16-experimental-condition fractional factorial experiment to investigate five intervention components in a smoking cessation intervention. Another example can be found in Nair et al. (2008), who described a 16-experimental-condition fractional factorial experiment to investigate five features of decision aids for women choosing among breast cancer treatments. Commenting on the Strecher et al. article, Norman (2008) wrote, “The fractional factorial design can provide considerable cost savings for more rapid prototype testing of intervention components and will likely be used more in future health behavior change research” (p. 450). Collins et al. (2005) and Nair et al. (2008) have provided some introductory information on the use of fractional factorial designs in intervention research. Collins et al. (2005, 2007) discussed the use of fractional factorial designs in the context of a phased experimental strategy for building more efficacious behavioral interventions.
One interesting difference between the RCT on the one hand and factorial and fractional factorial designs on the other is that, as compared to the standard RCT, a factorial design assigns a much smaller proportion of subjects to an experimental condition that receives no treatment. In a standard two-arm RCT about half of the experimental subjects will be assigned to some kind of control condition, for example a wait list or the current standard of care. By contrast, in a factorial experiment there is typically only one experimental condition in which all of the factors are set to Off. Thus if the design is a 2³ factorial, say, and subjects are allocated equally across conditions, seven-eighths of the subjects will be assigned to a condition in which at least one of the factors is set to On. If the intervention is sought after and assignment to a control condition is perceived as less desirable than assignment to a treatment condition, there may be better compliance because most subjects will receive some version of an intervention. In fact, it often may be possible to select a fractional factorial design in which there is no experimental condition in which all factors are set to Off.
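The proportions in this paragraph can be checked directly. The sketch below assumes equal allocation of subjects to conditions and codes Off as -1 and On as +1; the two half fractions shown are the standard 2^(3-1) designs, used here only as an example.

```python
from fractions import Fraction
from itertools import product


def share_with_some_treatment(k):
    """Fraction of 2^k conditions with at least one factor On (equal allocation => of subjects)."""
    cells = list(product((False, True), repeat=k))
    return Fraction(sum(any(cell) for cell in cells), len(cells))


def has_all_off(runs):
    """True if some run sets every factor to Off (coded -1)."""
    return any(all(level == -1 for level in run) for run in runs)


# The two half fractions of a 2^3 design, with C generated from A and B (-1/+1 codes)
plus_half = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]    # I = +ABC
minus_half = [(a, b, -a * b) for a, b in product((-1, 1), repeat=2)]  # I = -ABC

print(share_with_some_treatment(3))  # 7/8
print(has_all_off(plus_half), has_all_off(minus_half))
```

The half fraction defined by I = +ABC contains no all-Off condition, while its complement, I = -ABC, does; choosing the sign of the generator is one way to obtain the kind of design mentioned at the end of the paragraph.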
Investigators are often interested in determining whether there are interactions between individual subject characteristics and any of the factors in a factorial or fractional factorial experiment. As an example, suppose an investigator is interested in determining whether gender interacts with the six independent variables in the hypothetical example used in this article. There are two ways this can be accomplished: one is exploratory, and the other is a priori (e.g. Murray, 1998).
In the exploratory approach, after the experiment has been conducted gender is coded and added to the analysis of variance as if it were another factor. Even if the design was originally perfectly balanced, such an addition nearly always results in a substantial disruption of balance. Thus the effect estimates are unlikely to be orthogonal, and so care must be taken in estimating the sums of squares. If a reduced design was used, it is important to be aware of what effects, if any, are aliased with the interactions being examined. In most fractional factorial experiments the two-way interactions between gender and any of the independent variables are unlikely to be aliased with other effects, but three-way and higher-order interactions involving gender are likely to be aliased with other effects.
In the a priori approach, gender is built into the design as an additional factor before the experiment is conducted, by ensuring that it is crossed with every other factor. Orthogonality will be maintained and power for detecting gender effects will be optimized if half of the subjects are male and half are female, with randomization done separately within each gender, as if gender were a blocking variable. However, in blocking it is assumed that there are no interactions between the blocking variable and the independent variables; the purpose of blocking is to control error. By contrast, in the a priori approach the interactions between gender and the manipulated independent variables are of particular interest, and the experiment should be powered accordingly to detect these interactions. As compared to the exploratory approach, with the a priori approach it is much more likely that balance can be maintained or nearly maintained. Variables such as gender can easily be incorporated into fractional factorial designs using the a priori approach. These variables can simply be listed with the other independent variables when using software such as PROC FACTEX to identify a suitable fractional factorial design. A fractional factorial design can be chosen so that important two-way and even three-way interactions between, for example, gender and other independent variables are aliased only with higher-order interactions.
To the extent that an effect placed in the negligible category is nonzero, the estimate of any effect of primary scientific interest that is aliased with it will be different from an estimate based on a complete factorial experiment. Thus a natural question is, “How small should the expected size of an interaction be for the interaction to be placed appropriately in the negligible category?”
The answer depends on the field of scientific endeavor, the value of the scientific information that can be gained using a reduced design, and the kind of decisions that are to be made based on the results of the experiment. There are risks associated with assuming an effect is negligible. If the effect is in reality non-negligible and positive, it can make a positive effect aliased with it look spuriously large, or make a negative effect aliased with it look spuriously zero or even positive. If an effect placed in the negligible category is non-negligible and negative, it can make a positive effect aliased with it look spuriously zero or even negative, or make a negative effect aliased with it look spuriously large.
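The sign logic above can be illustrated with a noise-free toy example. Assume a 2^(3-1) half fraction with defining relation I = ABC, so that the main effect of A is aliased with the B × C interaction; the factors A, B, C and the coefficient values are hypothetical. With effect coding, the half-fraction estimate of A equals the true A coefficient plus the BC coefficient, whereas the complete 2^3 factorial recovers A alone.

```python
from itertools import product


def estimate_a(runs, beta_a, beta_bc):
    """Least-squares coefficient for A in a balanced -1/+1 design: mean of a * y.

    Noise-free outcome in which only the main effect of A and the B x C
    interaction are nonzero (hypothetical values supplied by the caller).
    """
    ys = [beta_a * a + beta_bc * b * c for a, b, c in runs]
    return sum(a * y for (a, _, _), y in zip(runs, ys)) / len(runs)


full = list(product((-1, 1), repeat=3))
half = [run for run in full if run[0] * run[1] * run[2] == 1]  # I = ABC, so A is aliased with BC

for beta_bc in (0.2, -0.2, -0.3, -0.5):
    print(f"BC = {beta_bc:+.1f}: full {estimate_a(full, 0.3, beta_bc):+.2f}, "
          f"half {estimate_a(half, 0.3, beta_bc):+.2f}")
```

With A = 0.3, a positive BC of 0.2 inflates the half-fraction estimate to 0.5, a BC of -0.3 makes A look like exactly zero, and a BC of -0.5 turns it negative, matching the cases described in the text.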
Placing an effect in the negligible category is not the same as assuming it is exactly zero. Rather, the assumption is that the effect is small enough not to be very likely to lead to incorrect decisions. If highly precise estimates of effects are required, it may be that few or no effects are deemed small enough to be eligible for placement in the negligible category. If the potential gain of additional scientific information obtained at a cost of fewer resources offsets the risk associated with reduced estimation precision and the possibility of some spurious effects, then effects expected to be nonzero, but small, may more readily be designated negligible.
The discussion of reduced designs in this article is limited in a number of ways. One limitation of the discussion is that it has focused on between-subjects designs. It is straightforward to extend every design here to incorporate repeated measures, which will improve statistical power. However, all else being equal, the factorial designs will still have more power than the individual experiments and single factor approaches. There have been a few examples of the application of within-subjects fractional designs in legal psychology (Cutler, Penrod, & Dexter, 1990; Cutler, Penrod, & Martens, 1987; Cutler, Penrod, & Stuve, 1988; O'Rourke, Penrod, Cutler, & Stuve, 1989; Smith, Penrod, Otto, & Park, 1996) and in other research on attitudes and choices (e.g., van Schaik, Flynn, & van Wersch, 2005; Sorenson & Taylor, 2005; Zimet et al., 2005) in which a fractional factorial structure is used to construct the experimental conditions assigned to each subject. In fact, the Latin squares approach for balancing orders of experimental conditions in repeated-measures studies is a form of within-subjects fractional factorial. Within-subjects fractional designs of this kind could be seen as a form of planned missingness design (see Graham, Taylor, Olchowski, & Cumsille, 2006).
Another limitation of this article is the focus on factors with only two levels. Designs involving exclusively two-level factors are very common, and factorial designs with two levels per factor tend to be more economical than those involving factors with three or more levels, as well as much more interpretable in practice, due to their simpler interaction structure (Wu & Hamada, 2000). However, any of the designs discussed here can incorporate factors with more than two levels, and different factors may have different numbers of levels. Factors with three or more levels, and in particular an array of factors with mixed numbers of levels, add complexity to the aliasing in fractional factorial experiments. Although this requires careful attention, it can be handled in a straightforward manner using software such as SAS PROC FACTEX.
This article has not discussed what to do when unexpected difficulties arise. One such difficulty is unplanned missing data, for example, an experimental subject failing to provide outcome data. The usual concerns about informative missingness (e.g. dropout rates that are higher in some experimental conditions than in others) apply in complete and reduced factorial experiments just as they do in other research settings. In any complete or reduced design unplanned missingness can be handled in the usual manner, via multiple imputation or maximum likelihood (see e.g. Schafer & Graham, 2002). If experimental conditions are assigned unequal numbers of subjects, use of a regression analysis framework can deal with the resulting lack of orthogonality of effects with very little extra effort (e.g. PROC GLM in SAS). Another unexpected difficulty that can arise in reduced designs is evidence that assumptions about negligible interactions are incorrect. If this occurs, one possibility is to implement additional experimental conditions to address targeted questions, in an approach often called sequential experimentation (Meyer, Steinberg, & Box, 1996).
According to the resource management perspective, the choice of an experimental design requires consideration of both resource requirements and expected scientific benefit; the preferred research design is the one expected to provide the greatest scientific benefit in relation to resources required. Although aliasing may sometimes be raised as an objection to the use of fractional factorial designs, it must be remembered that aliasing in some form is inescapable in any and all reduced designs, including individual experiments and single factor designs. We recommend considering all feasible designs and making a decision taking a resource management perspective that weighs resource demands against scientific costs and benefits.
Paramount among the considerations that drive the choice of an experimental design is addressing the scientific question motivating the research. At the same time, if this scientific question can be addressed only by a very resource-intensive design, but a closely related question can be addressed by a much less resource-intensive design, the investigator may wish to consider reframing the question to conserve resources. For example, when research subjects are expensive or scarce, it may be prudent to consider whether scientific questions can be framed in terms of main effects rather than simple effects so that a factorial or fractional factorial design can be used. Or, when resource limitations preclude implementing more than a very few experimental conditions, it may be prudent to consider framing research questions in terms of simple effects rather than main effects. When a research question is reframed to take advantage of the economy offered by a particular design, it is important that the interpretation of effects be consistent with the reframing, and that this consistency be maintained not only in the original research report but in subsequent citations of the report, as well as integrative reviews or meta-analyses that include the findings.
Resource requirements can often be estimated objectively, as discussed above. Tables like Table 5 may be helpful and can readily be prepared for any N and k. (A SAS macro to perform these computations can be found on the web site http://methodology.psu.edu.) In contrast, assessment of expected scientific benefit is much more subjective, because it represents the investigator's judgment of the value of the scientific knowledge proffered by an experimental design in relation to the plausibility of any assumptions that must be made. For this reason, weighing resource requirements against expected scientific benefit can be challenging. Because expected scientific benefit usually cannot be expressed in purely financial terms, or even readily quantified, a simple benefit-to-cost ratio is unlikely to be helpful in choosing among alternative designs. For many social and behavioral scientists, the decision may be simplified somewhat by the existence of absolute upper limits on the number of subjects that are available, the number of experimental conditions that can be handled logistically, the availability of qualified personnel to run experimental conditions, the number of hours shared equipment can be used, and so on. Designs that would exceed these limitations are immediately ruled out, and the preferred design becomes the one that is expected to provide the greatest scientific benefit without exceeding available resources. This requires careful planning to ensure that the design of the study clearly addresses the scientific questions of most interest.
For example, suppose an investigator who is interested in six two-level independent variables has the resources to implement an experiment with at most 16 experimental conditions. One possible strategy is a “complete” factorial design involving four factors and holding the remaining two factors constant at specified levels. Given that six factors are of scientific interest, this “complete” factorial design is actually a reduced design. This approach enables estimation of the main effects and all interactions involving the four factors included in the experiment, but these effects will be aliased with interactions involving the two omitted factors. Therefore in order to draw conclusions either these effects must be assumed negligible, or interpretation must be restricted to the levels at which the two omitted factors were set. Another possible strategy is a Resolution IV fractional factorial design including all six factors, which enables investigation of all six main effects and many two-way interactions, but no higher-order interactions. Instead, this design requires assuming that all three-way and higher-order interactions are negligible. Thus, both designs can be implemented within available resources, but they differ in the kind of scientific information they provide and the assumptions they require. Which option is better depends on the value of the information provided by each experiment in relation to the research questions. If the ability to estimate the higher-order interactions afforded by the four-factor factorial design is more valuable than the ability to estimate the six main effects and additional two-way interactions afforded by the fractional factorial design, then the four-factor factorial may have greater expected scientific benefit. On the other hand, if the investigator is interested primarily in main effects of all six factors and selected two-way interactions, the fractional factorial design may provide more valuable information.
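To make the six-factor, 16-condition option concrete, the sketch below builds one Resolution IV design, assuming the generators E = ABC and F = BCD with factors labeled A through F (a common textbook choice; software such as PROC FACTEX may return a different design). It checks that every factor is balanced, that the six main-effect columns are mutually orthogonal, and that no main effect shares a column with a two-way interaction. It then groups two-way interactions whose contrast columns coincide. In this particular design every two-way interaction shares its column with at least one other, so interpreting any one of them requires treating its alias partners as negligible.

```python
from itertools import combinations, product
from math import prod

FACTORS = "ABCDEF"

# 16 runs: A-D in a full 2^4 factorial; E = ABC and F = BCD (-1/+1 codes)
RUNS = [(a, b, c, d, a * b * c, b * c * d) for a, b, c, d in product((-1, 1), repeat=4)]


def column(letters):
    """Contrast column for an effect named by its factor letters, e.g. "A" or "BD"."""
    idx = [FACTORS.index(ch) for ch in letters]
    return tuple(prod(run[i] for i in idx) for run in RUNS)


mains = {f: column(f) for f in FACTORS}
balanced = all(sum(col) == 0 for col in mains.values())
orthogonal = all(
    sum(x * y for x, y in zip(mains[f], mains[g])) == 0
    for f, g in combinations(FACTORS, 2)
)

# Two-way interactions with identical columns cannot be separated (they are aliased)
alias_groups = {}
for pair in combinations(FACTORS, 2):
    alias_groups.setdefault(column(pair), []).append("".join(pair))
mains_clear_of_two_way = not any(col in alias_groups for col in mains.values())

print(balanced, orthogonal, mains_clear_of_two_way)
for group in sorted(alias_groups.values()):
    print(" = ".join(group))
```

The 15 two-way interactions fall into seven alias groups here, which is one way to read the statement that such a design supports "many" two-way interactions only under additional assumptions.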
Strategic use of reduced designs involves taking calculated risks. To assess the expected scientific benefit of each design, the investigator must also consider the risk associated with any necessary assumptions in relation to the value of the knowledge that can be gained by the design. In the example above, any risk associated with making the assumptions required by the fractional factorial design must be weighed against the value associated with the additional main effect and two-way interaction estimates. If other, less powerful reduced designs are considered, any increased risk of a Type II error must also be considered. If an experiment is an exploratory endeavor intended to determine which factors merit further study in a subsequent experiment, the ability to investigate many factors may be of paramount importance and may outweigh the risks associated with aliasing. A design that requires no assumptions, or only very safe ones, may not have a greater net scientific benefit than a riskier design if the knowledge it proffers is meager or is not at the top of the scientific agenda motivating the experiment. Put another way, the potential value of the knowledge that can be gained in a design may offset any risk associated with the assumptions it requires.
The authors would like to thank Bethany C. Bray, Michael J. Cleveland, Donna L. Coffman, Mark Feinberg, Brian R. Flay, John W. Graham, Susan A. Murphy, Megan E. Patrick, Brittany Rhoades, and David Rindskopf for comments on an earlier draft. This research was supported by NIDA grants P50 DA10075 and K05 DA018206.
1 Assuming orthogonality is maintained, adding a factor to a factorial experiment does not change estimates of main effects and interactions. However, the addition of a factor does change estimates of error terms, so hypothesis tests can be slightly different.
2 In the social and behavioral sciences literature the term “fractional factorial” has sometimes been applied to reduced designs that do not maintain the balance property, such as the individual experiments and single factor designs. In this article we maintain the convention established in the statistics literature (e.g., Wu & Hamada, 2000) of reserving the term “fractional factorial” for the subset of reduced designs that maintain the balance property.
Linda M. Collins, The Methodology Center and Department of Human Development and Family Studies, The Pennsylvania State University.
John J. Dziak, The Methodology Center, The Pennsylvania State University.
Runze Li, Department of Statistics and The Methodology Center, The Pennsylvania State University.
Try this example yourself. It is based on Question 19 in the exercises for Chapter 5 of Box, Hunter and Hunter (2nd edition).
The data are from a plastics molding factory that must treat its waste before discharge. The \(y\)-variable represents the average amount of pollutant discharged (lb per day), while the three factors that were varied were:

\(C\) = the chemical compound added (either chemical P or chemical Q)
\(T\) = the treatment temperature (72 °F or 100 °F)
\(S\) = the stirring speed (200 rpm or 400 rpm)
\(y\) = the amount of pollutant discharged (lb per day)

| Experiment | Order | \(C\) | \(T\) [°F] | \(S\) [rpm] | \(y\) [lb/day] |
|---|---|---|---|---|---|
| 1 | 5 | Chemical P | 72 | 200 | 5 |
| 2 | 6 | Chemical Q | 72 | 200 | 30 |
| 3 | 1 | Chemical P | 100 | 200 | 6 |
| 4 | 4 | Chemical Q | 100 | 200 | 33 |
| 5 | 2 | Chemical P | 72 | 400 | 4 |
| 6 | 7 | Chemical Q | 72 | 400 | 3 |
| 7 | 3 | Chemical P | 100 | 400 | 5 |
| 8 | 8 | Chemical Q | 100 | 400 | 4 |
Draw a geometric figure that illustrates the data from this experiment.
Calculate the main effect for each factor by hand.
For the C effect , there are four estimates of \(C\) : \[\displaystyle \frac{(+25) + (+27) + (-1) + (-1)}{4} = \frac{50}{4} = \bf{12.5}\] For the T effect , there are four estimates of \(T\) : \[\displaystyle \frac{(+1) + (+3) + (+1) + (+1)}{4} = \frac{6}{4} = \bf{1.5}\] For the S effect , there are four estimates of \(S\) : \[\displaystyle \frac{(-27) + (-1) + (-29) + (-1)}{4} = \frac{-58}{4} = \bf{-14.5}\]
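The paired-difference calculation above can be checked with a short Python sketch (numpy assumed). The array layout `y[C, T, S]`, with index 0 for the low level and 1 for the high level, is a choice made here for illustration. Chemical P is treated as the "low" level of \(C\).

```python
import numpy as np

# Responses from the table, stored as y[C, T, S] with 0 = low and 1 = high:
# C: chemical P / Q, T: 72 / 100 °F, S: 200 / 400 rpm.
y = np.zeros((2, 2, 2))
y[0, 0, 0], y[1, 0, 0] = 5, 30
y[0, 1, 0], y[1, 1, 0] = 6, 33
y[0, 0, 1], y[1, 0, 1] = 4, 3
y[0, 1, 1], y[1, 1, 1] = 5, 4

# Each main effect is the average of the four paired differences (high minus low),
# taken with the other two factors held fixed.
effect_C = (y[1] - y[0]).mean()
effect_T = (y[:, 1] - y[:, 0]).mean()
effect_S = (y[:, :, 1] - y[:, :, 0]).mean()
print(effect_C, effect_T, effect_S)  # 12.5 1.5 -14.5
```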
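Calculate the 3 two-factor interactions (2fi) by hand, recalling that a two-factor interaction is defined as half the difference between the effect of one factor at the high level of a second factor and its effect at the low level of that second factor.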
For the CT interaction, there are two estimates of \(CT\), one at each level of \(S\). Consider the change in \(y\) when \(C\) goes from P to Q:

At \(S\) high: with \(T\) high the change is \(4 - 5 = -1\), and with \(T\) low it is \(3 - 4 = -1\). This gives a first estimate of \([(-1) - (-1)]/2 = 0\).
At \(S\) low: with \(T\) high the change is \(33 - 6 = +27\), and with \(T\) low it is \(30 - 5 = +25\). This gives a second estimate of \([(+27) - (+25)]/2 = +1\).

The average CT interaction is therefore \((0 + 1)/2 = \mathbf{0.5}\). You can interchange \(C\) and \(T\) and still get the same result.

For the CS interaction, there are two estimates of \(CS\), one at each level of \(T\). Again consider the change in \(y\) when \(C\) goes from P to Q:

At \(T\) high: with \(S\) high the change is \(4 - 5 = -1\), and with \(S\) low it is \(33 - 6 = +27\). This gives a first estimate of \([(-1) - (+27)]/2 = -14\).
At \(T\) low: with \(S\) high the change is \(3 - 4 = -1\), and with \(S\) low it is \(30 - 5 = +25\). This gives a second estimate of \([(-1) - (+25)]/2 = -13\).

The average CS interaction is therefore \((-14 - 13)/2 = \mathbf{-13.5}\). You can interchange \(C\) and \(S\) and still get the same result.

For the ST interaction, there are two estimates of \(ST\), calculated in the same way as above: 0 with chemical P and \(-1\) with chemical Q. The average ST interaction is therefore \((-1 + 0)/2 = \mathbf{-0.5}\).
Calculate the single three-factor interaction (3fi).
There is only a single estimate of \(CTS\) . The \(CT\) effect at high \(S\) is 0, and the \(CT\) effect at low \(S\) is \(+1\) . The \(CTS\) interaction is then \([(0) - (+1)] / 2 = \mathbf{-0.5}\) . You can also calculate this by considering the \(CS\) effect at the two levels of \(T\) , or by considering the \(ST\) effect at the two levels of \(C\) . All three approaches give the same result.
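The same half-difference bookkeeping for the two-factor interactions and the single three-factor interaction can be sketched in Python (numpy assumed). It uses the same illustrative `y[C, T, S]` layout with 0 = low and 1 = high, and chemical P is taken as the low level.

```python
import numpy as np

# y[C, T, S] with 0 = low, 1 = high (C: P/Q, T: 72/100 °F, S: 200/400 rpm).
y = np.zeros((2, 2, 2))
y[0, 0, 0], y[1, 0, 0] = 5, 30
y[0, 1, 0], y[1, 1, 0] = 6, 33
y[0, 0, 1], y[1, 0, 1] = 4, 3
y[0, 1, 1], y[1, 1, 1] = 5, 4

c_eff = y[1] - y[0]          # change in y from P to Q, indexed [T, S]
t_eff = y[:, 1] - y[:, 0]    # change in y from 72 to 100 °F, indexed [C, S]

# CT: half the difference of the C effect at high T and at low T, averaged over S.
ct_at_s = (c_eff[1, :] - c_eff[0, :]) / 2   # [estimate at S low, estimate at S high]
CT = ct_at_s.mean()
# CS: half the difference of the C effect at high S and at low S, averaged over T.
CS = ((c_eff[:, 1] - c_eff[:, 0]) / 2).mean()
# TS: half the difference of the T effect at high S and at low S, averaged over C.
TS = ((t_eff[:, 1] - t_eff[:, 0]) / 2).mean()
# CTS: half the difference between the CT estimate at high S and at low S.
CTS = (ct_at_s[1] - ct_at_s[0]) / 2

print(CT, CS, TS, CTS)  # 0.5 -13.5 -0.5 -0.5
```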
Compute the main effects and interactions using matrix algebra and a least squares model.
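Use computer software to build the following model and verify that:

\[y = 11.25 + 6.25\,x_C + 0.75\,x_T - 7.25\,x_S + 0.25\,x_C x_T - 6.75\,x_C x_S - 0.25\,x_T x_S - 0.25\,x_C x_T x_S\]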
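A minimal least-squares check in Python with numpy follows. It assumes the coding P = \(-1\), Q = \(+1\), with the low temperature and speed at \(-1\) and the high ones at \(+1\). Each coefficient comes out as half of the corresponding effect calculated by hand.

```python
import numpy as np

# Coded design in the run order of the table (assumed coding: P/Q, 72/100 °F, 200/400 rpm -> -1/+1).
xC = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
xT = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
xS = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
y = np.array([5, 30, 6, 33, 4, 3, 5, 4], dtype=float)

# Columns: intercept, C, T, S, CT, CS, TS, CTS.
X = np.column_stack([np.ones(8), xC, xT, xS, xC * xT, xC * xS, xT * xS, xC * xT * xS])

# Least squares via the normal equations: b = (X^T X)^{-1} X^T y.
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)          # intercept, b_C, b_T, b_S, b_CT, b_CS, b_TS, b_CTS
print(2 * b[1:])  # doubling the coefficients gives the effects calculated by hand
```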
Learning notes:
The chemical compound could be coded either as (chemical P = \(-1\), chemical Q = \(+1\)) or (chemical P = \(+1\), chemical Q = \(-1\)). The interpretation of the \(x_C\) coefficient is the same regardless of the coding; only its sign changes.

Simply tabulating the raw data already gives some interpretation of the results. Why? Because the factors are varied independently of one another, we can look at the relationship between each factor and \(y\) without considering the others. The chemical compound and the stirring speed clearly have strong effects on \(y\), and the chemical \(\times\) speed interaction is also visible. You can see this last point by writing out the full \(\mathbf{X}\) design matrix and comparing the column associated with the \(b_\text{CS}\) term with the \(y\) column.
A note about magnitude of effects
In this text we quantify the effect as the change in response over half the range of the factor. For example, if the center point is 400 K, the lower level is 375 K and the upper level is 425 K, then an effect of "-5" represents a reduction in \(y\) of 5 units for every increase of 25 K in \(x\) .
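The conversion between real and coded units in this example is simple arithmetic. Here is a small sketch; the helper name `to_coded` is hypothetical.

```python
def to_coded(value, center, half_range):
    """Coded value: 0 at the center point, -1 and +1 at the lower and upper levels."""
    return (value - center) / half_range


center, half_range = 400.0, 25.0  # K, from the example above
print([to_coded(v, center, half_range) for v in (375.0, 400.0, 425.0)])  # [-1.0, 0.0, 1.0]

effect = -5.0               # change in y per coded unit, i.e. per 25 K
print(effect / half_range)  # -0.2: change in y per kelvin
print(2 * effect)           # -10.0: the same effect expressed over the full 50 K range
```

The last line shows the "double" convention used by some other textbooks, discussed next.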
We use this representation because it corresponds with the results calculated from least-squares software. Putting the matrix of \(-1\) and \(+1\) entries into the software as \(\mathbf{X}\), along with the corresponding vector of responses, \(\mathbf{y}\), you can calculate these effects as \(\mathbf{b} = \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}\).
Other textbooks, specifically Box, Hunter and Hunter, will report effects that are double ours. This is because they consider the effect to be the change from the lower level to the upper level (double the distance). The advantage of their representation is that binary factors (catalyst A or B; agitator on or off) can be readily interpreted, whereas in our notation, the effect is a little harder to describe (simply double it!).
The advantage of our methodology, though, is that the results calculated by hand would be the same as those from any computer software with respect to the magnitude of the coefficients and the standard errors, particularly in the case of duplicate runs and experiments with center points.
Remember: our effects are half those reported in Box, Hunter and Hunter, and in some other textbooks; our standard error would also be half of theirs. The conclusions drawn will always be the same, as long as one is consistent.
Just as it is common for studies in education (or social sciences in general) to include multiple levels of a single independent variable (new teaching method, old teaching method), it is also common for them to include multiple independent variables. Just as including multiple levels of a single independent variable allows one to answer more sophisticated research questions, so too does including multiple independent variables in the same experiment. But including multiple independent variables also allows the researcher to answer questions about whether the effect of one independent variable depends on the level of another. This is referred to as an interaction between the independent variables. As we will see, interactions are often among the most interesting results in empirical research.
By far the most common approach to including multiple independent variables (which are also called factors or ways) in an experiment is the factorial design. In a between-subjects factorial design , each level of one independent variable is combined with each level of the others to produce all possible combinations. Each combination, then, becomes a condition in the experiment. Imagine, for example, an experiment on the effect of cell phone use (yes vs. no) and time of day (day vs. night) on driving ability. This is shown in the factorial design table in Figure \(\PageIndex{1}\). The columns of the table represent cell phone use, and the rows represent time of day. The four cells of the table represent the four possible combinations or conditions: using a cell phone during the day, not using a cell phone during the day, using a cell phone at night, and not using a cell phone at night. This particular design is referred to as a 2 × 2 (read “two-by-two”) factorial design because it combines two variables, each of which has two levels.
If one of the independent variables had a third level (e.g., using a handheld cell phone, using a hands-free cell phone, and not using a cell phone), then it would be a 3 × 2 factorial design, and there would be six distinct conditions. Notice that the number of possible conditions is the product of the numbers of levels. A 2 × 2 factorial design has four conditions, a 3 × 2 factorial design has six conditions, a 4 × 5 factorial design would have 20 conditions, and so on. Also notice that each number in the notation represents one factor, that is, one independent variable. So by looking at how many numbers are in the notation, you can determine how many independent variables there are in the experiment. 2 × 2, 3 × 3, and 2 × 3 designs all have two numbers in the notation and therefore all have two independent variables. Some people refer to these as two-way factorial designs, analyzed with a two-way factorial ANOVA. The numerical value of each number represents the number of levels of the corresponding independent variable. A 2 means that the independent variable has two levels, a 3 means that it has three levels, a 4 means it has four levels, and so on. To illustrate, a 3 × 3 design has two independent variables, each with three levels, while a 2 × 2 × 2 design has three independent variables, each with two levels.
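The counting rule can be illustrated with a short Python sketch using only the standard library. Each condition is one combination of levels, and the number of conditions is the product of the numbers of levels.

```python
from itertools import product
from math import prod

# One condition per combination of levels: a 2 × 2 design has 2 * 2 = 4 conditions.
cell_phone = ["cell phone", "no cell phone"]
time_of_day = ["day", "night"]
conditions = list(product(cell_phone, time_of_day))
for condition in conditions:
    print(condition)

# In general, the number of conditions is the product of the numbers of levels.
designs = [(2, 2), (3, 2), (4, 5), (3, 3), (2, 2, 2)]
n_conditions = {levels: prod(levels) for levels in designs}
for levels, n in n_conditions.items():
    print(" × ".join(map(str, levels)), "->", n, "conditions")
```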
In principle, factorial designs can include any number of independent variables with any number of levels. For example, an experiment could include the type of psychotherapy (cognitive vs. behavioral), the length of the psychotherapy (2 weeks vs. 2 months), and the sex of the psychotherapist (female vs. male). This would be a 2 × 2 × 2 factorial design and would have eight conditions. Figure \(\PageIndex{2}\) shows one way to represent this design. In practice, it is unusual for there to be more than three independent variables with more than two or three levels each. This is for at least two reasons. First, the number of conditions can quickly become unmanageable. For example, adding a fourth independent variable with three levels (e.g., therapist experience: low vs. medium vs. high) to the current example would make it a 2 × 2 × 2 × 3 factorial design with 24 distinct conditions. Second, the number of participants required to populate all of these conditions (while maintaining a reasonable ability to detect a real underlying effect) can render the design infeasible. As a result, in the remainder of this section, we will focus on designs with two independent variables. The general principles discussed here extend in a straightforward way to more complex factorial designs.
Recall that in a between-subjects single factor design, each participant is tested in only one condition. In a between-subjects factorial design , all of the independent variables are manipulated between subjects. For example, all participants could be tested either while using a cell phone or while not using a cell phone and either during the day or during the night. This would mean that each participant would be tested in one and only one condition.
Since factorial designs have more than one independent variable, it is also possible to manipulate one independent variable between subjects and another within subjects. This is called a mixed factorial design . For example, a researcher might choose to treat cell phone use as a within-subjects factor by testing the same participants both while using a cell phone and while not using a cell phone. But they might choose to treat time of day as a between-subjects factor by testing each participant either during the day or during the night (perhaps because this only requires them to come in for testing once). Thus each participant in this mixed design would be tested in two of the four conditions. This is a complex design with complex statistical analyses. In the remainder of this section, we will focus on between-subjects factorial designs only. Also, regardless of the design, the actual assignment of participants to conditions is typically done randomly.
In many factorial designs, one of the independent variables is a non-manipulated independent variable. The researcher measures it but does not manipulate it. An example is a study by Halle Brown and colleagues in which participants were exposed to several words that they were later asked to recall (Brown, Kosslyn, Delamater, Fama, & Barsky, 1999). The manipulated independent variable was the type of word. Some were negative health-related words (e.g., tumor, coronary), and others were not health related (e.g., election, geometry). The non-manipulated independent variable was whether participants were high or low in hypochondriasis (excessive concern with ordinary bodily symptoms). The result of this study was that the participants high in hypochondriasis were better than those low in hypochondriasis at recalling the health-related words, but they were no better at recalling the non-health-related words.
Such studies are extremely common, and there are several points worth making about them. First, non-manipulated independent variables are usually participant background variables (self-esteem, gender, and so on), and as such, they are by definition between-subjects factors. For example, people are either low in self-esteem or high in self-esteem; they cannot be tested in both of these conditions. Second, such studies are generally considered to be experiments as long as at least one independent variable is manipulated, regardless of how many non-manipulated independent variables are included. Third, it is important to remember that causal conclusions can only be drawn about the manipulated independent variable. Thus it is important to be aware of which variables in a study are manipulated and which are not.
Thus far we have seen that factorial experiments can include manipulated independent variables or a combination of manipulated and non-manipulated independent variables. But factorial designs can also include only non-manipulated independent variables, in which case they are no longer experimental designs but are instead non-experimental in nature. Consider a hypothetical study in which a researcher simply measures both the moods and the self-esteem of several participants (categorizing them as having either a positive or negative mood and as being either high or low in self-esteem) along with their willingness to have unprotected sex. This can be conceptualized as a 2 × 2 factorial design with mood (positive vs. negative) and self-esteem (high vs. low) as non-manipulated between-subjects factors. Willingness to have unprotected sex is the dependent variable.
Again, because neither independent variable in this example was manipulated, it is a non-experimental study rather than an experimental design. This is important because, as always, one must be cautious about inferring causality from non-experimental studies because of the threats of potential confounding variables. For example, an effect of participants’ moods on their willingness to have unprotected sex might be caused by any other variable that happens to be correlated with their moods.
Statistics Made Easy
A 2×3 factorial design is a type of experimental design that allows researchers to understand the effects of two independent variables on a single dependent variable.
In this type of design, one independent variable has two levels and the other independent variable has three levels.
For example, suppose a botanist wants to understand the effects of sunlight (low vs. medium vs. high) and watering frequency (daily vs. weekly) on the growth of a certain species of plant.
This is an example of a 2×3 factorial design because there are two independent variables, one having two levels and the other having three levels:

- Sunlight (low, medium, high)
- Watering frequency (daily, weekly)
And there is one dependent variable: Plant growth.
A 2×3 factorial design allows you to analyze the following effects:
Main Effects: These are the effects that just one independent variable has on the dependent variable.
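For example, in our previous scenario we could analyze the following main effects:

- The main effect of sunlight on plant growth (whether mean growth differs across low, medium, and high sunlight, averaging over watering frequency)
- The main effect of watering frequency on plant growth (whether mean growth differs between daily and weekly watering, averaging over sunlight)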
Interaction Effects: These occur when the effect that one independent variable has on the dependent variable depends on the level of the other independent variable.
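For example, in our previous scenario we could analyze the following interaction effect:

- Does the effect of sunlight on plant growth depend on watering frequency (or, equivalently, does the effect of watering frequency depend on the level of sunlight)?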
We can perform a two-way ANOVA to formally test whether or not the independent variables have a statistically significant relationship with the dependent variable.
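As a minimal sketch of what a two-way ANOVA computes, the following Python (numpy only) partitions the total sum of squares for a balanced 2×3 design. The components are watering, sunlight, interaction, and residual, and each effect gets an F statistic. The growth values are made-up numbers for illustration only, not data from the scenario. A p-value for each F statistic could then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, df_effect, df_residual)`.

```python
import numpy as np

# Hypothetical plant growth, shape (watering, sunlight, replicate) = (2, 3, 3).
growth = np.array([
    [[4.1, 4.5, 4.3], [5.0, 5.4, 5.2], [6.1, 6.3, 5.9]],  # daily:  low, medium, high sunlight
    [[3.2, 3.6, 3.4], [4.4, 4.1, 4.6], [4.8, 5.1, 4.9]],  # weekly: low, medium, high sunlight
])
a, b, n = growth.shape

grand = growth.mean()
cell = growth.mean(axis=2)              # cell means, shape (a, b)
water_means = growth.mean(axis=(1, 2))  # marginal means for watering frequency
sun_means = growth.mean(axis=(0, 2))    # marginal means for sunlight

ss = {
    "watering": b * n * np.sum((water_means - grand) ** 2),
    "sunlight": a * n * np.sum((sun_means - grand) ** 2),
    "interaction": n * np.sum((cell - water_means[:, None] - sun_means[None, :] + grand) ** 2),
    "residual": np.sum((growth - cell[:, :, None]) ** 2),
}
dof = {"watering": a - 1, "sunlight": b - 1, "interaction": (a - 1) * (b - 1), "residual": a * b * (n - 1)}

ms_residual = ss["residual"] / dof["residual"]
for source in ["watering", "sunlight", "interaction"]:
    f_stat = (ss[source] / dof[source]) / ms_residual
    print(f"{source:12s} SS = {ss[source]:.3f}  df = {dof[source]}  F = {f_stat:.2f}")
```

For a balanced design like this one, the four sums of squares add up exactly to the total sum of squares around the grand mean.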
For example, the following code shows how to perform a two-way ANOVA for our hypothetical plant scenario in R:
Here’s how to interpret the output of the ANOVA:
Hey there. My name is Zach Bobbitt. I have a Masters of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike. My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.
Six Sigma Study Guide
Study notes and guides for Six Sigma certification tests
Posted by Ted Hessing
In most experiments, you’ll have a number of factors to deal with. These are elements that affect the outcomes of your experiment. They fall into a few basic categories:
There are two basic types of treatment factors that you’ll use:
A popular example in explaining factors is the simple-sounding task of baking cookies. Most people would simply follow a recipe – or, let’s face it, buy the cookie dough pre-made and bake whatever we don’t eat raw. But how did the recipe come to be in the first place? Someone had to experiment with ingredients and baking methods for the right combination.
Think of each of these ingredients and the baking temperature as factors in an experiment. You can’t test each factor independently – you need to have all ingredients to produce the cookies. But you can modify the amount, type of ingredient, and temperature at which they’re baked, to find the combination that yields your perfect cookie.
I originally created SixSigmaStudyGuide.com to help me prepare for my own Black Belt exams. Over time I've grown the site to help tens of thousands of Six Sigma belt candidates prepare for their Green Belt & Black Belt exams. Go here to learn how to pass your Six Sigma exam the 1st time through!
As with other notebooks in this repository, this notebook follows, more or less closely, content from Box and Draper's Empirical Model-Building and Response Surfaces (Wiley, 1984). This content is covered by Chapter 4 of Box and Draper.
In this notebook, we'll carry out an analysis of a full factorial design, and show how we can obtain information about a system and its responses, and a quantifiable range of certainty about those values. This is the fundamental idea behind empirical model-building and allows us to construct cheap and simple models to represent complex, nonlinear systems.
Once we've nailed this down for simple models and small numbers of inputs and responses, we can expand on it, use more complex models, and link this material with machine learning algorithms.
We'll start by importing numpy for numerical analysis, and pandas for convenient data containers.
Box and Draper cover different experimental design methods in the book, but begin with the simplest type of factorial design in Chapter 4: a full factorial design with two levels. A factorial experimental design is appropriate for exploratory stages, when the effects of variables or their interactions on a system response are poorly understood or not quantifiable.
The analysis begins with a two-level, three-variable experimental design, also written $2^3$, with $n=2$ levels for each factor and $k=3$ different factors. We start by encoding each of the three variables to something generic: $(x_1,x_2,x_3)$. A dataframe with input variable values is then populated.
| index | low | high | label |
|---|---|---|---|
| x1 | 250 | 350 | Length of specimen (mm) |
| x2 | 8 | 10 | Amplitude of load cycle (mm) |
| x3 | 40 | 50 | Load (g) |
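Next, we encode the variable values. For an arbitrary variable value $\phi_i$, the value of the variable can be coded to be between -1 and 1 according to the formula:

$$x_i = \dfrac{\phi_i - \text{avg}(\phi_i)}{\tfrac{1}{2}\,\text{span}(\phi_i)}$$

where the average and the span of the variable $\phi_i$ are defined as:

$$\text{avg}(\phi_i) = \dfrac{\phi_{i,\text{high}} + \phi_{i,\text{low}}}{2}, \qquad \text{span}(\phi_i) = \phi_{i,\text{high}} - \phi_{i,\text{low}}$$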
| index | low | high | label | encoded_low | encoded_high |
|---|---|---|---|---|---|
| x1 | 250 | 350 | Length of specimen (mm) | -1.0 | 1.0 |
| x2 | 8 | 10 | Amplitude of load cycle (mm) | -1.0 | 1.0 |
| x3 | 40 | 50 | Load (g) | -1.0 | 1.0 |
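A minimal pandas sketch of this encoding step is shown next. The dataframe name `inputs_labels` and the function name `encode` are our own illustrative choices, not necessarily those used in the original notebook.

```python
import pandas as pd

# Factor ranges and labels from the table above.
inputs_labels = pd.DataFrame(
    {
        "low": [250, 8, 40],
        "high": [350, 10, 50],
        "label": ["Length of specimen (mm)", "Amplitude of load cycle (mm)", "Load (g)"],
    },
    index=pd.Index(["x1", "x2", "x3"], name="index"),
)


def encode(phi, low, high):
    """Map a real-unit value phi onto the coded scale: low -> -1, high -> +1."""
    avg = (high + low) / 2
    span = high - low
    return (phi - avg) / (span / 2)


inputs_labels["encoded_low"] = encode(inputs_labels["low"], inputs_labels["low"], inputs_labels["high"])
inputs_labels["encoded_high"] = encode(inputs_labels["high"], inputs_labels["low"], inputs_labels["high"])
print(inputs_labels)
```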
While everything preceding this point is important to state, to make sure we're being consistent and clear about our problem statement and assumptions, nothing preceding this point is particularly important to understanding how experimental design works. This is simply illustrating the process of transforming one's problem from a problem-specific problem space to a more general problem space.
Box and Draper present the results (observed outcomes) of a $2^3$ factorial experiment. The $2^3$ comes from the fact that there are 2 levels for each variable (-1 and 1) and three variables (x1, x2, and x3). The observed, or output, variable is the number of cycles to failure for a particular piece of machinery; this variable is more conveniently cast as a logarithm, as it can be a very large number.
Each observation data point consists of three input variable values and an output variable value, $(x_1, x_2, x_3, y)$, and can be thought of as a point in 3D space $(x_1,x_2,x_3)$ with an associated point value of $y$. Alternatively, this might be thought of as a point in 4D space (the first three dimensions are the location in 3D space where the point will appear, and the $y$ value is when it will actually appear).
The input variable values consist of all possible input value combinations, which we can produce using the itertools module:
Now we implement the observed outcomes. As we mentioned, these numbers are large (up to thousands of cycles), and they are more conveniently scaled by taking $\log_{10}()$, which rescales them to real numbers between about 1.95 and 3.56.
| | x1 | x2 | x3 | y | logy |
|---|---|---|---|---|---|
| 0 | -1 | -1 | -1 | 674 | 2.828660 |
| 1 | 1 | -1 | -1 | 3636 | 3.560624 |
| 2 | -1 | 1 | -1 | 170 | 2.230449 |
| 3 | 1 | 1 | -1 | 1140 | 3.056905 |
| 4 | -1 | -1 | 1 | 292 | 2.465383 |
| 5 | 1 | -1 | 1 | 2000 | 3.301030 |
| 6 | -1 | 1 | 1 | 90 | 1.954243 |
| 7 | 1 | 1 | 1 | 360 | 2.556303 |
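A sketch of how this results table can be produced with `itertools.product` and pandas follows. The tuple reversal is our own choice to reproduce the standard order shown in the table, where x1 varies fastest.

```python
import itertools

import numpy as np
import pandas as pd

# All 2^3 combinations of coded levels. itertools.product varies the last element
# fastest, so each tuple is reversed to make x1 vary fastest (standard order).
runs = [tuple(reversed(r)) for r in itertools.product([-1, 1], repeat=3)]
results_df = pd.DataFrame(runs, columns=["x1", "x2", "x3"])

# Observed cycles to failure, in the same run order, and their base-10 logarithm.
results_df["y"] = [674, 3636, 170, 1140, 292, 2000, 90, 360]
results_df["logy"] = np.log10(results_df["y"])
print(results_df)
```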
The variable inputs_df contains all input variables for the experiment design, and results_df contains the inputs and responses for the experiment design; these variables are the encoded levels. To obtain the original, unscaled values, which allows us to check what experiments must be run, we can always convert the dataframe back to its original units by defining a function that inverts the scaling equation. This is as simple as finding

$$\phi_i = \frac{\phi_{i,\text{high}} + \phi_{i,\text{low}}}{2} + \frac{\phi_{i,\text{high}} - \phi_{i,\text{low}}}{2}\, x_i$$
| | Length of specimen (mm) | Amplitude of load cycle (mm) | Load (g) |
|---|---|---|---|
| 0 | 250 | 8 | 40 |
| 1 | 350 | 8 | 40 |
| 2 | 250 | 10 | 40 |
| 3 | 350 | 10 | 40 |
| 4 | 250 | 8 | 50 |
| 5 | 350 | 8 | 50 |
| 6 | 250 | 10 | 50 |
| 7 | 350 | 10 | 50 |
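A minimal sketch of the inverse transformation in pandas follows. The `factors` dictionary and the `decode` function are illustrative names of our own.

```python
import itertools

import pandas as pd

# Factor ranges and labels from the input table.
factors = {
    "x1": (250, 350, "Length of specimen (mm)"),
    "x2": (8, 10, "Amplitude of load cycle (mm)"),
    "x3": (40, 50, "Load (g)"),
}


def decode(x, low, high):
    """Invert the coding: coded -1 -> low, +1 -> high (real units)."""
    return (high + low) / 2 + x * (high - low) / 2


runs = [tuple(reversed(r)) for r in itertools.product([-1, 1], repeat=3)]
coded = pd.DataFrame(runs, columns=["x1", "x2", "x3"])

# Real-unit settings for each run; every value is exact here, so cast to int for display.
real = pd.DataFrame(
    {label: decode(coded[name], low, high).astype(int) for name, (low, high, label) in factors.items()}
)
print(real)
```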
Now we compute the main effects of each variable using the results of the experimental design. We'll use some shorthand Pandas functions to compute these averages: the groupby function, which groups rows of a dataframe according to some condition (in this case, the value of our variable of interest $x_i$).
The main effect of a given variable (as defined by Yates 1937) is the average difference in the level of response as the input variable moves from the low to the high level. If there are other variables, the change in the level of response is averaged over all combinations of the other variables.
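A sketch of this calculation with pandas `groupby` follows. The mean of `logy` is taken at each level of a variable, which averages over all combinations of the other variables, and then the low-level mean is subtracted from the high-level mean.

```python
import itertools

import numpy as np
import pandas as pd

runs = [tuple(reversed(r)) for r in itertools.product([-1, 1], repeat=3)]
results_df = pd.DataFrame(runs, columns=["x1", "x2", "x3"])
results_df["logy"] = np.log10([674, 3636, 170, 1140, 292, 2000, 90, 360])

main_effects = {}
for var in ["x1", "x2", "x3"]:
    # Mean response at each level of var, averaged over the other variables.
    level_means = results_df.groupby(var)["logy"].mean()
    main_effects[var] = level_means.loc[1] - level_means.loc[-1]

print({k: round(float(v), 3) for k, v in main_effects.items()})  # roughly 0.749, -0.589, -0.350
```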
Now that we've computed the main effects, we can analyze the results to glean some meaningful information about our system. The first variable x1 has a positive effect of 0.749, which indicates that when x1 goes from its low level to its high level, the value of the response (the logarithm of the lifetime of the equipment) increases. This means x1 should be increased if we want to make our equipment last longer. Furthermore, this effect was the largest, meaning it's the variable we should consider changing first.
This might be the case if, for example, changing the value of the input variables were capital-intensive. A company might decide that they can only afford to change one variable, x1 , x2 , or x3 . If this were the case, increasing x1 would be the way to go.
In contrast, increasing the variables x2 and x3 will result in a decrease in the lifespan of our equipment (makes the response smaller), since these have a negative main effect. These variables should be kept at their lower levels, or decreased, to increase the lifespan of the equipment.
In addition to main effects, a factorial design will also reveal interaction effects between variables - both two-way interactions and three-way interactions. We can use the itertools library to compute the interaction effects using the results from the factorial design.
We'll use the Pandas groupby function again, grouping by two variables this time.
This one-liner is a bit hairy:
What this does is, computes the two-way variable effect with a multi-step calculation, but does it with a list comprehension. First, let's just look at this part:
This computes the sign factor i*j, which determines whether the term effects[i][j] enters the sum with a positive or a negative sign. We're also looping over one additional dimension; we multiply by 1/2 for each additional dimension we loop over. The signed terms are summed to yield the final interaction effect for every combination of the input variables.
If we were computing three-way interaction effects, we would have a similar-looking one-liner, but with i , j , and k :
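A separate sketch of the same idea in pandas follows; it is not the notebook's original one-liners. The data are grouped by the variables of interest, each cell mean is weighted by the product of its coded levels, and the sum is multiplied by 1/2 for each dimension beyond the first. The function name `effect` is our own.

```python
import itertools

import numpy as np
import pandas as pd

runs = [tuple(reversed(r)) for r in itertools.product([-1, 1], repeat=3)]
results_df = pd.DataFrame(runs, columns=["x1", "x2", "x3"])
results_df["logy"] = np.log10([674, 3636, 170, 1140, 292, 2000, 90, 360])


def effect(df, variables, response="logy"):
    """k-way effect: sum of (product of coded levels) * cell mean, times (1/2)^(k-1)."""
    cell_means = df.groupby(list(variables))[response].mean()
    signed_sum = sum(np.prod(levels) * mean for levels, mean in cell_means.items())
    return signed_sum * 0.5 ** (len(variables) - 1)


names = ["x1", "x2", "x3"]
two_way = {pair: effect(results_df, pair) for pair in itertools.combinations(names, 2)}
three_way = effect(results_df, names)
print({pair: round(float(v), 4) for pair, v in two_way.items()})
print(round(float(three_way), 4))
```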
As with main effects, we can analyze the results of the interaction effects analysis to come to some useful conclusions about our physical system. A two-way interaction is a measure of how the main effect of one variable changes as the level of another variable changes. A negative two-way interaction between $x_2$ and $x_3$ means that the effect of $x_2$ on the response is smaller (more negative) when $x_3$ is at its high level than when it is at its low level; equivalently, the effect of $x_3$ is smaller when $x_2$ is at its high level.

In this case, we see that the $x_2$-$x_3$ interaction effect is the largest in magnitude, and it is negative. Because the main effects of $x_2$ and $x_3$ are both negative, keeping both variables at their low levels increases the response and makes the equipment last longer; the negative interaction means that the penalty for raising one of them is larger when the other is already high. In fact, all of the two-way interactions are negative, which indicates that the gain in equipment lifetime from increasing $x_1$ will be somewhat reduced if $x_2$ or $x_3$ is increased at the same time. These interaction effects (about -0.03 to -0.04) are small, however, compared with the main effects.
Once again, if we are limited in the changes that we can actually make to the equipment and input levels, we would want to keep $x_2$ and $x_3$ both at their low levels to keep the response variable value as high as possible.
Now let's compute the three-way effects (in this case, we can only have one three-way effect, since we only have three variables). We'll start by using the itertools library again, to create a tuple listing the three variables whose interactions we're computing. Then we'll use the Pandas groupby() feature to partition each output according to its inputs, and use it to compute the three-way effects.
Three-way interactions are relatively rare, typically smaller, and harder to interpret. Here, the negative three-way interaction (-0.082) means that the $x_1$-$x_2$ interaction is itself smaller (more negative) when $x_3$ is at its high level than when it is at its low level. Because the response is $\log_{10}$ of the number of cycles to failure, an effect of -0.082 corresponds to multiplying the number of cycles by roughly $10^{-0.082} \approx 0.83$, not to a change of a single cycle. This effect is weak compared with the main effects, although in this experiment it is larger in magnitude than any of the two-way interactions.
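While identifying general trends and the effects of different input variables on a system response is useful, it's more useful to have a mathematical model for the system. The factorial design we used is designed to get us coefficients for a linear model $\hat{y}$ that is a linear function of the input variables $x_i$ (and of their products, which represent the interactions), and that predicts the actual system response $y$:

$$\hat{y} = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_{12} x_1 x_2 + a_{13} x_1 x_3 + a_{23} x_2 x_3 + a_{123} x_1 x_2 x_3$$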
To determine these coefficients, we can use the effects we computed above. When we computed effects, we defined them as the difference in the system response produced by changing a variable from -1 to +1. Because this quantifies the change per two units of x, while the coefficients of a polynomial quantify the change per one unit of x, each effect must be divided by two. The intercept $a_0$ is the average of all the responses.
Thus, the final result of the experimental design matrix and the 8 experiments that were run is the following polynomial for $\hat{y}$, which is a model for $y$, the system response:
The main and interaction effects give us a more quantitative idea of what variables are important, yes. They can also be important for identifying where a model can be improved (if an input is linked strongly to a system response, more effort should be spent understanding the nature of the relationship).
But there are still some practical considerations missing from the implementation above. Specifically, in the real world it is impossible to know the system response, $y$, perfectly. Rather, we may measure the response with an instrument whose uncertainty has been quantified, or we may measure a quantity multiple times (or both). How do we determine the impact of that uncertainty on the model?
Ultimately, factorial designs are based on the underlying assumption that the response $y$ is a linear function of the inputs $x_i$. Thus, for the three-factor full factorial experiment design, we are collecting data and running experiments in such a way that we obtain a model $\hat{y}$ for our system response $y$, and $\hat{y}$ is a linear function of each factor:

$$\hat{y} = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_{12} x_1 x_2 + a_{13} x_1 x_3 + a_{23} x_2 x_3 + a_{123} x_1 x_2 x_3$$
The experiment design allows us to obtain a value for each coefficient $a_0$, $a_1$, etc. that will fit $\hat{y}$ to $y$ to the best of its abilities.
Thus, uncertainty in the measured responses $y$ propagates into the linear model in the form of uncertainty in the coefficients $a_0$, $a_1$, etc.
For example, suppose that we're dealing with a machine on a factory floor, and the system response $y$ we're measuring is the time at which the machine fails. Now, how do we know if a machine has failed? Perhaps we can't see its internals, and it still makes noise. We might find out that a machine has failed by seeing it emit smoke. But sometimes machines will emit smoke before they fail, while other times they will only smoke after they've failed. We don't know exactly how many life cycles the machines went through, but we can quantify what we know. We can measure the mean $\overline{y}$ and variance $\sigma^2$ in a controlled setting, so that when a machine starts smoking, we have a probability distribution assigning probabilities to different times of failure (e.g., there is a 5% chance it failed more than 1 hour ago).
Once we obtain the variance $\sigma^2$, we can take its square root to obtain $\sigma$, which measures the spread of the uncertainty distribution. If a 2-sigma interval is acceptable (it covers about 95% of cases for a Gaussian distribution), we can add or subtract $2\sigma$ from the parameter estimates.
To obtain an estimate of the uncertainty, the experimentalist will typically make several measurements at the center point, that is, where all coded factor levels are 0. The more samples are taken at this condition, the better characterized the distribution of uncertainty becomes. These center point samples can be used to fit a Gaussian probability distribution, which yields a variance $\sigma^2$ (or, to be precise, an estimate $s^2$ of the real variance $\sigma^2$). This parameter is key for quantifying uncertainty.
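As a sketch of that step, the sample variance of center-point replicates can be computed with Python's standard `statistics` module. The readings below are made up for illustration. `statistics.variance` divides by $n - 1$, so it returns the estimate $s^2$ rather than the population $\sigma^2$.

```python
import statistics

# Hypothetical replicate responses measured at the center point
# (all coded factor levels equal to 0); illustration only.
center_points = [4.95, 5.05, 5.00, 5.10, 4.90]

mean = statistics.mean(center_points)
s2 = statistics.variance(center_points)  # sample variance, divides by n - 1
s = statistics.stdev(center_points)      # square root of s2
dof = len(center_points) - 1             # degrees of freedom behind s2

print(f"mean={mean:.3f}  s^2={s2:.5f}  s={s:.4f}  dof={dof}")
```

For these five made-up readings, $s^2 = 0.00625$ with 4 degrees of freedom.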
Suppose we measure $s^2 = 0.0050$. Now what?
Now we can obtain the variance of all measurements, and the variance in the effects that we computed above. These are computed via:
Alternatively, if the responses $y$ are actually averages of a given number $r$ of $y$-observations, $\overline{y}$, then the variance will shrink:
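For a two-level full factorial with $N$ runs, each effect is the difference between two averages of $N/2$ responses, so $\mathrm{Var}(\text{effect}) = 4\sigma^2/N$. The grand mean has $\mathrm{Var}(\text{mean}) = \sigma^2/N$. When each response is itself an average of $r$ observations, $\sigma^2$ is replaced by $\sigma^2/r$. The sketch below applies these standard relations to $s^2 = 0.0050$ with $N = 8$. The function name is hypothetical, and the notebook's own equations may be written in a different but equivalent form.

```python
import math


def effect_uncertainty(s2, n_runs, r=1):
    """Variances and standard errors for a two-level full factorial (sketch).

    s2:     estimated variance of a single observation (e.g. from center points)
    n_runs: number of runs in the design (8 for a 2^3 design)
    r:      number of replicate observations averaged into each response
    """
    var_obs = s2 / r                   # variance of each (possibly averaged) response
    var_effect = 4 * var_obs / n_runs  # difference of two means of n_runs/2 responses
    var_mean = var_obs / n_runs        # grand mean of n_runs responses
    return {
        "var_effect": var_effect,
        "se_effect": math.sqrt(var_effect),
        "var_mean": var_mean,
        "se_mean": math.sqrt(var_mean),
        # model coefficients are effects / 2, so their standard error is halved too
        "se_coef": math.sqrt(var_effect) / 2,
    }


result = effect_uncertainty(0.0050, 8)
print(result)
```

With these inputs, the effect variance is 0.0025 (standard error 0.05) and the variance of the mean is 0.000625 (standard error 0.025). Each model coefficient, being half an effect, has a standard error of 0.025.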
The variance gives us an estimate of $\sigma^2$, and from $\sigma^2$ we can obtain $\sigma$. For a Gaussian distribution, the interval $\hat{y} \pm \sigma$ captures about 68% of the probable values of $y$. Widening it to $\hat{y} \pm 2\sigma$ captures about 95%.
Taking the square root of the variance gives $\sigma$:
Now we can convert the values of the effects, and the values of $\sigma$, to values for the final linear model:
We begin with the case where each variable value is at its middle point (all non-constant terms are 0), and
In this case, the standard error is $\pm \sigma$ as computed for the mean (or overall) system response,
where $\sigma_{mean} = \sqrt{Var(mean)}$.
The final polynomial model for our system response prediction $\hat{y}$ therefore becomes:
At this point, we would usually dive deeper into the details of the actual problem of interest. By tying the empirical model to the system, we can draw conclusions about the physical system. For example, if we were analyzing a chemically reacting process and found the response to be particularly sensitive to temperature, it would indicate that the chemical reaction is sensitive to temperature. The reaction should then be studied more deeply (in isolation from the more complicated system) to better understand the impact of temperature on the response.
It's also valuable to explore the linear model that we obtained more deeply, by looking at contours of the response surface, taking first derivatives, and optimizing the input variable values to maximize or minimize the response value. We'll leave those tasks for later notebooks.
At this point we have accomplished the goal of illustrating the design, execution, and analysis of a two-level, three-factor full factorial experimental design, so we'll leave things at that.
In this notebook, we've covered a 2-level, three-factor factorial design from start to finish, including incorporation of uncertainty information. The design of the experiment was made simple by using the itertools and pandas libraries, and we showed how to transform variables to have low and high levels, as well as demonstrating a system response transformation. The results were analyzed to obtain a linear polynomial model.
However, this process was a bit cumbersome. What we'll see in later notebooks is that we can use Python modules designed for statistical modeling to fit linear models to data using least squares and regression, and carry the analysis further.
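As a preview of that approach, a least-squares fit of the full $2^3$ model can be sketched with NumPy's `numpy.linalg.lstsq`. The response here is synthetic, generated from known coefficients so the fit can be checked. For an orthogonal ±1 design like this one, the least-squares coefficients coincide with the effects divided by two.

```python
import itertools

import numpy as np

# Coded 2^3 design and the full model matrix: intercept, 3 main effects,
# 3 two-way interactions, and the three-way interaction.
X = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
x1, x2, x3 = X.T
M = np.column_stack([
    np.ones(len(X)), x1, x2, x3,
    x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3,
])

# Synthetic responses generated from known coefficients (illustration only).
true_coefs = np.array([5.0, 1.5, -0.75, 0.25, 0.0, -0.5, -1.0, 0.1])
y = M @ true_coefs

# Ordinary least squares; for this orthogonal design it returns effect / 2.
coefs, residuals, rank, _ = np.linalg.lstsq(M, y, rcond=None)
print(np.round(coefs, 4))
```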
This lesson explains how to use analysis of variance (ANOVA) with balanced, completely randomized, full factorial experiments. The discussion covers general issues related to design, analysis, and interpretation with fixed factors and with random factors.
Future lessons expand on this discussion, using sample problems to demonstrate the analysis under the following scenarios:
Since this lesson is all about implementing analysis of variance with a balanced, completely randomized, full factorial experiment, we begin by answering four relevant questions:
A factorial experiment allows researchers to study the joint effect of two or more factors on a dependent variable.
With a full factorial design, the experiment includes a treatment group for every combination of factor levels. Therefore, the number of treatment groups is the product of factor levels. For example, consider the full factorial design shown below:
A | B | C1 | C2 | C3 | C4 |
---|---|---|---|---|---|
A1 | B1 | Grp 1 | Grp 2 | Grp 3 | Grp 4 |
A1 | B2 | Grp 5 | Grp 6 | Grp 7 | Grp 8 |
A1 | B3 | Grp 9 | Grp 10 | Grp 11 | Grp 12 |
A2 | B1 | Grp 13 | Grp 14 | Grp 15 | Grp 16 |
A2 | B2 | Grp 17 | Grp 18 | Grp 19 | Grp 20 |
A2 | B3 | Grp 21 | Grp 22 | Grp 23 | Grp 24 |
Factor A has two levels, factor B has three levels, and factor C has four levels. Therefore, the full factorial design has 2 x 3 x 4 = 24 treatment groups.
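One way to enumerate the treatment groups of a full factorial programmatically is `itertools.product`, which yields every combination of levels. The level labels A1, A2, B1, and so on are assumptions for illustration.

```python
import itertools

# Hypothetical level labels for a 2 x 3 x 4 design.
levels = {
    "A": ["A1", "A2"],
    "B": ["B1", "B2", "B3"],
    "C": ["C1", "C2", "C3", "C4"],
}

# One treatment group per combination of factor levels.
groups = list(itertools.product(*levels.values()))

print(len(groups))  # 2 * 3 * 4 combinations
for number, combo in enumerate(groups[:4], start=1):
    print(f"Group {number}: {combo}")
```

In this ordering C varies fastest, so Group 1 is (A1, B1, C1) and Group 2 is (A1, B1, C2).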
Full factorial designs can be characterized by the number of treatment levels associated with each factor, or by the number of factors in the design. Thus, the design above could be described as a 2 x 3 x 4 design (number of treatment levels) or as a three-factor design (number of factors).
Note: Another type of factorial experiment is a fractional factorial. Unlike full factorial experiments, which include a treatment group for every combination of factor levels, fractional factorial experiments include only a subset of possible treatment groups. Our focus in this lesson is on full factorial experiments, rather than fractional factorial experiments.
With a full factorial experiment, a completely randomized design is distinguished by the following attributes:
Analysis of variance requires that the dependent variable be measured on an interval scale or a ratio scale. In addition, analysis of variance with a full factorial experiment makes three assumptions about dependent variable scores:
The assumption of independence is the most important assumption. When that assumption is violated, the resulting statistical tests can be misleading. This assumption is tenable when (a) experimental units are randomly sampled from the population and (b) sampled units are randomly assigned to treatments.
With respect to the other two assumptions, analysis of variance is more forgiving. Violations of normality are less problematic when the sample size is large. And violations of the equal variance assumption are less problematic when the sample size within groups is equal.
Before conducting an analysis of variance with data from a full factorial experiment, it is best practice to check for violations of the normality and homogeneity-of-variance assumptions.
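Here is a minimal sketch of both checks, assuming SciPy is available. It applies Shapiro-Wilk (`scipy.stats.shapiro`) to the residuals for normality and Levene's test (`scipy.stats.levene`) across groups for homogeneity of variance. The group scores are made up for illustration, and these are only two of several possible checks.

```python
from scipy import stats

# Hypothetical scores for each treatment group of a balanced 2 x 3 design.
groups = {
    ("a1", "b1"): [12.0, 14.0, 11.0, 13.0],
    ("a1", "b2"): [15.0, 18.0, 16.0, 17.0],
    ("a1", "b3"): [10.0, 13.0, 12.0, 9.0],
    ("a2", "b1"): [20.0, 22.0, 19.0, 23.0],
    ("a2", "b2"): [14.0, 16.0, 15.0, 18.0],
    ("a2", "b3"): [11.0, 12.0, 14.0, 10.0],
}

# Normality: Shapiro-Wilk on residuals (each score minus its group mean).
residuals = [x - sum(g) / len(g) for g in groups.values() for x in g]
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Homogeneity of variance: Levene's test across groups (median-centered by default).
levene_stat, levene_p = stats.levene(*groups.values())

print(f"Shapiro-Wilk p = {shapiro_p:.3f}, Levene p = {levene_p:.3f}")
```

A small P-value from either test would flag a possible violation. With a balanced design, ANOVA is fairly robust to moderate departures, as noted above.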
A balanced design has an equal number of observations in all treatment groups. In contrast, an unbalanced design has an unequal number of observations in some treatment groups.
Balance is not required with one-way analysis of variance, but it is helpful with full-factorial designs because:
Note: Our focus in this lesson is on balanced designs.
To implement analysis of variance with a balanced, completely randomized, full factorial experiment, a researcher takes the following steps:
If you are familiar with one-way analysis of variance (see One-Way Analysis of Variance), you might notice that the analytical logic for a completely randomized, single-factor experiment is very similar to the logic for a completely randomized, full factorial experiment. Here are the main differences:
Below, we'll explain how to implement analysis of variance for fixed-effects models, random-effects models, and mixed models with a balanced, two-factor, completely randomized, full-factorial experiment.
For every experimental design, there is a mathematical model that accounts for all of the independent and extraneous variables that affect the dependent variable.
For example, here is the fixed-effects mathematical model for a two-factor, completely randomized, full-factorial experiment:
$$X_{ijm} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{m(ij)}$$
where $X_{ijm}$ is the dependent variable score for subject $m$ in treatment group $ij$; $\mu$ is the population mean; $\alpha_i$ is the main effect of Factor A at level $i$; $\beta_j$ is the main effect of Factor B at level $j$; $\alpha\beta_{ij}$ is the interaction effect of Factor A at level $i$ and Factor B at level $j$; and $\varepsilon_{m(ij)}$ is the effect of all other extraneous variables on subject $m$ in treatment group $ij$.
For this model, it is assumed that $\varepsilon_{m(ij)}$ is normally and independently distributed with a mean of zero and a variance of $\sigma^2_\varepsilon$. The mean ($\mu$) is constant.
Note: The parentheses in $\varepsilon_{m(ij)}$ indicate that subjects are nested under treatment groups. When a subject is assigned to only one treatment group, we say that the subject is nested under a treatment.
The random-effects mathematical model for a completely randomized full factorial experiment is similar to the fixed-effects mathematical model. It can also be expressed as:

$$X_{ijm} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \varepsilon_{m(ij)}$$

Like the fixed-effects mathematical model, the random-effects model also assumes that (1) $\varepsilon_{m(ij)}$ is normally and independently distributed with a mean of zero and a variance of $\sigma^2_\varepsilon$ and (2) the mean ($\mu$) is constant.
Here's the difference between the two mathematical models. With a fixed-effects model, the experimenter includes all treatment levels of interest in the experiment. With a random-effects model, the experimenter includes a random sample of treatment levels in the experiment. Therefore, in the random-effects mathematical model, the following is true:
All three effects are assumed to be normally and independently distributed (NID).
With a full factorial experiment, it is possible to test all main effects and all interaction effects. For example, here are the null hypotheses ($H_0$) and alternative hypotheses ($H_1$) for each effect in a two-factor full factorial experiment.
For fixed-effects models, it is common practice to write statistical hypotheses in terms of treatment effects:
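Hypothesis | Factor A | Factor B | AB interaction |
---|---|---|---|
$H_0$ | $\alpha_i = 0$ for all $i$ | $\beta_j = 0$ for all $j$ | $\alpha\beta_{ij} = 0$ for all $ij$ |
$H_1$ | $\alpha_i \neq 0$ for some $i$ | $\beta_j \neq 0$ for some $j$ | $\alpha\beta_{ij} \neq 0$ for some $ij$ |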
For random-effects models, it is common practice to write statistical hypotheses in terms of the variance of treatment levels included in the experiment:
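Hypothesis | Factor A | Factor B | AB interaction |
---|---|---|---|
$H_0$ | $\sigma^2_A = 0$ | $\sigma^2_B = 0$ | $\sigma^2_{AB} = 0$ |
$H_1$ | $\sigma^2_A \neq 0$ | $\sigma^2_B \neq 0$ | $\sigma^2_{AB} \neq 0$ |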
The significance level (also known as alpha or α) is the probability of rejecting the null hypothesis when it is actually true. The significance level for an experiment is specified by the experimenter, before data collection begins. Experimenters often choose significance levels of 0.05 or 0.01.
A significance level of 0.05 means that there is a 5% chance of rejecting the null hypothesis when it is true. A significance level of 0.01 means that there is a 1% chance of rejecting the null hypothesis when it is true. The lower the significance level, the more persuasive the evidence needs to be before an experimenter can reject the null hypothesis.
Analysis of variance for a full factorial experiment begins by computing a grand mean, marginal means, and group means. Here are formulas for computing the various means for a balanced, two-factor, full factorial experiment:
In the equations above, N is the total sample size across all treatment groups; n is the sample size in a single treatment group, p is the number of levels of Factor A, and q is the number of levels of Factor B.
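These means can be computed with pandas `groupby`. The sketch below uses a small hypothetical balanced 2 x 3 data set with n = 2 scores per group; the column names `A`, `B`, and `score` are assumptions.

```python
import pandas as pd

# Hypothetical balanced 2 x 3 experiment with n = 2 scores per treatment group.
data = pd.DataFrame({
    "A": ["a1"] * 6 + ["a2"] * 6,
    "B": ["b1", "b1", "b2", "b2", "b3", "b3"] * 2,
    "score": [4, 6, 7, 9, 5, 7, 8, 10, 6, 8, 10, 12],
})

grand_mean = data["score"].mean()                        # mean of all N scores
mean_A = data.groupby("A")["score"].mean()               # marginal means, Factor A
mean_B = data.groupby("B")["score"].mean()               # marginal means, Factor B
group_means = data.groupby(["A", "B"])["score"].mean()   # one mean per treatment group

print(grand_mean)
print(mean_A, mean_B, group_means, sep="\n\n")
```

Because the design is balanced, the marginal means for either factor (and the group means) average to the grand mean.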
A sum of squares is the sum of squared deviations from a mean score. Two-way analysis of variance makes use of five sums of squares:
In the formulas above, n is the sample size in each treatment group, p is the number of levels of Factor A, and q is the number of levels of Factor B.
It turns out that the total sum of squares is equal to the sum of the component sums of squares, as shown below:
SST = SSA + SSB + SSAB + SSW
As you'll see later on, this relationship will allow us to assess the relative magnitude of any effect (Factor A, Factor B, or the AB interaction) on the dependent variable.
The term degrees of freedom (df) refers to the number of independent sample points used to compute a statistic minus the number of parameters estimated from the sample points.
The degrees of freedom used to compute the various sums of squares for a balanced, two-way factorial experiment are shown in the table below:
Sum of squares | Degrees of freedom |
---|---|
Factor A | p - 1 |
Factor B | q - 1 |
AB interaction | (p - 1)(q - 1) |
Within groups | pq(n - 1) |
Total | npq - 1 |
Notice that the degrees of freedom are additive, just like the sums of squares. The degrees of freedom for the total sum of squares ($df_{TOT}$) equal the degrees of freedom for the Factor A sum of squares ($df_A$) plus the degrees of freedom for the Factor B sum of squares ($df_B$) plus the degrees of freedom for the AB interaction sum of squares ($df_{AB}$) plus the degrees of freedom for the within-groups sum of squares ($df_{WG}$). That is,
$df_{TOT} = df_A + df_B + df_{AB} + df_{WG}$
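A quick numerical check of both additive relationships is sketched below on a small hypothetical 2 x 3 data set (n = 2). Each mean is broadcast back onto its rows with `groupby(...).transform("mean")`, so summing squared deviations over all rows applies the $nq$, $np$, and $n$ multipliers automatically.

```python
import pandas as pd

# Hypothetical balanced 2 x 3 experiment with n = 2 scores per treatment group.
data = pd.DataFrame({
    "A": ["a1"] * 6 + ["a2"] * 6,
    "B": ["b1", "b1", "b2", "b2", "b3", "b3"] * 2,
    "score": [4, 6, 7, 9, 5, 7, 8, 10, 6, 8, 10, 12],
})

p = data["A"].nunique()
q = data["B"].nunique()
n = len(data) // (p * q)  # scores per treatment group (balanced design)

grand = data["score"].mean()
mean_A = data.groupby("A")["score"].transform("mean")
mean_B = data.groupby("B")["score"].transform("mean")
mean_AB = data.groupby(["A", "B"])["score"].transform("mean")

SSA = ((mean_A - grand) ** 2).sum()
SSB = ((mean_B - grand) ** 2).sum()
SSAB = ((mean_AB - mean_A - mean_B + grand) ** 2).sum()
SSW = ((data["score"] - mean_AB) ** 2).sum()
SST = ((data["score"] - grand) ** 2).sum()

df_A, df_B = p - 1, q - 1
df_AB = (p - 1) * (q - 1)
df_WG = p * q * (n - 1)
df_TOT = n * p * q - 1

print(SST, SSA + SSB + SSAB + SSW)
print(df_TOT, df_A + df_B + df_AB + df_WG)
```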
A mean square is an estimate of population variance. It is computed by dividing a sum of squares (SS) by its corresponding degrees of freedom (df), as shown below:
MS = SS / df
To conduct analysis of variance with a two-factor, full factorial experiment, we are interested in four mean squares:
$MS_A = SSA / df_A$
$MS_B = SSB / df_B$
$MS_{AB} = SSAB / df_{AB}$
$MS_{WG} = SSW / df_{WG}$
The expected value of a mean square is the average value of the mean square over a large number of experiments.
Statisticians have derived formulas for the expected value of mean squares for balanced, two-factor, full factorial experiments. The expected values differ, depending on whether the experiment uses all fixed factors, all random factors, or a mix of fixed and random factors.
A fixed-effects model describes an experiment in which all factors are fixed factors. The table below shows the expected value of mean squares for a balanced, two-factor, full factorial experiment when both factors are fixed:
Mean square | Expected value |
---|---|
$MS_A$ | $\sigma^2_{WG} + nq\sigma^2_A$ |
$MS_B$ | $\sigma^2_{WG} + np\sigma^2_B$ |
$MS_{AB}$ | $\sigma^2_{WG} + n\sigma^2_{AB}$ |
$MS_{WG}$ | $\sigma^2_{WG}$ |
In the table above, $n$ is the sample size in each treatment group, $p$ is the number of levels for Factor A, $q$ is the number of levels for Factor B, $\sigma^2_A$ is the variance of main effects due to Factor A, $\sigma^2_B$ is the variance of main effects due to Factor B, $\sigma^2_{AB}$ is the variance due to interaction effects, and $\sigma^2_{WG}$ is the variance due to extraneous variables (also known as variance due to experimental error).
A random-effects model describes an experiment in which all factors are random factors. The table below shows the expected value of mean squares for a balanced, two-factor, full factorial experiment when both factors are random:
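Mean square | Expected value |
---|---|
$MS_A$ | $\sigma^2_{WG} + n\sigma^2_{AB} + nq\sigma^2_A$ |
$MS_B$ | $\sigma^2_{WG} + n\sigma^2_{AB} + np\sigma^2_B$ |
$MS_{AB}$ | $\sigma^2_{WG} + n\sigma^2_{AB}$ |
$MS_{WG}$ | $\sigma^2_{WG}$ |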
A mixed model describes an experiment in which at least one factor is a fixed factor, and at least one factor is a random factor. The table below shows the expected value of mean squares for a balanced, two-factor, full factorial experiment, when Factor A is a fixed factor and Factor B is a random factor:
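Mean square | Expected value |
---|---|
$MS_A$ | $\sigma^2_{WG} + n\sigma^2_{AB} + nq\sigma^2_A$ |
$MS_B$ | $\sigma^2_{WG} + np\sigma^2_B$ |
$MS_{AB}$ | $\sigma^2_{WG} + n\sigma^2_{AB}$ |
$MS_{WG}$ | $\sigma^2_{WG}$ |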
Note: The expected values shown in the tables are approximations. For all practical purposes, the values for the fixed-effects model will always be valid for computing test statistics (see below). The values for the random-effects model and the mixed model will be valid when random-effect levels in the experiment represent a small fraction of levels in the population.
Suppose we want to test the significance of a main effect or the interaction effect in a two-factor, full factorial experiment. We can use the mean squares to define a test statistic F as follows:
$F(v_1, v_2) = MS_{EFFECT1} / MS_{EFFECT2}$
where $MS_{EFFECT1}$ is the mean square for the effect we want to test; $MS_{EFFECT2}$ is an appropriate mean square, based on the expected value of mean squares; $v_1$ is the degrees of freedom for $MS_{EFFECT1}$; and $v_2$ is the degrees of freedom for $MS_{EFFECT2}$.
How do you choose an appropriate mean square for the denominator in an F ratio? The expected value of the denominator of the F ratio should be identical to the expected value of the numerator, except for one thing: the numerator should have an extra term that includes the variance of the effect being tested ($\sigma^2_{EFFECT}$).
The table below shows how to construct F ratios when an experiment uses a fixed-effects model.
Table 1. Fixed-Effects Model
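Effect | Mean square: Expected value | F ratio |
---|---|---|
A | $\sigma^2_{WG} + nq\sigma^2_A$ | $MS_A / MS_{WG}$ |
B | $\sigma^2_{WG} + np\sigma^2_B$ | $MS_B / MS_{WG}$ |
AB | $\sigma^2_{WG} + n\sigma^2_{AB}$ | $MS_{AB} / MS_{WG}$ |
Error | $\sigma^2_{WG}$ | |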
The table below shows how to construct F ratios when an experiment uses a random-effects model.
Table 2. Random-Effects Model
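Effect | Mean square: Expected value | F ratio |
---|---|---|
A | $\sigma^2_{WG} + n\sigma^2_{AB} + nq\sigma^2_A$ | $MS_A / MS_{AB}$ |
B | $\sigma^2_{WG} + n\sigma^2_{AB} + np\sigma^2_B$ | $MS_B / MS_{AB}$ |
AB | $\sigma^2_{WG} + n\sigma^2_{AB}$ | $MS_{AB} / MS_{WG}$ |
Error | $\sigma^2_{WG}$ | |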
The table below shows how to construct F ratios when an experiment uses a mixed model. Here, Factor A is a fixed effect, and Factor B is a random effect.
Table 3. Mixed Model
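Effect | Mean square: Expected value | F ratio |
---|---|---|
A (fixed) | $\sigma^2_{WG} + n\sigma^2_{AB} + nq\sigma^2_A$ | $MS_A / MS_{AB}$ |
B (random) | $\sigma^2_{WG} + np\sigma^2_B$ | $MS_B / MS_{WG}$ |
AB | $\sigma^2_{WG} + n\sigma^2_{AB}$ | $MS_{AB} / MS_{WG}$ |
Error | $\sigma^2_{WG}$ | |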
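The denominator rule encoded in Tables 1 to 3 fits in a few lines. The function `f_ratio_terms` below is a hypothetical helper (not from any library). It returns the numerator and denominator mean squares for a balanced two-factor design, given which factors are random.

```python
def f_ratio_terms(effect, a_random, b_random):
    """Return (numerator, denominator) mean squares for testing `effect`.

    effect:   "A", "B", or "AB"
    a_random: True if Factor A is a random factor
    b_random: True if Factor B is a random factor
    """
    if effect == "AB":
        # The interaction is tested against within-groups error in all three models.
        return ("MS_AB", "MS_WG")
    # A main effect's expected mean square contains n*sigma^2_AB only when the
    # *other* factor is random, and then MS_AB is the matching denominator.
    other_random = b_random if effect == "A" else a_random
    denominator = "MS_AB" if other_random else "MS_WG"
    return (f"MS_{effect}", denominator)


# Mixed model from Table 3: A fixed, B random.
for eff in ("A", "B", "AB"):
    print(eff, f_ratio_terms(eff, a_random=False, b_random=True))
```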
For each F ratio in the tables above, notice that the expected value of the numerator equals the expected value of the denominator when the variation due to the source effect ($\sigma^2_{SOURCE}$) is zero (i.e., when the source does not affect the dependent variable). The numerator's expected value is bigger than the denominator's when that variation is not zero (i.e., when the source does affect the dependent variable).
Defined in this way, each F ratio is a convenient measure that we can use to test the null hypothesis about the effect of a source (Factor A, Factor B, or the AB interaction) on the dependent variable. Here's how to conduct the test:
What does it mean for the F ratio to be significantly greater than one? To answer that question, we need to talk about the P-value.
In an experiment, a P-value is the probability of obtaining a result at least as extreme as the observed experimental outcome, assuming the null hypothesis is true.
With analysis of variance for a full factorial experiment, the F ratios are the observed experimental outcomes that we are interested in. So, the P-value would be the probability that an F ratio would be more extreme (i.e., bigger) than the actual F ratio computed from experimental data.
How does an experimenter attach a probability to an observed F ratio? Luckily, the F ratio is a random variable that has an F distribution. The degrees of freedom ($v_1$ and $v_2$) for the F ratio are the degrees of freedom associated with the effects used to compute the F ratio.
For example, consider the F ratio for Factor A when Factor A is a fixed effect. That F ratio ($F_A$) is computed from the following formula:
$F_A = F(v_1, v_2) = MS_A / MS_{WG}$
$MS_A$ (the numerator in the formula) has degrees of freedom equal to $df_A$, so for $F_A$, $v_1$ is equal to $df_A$. Similarly, $MS_{WG}$ (the denominator in the formula) has degrees of freedom equal to $df_{WG}$, so for $F_A$, $v_2$ is equal to $df_{WG}$. Knowing the F ratio and its degrees of freedom, we can use an F table or an online calculator to find the probability that an F ratio will be bigger than the actual F ratio observed in the experiment.
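Instead of a table or calculator, the same tail probability can be computed in code. This sketch assumes SciPy is available and uses `scipy.stats.f.sf`, the survival function $P[F(v_1, v_2) > x]$ of the F distribution. `f_p_value` is a hypothetical wrapper name.

```python
from scipy import stats


def f_p_value(f_ratio, v1, v2):
    """P-value for an observed F ratio: P[F(v1, v2) > f_ratio]."""
    # sf is the survival function (1 - cdf), which stays accurate in the far tail.
    return stats.f.sf(f_ratio, v1, v2)


print(f_p_value(3.33, 2, 60))
print(f_p_value(2.0, 2, 8))
```

For example, `f_p_value(3.33, 2, 60)` is about 0.04 and `f_p_value(2.0, 2, 8)` is about 0.20.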
To find the P-value associated with an F ratio, use Stat Trek's free F distribution calculator.
For examples that show how to find the P-value for an F ratio, see Problem 1 or Problem 2 at the end of this lesson.
Recall that the experimenter specified a significance level early on - before the first data point was collected. Once you know the significance level and the P-values, the hypothesis tests are routine. Here's the decision rule for accepting or rejecting a null hypothesis:
A "big" P-value for a source of variation (Factor A, Factor B, or the AB interaction) indicates that the source did not have a statistically significant effect on the dependent variable. A "small" P-value indicates that the source did have a statistically significant effect on the dependent variable.
The hypothesis tests tell us whether sources of variation in our experiment had a statistically significant effect on the dependent variable, but the tests do not address the magnitude of the effect. Here's the issue:
With this in mind, it is customary to supplement analysis of variance with an appropriate measure of effect size. Eta squared ($\eta^2$) is one such measure. Eta squared is the proportion of variance in the dependent variable that is explained by a treatment effect. The eta squared formula for a main effect or an interaction effect is:
$\eta^2 = SS_{EFFECT} / SST$
where $SS_{EFFECT}$ is the sum of squares for a particular treatment effect (i.e., Factor A, Factor B, or the AB interaction) and $SST$ is the total sum of squares.
It is traditional to summarize ANOVA results in an analysis of variance table. Here, filled with hypothetical data, is an analysis of variance table for a 2 x 3 full factorial experiment.
Analysis of Variance Table
Source | SS | df | MS | F | P |
---|---|---|---|---|---|
A | 13,225 | p - 1 = 1 | 13,225 | 9.45 | 0.004 |
B | 2450 | q - 1 = 2 | 1225 | 0.88 | 0.427 |
AB | 9650 | (p-1)(q-1) = 2 | 4825 | 3.45 | 0.045 |
WG | 42,000 | pq(n - 1) = 30 | 1400 | ||
Total | 67,325 | npq - 1 = 35 |
In this experiment, Factors A and B were fixed effects, so F ratios were computed with that in mind. There were two levels of Factor A, so p equals two. There were three levels of Factor B, so q equals three. And finally, each treatment group had six subjects, so n equals six. The table shows critical outputs for each main effect and for the AB interaction effect.
Many of the table entries are derived from the sum of squares (SS) and degrees of freedom (df), based on the following formulas:
$MS_A = SS_A / df_A = 13{,}225 / 1 = 13{,}225$
$MS_B = SS_B / df_B = 2450 / 2 = 1225$
$MS_{AB} = SS_{AB} / df_{AB} = 9650 / 2 = 4825$
$MS_{WG} = SS_{WG} / df_{WG} = 42{,}000 / 30 = 1400$
$F_A = MS_A / MS_{WG} = 13{,}225 / 1400 = 9.45$
$F_B = MS_B / MS_{WG} = 1225 / 1400 = 0.88$
$F_{AB} = MS_{AB} / MS_{WG} = 4825 / 1400 = 3.45$
where $MS_A$ is the mean square for Factor A, $MS_B$ is the mean square for Factor B, $MS_{AB}$ is the mean square for the AB interaction, $MS_{WG}$ is the within-groups mean square, $F_A$ is the F ratio for Factor A, $F_B$ is the F ratio for Factor B, and $F_{AB}$ is the F ratio for the AB interaction.
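The derived columns of the table can be recomputed from the SS and df columns alone. This sketch assumes SciPy is available and treats both factors as fixed, so every F ratio uses $MS_{WG}$ as its denominator.

```python
from scipy import stats

# Sums of squares and degrees of freedom from the ANOVA table above.
ss = {"A": 13_225, "B": 2_450, "AB": 9_650, "WG": 42_000}
df = {"A": 1, "B": 2, "AB": 2, "WG": 30}

# Mean squares: MS = SS / df
ms = {source: ss[source] / df[source] for source in ss}

# Fixed-effects model: each effect is tested against MS_WG.
results = {}
for source in ("A", "B", "AB"):
    f_ratio = ms[source] / ms["WG"]
    p_value = stats.f.sf(f_ratio, df[source], df["WG"])
    results[source] = (ms[source], f_ratio, p_value)
    print(source, ms[source], round(f_ratio, 2), round(p_value, 3))
```

The F ratios come out to about 9.45, 0.88, and 3.45. The P-values for B and AB come out to about 0.427 and 0.045, in line with the table.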
An ANOVA table provides all the information an experimenter needs to (1) test hypotheses and (2) assess the magnitude of treatment effects.
The P-value (shown in the last column of the ANOVA table) is the probability that an F statistic would be more extreme (bigger) than the F ratio shown in the table, assuming the null hypothesis is true. When a P-value for a main effect or an interaction effect is bigger than the significance level, we accept the null hypothesis for the effect; when it is smaller, we reject the null hypothesis.
For example, based on the F ratios in the table above, we can draw the following conclusions:
To assess the strength of a treatment effect, an experimenter can compute eta squared ($\eta^2$). The computation is easy, using sum of squares entries from an ANOVA table in the formula below:
$\eta^2 = SS_{EFFECT} / SST$
where $SS_{EFFECT}$ is the sum of squares for the main or interaction effect being tested and $SST$ is the total sum of squares.
To illustrate how this works, let's compute $\eta^2$ for the main effects and the interaction effect in the ANOVA table below:
Source | SS | df | MS | F | P |
---|---|---|---|---|---|
A | 100 | 2 | 50 | 2.5 | 0.09 |
B | 180 | 3 | 60 | 3 | 0.04 |
AB | 300 | 6 | 50 | 2.5 | 0.03 |
WG | 960 | 48 | 20 | ||
Total | 1540 | 59 |
Based on the table entries, here are the computations for eta squared ($\eta^2$):
$\eta^2_A = SSA / SST = 100 / 1540 = 0.065$
$\eta^2_B = SSB / SST = 180 / 1540 = 0.117$
$\eta^2_{AB} = SSAB / SST = 300 / 1540 = 0.195$
Conclusion: In this experiment, Factor A accounted for 6.5% of the variance in the dependent variable; Factor B, 11.7% of the variance; and the interaction effect, 19.5% of the variance.
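The same arithmetic in Python, using the SS column of the table above:

```python
# Sums of squares from the ANOVA table above.
ss = {"A": 100, "B": 180, "AB": 300, "WG": 960}
sst = sum(ss.values())  # components add up to SST in a balanced design

eta_squared = {effect: ss[effect] / sst for effect in ("A", "B", "AB")}
for effect, value in eta_squared.items():
    print(f"{effect}: {value:.3f}")
```

This prints 0.065, 0.117, and 0.195 for A, B, and AB, matching the hand computations.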
In the ANOVA table shown below, the P-value for Factor B is missing. Assuming Factors A and B are fixed effects, what is the correct entry for the missing P-value?
Source | SS | df | MS | F | P |
---|---|---|---|---|---|
A | 300 | 4 | 75 | 5.00 | 0.002 |
B | 100 | 2 | 50 | 3.33 | ??? |
AB | 200 | 8 | 25 | 1.67 | 0.12 |
WG | 900 | 60 | 15 | ||
Total | 1500 | 74 |
Hint: Stat Trek's F Distribution Calculator may be helpful.
(A) 0.01 (B) 0.04 (C) 0.20 (D) 0.97 (E) 0.99
The correct answer is (B).
A P-value is the probability of obtaining a result more extreme (bigger) than the observed F ratio, assuming the null hypothesis is true. From the ANOVA table, we know the following:
$F_B = F(v_1, v_2) = MS_B / MS_{WG}$
Therefore, the P-value we are looking for is the probability that an F with 2 and 60 degrees of freedom is greater than 3.33. We want to know:
$P[F(2, 60) > 3.33]$
Now, we are ready to use the F Distribution Calculator. We enter the degrees of freedom (v1 = 2) for the Factor B mean square, the degrees of freedom (v2 = 60) for the within-groups mean square, and the F value (3.33) into the calculator, and hit the Calculate button.
The calculator reports that the probability that F is greater than 3.33 equals about 0.04. Hence, the correct P-value is 0.04.
In the ANOVA table shown below, the P-value for Factor B is missing. Assuming Factors A and B are random effects, what is the correct entry for the missing P-value?
Source | SS | df | MS | F | P |
---|---|---|---|---|---|
A | 300 | 4 | 75 | 3.00 | 0.09 |
B | 100 | 2 | 50 | 2.00 | ??? |
AB | 200 | 8 | 25 | 1.67 | 0.12 |
WG | 900 | 60 | 15 | ||
Total | 1500 | 74 |
(A) 0.01 (B) 0.04 (C) 0.20 (D) 0.80 (E) 0.96
The correct answer is (C).
$F_B = F(v_1, v_2) = MS_B / MS_{AB}$
Therefore, the P-value we are looking for is the probability that an F with 2 and 8 degrees of freedom is greater than 2.0. We want to know:
$P[F(2, 8) > 2.0]$
Now, we are ready to use the F Distribution Calculator. We enter the degrees of freedom (v1 = 2) for the Factor B mean square, the degrees of freedom (v2 = 8) for the AB interaction mean square, and the F value (2.0) into the calculator, and hit the Calculate button.
The calculator reports that the probability that F is greater than 2.0 equals about 0.20. Hence, the correct P-value is 0.20.
Lesson 9: 3-level and mixed-level factorials and fractional factorials, overview section.
These designs are a generalization of the \(2^k\) designs. We will continue to talk about coded variables so we can describe designs in general terms, but in this case we will be assuming in the \(3^k\) designs that the factors are all quantitative. With \(2^k\) designs we weren't as strict about this because we could have either qualitative or quantitative factors. Most \(3^k\) designs are only useful where the factors are quantitative. With \(3^k\) designs we are moving from screening factors to analyzing them to understand what their actual response function looks like.
With 2-level designs, we had just two levels of each factor. This is fine for fitting a linear, straight-line relationship. With three levels of each factor, we now have points at the middle, so we are able to fit curved response functions, i.e., quadratic response functions. In two dimensions with a square design space, using a \(2^k\) design we simply had corner points, which defined a square that looked like this:
In three dimensions the design region becomes a cube and with four or more factors it is a hypercube which we can't draw.
We can label the design points similar to what we did before (see the columns on the left). However, for these designs we prefer the other way of coding, using {0,1,2}, which is a generalization of the {0,1} coding that we used in the \(2^k\) designs. This is shown in the columns on the right in the table below:
A (-/0/+) | B (-/0/+) | A {0,1,2} | B {0,1,2} | |
---|---|---|---|---|
- | - | 0 | 0 | |
0 | - | 1 | 0 | |
+ | - | 2 | 0 | |
- | 0 | 0 | 1 | |
0 | 0 | 1 | 1 | |
+ | 0 | 2 | 1 | |
- | + | 0 | 2 | |
0 | + | 1 | 2 | |
+ | + | 2 | 2 |
For either method of coding, the treatment combinations represent the actual values of \(X_1\) and \(X_2\), where there is some high level, a middle level and some low level of each factor. Visually our region of experimentation or region of interest is highlighted in the figure below when \(k = 2\):
If we look at the analysis of variance for a \(k = 2\) experiment with n replicates, where we have three levels of both factors we would have the following:
Source | d.f. |
---|---|
A | 2 |
B | 2 |
A x B | 4 |
Error | 9(n-1) |
Total | 9n-1 |
Important idea used for confounding and taking fractions
How we consider three-level designs will parallel what we did in two-level designs; therefore, we may confound the experiment in incomplete blocks or simply use a fraction of the design. In two-level designs, the interactions each have 1 d.f. and consist only of +/- components, so it is simple to see how to do the confounding. Things are more complicated in three-level designs, since a p-way interaction has \(2^p\) d.f. If we want to confound a main effect (2 d.f.) with a 2-way interaction (4 d.f.), we need to partition the interaction into 2 orthogonal pieces with 2 d.f. each. Then we confound the main effect with one of the 2 pieces, so there are 2 choices. Similarly, if we want to confound a main effect with a 3-way interaction, we need to break the interaction into 4 pieces with 2 d.f. each. Each piece of the interaction is represented by a pseudo-factor with 3 levels. The method given using Latin squares is quite simple. There is some clever modular arithmetic in this section, but the details are not important. The important idea is that, just as with the \(2^k\) designs, we can purposefully confound to achieve designs that are efficient, either because they do not use the entire set of \(3^k\) runs or because they can be run in blocks that do not disturb our ability to estimate the effects of most interest.
Following the text, for the A*B interaction we define two pseudo-factors, called the AB component and the \(AB^2\) component; these components could also be called pseudo-interaction effects. Each component is defined as a linear combination modulo 3, where \(X_1\) is the level of factor A and \(X_2\) is the level of factor B in the {0, 1, 2} coding system. The \(AB\) component is defined as
\(L_{AB}=X_{1}+X_{2} \pmod{3}\)
and the \(AB^2\) component will be defined as:
\(L_{AB^2}=X_{1}+2X_{2} \pmod{3}\)
Using these definitions we can create the pseudo-interaction components. Below you see that the AB levels are defined by \(L_{AB}\) and the \(AB^2\) levels are defined by \(L_{AB^2}\).
\(A\) | \(B\) | \(AB\) | \(AB^2\) |
---|---|---|---|
0 | 0 | 0 | 0 |
1 | 0 | 1 | 1 |
2 | 0 | 2 | 2 |
0 | 1 | 1 | 2 |
1 | 1 | 2 | 0 |
2 | 1 | 0 | 1 |
0 | 2 | 2 | 1 |
1 | 2 | 0 | 2 |
2 | 2 | 1 | 0 |
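The table can be regenerated directly from the two mod-3 definitions. A short Python sketch (the function name is hypothetical) that lists rows in the same order as the table, with A varying fastest:

```python
def pseudo_factor_table() -> list[tuple[int, int, int, int]]:
    """Rows (A, B, AB, AB^2) for the 3^2 design in the {0, 1, 2} coding."""
    rows = []
    for x2 in range(3):      # B changes slowest, as in the table
        for x1 in range(3):  # A changes fastest
            l_ab = (x1 + x2) % 3       # L_AB   = X1 + X2  (mod 3)
            l_ab2 = (x1 + 2 * x2) % 3  # L_AB^2 = X1 + 2X2 (mod 3)
            rows.append((x1, x2, l_ab, l_ab2))
    return rows

print("A B AB AB^2")
for row in pseudo_factor_table():
    print(*row)
```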
This table has entries {0, 1, 2}, which allow us to confound a main effect or either component of the A*B interaction. Each of these main effects and pseudo-interaction components has three levels and therefore 2 degrees of freedom.
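As an illustration, here is a sketch that assumes we choose to confound the AB component with blocks. The nine runs are split into three blocks of three according to the value of \(L_{AB}\). The checks confirm that every block contains each level of A, of B, and of the \(AB^2\) component exactly once. Those effects therefore remain orthogonal to blocks, and only the 2 d.f. of the AB component are lost to block differences. Keeping only one of the blocks gives a one-third fraction of the design. The names below are our own.

```python
from collections import Counter

def blocks_confounding_ab() -> dict[int, list[tuple[int, int]]]:
    """Assign each (x1, x2) run of the 3^2 design to block L_AB = x1 + x2 (mod 3)."""
    blocks: dict[int, list[tuple[int, int]]] = {0: [], 1: [], 2: []}
    for x1 in range(3):
        for x2 in range(3):
            blocks[(x1 + x2) % 3].append((x1, x2))
    return blocks

ONE_OF_EACH = Counter({0: 1, 1: 1, 2: 1})

blocks = blocks_confounding_ab()
for label, runs in blocks.items():
    # Each block holds every level of A, of B, and of the AB^2 component once,
    # so only the AB component is confounded with blocks.
    assert Counter(x1 for x1, _ in runs) == ONE_OF_EACH
    assert Counter(x2 for _, x2 in runs) == ONE_OF_EACH
    assert Counter((x1 + 2 * x2) % 3 for x1, x2 in runs) == ONE_OF_EACH
    print(f"block {label}: {runs}")
```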
This section also discusses partitioning the interaction sums of squares into 1 d.f. sums of squares associated with polynomial contrasts. However, this is just polynomial regression, and the method does not seem readily applicable to creating interpretable confounding patterns.
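For comparison, here is a sketch of that polynomial partition. It assumes equally spaced levels, so the standard orthogonal polynomial coefficients apply (linear −1, 0, 1; quadratic 1, −2, 1). Products of these coefficients give four 1-d.f. contrasts for A*B (linear × linear, linear × quadratic, and so on). The checks confirm that the contrasts are mutually orthogonal, so with balanced data their sums of squares add up to the 4-d.f. interaction SS. All names are illustrative.

```python
from itertools import product

# Orthogonal polynomial coefficients for three equally spaced levels 0, 1, 2.
POLYS = {"L": (-1, 0, 1), "Q": (1, -2, 1)}

def interaction_contrasts() -> dict[str, list[int]]:
    """Four 1-d.f. A x B contrasts, one coefficient per run (a, b) with b fastest."""
    runs = [(a, b) for a in range(3) for b in range(3)]
    contrasts = {}
    for (pa, ca), (pb, cb) in product(POLYS.items(), repeat=2):
        contrasts[f"A{pa} x B{pb}"] = [ca[a] * cb[b] for a, b in runs]
    return contrasts

def dot(u: list[int], v: list[int]) -> int:
    return sum(x * y for x, y in zip(u, v))

contrasts = interaction_contrasts()
names = list(contrasts)
# Each contrast sums to zero, and every pair is orthogonal.
assert all(sum(contrasts[c]) == 0 for c in names)
assert all(dot(contrasts[i], contrasts[j]) == 0 for i in names for j in names if i != j)
for name, coeffs in contrasts.items():
    print(name, coeffs)
```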
Scientific Reports, volume 14, Article number: 13922 (2024)
Artificial intelligence (AI) holds immense promise for K-12 education, yet understanding the factors influencing students’ engagement with AI courses remains a challenge. This study addresses this gap by extending the technology acceptance model (TAM) to incorporate cognitive factors such as AI intrinsic motivation (AIIM), AI readiness (AIRD), AI confidence (AICF), and AI anxiety (AIAX), alongside human–computer interaction (HCI) elements like user interface (UI), content (C), and learner-interface interactivity (LINT) in the context of using generative AI (GenAI) tools. By including these factors, an expanded model is presented to capture the complexity of student engagement with AI education. To validate the model, 210 Chinese students spanning grades K7 to K9 participated in a one-month artificial intelligence course. Survey data and structural equation modeling reveal significant relationships between cognitive and HCI factors and perceived usefulness (PU) and ease of use (PEOU). Specifically, AIIM, AIRD, AICF, UI, C, and LINT positively influence PU and PEOU, while AIAX negatively affects both. Furthermore, PU and PEOU significantly predict students’ attitudes toward AI curriculum learning. These findings underscore the importance of considering cognitive and HCI factors in the design and implementation of AI education initiatives. By providing a theoretical foundation and practical insights, this study informs curriculum development and aids educational institutions and businesses in evaluating and optimizing AI4K12 curriculum design and implementation strategies.
Introduction.
Artificial intelligence (AI), together with other emerging technologies such as blockchain, augmented reality, 3D printing, nanotechnology, and the internet of things, significantly impacts many aspects of human life 1 . AI’s promise to revolutionize education is evident, with countries like the United States and China actively promoting AI in K-12 education 2 , 3 , 4 . In May 2018, the Association for the Advancement of Artificial Intelligence (AAAI) and the Computer Science Teachers Association (CSTA) formed a joint working group to develop national guidelines for K-12 AI education, establishing the K-12 AI education concept (AI4K12) 5 , 6 . A key advancement in AI research is generative AI (GenAI), which uses machine learning and deep learning to create new data 7 , 8 , 9 . GenAI applications include image generation, natural language processing, and music composition, with innovations like Midjourney-generated images and ChatGPT’s conversational chat enhancing creativity and public engagement 10 , 11 , 12 . The rise of tools like ChatGPT has intensified GenAI’s role in educational research, drawing public and academic interest to its educational implications, challenges, and opportunities 13 , 14 , 15 .
Research on artificial intelligence in education (AIED) has examined learners’ receptivity; technological, system quality, cultural, self-efficacy, and trust factors are deemed crucial in e-learning systems 10 , 11 , 12 . Studies in computer vision courses highlight the influence of prior knowledge, skills, learning styles, motivation, and self-efficacy, while the usability of the system, observability, and trialability also affect the use of computer tools in the classroom 16 , 17 , 18 , 19 . Other work has examined students’ perspectives on employing ChatGPT in programming education 20 . A scale based on the unified theory of acceptance and use of technology (UTAUT) model was developed to gauge students’ acceptance of generative AI applications 21 ; this scale was tailored for individuals aged 18–60 in Turkey. The validity and reliability of the AI literacy scale were confirmed 22 . Studies on the use of chatbots in training programs show that social expectations, effort, and influence are pivotal factors for engagement 23 . Chai et al. explored the correlations among AI literacy, AI curriculum framework (AICF), social welfare, and behavioral intention (BI) in K-12 students, finding positive correlations among these elements 24 . Long and Magerko (2020) developed a framework for AI literacy in K-12, emphasizing design considerations like explainability and transparency 25 . Green et al. (2019) proposed disciplinary literacy instruction in K-12 engineering to address diversity barriers in engineering careers 26 . However, few studies have systematically examined K-12 students’ acceptance of AI programs, particularly the causal relationships and direct influencing factors. Students’ perceptions, cognitive factors in AI learning 27 , GenAI tools’ interactivity, and human–computer interaction (HCI) 28 factors are all crucial in influencing the acceptance of AI in education.
Studying K-12 students’ attitudes towards AI courses using GenAI tools is a promising research area, vital for understanding engagement, learning outcomes, and course optimization 29 .
In this study, we adopt the technology acceptance model (TAM) as the theoretical framework to understand K-12 students’ attitudes towards AI courses using generative AI (GenAI) tools. TAM has been widely used to assess users’ acceptance and adoption of new technologies. The model posits that perceived usefulness (PU) and perceived ease of use (PEOU) significantly influence users’ attitudes and behavioral intentions towards adopting a new technology. Building upon this foundation, our extended TAM incorporates cognitive factors related to AI learning, such as AI intrinsic motivation (AIIM), AI readiness (AIRD), AI confidence (AICF), and AI anxiety (AIAX), alongside human–computer interaction (HCI) elements like user interface (UI), content (C), and learner interface interactivity (LINT) specific to GenAI tools. By integrating these additional constructs into the TAM framework, we aim to provide a more comprehensive understanding of the factors shaping K-12 students’ acceptance of AI courses facilitated by GenAI tools.
Recent empirical studies have shed light on various aspects of AI in education, providing valuable insights into factors influencing students’ attitudes and behaviors. For instance, research by Almaiah and Almulhem (2018) identified key success factors for e-learning system implementation using the Delphi technique 10 . Similarly, Almaiah, Al-Khasawneh, and Althunibat (2020) explored critical challenges and factors influencing e-learning system usage during the COVID-19 pandemic 11 . Thematic analysis by Almaiah and Al Mulhem (2020) classified main challenges and factors influencing the successful implementation of e-learning systems using NVivo 12 . These studies underscore the importance of understanding the dynamics of technology acceptance and usage in educational settings, providing valuable insights that inform our research approach and contribute to the broader discourse on AI in education.
This study aims to analyze K-12 students’ attitudes towards AI courses using GenAI tools. Employing a conceptual model based on the technology acceptance model (TAM), it includes cognitive and HCI factors as external variables in an extended model. The study involves designing and implementing GenAI-based AI courses. A group of 210 Chinese K7–K9 students participated, and their experiences were evaluated through post-course questionnaires. Hypotheses were tested using structural equation modeling, resulting in an enhanced version of TAM.
The study’s innovative contributions are:
Developing a comprehensive set of indicators for factors influencing K-12 students’ attitudes towards GenAI tool-based courses.
Experimentally deriving interrelationships among these influencing factors.
Proposing an improved experimental methodology based on the TAM model to validate these relationships.
As K-12 AI education advances, the significance and refinement of related models and frameworks are expected to grow.
This study proposes an extended TAM combining students’ cognitive learning of AI courses with GenAI’s HCI factors, potentially offering new directions for TAM in GenAI-based education in K-12.
Artificial intelligence (AI) is increasingly pivotal in education 30 . AI in education (AI-Ed) involves computers executing cognitive tasks akin to human thinking, particularly in learning and problem-solving. Over the past 30 years, AI-Ed has been integrated into the education sector through various means. The integration of intelligent educational methods, curriculum design, and course structure aims to instill in students an awareness of environmental and sustainable development (ESD), while also incorporating cutting-edge technologies like artificial intelligence within the ESD framework 31 . This integration includes AI monitoring student forums, conducting intelligent assessments, serving as a learning companion, assisting or replacing educators, and functioning as a private tutor. Moreover, AI-Ed serves as a research tool for advancing science education 32 . Utilizing AI-Ed in computer science, machine learning, and deep learning can bridge the digital divide and foster AI literacy 13 . Consequently, AI has become an integral subject in K-12 education, equipping students with digital-driven knowledge and problem-solving skills for the digital world 33 .
Generative AI (GenAI) is a subset of AI that has garnered significant attention. It allows users to create new content, including text, images, audio, video, and 3D models, based on input requests. Recently, several GenAI platforms have emerged, such as ChatGPT, a chatbot built on a large language model and launched on November 30, 2022, which attracted a million users within five days of its release 34 , 35 . As an AI chatbot, ChatGPT aids student learning by providing information and narration 36 . In GenAI imagery, platforms like Disco Diffusion, DALL-E 2, Imagen, Midjourney, and Stable Diffusion are prominent. Midjourney, for example, creates artistic images based on user text inputs 37 , impacting art and education. These GenAI applications generate outputs in response to user requests. Applying GenAI to AI4K12 involves interpreting science and technology through engineering and art 38 . For instance, using ChatGPT and Midjourney in AI education practice for K-12 involves designing courses with visual narratives 39 . ChatGPT enhances students’ communication and narrative skills, while Midjourney can be used to create picture books 40 . Incorporating tools like ChatGPT and Midjourney into curriculum design is a feasible way to improve AI literacy in K-12 education.
Previous research has delved into various facets of AI in Education (AI-Ed), exploring educators’ readiness to teach AI, attitudes towards using chatbots in education, and factors influencing students' continued interest in AI learning. However, despite these valuable insights, there remains a gap in understanding the factors specifically influencing K-12 students’ acceptance of AI courses facilitated by generative AI (GenAI) tools. Given the increasing integration of AI into K-12 education and the emergence of GenAI platforms, it is crucial to explore the unique dynamics shaping students’ attitudes towards these innovative learning tools. By addressing this research gap, our study aims to contribute to the existing literature by providing insights into the factors driving K-12 students’ acceptance of AI4K12 courses, ultimately informing the design and implementation of effective AI education programs for this demographic.
Previous studies have examined various perspectives in AIED. These include educators’ readiness and willingness to teach AI 18 ; work combining diffusion theory with technology adoption rates, which confirmed that usability and user-friendliness are relevant to the adoption rate of artificial intelligence tools in online learning 19 ; and attitudes towards using chatbots in education 41 . Research has also focused on factors influencing students’ continued interest in AI learning 24 , perceptions of AI coaching 23 , and universities’ behavioral intention (BI) to use AI robots for education 42 . To foster the widespread adoption of GenAI programs in AI4K12, understanding the factors influencing K-12 students’ acceptance of such courses is essential. This study aims to explore these influential factors.
The technology acceptance model (TAM), originally proposed by Davis, provides a robust theoretical framework for examining user acceptance and usage of new technologies. While TAM has been widely applied across various fields, including healthcare, management, and finance, its application in the context of AI in education (AIED) remains relatively underexplored. Specifically, the unique characteristics of GenAI tools and their implications for students’ perceptions of usefulness and ease of use have not been thoroughly investigated within the TAM framework. By applying TAM to the study of K-12 students’ attitudes towards AI courses with GenAI tools, our research seeks to elucidate the underlying factors driving students’ acceptance of these innovative learning platforms. This theoretical approach allows us to identify key determinants of students’ attitudes and intentions towards AI4K12 courses, providing valuable insights for educators, policymakers, and developers seeking to enhance AI literacy and engagement among K-12 students.
The technology acceptance model (TAM), proposed by Davis, addresses user acceptance and usage of new technologies 43 , 44 .
A TAM-based study of university students’ willingness to use a metaverse-based learning platform found that perceived usefulness, personal innovativeness, and perceived enjoyment are the key factors 45 . Davis suggested that perceived usefulness (PU) and perceived ease of use (PEOU) are key to embracing and promoting technology use 43 , 44 . TAM has been applied across various fields, including healthcare, management, finance, and education; examples include studies of university students’ acceptance of mobile learning 46 , 47 . In AI device acceptance studies, a theoretical model called AI device usage acceptance (AIDUA) includes social influence, personification, performance expectations, emotional engagement, and hedonic motivation as antecedents to user attitudes 48 . Other studies on AI acceptance have identified PU, performance expectations, attitude, trust, and effort expectation as influencing AI intention, willingness, and usage behavior 49 . Research on students’ willingness to continue AI learning revealed that AI literacy and its impact on social welfare affect students’ BI 24 . Applications of TAM in AIED show that the external variables influencing PU and PEOU vary with the research angle. The factors influencing K-12 students’ attitudes towards learning AI courses with GenAI tools remain unclear.
External variables: learning cognition of AI.
Based on the cognitive characteristics of students in AI education for K-12 (AI4K-12), we propose the inclusion of four key variables to enhance the technology acceptance model (TAM) study: AI intrinsic motivation (AIIM), AI readiness (AIRD), AI confidence (AICF), and AI anxiety (AIAX).
This suggestion stems from a comprehensive review of research on teachers’ AI cognition and their willingness to teach AI, which highlights the importance of understanding AIAX, its impact on social welfare, attitude towards using AI (ATT), perceived teaching confidence, AICF, AI correlations, AIRD, and behavioral intentions (BI). Furthermore, in the context of AI4K12, students’ cognition plays a crucial role in influencing their learning process and outcomes. Our research on developing and evaluating AI courses confirms the significance of learning perception abilities, including motivation, confidence, attitude, readiness, and anxiety, in shaping effective AI education strategies.
AI intrinsic motivation (AIIM): previous studies have shown that motivation can enhance students’ willingness to learn 50 , 51 , 52 . Intrinsic motivation involves a psychological, cognitive process of exploration, experimentation, curiosity, and manipulation, which is a natural manifestation of human learning and knowledge integration 53 . Intrinsic learning motivation guides students to set learning goals and to participate continuously in the learning process through classroom learning activities, which has a positive impact on academic performance 54 . Therefore, we propose the following hypotheses.
H1a, students’ AIIM has a positive impact on their PU in learning AI courses through the GenAI tool.
H1b, students’ AIIM has a positive impact on their PEOU in learning AI courses through the GenAI tool.
AI readiness (AIRD): the technology readiness index (TRI) is used to measure people’s tendency to accept and use advanced information technology 55 . Based on positive expectations for the use of technology, preparatory work can predict learning behavior 56 . AIRD measures students’ comfort with AI knowledge and technology in their learning and daily life, and it influences their attitude towards AI courses 57 . In behavioral research on teachers teaching AIED, AIRD is related to BI, and PU has a positive impact on BI. Therefore, we propose the following hypotheses.
H2a, students’ AIRD has a positive impact on their PU in learning AI courses through the GenAI tool.
H2b, students’ AIRD has a positive impact on their PEOU in learning AI courses through the GenAI tool.
AI confidence (AICF): in AIED, AICF represents students’ confidence in learning AI course content 58 . AICF can affect students’ willingness to learn and other variables, and it is an important factor influencing AI usage behavior 59 , 60 , 61 . Research on students using mobile devices for learning has found that confidence in using mobile devices has a positive impact on PEOU 62 , 63 . Therefore, we propose the following hypotheses.
H3a, students’ AICF has a positive impact on their PU in learning AI courses through the GenAI tool.
H3b, students’ AICF has a positive impact on their PEOU in learning AI courses through the GenAI tool.
AI anxiety (AIAX): computer phobia is defined as the fear and anxiety of advanced technology 64 . When mobile devices are used for learning, mobile device anxiety can also affect learning behavior 63 . In the context of AI, AIAX can be traced back to technophobia and computer anxiety. AIAX is defined as a fear of AI, that is, users’ concerns about the unknown impact of AI programs and related technological developments on humans and society 65 , 66 . In the use of ChatGPT, AIAX predicts learning behavior 67 , 68 , and unease about GenAI usage affects user behavior 69 , 70 . In e-learning environments, where learners interact with AI tools, anxiety and unease affect usage, and AIAX has an impact on PU 71 . For learning AI courses with the GenAI tool, we therefore propose the following hypotheses.
H4a, students’ AIAX has a negative impact on their PU in learning AI courses through the GenAI tool.
H4b, students’ AIAX has a negative impact on their PEOU in learning AI courses through the GenAI tool.
HCI refers to the interaction between users and computers, that is, the computer-mediated dialogue that users engage in within an environment they create themselves. Interactivity in online educational programs refers to the relationship between students and computers in a human–computer interaction environment 72 . When the GenAI tool is used for AIED, HCI has an impact on students’ attitudes and behaviors 67 . Therefore, based on the teaching characteristics of using the GenAI tool in AI4K12, we propose considering HCI factors and using user interface design (UI), content (C), and learner interface interactivity (LINT) as variables to extend TAM research.
User interface design (UI): in HCI, an interface is defined as the visible part of the information system that can be touched, heard, and seen by the user 72 . UI is an important factor in the software development process, and design oriented to user demand is the key to UI 73 . The emergence of user-centered UI principles, such as distinguishing the most important information, using buttons with consistent styles, and actively providing feedback, gives designers a theoretical basis for UI design 74 , 75 . In research on online courses or mobile learning applications, following UI principles makes the system easier for students to use and operate, and UI also plays an important role in the system’s PU. Based on the technology acceptance model, learning content quality, content design quality, interactivity, functionality, user interface design, accessibility, personalization, and responsiveness are the main factors influencing the acceptance of mobile learning 76 , 77 . When GenAI is used for teaching, the UI also has an impact on PEOU. Therefore, we propose the following hypothesis.
H5, the UI of the GenAI tool used in AI course learning has a positive impact on PEOU.
Content (C): C relates to the course content. In the field of mobile devices, C is considered to have a significant impact on student satisfaction 78 . In the computer context, the structure and capacity of C have a direct impact on PU, and C is an important factor influencing user acceptance of a system 79 . In an investigation of the factors that affect BI to use mobile devices, C had a positive impact on PU 63 . In evaluating the acceptance and use of MOOCs, C was positively correlated with PEOU 80 . Based on these findings, we propose the following hypotheses.
H6a, the C of AI teaching with GenAI tools has a positive impact on students’ PU in learning AI courses.
H6b, the C of AI teaching with GenAI tools has a positive impact on students’ PEOU in learning AI courses.
Learner interface interactivity (LINT): LINT allows users to interact with the system, for example through the program’s menu bar 81 . A study testing whether enhancing student interactivity improves e-learning acceptance found relationships between LINT, PU, and PEOU 29 . Because LINT is also expected to affect students’ PU and PEOU when they use GenAI tools, we propose the following hypotheses.
H7a, LINT has a positive impact on students’ PU in learning AI courses through the GenAI tool.
H7b, LINT has a positive impact on students’ PEOU in learning AI courses through the GenAI tool.
Perceived usefulness (PU) is defined as the degree to which a user believes that using a specific system will improve their work performance, and perceived ease of use (PEOU) as the degree to which a user believes that using the system will be free of effort 43 , 44 . The relationships among TAM constructs have been confirmed in many studies, and the relationship between PU and PEOU has also been confirmed in education research. Attitude towards use (ATT) is a person’s perception of a technology, a psychological response of liking, enjoying, and being happy with the technology 58 ; ATT, along with usability, has a positive impact on the practical use of m-learning systems 82 . Previous research on the factors behind higher education students’ intention to adopt metaverse-based education 83 and studies on users’ continuance intention towards e-learning 84 have both concluded that PU and PEOU affect a person’s ATT towards the system. Therefore, when studying the factors influencing students’ attitudes towards AI teaching with GenAI, we propose the following hypotheses:
H8, the PU of AI courses learned by students through the GenAI tool has a positive impact on ATT.
H9, the PEOU of students learning AI courses through the GenAI tool has a positive impact on their PU.
H10, the PEOU of students learning AI courses through the GenAI tool has a positive impact on their ATT.
This study analyzed the learning-cognition and human–computer interaction factors that affect students’ attitudes. Davis’ TAM was extended with external variables drawn from the literature review and previous research findings: using PU, PEOU, and ATT as the basic variables, seven external variables were derived. Figure 1 shows the proposed hypothesis model.
Hypothesis model.
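Written as structural equations, hypotheses H1–H10 correspond to the path model below. This is a sketch in generic SEM notation: the path coefficients \(\beta\), \(\gamma\), \(\delta\) and disturbances \(\zeta\) are our own labels, not estimates from the study. The PEOU equation collects H1b–H4b, H5, H6b, and H7b; the PU equation collects H1a–H4a, H6a, H7a, and H9; the ATT equation collects H8 and H10. The hypotheses predict \(\gamma_4 < 0\) and \(\beta_4 < 0\) (the AIAX paths) and positive values for all other coefficients.

\[
\begin{aligned}
\mathrm{PEOU} &= \gamma_1\,\mathrm{AIIM} + \gamma_2\,\mathrm{AIRD} + \gamma_3\,\mathrm{AICF} + \gamma_4\,\mathrm{AIAX} + \gamma_5\,\mathrm{UI} + \gamma_6\,\mathrm{C} + \gamma_7\,\mathrm{LINT} + \zeta_1 \\
\mathrm{PU} &= \beta_1\,\mathrm{AIIM} + \beta_2\,\mathrm{AIRD} + \beta_3\,\mathrm{AICF} + \beta_4\,\mathrm{AIAX} + \beta_5\,\mathrm{C} + \beta_6\,\mathrm{LINT} + \beta_7\,\mathrm{PEOU} + \zeta_2 \\
\mathrm{ATT} &= \delta_1\,\mathrm{PU} + \delta_2\,\mathrm{PEOU} + \zeta_3
\end{aligned}
\]

UI appears only in the PEOU equation because H5 posits a path to PEOU alone.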
This study endeavors to delve into the determinants shaping K-12 students’ perceptions of AI courses facilitated by generative AI (GenAI) tools. To elucidate these factors, an analytical framework was formulated, drawing inspiration from Davis’ technology acceptance model (TAM) as its foundational underpinning. Building upon the core constructs of TAM—perceived usefulness (PU), perceived ease of use (PEOU), and attitude towards use (ATT)—the research extends the model by incorporating additional external variables gleaned from an exhaustive literature review and synthesis of prior research. Specifically, the model integrates cognitive factors associated with AI learning, including AI intrinsic motivation (AIIM), AI readiness (AIRD), AI confidence (AICF), and AI anxiety (AIAX), as well as human–computer interaction (HCI) elements such as user interface (UI), content (C), and Learner Interface Interactivity (LINT). Figure 1 depicts the proposed hypothesis model, illustrating the interconnections among these variables. For participant selection, a convenience sampling approach was adopted to recruit a cohort of 210 Chinese K-12 students spanning grades K7–K9. This sampling method was chosen for its practicality and ease of access, facilitating the efficient enlistment of participants from the target demographic. Demographic details, encompassing age, gender, and grade level, were gathered to furnish insights into the profile of the sample, enabling a more nuanced analysis of the research outcomes.
In terms of tool development and validation, all research instruments utilized in this study were selected or adapted from established measures drawn from prior research endeavors. Rigorous attention was dedicated to ensuring the reliability and validity of these measures, with necessary adjustments made to align them with the study context. Validation procedures encompassed pilot testing and expert validation to affirm the appropriateness of the measures in assessing the intended constructs. Through this meticulous validation process, the research instruments were deemed apt for capturing the pertinent variables of interest.
Data analysis procedures entailed the utilization of structural equation modeling (SEM) techniques to analyze the quantitative data collected through surveys. This analytical approach facilitated the testing of the stipulated hypotheses and the exploration of the relationships between the variables delineated in the research model. Statistical software packages such as SPSS and AMOS were employed to conduct the analyses, enabling robust statistical testing and elucidation of the research findings. By organizing the methodology section in a cohesive narrative format, this study offers a lucid and transparent depiction of the research design, participant recruitment approach, measurement instruments, and data analysis protocols, ensuring rigor and validity in the study’s outcomes.
The participants in this study were 210 students selected from two high schools in China. Among them, 97 were males (45.7%) and 114 were females (54.3%). The students were in grades K7–K9. Students participated in the research experiments voluntarily and were aware of the research procedures. The experimental data are anonymous, and the study received permission and approval from the students’ parents and the schools. Students participated in a one-month course that focused mainly on learning AI knowledge with the GenAI tool. The main content of the course was the creation of AI visual narratives (story picture books). All students were receiving systematic AIED for the first time. The experimental process is shown in Fig. 2 .
Experimental process.
Sample population: the sample population for this study consisted of K-12 students from various schools in China. These students were chosen to represent a diverse demographic, including different grade levels and socioeconomic backgrounds, to ensure the findings were applicable across a broad range of contexts.
Sampling technique: a stratified random sampling technique was employed to select participants for the study. Schools were stratified based on geographic location, school type (public/private), and grade level. Within each stratum, a random sample of schools was selected, and then students within those schools were randomly chosen to participate in the study. This sampling technique helped ensure that the sample was representative of the target population and minimized selection bias.
Justification of sample size: the sample size of 210 Chinese K-12 students was determined based on power analysis and the requirements for structural equation modeling (SEM) analysis. Prior research suggests that a sample size of at least 200 participants is adequate for SEM analysis, particularly when examining complex relationships among variables. Additionally, power analysis was conducted to ensure that the sample size was sufficient to detect meaningful effects with a reasonable degree of confidence. This sample size also allowed for subgroup analyses based on demographic variables such as grade level and gender, providing further insights into potential variations within the sample population.
The course spans one month, comprising a total of eight classes, dedicated to the creation of AI picture books centered on “AI, Love, and the Future”. From sessions 3 to 7, students delve into utilizing ChatGPT, Midjourney, and AI translation software for crafting their picture books. Collaboratively, teachers and students explore the nexus between AI and our world, leveraging GenAI for acquiring novel knowledge. This encompasses mastering AI translation software for bilingual tasks and harnessing generative chat tools for narrative continuity. Additionally, understanding how generative image systems operate in image creation and story coherence is emphasized. The final session involves student presentations, fostering discussions and idea exchanges between teachers and students. Course content design adheres to input from five AIED experts, detailed in Table 1 .
During the course, students used ChatGPT and Midjourney to create and present their work; the course implementation process is shown in Fig. 3.
Course implementation process.
The survey instrument has two parts. The first part collects demographic information (gender and grade); the second uses 38 items to measure the 10 constructs of the research model. The ten constructs are classified as external or internal variables.
External variables (AIIM, AIRD, AICF, AIAX, UI, C, LINT).
Internal variables (PU, PEOU, ATT). Each construct is measured by multiple items. To record participants’ responses and quantify the constructs, items were scored on a five-point Likert scale whose options range from “strongly disagree” (scored 1) to “strongly agree” (scored 5).
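To make the scoring concrete, here is a minimal sketch, assuming (as is common but not stated in the source) that a construct score is the mean of its items on the 1 to 5 scale. Reverse-coded items, which the common method bias paragraph says were used, are flipped as 6 − x before averaging. The item labels and answers are hypothetical.

```python
from collections.abc import Collection, Mapping
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # 1 = "strongly disagree", 5 = "strongly agree"


def reverse_code(value: int) -> int:
    """Flip a reverse-worded item: 1 <-> 5, 2 <-> 4, 3 stays 3."""
    if not SCALE_MIN <= value <= SCALE_MAX:
        raise ValueError(f"response {value} is outside {SCALE_MIN}..{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - value


def construct_score(
    responses: Mapping[str, int],
    items: Collection[str],
    reversed_items: Collection[str] = (),
) -> float:
    """Mean of one respondent's answers to a construct's items, reverse-coding flagged items."""
    values = [reverse_code(responses[i]) if i in reversed_items else responses[i] for i in items]
    return mean(values)


if __name__ == "__main__":
    # Hypothetical item labels and answers for a single respondent.
    answers = {"PU1": 4, "PU2": 5, "PU3": 4, "AIAX1": 2, "AIAX2": 4}
    print(construct_score(answers, ["PU1", "PU2", "PU3"]))
    print(construct_score(answers, ["AIAX1", "AIAX2"], reversed_items={"AIAX2"}))
```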
The instrument was developed after reviewing research on the TAM, cognitive factors in AI learning, and HCI factors. All items were translated into Chinese and proofread by translation experts. The items and their sources are listed in Table 2 .
After the course, students completed the questionnaire anonymously and voluntarily. It was administered via the Chinese online survey platform Wenjuanxing (Questionnaire Star) and yielded 210 responses. After screening, 13 responses were deemed invalid, leaving 197 valid responses. Frequency analysis of the demographic variables in SPSS 26 showed 86 boys (43.7%) and 111 girls (56.3%); 65 students were in seventh grade (33.0%), 70 in eighth grade (35.5%), and 62 in ninth grade (31.5%). Demographic information is summarized in Table 3 .
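The reported percentages can be checked directly from the counts. The sketch below is only a consistency check on the figures in the text; it does not reproduce the SPSS output.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, percent)} with percentages rounded to one decimal place."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: (n, round(100 * n / total, 1)) for k, n in counts.items()}


if __name__ == "__main__":
    # Rebuild the reported counts: 86 boys and 111 girls; 65, 70 and 62 students in grades 7, 8 and 9.
    gender = ["boy"] * 86 + ["girl"] * 111
    grade = [7] * 65 + [8] * 70 + [9] * 62
    print(frequency_table(gender))  # {'boy': (86, 43.7), 'girl': (111, 56.3)}
    print(frequency_table(grade))   # {7: (65, 33.0), 8: (70, 35.5), 9: (62, 31.5)}
```

Both breakdowns sum to 197, matching the number of valid responses.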
All methods were performed in accordance with relevant ethical guidelines and regulations. The experimental protocols were approved by the Academic Committee of Guangzhou University of Technology, and the experiments were conducted with the informed consent of all subjects and their legal guardians.
This study used SPSS 26 and SmartPLS 4.0 for data analysis, which proceeded in two steps: reliability and validity analysis, followed by hypothesis testing. First, internal consistency reliability was assessed with Cronbach's α (CA), computed in SPSS 26, and composite reliability (CR), computed in SmartPLS 4.0. High CA and CR values indicate a reliable instrument; both should exceed 0.70. Convergent validity was evaluated with CR and average variance extracted (AVE) values. Discriminant validity was verified by comparing the square root of each construct's AVE with the inter-construct correlations: if the square root of every construct's AVE exceeds its correlations with all other constructs (the Fornell–Larcker criterion), discriminant validity is adequate. Second, once the first step gave satisfactory results, the structural model was used to test the hypotheses by analyzing the significance and magnitude of each path coefficient. Model fit indices were also evaluated to determine the adequacy of the proposed research model.
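For readers who want to see how the CR and AVE values are obtained, the sketch below applies the standard formulas for reflective constructs with standardized loadings. The loadings are hypothetical. The 0.70 cut-off for CR comes from the text; the 0.50 cut-off for AVE is the usual convention and is an assumption here, since the source does not state it.

```python
from collections.abc import Sequence


def composite_reliability(loadings: Sequence[float]) -> float:
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of indicator error variances)."""
    total = sum(loadings)
    # For a standardized indicator, error variance = 1 - loading^2.
    error = sum(1 - lam_i ** 2 for lam_i in loadings)
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """AVE = mean of the squared standardized loadings."""
    return sum(lam_i ** 2 for lam_i in loadings) / len(loadings)


if __name__ == "__main__":
    # Hypothetical standardized loadings for a four-item construct.
    lam = [0.82, 0.78, 0.85, 0.80]
    cr, ave = composite_reliability(lam), average_variance_extracted(lam)
    print(f"CR = {cr:.3f} (> 0.70: {cr > 0.70}), AVE = {ave:.3f} (> 0.50: {ave > 0.50})")
```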
To mitigate the potential effects of common method bias, several strategies were employed throughout the data collection and analysis processes. First, we ensured anonymity and confidentiality in the survey responses to encourage participants to provide honest and accurate answers without fear of judgment or repercussion. Additionally, we employed procedural remedies such as counterbalancing the order of questionnaire items and using reverse-coded items to minimize response bias. Furthermore, we conducted Harman’s single-factor test to assess the extent of common method bias in our data. The results indicated that no single factor accounted for the majority of the variance, suggesting that common method bias was not a significant concern in our study. However, we acknowledge that these measures may not completely eliminate common method bias and have included this limitation in our discussion.
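Harman's single-factor test is often run as an unrotated exploratory factor analysis of all items. The sketch below uses a close and commonly used approximation: the share of total variance captured by the first principal component of the item correlation matrix, judged against the conventional 50% threshold ("the majority of the variance"). The data are simulated and hypothetical; this is not the authors' procedure or output.

```python
import numpy as np


def harman_first_factor_share(data: np.ndarray) -> float:
    """Share of total variance captured by the first unrotated component.

    data: respondents x items matrix. The eigenvalues of the item correlation
    matrix sum to the number of items, so largest / sum is the variance share.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)  # ascending order for a symmetric matrix
    return float(eigenvalues[-1] / eigenvalues.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical data: 197 respondents, 38 items driven by 10 independent constructs.
    latent = rng.normal(size=(197, 10))
    items = np.repeat(latent, [4] * 8 + [3] * 2, axis=1) + rng.normal(scale=0.8, size=(197, 38))
    share = harman_first_factor_share(items)
    print(f"first factor explains {share:.1%} of variance; above 50%: {share > 0.5}")
```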
Results of reliability and validity testing.
Table 4 shows the reliability analysis results from SPSS 26. To ensure accurate measurement, reliability was assessed on the valid questionnaire data before further analysis. All Cronbach's α values exceed 0.8, meeting the standard and indicating that the scales measuring each variable are internally consistent.
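As a reference for how the α values are computed, here is a minimal sketch of the textbook formula using sample variances. The five-student response table is hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items table.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score),
    using sample variances (n - 1 denominator) throughout.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*item_scores))  # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


if __name__ == "__main__":
    # Hypothetical responses of five students to a three-item construct.
    scores = [[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5]]
    print(round(cronbach_alpha(scores), 3))
```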
Second, the Kaiser–Meyer–Olkin (KMO) measure and Bartlett's test of sphericity were used to assess whether the questionnaire data as a whole were suitable for factor analysis. The results are shown in Table 5 below.
As Table 5 shows, the KMO value is 0.880, above the 0.8 threshold, indicating that the data are well suited to factor extraction.
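A minimal sketch of both statistics, computed from a correlation matrix. KMO compares the squared correlations with the squared partial (anti-image) correlations; Bartlett's statistic is χ² = −[(n − 1) − (2p + 5)/6] ln|R| with p(p − 1)/2 degrees of freedom. The matrix and n below are hypothetical. A p-value could be obtained from a chi-square survival function such as `scipy.stats.chi2.sf`, which is left out to keep the sketch NumPy-only.

```python
import numpy as np


def kmo(corr: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy from a correlation matrix."""
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)            # partial correlations given all other variables
    off = ~np.eye(corr.shape[0], dtype=bool)   # mask selecting off-diagonal entries
    r2 = np.sum(corr[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return float(r2 / (r2 + p2))


def bartlett_sphericity(corr: np.ndarray, n: int) -> tuple[float, int]:
    """Bartlett's chi-square statistic and degrees of freedom for p variables and n respondents."""
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)        # log-determinant, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    return float(chi2), p * (p - 1) // 2


if __name__ == "__main__":
    # Hypothetical 3 x 3 correlation matrix with every off-diagonal correlation equal to 0.5.
    r = np.full((3, 3), 0.5)
    np.fill_diagonal(r, 1.0)
    print(f"KMO = {kmo(r):.3f}")
    stat, df = bartlett_sphericity(r, n=197)
    print(f"Bartlett chi2 = {stat:.1f}, df = {df}")
```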
The results of the discriminant validity test are shown in Table 6 . For each variable, the square root of the AVE (the number on the diagonal) is greater than the variable's correlations with all other variables, so the data show good discriminant validity.
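The comparison described above (the Fornell–Larcker criterion) can be expressed as a short check. The AVE values and the construct correlation matrix below are hypothetical, not the values in Table 6.

```python
import numpy as np


def fornell_larcker_ok(ave: dict[str, float], corr: np.ndarray, names: list[str]) -> dict[str, bool]:
    """For each construct, check sqrt(AVE) > |correlation| with every other construct."""
    root = np.sqrt([ave[name] for name in names])
    result = {}
    for i, name in enumerate(names):
        others = np.abs(np.delete(corr[i], i))  # drop the diagonal entry (self-correlation)
        result[name] = bool(np.all(root[i] > others))
    return result


if __name__ == "__main__":
    # Hypothetical AVEs and construct correlations.
    names = ["PU", "PEOU", "ATT"]
    corr = np.array([[1.0, 0.55, 0.60], [0.55, 1.0, 0.62], [0.60, 0.62, 1.0]])
    print(fornell_larcker_ok({"PU": 0.64, "PEOU": 0.58, "ATT": 0.70}, corr, names))
```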
As Table 7 shows, all heterotrait–monotrait ratio (HTMT) values are below 0.85, further indicating good discriminant validity.
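The HTMT ratio divides the mean correlation between items of two different constructs by the geometric mean of the average within-construct item correlations. A minimal sketch follows; it uses absolute correlations so that negatively related constructs (such as AIAX) still yield a positive ratio, which is a choice made here rather than something the source specifies. The item correlation matrix is hypothetical.

```python
import numpy as np


def htmt(item_corr: np.ndarray, block_i: list[int], block_j: list[int]) -> float:
    """Heterotrait-monotrait ratio of two constructs from an item correlation matrix.

    block_i and block_j hold the column indices of each construct's items (at least two each).
    """
    if len(block_i) < 2 or len(block_j) < 2:
        raise ValueError("each construct needs at least two items")
    hetero = np.abs(item_corr[np.ix_(block_i, block_j)]).mean()  # between-construct item pairs

    def mono(block: list[int]) -> float:
        sub = np.abs(item_corr[np.ix_(block, block)])
        upper = np.triu_indices(len(block), k=1)  # each within-construct pair once, no diagonal
        return float(sub[upper].mean())

    return float(hetero / np.sqrt(mono(block_i) * mono(block_j)))


if __name__ == "__main__":
    # Hypothetical item correlations: two 2-item constructs, within r = 0.6, across r = 0.3.
    r = np.array([[1, .6, .3, .3], [.6, 1, .3, .3], [.3, .3, 1, .6], [.3, .3, .6, 1]])
    value = htmt(r, [0, 1], [2, 3])
    print(f"HTMT = {value:.2f}; below 0.85: {value < 0.85}")
```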
Model fit indices.
The first step in hypothesis testing is to assess the structural model. Our model meets established criteria: all VIF values are below 5 and all f² values exceed 0.02. The VIF results in particular indicate that the dataset has no significant collinearity problems.
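Both criteria are simple to compute. The sketch below uses the identity that each predictor's VIF, 1 / (1 − R²ⱼ), equals the corresponding diagonal element of the inverse predictor correlation matrix, and Cohen's f² for a single predictor (R²_included − R²_excluded) / (1 − R²_included). All numbers are hypothetical.

```python
import numpy as np


def vif_from_corr(predictor_corr: np.ndarray) -> np.ndarray:
    """VIF of each predictor = diagonal of the inverse predictor correlation matrix."""
    return np.diag(np.linalg.inv(predictor_corr))


def f_squared(r2_included: float, r2_excluded: float) -> float:
    """Cohen's f^2 for one predictor: gain in R^2 relative to the variance left unexplained."""
    return (r2_included - r2_excluded) / (1 - r2_included)


if __name__ == "__main__":
    # Hypothetical correlations among three predictors of one endogenous construct.
    r = np.array([[1.0, 0.45, 0.40], [0.45, 1.0, 0.50], [0.40, 0.50, 1.0]])
    print("VIF:", np.round(vif_from_corr(r), 2))
    print("f^2:", round(f_squared(r2_included=0.42, r2_excluded=0.39), 3))
```

By Cohen's conventions, f² values of 0.02, 0.15 and 0.35 mark small, medium and large effects, so the 0.02 criterion requires every predictor to contribute at least a small effect.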
Sample size is an important factor in model analysis. After strict screening, 197 valid questionnaires were used for the analysis, which meets the sample size required for SmartPLS analysis. Path coefficients and p-values were then calculated; as shown in Fig. 4 , all hypothesized paths are significant at the 0.05 level.
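The text does not describe how the p-values were obtained. PLS-SEM software such as SmartPLS typically assesses path significance by bootstrapping, and the sketch below illustrates only that idea: standardized OLS coefficients on hypothetical composite scores, resampled with replacement, with a two-sided normal-approximation p-value from t = β / SE. It does not implement the PLS algorithm itself, and the resample count and seed are arbitrary assumptions.

```python
from statistics import NormalDist

import numpy as np


def standardized_coefficients(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS slopes after z-scoring predictors and outcome (centering makes an intercept unnecessary)."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    coef, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return coef


def bootstrap_paths(X: np.ndarray, y: np.ndarray, n_boot: int = 5000, seed: int = 0):
    """Estimates, bootstrap standard errors and two-sided normal-approximation p-values."""
    rng = np.random.default_rng(seed)
    n = len(y)
    est = standardized_coefficients(X, y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample respondents with replacement
        draws[b] = standardized_coefficients(X[idx], y[idx])
    se = draws.std(axis=0, ddof=1)
    p = np.array([2 * (1 - NormalDist().cdf(abs(t))) for t in est / se])
    return est, se, p


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical composite scores for 197 respondents: two predictors of one outcome.
    X = rng.normal(size=(197, 2))
    y = 0.4 * X[:, 0] + rng.normal(size=197)
    est, se, p = bootstrap_paths(X, y, n_boot=2000)
    for k in range(X.shape[1]):
        print(f"path {k}: beta = {est[k]:.3f}, SE = {se[k]:.3f}, p = {p[k]:.4f}")
```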
Results of hypothesis testing.
The path coefficients of the structural equation model are shown in Table 8 .
In this study, the cognitive factors AIIM, AIRD, and AICF positively influenced both PU and PEOU, and all corresponding hypotheses were supported at p < 0.05: H1a (AIIM → PU, β = 0.211), H1b (AIIM → PEOU, β = 0.166), H2a (AIRD → PU, β = 0.152), H2b (AIRD → PEOU, β = 0.136), H3a (AICF → PU, β = 0.158), and H3b (AICF → PEOU, β = 0.159). This indicates that, among the cognitive factors, AIIM, AIRD, and AICF have a positive impact, via PU and PEOU, on students’ attitudes towards learning AIED through GenAI. By contrast, H4a (AIAX → PU, β = −0.130) and H4b (AIAX → PEOU, β = −0.162) were significant (p < 0.05) with negative coefficients, indicating that students’ AIAX has a negative impact on their ATT. Among the HCI factors, C and LINT positively affect both PU and PEOU, while UI positively affects PEOU. These hypotheses were also supported: H5 (UI → PEOU, β = 0.173), H6a (C → PU, β = 0.168), H6b (C → PEOU, β = 0.184), H7a (LINT → PU, β = 0.145), and H7b (LINT → PEOU, β = 0.203), all p < 0.05. Finally, PEOU positively affects PU, and both PU and PEOU positively affect ATT, with path coefficients ranging from 0.17 to 0.23 (p < 0.05).
To sum up, PU, PEOU, AIIM, AIRD, AICF, UI, C, and LINT are important factors that positively affect students’ attitudes towards learning AIED through the use of GenAI, while AIAX has a negative impact on ATT.
First, AIIM had a positive impact on both PU (0.211) and PEOU (0.166); its effect on PU is the second-largest path coefficient in the model, after PEOU → ATT (0.228).
Although AIRD has a positive effect on PU (0.152) and PEOU (0.136), its effect on PEOU is the smallest of all positive effects on PEOU. This is consistent with the research of Chiu et al. 24 , which argued that the level of AIRD reflects students’ understanding of AI knowledge and technology.
AICF is positively associated with PU (0.158) and PEOU (0.159), which is consistent with findings on graduate students using mobile devices for learning (Nikou et al.). The more confident students are about learning AI, the more readily they accept AI courses and sustain their learning behavior.
AIAX has a negative impact on both PU (−0.130) and PEOU (−0.162). This is consistent with Baek and Kim’s study of students’ learning behavior when using ChatGPT 67 , and with Nikou et al.’s findings on anxiety about using mobile devices.
Among the HCI factors, UI is positively associated with PEOU, with a path coefficient of 0.173. UI has been identified as an important factor in students’ acceptance of online courses (Al-Sayid and Kirkil 29 ) and of mobile learning 77 . When GenAI is used to teach a course, a more user-friendly interface makes students more likely to accept and choose that course.
C was significantly and positively associated with PU (0.168) and PEOU (0.184). When students use GenAI tools for learning, LINT also significantly affects PU (0.145) and PEOU (0.203); the latter is the largest effect of any HCI factor on PEOU. In other words, how students interact with the GenAI system through program menus and similar controls significantly affects their acceptance.
Finally, PU has a positive impact on ATT (0.208), PEOU has a positive impact on PU (0.177), and PEOU has a positive impact on ATT (0.228). Previous research in education has examined these relationships mainly for mobile devices and online courses; the present results indicate that they also hold for the acceptance of AI courses delivered with GenAI.
The results indicate that AIIM, AIRD, AICF, AIAX, UI, C, and LINT all influence students’ attitudes towards learning AI. Among the cognitive factors, AIIM has the greatest effect on PU; among the HCI factors, LINT has the greatest effect on PEOU.
The discussion section of this study offers an in-depth analysis of K-12 students’ attitudes towards AI courses facilitated by generative AI (GenAI) tools. The examination is structured into three key segments, each focusing on distinct aspects of the research. Initially, the study investigates how cognitive factors related to AI learning influence perceived usefulness (PU) and perceived ease of use (PEOU). Subsequently, it explores the impact of human–computer interaction (HCI) factors on PU and PEOU. Finally, it delves into the interplay between PU, PEOU, and attitude towards use (ATT). Building upon established theoretical frameworks, such as the technology acceptance model (TAM), this study introduces a novel conceptual model tailored to assess K-12 students’ attitudes towards using GenAI tools in AI in education (AIED) courses. By incorporating both cognitive learning factors and HCI elements, the study extends the existing literature, offering a comprehensive understanding of the complex dynamics influencing students’ attitudes towards AI education.
The empirical analysis conducted in this study validates the proposed model and hypotheses, thereby contributing to theoretical advancements in the field of AI4K12 education. However, it is crucial to contextualize these findings within the broader landscape of educational research. Previous studies, such as those by Almaiah and Almulhem (2018) 10 , Almaiah, Al-Khasawneh, and Althunibat (2020) 11 , and Almaiah and Al Mulhem (2020) 12 , have highlighted the critical challenges and success factors influencing the implementation and usage of e-learning systems. Drawing parallels between these studies and the current research can provide valuable insights into the unique considerations and obstacles associated with integrating innovative technologies, like GenAI, into educational settings.
From a practical perspective, the findings of this study underscore the potential of GenAI tools to enhance AIED methodologies. However, it is essential to recognize that variations in students’ cognitive learning processes may impact their attitudes and efficacy towards learning. By leveraging cutting-edge technologies and implementing pedagogical strategies informed by self-determination theory, educators and system designers can create inclusive and engaging learning experiences that promote sustained student engagement and mastery of AI knowledge.
In terms of theoretical implications, the findings of this study contribute significantly to the existing body of knowledge in the field of artificial intelligence in education (AIED). By expanding upon Davis’ technology acceptance model (TAM) with additional cognitive and human–computer interaction (HCI) factors, we have not only provided a more nuanced understanding of students’ attitudes towards AI courses facilitated by generative AI (GenAI) tools but also enriched the theoretical framework guiding research in this domain. This augmentation of the TAM model with external variables derived from the literature review and previous research findings offers a more comprehensive perspective on the determinants of students’ acceptance of AI4K12 courses. Furthermore, the empirical validation of this extended model through structural equation modeling adds robustness to its theoretical underpinnings and lays the groundwork for future research endeavors in the realm of AIED.
In conclusion, this study offers actionable insights for AIED policymakers, system developers, educators, and students, aiming to foster a superior AI learning experience for K-12 students. By addressing the complex interplay between cognitive factors, HCI elements, and attitudes towards AI education, this research contributes to the ongoing discourse surrounding the integration of GenAI tools in educational settings.
The advent of artificial intelligence (AI) represents both significant opportunities and challenges for society, as intelligent algorithms and robots increasingly assume roles across various sectors. As AI becomes more integrated into daily life, it becomes crucial for individuals to adapt to coexist with these technologies. This underscores the importance of early integration of AI in education (AIED) into student learning, necessitating pioneering research in AIED-centric pedagogy for the K-12 demographic.
This study delves into K-12 students’ perceptions of learning AI-related content through generative AI (GenAI) tools. Through an extensive literature review, the study identifies external factors shaping students’ attitudes towards learning and applies the technology acceptance model (TAM), integrating it with theories of cognitive learning and human–computer interaction (HCI). Drawing on questionnaire responses from 210 Chinese K-12 students (197 of them valid), the study makes a significant contribution to the field. The analysis validates ten hypotheses, demonstrating the substantial impact of cognitive and behavioral learning factors, alongside HCI considerations, on students’ attitudes towards AI education. These findings offer crucial insights for AIED policymakers and developers, informing the creation of diverse and engaging AI4K12 curricula aimed at sustaining students’ interest in AI and promoting ongoing engagement and acquisition of intricate AI knowledge. However, this study has limitations. Because the sample is entirely Chinese, it may not represent the global student body, and because participants were drawn only from grades 7 to 9, the study does not cover all K-12 age groups. Future research should encompass a broader spectrum of K-12 grade levels, span multiple countries and regions, and explore gender and grade-level variations among students. Additionally, reliance on a single experimental course and online quantitative data collection may not fully capture the nuances of students’ attitudes. Future investigations should integrate qualitative methodologies, such as semi-structured interviews and group discussions, for deeper insights.
The results of this study underscore the importance of considering both cognitive learning factors and HCI elements in designing and implementing AI courses in K-12 education. Our findings suggest that enhancing students' perceptions of usefulness and ease of use, while addressing potential anxiety associated with AI, is crucial for fostering positive attitudes towards AI education. By integrating GenAI tools into the curriculum, educators can create more engaging and effective learning experiences for students, thereby promoting the development of essential AI literacy skills. Moreover, our study sheds light on the complex interplay between cognitive and HCI factors in shaping students' attitudes towards AI education, highlighting the need for a holistic approach to curriculum design. Furthermore, recent studies have contributed to the development of scales aimed at measuring artificial intelligence literacy and acceptance, providing valuable tools for researchers and educators to assess students' readiness and attitudes towards AI education 20 , 21 , 22 .
In conclusion, this research offers a thorough examination of K-12 students’ attitudes towards AI education using GenAI tools, focusing on learning cognition and HCI factors. Future endeavors should explore additional factors affecting AI learning acceptance, including various aspects of the learning environment, and examine students’ AI learning experiences from diverse cognitive viewpoints.
In addition to the research findings and limitations discussed above, it is essential to consider the practical and theoretical implications of this study. Practically, the findings offer valuable insights for educators, policymakers, and developers involved in AI education for K-12 students. By identifying the cognitive and HCI factors that influence students’ attitudes towards AI education using GenAI tools, this research provides a roadmap for designing more effective and engaging AI4K12 curricula. Educators can leverage these insights to tailor their teaching approaches and course designs to better meet students’ needs and preferences, ultimately fostering a more positive learning experience.
Moreover, policymakers can use this research to inform decisions regarding the integration of AI education into school curricula, ensuring that students are adequately prepared for the future workforce. From a theoretical perspective, this study contributes to the existing body of literature on AI education and technology acceptance by extending the TAM framework to include cognitive and HCI factors specific to GenAI tools. By validating the proposed model and hypotheses, this research advances our understanding of the complex interplay between individual perceptions, cognitive processes, and technological interfaces in the context of AI education. Furthermore, the inclusion of HCI factors underscores the importance of considering user experience and interface design in educational technology development, highlighting the need for a more holistic approach to AI education research. Overall, the practical and theoretical implications of this study underscore its significance and provide a foundation for future research in the field of AI education.
The datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.
Abbreviations
Artificial intelligence education
Artificial intelligence education for K-12
Generative artificial intelligence
Technology acceptance model
Perceived usefulness
Perceived ease of use
Attitudes towards the use of artificial intelligence
Behavioral intention
Structural equation model
Student’s intrinsic motivation to learn artificial intelligence
Artificial intelligence readiness
Artificial intelligence confidence
Artificial intelligence anxiety
Human–computer interaction
User interface
Learner-interface interactivity
Computational thinking
Design thinking
Russell, S. J. & Norvig, P. Artificial Intelligence: A Modern Approach (2010).
OECD. Trustworthy Artificial Intelligence (AI) in Education: Promises and Challenges (2020). https://www.oecd.org/education/trustworthy-artificial-intelligence-ai-in-education-a6c90fa9-en.html . Accessed 10 Oct 2023.
Touretzky, D., Gardner-McCune, C., Breazeal, C., Martin, F. & Seehorn, D. A year in K-12 AI education. AI Mag. 40 , 88–90 (2019).
Touretzky, D., Gardner-McCune, C., Martin, F. & Seehorn, D. Envisioning AI for K-12: What should every child know about AI? In Proceedings of the AAAI Conference on Artificial Intelligence 9795–9799 (Honolulu, 17 July 2019).
Ibe, N. A., Howsmon, R., Penney, L., Granor, N., DeLyser, L. A. & Wang, K. Reflections of a diversity, equity, and inclusion working group based on data from a national CS education program. In Proceedings of the 49th ACM Technical Symposium on Computer Science Education 711–716 (ACM, New York, NY, USA, 21 February 2018).
Oermann, E. K. & Kondziolka, D. On chatbots and generative artificial intelligence. Neurosurgery 92 , 665–666 (2022).
Yu, H. & Guo, Y. Generative artificial intelligence empowers educational reform, current status, issues, and prospects. Front. Educ. https://doi.org/10.3389/feduc.2023.1183162 (2023).
Stokel-Walker, C. AI bot ChatGPT writes smart essays: should academics worry? Nature https://doi.org/10.1038/d41586-022-04397-7 (2022).
Cooper, G. Examining science education in ChatGPT: An exploratory study of generative artificial intelligence. J. Sci. Educ. Technol. 32 , 444–452 (2023).
Almaiah, M. A. & Almulhem, A. A conceptual framework for determining the success factors of e-learning system implementation using Delphi technique. J. Theor. Appl. Inf. Technol. 96 (17), 5962–5976 (2018).
Almaiah, M. A., Al-Khasawneh, A. & Althunibat, A. Exploring the critical challenges and factors influencing the E-learning system usage during COVID-19 pandemic. Educ. Inf. Technol. 25 , 5261–5280 (2020).
Almaiah, M. & Al Mulhem, A. Thematic analysis for classifying the main challenges and factors influencing the successful implementation of e-learning system using NVivo. Int. J. Adv. Trends Comput. Sci. Eng. 9 (1), 142–152 (2020).
Long, D. & Magerko, B. What is AI literacy? Competencies and design considerations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems 1–16 (ACM, New York, NY, USA, 23 April 2020).
Zhou, X., Van Brummelen, J. & Lin, P. Designing AI learning experiences for K-12: Emerging works, future opportunities and a design framework. Arxiv https://doi.org/10.48550/arXiv.2009.10228 (2020).
Lin, P. & Van Brummelen, J. Engaging teachers to co-design integrated AI curriculum for K-12 classrooms. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems 1–12 (ACM, Yokohama, Japan, 07 May 2021).
Sabuncuoglu, A. Designing one year curriculum to teach artificial intelligence for middle school. In Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education 96–102 (ACM, New York, NY, USA, 15 June 2020).
Schleiss, J., Laupichler, M. C., Raupach, T. & Stober, S. AI course design planning framework: Developing domain-specific AI education courses. Educ. Sci. 13 , 954 (2023).
Ayanwale, M. A., Sanusi, I. T., Adelana, O. P., Aruleba, K. D. & Oyelere, S. S. Teachers’ readiness and intention to teach artificial intelligence in schools. Comput. Educ. Artif. Intell. 3 , 100099 (2022).
Almaiah, M. A. et al. Measuring institutions’ adoption of artificial intelligence applications in online learning environments: Integrating the innovation diffusion theory with technology adoption rate. Electronics 11 (20), 3291 (2022).
Yilmaz, R. & Yilmaz, F. G. K. Augmented intelligence in programming learning: Examining student views on the use of ChatGPT for programming learning. Comput. Hum. Behav. Artif. Hum. 1 , 100005 (2023).
Yilmaz, F. G. K., Yilmaz, R. & Ceylan, M. Generative artificial intelligence acceptance scale: A validity and reliability study. Int. J. Hum. Comput. Interact. https://doi.org/10.1080/10447318.2023.2288730 (2023).
Yılmaz, F. G. K. & Yılmaz, R. Yapay Zekâ Okuryazarlığı Ölçeğinin Türkçeye Uyarlanması [Adaptation of the artificial intelligence literacy scale into Turkish]. Bilgi Ve İletişim Teknolojileri Dergisi 5 (2), 172–190 (2023).
Terblanche, N., Molyn, J., Williams, K. & Maritz, J. Performance matters: Students’ perceptions of artificial intelligence coach adoption factors. Coach. Int. J. Theor. 16 , 100–114 (2023).
Chai, J. L. et al. Factors influencing students’ behavioral intention to continue artificial intelligence learning. In International Symposium on Educational Technology (ISET) 147–150 (IEEE, 2020).
Zhou, X., Van Brummelen, J. & Lin, P. Designing AI learning experiences for K-12: Emerging works, future opportunities and a design framework. arXiv preprint arXiv:2009.10228. https://ar5iv.labs.arxiv.org/html/2009.10228 (2020).
Wang, N. & Lester, J. K-12 education in the age of AI: A call to action for K-12 AI literacy. Int. J. Artif. Intell. Educ. 33 , 228–232 (2023).
Chiu, T. K. et al. Creation and evaluation of a pretertiary artificial intelligence (AI) curriculum. IEEE Trans. Educ. 65 , 30–39 (2021).
Lv, Z. Generative artificial intelligence in the metaverse era. Cogn. Robot. 3 , 208–217 (2023).
Al-Sayid, F. & Kirkil, G. Exploring non-linear relationships between perceived interactivity or interface design and acceptance of collaborative web-based learning. Educ. Inf. Technol. 28 , 11819–11866 (2023).
Chen, X., Xie, H., Zou, D. & Hwang, G. J. Application and theory gaps during the rise of artificial intelligence in education. Comput. Educ. Artif. Intell. 1 , 100002 (2020).
Shishakly, R., Almaiah, M., Lutfi, A. & Alrawad, M. The influence of using smart technologies for sustainable development in higher education institutions. Int. J. Data Netw. Sci. 8 (1), 77–90 (2024).
Holmes, W., Bialik, M. & Fadel, C. Artificial Intelligence in Education (Globethics Publications, 2023).
Ng, D. T. K., Luo, W., Chan, H. M. Y. & Chu, S. K. W. Using digital story writing as a pedagogy to develop AI literacy among primary students. Comput. Educ. Artif. Intell. 3 , 100054 (2022).
Dwivedi, Y. K. et al. “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int. J. Inf. Manag. 71 , 102642 (2023).
Chiu, T. K., Moorhouse, B. L., Chai, J. L. & Ismailov, M. Teacher support and student motivation to learn with artificial intelligence (AI) based chatbot. Interact. Learn. Environ. https://doi.org/10.1080/10494820.2023.2172044 (2023).
Chiu, T. K. The impact of generative AI (GenAI) on practices, policies and research direction in education: A case of ChatGPT and midjourney. Interact. Learn. Environ. https://doi.org/10.1080/10494820.2023.2253861 (2023).
Wu, Y., Yu, N., Li, Z., Backes, M. & Zhang, Y. Membership inference attacks against text-to-image generation models. Arxiv https://doi.org/10.48550/arXiv.2210.00968 (2022).
Watson, A. D. & Watson, G. H. Transitioning STEM to STEAM: Reformation of engineering education. J. Qual. Part. 36 , 1–5 (2013).
Cohn, N. Visual narrative comprehension: Universal or not. Psychon. B. Rev. 27 , 266–285 (2020).
Kim, K. H. & Kim, H. G. A study on how to create interactive children’s books using ChatGPT and midjourney. Techart J. Art Imaging Sci. 10 , 39–46 (2023).
Chocarro, R., Cortiñas, M. & Marcos-Matás, G. Teachers’ attitudes towards chatbots in education: A technology acceptance model approach considering the effect of social language, bot proactiveness, and users’ characteristics. Educ. Stud. 49 , 295–313 (2023).
Roy, R., Babakerkhell, M. D., Mukherjee, S., Pal, D. & Funilkul, S. Evaluating the intention for the adoption of artificial intelligence-based robots in the university to educate the students. IEEE Access 10 , 125666–125678 (2022).
Davis, F. D. A Technology Acceptance Model for Empirically Testing New End-User Information Systems: Theory and Results (Massachusetts Institute of Technology, 1985).
Davis, F. D. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quart. 13 , 319–340 (1989).
Al-Adwan, A. S. et al. Extending the technology acceptance model (TAM) to Predict University students’ intentions to use metaverse-based learning platforms. Educ. Inf. Technol. 28 (11), 15381–15413 (2023).
Almaiah, M. A. et al. Employing the TAM model to investigate the readiness of M-learning system usage using SEM technique. Electronics 11 (8), 1259 (2022).
Almaiah, M. A. et al. Smart mobile learning success model for higher educational institutions in the context of the COVID-19 pandemic. Electronics 11 (8), 1278 (2022).
Gursoy, D., Chi, O. H., Lu, L. & Nunkoo, R. Consumers acceptance of artificially intelligent (AI) device use in service delivery. Int. J. Inform. Manage. 49 , 157–169 (2019).
Kelly, S., Kaye, S. A. & Oviedo-Trespalacios, O. What factors contribute to acceptance of artificial intelligence? A systematic review. Telemat. Inform. 77 , 101925 (2022).
Garrison, D. R., Anderson, T. & Archer, W. Critical thinking, cognitive presence, and computer conferencing in distance education. Am. J. Distance Educ. 15 , 7–23 (2001).
Chai, J. L., Wang, X. & Xu, C. An extended theory of planned behavior for the modelling of Chinese secondary school students’ intention to learn artificial intelligence. Mathematics 8 , 2089 (2020).
Lan, Y. J., Botha, A., Shang, J. & Jong, M. S. Y. Guest editorial: Technology enhanced contextual game-based language learning. J. Educ. Technol. Soc. 21 , 86–89 (2018).
Ryan, R. M. & Deci, E. L. Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemp. Educ. Psychol. 25 , 54–67 (2000).
Froiland, J. M. & Worrell, F. C. Intrinsic motivation, learning goals, engagement, and achievement in a diverse high school. Psychol. Sch. 53 , 321–336 (2016).
Fagan, M. H., Neill, S. & Wooldridge, B. R. Exploring the intention to use computers: An empirical investigation of the role of intrinsic motivation, extrinsic motivation, and perceived ease of use. J. Comput. Inform. Syst. 48 , 31–37 (2008).
Martín-Núñez, J. L., Ar, A. Y., Fernández, R. P., Abbas, A. & Radovanović, D. Does intrinsic motivation mediate perceived artificial intelligence (AI) learning and computational thinking of students during the COVID-19 pandemic. Comput. Educ. Artif. Intell. 4 , 100128 (2023).
Parasuraman, A. & Colby, C. L. An updated and streamlined technology readiness index: TRI 2.0. J. Serv. Res. 18 , 59–74 (2015).
Dai, Y. et al. Promoting students’ well-being by developing their readiness for the artificial intelligence age. Sustainability 12 , 6597 (2020).
Ajzen, I. The theory of planned behavior. Organ. Behav. Hum. Decis. Process. 50 , 179–211 (1991).
Lin, P. Y. et al. Modeling the structural relationship among primary students’ motivation to learn artificial intelligence. Comput. Educ. Artif. Intell. 2 , 100006 (2021).
Chai, J. L. et al. Perceptions of and behavioral intentions towards learning artificial intelligence in primary school students. Educ. Technol. Soc. 24 , 89–101 (2021).
Owolabi, K. et al. Awareness and readiness of Nigerian polytechnic students towards adopting artificial intelligence in libraries. J. Inf. Knowl. 59 , 15–24 (2022).
Nikou, S. A. & Economides, A. A. Mobile-based assessment: Investigating the factors that influence behavioral intention to use. Comput. Educ. 109 , 56–73 (2017).
Ha, J. G., Page, T. & Thorsteinsson, G. A study on technophobia and mobile device design. Int. J. Contents 7 , 17–25 (2011).
Johnson, D. G. & Verdicchio, M. AI anxiety. J. Assoc. Inf. Sci. Tech. 68 , 2267–2270 (2017).
Wang, Y. Y. & Wang, Y. S. Development and validation of an artificial intelligence anxiety scale: An initial application in predicting motivated learning behavior. Interact. Learn. Envir. 30 , 619–634 (2022).
Baek, T. H. & Kim, M. Is ChatGPT scary good? How user motivations affect creepiness and trust in generative artificial intelligence. Telemat. Inform. 83 , 102030 (2023).
Massey, B. L. & Levy, M. R. Interactivity, online journalism, and English-language web newspapers in Asia. J. Mass. Commun. Q. 76 , 138–151 (1999).
Mcmillan, S. J. The researchers and the concept: Moving beyond a blind examination of interactivity. J. Interact. Advert. 5 , 1–4 (2005).
Cho, C. H. Effects of banner clicking and attitude toward the linked target ads on brand-attitude and purchase-intention changes. J. Glob. Acad. Market. Sci. 14 , 1–16 (2004).
Almaiah, M. A. et al. Examining the impact of artificial intelligence and social and computer anxiety in e-learning settings: Students’ perceptions at the university level. Electronics 11(22), 3662 (2022).
Head, A. J. Design Wise: A Guide for Evaluating the Interface Design of Information Resources 19–99 (Information Today, Inc., 1999).
Cliff, M., Dillon, A. & Richardson, J. User Centered Design of Hypertext and Hypermedia for Education (Macmillan, 1996).
Wang, S. K. & Yang, C. The interface design and the usability testing of a fossilization web-based learning environment. J. Sci. Educ. Technol. 14, 305–313 (2005).
Lohr, L. L., Falvo, D. A., Hunt, E. & Johnson, B. Improving the usability of distance learning through template modification. In Flexible Learning in an Information Society (ed. Khan, B. H.) 186–197 (IGI Global, 2007).
Liu, I. F., Chen, M. C., Sun, Y. S., Wible, D. & Kuo, C. H. Extending the TAM model to explore the factors that affect intention to use an online learning community. Comput. Educ. 54, 600–610 (2010).
Almaiah, M. A., Jalil, M. A. & Man, M. Extending the TAM to examine the effects of quality features on mobile learning acceptance. J. Comput. Educ. 3, 453–485 (2016).
Shee, D. Y. & Wang, Y. S. Multi-criteria evaluation of the web-based e-learning system, a methodology based on learner satisfaction and its applications. Comput. Educ. 50, 894–905 (2008).
Terzis, V. & Economides, A. A. The acceptance and use of computer based assessment. Comput. Educ. 56, 1032–1044 (2011).
Lee, B. C., Yoon, J. O. & Lee, I. Learners’ acceptance of e-learning in South Korea, theories and results. Comput. Educ. 53, 1320–1329 (2009).
Isaias, P. & Issa, T. Sustainable design, HCI, usability and environmental concerns (Springer-Verlag, 2015).
Althunibat, A., Almaiah, M. A. & Altarawneh, F. Examining the factors influencing the mobile learning applications usage in higher education during the COVID-19 pandemic. Electronics 10(21), 2676 (2021).
Al-Adwan, A. S. et al. Unlocking future learning: Exploring higher education students’ intention to adopt meta-education. Heliyon 10(9), e29544 (2024).
Lee, M. C. Explaining and predicting users’ continuance intention toward e-learning, an extension of the expectation-confirmation model. Comput. Educ. 54, 506–516 (2010).
Duncan, T. G. & McKeachie, W. J. The making of the motivated strategies for learning questionnaire. Educ. Psychol. 40, 117–128 (2005).
Chou, C. Interactivity and interactive functions in web-based learning systems, a technical framework for designers. Brit. J. Educ. Technol. 34, 265–279 (2003).
This research was funded by the Chinese Ministry of Education Collaborative Education Project between Universities and Firms (grant number 220605242172594) and the Guangdong University of Technology Online Course Construction Project (grant number 211210102), and was supported by Kunsan National University’s Industry-Academia Cooperation Group (IACG) (grant number 2023H052).
Authors and affiliations.
Department of Computer Information Engineering, Kunsan National University, Gunsan, 54150, Republic of Korea
Yantong Liu
Department of Smart Experience Design, Kookmin University, Seoul, 02707, Republic of Korea
Wei Li & Xiaolin Zhang
Department of Educational Psychology, University of Georgia, Athens, GA, 30605, USA
Department of Poultry Science, University of Georgia, Athens, GA, 30605, USA
Department of International Culture Education, Chodang University, Muan, 58530, Republic of Korea
College of Art and Design, Guangdong University of Technology, Guangzhou, 510006, China
Xiaolin Zhang
Conceptualization, W.L. and X.Z.; methodology, W.L.; software, W.L.; validation, W.L. and J.L.; formal analysis, W.L.; investigation, W.L. and Y.L.; resources, X.Y.; data curation, J.L. and D.L.; revision and funding, Y.L. and X.Z. All authors have read and agreed to the published version of the manuscript.
Correspondence to Yantong Liu.
Competing interests.
The authors declare no competing interests.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
Cite this article.
Li, W., Zhang, X., Li, J. et al. An explanatory study of factors influencing engagement in AI education at the K-12 Level: an extension of the classic TAM model. Sci Rep 14 , 13922 (2024). https://doi.org/10.1038/s41598-024-64363-3
Received : 20 February 2024
Accepted : 07 June 2024
Published : 17 June 2024
DOI : https://doi.org/10.1038/s41598-024-64363-3
Pediatric Hodgkin and non-Hodgkin lymphomas differ from adult cases in biology and management, yet survival analyses tailored to pediatric lymphoma are lacking. We analyzed lymphoma data from 1975 to 2018, compared survival trends between 7,871 pediatric and 226,211 adult patients, identified key risk factors for pediatric lymphoma survival, developed a predictive nomogram, and used machine learning to predict long-term lymphoma-specific mortality risk. Between 1975 and 2018, the 1-year, 5-year, and 10-year overall survival rates of pediatric patients with lymphoma increased substantially, by 19.3%, 41.9%, and 48.8% in relative terms, respectively. Prognostic factors including age, sex, race, Ann Arbor stage, lymphoma subtype, and radiotherapy were incorporated into the nomogram. The nomogram showed good predictive performance, with area under the curve (AUC) values of 0.766, 0.724, and 0.703 for one-year, five-year, and ten-year survival, respectively, in the training cohort, and 0.776, 0.712, and 0.696 in the validation cohort. Importantly, the nomogram outperformed the Ann Arbor staging system in survival prediction. Machine learning models achieved AUC values of approximately 0.75, surpassing the conventional method (AUC ≈ 0.70) in predicting the risk of lymphoma-specific death. We also observed that pediatric lymphoma survivors had a substantially reduced risk of lymphoma-specific death after ten years but faced an increasing risk of non-lymphoma diseases. The study highlights substantial improvements in pediatric lymphoma survival, offers reliable predictive tools, and underscores the importance of long-term monitoring for non-lymphoma health issues in pediatric patients.
Lymphoma is the third most common pediatric cancer, accounting for 15% of childhood malignancies [ 1 ]. Although advances in treatment have markedly improved the outlook for pediatric lymphoma patients in recent decades, lymphoma remains a notable contributor to childhood cancer-related mortality [ 2 , 3 ], especially among children aged 1–10 years. Treatment outcomes can vary considerably, possibly because of a complex interplay of psychosocial factors, patient-specific variables, tumor subtypes, and their underlying biological characteristics [ 4 ]. A large-scale, comprehensive investigation is therefore needed to identify the factors that influence survival and prognosis in pediatric lymphoma patients.
Recent investigations based on large databases have illuminated the changing landscape of pediatric lymphoma. For example, Kahn et al. [ 5 ] examined racial disparities in the survival of pediatric Hodgkin lymphoma (HL) patients and found that Black patients had a significantly lower 10-year overall survival (OS) rate than Caucasians. This survival gap has been narrowing over time, primarily because of larger improvements in 10-year OS among Black patients. Similarly, Bazzeh et al. [ 6 ] studied pediatric HL patients diagnosed from 1988 to 2005 and identified stage IV disease and the presence of B symptoms as independent prognostic risk factors. Other studies have focused on specific facets or subtypes of pediatric non-Hodgkin lymphoma (NHL), such as cutaneous T-cell or B-cell lymphoma and primary gastrointestinal lymphoma, within the Surveillance, Epidemiology, and End Results (SEER) database [ 7 , 8 , 9 ]. A comprehensive investigation of survival and prognosis covering both HL and NHL in pediatric patients is still needed. Therefore, using the extensive clinical data in the SEER database on patients with lymphoma diagnosed from 1975 to 2018, we aimed to comprehensively explore the survival and prognostic predictors of pediatric lymphoma as a foundation for machine learning models capable of reliably predicting survival outcomes. We also analyzed survival trends over recent decades to identify key aspects that could guide future pediatric lymphoma research.
Data source.
The data for this study were obtained from cancer records from 1975 to 2018 in nine U.S. states, drawn from the SEER database, which the National Cancer Institute collects and consolidates as part of its commitment to tackling the growing burden of cancer. The contributing states were Connecticut, Michigan, Georgia, California, Hawaii, Iowa, New Mexico, Washington, and Utah. The SEER database (https://seer.cancer.gov) is a comprehensive repository of cancer-related information. The study flow and methodology are shown in Supplemental Figure S1.
Patients diagnosed with primary lymphoma at ages 0 to 19 years were identified using the third edition of the International Classification of Diseases for Oncology (ICD-O-3). For survival analyses, patients lacking follow-up information and patients who died within one month of diagnosis were excluded. Demographic and clinical data were collected, including age at diagnosis, sex, race, tumor subtype, Ann Arbor stage, year of diagnosis, receipt of chemotherapy and radiotherapy, and vital status. The study was reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guideline to ensure the transparency of the research methodology [ 10 ].
Factors associated with OS among pediatric lymphoma patients were analyzed using a multivariable Cox proportional hazards regression model. A nomogram built on the most influential factors was developed to predict OS at 1, 5, and 10 years. The model was validated in a separate validation cohort created by randomly splitting the patients at a 7:3 ratio. Its discrimination was assessed by the area under the receiver operating characteristic (ROC) curve (AUC) and compared with that of the Ann Arbor staging system. A calibration curve was also generated to compare the nomogram's predicted survival with the observed survival rates.
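The AUC used above has a simple interpretation. It is the probability that a randomly chosen patient who died by the time horizon received a higher risk score than a randomly chosen patient still followed beyond it (the Mann–Whitney form of the ROC AUC). The standard-library sketch below shows that computation together with a random 7:3 split. It is not the authors' code: the seed, the toy data, and the choice to simply drop patients censored before the horizon are illustrative assumptions. Time-dependent AUC methods that weight for censoring exist but are not shown.

```python
import random
from itertools import product


def split_70_30(patient_ids, seed=2024):
    """Randomly split IDs into training (70%) and validation (30%) lists.
    The seed is an arbitrary, hypothetical choice for reproducibility."""
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc_mann_whitney(case_scores, control_scores):
    """P(random case scores higher than random control); ties count as 0.5."""
    wins = 0.0
    for c, k in product(case_scores, control_scores):
        if c > k:
            wins += 1.0
        elif c == k:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def horizon_auc(times, died, risk, horizon):
    """t-year AUC: cases died at or before `horizon`, controls were followed past it.
    Simplification (assumption): patients censored before `horizon` are dropped."""
    cases = [r for t, d, r in zip(times, died, risk) if d and t <= horizon]
    controls = [r for t, r in zip(times, risk) if t > horizon]
    return auc_mann_whitney(cases, controls)


# Toy data (hypothetical): follow-up in years, death indicator, model risk score.
times = [1, 2, 6, 8, 12, 3]
died = [1, 1, 0, 1, 0, 0]
risk = [0.9, 0.8, 0.2, 0.85, 0.1, 0.5]
print(horizon_auc(times, died, risk, horizon=5))  # 0.8333...
train, valid = split_70_30(range(7741))
print(len(train), len(valid))  # 5419 2322
```

The pairwise loop is quadratic in the number of patients. For a cohort of several thousand, a rank-based formula or an established library routine would be the practical choice.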
Five well-established machine learning algorithms were employed to predict the long-term risk of lymphoma-specific mortality: extreme gradient boosting (XGB), the random forest classifier (RFC), adaptive boosting (ADB), an artificial neural network (ANN), and the gradient boosting decision tree (GBDT). Logistic regression (LR) served as the conventional comparator. The parameters of each algorithm are shown in Supplemental Table S1. An ANN is a highly interconnected network of adaptable units loosely modeled on biological nervous systems. RFC extends the decision tree algorithm by combining many trees and is suitable for both regression and classification tasks. GBDT, XGB, and ADB belong to the ensemble learning family, which improves generalization by training multiple classifiers and combining their outputs. Except for LR, these algorithms offer limited transparency, making it difficult for users to interpret the relationships between variables and outcomes. To improve model reliability, continuous variables underwent z-score normalization, and categorical variables were one-hot encoded. Feature selection using Cox regression identified the candidate prognostic predictors.
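The preprocessing just described (z-score normalization of continuous variables and one-hot encoding of categorical ones) can be sketched with the standard library. This is an illustrative sketch only, and the variable values are hypothetical. It fits the mean and standard deviation on the training cohort alone and reuses them for the validation cohort. That is a common safeguard against information leakage, but the text does not state whether the authors did this.

```python
from statistics import mean, pstdev


def fit_zscore(train_values):
    """Learn mean and (population) SD from the training cohort only."""
    mu, sd = mean(train_values), pstdev(train_values)
    if sd == 0:
        raise ValueError("constant feature cannot be z-scored")
    return mu, sd


def apply_zscore(values, mu, sd):
    """Rescale any cohort with the training-cohort parameters."""
    return [(v - mu) / sd for v in values]


def one_hot(values, categories=None):
    """Encode each value as a 0/1 indicator vector over a fixed category order."""
    cats = sorted(set(values)) if categories is None else list(categories)
    return [[1 if v == c else 0 for c in cats] for v in values], cats


# Hypothetical continuous feature (e.g. age in years) and subtype labels.
mu, sd = fit_zscore([10, 12, 14, 16, 18])
print(apply_zscore([14, 18], mu, sd))  # [0.0, 1.414...]
rows, cats = one_hot(["HL", "N-NHL", "E-NHL", "HL"])
print(cats)  # ['E-NHL', 'HL', 'N-NHL']
print(rows)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

When encoding the validation cohort, passing the training cohort's `cats` as `categories` keeps the columns aligned, even if a rare level happens to be absent from one split.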
The training procedure involved several key steps. Each algorithm was trained using fivefold cross-validation to improve robustness and reduce overfitting, with the data split 7:3 into training and validation sets. The ANN was trained with the Adam optimizer and a binary cross-entropy loss for up to 100 epochs with a batch size of 32. To prevent overfitting, early stopping monitored the validation loss and halted training if it did not improve for ten consecutive epochs. Model performance was evaluated by the AUC of the ROC curves, and decision curve analysis (DCA) was used to assess clinical utility.
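The early-stopping rule and the fivefold split described above can be sketched in plain Python. This is a hypothetical illustration, not the authors' training code. `early_stop` only replays a list of per-epoch validation losses and does not train a network, and the fold-assignment seed is arbitrary. Deep learning frameworks provide equivalent early-stopping callbacks; this version simply makes the patience logic explicit.

```python
import random


def early_stop(val_losses, patience=10, max_epochs=100):
    """Replay per-epoch validation losses; return (stop_epoch, best_epoch), 0-based.
    Training halts once the loss fails to improve for `patience` consecutive epochs."""
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return min(len(val_losses), max_epochs) - 1, best_epoch


def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Loss improves for 5 epochs, then plateaus: stops 10 epochs after the best one.
losses = [1.0, 0.9, 0.8, 0.7, 0.6] + [0.65] * 95
print(early_stop(losses))  # (14, 4)
print([len(v) for _, v in kfold_indices(23)])  # [5, 5, 5, 4, 4]
```

Each patient appears in exactly one validation fold, so every observation is used once for validation across the five rounds.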
Patient data, including clinical characteristics and follow-up information, were extracted with SEER*Stat version 8.3.9 (https://seer.cancer.gov/seerstat). Statistical analyses were performed with IBM SPSS Statistics version 27.0 (IBM Corp., Armonk, NY, USA) and R version 4.3.1 (https://www.r-project.org). Baseline characteristics of the training and validation cohorts, including sex, race, age, lymphoma subtype, Ann Arbor stage, and initial treatment (chemotherapy and radiotherapy), were compared using the χ² test. These variables were then entered into a multivariable Cox proportional hazards regression to estimate hazard ratios (HR) and 95% confidence intervals (CI) for OS. Survival curves for OS and disease-specific survival (DSS, lymphoma-specific) were generated with the Kaplan–Meier method, and differences between subgroups were assessed with the log-rank test. In the SEER program, survival time is defined as the interval from the date of diagnosis to death or last follow-up. The patient data used in this study were last updated in November 2020. Statistical significance was set at a two-sided P value < 0.05.
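The Kaplan–Meier method named above can be sketched in a few lines of standard-library Python. This is an illustrative sketch on a hypothetical six-patient cohort, not the SEER*Stat or R computation used in the study, and the log-rank comparison is not shown. At each death time the running survival estimate is multiplied by (1 − d/n), where d is the number of deaths and n the number still at risk. Following the usual convention, patients censored at a death time are counted as at risk at that time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan–Meier survival estimate.

    times  -- follow-up time for each patient (e.g. months from diagnosis)
    events -- 1 if the patient died at that time, 0 if censored (alive at last contact)
    Returns [(t, S(t))] at each distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths plus censorings at each time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t are still counted as at risk at t.
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort of six patients.
for t, s in kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]):
    print(t, round(s, 3))
# 1 0.833
# 2 0.667
# 3 0.444
# 5 0.0
```

For OS every death counts as an event. For DSS only lymphoma-specific deaths count, and deaths from other causes are treated as censored.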
A cohort of 7871 pediatric patients aged 0 to 19 years (median age, 15 years) was diagnosed with lymphoma between 1975 and 2018. These cases were extracted from the SEER database, which draws data from nine U.S. states (Supplemental Table S2). The majority of patients (53.6%, N = 4215) were aged 15–19 years; 6.5% (N = 513) were aged 0–4 years, 14.4% (N = 1137) were aged 5–9 years, and 25.5% (N = 2006) were aged 10–14 years. Males accounted for 59.1% (N = 4650) and females for 40.9% (N = 3221). Caucasians comprised the largest racial group (80.5%, N = 6338), followed by patients of African descent (11.4%, N = 897) and patients of other backgrounds, including AI/AN/AP (American Indian/Alaska Native/Asian and Pacific Islander) (8.1%, N = 556). In terms of lymphoma subtype, 53.3% (N = 4193) had HL (4144 nodal and 49 extra-nodal), and 46.7% (N = 3678) had NHL (2543 nodal and 1135 extra-nodal). Among the 5579 patients (70.9%) with staging information, 1388 (17.6% of the cohort) were stage I, 1901 (24.2%) stage II, 896 (11.4%) stage III, and 1394 (17.7%) stage IV. At the latest update, 1859 patients (23.6%) had died and 5992 (76.1%) were alive; the remaining 20 patients (0.3%) were lost to follow-up.
As shown in Fig. 1 and Supplemental Table S2, both adult and pediatric patients with lymphoma showed gradual improvements in OS and DSS over the past four decades. Among pediatric patients, the 1-year, 5-year, and 10-year OS rates increased by 19.3% (from 82.5% in 1975 to 98.4% in 2017), 41.9% (from 66.1% in 1975 to 93.8% in 2013), and 48.8% (from 59.8% in 1975 to 89.0% in 2008), respectively. The corresponding increases among adults were 15.0% (from 73.9% in 1975 to 85.0% in 2017), 39.8% (from 48.7% in 1975 to 68.1% in 2013), and 54.6% (from 34.6% in 1975 to 53.5% in 2008). All of these figures are relative increases over the 1975 rates. For DSS, the 1-year, 5-year, and 10-year rates in pediatric patients increased by 16.3% (from 85.5% in 1975 to 99.4% in 2017), 35.4% (from 70.1% in 1975 to 94.9% in 2013), and 43.8% (from 65.5% in 1975 to 94.2% in 2008). The corresponding increases in adults were 12.3% (from 79.5% in 1975 to 89.3% in 2013), 34.2% (from 60.3% in 1975 to 80.9% in 2013), and 53.0% (from 50.0% in 1975 to 76.5% in 2008). Kahn et al. [ 5 ] reported that Black patients showed greater improvement in long-term survival than Caucasians. In contrast, our subgroup analyses by race showed consistently higher survival rates among Caucasian children, especially for 5-year and 10-year outcomes (Fig. 1 C).
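The survival gains above are relative increases over the 1975 rates, not percentage-point differences. For example, 1-year pediatric OS rose from 82.5% to 98.4%. That is a gain of 15.9 percentage points, which is 19.3% of the 1975 value. The short check below recomputes several of the reported figures from the start and end rates given in the text. The helper name is illustrative.

```python
def relative_increase(start_pct, end_pct):
    """Relative change in percent: (end - start) / start * 100."""
    return (end_pct - start_pct) / start_pct * 100


# (rate in 1975, later rate) as reported in the text above.
os_rates = {
    "pediatric 1-year OS": (82.5, 98.4),
    "pediatric 5-year OS": (66.1, 93.8),
    "pediatric 10-year OS": (59.8, 89.0),
    "adult 10-year OS": (34.6, 53.5),
}
for label, (start, end) in os_rates.items():
    points = end - start
    print(f"{label}: +{points:.1f} points, +{relative_increase(start, end):.1f}%")
# pediatric 1-year OS: +15.9 points, +19.3%
# pediatric 5-year OS: +27.7 points, +41.9%
# pediatric 10-year OS: +29.2 points, +48.8%
# adult 10-year OS: +18.9 points, +54.6%
```

Read this way, adults show the largest relative gain in 10-year OS (54.6%) even though their absolute gain (18.9 points) is smaller than the pediatric one (29.2 points), because their 1975 baseline was much lower.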
Overall survival and disease-specific survival trends of pediatric lymphoma over time. A 1-year, 5-year, and 10-year overall survival rates of pediatric and adult lymphoma over the year of diagnosis. B 1-year, 5-year, and 10-year disease-specific survival rates of pediatric and adult lymphoma over the year of diagnosis. C 1-year, 5-year, and 10-year overall survival and disease-specific survival rates of pediatric lymphoma patients among the subgroups of different races over the year of diagnosis. AI/AN/AP, American Indian/Alaska Native/Asian and Pacific Islander
Using the multivariable Cox regression model, the independent prognostic risk factors for OS among pediatric patients with lymphoma were identified, including age (0–4 years: reference, HR = 1; 5–9 years: HR = 0.83, 95%CI 0.66–1.05, P = 0.117; 10–14 years: HR = 1.05, 95%CI 0.85–1.30, P = 0.628; 15–19 years: HR = 1.35, 95%CI 1.11–1.66, P = 0.003), sex (male: reference, HR = 1; female: HR = 0.89, 95%CI 0.81–0.98, P = 0.018), race (Caucasian: reference, HR = 1; Black: HR = 1.32, 95%CI 1.14–1.52, P < 0.001), the lymphoma subtype (HL: reference, HR = 1; Nodal NHL: HR = 1.92, 95%CI 1.71–2.16, P < 0.001; Extra-nodal NHL: HR = 1.42, 95%CI 1.20–1.69, P < 0.001), the Ann Arbor stage (stage I: reference, HR = 1; stage II: HR = 1.32, 95%CI 1.08–1.62, P = 0.006; stage III: HR = 1.67, 95%CI 1.33–2.09, P < 0.001; stage IV: HR = 2.42, 95%CI 2.01–2.92, P < 0.001), and radiotherapy (not receiving: reference, HR = 1; receiving: HR = 1.32, 95%CI 1.19–1.46, P < 0.001) (Fig. 2). Survival curves and comparisons of OS and DSS among subgroups defined by age, sex, race, lymphoma subtype, and Ann Arbor stage are shown in Supplemental Figures S2–S6. Notably, age significantly affected OS but not DSS among pediatric patients with lymphoma (Supplemental Figure S2). For approximately the first 26 years after diagnosis, children diagnosed at 0–4 years of age had worse OS than the other age groups, whereas patients aged 15–19 years had worse long-term OS. Sex was a critical factor affecting both DSS and OS (Supplemental Figure S3). Over the long term, females consistently had better DSS than males. Although female patients had significantly better OS, the two OS curves began to overlap after approximately 20 years, suggesting that factors other than the lymphoma itself, including other diseases, affected long-term survival. The differences related to race were more complex.
Pediatric patients of different races had similar DSS, whereas Caucasian and AI/AN/AP patients had significantly better OS than Black patients (Supplemental Figure S4). This difference may be more closely related to multiple socioeconomic factors than to intrinsic ethnic differences. As for lymphoma subtypes, HL consistently showed better DSS than NHL among pediatric patients, although subtype appeared to have little effect on long-term OS (Supplemental Figure S5). Surprisingly, the OS curves intersected at approximately 34 years after diagnosis. This suggests that, during long-term survival, pediatric patients with HL may be more susceptible than those with NHL to other associated factors or diseases.
Multivariable Cox proportional hazards regression analysis for overall survival among pediatric patients with lymphoma. HR, hazard ratio; AI/AN/AP, American Indian/Alaska Native/Asian and Pacific Islander; HL, Hodgkin lymphoma; N-NHL, nodal non-Hodgkin lymphoma; E-NHL, extra-nodal non-Hodgkin lymphoma
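In a Cox model the hazard for a patient with covariates x is h(t | x) = h0(t) exp(βᵀx). Each reported HR therefore equals exp(β) for the corresponding indicator, and a Wald 95% CI is exp(β ± 1.96·SE). The sketch below inverts that relationship to recover β, SE, and an approximate two-sided P value from an HR and its CI as reported in the multivariable results above. It is a consistency check, not a refit of the model. Because the CIs are rounded to two decimals, the recovered P values only approximate the published ones.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_from_hr_ci(hr, lower, upper):
    """Recover (beta, se, z, two-sided p) from a hazard ratio and its 95% CI."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


# HRs and 95% CIs as reported in the multivariable model.
reported = {
    "age 15-19 vs 0-4": (1.35, 1.11, 1.66),
    "female vs male": (0.89, 0.81, 0.98),
    "stage IV vs I": (2.42, 2.01, 2.92),
}
for label, (hr, lo, hi) in reported.items():
    beta, se, z, p = wald_from_hr_ci(hr, lo, hi)
    print(f"{label}: beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, P={p:.4f}")
```

The CI is symmetric on the log scale, not on the HR scale. That is why the interval for stage IV (2.01 to 2.92) extends further above 2.42 than below it.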
A total of 7741 pediatric patients with lymphoma were randomly divided into a training cohort and a validation cohort at a 7:3 ratio. The demographic characteristics of the two cohorts did not differ significantly (Supplemental Table S4). Based on the independent prognostic factors identified by the multivariable Cox regression model, a prediction nomogram incorporating sex, age, race, Ann Arbor stage, lymphoma subtype, and radiotherapy was developed in the training cohort (Fig. 3 A). The calibration and predictive ability of the nomogram were then tested in both the training and validation cohorts. Calibration curves for 1-year, 5-year, and 10-year OS showed good agreement between nomogram-predicted and observed OS rates in both cohorts (Supplemental Figure S7). The nomogram's discriminative ability was evaluated by the AUCs of the ROC curves (1-year: 0.766 and 0.776; 5-year: 0.724 and 0.712; 10-year: 0.703 and 0.696, in the training and validation cohorts, respectively). It was better than that of the Ann Arbor staging system (1-year: 0.666 and 0.668; 5-year: 0.647 and 0.651; 10-year: 0.646 and 0.641, respectively) (Fig. 3 B). We further visualized the relationship between each patient's nomogram score and survival time (Fig. 3 C). Higher nomogram scores were associated with significantly worse survival (Fig. 3 D–E).
The nomogram to predict 1-year, 5-year, and 10-year overall survival (OS) probabilities among pediatric patients with lymphoma. A Quantitative nomogram to predict survival probabilities according to the total points based on sex, age, race, the Ann Arbor stage, the lymphoma subtype, and radiotherapy. White, Caucasians; Black, African-American; AI/AN/AP, American Indian/Alaska Native/Asian and Pacific Islander. B Receiver operating characteristic curves of the nomogram and the Ann Arbor Staging System to predict 1-year, 5-year, and 10-year OS probabilities in the training and validation cohorts. AUC, the area under the ROC curve. AUCs of the nomogram (1-year: 0.766 and 0.776, 5-year: 0.724 and 0.712, 10-year: 0.703 and 0.696, in the training and validation cohorts, respectively) vs AUCs of the Ann Arbor Staging System (1-year: 0.666 and 0.668, 5-year: 0.647 and 0.651, 10-year: 0.646 and 0.641, in the training and validation cohorts, respectively). C Relationship between nomogram scores and survival time of each pediatric lymphoma patient. D and E Kaplan–Meier survival curves for pediatric lymphoma patients grouped by the median nomogram score in the training cohort and validation cohort, respectively
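A nomogram of this kind is a graphical form of the Cox linear predictor. A common scaling convention assigns 100 points to the predictor whose coefficient range is widest and scales the others proportionally. Total points then map back to the linear predictor lp, and t-year survival is S(t | x) = S0(t)^exp(lp). The sketch below applies that convention to the HRs reported in the text. It is illustrative only and is not the published nomogram: levels whose HRs are not given in the text (for example AI/AN/AP race) are omitted, coefficients are taken as ln(HR) from rounded values, and the baseline survival S0 is a made-up number.

```python
import math

# log-HRs from the HRs reported in the text (reference level = 0).
LOG_HR = {
    "age": {"0-4": 0.0, "5-9": math.log(0.83), "10-14": math.log(1.05), "15-19": math.log(1.35)},
    "sex": {"male": 0.0, "female": math.log(0.89)},
    "race": {"Caucasian": 0.0, "Black": math.log(1.32)},
    "subtype": {"HL": 0.0, "N-NHL": math.log(1.92), "E-NHL": math.log(1.42)},
    "stage": {"I": 0.0, "II": math.log(1.32), "III": math.log(1.67), "IV": math.log(2.42)},
    "radiotherapy": {"no": 0.0, "yes": math.log(1.32)},
}


def points_table(log_hr):
    """Scale each level so the widest-ranging predictor spans 0-100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in log_hr.values())
    return {
        var: {lvl: 100 * (b - min(lv.values())) / widest for lvl, b in lv.items()}
        for var, lv in log_hr.items()
    }


def predicted_survival(patient, baseline_surv):
    """S(t|x) = S0(t) ** exp(lp); baseline_surv is S0(t) for the all-reference patient."""
    lp = sum(LOG_HR[var][lvl] for var, lvl in patient.items())
    return baseline_surv ** math.exp(lp)


pts = points_table(LOG_HR)
patient = {"age": "15-19", "sex": "male", "race": "Caucasian",
           "subtype": "N-NHL", "stage": "IV", "radiotherapy": "no"}
total = sum(pts[var][lvl] for var, lvl in patient.items())
print(round(total, 1))
# Hypothetical baseline 5-year survival of 0.95 for the reference patient.
print(round(predicted_survival(patient, 0.95), 3))
```

Here Ann Arbor stage has the widest log-HR range, so stage IV receives 100 points and the other predictors are scaled relative to it. Because points are measured from each predictor's lowest-risk level, that level (for example age 5–9, with HR 0.83) scores 0 points rather than the reference category.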
To further explore the relationships between demographic characteristics and long-term outcomes of pediatric lymphoma, we developed multiple machine learning models to predict the 5-year, 10-year, and 20-year risk of lymphoma-specific death using the variables described above. All machine learning models (AUC ≈ 0.75) achieved significantly higher AUCs than conventional LR (AUC ≈ 0.70) and performed better in decision curve analysis, suggesting an advantage of the machine learning approaches for this prediction task (Fig. 4 A, B). Furthermore, patients were almost free from lymphoma-specific death from about ten years after diagnosis, whereas the risk of non-lymphoma death continued to rise sharply (Fig. 4 C, D). The sensitivity and specificity of each model were determined at the threshold that maximized the Youden index (Table 1). The causes of non-lymphoma death among pediatric lymphoma patients are shown in Fig. 5.
Machine learning models for risk prediction of long-term lymphoma-specific death in patients with pediatric lymphoma. A Receiver operating characteristic curves of five classical machine learning-based models and logistic regression (LR) with areas under the curve (AUC). B Decision curve analysis for five classical machine learning-based models and LR. C Number of lymphoma-specific and non-lymphoma deaths as survival years after lymphoma diagnosis. D Cumulative lymphoma-specific and non-lymphoma mortalities as survival years after lymphoma diagnosis
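The operating point and clinical-utility measures mentioned above can be sketched as follows. Youden's J = sensitivity + specificity − 1 is evaluated at every candidate threshold, and the maximizing threshold is kept. Decision curve analysis plots the net benefit NB(pt) = TP/n − (FP/n) · pt/(1 − pt) across threshold probabilities pt. The scores and labels below are hypothetical, and the code is a minimal illustration rather than the models or software used in the study.

```python
def youden_threshold(scores, labels):
    """Return (J, threshold, sensitivity, specificity) maximizing Youden's J.
    A patient is called positive when score >= threshold; label 1 = died of lymphoma."""
    pos = sum(labels)
    neg = len(labels) - pos
    best = (-1.0, None, None, None)
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 0)
        sens, spec = tp / pos, tn / neg
        j = sens + spec - 1
        if j > best[0]:
            best = (j, thr, sens, spec)
    return best


def net_benefit(scores, labels, pt):
    """Decision-curve net benefit of treating patients whose predicted risk >= pt."""
    n = len(labels)
    tp = sum(1 for s, y in zip(scores, labels) if s >= pt and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)


# Hypothetical predicted risks and outcomes for eight patients.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
print(youden_threshold(scores, labels))  # (0.75, 0.7, 0.75, 1.0)
print(net_benefit(scores, labels, pt=0.5))  # 0.25
```

A decision curve is drawn by evaluating `net_benefit` over a grid of pt values and comparing it with the "treat all" and "treat none" (net benefit 0) strategies. A model is clinically useful over the range of pt where its curve lies above both.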
Analysis of death causes among pediatric patients with lymphoma
In this comprehensive population-based study, we used the largest available dataset of cancer patients, the SEER database, to systematically analyze survival and predict outcomes in pediatric lymphoma, employing machine learning techniques. Our analysis of survival trends revealed notable increases in OS and DSS over the decades in both pediatric and adult lymphoma patients. Importantly, 5-year and 10-year survival rates were very similar among pediatric patients. This implies that the 5-year mark may serve as a critical management checkpoint for long-term survival: once pediatric patients with lymphoma survive the first 5 years, their chances of cure and sustained remission may improve substantially. We also observed that OS rates closely mirrored DSS rates in the pediatric population, in stark contrast to the adult population. This pattern could be attributed to the fact that pediatric patients rarely succumbed to NHL, as evidenced by our analysis of causes of death. Specifically, only 52.1% of deaths among pediatric patients were attributed to lymphoma (NHL: N = 574, 32.8%; HL: N = 337, 19.3%). Other causes of death included heart diseases, infectious diseases, accidents, adverse effects, and acute lymphocytic leukemia, among others. Patients with lymphoma have been reported to face a long-term risk of death from cardiovascular diseases [ 11 , 12 ]. The immune deficiency potentially caused by lymphoma may also increase the risk of infection [ 13 ]. These insights shed light on the complex interplay of factors affecting survival in pediatric lymphoma and emphasize the importance of long-term follow-up and tailored management strategies.
We also performed a multivariable Cox proportional hazards regression to identify independent risk factors for survival among pediatric patients with lymphoma. Age, sex, race, lymphoma subtype, stage, and radiotherapy were significantly associated with OS, and the Kaplan–Meier curves showed similar survival comparisons. Pediatric patients aged 0–4 years had lower OS than other age groups for the first 20-plus years. Surprisingly, older pediatric patients (15–19 years) had worse long-term OS, which was not observed in the DSS curves. This is consistent with a previous report that, in some populations, the number of patients who die of other factors or diseases can be comparable to the number who die of lymphoma itself [ 14 ]. Moreover, the cumulative mortality curves showed that, after a diagnosis of pediatric lymphoma, patients became largely free from lymphoma-specific death but faced an increasing risk from non-lymphoma diseases, especially after surviving ten years. For both OS and DSS, pediatric males had significantly worse survival than pediatric females. Even so, the two groups' OS curves overlapped after more than 30 years of follow-up. The same pattern was observed between HL and NHL. Our results suggest that males and pediatric patients with HL may be more susceptible to some long-term events, such as secondary malignancies and cardiovascular diseases [ 14 , 15 , 16 ]. As for race, Caucasian and AI/AN/AP children had significantly better OS than Black children, although all groups had similar DSS. Populations of different ethnicities may differ in their intrinsic sensitivity to treatments such as chemotherapy and radiotherapy. Moreover, socioeconomic limitations may delay diagnosis and management among Black patients.
Furthermore, transplantation is currently one of the most critical curative treatments. However, Black patients are under-represented in marrow donor registries and thus have fewer opportunities to undergo transplantation [ 5 ].
Although lymphoma occurs less commonly in children than in adults, it is still one of the most common malignancies among children [ 17 ]. Previously developed predictive tools have mainly focused on adult lymphoma, and very few studies have reported a survival prediction nomogram for pediatric patients with lymphoma. Of note, the Ann Arbor staging system, which focuses on the distribution of nodal involvement, was originally developed for HL, and the biological features of NHL differ from those of HL. Therefore, by integrating the independent prognostic risk factors from one of the largest lymphoma datasets, the SEER database, we developed a predictive nomogram that can be easily used by clinicians worldwide. We also compared the predictive ability of our nomogram with that of the Ann Arbor staging system. The nomogram performed better in predicting 1-year, 5-year, and 10-year OS in both the training and validation cohorts. In addition, all the machine learning models we developed outperformed the conventional method in predicting long-term lymphoma-specific death risk, suggesting the value of machine learning for this kind of data mining. Machine learning models can process large volumes of patient data, including clinical records, images, and genetic information, to help physicians devise more personalized treatment plans. Specifically, our machine learning models may aid doctors in initial screening and diagnosis, saving time and allowing them to focus more on interacting with patients and formulating treatment plans. However, applying machine learning models requires high-quality data and appropriate regulation to ensure their safety and effectiveness [ 18 ]. Moreover, machine learning models should serve only as an auxiliary tool in medical decision-making, with final treatment decisions made by experienced physicians.
Continued development and improvement of artificial intelligence tools is expected to help improve the diagnosis and treatment outcomes of lymphoma patients [ 19 ]. Overall, the quantitative nomogram and machine learning models may help predict the survival probability of each individual child accurately and efficiently, and support clinical decision-making.
Our study had several limitations that merit consideration. First, the clinical data were sourced from the SEER database, which represents only nine U.S. states. As a result, our findings might not be fully representative of pediatric lymphoma across the entire United States. Nevertheless, our study included the largest cohort of pediatric lymphoma patients to date, enabling systematic clinical analyses. Another limitation concerns the completeness of information in the SEER database. Certain critical clinical details, such as specific treatment protocols, were unavailable, which limited further optimization of predictive accuracy. To address these limitations and improve the precision of our predictive models, we plan to launch a multicenter cohort study that includes a broader range of pediatric lymphoma patients and collects a more comprehensive dataset covering a wider range of patient characteristics.
In conclusion, our study explored survival trends and found that advances in diagnostic and treatment approaches have led to notable improvements in both short-term and long-term survival outcomes. Moreover, we introduced a quantitative nomogram and deployed multiple machine learning models for outcome prediction, which showed high predictive accuracy and practical utility. The insights from this comprehensive clinical investigation may offer valuable and actionable information on pediatric lymphoma to clinicians globally and serve as a catalyst for further research in this field.
The data are available in the Surveillance, Epidemiology, and End Results (SEER, http://seer.cancer.gov ) database.
Buhtoiarov I. Pediatric lymphoma. Pediatr Rev. 2017;38(9):410–23. https://doi.org/10.1542/pir.2016-0152 .
Smith MA, Seibel NL, Altekruse SF, et al. Outcomes for children and adolescents with cancer: challenges for the twenty-first century. J Clin Oncol. 2010;28(15):2625–34. https://doi.org/10.1200/jco.2009.27.0421 .
Mauz-Körholz C, Metzger ML, Kelly KM, et al. Pediatric Hodgkin lymphoma. J Clin Oncol. 2015;33(27):2975–85. https://doi.org/10.1200/jco.2014.59.4853 .
Sandlund JT, Martin MG. Non-Hodgkin lymphoma across the pediatric and adolescent and young adult age spectrum. Hematology Am Soc Hematol Educ Prog. 2016;1:589–97. https://doi.org/10.1182/asheducation-2016.1.589 .
Kahn JM, Keegan TH, Tao L, et al. Racial disparities in the survival of American children, adolescents, and young adults with acute lymphoblastic leukemia, acute myelogenous leukemia, and Hodgkin lymphoma. Cancer. 2016;122(17):2723–30. https://doi.org/10.1002/cncr.30089 .
Bazzeh F, Rihani R, Howard S, Sultan I. Comparing adult and pediatric Hodgkin lymphoma in the surveillance, epidemiology and end results program, 1988–2005: an analysis of 21 734 cases. Leuk Lymphoma. 2010;51(12):2198–207. https://doi.org/10.3109/10428194.2010.525724 .
Bomze D, Sprecher E, Goldberg I, Samuelov L, Geller S. Primary cutaneous B-cell lymphomas in children and adolescents: a SEER population-based study. Clin Lymphoma Myeloma Leuk. 2021;21(12):e1000–5. https://doi.org/10.1016/j.clml.2021.07.021 .
Kassira N, Pedroso FE, Cheung MC, Koniaris LG, Sola JE. Primary gastrointestinal tract lymphoma in the pediatric patient: review of 265 patients from the SEER registry. J Pediatr Surg. 2011;46(10):1956–64. https://doi.org/10.1016/j.jpedsurg.2011.06.006 .
Naeem B, Ayub A. Primary pediatric non-Hodgkin lymphomas of the gastrointestinal tract: a population-based analysis. Anticancer Res. 2019;39(11):6413–6.
von Elm E, Altman DG, Egger M, et al. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370(9596):1453–7. https://doi.org/10.1016/s0140-6736(07)61602-x .
Gröbner S, Worst B, Weischenfeldt J, et al. The landscape of genomic alterations across childhood cancers. Nature. 2018;555(7696):321–7. https://doi.org/10.1038/nature25480 .
Ma X, Liu Y, Liu Y, et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature. 2018;555(7696):371–6. https://doi.org/10.1038/nature25795 .
Dulai PS, Thompson KD, Blunt HB, Dubinsky MC, Siegel CA. Risks of serious infection or lymphoma with anti-tumor necrosis factor therapy for pediatric inflammatory bowel disease: a systematic review. Clin Gastroenterol Hepatol. 2014;12(9):1443–51. https://doi.org/10.1016/j.cgh.2014.01.021 .
Gao J, Chen Y, Wu P, et al. Causes of death and effect of non-cancer-specific death on rates of overall survival in adult classic Hodgkin lymphoma: a populated-based competing risk analysis. BMC Cancer. 2021;21(1):955. https://doi.org/10.1186/s12885-021-08683-x .
Bhakta N, Liu Q, Yeo F, et al. Cumulative burden of cardiovascular morbidity in paediatric, adolescent, and young adult survivors of Hodgkin’s lymphoma: an analysis from the St Jude lifetime cohort study. Lancet Oncol. 2016;17(9):1325–34. https://doi.org/10.1016/s1470-2045(16)30215-7 .
Kupeli S. Cardiovascular disease after Hodgkin’s lymphoma: a role for screening. Lancet Haematol. 2015;2(11):e461–2. https://doi.org/10.1016/s2352-3026(15)00194-5 .
Horn SR, Stoltzfus KC, Mackley HB, et al. Long-term causes of death among pediatric patients with cancer. Cancer. 2020;126(13):3102–13. https://doi.org/10.1002/cncr.32885 .
Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol. 2022;23(1):40–55. https://doi.org/10.1038/s41580-021-00407-0 .
Bzdok D, Krzywinski M, Altman N. Machine learning: supervised methods. Nat Methods. 2018;15(1):5–6. https://doi.org/10.1038/nmeth.4551 .
All authors sincerely thank the staff of the SEER Program for the development of the database.
This work was supported by the Postdoctoral Fellowship Program of CPSF (No. GZB20230481), the Post-Doctor Research Project, West China Hospital, Sichuan University (No. 2024HXBH149, No. 2024HXBH006), the National Natural Science Foundation of China (No. 82303773, No. 82303772, No. 82303694, No. 82204490), the Natural Science Foundation of Sichuan Province (No. 2023NSFSC1885, No. 2024NSFSC1908), and the Key Research and Development Program of Sichuan Province (No. 23ZDYF2836).
Yue Zheng and Chunlan Zhang have contributed equally to this work.
Division of Thoracic Tumor Multimodality Treatment, Cancer Center, West China Hospital, Sichuan University, Chengdu, China
Yue Zheng, Kai Kang, Ren Luo & Yijun Wu
Laboratory of Clinical Cell Therapy, West China Hospital, Sichuan University, Chengdu, China
Department of Hematology, West China Hospital, Sichuan University, Chengdu, China
Chunlan Zhang, Xu Sun & Ailin Zhao
Conceptualization, A.Z. and Y.W.; methodology, Y.Z.; software, C.Z.; validation, Y.W. and A.Z.; formal analysis, X.S.; investigation, K.K.; resources, R.L.; data curation, Y.W.; writing—original draft preparation, Y.Z. and C.Z.; writing—review and editing, X.S.; visualization, K.K.; supervision, A.Z.; project administration, Y.W.; funding acquisition, A.Z. and Y.W. All authors have read and agreed to the published version of the manuscript.
Correspondence to Ailin Zhao or Yijun Wu .
Conflict of interest.
The authors declare no competing interests.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
Zheng, Y., Zhang, C., Sun, X. et al. Survival trend and outcome prediction for pediatric Hodgkin and non-Hodgkin lymphomas based on machine learning. Clin Exp Med 24 , 132 (2024). https://doi.org/10.1007/s10238-024-01402-3
Received : 24 April 2024
Accepted : 12 June 2024
Published : 18 June 2024
DOI : https://doi.org/10.1007/s10238-024-01402-3
Introduction to experimental psychology.
This course provides an introduction to the basic topics of psychology including our three major areas of distribution: the biological basis of behavior, the cognitive basis of behavior, and individual and group bases of behavior. Topics include, but are not limited to, neuropsychology, learning, cognition, development, disorder, personality, and social psychology.
3440 Market Street, Suite 450 Philadelphia, PA 19104-3335
(215) 746-2309
Stem Cell Research & Therapy volume 15, Article number: 173 (2024)
Spinal cord injury (SCI) is a disease that causes permanent impairment of motor, sensory, and autonomic nervous system functions. Stem cell transplantation for neuron regeneration is a promising treatment strategy for SCI. However, the choice of stem cell source and transplantation strategy needs to be based on experimental evidence. Therefore, this study aimed to investigate the efficacy of combination cell transplantation using brain-derived neurotrophic factor (BDNF)-overexpressing engineered mesenchymal stem cells (BDNF-eMSC) and induced pluripotent stem cell-derived motor neuron progenitor cells (iMNP) in a chronic SCI rat model.
A contusive chronic SCI was induced in Sprague-Dawley rats. At 6 weeks post-injury, BDNF-eMSC and iMNP were transplanted into the lesion site via the intralesional route. At 12 weeks post-injury, differentiation and growth factors were evaluated by immunofluorescence staining and western blot analysis. Motor neuron differentiation and neurite outgrowth were evaluated in vitro by co-culturing BDNF-eMSC and iMNP in two-dimensional (2D) and three-dimensional (3D) systems.
Combination cell transplantation in the chronic SCI model improved behavioral recovery more than transplantation of either cell type alone. Additionally, combination cell transplantation enhanced mature motor neuron differentiation and axonal regeneration in the injured spinal cord. Both BDNF-eMSC and iMNP played a critical role in neurite outgrowth and motor neuron maturation via BDNF expression.
Our results suggest that combined transplantation of BDNF-eMSC and iMNP in chronic SCI results in significant functional recovery. The transplanted iMNP cells predominantly differentiated into mature motor neurons. Additionally, BDNF-eMSC exerts a paracrine effect on neuron regeneration through BDNF expression in the injured spinal cord.
Spinal cord injury (SCI) is a disease that causes motor, sensory, and autonomic dysfunction. It is characterized by various symptoms, including post-injury paralysis, paresthesia, spastic pain, and cardiovascular, bladder, or sexual dysfunction. Severe SCI is a leading cause of death owing to severe autonomic dysfunction and neurogenic shock [ 1 ]. A retrospective population-based study covering 2011 to 2020 identified 1,303 traumatic SCI (TSCI) cases among 4.9 million residents. The recent rise in TSCI incidence has led to its growing recognition as a global health priority [ 2 ]. The causes of TSCI include motor vehicle accidents, falls, work-related injuries, violent crimes, and sports-related injuries [ 3 ]. Patients with TSCI experience substantial mortality and morbidity, and TSCI imposes an economic burden owing to the high cost and complexity of medical care and lost productivity [ 2 ].
The pathophysiology of TSCI comprises two phases: primary and secondary injury. In most clinical situations, the focus is on preventing the secondary injury mechanisms that follow the primary injury. The secondary injury process is divided into acute, sub-acute, and chronic phases based on the time elapsed since injury [ 3 ]. After the primary injury, pathological changes such as chronic inflammation, cell dysfunction, and vascular changes occur in the injured spinal cord (SC) tissue. These changes activate resident astrocytes, microglia, other glial cells, and fibroblasts at the lesion site and contribute to the infiltration of peripheral immune cells. The interactions among these cells at the lesion site underlie glial scarring, which inhibits axonal regeneration and myelination [ 4 , 5 , 6 , 7 ].
Currently, no effective treatments exist for acute or chronic SCI. Stem cell transplantation has emerged as a promising strategy to inhibit glial scarring and reduce inflammation, and various cell sources are being explored for SCI. The transplantation of Schwann cells, neural stem or progenitor cells, olfactory ensheathing cells (OECs), oligodendrocyte precursor cells, and mesenchymal stem cells (MSC) has been investigated as a potential therapy for SCI. Cells for transplantation can be derived from adult stem cells, embryonic stem cells (ESC), and induced pluripotent stem cells (iPSC), or obtained via direct conversion technology [ 8 , 9 , 10 , 11 , 12 , 13 ]. Because cellular response factors within the injured tissue determine SCI progression, a combined cell transplantation approach is needed to treat SCI more effectively [ 14 , 15 ].
We aimed to confirm the feasibility of a combination cell transplantation strategy in a chronic SCI model. We selected MSC as the first cell source for the combination. Prior research reported significant clinical improvement in chronic SCI following MSC transplantation, with predominantly astrocytic differentiation of the transplanted cells at the lesion site. Although the potential of transplanted cells in chronic SCI has been confirmed, further research is required to improve their migration and differentiation into functional cells [ 16 ]. Brain-derived neurotrophic factor (BDNF) plays an essential role in the maturation, differentiation, and survival of newly generated neurons via BDNF/TrkB signaling. BDNF has promising potential as a treatment for central nervous system diseases such as brain disease and SCI; however, its application in neurological diseases is limited [ 17 , 18 , 19 ]. BDNF has a short half-life in vivo and cannot cross the blood-brain barrier, which complicates its application. To overcome these limitations, BDNF overexpression in MSCs has been attempted as a treatment for neurological diseases. Based on previous research, we engineered human MSCs (hMSCs) to overexpress BDNF, and previous studies have confirmed that irradiated BDNF-overexpressing engineered MSCs (BDNF-eMSC) can enhance recovery from brain diseases in rodent models [ 17 , 18 , 19 ].
Furthermore, previous studies have reported that transplanting ESC- and iPSC-derived motor neuron progenitor cells increased neuronal survival and promoted neurite branching, resulting in functional recovery in SCI models. Recent experimental studies have proposed motor neurons and motor neuron progenitor cells as potential stem cell therapies for SCI [ 20 , 21 , 22 , 23 ]. We aimed to transplant iPSC-derived motor neuron progenitor cells (iMNP) as the second cell source to increase the rate of motor neuron differentiation at the lesion site in a chronic SCI model.
Effective stem cell transplantation for chronic SCI requires identifying suitable cell types and transplantation strategies. In this study, we used combined BDNF-eMSC and iMNP transplantation in a chronic SCI model. We hypothesized that transplanting BDNF-eMSC and iMNP cells in the severe stage of chronic SCI would induce functional recovery through BDNF expression, mature motor neuron differentiation, and axonal regeneration.
In vitro assay.
BDNF-eMSC preparation.
BDNF-eMSC was established based on previously reported protocols [ 17 , 18 , 19 , 24 , 25 ] and provided in irradiated form by SL BIGEN, Inc. (Incheon, Korea). Human bone marrow-derived MSCs were purchased from the Catholic Institute of Cell Therapy, South Korea. MSCs were cultured in low-glucose Dulbecco’s Modified Eagle Medium (DMEM) (Gibco, Grand Island, NY, USA) supplemented with 20% fetal bovine serum (FBS) (Gibco) and 5 ng/mL basic fibroblast growth factor (bFGF) (PeproTech, Rocky Hill, NJ, USA). BDNF-eMSCs were cultured in low-glucose DMEM supplemented with 10% FBS and 10 ng/mL bFGF. Both were maintained in a humidified atmosphere of 5% CO₂ at 37℃.
Human iPSCs were generated from cord blood mononuclear cells (CBMCs) as previously described, using the Cyto Tune-iPSC Sendai Reprogramming kit containing the Yamanaka factors (A16518, Thermo Fisher Scientific). CBMCs were obtained directly from the Cord Blood Bank of Seoul St. Mary’s Hospital [ 26 , 27 ]. The CBMC-derived iPSCs were cultured and maintained on vitronectin-coated dishes in Essential 8™ Basal medium with supplement (Thermo Fisher Scientific). iPSCs were differentiated into motor neurons with small molecules according to a previously reported protocol [ 28 , 29 ]. The motor neuron induction medium consisted of DMEM/F12 and Neurobasal medium (1:1), 1% N2, 1% B27 (Thermo Fisher Scientific), 0.1 mM ascorbic acid (Sigma-Aldrich, St Louis, MO, USA), 1X GlutaMAX, and 1X penicillin/streptomycin (Thermo Fisher Scientific). Differentiation into iPSC-derived neuroepithelial progenitors (iNEP) was induced in motor neuron induction medium containing 3 µM CHIR99021 (Tocris, Bristol, United Kingdom), 2 µM dorsomorphin homolog 1 (DMH1; Tocris), and 2 µM SB431542 (Stemgent, Cambridge, MA, USA), with the medium changed every other day for 6 days. To induce iMNP differentiation, the iNEP cells were treated for 6 days with retinoic acid (RA; 0.1 µM, Stemgent) and purmorphamine (Pur; 0.5 µM, Stemgent) together with 1 µM CHIR99021 (Tocris), 2 µM DMH1 (Tocris), and 2 µM SB431542 (Tocris). The iMNP cells were then cultured in suspension in motor neuron induction medium containing 0.5 µM RA and 0.1 µM Pur for an additional 6 days to induce iPSC-derived motor neuron (iMN) differentiation. For mature motor neuron differentiation, iMN cells were cultured with 0.5 µM RA, 0.1 µM Pur, and 0.1 µM Compound E (Calbiochem, San Diego, CA, USA) for 10 days.
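The staged differentiation schedule above can be summarized as a small data structure. The Python sketch below is only a bookkeeping aid, not the authors' protocol file: it transcribes the durations and concentrations given in the text and assumes the four stages run back to back with no gap days, which the text implies but does not state explicitly. `Stage`, `PROTOCOL`, and `timeline` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One differentiation stage: name, duration in days, small molecules in µM."""
    name: str
    days: int
    factors_um: dict = field(default_factory=dict)


# Durations and concentrations transcribed from the protocol text.
# Assumption (hypothetical): stages run back to back with no gap days.
PROTOCOL = [
    Stage("iNEP", 6, {"CHIR99021": 3.0, "DMH1": 2.0, "SB431542": 2.0}),
    Stage("iMNP", 6, {"RA": 0.1, "Pur": 0.5, "CHIR99021": 1.0,
                      "DMH1": 2.0, "SB431542": 2.0}),
    Stage("iMN (suspension)", 6, {"RA": 0.5, "Pur": 0.1}),
    Stage("iMature MN", 10, {"RA": 0.5, "Pur": 0.1, "Compound E": 0.1}),
]


def timeline(stages):
    """Return (stage name, first day, last day), counting days from 1."""
    schedule = []
    day = 0
    for stage in stages:
        schedule.append((stage.name, day + 1, day + stage.days))
        day += stage.days
    return schedule


if __name__ == "__main__":
    for name, start, end in timeline(PROTOCOL):
        print(f"{name:18s} day {start:2d}-{end:2d}")
    print("total days:", sum(s.days for s in PROTOCOL))
```

Under the no-gap assumption, the full differentiation from iPSC-derived iNEP induction to iMature MN spans 28 days, with the mature motor neuron stage occupying days 19 to 28.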
To assess BDNF expression, 5 × 10⁴ MSCs and BDNF-eMSCs were seeded onto coverslips in 12-well plates, and IF staining was performed 2 and 7 days after seeding. For IF staining of motor neuron cells, cells were seeded onto laminin (10 mg/ml)-coated coverslips in 12-well plates: 5 × 10⁴ cells for iNEP, and 5 × 10⁵ cells each for iMNP, iMN, and iPSC-derived mature motor neurons (iMature MN). IF staining for iNEP, iMNP, and iMN was performed 6 days after seeding, whereas iMature MN cells were stained after 10 days. IF staining for BDNF expression and for motor neuron differentiation was performed under the same conditions. All cells were fixed in 4% PFA for 30 min at RT and permeabilized with 0.1% Triton X-100 for 20 min at RT. Blocking was performed with PBS containing 2% BSA (PBA, Sigma-Aldrich) for 30 min. Cells were incubated with primary antibodies in 2% PBA for 2 h at RT. After washing with tris-buffered saline (TBS) containing 0.05% Tween-20 (TBST), cells were incubated with Alexa Fluor 488- or 594-conjugated secondary antibodies (Life Technologies) in 2% PBA for 1 h at RT. The stained cells were counterstained with 4′,6-diamidino-2-phenylindole (DAPI, Roche, Basel, Switzerland), washed, and mounted using an antifade mounting reagent (Thermo Fisher Scientific). The stained cells were observed under LSM 900 (Carl Zeiss, Oberkochen, Germany) and FV3000 (Olympus Life Science (EVIDENT), Tokyo, Japan) confocal fluorescence microscopes at 200× magnification. IF staining intensity was measured in four areas at 200× magnification and analyzed using Fiji (ImageJ, Windows 64-bit). Table 1 details the primary antibodies.
We co-cultured BDNF-eMSC and iMN cells to analyze BDNF expression, mature motor neuron differentiation, and neurite outgrowth in vitro. For BDNF expression and mature motor neuron differentiation, BDNF-eMSC and iMN cells were seeded at a 1:1 ratio in a 2D co-culture on a laminin-coated plate and differentiated into mature motor neurons for 10 days. To assess neurite outgrowth during mature motor neuron differentiation, we used a 3D co-culture platform. BDNF-eMSC and iMN 3D aggregates were generated in microwell plates (AggreWell™800, STEMCELL Technologies, Seattle, WA, USA) following the manufacturer’s instructions. After 2 days of aggregation in the AggreWell plates, the 3D co-culture spheroids were attached to a laminin-coated plate and differentiated into mature motor neurons for 10 days. Neurite outgrowth during this differentiation was evaluated using microtubule-associated protein-2 (MAP-2) staining and a neurite outgrowth assay kit (Life Technologies), read on a fluorescence plate reader. Red fluorescence from neurite outgrowth was detected at excitation/emission settings of 554/567 nm. To analyze synaptic connections and neural networks in mature motor neurons, IF staining was performed using synapsin-1, Tuj-1, and MAP-2 antibodies. Fluorescence intensity was measured in four areas at 200× magnification.
Electrophysiological analysis of mature motor neurons was performed using a multielectrode array (MEA). 3D spheroids were generated using AggreWell plates. Before cell seeding, the MEA plate was coated with 0.1% polyethyleneimine solution for 1 h at RT, rinsed three times with sterile deionized water, and dried overnight in a biosafety cabinet at RT. The 3D spheroids from each AggreWell, treated with laminin (10 µg/mL), were seeded onto the MEA plate. After 1 h, the spheroids were incubated in mature motor neuron induction medium, and the MEA plate was placed in a cell culture incubator with a humidified 5% CO₂ atmosphere at 37 ℃. Neuronal electrical activity was monitored after 10 days of culture, and the number of spikes was recorded.
Protein was extracted from BDNF-eMSC, MSC, and iMNP cells using RIPA buffer (Thermo Fisher Scientific). To analyze the expression of BDNF and mature motor neuron markers during mature motor neuron differentiation in 2D BDNF-eMSC and iMN co-culture, cell lysates were obtained after 10 days of co-culture. Cells were incubated with RIPA buffer for 30 min at 4℃ and centrifuged at 16,000 rpm for 20 min. Protein concentration was quantified using a bicinchoninic acid (BCA) protein assay. To confirm the expression of BDNF, motor neuron differentiation markers, and MAP-2, the quantified protein supernatants were separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and transferred onto a nitrocellulose blotting membrane. The membrane was blocked with 3% BSA for 1 h at RT and incubated with primary antibodies overnight at 4℃, followed the next day by secondary antibodies for 1 h at RT. Protein expression was visualized using an enhanced chemiluminescence solution and detected with a LAS 4000 (BioRad, Hercules, CA, USA), and band intensity was quantified using Multi Gauge V3.0 software (Fujifilm, Tokyo, Japan). Full-length Western blot images are presented in Additional file 4 : Fig. 4 .
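Converting BCA absorbance readings into protein concentrations, and then into equal gel loading volumes, is usually done with a standard curve. The Python sketch below assumes a BSA standard series read at 562 nm and a simple linear least-squares fit; the text does not say which standards, reading, or curve model were used, and a quadratic or four-parameter fit may suit wide concentration ranges better. The function names, standards, readings, dilution factor, and 20 µg loading target are all hypothetical.

```python
def fit_standard_curve(conc_ug_ml, absorbance):
    """Ordinary least-squares line: absorbance = slope * concentration + intercept."""
    n = len(conc_ug_ml)
    mean_x = sum(conc_ug_ml) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in conc_ug_ml)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc_ug_ml, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sample_concentration(a562, slope, intercept, dilution=1.0):
    """Invert the standard curve, then correct for any dilution made before the assay."""
    return (a562 - intercept) / slope * dilution


def loading_volume_ul(target_ug, conc_ug_ml):
    """Volume of lysate (µL) that contains target_ug of protein."""
    return target_ug / conc_ug_ml * 1000.0


if __name__ == "__main__":
    # Hypothetical BSA standards (µg/mL) and A562 readings, for illustration only.
    standards = [0, 250, 500, 1000, 2000]
    readings = [0.10, 0.225, 0.35, 0.60, 1.10]
    slope, intercept = fit_standard_curve(standards, readings)
    conc = sample_concentration(0.60, slope, intercept, dilution=2.0)
    print(f"lysate: {conc:.0f} µg/mL; load {loading_volume_ul(20, conc):.1f} µL for 20 µg")
```

With these made-up numbers, a 2-fold diluted lysate reading 0.60 corresponds to 2000 µg/mL, so 10 µL of lysate carries 20 µg of protein.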
Animal care and contusive chronic SCI model.
The Animal Studies Committee of the School of Medicine, the Catholic University of Korea, approved this study (IACUC approval number CUMC-2020-0364-04). All animal care, surgical, and cell transplantation procedures were conducted in accordance with the Laboratory Animal Welfare Act and the Guidelines and Policies for Rodent Survival Surgery. The Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines were followed. Contusive chronic SCI models were generated according to a previously reported surgical procedure [ 16 , 30 ]. Briefly, 7-week-old adult male Sprague-Dawley rats (weighing 270–320 g) were anesthetized with inhaled isoflurane and intraperitoneal Rompun (2 mg/kg). After anesthesia, the rats were shaved and the skin was sterilized with antiseptic betadine. The paravertebral muscles at thoracic levels 8–10 (T8–T10) were exposed, and a laminectomy was performed at T9. Contusion was then induced at the T9 laminectomy site using the Multicenter Animal Spinal Cord Injury Study (MASCIS) impactor (a 10 g rod dropped from a height of 2.5 cm). Pre- and post-operatively, the rats were administered 5 mg of ketoprofen, gentamicin, and warm saline for 3–5 days. The bladders of all rats with SCI were manually emptied for 1 week. Behavioral recovery was monitored for 6 weeks after SCI, and rats that recovered spontaneously were excluded before combination cell transplantation.
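For orientation, the impactor settings can be expressed in the units commonly used to compare weight-drop contusions. MASCIS-type injuries are often described by the product of rod mass and drop height, here 10 g × 2.5 cm = 25 g·cm. The Python sketch below also computes impact velocity and potential energy under an idealized free-fall assumption (no guide friction or rebound, g = 9.81 m/s²), so real impactor readouts will differ somewhat. `weight_drop` is a hypothetical helper, not part of any impactor software.

```python
import math

G = 9.81  # standard gravity, m/s^2 (idealized free fall assumed)


def weight_drop(mass_g, height_cm):
    """Idealized weight-drop contusion parameters (no friction, no rebound).

    Returns (severity in g*cm, impact velocity in m/s, potential energy in mJ).
    """
    mass_kg = mass_g / 1000.0
    height_m = height_cm / 100.0
    velocity = math.sqrt(2 * G * height_m)        # v = sqrt(2 g h)
    energy_mj = mass_kg * G * height_m * 1000.0   # E = m g h, in millijoules
    return mass_g * height_cm, velocity, energy_mj


if __name__ == "__main__":
    severity, v, e = weight_drop(10, 2.5)
    print(f"{severity:.0f} g*cm, {v:.2f} m/s, {e:.2f} mJ")
```

Calling `weight_drop(10, 2.5)` gives 25 g·cm, an ideal impact velocity of about 0.70 m/s, and about 2.45 mJ of potential energy.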
Combination cell transplantation was performed in the induced contusive chronic SCI model. Behavioral assessment was performed and recorded before cell transplantation. At 6 weeks post-SCI, the rats were randomized into the following groups for cell transplantation: (1) chronic SCI + phosphate-buffered saline (PBS) group ( n = 8), (2) chronic SCI + BDNF-eMSC group ( n = 6), (3) chronic SCI + iMNP group ( n = 7), and (4) chronic SCI + BDNF-eMSC + iMNP group ( n = 8). Before transplantation, BDNF-eMSCs were labeled with PKH26 (red fluorescence) and iMNPs with PKH67 (green fluorescence). At 6 weeks post-injury, the lesion site (T9) was re-exposed, and 1 × 10⁶ BDNF-eMSCs or iMNPs in 10 µL PBS were transplanted with a Hamilton needle into the BDNF-eMSC and iMNP groups, respectively. The BDNF-eMSC + iMNP group received both cell types (1:1) in 10 µL PBS, and the PBS group received 10 µL PBS at the lesion site. In all groups, injections were made rostral (5 µL) and caudal (5 µL) to the lesion site. Cyclosporin A (10 mg/kg; Cipol Inj, Chongkundang Pharmaceutical) was administered intramuscularly daily after cell transplantation.
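The volumes and doses above imply some simple per-site and per-animal arithmetic. The Python sketch below assumes a homogeneous cell suspension and shows that 1 × 10⁶ cells in 10 µL split into two 5 µL injections delivers 5 × 10⁵ cells per site, and converts the 10 mg/kg cyclosporin A dose into milligrams for a given body weight. The 350 g body weight is a hypothetical value, since the text gives weights only at the time of injury, and the text does not state whether the combination group received 1 × 10⁶ cells in total or of each type, so that split is left out. Both function names are hypothetical.

```python
def cells_per_injection(total_cells, total_volume_ul, injection_volume_ul):
    """Cells delivered in one injection, assuming a homogeneous suspension."""
    cells_per_ul = total_cells / total_volume_ul
    return cells_per_ul * injection_volume_ul


def daily_dose_mg(body_weight_g, dose_mg_per_kg=10.0):
    """Daily drug dose in mg for a body weight in grams at a mg/kg rate."""
    return dose_mg_per_kg * body_weight_g / 1000.0


if __name__ == "__main__":
    # 1 x 10^6 cells in 10 µL, split into rostral (5 µL) and caudal (5 µL) injections.
    per_site = cells_per_injection(1e6, 10, 5)
    print(f"{per_site:.0f} cells per injection site")
    # Hypothetical 350 g rat at the time of transplantation.
    print(f"{daily_dose_mg(350):.1f} mg cyclosporin A per day")
```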
Engraftment and differentiation of the transplanted cells in the injured SC were tracked using PKH26 (red fluorescence; Sigma-Aldrich, St Louis, MO, USA) and PKH67 (green fluorescence; Sigma-Aldrich), which are fluorescent dyes that intercalate into cell membranes. Before cell transplantation, BDNF-eMSC was labeled with PKH26 and iMNP with PKH67, using the same procedure for both dyes. Briefly, a cell pellet containing 1 × 10⁶ cells was incubated with Diluent C and the cell tracking dye solution for 5 min at room temperature (RT). Labeling was stopped with 1% bovine serum albumin (BSA). After the final wash, the cell pellet was centrifuged and resuspended in PBS for cell transplantation.
Behavioral recovery after SCI was assessed using the Basso, Beattie, and Bresnahan (BBB) locomotor rating scale. Three researchers scored the BBB locomotor scale and recorded the scores every week for 12 weeks. Rats that showed spontaneous improvement of hindlimb function within 24 h post-injury were excluded, as were rats that recovered spontaneously before cell transplantation. Based on their BBB scores, rats were divided into two grades: grade 1 (0–5) and grade 2 (6–11). Improvements in behavioral recovery were compared using the incidence rate, calculated as follows: incidence rate (%) = (BBB grade score of total rats / total rats) × 100.
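The incidence-rate formula above is terse. One plausible reading, assumed here rather than confirmed by the text, is the number of rats falling into a given BBB grade divided by the total number of rats, times 100. The Python sketch below implements that reading; rats scoring 12 or more are counted in the denominator but assigned to neither grade, because the text defines only the two ranges. The scores and function names are hypothetical.

```python
from collections import Counter


def bbb_grade(score):
    """Map a BBB score (0-21) to the two grades defined in the text."""
    if 0 <= score <= 5:
        return 1
    if 6 <= score <= 11:
        return 2
    return None  # scores of 12 or more fall outside both defined grades


def incidence_rates(scores):
    """Percentage of all rats in grade 1 and grade 2 (assumed reading of the formula)."""
    total = len(scores)
    counts = Counter(bbb_grade(s) for s in scores)
    return {grade: counts[grade] / total * 100 for grade in (1, 2)}


if __name__ == "__main__":
    week12_scores = [3, 5, 7, 8, 10, 4, 9, 6]  # hypothetical, one score per rat
    print(incidence_rates(week12_scores))
```

With these made-up scores, three of eight rats fall into grade 1 and five into grade 2, giving incidence rates of 37.5% and 62.5%.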
Rats were euthanized with CO₂ gas (flow rate of 30–70% of the chamber volume per minute) before collection of the injured SC, following the American Veterinary Medical Association Guidelines for the Euthanasia of Animals (2020 Edition). Injured SC samples (approximately 1 cm segments) were obtained at 12 weeks post-injury. For immunofluorescence (IF) staining, cardiac arrest was first confirmed, and the injured SC was then obtained after trans-cardiac perfusion with PBS and 4% paraformaldehyde (PFA). The SC was fixed overnight in 4% PFA, followed by overnight incubation in 15% and then 30% sucrose at 4℃. It was then embedded in optimal cutting temperature compound (Tissue-Tek; Sakura Finetek USA, Torrance, CA, USA) and snap-frozen in liquid nitrogen. For Western blot (WB) analysis, the injured SC (approximately 1 cm segment) was obtained after euthanasia without prior cardiac perfusion, immersed in liquid nitrogen, and stored in a deep freezer at −80℃.
IF staining was performed on frozen sections of the embedded injured segments (approximately 1 cm) to assess transplanted cell engraftment, differentiation, axonal regeneration, and BDNF expression. Frozen SC sections (4 μm thick) were mounted on silane-coated slides, fixed with cold acetone for 10 min at RT, and washed with TBST. The sections were permeabilized with 0.1% Triton X-100 for 20 min at RT and blocked with normal goat or horse serum containing 0.1% Triton X-100 for 1 h at RT. Primary antibodies were applied in 0.1% Triton X-100 with 1% normal goat or horse serum overnight at 4℃. After washing, Cy5-conjugated secondary antibodies (Life Technologies, Carlsbad, CA, USA) diluted in TBST were applied for 1 h at RT. Slides were washed, stained with DAPI, washed again, and mounted using an antifade mounting reagent. IF staining was observed under LSM 700 and LSM 900 (Carl Zeiss, Oberkochen, Germany) and FV3000 (Olympus Life Science (EVIDENT)) confocal microscopes. The number of engrafted PKH26-labeled (BDNF-eMSC) and PKH67-labeled (iMNP) cells at the lesion site was counted in six areas at 400× magnification using the Cell Counter plug-in of Fiji (ImageJ, Windows 64-bit), and the intensity of engrafted cells was measured in six areas at 400× magnification. IF staining intensity was measured in four areas at 200× magnification and analyzed using Fiji (ImageJ, Windows 64-bit). Table 1 details the primary antibodies.
Proteins were extracted from the injured SC segments (approximately 1 cm in length) using tissue protein extraction reagent (Thermo Fisher Scientific) supplemented with one protease inhibitor cocktail tablet (Roche, Basel, Switzerland) and 1 mM phenylmethylsulfonyl fluoride for 1 h at 4℃. Protein concentration was determined using a BCA assay. The proteins were separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and transferred onto a nitrocellulose blotting membrane. The membranes were blocked with 3% BSA in TBST for 1 h at RT and incubated with primary antibodies overnight at 4℃ (Table 1 details the primary antibodies), followed by peroxidase-conjugated secondary antibodies for 1 h at RT. Protein bands were developed with enhanced chemiluminescence solution and detected using an LAS 4000 imager (BioRad, Hercules, CA, USA), and band intensity was quantified using Multi Gauge V3.0 software (Fujifilm, Tokyo, Japan). Full-length western blot images are presented in Additional file 4: Fig. 4.
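Multi Gauge reports raw band intensities, and the text does not state how these were converted into the relative expression values shown in the figures. One common approach is sketched below as an assumption. Each target band is divided by a loading-control band from the same lane (the source names no loading control), and the resulting ratios are then expressed relative to the mean ratio of a reference group such as PBS. The function names are hypothetical.

```python
import statistics


def normalized_ratios(target, loading):
    """Divide each target band intensity by the loading-control intensity of its lane."""
    if len(target) != len(loading):
        raise ValueError("one loading-control value is needed per lane")
    return [t / c for t, c in zip(target, loading)]


def fold_change(ratios, reference_ratios):
    """Express normalized ratios relative to the mean ratio of a reference group."""
    ref = statistics.mean(reference_ratios)
    return [r / ref for r in ratios]


# Hypothetical lanes: two treated samples versus a two-lane reference group.
treated = normalized_ratios([200.0, 300.0], [100.0, 100.0])   # [2.0, 3.0]
reference = normalized_ratios([100.0, 100.0], [100.0, 100.0])  # [1.0, 1.0]
print(fold_change(treated, reference))
```

Dividing by the reference-group mean sets that group to 1.0 on average, so the other groups read directly as fold changes.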
All results were statistically analyzed using IBM SPSS Statistics for Windows, version XX (IBM Corp., Armonk, N.Y., USA). All data are expressed as means ± standard deviations. For the in vitro analysis of BDNF expression in hMSC and BDNF-eMSC, the paired t-test (#) and the Kruskal–Wallis test followed by the Mann–Whitney U test (†) were used for two-group comparisons. For intergroup comparisons among three or four groups, one-way analysis of variance with Fisher's least significant difference test (*) and the Kruskal–Wallis test followed by the Mann–Whitney U test were used. Statistical significance was set at p < 0.05 (†, *, # p < 0.05; ††, **, ## p < 0.01; †††, ***, ### p < 0.001; n.s., not significant).
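The nonparametric procedure named above (the Kruskal–Wallis test followed by pairwise Mann–Whitney U tests) can be sketched in plain Python so that each step is visible. This is not the SPSS implementation. The Kruskal–Wallis p-value here uses the usual chi-square approximation with k − 1 degrees of freedom, and the Mann–Whitney p-value is computed exactly by enumerating every assignment of ranks to the two groups. SPSS may report asymptotic Mann–Whitney p-values instead, so the results can differ. The ANOVA with Fisher's LSD is not shown. All function names are illustrative.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability from the regularized lower gamma series."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total and n < 10_000:
        n += 1
        term *= y / (a + n)
        total += term
    lower = math.exp(a * math.log(y) - y - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups):
    """Return (tie-corrected H, chi-square p-value with k - 1 degrees of freedom)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    s, start = 0.0, 0
    for g in groups:
        s += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * s - 3 * (n + 1)
    ties = [pooled.count(v) for v in set(pooled)]
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    h = h / correction if correction > 0 else 0.0
    return h, chi2_sf(h, len(groups) - 1)


def mann_whitney_exact(a, b):
    """Return (U for group a, exact two-sided p) by enumerating all rank assignments."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    offset = n1 * (n1 + 1) / 2
    u_obs = sum(ranks[:n1]) - offset
    center = n1 * n2 / 2
    extreme = total = 0
    for idx in combinations(range(n1 + n2), n1):
        u = sum(ranks[i] for i in idx) - offset
        total += 1
        if abs(u - center) >= abs(u_obs - center) - 1e-9:
            extreme += 1
    return u_obs, extreme / total
```

For example, `mann_whitney_exact([1, 2, 3, 4], [5, 6, 7, 8])` returns U = 0 with p = 2/70 ≈ 0.029. This is the smallest exact two-sided p-value attainable with four samples per group, because only 2 of the 70 possible rank splits are as extreme as the one observed.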
Previous studies reported that BDNF-eMSCs are highly proliferative and express and secrete more BDNF than naïve MSCs. BDNF-eMSCs were generated as previously described [17, 18, 19], using a lentiviral vector encoding the c-Myc, tTA, and BDNF genes, and were then irradiated with 200 Gy using an X-ray irradiation device (Red Source Technologies, Buford, GA, USA) (Fig. 1a). Before cell transplantation, we confirmed BDNF expression in BDNF-eMSCs. On day 2 after thawing, BDNF-eMSCs displayed a homogeneous spindle-like morphology representative of MSCs (Fig. 1b). Both MSCs and BDNF-eMSCs expressed BDNF on day 2, and the fluorescence intensity of BDNF was significantly higher in BDNF-eMSCs than in MSCs (Fig. 1c). BDNF-eMSCs maintained their spindle-like morphology until day 7 (Fig. 1d). IF staining revealed BDNF expression in BDNF-eMSCs on day 7. However, the fluorescence intensity of BDNF was significantly lower on day 7 than on day 2 (Fig. 1e). WB analysis confirmed higher BDNF expression in BDNF-eMSCs than in MSCs (Fig. 1f) and lower BDNF expression in BDNF-eMSCs on day 7 than on day 2 (Fig. 1g). Full-length western blot images are presented in Additional file 4: Fig. 4.
Generation of BDNF-eMSC in vitro. (a) Scheme of BDNF-eMSC generation. (b) Representative light microscope images of MSC and BDNF-eMSC morphology on day 2. (c) Representative fluorescence images showing BDNF expression in MSC and BDNF-eMSC on day 2, and quantification of the fluorescence intensity of BDNF in MSC and BDNF-eMSC (n = 8). (d) Representative light microscope image of BDNF-eMSC morphology on day 7. (e) Representative fluorescence image showing BDNF expression in BDNF-eMSC on day 7, and quantification of the fluorescence intensity of BDNF in BDNF-eMSC on days 2 and 7 (n = 8). (f) Western blotting (WB) results showing BDNF expression in MSC and BDNF-eMSC cell lysates on day 2. (g) WB results showing BDNF expression in BDNF-eMSC cell lysates on days 2 and 7. Full-length western blot images are presented in Additional file 4: Fig. 4. The data are presented as mean ± SEM. Statistical significance was estimated using the paired t-test (#) and the Kruskal–Wallis test followed by the Mann–Whitney test (†) for intergroup comparison. #, † P < 0.05; ## P < 0.01; ### P < 0.001 (MSC, day 2: n = 4; BDNF-eMSC, day 2: n = 4; BDNF-eMSC, day 7: n = 4). Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; IF, Immunofluorescence staining; WB, Western blot
Motor neuron progenitors and mature motor neurons were differentiated from human iPSCs using a small-molecule cocktail as previously described [28, 29] (Fig. 2a). Light microscopy confirmed that iMNP and iMature MN were successfully and reproducibly differentiated from human iPSCs (Fig. 2b). We evaluated each stage of motor neuron differentiation using stage-specific markers. On day 6, IF staining and WB analysis revealed SOX1 expression in the iNEP phase (Fig. 2c–e), and the fluorescence intensity of SOX1 was significantly higher in iNEP (Additional file 1: Fig. 1). OLIG2 was more strongly expressed in the iMNP phase than in the other differentiation stages (Fig. 2d, f), and the fluorescence intensity of OLIG2 was also significantly higher in iMNP (Additional file 1: Fig. 1). IF staining revealed HB9-positive cells in the iMN phase (Fig. 2c), and the fluorescence intensity of HB9 was significantly higher in iMN (Additional file 1: Fig. 1). WB analysis confirmed that HB9 protein expression increased on day 18 (Fig. 2d, g). In the iMature MN phase, we evaluated SMI-32 expression using IF staining and WB analysis. The fluorescence intensity of SMI-32 was significantly higher in iMature MN than in iNEP, iMNP, and iMN (Additional file 1: Fig. 1), and on day 28, iMature MN showed increased SMI-32 positivity and protein expression (Fig. 2c–h). Full-length western blot images are presented in Additional file 4: Fig. 4.
Generation of induced pluripotent stem cell (iPSC)-derived motor neurons using small molecules in vitro. (a) Scheme of motor neuron progenitor cell (MNP), motor neuron (MN), and mature MN differentiation from human iPSCs using a small-molecule cocktail. (b) Representative time-course light microscopy images of the differentiation of induced pluripotent stem cell-derived neuroepithelial progenitor cells (iNEP), induced pluripotent stem cell-derived motor neuron progenitor cells (iMNP), induced pluripotent stem cell-derived motor neuron cells (iMN), and induced pluripotent stem cell-derived mature motor neuron cells (iMature MN). (c) Time-course fluorescence images of iNEP, iMNP, iMN, and iMature MN stained with stage-specific differentiation markers. (d–h) WB results for stage-specific motor neuron differentiation markers in cell lysates (iNEP: n = 4, iMNP: n = 4, iMN: n = 4, iMature MN: n = 4). Full-length western blot images are presented in Additional file 1: Fig. 1. The data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney (†) test with the least significant difference post hoc analysis (*); *, † P < 0.05; **, †† P < 0.01. Scale bars = 50 μm. iPSCs, induced pluripotent stem cells; iNEPs, induced pluripotent stem cell-derived neuroepithelial progenitor cells; iMNPs, induced pluripotent stem cell-derived motor neuron progenitor cells; iMNs, induced pluripotent stem cell-derived motor neuron cells; iMature MNs, induced pluripotent stem cell-derived mature motor neuron cells; IF, Immunofluorescence staining; WB, Western blot. SOX1 = iNEP, OLIG2 = iMNP, HB9 = iMN, SMI-32 = iMature MN.
Before cell transplantation, BDNF-eMSCs were labeled with PKH26 (red), and iMNP cells were labeled with PKH67 (green). We generated a contusive chronic SCI model and transplanted BDNF-eMSC and iMNP cells into the injured SC via the intralesional route at 6 weeks post-injury. At 12 weeks post-injury, we assessed the engraftment of transplanted cells and BDNF expression in the injured SC (Fig. 3a). Locomotor recovery was evaluated using the BBB locomotor rating scale for 12 weeks post-injury. At 12 weeks post-injury, the BDNF-eMSC + iMNP group showed significantly greater functional recovery than the PBS and BDNF-eMSC groups. At 12 weeks post-injury, the incidence of BBB scores of 6–11 was 62.5% in the BDNF-eMSC + iMNP group and 14.28% in the iMNP group (Fig. 3b and Supplementary Movies 1, 2, 3, and 4). At 12 weeks post-injury, transplanted BDNF-eMSCs (PKH26, red) were not observed in the white or gray matter of the lesion site. In contrast, transplanted iMNP cells (PKH67, green) persisted in the white and gray matter of the lesion in both the iMNP and BDNF-eMSC + iMNP groups (Fig. 3c, e, f). However, few transplanted BDNF-eMSCs were observed at the lesion site in the BDNF-eMSC + iMNP group (Fig. 3c, e, f). Thus, at 12 weeks after injury, iMNP cells, rather than BDNF-eMSCs, remained at the lesion site (Fig. 3c, e, f). However, BDNF-eMSCs had engrafted and were present at the lesion site 1 week after transplantation (Additional file 2: Fig. 2). BDNF expression at the lesion site was assessed using IF staining. At 12 weeks post-injury, all cell transplantation groups showed higher BDNF expression than the PBS group, and the BDNF-eMSC + iMNP group showed a higher fluorescence intensity than the PBS, BDNF-eMSC, and iMNP groups. However, these differences were not significant (Fig. 3d, g).
In the WB analysis, BDNF expression in the lesion segment (approximately 1 cm) was higher in the BDNF-eMSC + iMNP group than in the PBS, BDNF-eMSC, and iMNP groups. However, none of the differences between groups was significant (Fig. 3h). Full-length western blot images are presented in Additional file 4: Fig. 4.
Combined transplantation of BDNF-eMSC and iMNP in a contusive chronic SCI model. (a) Experimental scheme illustrating contusive chronic SCI rat model generation, combined cell transplantation, and behavioral and histological assessments. (b) BBB scores and incidence rates 12 weeks after SCI (PBS: n = 8, BDNF-eMSC: n = 6, iMNP: n = 7, BDNF-eMSC + iMNP: n = 8). (c) IF images showing 4′,6-diamidino-2-phenylindole (DAPI) merged with transplanted cells (red, BDNF-eMSC; green, iMNP) at the lesion site at 12 weeks. (d) Multi-fluorescence confocal images showing BDNF expression and transplanted cells in the white matter. (e) Number of engrafted cells at the lesion site (n = 6). (f) Quantification of the fluorescence intensity of engrafted cells at the lesion site (n = 6). (g) Quantification of the fluorescence intensity of BDNF at the lesion site (n = 4). (h) WB results of BDNF expression in the lesion site segments (approximately 1 cm). Full-length western blot images are presented in Additional file 4: Fig. 4. The data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney U (†) test with least significant difference post hoc analysis (*); *, † P < 0.05; **, †† P < 0.01 (PBS: n = 4, BDNF-eMSC: n = 4, iMNP: n = 4, BDNF-eMSC + iMNP: n = 4). Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; BBB, Basso–Beattie–Bresnahan; iMNPs, induced pluripotent stem cell-derived motor neuron progenitor cells; IF, Immunofluorescence staining; WB, Western blot
IF staining and WB analysis were used to confirm motor neuron differentiation and axonal regeneration of the transplanted cells at the lesion site. Mature motor neuron differentiation was evaluated using the SMI-32 marker. At 12 weeks post-injury, SMI-32 expression was observed around the gray matter of the lesion site. SMI-32 expression and transplanted iMNP cells were more abundant in the iMNP and BDNF-eMSC + iMNP groups than in the PBS and BDNF-eMSC groups (Fig. 4a). SMI-32 protein expression in the injured segments was significantly higher in the BDNF-eMSC + iMNP group than in the PBS group. However, differences among the cell transplantation groups were not significant (Fig. 4b). To confirm axonal regeneration at the lesion site 12 weeks post-injury, MAP-2 expression was analyzed in axial sections of the injured SC. MAP-2 expression was observed around the dorsal horn and central canal of the lesion site and was predominant in the injured SC of the iMNP and BDNF-eMSC + iMNP groups (Fig. 4c). MAP-2 expression was significantly higher in the BDNF-eMSC + iMNP group than in the PBS and iMNP groups (Fig. 4d). The growth density of neuronal processes at the lesion site was analyzed using the GAP-43 marker at 12 weeks post-injury. GAP-43 expression was observed around the dorsal horn and central canal of the lesion site (Additional file 3: Fig. 3a). IF staining showed that the fluorescence intensity of GAP-43 was significantly higher in the cell transplantation groups than in the PBS group (Additional file 3: Fig. 3b), and GAP-43 expression was significantly higher in the iMNP and BDNF-eMSC + iMNP groups than in the PBS group (Additional file 3: Fig. 3c). Full-length western blot images are presented in Additional file 4: Fig. 4.
Enhancement of mature MN differentiation and growth density of neuronal processes by BDNF-eMSC and iMNP in vivo. (a) IF images showing SMI-32 merged with transplanted cells at the lesion site; SMI-32 differentiation of the transplanted iMNP cells was predominant in the lesion site at 12 weeks post-injury. (b) WB results of SMI-32 expression in the lesion site segment (approximately 1 cm) (PBS: n = 4, BDNF-eMSC: n = 4, iMNP: n = 4, BDNF-eMSC + iMNP: n = 4). Full-length WB images are presented in Additional file 1: Fig. 1. (c) Confocal images showing MAP-2 and transplanted cell expression around the lesion site. (d) WB results of MAP-2 expression in the lesion site segment (approximately 1 cm) (PBS: n = 4, BDNF-eMSC: n = 4, iMNP: n = 4, BDNF-eMSC + iMNP: n = 4). Full-length western blot images are presented in Additional file 4: Fig. 4. The data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney (†) test with the least significant difference post hoc analysis (*); *, † P < 0.05. Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; iMNP, induced pluripotent stem cell-derived motor neuron progenitor cells; IF, Immunofluorescence staining; WB, Western blot
IF staining and WB analysis were performed to assess oligodendrocyte and neuronal differentiation of the engrafted cells, using the CC-1 and NeuN markers at the lesion site. CC-1 is a representative oligodendrocyte phenotype marker. At 12 weeks post-injury, CC-1 expression was observed around the white matter. CC-1-positive oligodendrocytes were predominant in the injured SC of the iMNP and BDNF-eMSC + iMNP groups and were qualitatively abundant around engrafted iMNP cells (Fig. 5a). IF staining confirmed that the fluorescence intensity of CC-1 was significantly higher in the cell transplantation groups than in the PBS group (Fig. 5b), and CC-1 expression was significantly higher in the BDNF-eMSC + iMNP group than in the PBS group (Fig. 5c). NeuN is a neuronal phenotype marker. NeuN expression was mainly observed in the dorsal horn of the gray matter and around the central canal. At 12 weeks post-injury, NeuN expression and iMNP cells were more abundant in the dorsal horn of the gray matter and around the central canal in the iMNP and BDNF-eMSC + iMNP groups than in the PBS and BDNF-eMSC groups (Fig. 5d). IF staining confirmed that the fluorescence intensity of NeuN was significantly higher in the cell transplantation groups than in the PBS group (Fig. 5e), and NeuN expression was significantly higher in the iMNP and BDNF-eMSC + iMNP groups than in the PBS group (Fig. 5f). Full-length western blot images are presented in Additional file 4: Fig. 4.
Increased oligodendrocytes and neuronal cells by BDNF-eMSC and iMNP at the lesion site. (a) IF image showing oligodendrocyte differentiation by transplanted BDNF-eMSC and iMNP around the injured site. (b) Quantification of the fluorescence intensity of CC-1 at the lesion site (n = 4). (c) WB results of CC-1 expression in the injured segment (approximately 1 cm) (PBS: n = 4, BDNF-eMSC: n = 4, iMNP: n = 4, BDNF-eMSC + iMNP: n = 4). Full-length WB images are presented in Additional file 4: Fig. 4. (d) IF images of NeuN expression merged with transplanted cells in the gray matter. (e) Quantification of the fluorescence intensity of NeuN at the lesion site (n = 4). (f) Expression of the neuronal marker NeuN in the injured segment (approximately 1 cm), confirmed via WB. Full-length western blot images are presented in Additional file 4: Fig. 4. The data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney (†) test with the least significant difference post hoc analysis (*); *, † P < 0.05. Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; iMNP, induced pluripotent stem cell-derived motor neuron progenitor cell; IF, Immunofluorescence staining; WB, Western blot
The in vivo results demonstrated that combined transplantation of BDNF-eMSC and iMNP promoted motor neuron maturation and axonal regeneration at the lesion site. Based on these results, we performed 2D and 3D co-cultures of BDNF-eMSC and iMN to examine their effects on motor neuron maturation and axonal regeneration in vitro. Motor neuron maturation and axonal regeneration were analyzed by co-culturing BDNF-eMSC and iMN on 2D and 3D spheroid platforms during mature motor neuron differentiation (Fig. 6a). In IF staining, SMI-32 expression was qualitatively higher in the iMN and BDNF-eMSC + iMN groups than in the BDNF-eMSC group (Fig. 6g). In WB analysis, SMI-32 expression was also significantly higher in the iMN and BDNF-eMSC + iMN groups than in the BDNF-eMSC group (Fig. 6b, c). BDNF expression was qualitatively higher in the BDNF-eMSC and BDNF-eMSC + iMN groups than in the iMN group (Fig. 6g). In WB analysis, BDNF expression was significantly higher in the BDNF-eMSC group than in the iMN and BDNF-eMSC + iMN groups, and on day 10 it was also significantly higher in the BDNF-eMSC + iMN group than in the iMN group (Fig. 6d). MAP-2 was analyzed on the 2D co-culture platform to evaluate axonal regeneration. In IF staining and WB analysis, MAP-2 expression was significantly higher in the iMN and BDNF-eMSC + iMN groups than in the BDNF-eMSC group (Fig. 6e, h). We also assessed neurite outgrowth induction by co-culturing BDNF-eMSC and iMN on 3D spheroid platforms. The BDNF-eMSC and iMN 3D spheroids were attached to laminin-coated plates and cultured for 10 days, and neurite outgrowth from the spheroids was assessed on day 10 using a neurite outgrowth assay kit. In red fluorescence staining and intensity analysis, neurite outgrowth was significantly higher in the BDNF-eMSC + iMN group than in the BDNF-eMSC and iMN groups.
Furthermore, neurite outgrowth was significantly higher in the iMN group than in the BDNF-eMSC group (Fig. 6f, h). Full-length western blot images are presented in Additional file 4: Fig. 4.
Increased MN maturation and axonal regeneration induced by BDNF-eMSC and iMN co-culture. (a) Schematic of BDNF-eMSC and iMN 2D and 3D co-culture for in vitro assessment of motor neuron differentiation, maturation, and axonal regeneration. (b) Representative WB images of SMI-32, BDNF, and MAP-2 in 2D co-culture on day 10. (c) Protein expression of SMI-32 in BDNF-eMSC and iMN 2D co-culture cell lysates on day 10. (d) Protein expression of BDNF in BDNF-eMSC and iMN 2D co-culture cell lysates on day 10. (e) Protein expression of MAP-2 in BDNF-eMSC and iMN 2D co-culture cell lysates on day 10 (BDNF-eMSC: n = 3, iMN: n = 3, BDNF-eMSC + iMN: n = 3). (f) Quantification of neurite outgrowth on day 10 in 3D co-cultured spheroids (BDNF-eMSC: n = 6, iMN: n = 6, BDNF-eMSC + iMN: n = 6). (g) Representative IF images of SMI-32 and of BDNF in BDNF-eMSC and iMN 2D co-culture on day 10. (h) Representative IF images of MAP-2 in BDNF-eMSC and iMN 2D co-culture on day 10, and representative fluorescence images of neurite outgrowth on day 10 in 3D co-cultured spheroids. Full-length western blot images are presented in Additional file 4: Fig. 4. The data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney (†) test with the least significant difference post hoc analysis (*); *, † P < 0.05; **, †† P < 0.01; *** P < 0.001. Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; iMN, induced pluripotent stem cell-derived motor neuron; IF, Immunofluorescence staining; WB, Western blot
We showed that combined transplantation of BDNF-eMSC and iMNP increased mature motor neuron differentiation and the growth density of neuronal processes in the injured SC. However, the underlying mechanism was difficult to determine in vivo. We hypothesized that 3D co-culture of BDNF-eMSC and iMN would promote the formation of neural circuits and connections during MN differentiation and maturation. To test this hypothesis, we analyzed the induction of neural circuits and connections by co-culturing BDNF-eMSC and iMN on 3D spheroid platforms during the differentiation of mature iMN. Synaptic connections and local neural networks in the 3D co-culture were assessed on day 10 using the synapsin-1, Tuj-1, and MAP-2 markers and MEA analysis (Fig. 7a). IF staining showed that synapsin-1, Tuj-1, and MAP-2 expression was higher in the BDNF-eMSC + iMN group than in the BDNF-eMSC and iMN groups (Fig. 7b–f). Dendritic connections between spheroids were observed in the iMN and BDNF-eMSC + iMN groups, but not in the BDNF-eMSC group (Fig. 7b, c). The fluorescence intensity of synapsin-1, Tuj-1, and MAP-2 was significantly higher in the iMN and BDNF-eMSC + iMN groups than in the BDNF-eMSC group (Fig. 7d–f). The electrophysiology of the 3D spheroids during mature MN differentiation was examined using MEA. Neural spiking activity was higher in the iMN and BDNF-eMSC + iMN groups than in the BDNF-eMSC group, with a higher number of spikes (Fig. 7g–i). Taken together, these results indicate the successful formation of neural circuits and connections during the maturation of MNs generated on the 3D co-culture platform.
Synergistic effect of the BDNF-eMSC and iMN 3D co-culture platform in promoting synaptic connections and neural networks during the differentiation of mature motor neurons. (a) Schematic of the in vitro assessment, using IF staining and MEA analysis, of BDNF-eMSC and iMN 3D co-culture in promoting motor neuron differentiation, synaptic connections, and neural networks. (b) Representative IF images of synapsin-1 and Tuj-1 in 3D co-culture on day 10. (c) Representative IF images of synapsin-1 and MAP-2 in 3D co-culture on day 10. (d) Quantification of the fluorescence intensity of synapsin-1 in 3D co-culture on day 10. (e) Quantification of the fluorescence intensity of Tuj-1 in 3D co-culture on day 10. (f) Quantification of the fluorescence intensity of MAP-2 in 3D co-culture on day 10. (g) Representative heatmaps for plate-wide visualization of spike or beat rates and amplitudes on MEAs (3D BDNF-eMSC, n = 4; 3D iMN, n = 4; 3D BDNF-eMSC + iMN, n = 4). (h) Number of active electrodes per well. (i) Average number of spikes of active electrodes per well (3D BDNF-eMSC, n = 4; 3D iMN, n = 4; 3D BDNF-eMSC + iMN, n = 4). Data are presented as mean ± SEM. Statistical significance was estimated using the Kruskal–Wallis test with post hoc analysis and the Mann–Whitney (†) test with the least significant difference post hoc analysis (*); *, † P < 0.05; **, †† P < 0.01; *** P < 0.001. Scale bars = 50 μm. BDNF-eMSC, BDNF over-expressing engineered mesenchymal stem cells; iMN, induced pluripotent stem cell-derived motor neuron; IF, Immunofluorescence staining; MEA, multi-electrode arrays
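The MEA readouts in panels (h) and (i) depend on two steps that the text does not specify: detecting spikes in each electrode's voltage trace, and deciding which electrodes count as active. The following sketch makes both steps explicit under assumed parameters. Spikes are detected as downward crossings of a threshold set at k = 6 robust noise SDs below the median, with the noise SD estimated as the median absolute deviation divided by 0.6745. A 1 ms refractory period is applied, and an electrode counts as active at 5 spikes/min or more. None of these values come from the study, commercial MEA software may implement detection differently, and the function names are hypothetical.

```python
import statistics


def detect_spikes(trace, fs_hz, k=6.0, refractory_s=0.001):
    """Sample indices where the signal first drops below median - k * robust noise SD."""
    center = statistics.median(trace)
    mad = statistics.median(abs(v - center) for v in trace)
    sigma = mad / 0.6745                  # MAD-based estimate of the noise SD
    threshold = center - k * sigma        # negative-going detection threshold
    refractory = int(refractory_s * fs_hz)
    spikes, last = [], -refractory - 1
    for i in range(1, len(trace)):
        crossed = trace[i] < threshold <= trace[i - 1]
        if crossed and i - last > refractory:
            spikes.append(i)
            last = i
    return spikes


def well_metrics(spike_counts, duration_s, min_rate_per_min=5.0):
    """Return (number of active electrodes, mean spike count per active electrode)."""
    minutes = duration_s / 60.0
    active = [c for c in spike_counts if c / minutes >= min_rate_per_min]
    mean_spikes = sum(active) / len(active) if active else 0.0
    return len(active), mean_spikes
```

For example, a hypothetical 2-min recording whose four electrodes record 0, 3, 12, and 30 spikes gives `well_metrics([0, 3, 12, 30], 120) == (2, 21.0)`: only the electrodes firing at 6 and 15 spikes/min pass the assumed cutoff. A MAD-based threshold is used because, unlike the plain SD, it is barely inflated by the spikes themselves.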
Current pharmacological and physical rehabilitation-based therapies for chronic SCI are limited and primarily focus on managing symptoms such as pain and muscle stiffness. Moreover, treatments that can clinically improve motor or sensory function in patients with chronic traumatic SCI are lacking [31, 32, 33, 34, 35]. Therapeutic strategies for SCI include neuroprotective and neuroregenerative approaches. Neuroregenerative approaches in chronic SCI models have used exogenous supplementation with various cell types, including Schwann cells, OECs, MSCs, neural stem/progenitor cells, ESCs, and iPSC-derived cells [3, 8]. Effective stem cell transplantation for SCI requires studies that identify the optimal cell types and transplantation strategies for chronic SCI. In this study, we investigated combined transplantation of BDNF-eMSC and iMNP in a chronic SCI model.
MSC transplantation is a promising neuroregenerative strategy for chronic SCI. Previous studies have reported significant clinical improvement with MSC transplantation in a chronic SCI model [16]. However, the efficacy of cell engraftment is low owing to the distinct pathology of chronic SCI. Therefore, the microenvironment of chronic SCI must be modulated to enhance the efficacy of transplantation [30]. Based on these studies, we aimed to increase the efficacy of cell engraftment and differentiation at the lesion site using BDNF-eMSCs in a chronic SCI model. Previous studies have reported neuroprotective therapeutic effects of BDNF-eMSC in rat models of neonatal hypoxic-ischemic injury, traumatic brain injury, and neurogenic bladder [17, 18, 25, 30]. However, in addition to therapeutic efficacy, safety is a critical issue for successful clinical translation. Because BDNF-eMSCs were established using a lentiviral vector encoding c-Myc, a reprogramming factor, tumorigenicity is a major concern for their in vivo use. Previous findings suggest that BDNF-eMSCs are safe and ready to use, and their therapeutic efficacy has been confirmed in rat models of neonatal hypoxic-ischemic and traumatic brain injury and in cardiac repair [17, 18, 24]. We performed an in vitro assay to assess BDNF expression in BDNF-eMSC and naïve MSC before cell transplantation. Our data suggest that BDNF-eMSCs expressed more BDNF than naïve MSCs (Fig. 1c–f). After cell seeding, BDNF expression was lower on day 7 than on day 2, but BDNF expression persisted (Fig. 1d–g).
Recovery of motor function in chronic SCI models remains limited. Therefore, combination treatment strategies involving various approaches must be considered to improve motor function in chronic SCI models [36]. We attempted a combination cell transplantation strategy involving BDNF-eMSC and iMNP to increase the survival of engrafted cells at the lesion site and to enhance motor neuron differentiation in a chronic SCI model. We used iMNP cells to promote motor neuron differentiation at the lesion site and generated iPSC-derived motor neurons using a previously reported small-molecule approach [28, 29]. Our results suggest that iPSC-derived motor neurons were successfully differentiated in vitro (Fig. 2a–c), and the iMNP phenotype was confirmed before cell transplantation using the OLIG2 marker (Fig. 2d). We performed combined transplantation of irradiated BDNF-eMSC and iMNP in a contusive chronic SCI model. At 6 weeks post-injury, the BDNF-eMSC + iMNP group received both cell types (1:1) in 10 µL PBS, transplanted into the lesion site. Interestingly, combined transplantation of BDNF-eMSC and iMNP improved clinical recovery and the incidence rate compared with the PBS and single-cell transplantation groups in the chronic SCI model (Fig. 3a, b). At 12 weeks post-injury, transplanted iMNP cells remained at the lesion site in the iMNP and BDNF-eMSC + iMNP groups, but no BDNF-eMSCs were detected (Fig. 3c). Previous studies have reported that irradiated cultured hepatocyte growth factor (HGF)-over-expressing engineered mesenchymal stem cells (HGF-eMSC) exhibited decreased proliferation rates in vitro, and no tumors were detected in in vivo tumorigenicity testing in nude mice [24]. Our results showed that irradiated cultured BDNF-eMSCs exhibited decreased BDNF expression on day 7 (Fig. 1g), and microscopic observation showed that most BDNF-eMSCs did not remain at the lesion site 6 weeks after implantation (Fig. 3c). These results suggest that genetically engineered cells are suitable for a combination cell transplantation strategy in chronic SCI models.
Previous studies have reported that allogeneic bone marrow-derived MSC transplantation without cell manipulation in acute and chronic SCI mainly resulted in astrocytic differentiation at the lesion site [16, 37, 38]. In our study, BDNF-eMSC transplantation in chronic SCI increased oligodendrocytes and neurons compared with PBS. However, the BDNF-eMSC group showed less neuronal differentiation at the lesion site than the iMNP group (Fig. 5a–f). Another study reported that transplantation of human iMNP cells in an acute SCI model yielded transplanted cells of the motor neuron lineage with a mixed maturation state in the ventral horns [23]. We observed higher SMI-32 expression in transplanted iMNP cells in the iMNP and BDNF-eMSC + iMNP groups than in the PBS and BDNF-eMSC groups at 12 weeks post-injury (Fig. 4a, b). These results suggest that iMNP influences mature motor neuron differentiation at the lesion site more directly than BDNF-eMSC does. Transplanted hMNP cells have been reported to increase endogenous neuronal survival and promote neurite branching [23]. We confirmed that combined transplantation of BDNF-eMSC and iMNP increased axonal regeneration, as indicated by MAP-2 expression, compared with PBS. The BDNF-eMSC + iMNP group showed significantly higher MAP-2 expression at the lesion site than the iMNP group (Fig. 4c, d). Our results suggest that mature motor neuron differentiation and the growth density of neuronal processes at the lesion site are enhanced by the synergistic effects of combined BDNF-eMSC and iMNP transplantation in the chronic SCI model (Figs. 3b and 4b, d). In this study, we confirmed, using the MAP-2 and GAP-43 markers, that transplanted cells at the lesion site could promote the growth density of neuronal processes. However, a limitation of this study is that axonal regeneration was not traced using a retrovirus or the anterograde tracer biotinylated dextran amine (BDA).
Future studies should use the anterograde tracer BDA to detect axonal regeneration at the lesion site after cell transplantation.
Other studies have reported that motor neurons respond to neurotrophic cues and express and secrete growth factors. Moreover, hMNPs express and secrete neurotrophic factors that promote axonal growth and protect neurons from cell death [22, 23, 39]. We confirmed that combined transplantation of BDNF-eMSC and iMNP in chronic SCI increased mature motor neuron differentiation and BDNF expression at the lesion site (Figs. 3h and 4a, b). In addition, axonal regeneration was promoted at the lesion site (Fig. 4d). However, the underlying mechanism was difficult to confirm in vivo. We hypothesized that BDNF-eMSC and iMN might synergistically promote neurite outgrowth during motor neuron differentiation and maturation through BDNF expression. To test this hypothesis, we co-cultured BDNF-eMSC and iMN on 2D and 3D spheroid platforms during mature motor neuron differentiation and assessed neurite outgrowth in vitro. As in previous studies, the BDNF-eMSC + iMN group showed significantly higher mature motor neuron differentiation and BDNF expression than the iMN group. In addition, neurite outgrowth was significantly promoted in the BDNF-eMSC + iMN group (Fig. 6f, h). BDNF-eMSC also exerted a paracrine effect on motor neuron differentiation and the promotion of neurite outgrowth (Fig. 6f–h). Our results suggest that co-culture of BDNF-eMSC and iMN plays an essential role in promoting mature motor neuron differentiation and neurite outgrowth. Additionally, the in vitro assay confirmed that co-culture of BDNF-eMSC and iMN could promote functional synaptic connections and neural networks during the differentiation of mature motor neurons, indicating the successful formation of neural circuits and connections during the maturation of MNs generated on the 3D co-culture platform. These results indicate that BDNF-eMSC + iMN synergistically promoted the differentiation of mature motor neurons and neural networks in vitro (Fig. 7).
However, future studies will need to confirm that, after combined transplantation of BDNF-eMSC and iMNP cells, the transplanted cells at the lesion site promote functional synaptic connections and local neural networks.
In summary, this study shows that transplanting a combination of BDNF-eMSC and iMNP cells in a chronic SCI rat model restored behavioral abilities by inducing the differentiation of mature motor neurons at the lesion site, suggesting the therapeutic efficacy of a transplantation strategy that combines genetically engineered cells and iPSCs. However, a limitation of this study is that the mechanisms underlying the synergistic effect of combined transplantation of genetically engineered cells and iPSCs in the chronic SCI model were not elucidated. Future studies should investigate these mechanisms at the lesion site using RNA sequencing (RNA-seq) or single-cell analysis. In addition, the effects of cell ratio and number of transplanted cells on neural regeneration, motor neuron differentiation, and BDNF expression should be studied. To reduce variation in animal experiments, a sample size sufficient for statistical analysis should be calculated in advance with freely available software (e.g., G*Power, power sample).
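As a rough illustration of the kind of a priori calculation such software performs, the per-group sample size for comparing two group means can be approximated with the normal formula n = 2(z₁₋α/₂ + z_power)²/d², where d is the standardized effect size (Cohen's d). This is a minimal sketch under that approximation, not the authors' procedure. The function name is hypothetical, and exact t-test calculations (as in G*Power) give slightly larger n for small samples.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate animals per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2, rounded up,
    where d is the standardized effect size (Cohen's d). Exact t-test calculations
    give slightly larger values for small samples.
    """
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes chosen for illustration, not estimates from this study
    for d in (0.8, 1.0, 1.5):
        print(f"d = {d}: {n_per_group(d)} animals per group")
```

Because n scales with 1/d², halving the expected effect size roughly quadruples the number of animals required.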
To our knowledge, this study demonstrates that combined transplantation of BDNF-eMSC and iMNP improves behavioral recovery in the chronic SCI model. At 12 weeks post-injury, the transplanted iMNP had predominantly differentiated into mature motor neurons. The BDNF-eMSC exerted a paracrine effect on neuronal regeneration, as evidenced by BDNF expression at the lesion site. Both in vivo and in vitro, the combination of BDNF-eMSC and iMNP played a crucial role in motor neuron maturation and axonal regeneration through BDNF expression. Overall, our findings provide proof of concept that stem cell-based gene therapy and combination cell transplantation can enhance motor neuron maturation and BDNF expression in chronic SCI.
All datasets of this article are included within the article.
American Veterinary Medical Association
Animal Research: Reporting of In Vivo Experiments
Blood–brain barrier
Basso–Beattie–Bresnahan
Bovine serum albumin
Brain-derived neurotrophic factor
Cord blood mononuclear cells
Dulbecco’s modified Eagle’s medium
Dorsomorphin homologue 1
4′,6-diamidino-2-phenylindole
Engineered mesenchymal stem cells
Embryonic stem cell
Human mesenchymal stem cells
Hepatocyte growth factor
Induced pluripotent stem cell
Induced pluripotent stem cell-derived neuroepithelial progenitor
Induced pluripotent stem cell-derived motor neuron progenitor cells
Induced pluripotent stem cell-derived motor neuron
Induced pluripotent stem cell-derived mature motor neurons
Intralesional
Immunofluorescence
Motor neuron cells
Multicenter animal spinal cord injury study
Microtubule-associated protein 2
Neural stem and progenitor cells
Olfactory ensheathing cells
Optimal cutting temperature
Phosphate-buffered saline
Paraformaldehyde
Phenylmethylsulfonyl fluoride
Purmorphamine
Room temperature
Retinoic acid
Spinal cord injury
Spinal cord
Sprague–Dawley
Tetracycline transactivator
Tris-buffered saline
Tris-buffered saline with 0.05% Tween-20
Traumatic spinal cord injury
Western blot
Quadri SA, Farooqui M, Ikram A, Zafar A, Khan MA, Suriya SS, Claus CF, Fiani B, Rahman M, Ramachandran A, et al. Recent update on basic mechanisms of spinal cord injury. Neurosurg Rev. 2020;43:425–41.
Barbiellini Amidei C, Salmaso L, Bellio S, Saia M. Epidemiology of traumatic spinal cord injury: a large population-based study. Spinal Cord. 2022;60:812–9.
Kim YH, Ha KY, Kim SI. Spinal cord Injury and related clinical trials. Clin Orthop Surg. 2017;9:1–9.
Pang QM, Chen SY, Xu QJ, Fu SP, Yang YC, Zou WH, Zhang M, Liu J, Wan WH, Peng JC, Zhang T. Neuroinflammation and Scarring after spinal cord Injury: therapeutic roles of MSCs on inflammation and glial scar. Front Immunol. 2021;12:751021.
Al Mamun A, Monalisa I, Tul Kubra K, Akter A, Akter J, Sarker T, Munir F, Wu Y, Jia C, Afrin Taniya M, Xiao J. Advances in immunotherapy for the treatment of spinal cord injury. Immunobiology. 2021;226:152033.
Eli I, Lerner DP, Ghogawala Z. Acute traumatic spinal cord Injury. Neurol Clin. 2021;39:471–88.
Fischer I, Dulin JN, Lane MA. Transplanting neural progenitor cells to restore connectivity after spinal cord injury. Nat Rev Neurosci. 2020;21:366–83.
Assinck P, Duncan GJ, Hilton BJ, Plemel JR, Tetzlaff W. Cell transplantation therapy for spinal cord injury. Nat Neurosci. 2017;20:637–47.
Tetzlaff W, Okon EB, Karimi-Abdolrezaee S, Hill CE, Sparling JS, Plemel JR, Plunet WT, Tsai EC, Baptiste D, Smithson LJ, et al. A systematic review of cellular transplantation therapies for spinal cord injury. J Neurotrauma. 2011;28:1611–82.
Lu P, Woodruff G, Wang Y, Graham L, Hunt M, Wu D, Boehle E, Ahmad R, Poplawski G, Brock J, et al. Long-distance axonal growth from human induced pluripotent stem cells after spinal cord injury. Neuron. 2014;83:789–96.
Yang N, Zuchero JB, Ahlenius H, Marro S, Ng YH, Vierbuchen T, Hawkins JS, Geissler R, Barres BA, Wernig M. Generation of oligodendroglial cells by direct lineage conversion. Nat Biotechnol. 2013;31:434–9.
Takahashi K, Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006;126:663–76.
Kramer AS, Harvey AR, Plant GW, Hodgetts SI. Systematic review of induced pluripotent stem cell technology as a potential clinical therapy for spinal cord injury. Cell Transpl. 2013;22:571–617.
Sun L, Wang F, Chen H, Liu D, Qu T, Li X, Xu D, Liu F, Yin Z, Chen Y. Co-transplantation of Human umbilical cord mesenchymal stem cells and human neural stem cells improves the outcome in rats with spinal cord Injury. Cell Transpl. 2019;28:893–906.
Siddiqui AM, Khazaei M, Fehlings MG. Translating mechanisms of neuroprotection, regeneration, and repair to treatment of spinal cord injury. Prog Brain Res. 2015;218:15–54.
Kim JW, Ha KY, Molon JN, Kim YH. Bone marrow-derived mesenchymal stem cell transplantation for chronic spinal cord injury in rats: comparative study between intralesional and intravenous transplantation. Spine (Phila Pa 1976). 2013;38:E1065–1074.
Ahn SY, Sung DK, Chang YS, Sung SI, Kim YE, Kim HJ, Lee SM, Park WS. BDNF-Overexpressing Engineered mesenchymal stem cells enhances their therapeutic efficacy against severe neonatal hypoxic ischemic brain Injury. Int J Mol Sci. 2021;22.
Choi BY, Hong DK, Kang BS, Lee SH, Choi S, Kim HJ, Lee SM, Suh SW. Engineered Mesenchymal stem cells over-expressing BDNF protect the Brain from Traumatic Brain Injury-Induced neuronal death, neurological deficits, and cognitive impairments. Pharmaceuticals (Basel). 2023;16.
Ahn SY, Sung DK, Kim YE, Sung S, Chang YS, Park WS. Brain-derived neurotropic factor mediates neuroprotection of mesenchymal stem cell-derived extracellular vesicles against severe intraventricular hemorrhage in newborn rats. Stem Cells Transl Med. 2021;10:374–84.
Khazaei M, Siddiqui AM, Fehlings MG. The potential for iPS-Derived stem cells as a therapeutic strategy for spinal cord Injury: opportunities and challenges. J Clin Med. 2014;4:37–65.
Nogradi A, Pajer K, Marton G. The role of embryonic motoneuron transplants to restore the lost motor function of the injured spinal cord. Ann Anat. 2011;193:362–70.
Lukovic D, Valdes-Sanchez L, Sanchez-Vera I, Moreno-Manzano V, Stojkovic M, Bhattacharya SS, Erceg S. Brief report: astrogliosis promotes functional recovery of completely transected spinal cord following transplantation of hESC-derived oligodendrocyte and motoneuron progenitors. Stem Cells. 2014;32:594–9.
Rossi SL, Nistor G, Wyatt T, Yin HZ, Poole AJ, Weiss JH, Gardener MJ, Dijkstra S, Fischer DF, Keirstead HS. Histological and functional benefit following transplantation of motor neuron progenitors to the injured rat spinal cord. PLoS ONE. 2010;5:e11852.
Park BW, Jung SH, Das S, Lee SM, Park JH, Kim H, Hwang JW, Lee S, Kim HJ, Kim HY, et al. In vivo priming of human mesenchymal stem cells with hepatocyte growth factor-engineered mesenchymal stem cells promotes therapeutic potential for cardiac repair. Sci Adv. 2020;6:eaay6994.
Tian WJ, Jeon SH, Zhu GQ, Kwon EB, Kim GE, Bae WJ, Cho HJ, Ha US, Hong SH, Lee JY, et al. Effect of high-BDNF microenvironment stem cells therapy on neurogenic bladder model in rats. Transl Androl Urol. 2021;10:345–55.
Rim YA, Nam Y, Ju JH. Application of Cord Blood and Cord Blood-Derived Induced Pluripotent Stem cells for cartilage regeneration. Cell Transpl. 2019;28:529–37.
Nam Y, Rim YA, Jung SM, Ju JH. Cord blood cell-derived iPSCs as a new candidate for chondrogenic differentiation and cartilage regeneration. Stem Cell Res Ther. 2017;8:16.
Du ZW, Chen H, Liu H, Lu J, Qian K, Huang CL, Zhong X, Fan F, Zhang SC. Generation and expansion of highly pure motor neuron progenitors from human pluripotent stem cells. Nat Commun. 2015;6:6626.
Li XJ, Du ZW, Zarnowska ED, Pankratz M, Hansen LO, Pearce RA, Zhang SC. Specification of motoneurons from human embryonic stem cells. Nat Biotechnol. 2005;23:215–21.
Lee JY, Ha KY, Kim JW, Seo JY, Kim YH. Does extracorporeal shock wave introduce alteration of microenvironment in cell therapy for chronic spinal cord injury? Spine (Phila Pa 1976). 2014;39:E1553–1559.
Curtis E, Martin JR, Gabel B, Sidhu N, Rzesiewicz TK, Mandeville R, Van Gorp S, Leerink M, Tadokoro T, Marsala S, et al. A first-in-Human, phase I study of neural stem cell transplantation for chronic spinal cord Injury. Cell Stem Cell. 2018;22:941–950.e6.
Saulino M, Averna JF. Evaluation and management of SCI-Associated Pain. Curr Pain Headache Rep. 2016;20:53.
Gwak YS, Kim HY, Lee BH, Yang CH. Combined approaches for the relief of spinal cord injury-induced neuropathic pain. Complement Ther Med. 2016;25:27–33.
McIntyre A, Mays R, Mehta S, Janzen S, Townson A, Hsieh J, Wolfe D, Teasell R. Examining the effectiveness of intrathecal baclofen on spasticity in individuals with chronic spinal cord injury: a systematic review. J Spinal Cord Med. 2014;37:11–8.
Emamhadi M, Alijani B, Andalib S. Long-term clinical outcomes of spinal accessory nerve transfer to the suprascapular nerve in patients with brachial plexus palsy. Acta Neurochir (Wien). 2016;158:1801–6.
Gomes-Osman J, Cortes M, Guest J, Pascual-Leone A. A systematic review of experimental strategies aimed at improving motor function after Acute and chronic spinal cord Injury. J Neurotrauma. 2016;33:425–38.
Kim YC, Kim YH, Kim JW, Ha KY. Transplantation of mesenchymal stem cells for Acute spinal cord Injury in rats: comparative study between Intralesional Injection and Scaffold based transplantation. J Korean Med Sci. 2016;31:1373–82.
Kang ES, Ha KY, Kim YH. Fate of transplanted bone marrow derived mesenchymal stem cells following spinal cord injury in rats by transplantation routes. J Korean Med Sci. 2012;27:586–93.
Erceg S, Ronaghi M, Oria M, Rosello MG, Arago MA, Lopez MG, Radojevic I, Moreno-Manzano V, Rodriguez-Jimenez FJ, Bhattacharya SS, et al. Transplanted oligodendrocytes and motoneuron progenitors generated from human embryonic stem cells promote locomotor recovery after spinal cord transection. Stem Cells. 2010;28:1541–9.
Not applicable.
This work was supported by a grant from the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT, & Future Planning (grant numbers: NRF-2019R1A5A2027588 and NRF-2021R1C1C2004688). This research was also supported by a grant from the Catholic Institute of Cell Therapy (CRC) in 2024 and by the Basic Medical Science Facilitation Program through the Catholic Medical Center of the Catholic University of Korea, funded by the Catholic Education Foundation. The funding bodies played no role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript.
Authors and affiliations.
CiSTEM laboratory, Catholic iPSC Research Center (CiRC), College of Medicine, The Catholic University of Korea, Seoul, 137-701, Republic of Korea
Jang-Woon Kim, Yeri Alice Rim, Se In Jung, Jooyoung Lim & Ji Hyeon Ju
Department of Biomedicine & Health Science, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
Division of Rheumatology, Department of Internal Medicine, Seoul St. Mary’s Hospital, Institute of Medical Science, College of Medicine, The Catholic University of Korea, Seoul, 137-701, Republic of Korea
Ji Hyeon Ju
YiPSCELL, Inc., Seoul, Republic of Korea
Juryun Kim, Yoojun Nam, Hyewon Kim & Ji Hyeon Ju
SL BiGen, Inc., Incheon, Republic of Korea
Soon Min Lee, Young Chul Sung & Hyo-Jin Kim
Study design: J.-W.K., J.R.K., S.M.L., Y.C.S., H.-J.K., and J.H.J.; data collection: J.-W.K., J.R.K., Y.A.R., Y.N., H.K., S.I.J., J.L., and J.H.J.; data analysis: J.-W.K., J.R.K., S.M.L., Y.C.S., Y.A.R., and J.H.J.; drafting manuscript: J.-W.K. and J.H.J.
Correspondence to Ji Hyeon Ju .
Ethics approval and consent to participate.
Title of the approved project: Efficacy evaluation on neuronal protection and regeneration of combined treatment of MNP and BM-102 in contusive spinal cord injury model.
Name of the institutional approval committee: The Animal Studies Committee of the School of Medicine, the Catholic University of Korea.
Approval number: IACUC approval number CUMC-2020-0364-04.
Date of approval: 29 December 2021.
Not applicable.
The authors declare that they have no competing interests. Affiliation 4 declares no competing interests: J.K., Y.N., and H.K. are employees of YiPSCELL, Inc., and J.H.J. is the employer. J.H.J. is the founder of YiPSCELL, Inc., and also works at Seoul St. Mary’s Hospital, The Catholic University of Korea. The two groups have no competing interests. Affiliation 5 declares no competing interests: S.M.L., Y.C.S., and H.-J.K. are employees of SL BiGen, Inc. The two groups have no competing interests.
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Below is the link to the electronic supplementary material.
Supplementary material 2, 13287_2024_3770_moesm3_esm.mp4.
Supplementary Movie 1: BBB locomotor scale evaluation of behavioral recovery over 12 weeks post-injury in the PBS group (separate movie file)
Supplementary Movie 2: BBB locomotor scale evaluation of behavioral recovery over 12 weeks post-injury in the BDNF-eMSC group (separate movie file)
Supplementary Movie 3: BBB locomotor scale evaluation of behavioral recovery over 12 weeks post-injury in the iMNP group (separate movie file)
Supplementary Movie 4: BBB locomotor scale evaluation of behavioral recovery over 12 weeks post-injury in the BDNF-eMSC + iMNP group (separate movie file)
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Cite this article.
Kim, JW., Kim, J., Lee, S.M. et al. Combination of induced pluripotent stem cell-derived motor neuron progenitor cells with irradiated brain-derived neurotrophic factor over-expressing engineered mesenchymal stem cells enhanced restoration of axonal regeneration in a chronic spinal cord injury rat model. Stem Cell Res Ther 15 , 173 (2024). https://doi.org/10.1186/s13287-024-03770-9
Received : 28 November 2023
Accepted : 26 May 2024
Published : 18 June 2024
DOI : https://doi.org/10.1186/s13287-024-03770-9
ISSN: 1757-6512
Original research article: Experimental investigation on pedestrian walking load in steel footbridges
Accurate simulation of walking load is of great significance in human-induced vibration analyses. However, accurate pedestrian walking load data from long-span footbridges are scarce, and data reliability depends on the sensor used for measurement. In the current work, the 102 m-span Yanluo Footbridge was adopted as the test site, and the high-precision Xnode wireless acceleration sensor was used for measurements. An experimental investigation of walking loads was performed based on the bipedal walking force model. Single-person and multi-person walking tests were performed on the Yanluo Footbridge to measure the corresponding stride frequency and dynamic load factor. The acceleration time-histories of walking pedestrians were accurately recorded using the three-axis wireless acceleration sensor Xnode. Furthermore, the equation of the dynamic load factor was derived by analyzing time-histories and power spectra, and design models of pedestrian walking load and crowd load were developed based on a large body of experimental data. The time-histories of pedestrian walking loads showed regular periodic changes, and the dynamic load factor increased with increasing stride frequency. With the walking load model developed in this work, reliable structural responses can be obtained in human-induced vibration analyses.
To measure pedestrian force, the time-history curve of footstep load has been observed directly with force plates, treadmills, and other instruments, in order to analyze its characteristics and develop a mathematical model. Harper et al. (1961) developed a force plate to measure the time-history curves of the vertical, lateral, and longitudinal components of an individual pedestrian's footstep load, which was the first experimental measurement of pedestrian load. Owing to their early development and the high accuracy of their measurements, force plates have been applied and improved by many researchers, and the technology is now mature. However, the large size and fixed position of the instrument restrict the tester's free walking. By contrast, a wireless acceleration sensor fixed to the body, waist, or leg is more convenient for collecting data on human movement characteristics, and pedestrian movement is not restricted by bulky equipment. With the rapid development of electronic technology, the production cost of wireless sensors is decreasing, attracting the attention of researchers around the world. For example, Song et al. used acceleration sensors to develop an activity recognition system capable of identifying daily activities such as running, walking, sitting, standing, lying down, and falling by fixing a sensor module connected to a mobile phone on a belt; its recognition accuracy reached 95.5%. Khan et al. (2010) designed a daily activity recognition system using three-axis acceleration sensors. With the recognition module placed at the waist, in a coat pocket, etc., daily physical activities such as going up and down stairs, resting, and riding could be identified with an accuracy of up to 95.96%. Ailisto et al. (2005) used a triaxial acceleration sensor fixed at the waist to collect walking acceleration signals and compared the collected signals with stored template signals.
They found that the average error rate was only 6.4%. In addition, accurate pedestrian walking load data obtained from long-span footbridges, such as arch bridges ( Lu et al., 2021a ; Lu et al., 2021b ; Yang et al., 2022a ), are scarce, and data reliability depends on the sensor used for measurement.
Acceleration sensors can effectively record responses containing human motion information, and many other researchers have conducted similar studies ( Zhu et al., 2018 ; Chen et al., 2019 ; Wang et al., 2019 ; Mohammed and Pavic, 2021 ; Paweł et al., 2021 ; Xiong et al., 2021 ; Chen et al., 2022 ). However, few studies are available on the indirect measurement of walking force using acceleration sensors. Several approaches have been designed to measure walking force with acceleration detection systems based on the 2DOF bipedal walking force model ( Ebrahimpour et al., 1996 ; Živanović et al., 2005 ; Bachus et al., 2006 ; Geyer et al., 2006 ; Gurney et al., 2008 ; Jones et al., 2011 ). These studies confirmed that the developed systems could effectively measure walking force and walking characteristics. However, they simplified the human body as two particles and measured body acceleration at the two centers of mass with acceleration sensors. Moreover, the test procedure was relatively complex: the accelerations of the mass centers were obtained for only three test pedestrians walking freely, and the numbers of test pedestrians and test conditions were insufficient to demonstrate differences in walking force among testers with different physical characteristics. Building on previous work based on the bipedal walking force model, the present work treats the human body as a single particle. Three-axis wireless acceleration sensors measured body motion accelerations at the center of mass and on the two legs. Furthermore, the dynamic load factor of the pedestrian walking load, a fundamental parameter describing harmonic amplitude, was obtained from a large body of experimental data. With the walking load model developed in this work, reliable structural responses can be obtained in human-induced vibration analyses.
Since pedestrian load can be regarded as an impact load or a periodic load, the results of this experiment can provide a standardized load model for the response of bridge structures under impact load ( Yang et al., 2021a ) or periodic load ( Yang et al., 2022b ). In addition, the experiments described in this paper provide test results for the formulation of regulations on structural vibration in China and technical support for vibration displacement monitoring of long-span structures ( Lu et al., 2020 ) and composite structures ( Yang et al., 2021b ).
In this paper, an experimental investigation was performed on pedestrian walking and crowd loads, and a corresponding mechanical model was developed. Theoretical studies were performed using the bipedal walking force model. In the experimental investigation, single-person and multi-person walking tests were performed on the Yanluo Footbridge, and the corresponding stride frequencies and dynamic load factors were evaluated. The acceleration time-histories of pedestrian walking were accurately recorded with the three-axis wireless acceleration sensor Xnode. The dynamic load factor equation was obtained by analyzing time-histories and power spectra. Finally, design models for pedestrian walking and crowd loads were established based on a large body of experimental data.
Several pedestrian models have been developed to study pedestrian load ( Wei and Griffin, 1998 ; Živanović et al., 2005 ; Racic et al., 2009 ; Venuti and Bruno, 2009 ; Bocian et al., 2013 ; Han et al., 2017 ; Gao et al., 2018 ; Han et al., 2021 ). Geyer et al. (2006) established a bipedal force model based on the spring-mass model, taking into account the bipedal support characteristics of walking. As shown in Figure 1 , this model ignores foot rotation and simplifies the human body into a single-particle system. The legs are simplified as two independent massless springs supporting the point mass. Both springs have a stiffness and a rest length, and each is held at a constant angle relative to gravity (g is the gravitational acceleration) during the swing phase. The figure shows a complete gait cycle, including single-support and double-support phases. The double-support phase lasts from touchdown (TD) of the right heel to take-off (TO) of the left toe. The leg that leaves the ground from the toe until its heel lands again is called the trailing leg, and the other leg is called the leading leg. Geyer et al. showed that the model could reproduce the bimodal vertical walking force and the vertical displacement of the center of mass.
FIGURE 1 . Biped walking force model (Gurney et al.).
Qin et al. (2013) added damping parameters to Geyer’s bipedal model to account for human-structure interaction. As shown in Figure 2 , the two legs were simplified as massless springs and dampers that move independently. During motion, the spring legs absorb the impact energy generated when they touch the ground (at the impact angle) while also providing propulsion. In single support, the trailing leg is not in contact with the ground; its spring is at rest length, so its elastic and damping forces are zero. Meanwhile, the leading leg is in contact with the ground and its spring is compressed, so its elastic and damping forces balance the gravity and inertia forces of the body.
FIGURE 2 . Damped biped walking force model (Chen et al.).
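The leg force in these spring-mass models can be sketched in a few lines. This is an illustrative implementation, not the authors' code. It assumes a 2D sagittal plane (y forward, z up), a foot contact point that stays fixed during stance, and a linear spring-damper leg. The function name and parameter values are hypothetical. Setting `damping=0` gives the undamped Geyer-type leg, and a leg at or beyond its rest length carries no force, matching the description of the unloaded trailing leg in single support.

```python
import math


def leg_force(com, foot, rest_length, stiffness, damping=0.0, com_velocity=(0.0, 0.0)):
    """Force (F_y, F_z) in N that one massless spring(-damper) leg exerts on the point mass.

    com, foot:     (y, z) positions in m of the centre of mass and the foot contact point
    rest_length:   leg rest length in m; stiffness in N/m; damping in N*s/m
    com_velocity:  (y_dot, z_dot) of the centre of mass in m/s (foot assumed fixed)
    """
    dy, dz = com[0] - foot[0], com[1] - foot[1]
    length = math.hypot(dy, dz)
    if length >= rest_length:
        return 0.0, 0.0  # spring at or beyond rest length: leg unloaded (swing phase)
    ey, ez = dy / length, dz / length  # unit vector from foot to centre of mass
    length_rate = com_velocity[0] * ey + com_velocity[1] * ez  # d(length)/dt
    magnitude = stiffness * (rest_length - length) - damping * length_rate
    magnitude = max(magnitude, 0.0)  # a leg can push on the ground but not pull
    return magnitude * ey, magnitude * ez


if __name__ == "__main__":
    # Hypothetical double-support snapshot: feet 0.6 m apart, COM midway at 0.95 m
    params = dict(rest_length=1.0, stiffness=20_000.0, damping=300.0)
    com, vel = (0.0, 0.95), (1.2, 0.0)
    trailing = leg_force(com, (-0.3, 0.0), com_velocity=vel, **params)
    leading = leg_force(com, (0.3, 0.0), com_velocity=vel, **params)
    print("total vertical force [N]:", trailing[1] + leading[1])
```

During double support the total ground reaction is the sum of the two legs' contributions, which is how such models produce the bimodal vertical force curve.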
The biomechanical bipedal model simplifies the human body as a point mass and the two legs as a massless system of two springs and dampers. The elastic and damping forces generated when the pedestrian's feet touch the ground during motion are in balance with the gravity and inertia forces of the body. As presented in Figure 3 , the body mass is represented by the mass $m_h$ at the center of mass. In the bipedal model, the human body and the ground were regarded as a whole for force analysis. Ignoring air resistance, the balance equations of gravity, inertia force, and ground reaction force of the human body were derived as
where $\ddot{z}$ and $\ddot{y}$ are the vertical and longitudinal accelerations of M, respectively, $F'_z$ and $F'_y$ denote the vertical and longitudinal ground reactions, respectively, and $g$ is the acceleration due to gravity.
FIGURE 3 . Mechanical analysis.
Based on the derived balance equation, a three-axis acceleration sensor was fixed at the pedestrian's waist to measure the vertical and longitudinal accelerations during walking, from which the ground reaction force could be calculated. The walking force of the pedestrian was then obtained from the action-reaction relationship as
Furthermore, by fixing three-axis acceleration sensors on the pedestrian's left and right legs, the vertical acceleration at the moment each leg touched the ground could be measured effectively.
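The balance equations are not reproduced in this text. A point-mass form consistent with the definitions given is $F'_z = m_h(g + \ddot{z})$ and $F'_y = m_h\ddot{y}$, taking z positive upward and ignoring air resistance. The sketch below applies this assumed form to sampled accelerations; it is not the authors' exact equations. It also assumes the readings have already been rotated into the vertical and walking directions and had gravity removed (a raw accelerometer at rest reads about +1 g vertically). The function names are hypothetical.

```python
G_ACCEL = 9.81  # gravitational acceleration, m/s^2


def ground_reaction(mass_kg, z_acc, y_acc):
    """Vertical and longitudinal ground reactions (F'_z, F'_y) in N for a point mass.

    z_acc: vertical centre-of-mass accelerations, m/s^2 (positive upward, gravity removed)
    y_acc: longitudinal (walking-direction) centre-of-mass accelerations, m/s^2
    """
    if len(z_acc) != len(y_acc):
        raise ValueError("acceleration series must have the same length")
    f_z = [mass_kg * (G_ACCEL + a) for a in z_acc]  # F'_z - m*g = m*z_ddot
    f_y = [mass_kg * a for a in y_acc]              # F'_y = m*y_ddot
    return f_z, f_y


def walking_force(mass_kg, z_acc, y_acc):
    """Force the pedestrian applies to the deck: equal and opposite to the reaction."""
    f_z, f_y = ground_reaction(mass_kg, z_acc, y_acc)
    return [-f for f in f_z], [-f for f in f_y]


if __name__ == "__main__":
    # Hypothetical 70 kg tester and three acceleration samples
    fz, fy = ground_reaction(70.0, [0.0, 1.5, -1.0], [0.2, -0.3, 0.1])
    print([round(f, 1) for f in fz], [round(f, 1) for f in fy])
```

With zero vertical acceleration the vertical reaction reduces to the static body weight $m_h g$, which is a convenient sanity check on calibrated sensor data.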
Pedestrian walking load is difficult to predict, and its frequency and magnitude can vary significantly. Walking is generally characterized as regular, predominantly horizontal body motion in which at least one foot is always in contact with the ground, at frequencies ranging from 1.4 to 2.5 Hz ( Jones et al., 2011 ). A typical Fourier series was used to represent the periodic pedestrian walking load, expressed as
where $G$ is the human body weight and $F_w(t)$ is the dynamic component of the pedestrian walking load, given by
where $f_w$ is the stride frequency of the walking pedestrian, $A_1$, $A_2$ and $A_3$ are the first, second and third dynamic load factors, and $\phi_1$, $\phi_2$ and $\phi_3$ are the associated phase lags in radians. According to reference [30], the values of these factors were
Therefore, to develop the pedestrian walking load model, the first dynamic load factor had to be evaluated. Extending walking load models calibrated for individuals to models of crowd loads is also an important subject. Based on Eq. 5 , the crowd load model can be expressed as
where $Q$ is the weight of the crowd. In the experiments, both the first dynamic load factor $A_1$ and the stride frequency $f_w$ of the crowd load had to be measured.
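The Fourier-series model and the identification of $f_w$ and $A_1$ from a measured record can be sketched together. The sketch assumes the common form $F(t) = G[1 + \sum_i A_i \sin(2\pi i f_w t - \phi_i)]$. This form uses the symbols defined above but may differ in detail from the authors' equations, and the load factors and phases in the example are illustrative placeholders, not the values from reference [30]. The estimator (a hypothetical helper) picks the largest DFT amplitude within the 1.4–2.5 Hz walking band and divides it by the static weight. Passing the crowd weight $Q$ instead of $G$ gives the crowd's $A_1$.

```python
import cmath
import math


def fourier_walking_load(weight, f_w, dlfs, phases, t):
    """Vertical walking load G * [1 + sum_i A_i * sin(2*pi*i*f_w*t - phi_i)] in N."""
    dynamic = sum(
        a * math.sin(2 * math.pi * i * f_w * t - phi)
        for i, (a, phi) in enumerate(zip(dlfs, phases), start=1)
    )
    return weight * (1.0 + dynamic)


def estimate_first_harmonic(force, fs, weight, band=(1.4, 2.5)):
    """Estimate (f_w, A_1) from a uniformly sampled force record.

    f_w: frequency of the largest DFT amplitude inside `band` (Hz).
    A_1: single-sided amplitude at that bin divided by `weight` (G, or Q for a crowd).
    """
    n = len(force)
    mean = sum(force) / n
    dyn = [f - mean for f in force]  # remove the static weight component
    k_lo = math.ceil(band[0] * n / fs - 1e-9)
    k_hi = math.floor(band[1] * n / fs + 1e-9)
    best_k, best_amp = k_lo, -1.0
    for k in range(k_lo, k_hi + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(dyn))
        amp = 2.0 * abs(coeff) / n  # amplitude of a sinusoid centred on bin k
        if amp > best_amp:
            best_k, best_amp = k, amp
    return best_k * fs / n, best_amp / weight


if __name__ == "__main__":
    fs, duration = 50.0, 20.0  # hypothetical sampling rate (Hz) and record length (s)
    dlfs, phases = (0.4, 0.1, 0.1), (0.0, math.pi / 2, math.pi / 2)  # illustrative only
    samples = [fourier_walking_load(700.0, 1.8, dlfs, phases, i / fs)
               for i in range(int(fs * duration))]
    print(estimate_first_harmonic(samples, fs, weight=700.0))
```

The synthetic record contains a whole number of load cycles, so the first harmonic falls exactly on a DFT bin. With real measurements, spectral leakage lowers and spreads the peak, so longer records or windowing would be needed before reading off $A_1$.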
The main instruments used in this work were three-axis wireless acceleration sensors, a gateway node, and a computer with the corresponding terminal program installed; the terminal program enabled the computer to communicate with the Xnode gateway node, as shown in Figure 4 . Embedor’s proprietary synchronized distributed sensing framework can deliver precisely synchronized data from thousands of distributed sensor channels. The wireless communication protocol between Xnodes and gateways provides time synchronization with a precision of 50 µs and ensures reliable, lossless data transfer under any operating conditions. Each Xnode can be configured either as a sensor node or as a gateway that coordinates and maintains wireless transmissions across a network of distributed wireless sensor nodes. This modular, versatile sensor platform supports wireless data acquisition and processing for data-intensive applications (high resolution and high sampling rate) such as structural health monitoring, manufacturing and industrial equipment monitoring, and seismic sensing. The sensor board employs an eight-channel 24-bit ADC (Texas Instruments ADS131E8) with maximum sampling rates of up to 16 kHz. The device is equipped with an ultra-compact, low-power triaxial accelerometer, whose technical parameters are summarized in Table 1 . The Xnode wireless triaxial acceleration sensor can measure acceleration along three directions at any position on the human body. Studies have shown that the body's center of mass lies closest to the waist and abdomen. Therefore, in the experiments, the three sensors were placed on the tester's back at the waist and on the two legs. The X -axis of the sensor was aligned with the vertical direction of the body to obtain the vertical acceleration of the center of mass, the Y -axis with the horizontal direction, and the Z -axis with the forward direction.
FIGURE 4 . Xnode wireless sensor.
TABLE 1 . Xnode performance parameters.
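The 24-bit resolution and the axis convention described above can be illustrated with a short conversion sketch. It assumes raw samples arrive as 24-bit two's-complement integers scaled linearly to a symmetric full-scale range. The ±2 g range and all function names are hypothetical, not Xnode or ADS131E8 specifications, so the real output format and range should be taken from the device documentation and Table 1.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2
FULL_SCALE_G = 2.0  # hypothetical +/- full-scale range in g; not taken from Table 1


def to_signed_24bit(raw: int) -> int:
    """Interpret the low 24 bits of `raw` as a two's-complement integer."""
    raw &= 0xFFFFFF
    return raw - (1 << 24) if raw & 0x800000 else raw


def counts_to_ms2(raw: int, full_scale_g: float = FULL_SCALE_G) -> float:
    """Convert one raw 24-bit sample to m/s^2, assuming linear scaling to +/- full scale."""
    return to_signed_24bit(raw) / float(1 << 23) * full_scale_g * STANDARD_GRAVITY


def to_body_axes(x: float, y: float, z: float) -> dict:
    """Relabel sensor axes using the mounting described in the text
    (X vertical, Y horizontal, Z forward)."""
    return {"vertical": x, "horizontal": y, "forward": z}


if __name__ == "__main__":
    lsb_g = 2 * FULL_SCALE_G / (1 << 24)  # resolution of one count, in g
    print(f"1 LSB = {lsb_g:.3e} g")
    print(to_body_axes(*(counts_to_ms2(r) for r in (0x400000, 0x000100, 0xFFFF00))))
```

Under this assumption one count corresponds to 4 g / 2²⁴, about 2.4 × 10⁻⁷ g, far finer than the accelerations produced by walking.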
The Yanluo Footbridge in Shenzhen, 102 m long with a level deck, was selected as the test site, as shown in Figure 5 . It is a steel footbridge designed for Foxconn workers to connect the dormitory and work areas; therefore, pedestrian flow was relatively large and people walked in a hurry. An eigenvalue analysis in MIDAS Civil was used to calculate the natural vibration characteristics of the first 3 orders. The natural frequencies and periods are summarized in Table 2 , and the first 3 mode shapes are shown in Figure 6 .
FIGURE 5 . Yanluo footbridge.
TABLE 2 . The first 10 natural vibration frequencies and periods.
FIGURE 6 . Diagram of the first 3 modes: (A) first-order vibration mode; (B) second-order vibration mode; (C) third-order vibration mode.
When the frequency of the pedestrian load is close to a natural frequency of the footbridge, resonance may occur. As seen in Figure 6 , the third-order natural frequency of the modified bridge was relatively close to the pedestrian stride frequency, but further analysis was required.
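A first screening for possible resonance compares each natural frequency with the ranges covered by the first few harmonics of the walking load, using the 1.4–2.5 Hz stride-frequency range quoted for walking. The following is a screening sketch only. It assumes three harmonics and no tolerance margin, the function name is hypothetical, and the frequencies in the example are invented, not the values of Table 2. It does not replace a response analysis.

```python
def resonance_candidates(natural_freqs_hz, stride_band=(1.4, 2.5), harmonics=3):
    """List (mode, frequency, harmonic) for modes inside any walking-harmonic range.

    Mode j is flagged when i * stride_band[0] <= f_j <= i * stride_band[1]
    for some harmonic i = 1..harmonics; only the lowest matching i is reported.
    """
    hits = []
    for mode, f in enumerate(natural_freqs_hz, start=1):
        for i in range(1, harmonics + 1):
            if i * stride_band[0] <= f <= i * stride_band[1]:
                hits.append((mode, f, i))
                break
    return hits


if __name__ == "__main__":
    # Hypothetical natural frequencies in Hz; substitute the values from Table 2
    print(resonance_candidates([1.2, 2.0, 3.3, 9.0]))
```

A flagged mode only indicates that some harmonic of walking can coincide with it; whether the response is significant still depends on the mode shape, damping, and load amplitude.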
To avoid interference from other pedestrians, the tests were performed at night. Two men and two women with no walking impairments took part in the test. Their physical characteristics were measured and recorded before the test and are listed in Table 3. To observe leg acceleration during walking at the same time, three wireless acceleration sensors were attached with bandages to the waist and to the front of the left and right thighs of each tester.
TABLE 3 . Physical characteristic values of testers.
Eight working conditions were tested: six fixed walking frequencies of 1.4, 1.6, 1.8, 2.0, 2.2, and 2.5 Hz, controlled by a metronome, and two free walks at random (self-selected) frequencies. After the sensors were fixed, the testers practiced walking until they could walk normally with the instruments attached. After the sensor (child) node was switched on and the gateway node issued the test command, the tester walked uniformly along the test route in time with the metronome. After the fixed-cadence tests, the testers walked uniformly along a straight line at a cadence set by their own walking habits. The number of steps and the walking time of each tester were recorded and processed during the tests; the results are summarized in Tables 4, 5, 6, and 7.
TABLE 4 . Experimental data of tester 1 walking at different step frequencies.
TABLE 5 . Experimental data of tester 2 walking at different step frequencies.
TABLE 6 . Experimental data of tester 3 walking at different step frequencies.
TABLE 7 . Experimental data of tester 4 walking at different step frequencies.
Tables 4, 5, 6, and 7 show that the walking speed of all four testers increased gradually with stride frequency, which is consistent with theoretical expectations. The average stride frequency of the four testers over the eight free walks was 1.8224 Hz, which is consistent with the average stride frequency of 1.82 Hz measured for more than 2,000 students in reference [31]. The average free walking speed of the four testers was 1.288 m/s, which is also consistent with previous studies. Comparing the four testers, the two men walked faster on average than the two women during free walking. Because walking speed equals step length multiplied by stride frequency, and the men's longer legs gave them longer steps, the men walked faster than the women at the same walking frequency.
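The gait quantities discussed above follow directly from the recorded step count and walking time. A minimal Python sketch, assuming the walked distance is known (the route length is not stated in this section, so `distance_m` and the trial values are hypothetical inputs): stride frequency is steps divided by time, speed is distance divided by time, and step length is distance divided by steps, so speed equals step length times stride frequency.

```python
def gait_parameters(steps, time_s, distance_m):
    """Derive gait quantities from one walking trial.

    steps: number of steps counted; time_s: walking time in seconds;
    distance_m: walked distance in metres (hypothetical input).
    Returns (stride_frequency_hz, speed_m_per_s, step_length_m).
    """
    if steps <= 0 or time_s <= 0 or distance_m <= 0:
        raise ValueError("steps, time_s and distance_m must be positive")
    stride_frequency = steps / time_s   # steps per second (Hz)
    speed = distance_m / time_s         # m/s
    step_length = distance_m / steps    # m per step
    return stride_frequency, speed, step_length


# Hypothetical trial: 80 steps over 60 m in 40 s.
f, v, l = gait_parameters(80, 40.0, 60.0)
print(f, v, l)       # 2.0 1.5 0.75
print(v == l * f)    # True
```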
The multi-person tests used the same site and test time as the single-person walking load tests. In addition to the four testers from the single-person tests, two additional testers with no movement disorders and with different heights and weights took part in the multi-person tests. The physical characteristics of the testers are summarized in Table 8.
TABLE 8 . Physical characteristics of testers.
Unlike the single-person tests, the multi-person tests required synchronization training beforehand, and an acceleration sensor was attached to the back of each tester's waist during walking. First, two testers walked one behind the other (in a single line) less than 60 cm apart, because a larger spacing would effectively reduce the test to a single-pedestrian test. After several single-line trials, the two testers walked side by side in two lines. Finally, tests with four and six people were performed in the same manner. The walking time and number of steps of the testers under all conditions were recorded throughout the multi-person tests and are presented in Table 9.
TABLE 9 . Experimental data of synchronized multi-person walking.
As summarized in Tables 5, 6, 7, 8, and 9, both the walking time and the number of steps increased with the number of testers, whether they walked in one column or two. This indicates that, under normal circumstances, the walking speed and step length of pedestrians crossing the bridge decrease as crowd density increases. When testers walked in opposite directions, the total number of steps and the walking time were smaller than in the one- and two-column cases, possibly because the testers were far apart before passing each other and therefore did not interfere with each other.
Time history analysis.
The dynamic load factor (DLF) of the pedestrian walking load, also known as the first-harmonic dynamic load factor, is the basic parameter describing the harmonic amplitude of the dynamic load; the dynamic load amplitude is the product of the pedestrian's weight and the DLF. To illustrate how stride frequency and measuring position affect the time history and DLF of the walking load, time histories of the non-dimensional walking load $F_w(t)/G$ of tester 1 were plotted in Figure 7 for stride frequencies $f_w$ of 1.4 and 1.8 Hz at three measuring positions: the centroid, the left leg, and the right leg.
FIGURE 7 . Time histories of non-dimensional pedestrian walking loads for different stride frequencies and measuring positions: (A) 1.4 Hz; (B) 1.8 Hz.
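For orientation, the relation between the DLF and the dynamic load amplitude can be written with the widely used single-harmonic idealization of the vertical walking force. This form is an assumption for illustration and is not necessarily the paper's Eq. 6; the value $A_1 = 0.30$ in the last line is hypothetical, while $G = 686$ N is one of the tester weights quoted for Figure 8.

```latex
\begin{aligned}
\frac{F_w(t)}{G} &= 1 + A_1 \sin\left(2\pi f_w t\right), \\
F_w(t) - G &= G\,A_1 \sin\left(2\pi f_w t\right), \\
G = 686~\text{N},\; A_1 = 0.30 \;&\Rightarrow\; G\,A_1 = 205.8~\text{N}.
\end{aligned}
```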
Figure 7 shows that at the centroid the time history varied regularly and periodically, with a longer period at a stride frequency of 1.4 Hz than at 1.8 Hz. At the left and right legs, one half of each period varied regularly while the other half was irregular. This indicates that the irregular half period corresponds to one foot being off the ground, and the regular half period to one or both feet touching the ground. The DLF, defined as the amplitude of the $F_w(t)/G$ time history (also shown in Figure 7), was smaller at 1.4 Hz than at 1.8 Hz, and the DLF measured at the two legs was higher than that measured at the centroid. To illustrate the effect of body weight on the time history and DLF, Figure 8 plots $F_w(t)/G$ for testers with weights $G$ = 686, 549, 519, and 441 N at a stride frequency $f_w$ = 1.6 Hz.
FIGURE 8 . Time histories of non-dimensional pedestrian walking loads for different testers.
Figure 8 shows that the amplitudes of the four testers' time histories differed little, with the highest value for the tester weighing 686 N. The amplitude therefore increased only slightly with body weight, so the effect of body weight was neglected in the pedestrian walking load model. To illustrate the effect of multi-person walking on the time history and DLF, Figure 9 plots $F_w(t)/G$ for testers 1, 2, 3, and 6 walking together. When pedestrians walked together, their stride frequencies $f_w$ fell within a narrow range of 1.7–1.9 Hz, and the amplitudes of all testers' time histories were very close. This indicates that crowd loads have narrow ranges of stride frequency and dynamic load factor; based on the test results, the stride frequency of the crowd load can be taken as 1.8 Hz.
FIGURE 9 . Time histories of non-dimensional pedestrian walking loads for multi-person walking.
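The time-history definition of the DLF used above (the amplitude of $F_w(t)/G$) and the relation "dynamic load amplitude = weight × DLF" can be sketched as follows. The synthetic single-harmonic record and its $A_1 = 0.25$ are hypothetical; on measured data, noise and higher harmonics inflate a half peak-to-peak estimate, which is one reason the spectral analysis in the next section is useful.

```python
import math


def synthetic_load_ratio(a1, f_w, duration_s=5.0, fs=200.0):
    """Hypothetical single-harmonic record of F_w(t)/G sampled at fs Hz."""
    n = int(duration_s * fs)
    return [1.0 + a1 * math.sin(2.0 * math.pi * f_w * k / fs) for k in range(n)]


def dlf_from_time_history(load_ratio):
    """Estimate the DLF as half the peak-to-peak range of F_w(t)/G."""
    if not load_ratio:
        raise ValueError("empty time history")
    return (max(load_ratio) - min(load_ratio)) / 2.0


record = synthetic_load_ratio(a1=0.25, f_w=1.6)
dlf = dlf_from_time_history(record)   # about 0.25; sampling may miss the exact peak
weight_n = 686.0                      # tester weight G quoted for Figure 8, in N
amplitude_n = weight_n * dlf          # dynamic load amplitude G * DLF, in N
print(round(dlf, 3), round(amplitude_n, 1))
```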
To investigate the DLF of pedestrian walking loads further, their power spectra were analyzed in this section. To illustrate how stride frequency and measuring position affect the power spectra and DLFs, the power spectra of tester 1's walking load at the centroid, left leg, and right leg were plotted for $f_w$ = 1.4 Hz (Figure 10A), 1.6 Hz (Figure 10B), 1.8 Hz (Figure 10C), and 2.0 Hz (Figure 10D). Figure 10 shows that at the centroid the power spectrum had a peak near the stride frequency; this peak was taken as the DLF $A_1$, and it increased with $f_w$. At the left or right leg, the spectrum had three peaks near $0.5f_w$, $f_w$, and $1.5f_w$, defined as $A_{0.5}$, $A_1$, and $A_{1.5}$, respectively, all of which increased with $f_w$. Because each leg is in contact with the ground for only half of the time, the lowest-frequency peak of the spectrum measured at a leg lies close to $0.5f_w$.
FIGURE 10 . Power spectra of pedestrian walking loads for stride frequencies $f_w$ from 1.4 to 2.0 Hz.
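The peak-picking step described above can be sketched with NumPy. Two assumptions: the sketch uses a single-sided amplitude spectrum, so a peak reads directly as a harmonic amplitude (a power spectrum would report amplitude squared up to a scaling factor), and the ±0.1 Hz search window and the synthetic leg record are hypothetical, not values from the paper.

```python
import numpy as np


def harmonic_amplitudes(load_ratio, fs, f_w, ratios=(0.5, 1.0, 1.5), half_width=0.1):
    """Return {ratio: (peak_frequency_hz, amplitude)} near ratio * f_w.

    load_ratio: samples of F_w(t)/G; fs: sampling rate in Hz;
    f_w: stride frequency in Hz; half_width: search half-width in Hz.
    """
    x = np.asarray(load_ratio, dtype=float)
    x = x - x.mean()                              # remove the static (weight) part
    n = x.size
    spectrum = np.abs(np.fft.rfft(x)) * 2.0 / n   # single-sided amplitude spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    peaks = {}
    for r in ratios:
        mask = np.abs(freqs - r * f_w) <= half_width
        if not mask.any():
            peaks[r] = (r * f_w, 0.0)
            continue
        idx = np.flatnonzero(mask)[np.argmax(spectrum[mask])]
        peaks[r] = (float(freqs[idx]), float(spectrum[idx]))
    return peaks


# Hypothetical 10 s leg record sampled at 100 Hz with f_w = 1.8 Hz.
fs = 100.0
t = np.arange(1000) / fs
leg = (1.0 + 0.05 * np.sin(2 * np.pi * 0.9 * t)
       + 0.20 * np.sin(2 * np.pi * 1.8 * t)
       + 0.03 * np.sin(2 * np.pi * 2.7 * t))
peaks = harmonic_amplitudes(leg, fs, f_w=1.8)   # keys 0.5, 1.0, 1.5
```

The record spans a whole number of cycles of each component, so the peaks fall exactly on FFT bins; with real data a window function would reduce leakage.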
To illustrate the effect of multi-person walking on the power spectra and DLFs, power spectra were plotted for testers 1 and 2 walking in a line and side by side (Figure 11), testers 1, 2, 3, and 6 walking in two lines (Figure 12A) and in one line (Figure 12B), and testers 1–6 walking in two lines (Figure 13A) and in one line (Figure 13B). Figures 11, 12, and 13 show that when pedestrians walked together, each power spectrum had a peak near the stride frequency, taken as the DLF $A_1$, which fell within a narrow range of about 0.168–0.254. The stride frequencies $f_w$ of pedestrians walking together also fell within a narrow range of about 1.782–1.904 Hz.
FIGURE 11 . Power spectra of pedestrian walking loads for two testers in a line or side by side.
FIGURE 12 . Power spectra of pedestrian walking loads for four testers.
FIGURE 13 . Power spectra of pedestrian walking loads for six testers.
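The crowd-load mid-values of 1.843 Hz and 0.211 used for the crowd design equation coincide with the midpoints of the two observed ranges, as the arithmetic below shows. The helper name `mid_value` is illustrative.

```python
def mid_value(low, high):
    """Midpoint of an observed range [low, high]."""
    return (low + high) / 2.0


f_crowd = mid_value(1.782, 1.904)    # crowd stride frequency range, Hz
a1_crowd = mid_value(0.168, 0.254)   # crowd DLF range
print(round(f_crowd, 3), round(a1_crowd, 3))   # 1.843 0.211
```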
Based on the test results, design equations for pedestrian walking loads and crowd loads were developed in this section.
Eq. 6 can be used to build the design equation of pedestrian walking loads, which can be given by
where $A_1$ is the DLF of pedestrian walking loads obtained from power spectrum analysis. To obtain an expression for $A_1$, the values of $A_1$ measured under the various test conditions are plotted against the stride frequency $f_w$ in Figure 14, together with the upper and lower bounds of the $A_1$–$f_w$ relation obtained by linear fitting. To be conservative, the upper bound was used to derive the equation for the DLF of pedestrian walking loads:
FIGURE 14 . The relation between $A_1$ and $f_w$, with the corresponding fitted bounds.
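The upper and lower bounds in Figure 14 come from linear fitting, but the exact fitting procedure is not spelled out. One common approach, shown below as an assumption rather than the authors' method, is to fit a least-squares line to the $(f_w, A_1)$ points and then shift its intercept up (or down) until every point lies on or below (or above) the line. The data in the usage lines are hypothetical, not the measurements in Figure 14.

```python
import numpy as np


def linear_bounds(f_w, a1):
    """Least-squares line A1 = slope * f_w + c, shifted to envelop the data.

    Returns (slope, upper_intercept, lower_intercept) such that every point
    satisfies slope*f + lower_intercept <= a <= slope*f + upper_intercept.
    """
    x = np.asarray(f_w, dtype=float)
    y = np.asarray(a1, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)   # coefficients, highest power first
    residuals = y - (slope * x + intercept)
    upper = intercept + residuals.max()
    lower = intercept + residuals.min()
    return float(slope), float(upper), float(lower)


# Hypothetical (f_w, A1) pairs, not the measured data in Figure 14.
freqs = [1.4, 1.6, 1.8, 2.0, 2.2]
a1_values = [0.12, 0.17, 0.19, 0.26, 0.27]
slope, upper, lower = linear_bounds(freqs, a1_values)


def design_a1(f):
    """Conservative (upper-bound) DLF at stride frequency f, in Hz."""
    return slope * f + upper
```

Keeping the least-squares slope and shifting only the intercept gives parallel bounds; other choices (for example, quantile regression) would give different lines.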
Furthermore, Eq. 8 can be used to derive the design equation of crowd loads, which is stated as
The stride frequency $f_w$ and DLF $A_1$ of the crowd load were taken as the mid-values of the test data, which coincide with the midpoints of the ranges observed in Figures 11, 12, and 13 (1.782–1.904 Hz and 0.168–0.254). The mid-value of the crowd stride frequency $f_w$ was therefore 1.843 Hz, and that of the crowd DLF $A_1$ was 0.211. The design equation of crowd loads was then expressed as
This paper presented an experimental investigation of pedestrian walking loads and crowd loads on the Yanluo Footbridge. A theoretical study was carried out using a bipedal walking force model. In the experimental investigation, both single-person and multi-person walking tests were performed, and the corresponding stride frequencies and dynamic load factors were evaluated. The average stride frequency and walking speed of the four testers over eight free walks in the single-person tests agreed with those reported in previous studies, which supports the reliability of the tests. In addition, design equations for pedestrian walking loads and crowd loads were obtained from analyses of the time histories and power spectra. The walking forces of different testers followed similar patterns, but they varied with walking habits and individual characteristics such as height, weight, and leg length. The dynamic load factor increased with stride frequency, and the mid-values of the crowd-load stride frequency $f_w$ and DLF $A_1$ were 1.843 Hz and 0.211, respectively.
The experiments described in this paper provide a standardized load model for the response of bridge structures under impact or periodic loads, test results for the formulation of regulations on structural vibration in China, and technical support for vibration displacement monitoring of long-span and composite structures.
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.
DD, XZ, and HL contributed to conception and design of the study. ZW organized the database. DD performed the statistical analysis. DD and XZ wrote the first draft of the manuscript. DD, ZW, XZ, and HL wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.
Author DD is employed by China Construction Steel Engineering Co., Ltd., Author ZW is employed by China Railway Guangzhou Group Co., Ltd., Author XZ is employed by Dongguan Hongchuan Intelligent Logistics Development Co., Ltd.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
The authors would like to thank all participants who provided support for our approach.
Ailisto, H. J., Lindholm, M., Mantyjarvi, J., and Mäkelä, S. M. (2005). “Identifying People from Gait Pattern with Accelerometers,” in Proceedings of SPIE - The International Society for Optical Engineering, 28 March 2005 (Orlando, Florida, United States: SPIE), 5779. doi:10.1117/12.603331
Bachus, K. N., Demarco, A. L., Judd, K. T., Horwitz, D. S., and Brodke, D. S. (2006). Measuring Contact Area, Force, and Pressure for Bioengineering Applications: Using Fuji Film and TekScan Systems. Med. Eng. Phys. 28 (5), 483–488. doi:10.1016/j.medengphy.2005.07.022
Bocian, M., Macdonald, J. H. G., and Burn, J. F. (2013). Biomechanically Inspired Modeling of Pedestrian-Induced Vertical Self-Excited Forces. J. Bridge Eng. 18, 1336–1346. doi:10.1061/(asce)be.1943-5592.0000490
Chen, J., Han, Z., and Brownjohn, J. (2019). Human Shaker Modal Testing Technology via Wearable Inertial Measurement Units. J. Vib. Eng. 32 (4), 644–652. doi:10.16385/j.cnki.issn.1004-4523.201904.011
Chen, Z., Chen, Z., Zhang, X., Huang, S., and Chen, Z. (2022). Dynamic Response and Vibration Reduction of Steel Truss Corridor Pedestrian Bridge Under Pedestrian Load. Front. Mater. , 31.
Ebrahimpour, A., Hamam, A., Sack, R. L., and Patten, W. N. (1996). Measuring and Modeling Dynamic Loads Imposed by Moving Crowds. J. Struct. Eng. 122 (12), 1468–1474. doi:10.1061/(asce)0733-9445(1996)122:12(1468)
Gao, Y. A., Yang, Q. S., and Dong, Y. (2018). A Three-Dimensional Pedestrian-Structure Interaction Model for General Applications. Int. J. Struct. Stab. Dyn. 18 (9), 1850107. doi:10.1142/s0219455418501079
Geyer, H., Seyfarth, A., and Blickhan, R. (2006). Compliant Leg Behaviour Explains Basic Dynamics of Walking and Running. Proc. R. Soc. B 273 (1603), 2861–2867. doi:10.1098/rspb.2006.3637
Gurney, J. K., Kersting, U. G., and Rosenbaum, D. (2008). Between-Day Reliability of Repeated Plantar Pressure Distribution Measurements in a Normal Population. Gait Posture 27 (4), 706–709. doi:10.1016/j.gaitpost.2007.07.002
Han, H. X., Zhou, D., and Ji, T. (2017). Mechanical Parameters of Standing Body and Applications in Human-Structure Interaction. Int. J. Appl. Mech. 9 (2), 1750021. doi:10.1142/s1758825117500211
Han, H., Zhou, D., Ji, T., and Zhang, J. (2021). Modelling of Lateral Forces Generated by Pedestrians Walking Across Footbridges. Appl. Math. Model. 89, 1775–1791. doi:10.1016/j.apm.2020.08.081
Harper, F. C., Warlow, W. J., and Clarke, B. L. (1961). The Forces Applied to the Floor by the Foot in Walking. London: Department of Scientific and Industrial Research, Building Research Station, 495–497.
Jones, C. A., Reynolds, P., and Pavic, A. (2011). Vibration Serviceability of Stadia Structures Subjected to Dynamic Crowd Loads: A Literature Review. J. Sound Vib. 330 (8), 1531–1566. doi:10.1016/j.jsv.2010.10.032
Khan, A. M., Lee, Y. K., Lee, S. Y., and Kim, T. S. (2010). A Triaxial Accelerometer-Based Physical-Activity Recognition via Augmented-Signal Features and a Hierarchical Recognizer. IEEE Trans. Inf. Technol. Biomed. 14 (5), 1166–1172. doi:10.1109/titb.2010.2051955
Lu, H., Liu, L., Liu, A., Pi, Y.-L., Bradford, M. A., and Huang, Y. (2020). Effects of Movement and Rotation of Supports on Nonlinear Instability of Fixed Shallow Arches. Thin-Walled Struct. 155, 106909. doi:10.1016/j.tws.2020.106909
Lu, H., Zhou, J., Sahmani, S., and Safaei, B. (2021a). Nonlinear Stability of Axially Compressed Couple Stress-Based Composite Micropanels Reinforced with Random Checkerboard Nanofillers. Phys. Scr. 96 (12), 125703. doi:10.1088/1402-4896/ac1d7f
Lu, H., Zhou, J., Yang, Z., Liu, A., and Zhu, J. (2021b). Nonlinear Buckling of Fixed Functionally Graded Material Arches Under a Locally Uniformly Distributed Radial Load. Front. Mater. 8, 310. doi:10.3389/fmats.2021.731627
Mohammed, A., and Pavic, A. (2021). Human-Structure Dynamic Interaction Between Building Floors and Walking Occupants in Vertical Direction. Mech. Syst. Signal Process. 147, 107036. doi:10.1016/j.ymssp.2020.107036
Hawryszków, P., Pimentel, R., Silva, R., and Silva, F. (2021). Vertical Vibrations of Footbridges Due to Group Loading: Effect of Pedestrian-Structure Interaction. Appl. Sci. 11, 1–16. doi:10.3390/app11041355
Qin, J. W., Law, S. S., Yang, Q. S., and Yang, N. (2013). Pedestrian-Bridge Dynamic Interaction, Including Human Participation. J. Sound Vib. 332 (4), 1107–1124. doi:10.1016/j.jsv.2012.09.021
Racic, V., Pavic, A., and Brownjohn, J. M. W. (2009). Experimental Identification and Analytical Modelling of Human Walking Forces: Literature Review. J. Sound Vib. 326, 1–49. doi:10.1016/j.jsv.2009.04.020
Venuti, F., and Bruno, L. (2009). Crowd-Structure Interaction in Lively Footbridges Under Synchronous Lateral Excitation: A Literature Review. Phys. Life Rev. 6, 176–206. doi:10.1016/j.plrev.2009.07.001
Wang, Q., Song, Z. G., and Wang, Z. Y. (2019). Tests for Measuring Vertical Pedestrian Loads Using Acceleration Sensors. J. Vib. Shock 38 (1), 215–220. doi:10.13465/j.cnki.jvs.2019.01.031
Wei, L., and Griffin, M. J. (1998). Mathematical Models for the Apparent Mass of the Seated Human Body Exposed to Vertical Vibration. J. Sound Vib. 212, 855–874. doi:10.1006/jsvi.1997.1473
Xiong, J., Chen, J., and Caprani, C. (2021). Spectral Analysis of Human-Structure Interaction During Crowd Jumping. Appl. Math. Model. 89, 610–626. doi:10.1016/j.apm.2020.07.030
Yang, Z., Liu, A., Lai, S.-K., Safaei, B., Lv, J., Huang, Y., et al. (2022a). Thermally Induced Instability on Asymmetric Buckling Analysis of Pinned-Fixed FG-GPLRC Arches. Eng. Struct. 250, 113243. doi:10.1016/j.engstruct.2021.113243
Yang, Z., Lu, H., Sahmani, S., and Safaei, B. (2021b). Isogeometric Couple Stress Continuum-Based Linear and Nonlinear Flexural Responses of Functionally Graded Composite Microplates with Variable Thickness. Archives Civ. Mech. Eng. 21 (3), 1–19. doi:10.1007/s43452-021-00264-w
Yang, Z., Safaei, B., Sahmani, S., and Zhang, Y. (2022b). A Couple-Stress-Based Moving Kriging Meshfree Shell Model for Axial Postbuckling Analysis of Random Checkerboard Composite Cylindrical Microshells. Thin-Walled Struct. 170, 108631. doi:10.1016/j.tws.2021.108631
Yang, Z., Wu, D., Yang, J., Lai, S. K., Lv, J., Liu, A., et al. (2021a). Dynamic Buckling of Rotationally Restrained FG Porous Arches Reinforced with Graphene Nanoplatelets Under a Uniform Step Load. Thin-Walled Struct. 166, 108103. doi:10.1016/j.tws.2021.108103
Zhu, Q. K., Chen, K., Du, Y. F., Jach, M., and Drodza, M. (2018). A Pedestrian up and Down Stairs Biodynamic Model Based on the Measured Data. J. Vib. Shock 37 (4), 233–239. doi:10.1155/2020/8015465
Živanović, S., Pavic, A., and Reynolds, P. (2005). Vibration Serviceability of Footbridges under Human-Induced Excitation: a Literature Review. J. Sound Vib. 279, 1–74. doi:10.1016/j.jsv.2004.01.019
Keywords: bipedal walking force model, pedestrian walking load, wireless sensor, stride frequency, dynamic load factor
Citation: Deng D, Wang Z, Zhang X and Lin H (2022) Experimental Investigation on Pedestrian Walking Load in Steel Footbridges. Front. Mater. 9:922545. doi: 10.3389/fmats.2022.922545
Received: 18 April 2022; Accepted: 16 May 2022; Published: 16 June 2022.
Copyright © 2022 Deng, Wang, Zhang and Lin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Deyuan Deng
Advanced Steel and Composite Structures in Civil Engineering Volume II
Pediatric Hodgkin and non-Hodgkin lymphomas differ from adult cases in biology and management, yet there is a lack of survival analysis tailored to pediatric lymphoma. We analyzed lymphoma data from 1975 to 2018, comparing survival trends between 7,871 pediatric and 226,211 adult patients, identified key risk factors for pediatric lymphoma survival, developed a predictive nomogram, and ...
BioFactors is an international journal aimed at identifying and increasing our understanding of the precise biochemical effects and roles of the large number of trace substances that are required by living organisms. These include vitamins and trace elements, as well as growth factors and regulatory substances made by cells themselves. The elucidation, in a particular organism or cell line, of ...
Background: Spinal cord injury (SCI) is a disease that causes permanent impairment of motor, sensory, and autonomic nervous system functions. Stem cell transplantation for neuron regeneration is a promising treatment strategy for SCI. However, the choice of stem cell source and transplantation approach needs to be based on experimental evidence. Therefore, this study aimed to investigate the efficacy ...
where Q is the weight of the associated crowd. In the experiments, both the first dynamic load factor $A_1$ and the stride frequency $f_w$ of the crowd load had to be tested. Experimental Research. The major instruments used in this work included a three-axis wireless acceleration sensor, a gateway node, and a computer with a corresponding terminal program installed.