
Randomised controlled trials—the gold standard for effectiveness research

Eduardo Hariton

1 Department of Obstetrics, Gynecology, and Reproductive Biology, Brigham and Women's Hospital, 75 Francis Street, Boston, MA 02116, USA

Joseph J. Locascio

2 Department of Neurology, Massachusetts General Hospital, 15 Parkman Street, Boston, MA 02114, USA

Randomized controlled trials (RCTs) are prospective studies that measure the effectiveness of a new intervention or treatment. Although no single study is likely to prove causality on its own, randomization reduces bias and provides a rigorous tool to examine cause-effect relationships between an intervention and an outcome. This is because the act of randomization balances participant characteristics (both observed and unobserved) between the groups, allowing any differences in outcome to be attributed to the study intervention. This is not possible with any other study design.

In designing an RCT, researchers must carefully select the population, the interventions to be compared and the outcomes of interest. Once these are defined, the number of participants needed to reliably determine if such a relationship exists is calculated (power calculation). Participants are then recruited and randomly assigned to either the intervention or the comparator group. 1 It is important to ensure that at the time of recruitment there is no knowledge of which group the participant will be allocated to; this is known as concealment. This is often ensured by using automated randomization systems (e.g. computer generated). RCTs are often blinded so that participants and doctors, nurses or researchers do not know what treatment each participant is receiving, further minimizing bias.
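To make these two steps concrete, here is a minimal Python sketch of a power calculation and a computer-generated allocation list of the kind used for concealment. The effect size, alpha, power, and the resulting arm size of 64 are illustrative assumptions, not recommendations.

```python
# Sketch of two design steps: a power calculation and a concealed,
# computer-generated allocation list. All numbers are illustrative.
import random

from statsmodels.stats.power import TTestIndPower

# Power calculation: participants per arm needed to detect a standardized
# effect size of 0.5 with 80% power at a two-sided alpha of 0.05.
n_per_arm = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"participants per arm: {round(n_per_arm)}")  # about 64

# The allocation list is generated up front by the computer, so the
# recruiter cannot predict the next participant's assignment.
rng = random.Random(2024)  # seeded only to make the sketch reproducible
allocation = ["intervention"] * 64 + ["comparator"] * 64
rng.shuffle(allocation)
print(allocation[:6])
```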

RCTs can be analyzed by intention-to-treat analysis (ITT; subjects analyzed in the groups to which they were randomized), per protocol (only participants who completed the treatment originally allocated are analyzed), or other variations, with ITT often regarded as the least biased. All RCTs should have pre-specified primary outcomes, should be registered with a clinical trials database and should have appropriate ethical approvals.
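As a toy illustration of the difference between these analysis sets, the following pandas sketch computes group means both ways; the table, column names and values are all hypothetical.

```python
# Toy contrast of intention-to-treat (ITT) vs per-protocol analysis.
import pandas as pd

trial = pd.DataFrame({
    "arm":       ["drug", "drug", "drug", "placebo", "placebo", "placebo"],
    "completed": [True,   False,  True,   True,      True,      False],
    "outcome":   [12.0,   9.0,    11.0,   8.0,       7.5,       9.5],
})

# ITT: everyone is analyzed in the arm they were randomized to,
# whether or not they completed the allocated treatment.
itt_means = trial.groupby("arm")["outcome"].mean()

# Per protocol: only participants who completed their allocated treatment.
pp_means = trial[trial["completed"]].groupby("arm")["outcome"].mean()

print(itt_means, pp_means, sep="\n\n")
```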

RCTs do have drawbacks, including their high cost in time and money, problems with generalisability (participants who volunteer to participate might not be representative of the population being studied) and loss to follow-up.

USEFUL RESOURCES

  • CONSORT Statement: CONsolidated Standards of Reporting Trials guidelines designed to improve the reporting of parallel-group randomized controlled trials - http://www.consort-statement.org/consort-2010
  • A Randomized, Controlled Trial of Magnesium Sulfate for the Prevention of Cerebral Palsy in the New England Journal of Medicine – a well-designed RCT that had a significant impact on practice patterns. http://www.nejm.org/doi/full/10.1056/NEJMoa0801187#t=abstract

LEARNING POINTS

While expensive and time-consuming, RCTs are the gold standard for studying causal relationships, as randomization eliminates much of the bias inherent in other study designs.

To provide a true assessment of causality, RCTs need to be conducted appropriately (i.e. with concealment of allocation, ITT analysis and blinding when appropriate).

Disclosures: The authors have no financial interests to disclose.

Randomization in Statistics and Experimental Design


What is Randomization?

Randomization in an experiment means choosing your experimental participants randomly. For example, you might use simple random sampling, where participants' names are drawn randomly from a pool in which everyone has an equal probability of being chosen. You can also assign treatments randomly to participants, by assigning random numbers from a random number table.
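For instance, here is a small Python sketch of both steps; the pool of names is invented, and the standard library's random module stands in for a random number table.

```python
# Simple random sampling, then random assignment of treatments.
import random

pool = ["Asha", "Ben", "Chen", "Dana", "Elif", "Femi", "Gita", "Hugo"]

# Simple random sampling: every name has an equal probability of selection.
participants = random.sample(pool, k=4)

# Random assignment: shuffle the treatment labels, then pair them up.
treatments = ["treatment", "control"] * 2
random.shuffle(treatments)
print(dict(zip(participants, treatments)))
```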

If you use randomization in your experiments, you guard against bias. For example, selection bias (where some groups are underrepresented) is eliminated and accidental bias (where chance imbalances happen) is minimized. You can also run a variety of statistical tests on your data (to test your hypotheses) if your sample is random.

Randomization Techniques

The word "random" has a very specific meaning in statistics. Arbitrarily choosing names from a list might seem random, but it actually isn't. Hidden biases (like a subconscious preference for English names, names that sound like friends, or names that roll off the tongue) mean that what you think is a random selection probably isn't. Because these biases are often hidden or overlooked, specific randomization techniques have been developed for researchers.

Open access | Published: 16 October 2020

The “completely randomised” and the “randomised block” are the only experimental designs suitable for widespread use in pre-clinical research

Michael F. W. Festing (ORCID: orcid.org/0000-0001-9092-4562)

Scientific Reports volume 10, Article number: 17577 (2020)

Too many pre-clinical experiments are giving results which cannot be reproduced. This may be because the experiments are incorrectly designed. In "Completely randomised" (CR) and "Randomised block" (RB) experimental designs, both the assignment of treatments to experimental subjects and the order in which the experiment is done are randomly determined. These designs have been used successfully in agricultural and industrial research and in clinical trials for nearly a century without excessive levels of irreproducibility. They must also be used in pre-clinical research if the excessive level of irreproducibility is to be eliminated. A survey of 100 papers involving mice and rats was used to determine whether scientists had used the CR or RB designs. The papers were assigned to three categories: "Design acceptable"; "Randomised to treatment groups", and so of doubtful validity; or "Room for improvement". Only 32 ± 4.7% of the papers fell into the first group, although none of them actually named either the CR or RB design. If the current high level of irreproducibility is to be eliminated, it is essential that scientists engaged in pre-clinical research use "Completely randomised" (CR), "Randomised block" (RB), or one of the more specialised named experimental designs described in textbooks on the subject.


Introduction

Excessive numbers of randomised, controlled, pre-clinical experiments give results which can’t be reproduced 1 , 2 . This leads to a waste of scientific resources with excessive numbers of laboratory animals being subjected to pain and distress 3 . There is a considerable body of literature on its possible causes 4 , 5 , 6 , 7 , but failure by scientists to use named experimental designs described in textbooks needs further discussion.

Only two designs are suitable for widespread use in pre-clinical research: "Completely randomised" (CR), shown in Fig. 1A, and "Randomised block" (RB), shown in Fig. 1B. In the CR design, each subject (experimental unit) has one of the treatments randomly assigned to it, so that subjects receiving different treatments are randomly intermingled within the research environment. Results can be statistically analysed using a one-way analysis of variance, with the usual assumptions of homogeneity of variances and normality of the residuals.
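A minimal Python sketch of a CR design and its one-way ANOVA, assuming three treatments, four subjects per treatment, and simulated responses; scipy stands in for any statistics package.

```python
# Completely randomised (CR) design: assign treatments at random, then
# analyse with a one-way ANOVA. Data are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Random assignment: shuffle 12 treatment labels so that subjects given
# different treatments are intermingled, and the run order is random too.
labels = np.repeat(["A", "B", "C"], 4)
rng.shuffle(labels)
print("run order:", labels)

# Simulated response for each treatment group (no true treatment effect).
groups = [rng.normal(loc=10.0, scale=2.0, size=4) for _ in range(3)]

f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```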

Figure 1

Representation of three experimental designs, each with three treatments (colours) and a sample size of four (for illustration). Each small rectangle represents an experimental unit (for example, a single animal in a cage). Designs A and B can have any number of treatments and sample sizes, as well as additional factors such as both sexes, or more than one strain. Design C is not statistically valid. (A) The "Completely randomised" (CR) design. Both the assignment of treatments to subjects and the order in which the experiment is done are randomly determined. This design can accommodate unequal sample sizes. Randomisation was done using Excel: four "A"s, four "B"s and four "C"s were entered into column one and 12 random numbers were put in column two using the command "=rand()" and pulling down on the small box on the lower right of the cell. Columns one and two were then marked and sorted on column two using "data, sort". The row numbers represent individual identification numbers. Different results will be obtained each time. (B) The "Randomised block" (RB) design. In this example the experiment has four blocks (outer rectangles), each having a single individual receiving each of the three treatments, in random order. The blocks can be separated in time and/or location. Randomisation was done as follows: four "A"s, "B"s and "C"s were put in column one, and the numbers 1–4, repeated three times, were put in column two. Twelve random numbers were then put in column three, as above. All three columns were then marked and sorted first on column two and then on column three. Row numbers are the individual identity numbers. (C) The "Randomisation to treatment group" (RTTG) "design". This is not a valid design because treatment and environmental effects are confounded. Any environmental effect that differs between groups may be mistaken for the effects of the treatment, leading to bias and irreproducible results.

In the RB design, the experiment is split up into a number of independent "blocks", each of which has a single subject assigned at random to each treatment. When there are only two treatments, this is known as a "matched pairs" design. The whole experiment consists of N such blocks, where N is the sample size. A two-way analysis of variance without interaction is used to analyse the results. The matched pairs design can also be analysed using a one-sample t-test.
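A sketch of the corresponding RB analysis, assuming four blocks, three treatments, and simulated data; the additive (no-interaction) model is fitted with statsmodels.

```python
# Randomised block (RB) design: two-way ANOVA with treatment and block
# effects but no interaction term. Data are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
rows = []
for b in range(4):                        # four independent blocks
    block_effect = rng.normal(scale=1.5)  # shared by all units in the block
    for t in ["A", "B", "C"]:             # one subject per treatment per block
        rows.append({"block": f"b{b}", "treatment": t,
                     "y": 10 + block_effect + rng.normal(scale=1.0)})
df = pd.DataFrame(rows)

# Additive model: y ~ treatment + block, with no interaction.
model = smf.ols("y ~ C(treatment) + C(block)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```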

Unfortunately, most authors appear to use the invalid "Randomisation to treatment group" (RTTG) design, shown in Fig. 1C. In this design, subjects are randomly assigned to physical treatment groups but the order in which the experiment is done is not randomised. This is not valid because each treatment group will occupy a different micro-environment, the effects of which may be mistaken for treatment effects, leading to bias and irreproducibility.

The origin of randomized controlled experiments

Randomized controlled experiments have a long history of successful use in agricultural research. They were developed largely by R. A. Fisher in the 1920s as a way of detecting small but important differences in yield of agricultural crop varieties or following different fertilizer treatments 8 . Each variety was sown in several adjacent field plots, chosen at random, so that variation among plots growing the same and different crop varieties could be estimated. He used the analysis of variance, which he had invented in previous genetic studies, to statistically evaluate the results.

Fisher noted that in any experiment there are two sources of variation which need to be taken into account if true treatment differences are to be reliably detected. First is the variation among the experimental subjects, due, for example, to the number of grains in a given weight of seed, or to individual variation in a group of mice. Second is the variation caused during the course of the experiment by the research environment and in the assessment of the results. Both types of variation must be controlled if bias and irreproducibility are to be avoided.

In most pre-clinical research the inter-individual variation can be minimised by careful selection of experimental subjects. But variation associated with the environment caused, for example, by cage location, lighting levels, noise, time of day and changes in the skill of investigators must also be considered. Fisher’s designs minimised bias by using uniform material and by replication and randomisation so that plots receiving different varieties were randomly “intermingled” in the research environment.

According to Montgomery 9 “By randomization we mean that both the allocation of the experimental material, and the order in which the individual runs or trials of the experiment are to be performed, are randomly determined”.

The RB design often provides better control of both inter-individual and environmental variation. Subjects within a block can be matched, and each block has a small environmental footprint compared with the CR design. In one example this resulted in extra power equivalent to using about 40% more animals 10. The RB design is also convenient because individual blocks can be set up over a period of time to suit the investigator. Positive results will only be detected if the blocks give similar results, as assessed by statistical analysis 11, 12. Montgomery 9 (p. 12) even suggests that blocking is one of the three basic principles of experimental design, along with "replication" and "randomisation".

Fisher and others invented a few other named designs including the “Split plot”, the “Latin square” and the “Cross-over” designs. These can also be used in pre-clinical research in appropriate situations 13 , although they are not discussed here.

The research environment is an important source of variation in pre-clinical research

In most pre-clinical experiments inter-individual variation can be minimised by choosing animals which are similar in age and/or weight. They will have been maintained in the same animal house and should be free of infectious disease. They may also be genetically identical if an inbred strain is used. So the research environment may be the main remaining source of variation.

Temporal variation due to circadian and other rhythms, such as cage cleaning and feeding routines, can affect the physiology and behaviour of the animals over short periods, as can physical factors such as cage location, lighting and noise 14. If two or more animals are housed in the same cage they will interact, which can increase physiological variation. Even external factors such as barometric pressure can affect the activity of mice 15. Staff may also become more proficient at handling animals, applying treatments, doing autopsies and measuring results during the course of an experiment, leading to changes in the quality of data.

To avoid bias, cages receiving different treatments must be intermingled (see Fig.  1 A,B), and results should be assessed “blind” and in random order. This happens automatically if subjects are only identified by their identification number once the treatments have been given.

The RB design is already widely used in studies involving pre-weaned mice and rats 11. No litter is large enough to make up a whole experiment, so each is regarded as a "block" and one of the treatments, chosen at random, is assigned to each pup within the litter. Results from several litters are then combined in the analysis 16.

Possible confusion associated with the meaning of the word “group”

Research scientists are sometimes urged to "randomise their subjects to treatment groups". Such advice is ambiguous. According to Chambers Twentieth Century Dictionary (1972), the word "group" can mean "a number of persons or things together" or "a number of individual things related in some definite way differentiating them from others".

Statisticians involved in clinical trials sometimes write about "randomising patients to treatment groups". Clearly, they are using the second definition, as there are no physical groups in a clinical trial. But if scientists assign their animals to physical groups ("…things together"), they will be using the invalid "Randomisation to treatment group" (RTTG) design shown in Fig. 1C, possibly leading to irreproducibility.

A sample survey of experimental design in published pre-clinical papers

A survey of published papers using mice or rats was used to assess the use of CR, RB, or other named experimental designs. PubMed Central is a collection of several million full-text scientific papers that can be searched for specific English words. A search for "Mouse" and "Experiment" retrieved 682,264 papers. The first fifty of these had been published between 2014 and 2020. They were not in any obvious identification number or date order: for example, the first ten papers had been published in 2017, 17, 19, 19, 19, 18, 15, 16, 19, and 18, and the first two digits of their identification numbers were 55, 55, 66, 65, 66, 59, 71, 61, 46 and 48. In order to introduce a random element to the selection, only papers with an even identification number were used.

Each paper was searched for the words "random", "experiment", "statistical", "matched" and other words necessary to understand how the experiments had been designed. Tables and figures were also inspected. The discipline and the type of animals which had been used (wild-type, mutant, or genetically modified) were also noted. The aim was to assess the design of the experiments, not the quality of the research.

Most papers involved several experiments, but the designs were usually similar. All were assessed and then re-assessed, blind to the previous scores, after an interval of approximately two weeks. The results in seventeen of the papers were discordant, so those papers were reassessed.

Papers which used laboratory mice

The results for mice and rats are summarised in Table 1. Thirty-six (72 ± 3.2%) of the "mouse" papers involved genetically modified or mutant mice. Each was assigned to one of three categories:

“Apparently well designed” (13 papers, 26 ± 1.6%). None of these papers mentioned either the CR or RB design by name, although a few of them appeared to have used one of these designs. For example, one stated: "All three genotypes were tested on the same day in randomized order by two investigators who were blind to the genotypes." This was scored as a CR design.

"Room for improvement" (22 papers, 44 ± 3.5%). None of these papers used the word "random" with respect to the assignment of treatments to the animals, or the order in which the experiment was done, although it was sometimes used in other contexts. These papers had not, apparently, used any named experimental design, and so were susceptible to bias.

"Randomised to group" (15 papers, 30 ± 2.1%). These papers stated that the subjects had been "randomised to the treatment groups". The most likely interpretation is that these were physical groups, so the experiments had used the statistically invalid RTTG design shown in Fig. 1C. However, as noted above, the word "group" is ambiguous. If it meant that one of the treatments, chosen at random, had been assigned to each animal, then this would have constituted a "Completely randomised" (CR) design. As the first interpretation seems more likely, these experiments were classified as being of doubtful validity.

Papers which used laboratory rats

A similar search in PubMed on "rat" and "experiment" found 483,490 papers. The first 50 of these with even identification numbers were published between 2015 and 2020. Four of them used mutant or genetically modified rats; the rest used wild-type rats. Twenty-two of them involved experimental pathology, nineteen behaviour, seven physiology, one immunology and one pharmacology. Again, it was only the quality of the experimental design which was assessed, not the biological validity of the results.

Nineteen (38 ± 0.3%) of the rat papers were placed in the “Design acceptable” category. Those involving behaviour were of notably high statistical quality (and complexity). Three stated that they had used the “Matched pairs” design and one had used a RB design without naming it. None of them mentioned either the CR or RB designs.

In eleven (22 ± 2.9%) papers, the rats were assigned to treatment groups. So it is unclear whether these papers had used the valid CR design or the invalid RTTG design, as discussed above for mice, although the latter seems more likely.

The “Room for improvement” group consisted of 20 (40 ± 4.0%) of the papers. These had not used the word “random” with respect to the assignment of treatments to subjects or vice versa and there was no evidence that they had used the RB, CR or other recognised experimental designs.

Conclusions from the sample survey

Results for both mice and rats are summarised in Table 1. The quality of the experimental design in papers involving rats was slightly higher than that in papers involving mice, although the difference was not statistically significant (Chi-sq. = 1.84, p = 0.40). It was largely due to the high quality of the behaviour (psychological) studies in the rat.

Combining the two species, 32 ± 4.7% of the papers were judged to have been designed and randomised to an acceptable standard, although none of them stated that they had used either the CR or RB design. One mouse paper had used a “Latin square” design. Another had used a “Completely randomised” design without naming it, and a mouse paper noted that “All experiments were performed independently at least three times.” Such repetition can lead to tricky statistical problems if results are to be combined 17 . Scientists wishing to build repeatability into their experiments could use the RB design, spreading the blocks over a period of time.

Discussion and conclusions

Names matter. People, places, species and scientific procedures have names which can be used to identify and describe a subject or a procedure. Experimental designs also have names: "Completely randomised" (CR), "Randomised block" (RB), "Latin square", "Matched pairs", etc. These can be found in textbooks which describe the characteristics and uses of each design 13. However, none of the papers in the above survey mentioned either the CR or the RB design by name, although these are the only designs suitable for general use.

The widespread use of the statistically invalid RTTG design, which is not found in any reputable textbook, may account for a substantial fraction of the observed irreproducibility. Organisations which support pre-clinical research and training should ensure that their literature and websites have been peer reviewed by qualified statisticians and that they refer to named, statistically valid experimental designs.

The RB and CR designs are quite versatile. They can be used for any number of treatments and sample sizes as well as for additional factors such as both sexes or several strains of animals, often without increasing the total numbers.

The first clinical trials were supervised by statisticians who adapted the CR design for such work. But scientists doing pre-clinical research have received little statistical support, so it is not surprising that so many of their experiments are incorrectly designed. High levels of irreproducibility are unlikely to be found in pre-clinical research in the pharmaceutical industry, because PSI, the association of statisticians in the UK pharmaceutical industry, has about 800 members employed in the UK.

Irreproducibility is wasteful and expensive. The employment of more applied statisticians in academia to assist scientists doing pre-clinical research would be an excellent investment.

References

1. Begley, C. G. & Ellis, L. M. Drug development: Raise standards for preclinical cancer research. Nature 483, 531–533 (2012).
2. Scott, S. et al. Design, power, and interpretation of studies in the standard murine model of ALS. Amyotroph. Lateral Scler. 9, 4–15 (2008).
3. Freedman, L. P., Cockburn, I. M. & Simcoe, T. S. The economics of reproducibility in preclinical research. PLoS Biol. 13, e1002165 (2015).
4. Fiala, C. & Diamandis, E. P. Benign and malignant scientific irreproducibility. Clin. Biochem. 55, 1–2 (2018).
5. Boulbes, D. R. et al. A survey on data reproducibility and the effect of publication process on the ethical reporting of laboratory research. Clin. Cancer Res. 24, 3447–3455 (2018).
6. Marino, M. J. How often should we expect to be wrong? Statistical power, P values, and the expected prevalence of false discoveries. Biochem. Pharmacol. 151, 226–233 (2018).
7. Roberts, I., Kwan, I., Evans, P. & Haig, S. Does animal experimentation inform human healthcare? Observations from a systematic review of international animal experiments on fluid resuscitation. BMJ 324, 474–476 (2002).
8. Fisher, R. A. The Design of Experiments (Hafner, New York, 1960).
9. Montgomery, D. C. Design and Analysis of Experiments (John Wiley & Sons, New York, 1984).
10. Festing, M. F. W. The scope for improving the design of laboratory animal experiments. Lab. Anim. 26, 256–267 (1992).
11. Festing, M. F. Randomized block experimental designs can increase the power and reproducibility of laboratory animal experiments. ILAR J. 55, 472–476 (2014).
12. Festing, M. F. W. Experimental design and irreproducibility in pre-clinical research. Physiol. News 118, 14–15 (2020).
13. Festing, M. F. W., Overend, P., Cortina-Borja, M. & Berdoy, M. The Design of Animal Experiments 2nd edn (Sage, New York, 2016).
14. Nevalainen, T. Animal husbandry and experimental design. ILAR J. 55, 392–398 (2014).
15. Sprott, R. L. Barometric pressure fluctuations: effect on the activity of laboratory mice. Science 157, 1206–1207 (1967).
16. Festing, M. F. W. Design and statistical methods in studies using animal models of development. ILAR J. 47, 5–14 (2006).
17. Frommlet, F. & Heinze, G. Experimental replications in animal trials. Lab. Anim. https://doi.org/10.1177/002367722090761 (2020).

Acknowledgement

The author wishes to thank Laboratory Animals Limited for financial support in the publication of this paper.

Author information

Authors and Affiliations

c/o The Medical Research Council, 2nd. floor, David Phillips Building, Polaris House, North Star Av., Swindon, Wiltshire, SN2 1FL, UK

Michael F. W. Festing


Corresponding author

Correspondence to Michael F. W. Festing.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Festing, M.F.W. The "completely randomised" and the "randomised block" are the only experimental designs suitable for widespread use in pre-clinical research. Sci Rep 10, 17577 (2020). https://doi.org/10.1038/s41598-020-74538-3

Received: 26 February 2020 | Accepted: 01 October 2020 | Published: 16 October 2020




Statistics By Jim

Making statistics intuitive

Randomized Controlled Trial (RCT) Overview

By Jim Frost

What is a Randomized Controlled Trial (RCT)?

A randomized controlled trial (RCT) is a prospective experimental design that randomly assigns participants to an experimental or control group. RCTs are the gold standard for establishing causal relationships and ruling out confounding variables and selection bias. To use this design, researchers must be able to control who receives the treatments and who serves as the controls.


Random assignment is crucial for ruling out other potentially explanatory factors that could have caused those outcome differences. This process in RCTs is so effective that it even works with potential confounders that the researchers don't know about! Think age, lifestyle, or genetics. Learn more about Random Assignment in Experiments.

Scientists use randomized controlled trials most frequently in fields like medicine, psychology, and social sciences to rigorously test interventions and treatments.

In this post, learn how RCTs work, the various types, and their strengths and weaknesses.

Randomized Controlled Trial Example

Imagine testing a new drug against a placebo using a randomized controlled trial. We take a representative sample of 100 patients. 50 get the drug; 50 get the placebo. Who gets what? It’s random! Perhaps we flip a coin. For more complex designs, we’d probably use computers for random assignment.

After a month, we measure health outcomes. Did the drug help more than the placebo? That’s what we find out!
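Here's what that coin-flip step might look like in a short Python sketch; the patient labels are invented, and a real trial would usually constrain the arms to exactly 50/50.

```python
# Coin-flip randomization of 100 patients to drug or placebo.
import random

random.seed(42)  # seeded only to make the sketch reproducible
groups = {f"patient_{i:03d}": random.choice(["drug", "placebo"])
          for i in range(1, 101)}

n_drug = sum(g == "drug" for g in groups.values())
print(f"drug: {n_drug}, placebo: {100 - n_drug}")  # roughly 50/50
```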

To read about several examples of top-notch RCTs in more detail, read my following posts:

  • How Effective Are Flu Shots?
  • COVID Vaccination Randomized Controlled Trial

Common Elements for Effective RCT Designs

While randomization springs to mind when discussing RCTs, other equally vital components shape these robust experimental designs. Most well-designed randomized controlled trials contain the following elements.

  • Control Group : Almost every RCT features a control group. This group might receive a placebo, no intervention, or standard care. You can estimate the treatment's effect size by comparing the outcome in a treatment group to the control group. Learn more about Control Groups in an Experiment and controlling for the Placebo Effect.
  • Blinding : Blinding hides group assignments from researchers and participants to prevent group assignment knowledge from influencing results. More on this shortly!
  • Pre-defined Inclusion and Exclusion Criteria : These criteria set the boundaries for who can participate based on specifics like age or health conditions.
  • Baseline Assessment : Before diving in, an initial assessment records participants’ starting conditions.
  • Outcome Measures : Clear, pre-defined outcomes, like symptom reduction or survival rates, drive the study’s goals.
  • Controlled, Standardized Environments : Ensuring variables are measured and treatments administered consistently minimizes external factors that could affect results.
  • Monitoring and Data Collection : Regular checks guarantee participant safety and uniform data gathering.
  • Ethical Oversight : Ensures participants’ rights and well-being are prioritized.
  • Informed Consent : Participants must know the drill and agree to participate before joining.
  • Statistical Plan : Detailing how statisticians will analyze the data before the RCT begins helps keep the evaluation objective and prevents p-hacking. Learn more about P-Hacking Best Practices.
  • Protocol Adherence : Consistency is critical. Following the plan ensures reliable results.
  • Analysis and Reporting : Once done, researchers share the results—good, bad, or neutral. Transparency builds trust.

These components ensure randomized controlled trials are both rigorous and ethically sound, leading to trustworthy results.

Common Variations of Randomized Controlled Trial Designs

Randomized controlled trial designs aren’t one-size-fits-all. Depending on the research question and context, researchers can apply various configurations.

Let’s explore the most common RCT designs:

  • Parallel Group : Participants are randomly put into an intervention or control group.
  • Crossover : Participants randomly receive both intervention and control at different times.
  • Factorial : Tests multiple interventions at once. Useful for combination therapies.
  • Cluster : Groups, not individuals, are randomized. For instance, researchers can randomly assign schools or towns to the experimental groups.

If you can't randomly assign subjects and you want to draw causal conclusions about an intervention, consider using a quasi-experimental design.

Learn more about Experimental Design: Definition and Types.

Blinding in RCTs

Blinding is a standard protection in randomized controlled trials. The term refers to procedures that hide group assignments from those involved. While randomization ensures initial group balance, it doesn’t prevent uneven treatment or assessment as the RCT progresses, which could skew results.

So, what is the best way to sidestep potential biases?

Keep as many people in the dark about group assignments as possible. In a blinded randomized controlled trial, participants, and sometimes researchers, don’t know who gets the intervention.

There are three types of blinding:

  • Single : Participants don’t know if they’re in the intervention or control group.
  • Double : Both participants and researchers are in the dark.
  • Triple : Participants, researchers, and statisticians all don’t know.

Blinding guards against sneaky biases that might creep into our RCT results. Let's look at a few:

  • Confirmation Bias : Without blinding in a randomized controlled trial, researchers might unconsciously favor results that align with their expectations. For example, they might interpret ambiguous data as positive effects of a new drug if they’re hopeful about its efficacy.
  • Placebo Effect : Participants who know they’re getting the ‘real deal’ might report improved outcomes simply because they believe in the treatment’s power. Conversely, those aware they’re in the control group might not notice genuine improvements.
  • Observer Bias : If a researcher knows which participant is in which group, they might inadvertently influence outcomes. Imagine a physiotherapist unknowingly encouraging a participant more because they know they’re receiving the new treatment.

Blinding helps keep these biases at bay, making our results more reliable. It boosts confidence in a randomized controlled trial. Let’s close by summarizing the benefits and disadvantages of an RCT.

The Benefits of Randomized Controlled Studies

Randomized controlled trials offer a unique blend of strengths:

  • RCTs are best for identifying causal relationships.
  • Random assignment reduces both known and unknown biases.
  • Many RCT designs exist, tailored for different research questions.
  • Well-defined steps and controlled conditions ensure replicability across studies.
  • Internal validity tends to be high in a randomized controlled trial. You can be confident that other variables don’t affect or account for the observed relationship.

Learn more about Correlation vs. Causation: Understanding the Differences.

The Drawbacks of RCTs

While powerful, RCTs also come with limitations:

  • Randomized controlled trials can be expensive in time, money, and resources.
  • Ethical concerns can arise when withholding treatments from a control group.
  • Random assignment might not be possible in some circumstances.
  • External validity can be low in an RCT. Conditions can be so controlled that the results might not always generalize beyond the study.

For a good comparison, learn about the differences and tradeoffs between using Observational Studies and Randomized Experiments.

Learn more about Internal and External Validity in Experiments and see how they’re a tradeoff.


What is a Completely Randomized Design?

by Kim Love

Let’s take a look.

How It Works

The basic idea of any experiment is to learn how different conditions or versions of a treatment affect an outcome. To do this, you assign subjects to different treatment groups. You then run the experiment and record the results for each subject.

Afterward, you use statistical methods to determine whether the different treatment groups have different outcomes.

Key principles for any experimental design are randomization, replication, and reduction of variance. Randomization means assigning the subjects to the different groups in a random way.

Replication means ensuring there are multiple subjects in each group.

Reduction of variance refers to removing or accounting for systematic differences among subjects. Completely randomized designs address the first two principles in a simple way.

To execute a completely randomized design, first determine how many versions of the treatment there are. Next, determine how many subjects are available. Divide the number of subjects by the number of treatments to get the number of subjects in each group.

The final design step is to randomly assign individual subjects to fill the spots in each group.

Suppose you are running an experiment. You want to compare three training regimens that may affect the time it takes to run one mile. You also have 12 human subjects who are willing to participate in the experiment. Because you have three training regimens, you will have 12/3 = 4 subjects in each group.

Statistical software (or even Excel) can do the actual assignment. You only need to start by numbering the subjects from 1 to 12 in any way that is convenient. One possible random assignment of 12 subjects to three groups is sketched below.
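A minimal Python sketch of that assignment; the regimen labels are invented, and the seed is there only so the output is reproducible.

```python
# Randomly assign 12 numbered subjects evenly across three regimens.
import random

random.seed(3)
subjects = list(range(1, 13))
random.shuffle(subjects)

groups = {regimen: sorted(subjects[i:i + 4])
          for regimen, i in zip(["regimen_A", "regimen_B", "regimen_C"],
                                range(0, 12, 4))}
print(groups)
```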


It’s okay if the number of replicates in each group isn’t exactly the same. Make them as even as possible and assign more to groups that are more interesting to you. Modern statistical software has no trouble adjusting for different sample sizes.

When there is more than one treatment variable, not much changes. Use the combination of treatments when performing random assignment.

For example, say that you add a diet treatment with two conditions in addition to the training. Combined with the three versions of training, there are six possible treatment groups. Assign the subjects in the exact way already described, but with six groups instead of three.
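A sketch of the same idea with two treatment variables, using invented labels for the three training regimens and two diets.

```python
# Random assignment to all 3 x 2 treatment combinations.
import itertools
import random

random.seed(5)
combos = list(itertools.product(["training_A", "training_B", "training_C"],
                                ["diet_1", "diet_2"]))  # six groups
slots = combos * 2  # 12 subjects / 6 groups = 2 subjects per group

subjects = list(range(1, 13))
random.shuffle(subjects)
print(dict(zip(subjects, slots)))
```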

Do not skip randomization! Randomization is the only way to ensure your groups are similar except for the treatment, which is what allows you to attribute group differences to the treatment.

When This Design DOESN’T Work

The completely randomized design is excellent when plenty of unrelated subjects are available to sample.  But some situations call for more advanced designs.

This design doesn't address the third principle of experimental design, reduction of variance.

Sure, you may be able to address this by adding covariates to the analysis. These are variables that are not experimentally assigned but that you can measure. But if reduction of variance is important, other designs do it better.

If some of the subjects are related to each other or a single subject is exposed to multiple conditions of a treatment, you’re going to need another design.

Sometimes it is important to measure outcomes more than once during experimental treatment. For example, you might want to know how quickly the subjects make progress in their training. Again, any repeated measures of outcomes constitute a more complicated design.

Strengths of the Completely Randomized Design

When it works, it has many strengths.

It’s not only easy to create, it’s straightforward to analyze. The results are relatively easy to explain to a non-statistical audience.

Finally, familiarity with this design will help you recognize when it isn’t appropriate. Understanding the ways in which it is not appropriate can help you choose a more advanced design.



Completely Randomized Designs

Last updated on 2024-05-14

  • What is a completely randomized design (CRD)?
  • CRD is the simplest experimental design.
  • In CRD, treatments are assigned randomly to experimental units.
  • CRD assumes that the experimental units are relatively homogeneous or similar.
  • CRD doesn’t remove or account for systematic differences among experimental units.

The lesson's remaining sections cover: a single qualitative factor; analysis of variance (ANOVA); equal variances and normality; a single quantitative factor; and design issues.

  • CRD is a simple design that can be used when the experimental units are homogeneous.


A Refresher on Randomized Controlled Experiments

How to design the right kind of test.

In order to make smart decisions at work, we need data. Where that data comes from and how we analyze it depends on a lot of factors — for example, what we're trying to do with the results, how accurate we need the findings to be, and how much of a budget we have. There is a spectrum of experiments that managers can run, from quick, informal ones to pilot studies, field experiments, and lab research. One of the more structured experiments is the randomized controlled experiment.


  • Amy Gallo is a contributing editor at Harvard Business Review, cohost of the Women at Work podcast, and the author of two books: Getting Along: How to Work with Anyone (Even Difficult People) and the HBR Guide to Dealing with Conflict. She writes and speaks about workplace dynamics.


Experimental Design – Types, Methods, Guide

Experimental Design

Experimental design is a process of planning and conducting scientific experiments to investigate a hypothesis or research question. It involves carefully designing an experiment that can test the hypothesis, and controlling for other variables that may influence the results.

Experimental design typically includes identifying the variables that will be manipulated or measured, defining the sample or population to be studied, selecting an appropriate method of sampling, choosing a method for data collection and analysis, and determining the appropriate statistical tests to use.

Types of Experimental Design

Here are the different types of experimental design:

Completely Randomized Design

In this design, participants are randomly assigned to one of two or more groups, and each group is exposed to a different treatment or condition.

Randomized Block Design

This design involves dividing participants into blocks based on a specific characteristic, such as age or gender, and then randomly assigning participants within each block to one of two or more treatment groups.

Factorial Design

In a factorial design, participants are randomly assigned to one of several groups, each of which receives a different combination of two or more independent variables.

Repeated Measures Design

In this design, each participant is exposed to all of the different treatments or conditions, either in a random order or in a predetermined order.

Crossover Design

This design involves randomly assigning participants to one of two or more treatment groups, with each group receiving one treatment during the first phase of the study and then switching to a different treatment during the second phase.

Split-plot Design

In this design, one treatment factor is randomized to larger (whole-plot) units and a second factor is randomized to subunits (split plots) within them, so the two factors are randomized at different levels; the whole plots are often arranged in randomized blocks to control for other variables.

Nested Design

This design involves grouping participants within larger units, such as schools or households, and then randomly assigning these units to different treatment groups.

Laboratory Experiment

Laboratory experiments are conducted under controlled conditions, which allows for greater precision and accuracy. However, because laboratory conditions are not always representative of real-world conditions, the results of these experiments may not be generalizable to the population at large.

Field Experiment

Field experiments are conducted in naturalistic settings and allow for more realistic observations. However, because field experiments are not as controlled as laboratory experiments, they may be subject to more sources of error.

Experimental Design Methods

Experimental design methods refer to the techniques and procedures used to design and conduct experiments in scientific research. Here are some common experimental design methods:

Randomization

This involves randomly assigning participants to different groups or treatments to ensure that any observed differences between groups are due to the treatment and not to other factors.

Control Group

The use of a control group is an important experimental design method that involves having a group of participants that do not receive the treatment or intervention being studied. The control group is used as a baseline to compare the effects of the treatment group.

Blinding

Blinding involves keeping participants, researchers, or both unaware of which treatment group participants are in, in order to reduce the risk of bias in the results.

Counterbalancing

This involves systematically varying the order in which participants receive treatments or interventions in order to control for order effects.

Replication

Replication involves conducting the same experiment with different samples or under different conditions to increase the reliability and validity of the results.

Factorial Design

This experimental design method involves manipulating multiple independent variables simultaneously to investigate their combined effects on the dependent variable.

Blocking

This involves dividing participants into subgroups or blocks based on specific characteristics, such as age or gender, in order to reduce the risk of confounding variables.

Data Collection Method

Experimental design data collection methods are techniques and procedures used to collect data in experimental research. Here are some common experimental design data collection methods:

Direct Observation

This method involves observing and recording the behavior or phenomenon of interest in real time. It may involve the use of structured or unstructured observation, and may be conducted in a laboratory or naturalistic setting.

Self-report Measures

Self-report measures involve asking participants to report their thoughts, feelings, or behaviors using questionnaires, surveys, or interviews. These measures may be administered in person or online.

Behavioral Measures

Behavioral measures involve measuring participants’ behavior directly, such as through reaction time tasks or performance tests. These measures may be administered using specialized equipment or software.

Physiological Measures

Physiological measures involve measuring participants’ physiological responses, such as heart rate, blood pressure, or brain activity, using specialized equipment. These measures may be invasive or non-invasive, and may be administered in a laboratory or clinical setting.

Archival Data

Archival data involves using existing records or data, such as medical records, administrative records, or historical documents, as a source of information. These data may be collected from public or private sources.

Computerized Measures

Computerized measures involve using software or computer programs to collect data on participants’ behavior or responses. These measures may include reaction time tasks, cognitive tests, or other types of computer-based assessments.

Video Recording

Video recording involves recording participants’ behavior or interactions using cameras or other recording equipment. This method can be used to capture detailed information about participants’ behavior or to analyze social interactions.

Data Analysis Method

Experimental design data analysis methods refer to the statistical techniques and procedures used to analyze data collected in experimental research. Here are some common experimental design data analysis methods:

Descriptive Statistics

Descriptive statistics are used to summarize and describe the data collected in the study. This includes measures such as mean, median, mode, range, and standard deviation.

Inferential Statistics

Inferential statistics are used to make inferences or generalizations about a larger population based on the data collected in the study. This includes hypothesis testing and estimation.

Analysis of Variance (ANOVA)

ANOVA is a statistical technique used to compare means across two or more groups in order to determine whether there are significant differences between the groups. There are several types of ANOVA, including one-way ANOVA, two-way ANOVA, and repeated measures ANOVA.

Regression Analysis

Regression analysis is used to model the relationship between two or more variables in order to determine the strength and direction of the relationship. There are several types of regression analysis, including linear regression, logistic regression, and multiple regression.
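For example, here is a minimal linear regression sketch with simulated data; statsmodels is one of several libraries that could be used, and all values are illustrative.

```python
# Linear regression on simulated data: recover an intercept and slope.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.8 * x + rng.normal(scale=1.0, size=50)  # true slope is 0.8

X = sm.add_constant(x)       # adds the intercept column
fit = sm.OLS(y, X).fit()
print(fit.params)            # estimated intercept and slope
```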

Factor Analysis

Factor analysis is used to identify underlying factors or dimensions in a set of variables. This can be used to reduce the complexity of the data and identify patterns in the data.

Structural Equation Modeling (SEM)

SEM is a statistical technique used to model complex relationships between variables. It can be used to test complex theories and models of causality.

Cluster Analysis

Cluster analysis is used to group similar cases or observations together based on similarities or differences in their characteristics.

Time Series Analysis

Time series analysis is used to analyze data collected over time in order to identify trends, patterns, or changes in the data.

Multilevel Modeling

Multilevel modeling is used to analyze data that is nested within multiple levels, such as students nested within schools or employees nested within companies.

Applications of Experimental Design 

Experimental design is a versatile research methodology that can be applied in many fields. Here are some applications of experimental design:

  • Medical Research: Experimental design is commonly used to test new treatments or medications for various medical conditions. This includes clinical trials to evaluate the safety and effectiveness of new drugs or medical devices.
  • Agriculture : Experimental design is used to test new crop varieties, fertilizers, and other agricultural practices. This includes randomized field trials to evaluate the effects of different treatments on crop yield, quality, and pest resistance.
  • Environmental science: Experimental design is used to study the effects of environmental factors, such as pollution or climate change, on ecosystems and wildlife. This includes controlled experiments to study the effects of pollutants on plant growth or animal behavior.
  • Psychology : Experimental design is used to study human behavior and cognitive processes. This includes experiments to test the effects of different interventions, such as therapy or medication, on mental health outcomes.
  • Engineering : Experimental design is used to test new materials, designs, and manufacturing processes in engineering applications. This includes laboratory experiments to test the strength and durability of new materials, or field experiments to test the performance of new technologies.
  • Education : Experimental design is used to evaluate the effectiveness of teaching methods, educational interventions, and programs. This includes randomized controlled trials to compare different teaching methods or evaluate the impact of educational programs on student outcomes.
  • Marketing : Experimental design is used to test the effectiveness of marketing campaigns, pricing strategies, and product designs. This includes experiments to test the impact of different marketing messages or pricing schemes on consumer behavior.

Examples of Experimental Design 

Here are some examples of experimental design in different fields:

  • Example in medical research: A study that investigates the effectiveness of a new drug treatment for a particular condition. Patients are randomly assigned to either a treatment group or a control group, with the treatment group receiving the new drug and the control group receiving a placebo. The outcomes, such as improvement in symptoms or side effects, are measured and compared between the two groups.
  • Example in education research: A study that examines the impact of a new teaching method on student learning outcomes. Students are randomly assigned to either a group that receives the new teaching method or a group that receives the traditional teaching method. Student achievement is measured before and after the intervention, and the results are compared between the two groups.
  • Example in environmental science: A study that tests the effectiveness of a new method for reducing pollution in a river. Two sections of the river are selected, with one section treated with the new method and the other section left untreated. The water quality is measured before and after the intervention, and the results are compared between the two sections.
  • Example in marketing research: A study that investigates the impact of a new advertising campaign on consumer behavior. Participants are randomly assigned to either a group that is exposed to the new campaign or a group that is not. Their behavior, such as purchasing or product awareness, is measured and compared between the two groups.
  • Example in social psychology: A study that examines the effect of a new social intervention on reducing prejudice towards a marginalized group. Participants are randomly assigned to either a group that receives the intervention or a control group that does not. Their attitudes and behavior towards the marginalized group are measured before and after the intervention, and the results are compared between the two groups.

When to use Experimental Research Design 

Experimental research design should be used when a researcher wants to establish a cause-and-effect relationship between variables. It is particularly useful when studying the impact of an intervention or treatment on a particular outcome.

Here are some situations where experimental research design may be appropriate:

  • When studying the effects of a new drug or medical treatment: Experimental research design is commonly used in medical research to test the effectiveness and safety of new drugs or medical treatments. By randomly assigning patients to treatment and control groups, researchers can determine whether the treatment is effective in improving health outcomes.
  • When evaluating the effectiveness of an educational intervention: An experimental research design can be used to evaluate the impact of a new teaching method or educational program on student learning outcomes. By randomly assigning students to treatment and control groups, researchers can determine whether the intervention is effective in improving academic performance.
  • When testing the effectiveness of a marketing campaign: An experimental research design can be used to test the effectiveness of different marketing messages or strategies. By randomly assigning participants to treatment and control groups, researchers can determine whether the marketing campaign is effective in changing consumer behavior.
  • When studying the effects of an environmental intervention: Experimental research design can be used to study the impact of environmental interventions, such as pollution reduction programs or conservation efforts. By randomly assigning locations or areas to treatment and control groups, researchers can determine whether the intervention is effective in improving environmental outcomes.
  • When testing the effects of a new technology: An experimental research design can be used to test the effectiveness and safety of new technologies or engineering designs. By randomly assigning participants or locations to treatment and control groups, researchers can determine whether the new technology is effective in achieving its intended purpose.

How to Conduct Experimental Research

Here are the steps to conduct Experimental Research:

  • Identify a Research Question: Start by identifying a research question that you want to answer through the experiment. The question should be clear, specific, and testable.
  • Develop a Hypothesis: Based on your research question, develop a hypothesis that predicts the relationship between the independent and dependent variables. The hypothesis should be clear and testable.
  • Design the Experiment: Determine the type of experimental design you will use, such as a between-subjects design or a within-subjects design. Also, decide on the experimental conditions, such as the number of independent variables, the levels of the independent variable, and the dependent variable to be measured.
  • Select Participants: Select the participants who will take part in the experiment. They should be representative of the population you are interested in studying.
  • Randomly Assign Participants to Groups: If you are using a between-subjects design, randomly assign participants to groups to control for individual differences.
  • Conduct the Experiment: Conduct the experiment by manipulating the independent variable(s) and measuring the dependent variable(s) across the different conditions.
  • Analyze the Data: Analyze the data using appropriate statistical methods to determine if there is a significant effect of the independent variable(s) on the dependent variable(s). (Steps 5 through 7 are sketched in code after this list.)
  • Draw Conclusions: Based on the data analysis, draw conclusions about the relationship between the independent and dependent variables. If the results support the hypothesis, it is retained; if not, it is rejected.
  • Communicate the Results: Finally, communicate the results of the experiment through a research report or presentation. Include the purpose of the study, the methods used, the results obtained, and the conclusions drawn.
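To make steps 5 through 7 concrete, here is a minimal sketch in Python of random assignment to two groups followed by an independent-samples t-test; the outcome data are simulated and every number is hypothetical.

```python
# Steps 5-7: randomly assign 60 participants to two groups, collect
# (simulated) outcome scores, and test for a group difference.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
participants = np.arange(1, 61)
rng.shuffle(participants)                      # step 5: random assignment
treatment_ids, control_ids = participants[:30], participants[30:]

# step 6: simulated outcomes; the treatment mean is shifted by a true effect
treatment_scores = rng.normal(loc=55, scale=10, size=30)
control_scores = rng.normal(loc=50, scale=10, size=30)

# step 7: independent-samples t-test for the group difference
result = stats.ttest_ind(treatment_scores, control_scores)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```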

Purpose of Experimental Design 

The purpose of experimental design is to control and manipulate one or more independent variables to determine their effect on a dependent variable. Experimental design allows researchers to systematically investigate causal relationships between variables, and to establish cause-and-effect relationships between the independent and dependent variables. Through experimental design, researchers can test hypotheses and make inferences about the population from which the sample was drawn.

Experimental design provides a structured approach to designing and conducting experiments, ensuring that the results are reliable and valid. By carefully controlling for extraneous variables that may affect the outcome of the study, experimental design allows researchers to isolate the effect of the independent variable(s) on the dependent variable(s), and to minimize the influence of other factors that may confound the results.

Experimental design also allows researchers to generalize their findings to the larger population from which the sample was drawn. By randomly selecting participants and using statistical techniques to analyze the data, researchers can make inferences about the larger population with a high degree of confidence.

Overall, the purpose of experimental design is to provide a rigorous, systematic, and scientific method for testing hypotheses and establishing cause-and-effect relationships between variables. Experimental design is a powerful tool for advancing scientific knowledge and informing evidence-based practice in various fields, including psychology, biology, medicine, engineering, and social sciences.

Advantages of Experimental Design 

Experimental design offers several advantages in research. Here are some of the main advantages:

  • Control over extraneous variables: Experimental design allows researchers to control for extraneous variables that may affect the outcome of the study. By manipulating the independent variable and holding all other variables constant, researchers can isolate the effect of the independent variable on the dependent variable.
  • Establishing causality: Experimental design allows researchers to establish causality by manipulating the independent variable and observing its effect on the dependent variable. This allows researchers to determine whether changes in the independent variable cause changes in the dependent variable.
  • Replication: Experimental design allows researchers to replicate their experiments to ensure that the findings are consistent and reliable. Replication is important for establishing the validity and generalizability of the findings.
  • Random assignment: Experimental design often involves randomly assigning participants to conditions. This helps to ensure that individual differences between participants are evenly distributed across conditions, which increases the internal validity of the study.
  • Precision: Experimental design allows researchers to measure variables with precision, which can increase the accuracy and reliability of the data.
  • Generalizability: If the study is well-designed, experimental design can increase the generalizability of the findings. By controlling for extraneous variables and using random assignment, researchers can increase the likelihood that the findings will apply to other populations and contexts.

Limitations of Experimental Design

Experimental design has some limitations that researchers should be aware of. Here are some of the main limitations:

  • Artificiality: Experimental design often involves creating artificial situations that may not reflect real-world conditions. This can limit the external validity of the findings, that is, the extent to which they can be generalized to real-world settings.
  • Ethical concerns: Some experimental designs may raise ethical concerns, particularly if they involve manipulating variables that could cause harm to participants or if they involve deception.
  • Participant bias: Participants may modify their behavior simply because they know they are taking part in an experiment (the Hawthorne effect), which can bias the results.
  • Limited generalizability: The conditions of the experiment may not reflect the complexities of real-world situations. As a result, the findings may not be applicable to all populations and contexts.
  • Cost and time: Experimental design can be expensive and time-consuming, particularly if the experiment requires specialized equipment or a large sample.
  • Researcher bias: Researchers may unintentionally bias the results of the experiment if they have expectations or preferences for certain outcomes.
  • Lack of feasibility: Experimental design may not be feasible in some cases, particularly if the research question involves variables that cannot be manipulated or controlled.



Value and Techniques of Randomization in Experimental Design


What is randomization in experimental design?

Randomization in an experiment refers to the random assignment of participants to treatment conditions; equivalently, treatments are assigned to the participants at random.

For example, a teacher decides to hold an oral quiz and questions students in random order.

Here, every participant has an equal chance of entering the experiment; in our example, every student has an equal chance of being asked a question by the teacher. Randomization guards against bias. When a group is selected according to some category, personal or accidental biases can creep in, whereas random selection gives the researcher no opportunity to weigh individual participants, so the groups are divided fairly.


Why is randomization in experimental design important?

As mentioned earlier, randomization minimizes bias, but it also provides several other benefits when adopted as a selection method in experiments.

  • Randomization prevents bias and makes the results fair.
  • It ensures that the groups formed for an experiment are as similar to each other as possible, so the results are as accurate as possible.
  • It helps control lurking variables that could otherwise distort the results.
  • A randomly selected sample is meant to be representative of the population, and because it involves no researcher interference, it is fairly selected.
  • Randomizing an experiment gives the clearest view of cause-effect relationships between the variables.
  • It helps ensure selection is made across genders, castes, and races, so the groups are not too different from each other.
  • Researchers control the values of the explanatory variable through the randomization procedure, so if we see a relationship between the explanatory variable and the response variable, we can say it is causal.

What are different types of randomization techniques in experimental design?

Randomization is subject to error when participants are selected only nominally "at random." In our example, the teacher says she will question students at random, yet she might subconsciously target mischievous students. In other words, selection that we believe is random often is not.

Hence, to avoid these unintended biases, researchers commonly use three techniques:

Simple Random Sampling


In simple random sampling, participants are selected purely by luck and probability: every participant has an equal chance of getting into the sample.

This method is theoretically easy to understand and works best with a sample size of 100 or more. The key feature is that every participant gets an equal chance of being included in a treatment, which is why it is also called the method of chance.

Methods of simple random sampling:

  • Lottery – As in the old lottery method, each participant is given a number, and selection is done by randomly drawing numbers from a pot.
  • Random numbers – Similar to the lottery method, this involves giving each participant a number and selecting participants with a random number table.

Example: A teacher wants to know how good her class is at mathematics, so she gives each student a number and draws numbers from a pot of chits. The resulting sample is randomly selected and carries no bias from the teacher's interference.
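A minimal sketch of the lottery method in Python follows, assuming a hypothetical roster of 40 numbered students and a draw of 10; the seed is arbitrary.

```python
# The lottery method: every roster number has an equal chance of being drawn.
import random

random.seed(7)
roster = list(range(1, 41))            # each of 40 students gets a number
sample = random.sample(roster, k=10)   # draw 10 chits without replacement
print(sorted(sample))
```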


Permuted Block Randomization

Permuted block randomization is a method of randomly assigning participants to treatment groups. A block is a randomly ordered sequence of treatment assignments, and across blocks the treatment assignments remain balanced.

Example: A teacher wants to enroll students in two treatments, A and B, and plans to enroll 6 students per week. The blocks might look like this:

Week 1- AABABA

Week 2- BABAAB

Week 3- BBABAB

Across the three blocks there are 9 A's and 9 B's: the two treatments are balanced overall even though the ordering within each block is random.

There are two types of block assignment in permuted block randomization:

  • Random number generator

Generate a random number for each treatment assignment in the block. In our example, the block "Week 1" might look like A(4), A(5), B(56), A(33), B(40), A(10).

Then arrange the treatments by their numbers in ascending order; the reordered block becomes AAAABB.

  • Permutations

This involves listing the permutations for the block, that is, writing down all possible balanced arrangements.

The number of arrangements is b! / ((b/2)! (b/2)!), where b is the block size.

For our example, the block size is 6, so the number of possible arrangements is:

6! / ((6/2)! (6/2)!) = 6! / (3! × 3!) = 720 / 36 = 20 possible arrangements.
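The count of 20 can be verified directly, and the same list of balanced arrangements can drive the randomization itself. Below is a minimal sketch for the weekly enrolment of six participants from the example; the seed is arbitrary, and each generated block is balanced (three A's and three B's), a common variant of the technique.

```python
# Enumerate the balanced arrangements of block size 6, then pick one at
# random for each week of enrolment.
import random
from itertools import permutations

balanced_blocks = sorted(set(permutations("AAABBB")))
print(len(balanced_blocks))            # 20, matching 6! / (3! * 3!)

random.seed(1)
for week in range(1, 4):
    block = random.choice(balanced_blocks)
    print(f"Week {week}: {''.join(block)}")
```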

Stratified Random Sampling


The word "strata" refers to subgroups of a population that share characteristics such as gender, caste, age, or background. Stratified random sampling helps you take these strata into account while sampling the population. The strata can be pre-defined, or you can define them in whatever way best suits your study.

Example: You want to categorize the population of a state by literacy. Your categories would be (1) literate, (2) intermediate, and (3) illiterate.

Steps to conduct stratified random sampling (a code sketch follows the list):

  • Define the target audience.
  • Identify the stratification variables and decide the number of strata to be used.
  • Use a pre-existing sampling frame, or create a frame that records the stratification variables for every element in the target audience.
  • Evaluate the sampling frame for coverage and adjust it as needed.
  • Strata should be mutually exclusive and together cover every member of the population.
  • Assign a random, unique number to each element.
  • Define the size of each stratum according to your requirements.
  • Randomly select elements from each stratum to form the sample.
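Here is the minimal sketch referenced above, using pandas (version 1.1 or later for grouped sampling) and a hypothetical population table with a literacy stratum column; all names and counts are invented for illustration.

```python
# Draw the same fraction from every stratum so the sample mirrors the
# population's literacy mix.
import pandas as pd

population = pd.DataFrame({
    "person_id": range(1, 1001),
    "literacy": ["literate"] * 500 + ["intermediate"] * 300 + ["illiterate"] * 200,
})

sample = population.groupby("literacy").sample(frac=0.1, random_state=0)
print(sample["literacy"].value_counts())   # 50 literate, 30 intermediate, 20 illiterate
```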




Randomized Block Designs

This lesson begins our discussion of randomized block experiments. The purpose of this lesson is to provide background knowledge that can help you decide whether a randomized block design is the right design for your study. Specifically, we will answer four questions:

  • What is a blocking variable?
  • What is blocking?
  • What is a randomized block experiment?
  • What are advantages and disadvantages of a randomized block experiment?

We will explain how to analyze data from a randomized block experiment in the next lesson: Randomized Block Experiments: Data Analysis.

Note: The discussion in this lesson is confined to randomized block designs with independent groups. Randomized block designs with repeated measures involve some special issues, so we will discuss the repeated measures design in a future lesson.

What is a Blocking Variable?

In a randomized block experiment, a good blocking variable has four distinguishing characteristics:

  • It is included as a factor in the experiment.
  • It is not of primary interest to the experimenter.
  • It affects the dependent variable.
  • It is unrelated to independent variables in the experiment.

A blocking variable is a potential nuisance variable - a source of undesired variation in the dependent variable. By explicitly including a blocking variable in an experiment, the experimenter can tease out nuisance effects and more clearly test treatment effects of interest.

Warning: If a blocking variable does not affect the dependent variable or if it is strongly related to an independent variable, a randomized block design may not be the best choice. Other designs may be more efficient.

What is Blocking?

Blocking is the technique used in a randomized block experiment to sort experimental units into homogeneous groups, called blocks. The goal of blocking is to create blocks such that dependent variable scores are more similar within blocks than across blocks.

For example, consider an experiment designed to test the effect of different teaching methods on academic performance. In this experiment, IQ is a potential nuisance variable. That is, even though the experimenter is primarily interested in the effect of teaching methods, academic performance will also be affected by student IQ.

To control for the unwanted effects of IQ, we might include IQ as a blocking variable in a randomized block experiment. We would assign students to blocks, such that students within the same block have the same (or similar) IQ's. By holding IQ constant within blocks, we can attribute within-block differences in academic performance to differences in teaching methods, rather than to differences in IQ.

What is a Randomized Block Experiment?

A randomized block experiment with independent groups is distinguished by the following attributes:

  • The design has one or more factors (i.e., one or more independent variables), each with two or more levels.
  • Treatment groups are defined by a unique combination of non-overlapping factor levels.
  • Experimental units are randomly selected from a known population.
  • Each experimental unit is assigned to one block, such that variability within blocks is less than variability between blocks.
  • The number of experimental units within each block is equal to the number of treatment groups.
  • Within each block, each experimental unit is randomly assigned to a different treatment group.
  • Each experimental unit provides one dependent variable score.

The table below shows the layout for a typical randomized block experiment.

        T1     T2     T3     T4
B1    X1,1   X1,2   X1,3   X1,4
B2    X2,1   X2,2   X2,3   X2,4
B3    X3,1   X3,2   X3,3   X3,4
B4    X4,1   X4,2   X4,3   X4,4
B5    X5,1   X5,2   X5,3   X5,4

In this experiment, there are five blocks (Bi, i = 1 to 5) and four treatment levels (Tj, j = 1 to 4). Dependent variable scores are represented by Xi,j, where Xi,j is the score for the subject in block i who received treatment j.
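To illustrate the within-block randomization described above, here is a minimal sketch for the 5 × 4 layout; the unit labels and the seed are hypothetical.

```python
# Within each block, the four treatments are randomly ordered and assigned,
# one experimental unit per treatment, matching the 5 x 4 layout above.
import random

random.seed(3)
treatments = ["T1", "T2", "T3", "T4"]
for i in range(1, 6):                          # blocks B1..B5
    order = treatments[:]
    random.shuffle(order)                      # randomize within the block
    units = [f"unit_{i}{j}" for j in range(1, 5)]  # hypothetical unit labels
    assignment = ", ".join(f"{u} -> {t}" for u, t in zip(units, order))
    print(f"B{i}: {assignment}")
```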

Advantages and Disadvantages

With respect to analysis of variance, a randomized block experiment with independent groups has advantages and disadvantages. Advantages include the following:

  • With an effective blocking variable - a blocking variable that is strongly related to the dependent variable but not related to the independent variable(s) - the design can provide more precision than other independent groups designs of comparable size.
  • The design works with any number of treatments and blocking variables.

Disadvantages include the following:

  • When the experiment has many treatment levels, it can be hard to form homogeneous blocks.
  • With an ineffective blocking variable - a blocking variable that is weakly related to the dependent variable or strongly related to one or more independent variables - the design may provide less precision than other independent groups designs of comparable size.
  • The design assumes zero interaction between blocks and treatments. If an interaction exists, tests of treatment effects may be biased.

Test Your Understanding

Which, if any, of the following attributes does not describe a good blocking variable?

(A) It is included as a factor in the experiment.
(B) It is not of primary interest to the experimenter.
(C) It affects the dependent variable.
(D) It affects the independent variable.
(E) All of the attributes describe a good blocking variable.

The correct answer is (D).

A good blocking variable is not related to an independent variable. When the blocking variable and treatment variable are related, tests of treatment effects may be biased.

Why would an experimenter choose to use a randomized block design?

(A) To test the effect of a blocking variable on a dependent variable.
(B) To assess the interaction between a blocking variable and an independent variable.
(C) To control unwanted effects of a suspected nuisance variable.
(D) None of the above.
(E) All of the above.

The correct answer is (C).

The blocking variable is not of primary interest to an experimenter, so the experimenter would not choose a randomized block design to test the effect of a blocking variable. A randomized block design assumes that there is no interaction between a blocking variable and an independent variable, so the experimenter would not choose a randomized block design to test the interaction effect. A full factorial experiment would be a better choice to accomplish either of these objectives.

A blocking variable is a potential nuisance variable - a source of undesired variation in the dependent variable. By explicitly including a blocking variable in an experiment, the experimenter can tease out nuisance effects and more clearly test treatment effects of interest. Thus, an experimenter might choose a randomized block design to control unwanted effects of a suspected nuisance variable.


[Figure legends, condensed] Figure 2: Summary odds ratios (ORs) from meta-analyses of RCTs (vertical axis) vs nonrandomized studies (NRSs; horizontal axis), one circle per clinical question; an OR less than 1 indicates benefit; the solid orange line marks perfect agreement and the dashed lines mark substantial disagreement (one summary OR at most one-half of the other; alternative cutoffs in eTable 2 in Supplement 1). Figure 3: Proportions of meta-analyses by the statistical conclusion (favorable, detrimental, or inconclusive) drawn from NRS vs RCT evidence, judged by whether the 95% CI of the summary OR includes 1. Figure 4: Ratio of odds ratios (ROR) comparing NRS with RCT effect size estimates, plus heterogeneity parameters (φ, between-meta-analysis heterogeneity; κ, increase in within-meta-analysis heterogeneity), shown overall and in subgroups by type of NRS, outcome, comparator, RCT-NRS matching quality, and publication quality.

Supplement 1 contents: eAppendix 1 (Search Strategy); eAppendix 2 (Extracted Data From Source Meta-Analyses); eAppendix 3 (Subgroup Analyses); eReferences; eTable 1 (Characteristics of Included Meta-Analyses); eTable 2 (Results for Measures of Discrepancy Between Nonrandomized Studies and RCTs); eFigure (Results From Additional Subgroup Analyses for Study-Level Characteristics); Data Sharing Statement.


Salcher-Konrad M, Nguyen M, Savović J, Higgins JPT, Naci H. Treatment Effects in Randomized and Nonrandomized Studies of Pharmacological Interventions: A Meta-Analysis. JAMA Netw Open. 2024;7(9):e2436230. doi:10.1001/jamanetworkopen.2024.36230


Treatment Effects in Randomized and Nonrandomized Studies of Pharmacological Interventions: A Meta-Analysis

  • 1 Department of Health Policy, London School of Economics and Political Science, London, United Kingdom
  • 2 World Health Organization Collaborating Centre for Pharmaceutical Pricing and Reimbursement Policies, Pharmacoeconomics Department, Gesundheit Österreich GmbH (GÖG)/Austrian National Public Health Institute, Vienna, Austria
  • 3 Department of Family and Community Medicine, University of California, San Francisco
  • 4 Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
  • 5 National Institute for Health and Care Research Applied Research Collaboration West, University Hospitals Bristol and Weston National Health Service Foundation Trust, Bristol, United Kingdom

Question   How do treatment effects for drugs compare when obtained from nonrandomized vs randomized studies?

Findings   In this meta-analysis of 2746 primary studies in 346 meta-analyses using a meta-epidemiological framework, there was no strong evidence of systematic overestimation or underestimation of treatment effects. However, disagreements between nonrandomized and randomized studies were beyond chance in 15.6% of meta-analyses, and the 2 study types led to different statistical conclusions about the therapeutic effect of drug interventions in 37.6% of meta-analyses.

Meaning   These findings suggest that relying on nonrandomized studies as substitutes for randomized clinical trials may introduce additional uncertainty about the therapeutic effects of new drugs.

Importance   Randomized clinical trials (RCTs) are widely regarded as the methodological benchmark for assessing clinical efficacy and safety of health interventions. There is growing interest in using nonrandomized studies to assess efficacy and safety of new drugs.

Objective   To determine how treatment effects for the same drug compare when evaluated in nonrandomized vs randomized studies.

Data Sources   Meta-analyses published between 2009 and 2018 were identified in MEDLINE via PubMed and the Cochrane Database of Systematic Reviews. Data analysis was conducted from October 2019 to July 2024.

Study Selection   Meta-analyses of pharmacological interventions were eligible for inclusion if both randomized and nonrandomized studies contributed to a single meta-analytic estimate.

Data Extraction and Synthesis   For this meta-analysis using a meta-epidemiological framework, separate summary effect size estimates were calculated for nonrandomized and randomized studies within each meta-analysis using a random-effects model and then these estimates were compared. The reporting of this study followed the Guidelines for Reporting Meta-Epidemiological Methodology Research and relevant portions of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guideline.

Main Outcome and Measures   The primary outcome was discrepancies in treatment effects obtained from nonrandomized and randomized studies, as measured by the proportion of meta-analyses where the 2 study types disagreed about the direction or magnitude of effect, disagreed beyond chance about the effect size estimate, and the summary ratio of odds ratios (ROR) obtained from nonrandomized vs randomized studies combined across all meta-analyses.

Results   A total of 346 meta-analyses with 2746 studies were included. Statistical conclusions about drug benefits and harms were different for 130 of 346 meta-analyses (37.6%) when focusing solely on either nonrandomized or randomized studies. Disagreements were beyond chance for 54 meta-analyses (15.6%). Across all meta-analyses, there was no strong evidence of consistent differences in treatment effects obtained from nonrandomized vs randomized studies (summary ROR, 0.95; 95% credible interval [CrI], 0.89-1.02). Compared with experimental nonrandomized studies, randomized studies produced on average a 19% smaller treatment effect (ROR, 0.81; 95% CrI, 0.68-0.97). There was increased heterogeneity in effect size estimates obtained from nonrandomized compared with randomized studies.

Conclusions and Relevance   In this meta-analysis of treatment effects of pharmacological interventions obtained from randomized and nonrandomized studies, there was no overall difference in effect size estimates between study types on average, but nonrandomized studies both overestimated and underestimated treatment effects observed in randomized studies and introduced additional uncertainty. These findings suggest that relying on nonrandomized studies as substitutes for RCTs may introduce additional uncertainty about the therapeutic effects of new drugs.

Randomized clinical trials (RCTs), in which participants are randomly assigned to treatments, are widely regarded as the methodological benchmark for assessing the clinical efficacy and safety of drugs. 1 , 2 When designed, conducted, analyzed, and reported adequately, RCTs minimize bias and can therefore provide regulatory bodies, payers, clinicians, and patients with robust evidence on what treatments work. In contrast with RCTs, treatment assignment in nonrandomized studies (NRSs) is influenced by the patient, the clinician, or the setting. Despite their higher generalizability, NRSs are more susceptible to bias due to confounding and to selection bias. 3 Consequently, discrepancies may emerge between the results of RCTs and NRSs.

The internal validity of NRSs has recently attracted renewed interest due to a growing enthusiasm for using NRSs when making decisions about new drugs. Drug regulatory agencies and health technology assessment bodies in the US and Europe are actively exploring the feasibility and validity of utilizing NRSs, including data collected outside of clinical trials (ie, observational data). 4 - 7 While NRSs have traditionally been used as a complement to RCTs, there is interest in potentially substituting or replacing RCTs with well-conducted NRSs. 8

Previous research 9 - 18 has examined the comparability of treatment effect size estimates between RCTs and NRSs, yielding varied findings. However, the most recent comprehensive review, 12 encompassing 45 clinical questions and 408 individual studies, was published more than 20 years ago. Most published studies focused on selected therapeutic areas, limiting the generalizability of their findings. Most recently, replication studies for highly selected clinical questions with good data availability have identified a general alignment between RCTs and their nonrandomized emulations, although disagreements in results were observed in approximately one-quarter of the cases. 19 A comprehensive review of potential discrepancies between treatment effects of RCTs and NRSs is needed. In this study, our primary objective was to assess and compare treatment effects of the same drug when evaluated in NRSs vs RCTs.

The study protocol for this meta-analysis using a meta-epidemiological framework was registered on PROSPERO ( CRD42018062204 ). The reporting of this study followed the Guidelines for Reporting Meta-Epidemiological Methodology Research by Murad et al 20 and relevant portions of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses ( PRISMA ) reporting guideline. 21

We identified clinical questions for which meta-analyses including at least 1 RCT and 1 NRS were conducted to obtain estimates of the effectiveness of pharmacological treatments as defined in the participants, interventions, comparators, and outcomes (PICO) framework. Clinical questions with potentially eligible meta-analyses were identified through 3 sources: (1) a database search in MEDLINE (via PubMed) for existing meta-epidemiological studies comparing RCTs with NRSs, (2) a database search in MEDLINE (via PubMed) for systematic reviews including both RCTs and NRSs, and (3) a review of all systematic reviews indexed in the Cochrane Database of Systematic Reviews that included both RCTs and NRSs. We only included records published from 2009 to 2018 to cover clinical questions from the last decade (our original plan was to cover 2000-2018). Details of the database searches are available in eAppendix 1 in Supplement 1 .

We included only clinical questions where RCTs and NRSs contributed to a single meta-analytic estimate, following the within–meta-analyses approach for meta-epidemiological studies. 22 We therefore capitalized on the subject matter expertise of researchers conducting meta-analyses in their area of interest who judged RCTs and NRSs to be sufficiently similar to each other with respect to study participants, intervention, comparator, and outcome to provide evidence on a drug’s benefits or harms. Systematic reviews where RCTs and NRSs were meta-analyzed separately were excluded.

Potential source systematic reviews containing such meta-analyses, as identified through database searches, were screened at the title and abstract level independently by 2 reviewers (M.S.K. and a research assistant). Conflicting decisions were resolved by consensus. Full texts of remaining records were screened by 1 reviewer (M.N. or M.S.K.), after double screening of a 10% sample of records showed almost perfect agreement (κ = 0.85).

For each included source systematic review, we selected 1 meta-analysis for data extraction. We extracted data for the meta-analysis of the primary outcome. In cases where the meta-analysis of the primary outcome did not include both RCTs and NRSs, we extracted the next most prominently presented outcome with the highest number of contributing RCTs and NRSs. We identified possible double-counting of original studies included in the identified meta-analyses on the basis of unique identifiers. 23 While original studies were eligible to contribute to several meta-analyses (eg, meta-analyses of the same intervention but measuring different outcomes), within each meta-analysis, only unique individual studies were included.

Meta-analysis–level and study-level information were extracted from source systematic reviews using a prespecified spreadsheet by a single researcher (M.N.). We used a guidebook with instructions for each item and data extraction was checked by a second researcher (M.S.K.) for approximately 10% of meta-analyses. Where possible, we used prespecified categories for study design characteristics (eAppendix 2 in Supplement 1 ).

We based the categorization of study designs on typologies used in previous meta-epidemiological reviews. 13 , 24 We distinguished between RCTs and NRSs, where the former was defined by the use of a random sequence to allocate study participants to intervention and control groups, and the latter by the absence of such a random sequence. We relied on the assessment made by the authors of the source reviews whether a study should be categorized as an RCT or NRS.

For NRSs, we further distinguished between experimental and observational designs, a categorization also applied by others. 13 , 25 - 27 Experimental NRSs are studies in which the investigator has some control over study conditions, including the allocation of participants into treatment and control groups (eg, clinical trials where the allocation mechanism falls short of true randomization or where allocation is by patient or physician preference). Observational NRSs lack the experimental intention of experimental NRSs, exploiting natural variation in the use of interventions to evaluate patient outcomes.

All effect size estimates were converted into log odds ratios (ORs) and coded so that an OR less than 1 indicated a beneficial effect of the drug under investigation. For meta-analyses reporting continuous outcomes, we first converted these into standardized mean differences (SMDs) 28 and then to ORs. 29 For meta-analyses with active comparators, we identified which drug was considered experimental through the descriptions provided by the authors of the source review or through web searches in cases where this could not be determined with certainty from the source review.
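The source cites standard conversion methods (refs 28, 29). One common choice, assumed here for illustration, is the logistic-distribution conversion logOR = SMD × π/√3; a minimal sketch with a hypothetical SMD follows.

```python
# Logistic-distribution conversion: logOR = SMD * pi / sqrt(3).
import math

smd = -0.30                               # hypothetical standardized mean difference
log_or = smd * math.pi / math.sqrt(3)     # convert SMD to a log odds ratio
print(f"OR = {math.exp(log_or):.2f}")     # ~0.58; OR < 1 indicates benefit here
```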

In descriptive analyses, we first plotted the summary estimates for NRSs and RCTs conducted for the same clinical question and reported the number of meta-analyses for which the NRS and RCT effect size estimates, respectively, were more favorable. Within each meta-analysis, we calculated the summary estimates and 95% CIs of NRSs and RCTs, respectively, using a random-effects Hartung-Knapp-Sidik-Jonkman meta-analysis model to take into account between-study heterogeneity. 30 , 31
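Below is a minimal sketch of one standard formulation of a random-effects summary with the Hartung-Knapp-Sidik-Jonkman (HKSJ) adjustment: a DerSimonian-Laird estimate of the between-study variance followed by the HKSJ variance correction and a t-based interval. The study-level inputs are hypothetical, and the authors' exact implementation may differ.

```python
# Random-effects summary with the HKSJ adjustment; tau^2 is estimated with
# the DerSimonian-Laird method.
import numpy as np
from scipy import stats

y = np.array([-0.51, -0.22, -0.80, 0.10])   # study log ORs (hypothetical)
v = np.array([0.04, 0.09, 0.06, 0.12])      # within-study variances (hypothetical)
k = len(y)

w = 1 / v                                    # fixed-effect weights
mu_fe = np.sum(w * y) / np.sum(w)
q = np.sum(w * (y - mu_fe) ** 2)             # Cochran's Q
tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

w_re = 1 / (v + tau2)                        # random-effects weights
mu = np.sum(w_re * y) / np.sum(w_re)         # pooled log OR

# HKSJ variance estimate and a t-based 95% CI with k - 1 degrees of freedom
var_hksj = np.sum(w_re * (y - mu) ** 2) / ((k - 1) * np.sum(w_re))
half = stats.t.ppf(0.975, df=k - 1) * np.sqrt(var_hksj)
print(f"summary OR = {np.exp(mu):.2f} "
      f"(95% CI {np.exp(mu - half):.2f} to {np.exp(mu + half):.2f})")
```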

We reported 4 measures of discrepancy. First, we reported the frequency of substantial disagreement, operationalized as the summary OR obtained from one type of study being twice as favorable as the other (ie, OR obtained from one study type was at most one-half the OR obtained from the other study type). 12 We also considered alternative cutoff values (differences in summary OR by 50% and 10%). Second, we reported the frequency of discrepancies in the summary logOR being beyond what would be expected by chance alone at the 5% significance level. 12 We compared the summary logORs for the NRS and RCT for each meta-analysis using the equation:

logROR = log(OR_NRS) − log(OR_RCT),

where ROR is the ratio of odds ratios. We then computed a 95% CI using the standard error (SE) of the logROR,

SE(logROR) = √(SE(logOR_NRS)² + SE(logOR_RCT)²),

and compared these CIs with the null value of logROR = 0. Third, we reported the frequency of meta-analyses for which the summary estimates of NRSs and RCTs, respectively, led to different statistical conclusions. A different statistical conclusion was considered to be reached if one study type produced a meta-analytic result with 95% CI excluding an OR of 1 in a particular direction and the other study type did not. Contradictory treatment effects were considered to occur when a 95% CI for the meta-analytic OR for NRSs was entirely less than 1 while that for the meta-analytic OR for RCTs was entirely greater than 1, or vice versa. This analysis did not account for differences in sample sizes between the 2 study types. Fourth, in the main, prespecified analysis, we quantified discrepancies between NRSs and RCTs through a 2-stage meta-analysis to obtain RORs for treatment effects obtained from NRSs vs RCTs. 32 The analysis was implemented in a bayesian framework, with noninformative prior distributions for the discrepancy of treatment effects between NRS and RCTs. 33 We also quantified the variation of discrepant treatment effects between NRS and RCT results across meta-analyses using the between–meta-analysis SD in discrepancies (φ) and the variation of discrepancies across studies within meta-analyses using the between-study SD in discrepancies (κ). 34 , 35 These measures indicate variation in effect size estimates obtained from different study designs; higher values indicate a wider spread in the magnitude of discrepancies between the 2 study types across meta-analyses (φ) and across individual studies within meta-analyses (κ).
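To make the first two discrepancy measures concrete, here is a minimal sketch of the ROR computation for a single meta-analysis, using the equations above with hypothetical summary estimates.

```python
# Discrepancy measures for one meta-analysis, following the equations above.
import math

log_or_nrs, se_nrs = math.log(0.60), 0.15   # NRS summary logOR and SE (hypothetical)
log_or_rct, se_rct = math.log(0.80), 0.10   # RCT summary logOR and SE (hypothetical)

log_ror = log_or_nrs - log_or_rct           # logROR = log(OR_NRS) - log(OR_RCT)
se_log_ror = math.sqrt(se_nrs ** 2 + se_rct ** 2)

lo = math.exp(log_ror - 1.96 * se_log_ror)
hi = math.exp(log_ror + 1.96 * se_log_ror)
print(f"ROR = {math.exp(log_ror):.2f} (95% CI {lo:.2f} to {hi:.2f})")
# Disagreement is "beyond chance" at the 5% level when this CI excludes 1;
# "substantial disagreement" would require an ROR of at most 0.5 or at least 2.
```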

Other measures for assessing discrepancies in treatment effects exist, such as correlation and concordance coefficients and the absolute ROR. 10 , 12 , 14 , 15 , 17 , 34 , 36 - 38 We focused on measures that we deemed important from a clinical or regulatory decision-making perspective (ie, that provide estimates of both absolute and relative discrepancies, potential differences in statistical conclusions drawn, and direction of deviation).

Analyses were implemented in Stata version 13.1 (StataCorp) and WinBUGS version 1.4.3 (Imperial College and Medical Research Council). Analysis was conducted from October 2019 to July 2024.

Subgroup analyses were conducted for prespecified characteristics at the meta-analysis level and study level. Additional subgroup analysis to explore heterogeneity in the discrepancy in treatment effects in RCTs vs NRSs was conducted by data source of NRSs, type of control in NRSs, therapeutic area, how well matched RCTs and NRSs included in a meta-analysis were, and methodological quality of source meta-analyses. Study-level characteristics were often not reported in detail in source meta-analyses, resulting in small sample sizes for most subgroups. We therefore only report the results of subgroup analyses for selected characteristics (details in eAppendix 3 in Supplement 1 ). In a post hoc sensitivity analysis, we restricted our sample to meta-analyses where NRSs were published before the first RCT.

A total of 10 957 records were screened at the title and abstract level, and 830 were reviewed in full, resulting in a total of 336 14 , 39 - 373 included records ( Figure 1 ). These 336 records contributed 346 unique meta-analyses (2 meta-epidemiological studies 14 , 174 contributed more than 1 meta-analysis), with 2746 contributing individual studies (median [range] 3 [1-92] RCTs with a median [range] 100 [5-235 600] participants and median [range] 2 [1-44] NRSs with a median [range] 195 [6-2 145 593] participants per meta-analysis). Characteristics of included meta-analyses are presented in eTable 1 in Supplement 1 and summarized in the Table .

Discrepancies between treatment effects are displayed in Figure 2 , which shows the effect size estimates obtained from RCTs and NRSs for all 346 meta-analyses. NRSs gave a more favorable effect (ie, a lower summary OR) for 186 meta-analyses (53.8%), and RCTs gave a more favorable effect for 158 meta-analyses (45.7%). Results for all measures of discrepancy are summarized in the eTable 2 in Supplement 1 . For 121 meta-analyses (35.0%), the OR obtained from one study type was twice as large or more (or one-half the OR or less) than the other, including 65 (18.8% of all meta-analyses) where NRSs indicated a substantially more beneficial effect and 56 (16.2%) where RCTs indicated a substantially more beneficial effect ( Figure 2 ). Disagreement between study types was beyond chance for 54 meta-analyses (15.6%), including 30 (8.7%) where the OR obtained from NRSs was more beneficial, and 24 (6.9%) where the OR obtained from RCTs was more beneficial. In a subgroup analysis that only included experimental NRSs, the OR from one study type was twice as favorable as the other for 55 meta-analyses (45.1% of all meta-analyses including experimental NRSs), including 36 (29.5%) where the OR obtained from experimental NRSs was one-half the OR of RCTs or less. Disagreement between study types was beyond chance for 31 meta-analyses (25.4%) with experimental NRS. The subgroup analysis for observational studies showed lower frequencies of discrepancies (eTable 2 in Supplement 1 ).

RCTs and NRSs led to different statistical conclusions about the therapeutic benefit of pharmacological interventions in 130 meta-analyses (37.6%), while 216 (62.4%) reached the same statistical conclusion, based on comparing the 95% CIs around the OR from each study type with a null effect ( Figure 3 ). In 69 meta-analyses (19.9%), NRSs showed a favorable effect while evidence obtained from RCTs was inconclusive, and in 33 meta-analyses (9.5%), RCTs showed a favorable effect while the NRS evidence was inconclusive. Contradictory treatment effects were observed in 4 meta-analyses (1.2%).
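As an aside on mechanics, this classification can be reproduced from the interval estimates alone. The following minimal R sketch (ours, not the authors' Stata code; the CI values are hypothetical) classifies a pair of 95% CIs against the null OR of 1:

```r
# Minimal sketch (not the authors' code): classify agreement between the
# statistical conclusions of two study types from their 95% CIs for the OR.
conclusion <- function(ci) {
  if (ci[1] > 1) "effect above null"        # CI excludes 1, OR > 1
  else if (ci[2] < 1) "effect below null"   # CI excludes 1, OR < 1
  else "inconclusive"                       # CI includes the null
}

agree <- function(rct_ci, nrs_ci) {
  c_rct <- conclusion(rct_ci)
  c_nrs <- conclusion(nrs_ci)
  if (c_rct == c_nrs) "same statistical conclusion"
  else if (c_rct != "inconclusive" && c_nrs != "inconclusive") "contradictory effects"
  else "one study type inconclusive"
}

agree(rct_ci = c(0.80, 1.30), nrs_ci = c(0.60, 0.90))
# "one study type inconclusive" -- hypothetical values
```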

In the main analysis, there was no evidence of a difference between effect size estimates obtained from NRSs vs RCTs on average when combining discrepancies across all 346 meta-analyses (ROR, 0.95; 95% credible interval [CrI], 0.89-1.02) ( Figure 4 ). In subgroup analyses, effect size estimates obtained from experimental NRSs were more favorable than those from RCTs (ROR, 0.81; 95% CrI, 0.68-0.97), overestimating RCT estimates by 19%, while no difference was observed between observational NRSs and RCTs (ROR, 0.98; 95% CrI, 0.87-1.06).
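For orientation, the direction convention implied by these numbers can be written out explicitly (a hedged reconstruction; the ROR is formally defined earlier in the article):

$$\mathrm{ROR} = \frac{\mathrm{OR}_{\mathrm{NRS}}}{\mathrm{OR}_{\mathrm{RCT}}}$$

With ORs below 1 denoting benefit, an ROR below 1 means the NRSs gave the more favorable estimate; the ROR of 0.81 for experimental NRSs corresponds to summary ORs 19% smaller than those of the RCTs, matching the 19% overestimation of benefit reported above.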

Variation in the discrepancy of treatment effects was present between studies within meta-analyses (κ = 0.22) and between meta-analyses (φ = 0.26). Variation between meta-analyses was reduced for meta-analyses measuring mortality (φ = 0.11) compared with other objective outcomes (φ = 0.34) or subjective outcomes (φ = 0.28). There were no systematic differences in between-meta-analysis variation (φ) or within-meta-analysis variation (κ) for the other characteristics at meta-analysis level.

Study-level data regarding analytical methods and data sources used in NRSs were only available for a subset of meta-analyses. Between-meta-analysis variation (φ) and within-meta-analysis variation (κ) were reduced for studies using propensity score methods compared with other analytical methods (eFigure in Supplement 1 ).

In 146 meta-analyses (42.2%), the first NRS was published before the first RCT. In this subset of meta-analyses, findings were consistent with the overall sample (eTable 2 in Supplement 1 ). In 53 of the 146 meta-analyses (36.3%), the summary OR was twice as favorable for one study type vs the other; in 31 meta-analyses (21.2%), the discrepancy in summary OR was beyond chance, while 50 (34.2%) reached different statistical conclusions and the ROR was 0.95 (95% CrI, 0.83-1.08) ( Figure 4 ).

This meta-analysis of 346 clinical questions using a meta-epidemiological framework did not uncover any systematic underestimation or overestimation of treatment effects in NRSs when compared with RCTs. However, this overall finding masks substantial variability in the observed differences between treatment effects derived from the 2 study types. A considerable number of meta-analyses exhibited discrepancies in effect size estimates, with some cases showing effect size estimates differing by a factor of 2 or more. Estimates of the variation in discrepancies show that decision-makers face uncertainty around both the direction and magnitude of potential disagreement between RCTs and NRSs; NRSs both overestimated and underestimated treatment effects observed in randomized studies.

Our study extends previous research investigating the comparability of treatment effects derived from RCTs and NRSs. 18 In particular, it provides findings across a broad range of therapeutic areas, reflects how NRSs were designed and implemented for the clinical questions included, and quantifies the uncertainty associated with treatment effects derived from NRSs. Previous meta-epidemiological reviews yielded mixed results, 9 - 13 with factors such as outcome types, 24 study timing, 14 and the analytical methods used in NRSs 15 , 18 , 375 , 376 contributing to discrepancies across reviews. In our study, 37.6% of meta-analyses reached different statistical conclusions regarding the effectiveness of a drug depending on the type of study design considered, and 62.4% reached the same statistical conclusion. This finding broadly aligns with a recent study 19 that sought to emulate highly selected RCTs using administrative data, yielding concordant conclusions for 56% of emulated trials. Our approach differed from this study 19 and other observational studies 377 that aim to emulate RCTs by design using the target trial approach. By applying strict criteria to emulate RCTs, these observational studies aim to obtain the same estimand of effectiveness as the target trial. Other NRSs do not necessarily aim to replicate RCTs, and discrepancies in effect size estimates may reflect differences in study design, implementation, and populations. From a decision-maker's perspective, what matters is the availability of clinical evidence; in situations with uncertainty about the effectiveness of a treatment, NRSs of any design are likely to inform decision-making. Target trial emulation studies apply advanced methodological standards, but important data limitations constrain their implementation. 378 While they are becoming more common, 379 they represent a small subset of all NRSs evaluating treatment effects. It is therefore important to understand how the body of evidence from NRSs overall compares with evidence obtained from RCTs.

We also provide novel evidence on how different types of study designs, analytical methods, and data used in NRSs perform when compared against RCTs. We found that effect size estimates obtained from experimental NRSs were systematically more favorable than those obtained from RCTs, overestimating RCT estimates by 19%. Experimental NRSs share important validity traits with RCTs, such as a controlled environment for administering the treatment and strict participant inclusion criteria. Nevertheless, the absence of random participant allocation in these studies can introduce bias through confounding. For 45.1% of meta-analyses, experimental NRSs showed treatment effects at least twice as favorable as those of RCTs.

Our study has important policy implications. NRSs are playing an increasingly important role in decisions about the approval and reimbursement of new drugs. 380 - 383 Between 2015 and 2017, approximately 18% of new drugs gained approval in the US based on NRSs, up from just 6% between 1995 and 1997. 384 In draft guidance, the US Food and Drug Administration (FDA) names observational data as potentially suitable evidence for drug approval, replacing the previously used standard of 2 independent clinical studies. 385 It is therefore important to understand the benefits and risks of relying on NRSs for the evaluation of new drugs. While we found no systematic difference overall in treatment effects obtained from randomized and observational studies, there was considerable disagreement about therapeutic benefit (eTable 2 in Supplement 1 ).

Our study has implications for practice. Although RCTs are the mainstay of clinical practice guidelines, there are valid concerns about their cost and complexity. 386 RCTs may also be at high risk of bias due to problems with their design, conduct, analysis, and reporting. 387 Despite these concerns, our findings underline their importance because the conclusions about a drug’s effect may differ when based on NRSs. In our study, the statistical conclusions about a drug’s treatment effect were different for almost 4 in 10 clinical questions. In the past, medical reversals occurred because RCTs provided conclusive evidence about the benefits and harms of long-standing medical practices that were based on evidence obtained from NRSs. 388 - 392 Yet, there appears to be a limited effort to simplify the design and conduct of RCTs. As the push toward NRSs gains more traction, it could potentially impede the necessary progress required to improve the feasibility of RCTs. 393

This study has limitations. This is an observational study, which limits causal interpretation of the results. 22 We included 346 distinct clinical questions that were the subject of meta-analyses published from 2009 to 2018. While this represents, to our knowledge, the largest sample of clinical questions in a meta-epidemiological study comparing RCTs and NRSs, more recent clinical questions, in particular those relating to COVID-19, 17 were not included.

We included only meta-analyses where researchers combined both RCTs and NRSs in the same meta-analysis. While the 2 designs may not study the same estimand, the fact that they are pooled in the same meta-analysis suggests that the researchers considered them both to provide relevant evidence for decision-makers about whether the treatment is effective or harmful. It is therefore important to understand how their effect size estimates compare. The methodological decision to include meta-analyses where RCTs and NRSs were combined likely resulted in a sample more representative of clinical questions with overall limited levels of evidence (otherwise, only RCTs would be expected to be included in a meta-analysis). Including both study types in the same meta-analysis may also reflect limited methodological understanding of the authors of source meta-analyses, but our conclusions did not change when restricting our sample to meta-analyses conducted by Cochrane groups or those published in high-impact journals. Excluding clinical questions where researchers determined that there were substantial differences between the 2 study types—possibly due to observed differences in results—may have resulted in an underestimation of the true difference between treatment effects obtained from RCTs and NRSs.

In this meta-analysis using a meta-epidemiological framework, we found substantial disagreement between nonrandomized and randomized studies about the magnitude of effect and the statistical conclusions regarding the therapeutic effect of pharmacological interventions for a large subset of clinical questions. While there was overall no systematic difference in effect size estimates obtained from NRSs vs RCTs, experimental NRSs produced treatment effect estimates that were 19% more favorable than those of RCTs. Our findings suggest that caution is warranted when relying on NRSs as substitutes for RCTs.

Accepted for Publication: August 4, 2024.

Published: September 27, 2024. doi:10.1001/jamanetworkopen.2024.36230

Open Access: This is an open access article distributed under the terms of the CC-BY License . © 2024 Salcher-Konrad M et al. JAMA Network Open .

Corresponding Author: Maximilian Salcher-Konrad, MSc, Department of Health Policy, The London School of Economics and Political Science, Houghton Street, London WC2A 2AE, United Kingdom ( [email protected] ).

Author Contributions: Mr Salcher-Konrad had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Salcher-Konrad, Savović, Naci.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Salcher-Konrad.

Critical review of the manuscript for important intellectual content: All authors.

Statistical analysis: Salcher-Konrad, Higgins.

Obtained funding: Salcher-Konrad, Naci.

Supervision: Salcher-Konrad, Naci.

Conflict of Interest Disclosures: Mr Salcher-Konrad reported receiving nonfinancial support from Medicines for Europe (travel and accommodation fees for attendance at a conference) outside the submitted work. Dr Savović reported receiving grants from the National Institute for Health and Care Research and personal fees from Core Models Ltd (to teach on an online course about basic systematic review methods) and JEMMDx Limited (to virtually attend a 1-day expert meeting to provide input into a discussion of evidence and pathway fit for the MeMed BV diagnostic test) and nonfinancial support from the University of Washington (travel expenses reimbursed for attending the Society of Research Synthesis Methods Conference in 2023 to present the development of latitudes-network.org, the development of which was supported by a grant from University of Washington) outside the submitted work. Dr Naci reported receiving grants from the Commonwealth Fund, Health Foundation, and National Institute for Health and Care Research; and personal fees from the World Health Organization and The BMJ outside the submitted work. No other disclosures were reported.

Funding/Support: This study has received funding from the European Union Horizon 2020 Research and Innovation Programme (grant agreement No. 779312 to Mr Salcher-Konrad and Drs Nguyen and Naci) and the National Institute for Health and Care Research Applied Research Collaboration West (NIHR ARC West) at University Hospitals Bristol and Weston National Health Service (NHS) Foundation Trust (to Drs. Savović and Higgins).

Role of the Funder/Sponsor: The sponsors had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Disclaimer: The views expressed in this article are those of the authors and do not necessarily represent those of the NHS, the NIHR, or the Department of Health and Social Care.

Data Sharing Statement: See Supplement 2 .

Open access | Published: 27 September 2024

Integrating randomized controlled trials and non-randomized studies of interventions to assess the effect of rare events: a Bayesian re-analysis of two meta-analyses

  • Yun Zhou 1, 2,
  • Minghong Yao 1, 3, 4,
  • Fan Mei 1, 3, 4,
  • Yu Ma 1, 3, 4,
  • Jiayidaer Huan 1, 3, 4,
  • Kang Zou 1, 3, 4,
  • Ling Li 1, 3, 4 &
  • Xin Sun 1, 3, 4, 5

BMC Medical Research Methodology volume 24, Article number: 219 (2024)


There is a growing trend to include non-randomised studies of interventions (NRSIs) in rare events meta-analyses of randomised controlled trials (RCTs) to complement the evidence from the latter. An important consideration when combining RCTs and NRSIs is how to address potential bias and down-weighting of NRSIs in the pooled estimates. The aim of this study is to explore the use of a power prior approach in a Bayesian framework for integrating RCTs and NRSIs to assess the effect of rare events.

We proposed a method of specifying the down-weighting factor based on judgments of the relative magnitude (no information, and low, moderate, serious and critical risk of bias) of the overall risk of bias for each NRSI using the ROBINS-I tool. The methods were illustrated using two meta-analyses, with particular interest in the risk of diabetic ketoacidosis (DKA) in patients using sodium/glucose cotransporter-2 (SGLT-2) inhibitors compared with active comparators, and the association between low-dose methotrexate exposure and melanoma.

No significant results were observed for these two analyses when the data from RCTs only were pooled (risk of DKA: OR = 0.82, 95% confidence interval (CI): 0.25–2.69; risk of melanoma: OR = 1.94, 95%CI: 0.72–5.27). When RCTs and NRSIs were directly combined without distinction in the same meta-analysis, both meta-analyses showed significant results (risk of DKA: OR = 1.50, 95%CI: 1.11–2.03; risk of melanoma: OR = 1.16, 95%CI: 1.08–1.24). Using Bayesian analysis to account for NRSI bias, there was a 90% probability of an increased risk of DKA in users receiving SGLT-2 inhibitors and a 91% probability of an increased risk of melanoma in patients using low-dose methotrexate.

Conclusions

Our study showed that including NRSIs in a meta-analysis of RCTs for rare events could increase the certainty and comprehensiveness of the evidence. The estimates obtained from NRSIs are generally considered to be biased, and the possible influence of NRSIs on the certainty of the combined evidence needs to be carefully investigated.


Introduction

Evidence from high-quality randomized controlled trials (RCTs) is considered the gold standard for assessing the relative effects of health interventions [ 1 ]. However, RCTs have a strictly experimental setting and their inclusion criteria may limit their generalizability to real-world clinical practice [ 2 ]. Meta-analyses often ignore evidence from non-randomized studies of interventions (NRSIs) because their estimates of relative effects are more likely to be biased, especially if bias has not been adequately addressed. In recent years, there has been considerable development in the methods used in NRSIs, with a particular focus on causal inference [ 3 ]. NRSIs could complement the evidence provided by RCTs and potentially address some of their limitations, especially in cases where an RCT may be impossible to conduct (e.g., rare diseases), inadequate (e.g., lower external validity), or inappropriate (e.g., when studying rare adverse or long-term events) [ 4 ].

The study of rare events is one scenario in which evidence from NRSIs complements that from RCTs [ 4 ]. Rare events often occur when investigating rare adverse effects of health interventions. The results of RCTs may be very sparse due to small sample sizes and short follow-up periods [ 5 ], with some trials not observing any events at all, resulting in low statistical power [ 6 ]. NRSIs are important for studying rare adverse events because of their larger sample sizes and longer follow-up [ 7 ]. NRSIs are increasingly included in systematic reviews and meta-analyses evaluating rare adverse events to complement the evidence from RCTs [ 8 ]. Several tools, frameworks, and guidelines exist to facilitate the combination of evidence from RCTs and NRSIs [ 4 , 9 , 10 , 11 ]. However, the inclusion of NRSIs in a meta-analysis of RCTs is a complex challenge because estimates derived from NRSIs should be interpreted with caution [ 12 ].

Bun et al. [ 8 ] reviewed meta-analyses that included both RCTs and NRSIs published between 2014 and 2018 in five leading journals and the Cochrane Database of Systematic Reviews. They found that 53% of studies combined RCTs and NRSIs in the same meta-analysis without distinction. However, there are fundamental differences between RCTs and NRSIs in design, conduct, data collection, and analysis [ 4 ]. These differences may raise questions about potential bias and conflicting evidence between studies. Therefore, combining results while ignoring design type may lead to misleading conclusions [ 13 ]. Statistical methods for generalized evidence synthesis have been proposed to combine evidence from RCTs and NRSIs [ 13 , 14 , 15 ]. Verde and Ohmann [ 14 ] provided a comprehensive review of the methods and applications for combining evidence from NRSIs and RCTs over the last two decades. They categorized statistical approaches into four main groups: the confidence profile method [ 16 ], cross-design synthesis [ 17 ], direct likelihood bias modelling, and Bayesian hierarchical modelling [ 18 ]. Bayesian methods are gaining increasing attention because of their outstanding flexibility in combining information from multiple sources. Verde [ 15 ] recently proposed a bias-corrected meta-analysis model for combining studies of different types and quality. Yao et al. [ 13 ] conducted an extensive simulation study to evaluate an array of alternative Bayesian methods for incorporating NRSIs into rare events meta-analyses of RCTs, and found that the bias-corrected meta-analysis model yielded favorable results.

Most methods are based on normal approximations for both RCTs and NRSIs, because only aggregated data (i.e. the treatment effect estimates with their corresponding standard errors) are usually available for NRSIs. Most of these methods use the RCTs as an anchor and adjust for bias in the NRSIs to obtain a pooled estimate [ 19 ]. However, there are dangers in using a normal approximation for rare events meta-analyses of RCTs [ 20 ]: if the RCT anchor is modelled poorly, the final pooled result will be affected. In the context of rare events meta-analyses of RCTs, many studies have confirmed that the use of exact likelihoods, such as the binomial-normal hierarchical model, may be preferable [ 21 ].

To account for the differences in study design between RCTs and NRSIs, a power prior approach is a good option [ 22 ]. This approach allows the NRSIs to be down-weighted, so that data from this type of study contribute less than data from RCTs of the same precision. In this study, we used exact likelihoods for the RCTs as an anchor, and an informative prior distribution on the treatment effect parameter was derived from the NRSIs through a power prior method [ 23 ]. Because this method does not depend on normal approximations for the RCT data, the results may be more accurate. An important consideration for the power prior approach is how to set the value of the down-weighting factor to account for potential bias in the pooled estimates [ 4 ]. A common approach is to elicit expert opinion regarding the range of plausible values for the bias parameters [ 24 , 25 ], but this process is time consuming and it can be difficult to pool opinions from different experts [ 26 ].

Therefore, the aim of this study was to explore the use of a power prior within a Bayesian framework to integrate RCTs and NRSIs [ 27 ]. This approach does not adjust the point estimates for possible bias; it only down-weights the NRSIs in the pooled estimates, with uncertainty reflected in the down-weighting factor. We also propose a way of specifying the down-weighting factor based on judgments of the relative magnitude (no information, and low, moderate, serious, and critical risk of bias) of the overall risk of bias for each NRSI using the ROBINS-I tool [ 28 ], leading to transparent probabilities and therefore more informed decision making.

For this study, we re-analyzed two recently published meta-analyses: one of the risk of diabetic ketoacidosis (DKA) in patients using sodium/glucose cotransporter 2 (SGLT-2) inhibitors compared with active comparators [ 29 ], and one of the association between low-dose methotrexate exposure and melanoma [ 30 ]. Our study did not require ethics committee approval or patient consent, as it is a secondary analysis of publicly available datasets.

The first meta-analysis was conducted by Alkabbani et al. [ 29 ]. This study used evidence from RCTs and NRSIs to investigate the risk of DKA associated with one or more individual SGLT-2 inhibitors. The meta-analysis included twelve placebo-controlled RCTs, seven active-comparator RCTs, and seven observational studies. All the NRSIs were retrospective, propensity score-matched cohort studies. Our primary concern was whether SGLT-2 inhibitors increased the risk of DKA compared with the active comparator. We included all studies in the initial analysis, then we performed a sensitivity analysis by excluding one NRSI because its control was not an active comparator [ 31 ]. The second meta-analysis was done by Yan et al. [ 30 ]. This meta-analysis included six RCTs and six NRSIs for the primary analysis. For the NRSIs, two case-control studies and four cohort studies were included.

Assessment of the risk of bias

The risk of bias is assessed at the outcome level rather than the study level; if a study includes multiple outcomes, multiple risk-of-bias assessments should be performed. Widely available tools exist to assess the risk of bias of outcomes from both RCTs and NRSIs [ 32 , 33 ]. Of the two meta-analyses, the first originally assessed the quality of both RCTs and NRSIs using the checklist proposed by Downs et al. [ 34 ], while the second used the Cochrane risk-of-bias tool [ 35 ] for RCTs and the Joanna Briggs Institute checklist [ 36 ] for NRSIs. In this study, we reassessed the risk of bias for each study included in the two meta-analyses. The revised Cochrane risk-of-bias tool (RoB 2) was used for RCTs [ 35 ], as this is established practice for assessing the quality of RCTs. RoB 2 covers the following domains: bias arising from the randomization process, bias due to deviations from intended interventions, bias due to missing outcome data, bias in the measurement of the outcome, and bias in the selection of the reported result [ 35 ]. Each domain is classified into one of three categories: “low risk of bias,” “some concerns,” or “high risk of bias” [ 35 ]. The response options for the overall risk-of-bias judgment are the same as for the individual domains.

The choice of assessment tool for NRSIs is a critical consideration, as it may affect the selection of NRSIs for quantitative analysis and the credibility of subsequent meta-analysis results. For NRSIs, we used the ROBINS-I tool, which covers most of the issues commonly encountered in NRSIs [ 9 ] and assesses the risk of bias of an NRSI in relation to an ideal (or target) RCT as the standard of reference [ 28 ]. In other words, an NRSI judged to have a low risk of bias using ROBINS-I is comparable to a well-conducted RCT [ 37 ]. ROBINS-I covers the following domains: pre-intervention (bias due to confounding, bias in the selection of participants into the study), at intervention (bias in classification of interventions), and post-intervention (bias due to deviations from intended interventions, bias due to missing data, bias in measurement of outcomes, and bias in selection of the reported result). Each domain is classified into one of five categories: “low risk of bias,” “moderate risk of bias,” “serious risk of bias,” “critical risk of bias,” or “no information.” The “no information” category should be used only when insufficient data are reported to permit a judgment of bias. The response options for the overall risk-of-bias judgment are again the same as for the individual domains.

In the Bayesian analysis section below, we show how to specify the down-weighting factor based on judgments of the relative magnitude (i.e. no information, and low, moderate, serious, and critical risk of bias) of the overall risk of bias for each NRSI using the ROBINS-I tool.

The conventional random effects model

The pooled odds ratio (OR) was calculated for both meta-analyses using the conventional random-effects model, also known as naïve data synthesis. A random-effects model was employed to account for potential between-study heterogeneity; this is also the method most commonly used in empirical analyses [ 8 ]. Between-study variance was estimated using restricted maximum likelihood. The level of variability due to heterogeneity rather than chance was assessed using the I 2 statistic, and subgroup analyses were conducted by type of study design (RCTs vs. NRSIs). We used a continuity correction (adding 0.5) for zero-event trials. All analyses were performed with R software (version 4.1.1, R Foundation for Statistical Computing, Vienna, Austria) using the meta package (version 4.19-0) [ 38 ].
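As a concrete illustration, a minimal R sketch of this naïve synthesis might look as follows. The counts are made up, and the subgroup argument name reflects meta version 4.x as used here (it is `subgroup` in meta >= 5.0):

```r
# Minimal sketch of the naive (conventional) synthesis described above.
# Counts are invented for illustration; this is not the authors' code.
library(meta)

dat <- data.frame(
  study  = c("RCT 1", "RCT 2", "NRSI 1", "NRSI 2"),
  design = c("RCT", "RCT", "NRSI", "NRSI"),
  e.t = c(1, 0, 120, 85),  n.t = c(500, 420, 60000, 41000),
  e.c = c(0, 1,  90, 70),  n.c = c(495, 430, 61000, 40500)
)

m <- metabin(
  event.e = e.t, n.e = n.t, event.c = e.c, n.c = n.c,
  studlab = study, data = dat,
  sm = "OR", method = "Inverse",  # pooled odds ratio, inverse variance
  method.tau = "REML",            # REML estimate of between-study variance
  incr = 0.5,                     # continuity correction for zero cells
  byvar = design                  # subgroup analysis by study design
)
summary(m)  # pooled OR with 95% CI, I^2, and RCT/NRSI subgroup results
```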

Bayesian analysis

We used the power prior method to combine the data from RCTs and NRSIs: the likelihood contribution of each NRSI, raised to a power parameter \(\alpha\), is combined with the likelihood of the RCT data [ 22 ]. The power prior approach allows the NRSIs to be down-weighted, so that data from this type of study contribute less than data from RCTs of the same precision. The power prior is constructed as the product of an initial prior and the likelihood of the NRSI data raised to a down-weighting factor \(\alpha \in [0, 1]\) [ 22 ]:

$$\pi(\mu \mid \mathrm{NRSI}, \alpha) \propto L(\mu \mid \mathrm{NRSI})^{\alpha}\, \pi(\mu),$$

where \(\pi(\mu)\) is the initial prior before the NRSI data are observed; \(\alpha = 0\) means the NRSI is entirely discounted, and \(\alpha = 1\) means the NRSI is taken at face value. \(\alpha\) is often fixed, specified according to the confidence placed in the NRSIs, or determined dynamically from the NRSI and RCT data [ 39 ]. Here, we treated \(\alpha\) as a random parameter to be estimated using full Bayesian methodology. For multiple NRSIs, we assigned an independent down-weighting factor \(\alpha_m\) to each NRSI [ 40 ], \(\boldsymbol{\alpha} = \{\alpha_1, \alpha_2, \dots, \alpha_M\}\). Assuming \(M\) NRSIs and \(K\) RCTs, the overall joint posterior distribution is given by [ 41 ]:

$$p(\mu, \boldsymbol{\alpha} \mid \mathrm{RCT}, \mathrm{NRSI}) \propto L(\mu \mid \mathrm{RCT}) \left[\prod_{m=1}^{M} L(\mu \mid \mathrm{NRSI}_m)^{\alpha_m}\, \pi(\alpha_m)\right] \pi(\mu),$$

where \(L(\mu \mid Y)\) is the likelihood of \(\mu\) given data \(Y\): the data are split into the part obtained from RCTs and the part from NRSIs to form separate likelihood contributions, which are then combined (with the down-weighting factor applied to the NRSI data) to give the overall posterior distribution.

The likelihood of non-randomized studies of interventions

The NRSIs’ is modelled using the normal-normal hierarchical random effects meta-analysis model with a weight indexed by \(\:\varvec{\alpha\:}\) . The model can be written as:

Where m  = 1, 2, …, M denotes NRSI m . \(\:{\widehat{\theta\:}}_{m}\) and \(\:SE\left({\widehat{\theta\:}}_{m}\right)\) are the observed relative treatment effect and the corresponding standard error for study m , respectively. Both the treatment effect ( \(\:{\widehat{\theta\:}}_{m}\) ) and standard error ( \(\:SE\left({\widehat{\theta\:}}_{m}\right)\) ) are calculated on the log OR scale. \(\:{\theta\:}_{m}\) denotes the true treatment effect for study m . \(\:\mu\:\) represents the overall combined effect and \(\:{\tau\:}_{NRSI}^{2}\) is the between-study variance. We assign a weakly informative prior (WIP) to the treatment effect and the heterogeneity parameter, which is a normal prior with mean 0 and standard deviation 2.82 for the treatment effect \(\:[\mu \: \sim N\left( {{\rm{0,2}}{\rm{.82}}} \right)]\) [ 42 ] and a half-normal prior with scale of 0.5 for the heterogeneity parameter \(\:\tau {\:_{NRSI}}\:\left[ {\tau {\:_{NRSI}} \sim HN\left( {0.5} \right)} \right]\) [ 43 ]. The WIP of the treatment effect has two advantages. First, the normal prior is symmetric and the OR is constrained from 1/250 to 250 with a 95% probability. Second, it was consistent with effect estimates obtained from 37,773 meta-analysis datasets published in the Cochrane Database of Systematic Reviews [ 42 ].

The likelihood of randomized controlled trials

We consider a set of \(K\) RCTs with a binary outcome. In each trial \(i \in \{1, 2, \dots, K\}\), \(\pi_{it}\) (\(\pi_{ic}\)), \(n_{it}\) (\(n_{ic}\)), and \(r_{it}\) (\(r_{ic}\)) denote the probability of the event, the number of subjects, and the event count in the treatment (control) group, respectively. The number of events is modelled as binomially distributed: \(r_{it} \sim Bin(n_{it}, \pi_{it})\) and \(r_{ic} \sim Bin(n_{ic}, \pi_{ic})\). Under a random-effects assumption, a commonly used Bayesian binomial-normal hierarchical model can be written as follows [ 44 , 45 ]:

$$\mathrm{logit}(\pi_{ic}) = \mu_i, \qquad \mathrm{logit}(\pi_{it}) = \mu_i + \theta_i, \qquad \theta_i \sim N\!\left(\mu, \tau_{RCT}^2\right),$$

where the \(\mu_i\) are fixed effects describing the baseline risk of the event in study \(i\), \(\mu\) is the mean treatment effect, and \(\tau_{RCT}\) measures the heterogeneity of treatment effects across RCTs.

To ensure full Bayesian inference, we specify prior distributions for the parameters \(\mu_i\) and \(\tau_{RCT}\). For \(\mu_i\), we assume a vague normal prior, \(\mu_i \sim N(0, 10^2)\). A weakly informative half-normal prior with scale 0.5 is assigned to the heterogeneity parameter, \(\tau_{RCT} \sim HN(0.5)\) [ 46 ].

Down-weighting factor

The down-weighting factor can be interpreted as reflecting the quality of a study, so we set its magnitude according to the risk of bias of each NRSI [ 47 , 48 ]. This approach follows standard health technology assessment methods, where the risk of bias is assessed at the individual outcome level. Under ROBINS-I, an NRSI is classified as having a “low risk of bias,” “moderate risk of bias,” “serious risk of bias,” “critical risk of bias,” or “no information” based on the risk-of-bias assessment. If an NRSI was assessed as having a low risk of bias, we set the down-weighting factor to 1, because a low risk of bias under ROBINS-I indicates that the quality of the study is comparable to that of a well-conducted RCT [ 37 ].

For the other categories, we treat \(\alpha_m\) as a random variable on \([0, 1]\) and model it with a beta distribution,

$$\alpha_m \sim \mathrm{Beta}(\nu, 1).$$

To elicit a value of \(\nu\), we can use the prior mean [ 15 ], which is

$$E(\alpha_m) = \frac{\nu}{\nu + 1}.$$

Taking \(\nu = 0.5\), for example, corresponds to an average down-weighting of \(1 - E(\alpha_m) = 0.67\) for low-quality studies.

We set the down-weighting factor as \(\alpha_m \sim \mathrm{Beta}(4, 1)\) for an NRSI assessed as having a moderate risk of bias, corresponding to an average down-weighting of \(1 - E(\alpha_m) = 0.2\); \(\alpha_m \sim \mathrm{Beta}(1.5, 1)\) for a serious risk of bias, corresponding to an average down-weighting of 0.4; and \(\alpha_m \sim \mathrm{Beta}(0.25, 1)\) for a critical risk of bias, corresponding to an average down-weighting of 0.8 [ 49 ]. If a study was rated as “no information,” we conservatively handled it as having a critical risk of bias.
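Putting these pieces together, the following is a minimal RStan sketch of the combined model: the exact binomial likelihood for the RCTs, the normal likelihood for the NRSIs raised to the power \(\alpha_m\) (the unnormalized power prior described above), and per-study beta priors on \(\alpha_m\). This is our illustration under the stated assumptions, not the authors' supplementary code; all data and parameter names are ours.

```r
# Hedged sketch of the combined power prior model (not the authors' code).
# Data names (r_t, n_t, theta_hat, se, a, b, ...) are illustrative.
library(rstan)

stan_code <- "
data {
  int<lower=1> K;                             // number of RCTs
  int<lower=0> r_t[K]; int<lower=1> n_t[K];   // treatment arm events/sizes
  int<lower=0> r_c[K]; int<lower=1> n_c[K];   // control arm events/sizes
  int<lower=1> M;                             // number of NRSIs
  vector[M] theta_hat;                        // observed log ORs from NRSIs
  vector<lower=0>[M] se;                      // their standard errors
  vector<lower=0>[M] a;                       // beta prior shapes for alpha_m,
  vector<lower=0>[M] b;                       // chosen from ROBINS-I judgments
}
parameters {
  real mu;                                    // pooled log OR
  vector[K] mu_i;                             // baseline log odds per RCT
  vector[K] theta_rct;                        // study-specific log ORs (RCTs)
  vector[M] theta_nrsi;                       // study-specific log ORs (NRSIs)
  real<lower=0> tau_rct;
  real<lower=0> tau_nrsi;
  vector<lower=0,upper=1>[M] alpha;           // down-weighting factors
}
model {
  // weakly informative priors, as described in the text
  mu ~ normal(0, 2.82);
  mu_i ~ normal(0, 10);
  tau_rct ~ normal(0, 0.5);                   // half-normal via lower bound
  tau_nrsi ~ normal(0, 0.5);
  theta_rct ~ normal(mu, tau_rct);
  theta_nrsi ~ normal(mu, tau_nrsi);
  alpha ~ beta(a, b);
  // exact binomial likelihood for the RCTs
  r_c ~ binomial_logit(n_c, mu_i);
  r_t ~ binomial_logit(n_t, mu_i + theta_rct);
  // NRSI likelihood raised to alpha_m (unnormalized power prior)
  for (m in 1:M)
    target += alpha[m] * normal_lpdf(theta_hat[m] | theta_nrsi[m], se[m]);
}
"
```

The shape vectors a and b would be filled in from the ROBINS-I judgments (e.g., a = 4, b = 1 for a moderate risk of bias); for a study at low risk of bias, one would fix \(\alpha_m = 1\) rather than estimate it.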

Sensitivity analysis

Spiegelhalter and Best [ 50 ] proposed giving a set of fixed values (e.g., 0.1, 0.2, 0.3, 0.4) to discount low-quality studies and performing a sensitivity analysis. Efthimiou et al. [ 51 ] set the down-weighting factor using uniform distributions, with uniform(0, 0.3), uniform(0.3, 0.7), and uniform(0.7, 1) representing low, medium, and high confidence in the quality of the evidence, respectively. We therefore performed a sensitivity analysis comparing the results of our method with those of Spiegelhalter and Best [ 50 ] and Efthimiou et al. [ 51 ]. For the method of Spiegelhalter and Best [ 50 ], we produced a set of results using the fixed values 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1. For the method of Efthimiou et al. [ 51 ], we set the down-weighting factor as \(\alpha_m \sim\) uniform(0, 0.3), \(\alpha_m \sim\) uniform(0.3, 0.7), or \(\alpha_m \sim\) uniform(0.7, 1) for NRSIs assessed as having a critical, serious, or moderate risk of bias, respectively.
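In the sketch above, these sensitivity analyses only change where \(\alpha_m\) comes from. A hypothetical mapping from ROBINS-I judgments to the uniform bounds of Efthimiou et al. might look like this (the `rob` labels and `stan_data` list are illustrative, and a model variant with `alpha ~ uniform(lo, hi)` is assumed):

```r
# Hypothetical mapping from ROBINS-I judgments to uniform prior bounds.
rob <- c("moderate", "moderate", "serious", "critical")  # one per NRSI
bounds <- list(critical = c(0.0, 0.3),
               serious  = c(0.3, 0.7),
               moderate = c(0.7, 1.0))
lo <- vapply(rob, function(x) bounds[[x]][1], numeric(1))
hi <- vapply(rob, function(x) bounds[[x]][2], numeric(1))
stan_data <- c(stan_data, list(lo = lo, hi = hi))  # assumes stan_data exists

# For the fixed-weight analysis of Spiegelhalter and Best, alpha is not
# estimated at all; one would instead rerun the model over a grid such as
alphas <- seq(0.1, 1, by = 0.1)
```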

Model implementation

All point estimates (ORs) are presented with 95% credible intervals (CrIs). In addition, we calculated the posterior probabilities of any increased risk (OR > 1) and of a meaningful clinical association (defined as OR > 1.15, i.e., at least a 15% increase in the odds of the outcome) [ 52 ]. We assessed the posterior distribution of the between-study standard deviation (\(\tau\), a proxy for heterogeneity) by calculating the posterior probabilities of small (\(\tau \in (0, 0.1)\)), reasonable (\(\tau \in (0.1, 0.5)\)), fairly high (\(\tau \in (0.5, 1)\)), and fairly extreme (\(\tau \in (1, \infty)\)) heterogeneity [ 53 ].

We performed the Bayesian analysis using the RStan package (version 2.21.3). We fitted four chains for each model, each with 5,000 iterations, discarding the first 2,500 iterations of each chain as warm-up and retaining the remaining 2,500 without thinning. We performed convergence checks; convergence was judged to have occurred when \(\hat{R}\) (the potential scale reduction factor) was no greater than 1.1 for all parameters [ 54 ]. Convergence was achieved for all models.
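A hedged sketch of fitting and summarizing the model follows, assuming the `stan_code` string from the sketch above and a `stan_data` list holding the fields declared in its data block; parameter names are ours, not the authors'.

```r
# Fit the model with the settings described in the text.
fit <- rstan::stan(model_code = stan_code, data = stan_data,
                   chains = 4, iter = 5000, warmup = 2500, thin = 1,
                   seed = 2024)

# Convergence check: potential scale reduction factor <= 1.1
all(summary(fit)$summary[, "Rhat"] <= 1.1, na.rm = TRUE)

draws <- rstan::extract(fit)
mu  <- draws$mu                          # pooled log OR
tau <- draws$tau_nrsi                    # between-study SD (NRSI part)

quantile(exp(mu), c(0.025, 0.5, 0.975))  # OR with 95% credible interval
mean(exp(mu) > 1)                        # P(any increased risk)
mean(exp(mu) > 1.15)                     # P(meaningful clinical association)

# Posterior probabilities of the heterogeneity categories in the text
c(small          = mean(tau < 0.1),
  reasonable     = mean(tau >= 0.1 & tau < 0.5),
  fairly_high    = mean(tau >= 0.5 & tau < 1),
  fairly_extreme = mean(tau >= 1))
```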

Study characteristics

Tables  1 and 2 show the basic characteristics of the included RCTs and NRSIs for the first and second meta-analyses, respectively. In the first meta-analysis, the majority of subjects were men, with mean ages ranging from 46.0 to 74.2 years across studies. Length of follow-up ranged from 0.54 to 2 years for the RCTs and from 0.5 to 12 years for the NRSIs. A total of 8 DKA outcomes were reported across all RCTs, which included 8,100 patients, resulting in an incidence rate of 0.1%. Across all NRSIs, we observed 2,693 DKA events in 1,311,868 patients, for an incidence rate of 0.2%.

In the second meta-analysis, the majority of subjects were women, with mean ages ranging from 53.0 to 74.0 years across studies. Length of follow-up ranged from 6 to 27.6 months for the RCTs and from 6 to 16.4 months for the NRSIs. A total of 21 melanoma outcomes were reported in six RCTs involving 11,810 patients, giving an incidence rate of 0.2%. Across all NRSIs, 16,628 melanoma outcomes were observed in 773,876 patients, for an incidence rate of 2.1%.

Risk of Bias Assessment

The risk-of-bias assessment for the studies included in the two meta-analyses is detailed in Tables S1-S4 in the supplementary material. In the first meta-analysis, 4 RCTs were assessed as having “some concerns,” 2 as “low risk of bias,” and 1 as “high risk of bias”; 5 NRSIs were assessed as “moderate risk of bias” and 2 as “serious risk of bias.” In the second meta-analysis, 4 RCTs were assessed as having “some concerns” and 2 as “low risk of bias”; among the NRSIs, 2 were assessed as “moderate risk of bias” and 4 as “serious risk of bias.”

The results of the conventional random effects model

Figures  1 and 2 show the combined results of RCTs and NRSIs using the conventional random-effects model for the first and second meta-analyses. For the risk of DKA among users receiving SGLT-2 inhibitors versus active comparators, we observed an increased risk of DKA when data from RCTs and NRSIs were pooled directly (OR = 1.50, 95%CI: 1.11–2.03, I 2  = 82%) and when data from NRSIs alone were pooled (OR = 1.56, 95%CI: 1.13–2.15, I 2  = 90%), whereas no significant effect was observed when results from RCTs alone were pooled (OR = 0.82, 95%CI: 0.25–2.69, I 2  = 0%). The weight of the RCTs in the total body of evidence was only 6.1%.

figure 1

Odds ratio of diabetic ketoacidosis among patients receiving sodium-glucose co-transporter-2 inhibitors versus active comparators in randomized control trials and non-randomized studies of intervention

figure 2

Odds ratio of melanoma among patients with low-dose methotrexate exposure in randomized control trials and non-randomized studies of intervention

For the association between low-dose methotrexate exposure and melanoma, we also observed an increased risk of melanoma when data from RCTs and NRSIs were pooled directly (OR = 1.16, 95%CI: 1.08–1.24, I 2  = 0%) and when data from NRSIs alone were pooled (OR = 1.14, 95%CI: 1.04–1.26, I 2  = 0%), while no significant effect was observed when results from RCTs alone were pooled (OR = 1.94, 95%CI: 0.72–5.27, I 2  = 0%).

The results of the Bayesian analysis

Figure  3 shows the estimated risk of DKA in users receiving SGLT-2 inhibitors versus active comparators from the Bayesian analysis. The point estimate from the Bayesian analysis was close to that from the conventional random-effects model [exp(0.34) = 1.40], while the credible interval was much wider than the corresponding confidence interval. Despite the down-weighting of NRSIs, which increased the posterior variance, there was a nearly 90% probability of an increased risk and a 40% probability of a > 15% increased risk. Heterogeneity was reasonable based on the point estimate ( τ =  0.33, not shown). When we excluded the study whose control was not an active comparator [ 31 ], there was a 97% probability of an increased risk and a 68% probability of a > 15% increased risk (Figure S1 ); heterogeneity was again reasonable based on the point estimate (τ = 0.32, not shown).

figure 3

Three posterior distributions for the pooled log (OR) assessing the risk of diabetic ketoacidosis among patients using sodium/glucose cotransporter 2 inhibitors compared with active comparators: The dark green and the green lines correspond to Bayesian meta-analyses including only RCTs or NRSIs, respectively. The blue line is a posterior distribution combined by a power prior approach

Figure  4 shows the Bayesian estimates of the association between low-dose methotrexate exposure and melanoma. The Bayesian analysis differed notably from the conventional random-effects model: although the point estimate was again close to the conventional estimate [exp(0.17) = 1.19], the credible interval did not indicate an increased risk of melanoma. The Bayesian analysis showed a 91% probability of an increased risk but only a 9.5% probability of a > 15% increased risk. Heterogeneity was again reasonable based on the point estimate ( τ =  0.37, not shown).

figure 4

Three posterior distributions for the pooled log (OR) assessing the association of low-dose methotrexate exposure and melanoma: The dark green and the green lines correspond to Bayesian meta-analyses including only RCTs or NRSIs, respectively. The blue line is a posterior distribution combined by a power prior approach

Figures S2 and S3 show the results for Case 1 and Case 2 when different fixed values are used for the down-weighting factors. For both examples, as the weight of the NRSIs is reduced, the posterior probability of an increased risk decreases. When we assigned extremely low weights to the NRSIs (α = 0.01), we observed a 52% probability of an increased risk of DKA in users receiving SGLT-2 inhibitors and an 82% probability of an increased risk of melanoma in patients using low-dose methotrexate.

Figures S4 and S5 show the results for Case 1 and Case 2 when a uniform distribution is assigned to the down-weighting factors. For both examples, the results are very close to those obtained when beta distributions are assigned.

In this study, we discussed the use of power priors to discount NRSIs and applied this method to incorporate NRSIs into a rare events meta-analysis of RCTs. We demonstrated how to set the down-weighting factor based on judgments of the relative magnitude of the overall risk of bias for each NRSI outcome using the ROBINS-I tool. The methods were illustrated using two recently published meta-analyses, focusing on the risk of DKA in patients using SGLT-2 inhibitors compared with active comparators and on the association between low-dose methotrexate exposure and melanoma. Neither meta-analysis gave significant results when data from RCTs alone were pooled, whereas significant results were observed when data from NRSIs were pooled. When RCTs and NRSIs were combined directly in the same meta-analysis without distinction, both meta-analyses showed significant results. When the bias of the NRSIs was taken into account, there was a 90% probability of an increased risk of DKA in users receiving SGLT-2 inhibitors and a 91% probability of an increased risk of melanoma in patients using low-dose methotrexate.

Our study suggests that including NRSIs during evidence synthesis may increase the certainty of the estimates when rare events meta-analyses of RCTs cannot provide sufficient evidence. A previous meta-analysis concluded that the risk of DKA was not increased in users of SGLT-2 inhibitors compared with active comparators, possibly because of the small number of outcomes in the included RCTs [ 55 ]. In our study, the sample sizes, numbers of DKA cases, and lengths of follow-up of the RCTs were much smaller than those of the NRSIs, and the range of mean ages in the NRSIs was wider than in the RCTs. There were only 8 events among 8,100 patients across all the RCTs, and the pooled result was not significant. The same pattern was observed for the association between low-dose methotrexate exposure and melanoma. In an extensive simulation study, Yao et al. [ 13 ] found that rare events meta-analyses of RCTs often have low power, and Jia et al. [ 6 ], evaluating 4,177 rare events meta-analyses from the Cochrane Database of Systematic Reviews, found that many are underpowered. In our study, the precision of the relative treatment effect estimates increased for both meta-analyses when NRSIs were included alongside RCTs. All these results suggest that systematic reviews and meta-analyses of rare events should include evidence from both RCTs and NRSIs.

We do not recommend the conventional approach as the primary method for empirical analysis. Two recent meta-epidemiological studies have shown that many meta-analyses incorporate NRSIs directly using the conventional approach [ 8 , 56 ]. The bias of relative treatment effect estimates from NRSIs can be reduced by post hoc adjustment techniques such as propensity score analysis, but it cannot be completely eliminated [ 12 ]. The conventional approach ignores differences in study design and cannot account for the potential bias of NRSIs [ 49 , 51 ]. By including NRSIs in a rare events meta-analysis of RCTs using the conventional approach, we therefore combine not only the results of interest but also multiple biases. In addition, compared with RCTs, NRSIs often have narrow confidence intervals because their event counts and sample sizes are usually much larger [ 29 ]. This gives them greater weight than the RCTs, so that the NRSIs dominate the conclusions; our two illustrative examples confirm this. However, confidence intervals for effect estimates from NRSIs are less likely to represent the true uncertainty of the observed effect than are those for RCTs [ 57 ]. The conventional approach may still be used to assess the compatibility of evidence from NRSIs and RCTs by comparing changes in heterogeneity and inconsistency before and after the inclusion of NRSIs [ 51 ].

Estimates from NRSIs are generally considered to be biased, and it is difficult to quantify the potential bias in empirical analyses [ 58 , 59 ]. Three methods are commonly used to assess the direction or magnitude of potential bias. The first assesses the impact of NRSIs on the combined estimates by varying the level of confidence placed in the NRSIs [ 41 ]. The second treats the bias parameters as random variables (i.e., with a non-informative prior), allowing the combined estimates to be influenced by the agreement between sources of evidence [ 51 ]. The third seeks expert opinion on the range of plausible values for the bias parameters [ 24 , 25 ]. Our study was the first to relate bias to study quality, with the direction or magnitude of possible bias determined by the risk of bias of each NRSI. Although tools to critically appraise NRSIs are widely available [ 33 ], they vary considerably in content and in the quality of the topics covered. We chose ROBINS-I because it covers most of the issues commonly encountered in NRSIs [ 9 ] and assesses the risk of bias in relation to an ideal (or target) RCT as a standard of reference [ 28 ]. In this study, we did not down-weight an NRSI assessed as having a low risk of bias, because such an NRSI is considered comparable to a well-conducted RCT [ 37 ]. However, one reviewer pointed out that an NRSI with a low risk of bias as determined by ROBINS-I is likely to be of lower quality than an RCT with a low risk of bias as determined by the Cochrane tool. In empirical analyses, we recommend using sensitivity analysis to explore the impact on the estimates of down-weighting, or not down-weighting, NRSIs with a low risk of bias; the down-weighting applied to such an NRSI should remain small, for example by setting the average down-weighting \(1 - E(\alpha_m)\) to 0.1, or by assuming \(\alpha = 0.9\) or \(\alpha \sim \mathrm{Uniform}(0.9, 1)\).

The choice of the prior distribution for the down-weighting factor is subjective. In this study, we treated the down-weighting factors as random variables on the unit interval and modelled them as beta distributions. We grouped the studies according to categories of risk of bias, used the prior mean to determine the parameter values of the beta distribution, and then set the values based on the quality assessment of each study. These values also quantify the confidence to be placed in each study. In practice, the prior can be informed by external information, for example empirical evidence from meta-epidemiological studies combined with expert consensus.

The impact of the risk of bias of the RCTs on the estimates was not considered in this study. Only a few methodological studies have considered the bias of both NRSIs and RCTs simultaneously. Turner et al. [ 24 ] proposed constructing prior distributions to represent internal and external biases at the individual study level using expert elicitation, followed by synthesizing the estimates across multiple study design types. Schnell-Inderst et al. [ 26 ] simplified the methods of Turner et al. and used the case of total hip replacement prostheses to illustrate how to integrate evidence from RCTs and NRSIs. Verde et al. [ 15 ] proposed a bias-corrected meta-analysis model that combines different types of studies, with internal validity bias adjusted; this model is based on a mixture of two random-effects distributions, where the first component corresponds to the model of interest and the second to the hidden bias structure. In our framework, the likelihood function for the RCTs could be extended to account for their own bias, for example using the robust Bayesian bias-adjustment random-effects model proposed by Cruz et al. [ 47 ]. However, further studies are needed to explore how to assign rational parameters for the risk of bias in RCTs [ 60 ].

There were some limitations to this study that need to be recognized. First, the method did not account for bias in the point estimates of the NRSIs. Bias in estimates of relative effects from NRSIs can depend on the method used to obtain them, and different methods could produce different results, so it may be difficult to predict the direction (and magnitude) of possible biases. The vast majority of empirical analyses down-weight NRSIs in the pooled estimates, and this study followed a similar strategy. Second, only two illustrative examples were used; more comprehensive empirical or simulation studies are needed. Third, other methods exist for combining RCTs and NRSIs in a meta-analysis [ 14 ], but their performance relative to the current method was not investigated, so further evaluation in different scenarios, including comprehensive simulation studies, is warranted. Fourth, although we used the OR as the effect measure, these methods can be applied to other measures of association commonly used in meta-analyses, including the relative risk (e.g., using Poisson regression for RCTs [ 21 ]) and the risk difference (e.g., using the beta-binomial model for RCTs [ 61 ]).

In summary, the inclusion of NRSIs in a rare events meta-analysis has the potential to corroborate findings from RCTs, increase precision, and improve the decision-making process. Our study provides an example of how to down-weight NRSIs by incorporating information from risk of bias assessments for each NRSI using the ROBINS-I tool.

Data availability

All data in this study have been taken from the published studies and no new data have been generated. Computing code for the two empirical examples can be accessed from the supplementary files.

Abbreviations

CI: Confidence interval

CrI: Credible interval

DKA: Diabetic ketoacidosis

NRSI: Non-randomised study of interventions

RCT: Randomized controlled trial

ROBINS-I: Risk Of Bias In Non-randomised Studies of Interventions

SGLT-2: Sodium/glucose cotransporter-2

Zabor EC, Kaizer AM, Hobbs BP. Randomized controlled trials. Chest. 2020;158(1S):S79–S87.


Rothwell PM. External validity of randomised controlled trials: to whom do the results of this trial apply? Lancet. 2005;365(9453):82–93.


Hernán MA, Robins JM, editors. Causal inference: what if. Boca Raton, FL: Chapman & Hall/CRC; 2020.


Cuello-Garcia CA, Santesso N, Morgan RL, et al. GRADE guidance 24 optimizing the integration of randomized and non-randomized studies of interventions in evidence syntheses and health guidelines. J Clin Epidemiol. 2022;142:200–8.


Hodkinson A, Kontopantelis E. Applications of simple and accessible methods for meta-analysis involving rare events: a simulation study. Stat Methods Med Res. 2021;30(7):1589–608.

Jia P, Lin L, Kwong JSW, et al. Many meta-analyses of rare events in the Cochrane database of systematic reviews were underpowered. J Clin Epidemiol. 2021;131:113–22.

Golder S, Loke YK, Bland M. Meta-analyses of adverse effects data derived from randomised controlled trials as compared to observational studies: methodological overview. PLoS Med. 2011;8(5):e1001026.

Bun RS, Scheer J, Guillo S, et al. Meta-analyses frequently pooled different study types together: a meta-epidemiological study. J Clin Epidemiol. 2020;118:18–28.

Sarri G, Patorno E, Yuan H, et al. Framework for the synthesis of non-randomised studies and randomised controlled trials: a guidance on conducting a systematic review and meta-analysis for healthcare decision making. BMJ Evid Based Med. 2022;27(2):109–19.

Munn Z, Barker TH, Aromataris E, et al. Including nonrandomized studies of interventions in systematic reviews: principles and practicalities. J Clin Epidemiol. 2022;152:314–5.

Saldanha IJ, Adam GP, Bañez LL, et al. Inclusion of nonrandomized studies of interventions in systematic reviews of interventions: updated guidance from the agency for health care research and quality effective health care program. J Clin Epidemiol. 2022;152:300–6.

Sherman RE, Anderson SA, Dal Pan GJ, et al. Real-world evidence - what is it and what can it tell us? N Engl J Med. 2016;375(23):2293–7.

Yao M, Wang Y, Ren Y, et al. Comparison of statistical methods for integrating real-world evidence in a rare events meta-analysis of randomized controlled trials. Res Synth Methods. 2023;14(5):689–706.

Verde PE, Ohmann C. Combining randomized and non-randomized evidence in clinical research: a review of methods and applications. Res Synth Methods. 2015;6(1):45–62.

Verde PE. A bias-corrected meta-analysis model for combining, studies of different types and quality. Biom J. 2021;63(2):406–22.

Eddy DM, Hasselblad V, Shachter R. Meta-analysis by the confidence profile method: the statistical synthesis of evidence. San Diego,CA: Academic; 1992.

Droitcour J, Silberman G, Chelimsky E. A new form of meta-analysis for combining results from randomized clinical trials and medical-practice databases. Int J Technol Assess Health Care. 1993;9(3):440–9.


Verde PE, Ohmann C, Morbach S, Icks A. Bayesian evidence synthesis for exploring generalizability of treatment effects: a case study of combining randomized and non-randomized results in diabetes. Stat Med. 2016;35(10):1654–75.

Schmitz S, Adams R, Walsh C. Incorporating data from various trial designs into a mixed treatment comparison model. Stat Med. 2013;32(17):2935–49.

Jackson D, White IR. When should meta-analysis avoid making hidden normality assumptions? Biom J. 2018;60(6):1040–58.

Kuss O. Statistical methods for meta-analyses including information from studies without any events-add nothing to nothing and succeed nevertheless. Stat Med. 2015;34(7):1097–116.

Ibrahim JG, Chen M-H. Power prior distributions for regression models. Stat Sci. 2000;15(1):46–60.

Cook RJ, Farewell VT. The utility of mixed-form likelihoods. Biometrics. 1999;55(1):284–8.

Turner RM, Spiegelhalter DJ, Smith GC, Thompson SG. Bias modelling in evidence synthesis. J R Stat Soc Ser Stat Soc. 2009;172(1):21–47.


Efthimiou O, Mavridis D, Cipriani A, Leucht S, Bagos P, Salanti G. An approach for modelling multiple correlated outcomes in a network of interventions using odds ratios. Stat Med. 2014;33(13):2275–87.

Schnell-Inderst P, Iglesias CP, Arvandi M, Ciani O, Matteucci Gothe R, Peters J, Blom AW, Taylor RS, Siebert U. A bias-adjusted evidence synthesis of RCT and observational data: the case of total hip replacement. Health Econ. 2017;26(Suppl 1):46–69.

Ibrahim JG, Chen MH, Gwon Y, Chen F. The power prior: theory and applications. Stat Med. 2015;34(28):3724–49.

Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, Henry D, Altman DG, Ansari MT, Boutron I, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.

Alkabbani W, Pelletier R, Gamble JM. Sodium/Glucose cotransporter 2 inhibitors and the risk of diabetic ketoacidosis: an example of complementary evidence for rare adverse events. Am J Epidemiol. 2021;190(8):1572–81.

Yan MK, Wang C, Wolfe R, Mar VJ, Wluka AE. Association between low-dose methotrexate exposure and melanoma: a systematic review and meta-analysis. JAMA Dermatol. 2022;158(10):1157–66.

McGurnaghan SJ, Brierley L, Caparrotta TM, McKeigue PM, Blackbourn LAK, Wild SH, Leese GP, McCrimmon RJ, McKnight JA, Pearson ER, et al. The effect of dapagliflozin on glycaemic control and other cardiovascular disease risk factors in type 2 diabetes mellitus: a real-world observational study. Diabetologia. 2019;62(4):621–32.

Morton SC, Costlow MR, Graff JS, Dubois RW. Standards and guidelines for observational studies: quality is in the eye of the beholder. J Clin Epidemiol. 2016;71:3–10.

Quigley JM, Thompson JC, Halfpenny NJ, Scott DA. Critical appraisal of nonrandomized studies-a review of recommended and commonly used tools. J Eval Clin Pract. 2019;25(1):44–52.

Downs SH, Black N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health. 1998;52(6):377–84.

Article   CAS   PubMed   PubMed Central   Google Scholar  

Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, Cates CJ, Cheng HY, Corbett MS, Eldridge SM, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ. 2019;366:l4898.

Moola S, Munn Z, Tufanaru C et al. Chapter 7: systematic reviews of etiology and risk. In: Aromataris E, Munn Z, eds. JBI Manual for Evidence Synthesis. JBI; 2020. Accessed December 13, 2023. https://jbi-global-wiki.refined.site/space/MANUAL/ 4687372/Chapter + 7% 3A + Systematic + reviews + of + etiology + and + risk.

Cuello CA, Morgan RL, Brozek J, Verbeek J, Thayer K, Ansari MT, Guyatt G, Schünemann HJ. Case studies to explore the optimal use of randomized and nonrandomized studies in evidence syntheses that use GRADE. J Clin Epidemiol. 2022;152:56–69.

Schwarzer G, Carpenter JR, Rücker G. Meta-Analysis with R. Springer international publishing, 2015. https://link.springer.com/book/10.1007/978-3-319-21416-0

Gravestock I, Held L. Adaptive power priors with empirical Bayes for clinical trials. Pharm Stat. 2017;16(5):349–60.

Duan Y, Ye K, Smith EP. Evaluating water quality using power priors to incorporate historical information. Environmetrics. 2006;17(1):95–106.

Jenkins DA, Hussein H, Martina R, Dequen-O’Byrne P, Abrams KR, Bujkiewicz S. Methods for the inclusion of real-world evidence in network meta-analysis. BMC Med Res Methodol. 2021;21(1):207.

Günhan BK, Röver C, Friede T. Random-effects meta-analysis of few studies involving rare events. Res Synth Methods. 2020;11(1):74–90.

Friede T, Röver C, Wandel S, Neuenschwander B. Meta-analysis of two studies in the presence of heterogeneity with applications in rare diseases. Biom J. 2017;59(4):658–71.

Bhaumik DK, Amatya A, Normand SL, Greenhouse J, Kaizar E, Neelon B, Gibbons RD. Meta-analysis of rare binary adverse event data. J Am Stat Assoc. 2012;107(498):555–67.

Yao M, Jia Y, Mei F, Wang Y, Zou K, Li L, Sun X. Comparing various Bayesian random-effects models for pooling randomized controlled trials with rare events. Pharm Stat. 2024. https://doi.org/10.1002/pst.2392

Friede T, Röver C, Wandel S, Neuenschwander B. Meta-analysis of few small studies in orphan diseases. Res Synth Methods. 2017;8(1):79–91.

Raices Cruz I, Troffaes MCM, Lindström J, Sahlin U. A robust Bayesian bias-adjusted random effects model for consideration of uncertainty about bias terms in evidence synthesis. Stat Med. 2022;41(17):3365–79.

Greenland S, O’Rourke K. On the bias produced by quality scores in meta-analysis, and a hierarchical view of proposed solutions. Biostatistics. 2001;2(4):463–71.

Yao M, Wang Y, Mei F, Zou K, Li L, Sun X. Methods for the inclusion of real-world evidence in a rare events meta-analysis of randomized controlled trials. J Clin Med. 2023;12(4).

Spiegelhalter DJ, Best NG. Bayesian approaches to multiple sources of evidence and uncertainty in complex cost-effectiveness modelling. Stat Med. 2003;22(23):3687–709.

Efthimiou O, Mavridis D, Debray TP, Samara M, Belger M, Siontis GC, Leucht S, Salanti G. Combining randomized and non-randomized evidence in network meta-analysis. Stat Med. 2017;36(8):1210–26.

Nakhlé G, Brophy JM, Renoux C, Khairy P, Bélisle P, LeLorier J. Domperidone increases harmful cardiac events in Parkinson’s disease: a Bayesian re-analysis of an observational study. J Clin Epidemiol. 2021;140:93–100.

Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian approaches to clinical trials and health-care evaluation. Wiley; 2004.

Gelman A, Hill J. Data analysis using regression and multilevel/hierarchical models. New York, NY: Cambridge University Press; 2007.

Liu J, Li L, Li S, Wang Y, Qin X, Deng K, Liu Y, Zou K, Sun X. Sodium-glucose co-transporter-2 inhibitors and the risk of diabetic ketoacidosis in patients with type 2 diabetes: a systematic review and meta-analysis of randomized controlled trials. Diabetes Obes Metab. 2020;22(9):1619–27.

Zhang K, Arora P, Sati N, Béliveau A, Troke N, Veroniki AA, Rodrigues M, Rios P, Zarin W, Tricco AC. Characteristics and methods of incorporating randomized and nonrandomized evidence in network meta-analyses: a scoping review. J Clin Epidemiol. 2019;113:1–10.

Deeks JJ, Dinnes J, D’Amico R, Sowden AJ, Sakarovitch C, Song F, Petticrew M, Altman DG. Evaluating non-randomised intervention studies. Health Technol Assess. 2003;7(27):iii–x.

Valentine JC, Thompson SG. Issues relating to confounding and meta-analysis when including non-randomized studies in systematic reviews on the effects of interventions. Res Synth Methods. 2013;4(1):26–35.

Anglemyer A, Horvath HT, Bero L. Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. Cochrane Database Syst Rev. 2014;2014(4):Mr000034.

PubMed   PubMed Central   Google Scholar  

Yao M, Mei F, Zou K, Li L, Sun X. A Bayesian bias-adjusted random-effects model for synthesizing evidence from randomized controlled trials and nonrandomized studies of interventions. J Evid Based Med. 2024. https://doi.org/10.1111/jebm.12633

Tang Y, Tang Q, Yu Y, Wen S. A Bayesian meta-analysis method for estimating risk difference of rare events. J BioPharm Stat. 2018;28(3):550–61.

Download references

Acknowledgements

Not applicable.

Funding

We acknowledge support from the National Natural Science Foundation of China (Grant Nos. 72204173, 82274368 and 71904134), the National Science Fund for Distinguished Young Scholars (Grant No. 82225049), the special fund for traditional Chinese medicine of the Sichuan Provincial Administration of Traditional Chinese Medicine (Grant No. 2024zd023), and the 1.3.5 Project for Disciplines of Excellence, West China Hospital, Sichuan University (Grant No. ZYGD23004).

Author information

Yun Zhou and Minghong Yao contributed equally to this work as co-first authors.

Authors and Affiliations

Department of Neurosurgery, Chinese Evidence-Based Medicine Center, Cochrane China Center and MAGIC China Center, West China Hospital, Sichuan University, 37 Guo Xue Xiang, Chengdu, Sichuan, China

Yun Zhou, Minghong Yao, Fan Mei, Yu Ma, Jiayidaer Huan, Kang Zou, Ling Li & Xin Sun

President & Dean’s Office, West China Hospital, Sichuan University, Chengdu, China

NMPA Key Laboratory for Real World Data Research and Evaluation in Hainan, West China Hospital, Sichuan University, Chengdu, China

Minghong Yao, Fan Mei, Yu Ma, Jiayidaer Huan, Kang Zou, Ling Li & Xin Sun

Sichuan Center of Technology Innovation for Real World Data, West China Hospital, Sichuan University, Chengdu, China

Department of Epidemiology and Biostatistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu, China


Contributions

Y.Z. and M.Y. contributed equally as co-first authors. X.S., L.L., M.Y. and Y.Z. conceived and designed the study. X.S., M.Y. and L.L. acquired the funding. M.Y. drafted the manuscript and conducted the data analysis. X.S., M.Y., L.L., F.M., Y.M., J.H. and K.Z. critically revised the article. X.S. is the guarantor.

Corresponding authors

Correspondence to Ling Li or Xin Sun.

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Materials 1–10

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article

Zhou, Y., Yao, M., Mei, F. et al. Integrating randomized controlled trials and non-randomized studies of interventions to assess the effect of rare events: a Bayesian re-analysis of two meta-analyses. BMC Med Res Methodol 24, 219 (2024). https://doi.org/10.1186/s12874-024-02347-7


Received: 10 May 2024

Accepted: 19 September 2024

Published: 27 September 2024

DOI: https://doi.org/10.1186/s12874-024-02347-7


Keywords: Meta-analysis; Rare events; Non-randomized studies of interventions; Risk of bias



Early intravenous high-dose vitamin C in postcardiac arrest shock (VICEPAC): study protocol for a randomised, single-blind, open-label, multicentre, controlled trial

Jonathan Chelly 1,2 (http://orcid.org/0000-0002-8198-1239), Noemie Peres 1, Ghada Sboui 3, Julien Maizel 4, Marion Beuzelin 5, Olivier Nigeon 6, Sebastien Preau 7, Ly Van Phach Vong 8, Fabienne Tamion 9, Fabien Lambiotte 10, Nicolas Deye 11, Thibaut Bertrand 1, Hélène Behal 12, Laurent Ducros 1, Christophe Vinsonneau 3

1 Intensive Care Unit, Centre Hospitalier Intercommunal Toulon - La Seyne-sur-Mer, Toulon, Provence-Alpes-Côte d'Azur, France
2 Délégation à la Recherche Clinique et à l'Innovation du GHT 83, Toulon, France
3 Service de Médecine Intensive Réanimation, Centre Hospitalier de Béthune, Bethune, Nord-Pas de Calais, France
4 Service de Médecine Intensive Réanimation, Centre Hospitalier Universitaire Amiens-Picardie, Amiens, Hauts-de-France, France
5 Intensive Care Unit, Centre Hospitalier de Dieppe, Dieppe, France
6 Intensive Care Unit, Centre Hospitalier de Lens, Lens, France
7 Intensive Care Unit, CHU de Lille, Lille, Hauts-de-France, France
8 Marne-La-Vallee Hospital, Jossigny, Île-de-France, France
9 Service de Médecine Intensive Réanimation, Centre Hospitalier Universitaire de Rouen, Rouen, Normandie, France
10 Service de Médecine Intensive Réanimation, Centre Hospitalier de Valenciennes, Valenciennes, Nord-Pas de Calais, France
11 Réanimation Médicale et Toxicologique, AP-HP, INSERM UMR-S 942, Assistance Publique - Hopitaux de Paris, Paris, France
12 Biostatistics Department, CHU de Lille, Lille, Hauts-de-France, France

Correspondence to Dr Jonathan Chelly; JONATHAN.CHELLY{at}ch-toulon.fr

Introduction The high incidence of morbidity and mortality associated with the post-cardiac arrest (CA) period highlights the need for novel therapeutic interventions to improve the outcome of out-of-hospital cardiac arrest (OHCA) patients admitted to the intensive care unit (ICU). The aim of this study is to assess the ability of high-dose intravenous vitamin C (Vit-C) to improve post-CA shock.

Methods and analysis This is a single-blind, open-label, multicentre, randomised controlled trial in which 234 OHCA patients with post-CA shock will be enrolled across 10 French ICUs. Patients will be randomised to receive standard-of-care (SOC) or SOC with early high-dose intravenous Vit-C administration (200 mg/kg per day, started within 6 hours after return of spontaneous circulation and continued for 3 days). The primary endpoint is the cumulative incidence of vasopressor withdrawal at 72 hours after enrolment, with death considered as a competing event. The main secondary endpoints are neurological outcome, mortality due to refractory shock, vasopressor-free days and organ failure monitored by the sequential organ failure assessment score.

Ethics and dissemination The study protocol was approved by a French Ethics Committee (EC) on 21 February 2023 (Comité de Protection des Personnes Ile de France 1, Paris, France). Because of the short enrolment window and the need to avoid any delay in treatment, the EC approved study inclusion before informed consent is obtained. As soon as possible, patients and their relatives will be asked for deferred informed consent. The data from the study will be disseminated through conference presentations and peer-reviewed publications.

Trial registration number NCT05817851 .

  • Out-of-Hospital Cardiac Arrest
  • INTENSIVE & CRITICAL CARE
  • Death, Sudden, Cardiac

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjopen-2024-087303


STRENGTHS AND LIMITATIONS OF THIS STUDY

A weight-adjusted dose of intravenous vitamin C will be compared with standard-of-care in out-of-hospital cardiac arrest patients requiring vasopressors.

Comprehensive data collection will allow the analysis of vasopressor withdrawal and clinical outcomes, such as organ dysfunction or neurological outcome.

The single-blind design without a placebo in the control group may introduce bias.

Haemodynamic management will be standardised in both arms to limit variability in the primary endpoint.

Introduction

The management of the post-cardiac arrest (CA) period remains a challenging issue among patients admitted to the intensive care unit (ICU) following an out-of-hospital cardiac arrest (OHCA), despite advancements in postresuscitation care. 1 Indeed, approximately two-thirds of OHCA patients will develop a post-CA syndrome during the initial hours of their management, which can lead to hypotension and shock in up to 70% of cases. 2 3 This post-CA shock, potentially reversible within 72 hours, is characterised by cardiac and haemodynamic failure, which frequently leads to multiorgan failure (MOF). In the absence of rapid and appropriate therapies, early death occurs in 20–55% of patients. 2–6 Current post-CA care strategies focus on optimising the patient's haemodynamic condition, ensuring adequate organ oxygenation and perfusion, and attempting to reduce brain injury. 7 However, the high morbidity and mortality associated with this post-CA period 2–6 highlight the need for novel therapeutic interventions to improve these patients' outcomes. 1 8

The overproduction of reactive oxygen species (ROS) during the reperfusion period induces oxidative stress, which is a key component of the post-CA syndrome. 4 5 9 This oxidative stress contributes to cellular damage, myocardial dysfunction and systemic inflammatory responses. Antioxidants neutralise ROS and are considered potential therapeutic agents for managing post-CA syndrome. 10–12 Ascorbic acid, or vitamin C (Vit-C), has previously been described as an ROS neutraliser and an immune-response modulator. Vit-C has been shown to enhance endogenous catecholamine and vasopressin synthesis and to reduce endothelial dysfunction. 13 14 These multifaceted effects suggest that Vit-C is a promising agent for mitigating the complex pathophysiology of post-CA shock. 15 Preclinical and clinical studies have suggested a beneficial effect of high-dose intravenous Vit-C on shock and organ failure of septic, postoperative or inflammatory origin. 16–22 Recent meta-analyses have highlighted the potential of Vit-C to reduce the duration of vasopressor use in patients with septic shock. 23 24 They also provided safety data on such treatment, with no notable side effects reported in human studies of high-dose intravenous Vit-C (>200 mg/kg per day). Furthermore, Vit-C appears relevant in animal models of CA, with benefits observed in myocardial function, postresuscitation shock, neuroprotection and survival. 25 26

Our aim is to assess the ability of high-dose intravenous Vit-C, administered early after resuscitation, to improve post-CA shock in OHCA patients. We hypothesise that Vit-C can reduce the severity of post-CA shock, thereby improving organ function, survival and neurological recovery.

Methods and analysis

Study setting

The early intravenous high-dose Vit-C efficacy on postcardiac arrest shock (VICEPAC) trial is an open-label, multicentre (10 French ICUs), randomised controlled trial (RCT) comparing two arms. The trial is sponsored by the Centre Hospitalier de Béthune-Beuvry – Hôpitaux Publics de l'Artois, a non-university tertiary care centre, without any external funding. The initial study protocol was approved by a French Ethics Committee (EC) on 21 February 2023 (Comité de Protection des Personnes Ile de France 1, Paris, France), and the trial will be performed in accordance with the Declaration of Helsinki. The study was prospectively registered at ClinicalTrials.gov on 23 March 2023 (NCT05817851) and in the EudraCT database (EudraCT 2022-500717-64-00). Inclusion commenced in September 2023. An amended version, clarifying the inclusion criteria, was approved by the same EC on 4 March 2024. This trial follows the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) guidelines (see online supplemental material for the SPIRIT checklist). 27


Patients and public involvement

Patients or the public were not involved in any aspect of the design, conduct, reporting or dissemination of the study protocol.

Population and study protocol

Patients assessed for eligibility will be all comatose adults admitted alive to a participating ICU after an OHCA. Following screening by the attending physician (see figure 1 for the study flow chart), patients will be included if they meet all the inclusion criteria, including the occurrence of post-CA shock (defined as the requirement for a continuous infusion of epinephrine or norepinephrine for at least 30 min to maintain a mean arterial pressure (MAP) ≥65 mm Hg) within the first 6 hours after the return of spontaneous circulation (ROSC), and none of the exclusion criteria (table 1). Following ICU admission, patients will be randomised as soon as possible, within the first 6 hours after ROSC, and allocated to one of two arms (figure 2). The 'standard-of-care' (SOC, or control) group will be managed according to current recommendations. 1 Intravenous Vit-C supplementation of up to 1000 mg per day after day 3 and intravenous thiamine supplementation from ICU admission are allowed in the SOC group. In the 'Vit-C' group, in addition to SOC, patients will receive 200 mg/kg per day of intravenous Vit-C (Laroscorbine, ascorbic acid ampoules 200 mg/mL, Bayer Healthcare, France) for 3 days, divided into four boluses of 50 mg/kg every 6 hours, each infused over 30 min (in 100 mL of a 5% dextrose solution). Investigators will be encouraged to administer the first dose of Vit-C as soon as possible, within 1 hour of randomisation. The time of each dose of Vit-C will be recorded. Additionally, patients in the 'Vit-C' group will receive intravenous thiamine supplementation (200 mg over 30 min, every 12 hours) for 3 days to limit oxalate production and urinary excretion. 28
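Because the regimen is weight based, the bedside quantities follow directly from the figures above. As a rough illustration only (the helper function below is hypothetical and not part of the protocol, and it is not a clinical dosing tool), the per-bolus dose and the corresponding volume of the 200 mg/mL solution can be computed as:

```python
def vitc_bolus(weight_kg, daily_dose_mg_per_kg=200, boluses_per_day=4,
               ampoule_conc_mg_per_ml=200):
    """Per-bolus vitamin C dose (mg) and volume (mL) of 200 mg/mL solution,
    following the 200 mg/kg/day regimen given as four 50 mg/kg boluses."""
    bolus_mg = weight_kg * daily_dose_mg_per_kg / boluses_per_day
    return {"bolus_mg": bolus_mg,
            "ampoule_ml": bolus_mg / ampoule_conc_mg_per_ml}

print(vitc_bolus(80))  # {'bolus_mg': 4000.0, 'ampoule_ml': 20.0}
```

For an 80 kg patient this gives four boluses of 4000 mg per day, i.e. 20 mL of solution per bolus, each diluted in 100 mL of 5% dextrose.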


Figure 1: Flow chart of the vitamin C efficacy on postcardiac arrest shock study protocol. ICU, intensive care unit; OHCA, out-of-hospital cardiac arrest; ROSC, return of spontaneous circulation; Vit-C, vitamin C.

Figure 2: Experimental plan of the vitamin C efficacy on postcardiac arrest shock study protocol. ICU, intensive care unit; mRS, modified Rankin Scale; OHCA, out-of-hospital cardiac arrest; ROSC, return of spontaneous circulation; SOFA, sequential organ failure assessment score; Vit-C, vitamin C.


Table 1: Inclusion and exclusion criteria of the vitamin C efficacy on postcardiac arrest shock trial

For both groups, post-CA care will be managed in accordance with the local procedures of each participating centre and the current European guidelines, 1 including targeted temperature management. Fluid resuscitation and the vasopressor used to restore MAP will be selected by the attending physician, and a decision algorithm will be proposed to adjust the level of continuous vasopressor infusion according to MAP (figure 3). Inotropic agents, such as dobutamine, will also be permitted and managed by the attending physician. The prediction of neurological outcome and decisions regarding the withdrawal of life-sustaining therapies will be based on routine practice at each centre, in accordance with current neuroprognostication guidelines. 1

Figure 3: Decision algorithm for haemodynamic management. MAP, mean arterial pressure. *For patients receiving epinephrine and norepinephrine simultaneously, it is suggested to wean epinephrine first.

Randomisation and treatment allocation

The randomisation list will be generated with statistical software (SAS, SAS Institute, Cary, North Carolina, USA) according to a balanced parallel design (1:1), stratified by site, using randomised blocks of varying sizes. Patients will be randomised at each site by local investigators. Patients and their next of kin will be blinded to the allocated treatment throughout the study period.
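The actual allocation list is produced in SAS, as stated above; purely as an illustrative sketch, the same stratified permuted-block logic might look like the following in Python (the block sizes, seeding scheme and site label here are arbitrary assumptions for this example, not the trial's parameters):

```python
import random

def site_randomisation_list(n_per_site, site, block_sizes=(2, 4, 6), seed=0):
    """Illustrative 1:1 permuted-block allocation list for one site."""
    rng = random.Random(f"{seed}-{site}")  # site-specific stream = stratification
    allocations = []
    while len(allocations) < n_per_site:
        block_size = rng.choice(block_sizes)          # blocks of varying size
        block = ["SOC", "Vit-C"] * (block_size // 2)  # balanced 1:1 within block
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_per_site]

print(site_randomisation_list(12, site="Toulon"))
```

Seeding the generator per site keeps each stratum's sequence independent, while the varying block sizes make the next allocation harder to predict and preserve 1:1 balance within every block.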

Data collection and outcome variables

At each site, all data will be pseudonymised and collected using an electronic centralised case report form (eCRF) according to the Utstein recommendations 29 (see online supplemental table 1 for data collection time points). Patients' characteristics, including age, gender, body mass index and comorbidities, will be collected. The investigator will estimate the patient's neurofunctional condition prior to OHCA using the modified Rankin Scale (mRS). The prehospital data collected will be as follows: OHCA history, CA location and circumstances, bystander cardiopulmonary resuscitation, interval from CA to rescue team arrival, no-flow and low-flow intervals, initial cardiac rhythm and prehospital management (number of external electric shocks delivered and epinephrine boluses, volume of fluid resuscitation and vasopressor dose). The main cause of OHCA (either suspected or confirmed) will be recorded. The sequential organ failure assessment score (SOFA) will be calculated at inclusion (SOFA baseline), using the worst clinical and biological parameters available between ICU admission and inclusion. The creatinine blood level will be the only parameter considered for the renal SOFA baseline subscore, and the neurological SOFA baseline subscore will be based on the last Glasgow Coma Scale score assessed prior to the start of sedation. Daily SOFA scores from day 1 to day 3 after inclusion will be calculated as previously published (patients who die within this interval will be assigned the maximum score of 24 points). 30
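The death-imputation rule in that last sentence is easy to get wrong in analysis code, so a minimal sketch may help (the helper below is hypothetical, not from the protocol):

```python
import pandas as pd

MAX_SOFA = 24  # worst possible SOFA, assigned from the day of death onwards

def daily_sofa(observed_scores, death_day=None, days=(1, 2, 3)):
    """Day 1-3 SOFA series; days on or after death are set to 24,
    per the protocol's rule for patients who die within the interval."""
    values = [MAX_SOFA if (death_day is not None and d >= death_day)
              else observed_scores.get(d)
              for d in days]
    return pd.Series(values, index=list(days), name="SOFA")

print(daily_sofa({1: 11, 2: 9}, death_day=3))  # day 3 becomes 24
```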

Primary endpoint

The primary endpoint is the cumulative incidence of vasopressor withdrawal at 72 hours after inclusion. Death during this interval will be considered as a competing event.

Secondary endpoints

The secondary endpoints are as follows: the cumulative incidence of death from refractory shock at day 7 after inclusion, with any other mode of death considered as a competing event; the rate of patients with a favourable neurological outcome at day 28 after inclusion (defined as an mRS between 0 and 3); the maximal vasopressor level among patients alive during the first 3 days after inclusion; the vasopressor-free days (VFD) at day 28; the difference between the SOFA at day 3 (SOFA 72) and SOFA baseline; and the lowest arterial lactate level measured among surviving patients up to 72 hours after inclusion.
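The protocol does not spell out the VFD convention; a common convention, assumed here purely for illustration, scores patients who die before day 28 as having zero vasopressor-free days:

```python
def vasopressor_free_days(days_on_vasopressors, death_day=None, horizon=28):
    """VFD to a 28-day horizon under a common (assumed) convention:
    death before the horizon scores 0; otherwise horizon minus days
    on vasopressors, floored at 0."""
    if death_day is not None and death_day <= horizon:
        return 0
    return max(0, horizon - days_on_vasopressors)

print(vasopressor_free_days(4))                # 24
print(vasopressor_free_days(4, death_day=10))  # 0
```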

Statistical method

All statistical analyses will be performed according to the intention-to-treat principle, and missing data will be handled using multiple imputation.

The primary endpoint will be described in each arm using the Kalbfleisch and Prentice method, with death considered as a competing event. The two groups will be compared using the Fine and Gray model, as proposed by Zhou et al, 31 which allows adjustment by centre with a marginal model. Furthermore, a sensitivity analysis will be conducted on the per-protocol population, and a subgroup analysis will be performed according to the vasopressor level at inclusion.

The cumulative incidence of death from refractory shock at day 7 will be compared between the two groups with the same methodology as described for the primary endpoint. The proportion of surviving patients with a favourable neurological outcome at day 28 will be compared between the two groups using a generalised linear model fitted with the generalised estimating equation (GEE) method (binomial distribution and log-link function) to account for the centre effect. In case of non-convergence, a Poisson distribution will be used instead of the binomial distribution. The maximal vasopressor level during the first 3 days, the difference between SOFA 72 and SOFA baseline, and the lowest arterial lactate level at day 3 will be compared between the two groups using a linear mixed model including centre as a random effect. If the model residuals are not normally distributed, a Mann–Whitney test will be used instead. A generalised linear model will be used to compare the VFD at day 28 between the two groups, with the GEE method (Poisson or negative binomial distribution and log-link function) employed to account for the centre effect.
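Competing-risks estimation of this kind is usually done in R (e.g. the cmprsk package for Fine and Gray regression). Purely as an illustrative sketch of the estimand on fabricated data, a nonparametric cumulative incidence function for vasopressor withdrawal with death as a competing event can be computed in Python with lifelines' Aalen–Johansen estimator:

```python
import numpy as np
import pandas as pd
from lifelines import AalenJohansenFitter

# Simulated follow-up times (hours) and event codes:
# 0 = censored (still on vasopressors at 72 h), 1 = vasopressor withdrawal,
# 2 = death before withdrawal (the competing event).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "hours": rng.uniform(1, 72, 200),
    "event": rng.choice([0, 1, 2], size=200, p=[0.30, 0.35, 0.35]),
})

ajf = AalenJohansenFitter()
ajf.fit(df["hours"], df["event"], event_of_interest=1)
print(ajf.cumulative_density_.tail())  # CIF of withdrawal, accounting for death
```

Treating death as simple censoring here would overestimate the probability of vasopressor withdrawal, which is why the protocol insists on a competing-event analysis.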

Sample size calculation

In accordance with the findings of the HYPERDIA study, 32 we estimated a 28% cumulative incidence of vasopressor withdrawal at day 3 in the SOC group, taking into account a 39% mortality rate over the same interval. Detecting a 50% relative increase in the vasopressor withdrawal rate at day 3 in the Vit-C group (corresponding to a 42% cumulative incidence of the primary endpoint) will require 117 patients per group, with a two-sided alpha risk of 5% (log-rank test considering death as a competing event, PASS 12 software) and a power of 80%.
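The trial's calculation used PASS 12 with a log-rank test and death as a competing event. For intuition only, a naive two-proportion approximation of the same assumptions can be reproduced with statsmodels; this crude check ignores censoring and the competing risk of death, which is why it returns fewer patients than the 117 per group required by the protocol's method:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# 28% vs 42% vasopressor withdrawal at day 3, alpha = 0.05, power = 0.80
h = proportion_effectsize(0.42, 0.28)  # Cohen's h for two proportions
n_per_group = NormalIndPower().solve_power(
    effect_size=h, alpha=0.05, power=0.80, alternative="two-sided")
print(round(n_per_group))  # ~90 per group under this simplification
```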

Safety, adverse events (AE), serious adverse events (SAE) and suspected unexpected serious adverse reactions (SUSAR)

A single trial, including 872 ICU patients with septic shock randomly assigned to receive either placebo or 200 mg/kg per day of intravenous Vit-C for up to 96 hours, reported a higher risk of a composite outcome of death or persistent organ dysfunction at day 28 in the Vit-C group compared with the placebo group. However, all the individual primary and secondary endpoints were comparable between the two groups, and secondary analyses (of biomarkers of tissue dysoxia, inflammation and endothelial injury) did not identify a potential mechanism of harm associated with Vit-C. 33

The theoretical risks associated with high-dose intravenous Vit-C include a paradoxical pro-oxidative effect in patients with iron overload, haemolysis in patients with glucose-6-phosphate dehydrogenase (G6PD) deficiency and the formation of oxalate kidney stones. 14 However, these risks are limited to susceptible patients, who are excluded from the study, such as those with haemochromatosis, G6PD deficiency or a history of urolithiasis or oxalate nephropathy (table 1). Furthermore, thiamine supplementation in the Vit-C group will mitigate the risk of oxalate precipitation. Factitious hyperglycaemia has previously been described among patients treated with high-dose intravenous Vit-C, but only with some capillary blood glucose measuring devices. 33 34 Consequently, each participating centre has been advised to measure glycaemia using blood gas analysis or its central laboratory from inclusion to day 7 for patients in the Vit-C group. Several RCTs using similarly high doses of intravenous Vit-C, and meta-analyses in critically ill patients, have not documented any notable additional AEs or SAEs. 19 35–37 However, as required by French legislation on RCTs of medications, all AEs will be reported in the eCRF, and all SAEs and SUSARs will be promptly notified to the study pharmacovigilance committee. Furthermore, all other treatments administered from inclusion until day 28 or ICU discharge will also be recorded.

No interim analysis is planned. As required by French legislation, a safety report will be submitted annually to the French EC and the French national drug safety agency (Agence Nationale de Sécurité du Médicament, St Denis, France).

Ethics and dissemination

The initial study protocol was approved by a French EC on 21 February 2023 (Comité de Protection des Personnes Ile de France 1, Paris, France). Because of the short enrolment window and the need to avoid any delay in treatment, the EC approved an emergency inclusion procedure with deferred informed consent of the patient and their next of kin. In the absence of a relative who can be informed before the maximum inclusion window elapses, and given the initially comatose state of these patients, investigators are permitted to include and randomise the patient so that the allocated treatment starts without delay. As soon as possible, the patient and their relatives will be asked for deferred informed consent. The EC has not set a specific timeframe for obtaining this deferred consent. Patients or their relatives may withdraw consent to participate at any time during the study. In accordance with French legislation on RCTs with deferred consent, data collected prior to withdrawal may be retained for analysis. The results of the trial will be disseminated via conference presentations and peer-reviewed publications, in accordance with the Consolidated Standards of Reporting Trials statement. 38

Discussion

The VICEPAC trial is the first randomised trial to assess the impact of high-dose intravenous Vit-C, in comparison with SOC, in OHCA patients with post-CA shock requiring vasopressors.

To date, only two recent studies have described the effect of supplemental intravenous Vit-C in such patients. Kim et al conducted a retrospective analysis of 234 OHCA patients treated with or without intravenous Vit-C at a fixed dose of 6 g per day for 3 days. The main result was the absence of a beneficial effect of Vit-C on neurological outcome at 1 month after ICU admission. 39 However, the authors may have underestimated the benefit of Vit-C on patients' outcome for several reasons, including the lower dose of Vit-C used in comparison with previous studies in sepsis, the various causes of CA in their cohort and a potential selection bias due to the retrospective design. 40 Privšek et al recently published the results of an RCT comparing low-dose intravenous Vit-C (3 g per day for 4 days) with placebo in a small cohort of 30 OHCA patients. The authors demonstrated no difference in neuron-specific enolase levels between the two groups. 41 The only study similar to ours recently concluded its enrolment phase (NCT03509662). The VITaCCA RCT was designed to include 270 OHCA patients with shockable rhythms, regardless of the occurrence of post-CA shock, to compare two intravenous Vit-C dosing regimens (3 or 10 g daily for 96 hours) with placebo, with organ dysfunction measured by SOFA as the primary outcome. 42

Several limitations must be highlighted. First, we designed a single-blind trial without a placebo in the SOC group. Indeed, we consider our study pragmatic and exploratory, focused mainly on the haemodynamic impact of Vit-C on post-CA shock. However, our study could provide results complementary to the VITaCCA (early high-dose vitamin C in post-cardiac arrest syndrome) trial, 42 and together the two studies could warrant further investigation of Vit-C supplementation in CA. Although the single-blind design may introduce bias regarding our primary endpoint, we believe this will be mitigated by the implementation of the decision algorithm (figure 3) in both groups, which will guide ICU nurses and physicians in vasopressor management during the study period. Second, while two studies have previously demonstrated a significant reduction in vasopressor duration (>50%) using a bundle of hydrocortisone, Vit-C and thiamine supplementation in patients with severe sepsis and septic shock, 18 43 our primary hypothesis for the sample size calculation remains empirical, given the lack of published preclinical or clinical data on Vit-C in CA. Nevertheless, the VITaCCA trial, mentioned above, has a similar sample size for assessing the ability of Vit-C to improve organ failure in OHCA patients. 42 Third, we decided not to require a minimum vasopressor dose in the inclusion criteria, even for patients in whom vasopressors are introduced at low levels. Indeed, there is currently no consensus on the minimum vasopressor dose that defines post-CA shock. Moreover, we considered that awaiting a significant increase in the vasopressor level might delay the initiation of Vit-C, and we planned a subgroup analysis of the primary endpoint according to the vasopressor level at inclusion. Fourth, the study does not include any measurement of blood Vit-C levels. OHCA patients have previously been described as being at high risk of Vit-C deficiency at ICU admission, similar to septic patients; this progressive decline in Vit-C levels over time is mainly due to significant oxidative stress, with increasing metabolic demand and excessive Vit-C consumption. 44 However, the high dose of Vit-C selected in our study is similar to that used in previous work on septic shock and has been shown to achieve supranormal blood Vit-C levels. 45

In conclusion, this trial addresses a significant gap in current postresuscitation care practices by focusing on a novel therapeutic approach. The results of our study could provide valuable insights into the role of Vit-C in post-CA care.

Trial status

The first patient was enrolled on 27 December 2023. The current version of the study protocol is V.3.0, dated 22 January 2024. The estimated date of study completion is December 2025.

Ethics statements

Patient consent for publication

Not applicable.

Acknowledgments

This article is dedicated to the memory of Christophe Vinsonneau, who passed away on 7 April 2024. Our esteemed colleague and friend made a significant contribution to the design of this study, and to French critical care research throughout his career. We thank the Délégation à la Recherche Clinique et à l'Innovation (DRCI) du Groupement Hospitalier de Territoire du Var for help with the eCRF design and for data management support. We would also like to express our sincere thanks to all the VICEPAC trial participating centres for their support in protocol improvement and follow-up, and to the BOREAL research network for its support in recruiting the participating centres.

References

  • Sandroni C ,
  • Böttiger BW , et al
  • Lemiale V ,
  • Mongardon N , et al
  • Hirsch KG ,
  • Abella BS ,
  • Amorim E , et al
  • Adib-Conquy M ,
  • Laurent I , et al
  • Laurent I ,
  • Monchi M , et al
  • Chiche J-D , et al
  • Jozwiak M ,
  • Bougouin W ,
  • Geri G , et al
  • Brosnahan SB ,
  • Papadopoulos J , et al
  • Negovsky VA
  • Collard CD ,
  • Claeys MJ ,
  • Timmermans JP , et al
  • Yiang G-T ,
  • Liao W-T , et al
  • Oudemans-van Straaten HM ,
  • de Waard MC
  • Berger MM ,
  • Oudemans-van Straaten HM
  • Spoelstra-de Man AME ,
  • Elbers PWG ,
  • Fowler AA ,
  • Knowlson S , et al
  • Mohammadi M ,
  • Ramezani M , et al
  • Khangoora V ,
  • Rivera R , et al
  • Truwit JD ,
  • Hite RD , et al
  • Matsuda T ,
  • Miyagantani Y , et al
  • Nathens AB ,
  • Jurkovich GJ , et al
  • Lankadeva YR ,
  • Peiris RM ,
  • Okazaki N , et al
  • Shao H , et al
  • Zhang K , et al
  • Huang C-H ,
  • Tsai C-Y , et al
  • Tetzlaff JM ,
  • Altman DG , et al
  • Moskowitz A ,
  • Hou PC , et al
  • Perkins GD ,
  • Jacobs IG ,
  • Nadkarni VM , et al
  • Vincent J-L ,
  • Matos R , et al
  • Latouche A , et al
  • Grimaldi D ,
  • Seguin T , et al
  • Lamontagne F ,
  • Masse M-H ,
  • Menard J , et al
  • Yim J , et al
  • Ortiz-Reyes L ,
  • Lew CCH , et al
  • Schulz KF ,
  • Altman DG ,
  • Moher D , et al
  • Kim YH , et al
  • Vinsonneau C ,
  • Peres N , et al
  • Privšek M ,
  • Rozemeijer S ,
  • de Grooth H-J ,
  • Elbers PWG , et al
  • Iglesias J ,
  • Vassallo AV ,
  • Patel VV , et al
  • Gardner R ,
  • Wang Y , et al
  • Manubulu-Choo W-P ,
  • Zandvliet AS , et al

Supplementary materials

Supplementary data

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Data supplement 1
  • Data supplement 2


Contributors As the study guarantor, JC is responsible for the overall content. JC, HB and CV conceived, designed and supervised the study protocol and wrote the first draft of the manuscript. NP, GS, JM, MB, ON, SP, LVPV, FT, FL, ND, TB and LD are investigators at the participating ICUs; they reviewed the manuscript for important intellectual content and approved the final submitted version.

Funding This trial is integrally supported by the Centre Hospitalier de Béthune-Beuvry – Hôpitaux Publics de l’Artois.

Competing interests None declared.

Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Single case experimental design: a rigorous method for addressing inequity and enhancing precision within Para sport and exercise medicine research

Sean Tweedy 1,2 (http://orcid.org/0000-0002-2011-3382), Iain Mayank Dutia 1,3, John Cairney 1,2, Emma Beckman 1,4

1 The School of Human Movement and Nutrition Sciences, The University of Queensland, Saint Lucia, Queensland, Australia
2 The Queensland Centre for Olympic and Paralympic Studies, The University of Queensland - St Lucia Campus, Brisbane, Queensland, Australia
3 School of Allied Health, Australian Catholic University, Banyo Campus, Brisbane, Queensland, Australia
4 Para sport, Queensland Academy of Sport, Sunnybank, Queensland, Australia

Correspondence to Dr Sean Tweedy; s.tweedy{at}uq.edu.au

https://doi.org/10.1136/bjsports-2024-108587


  • Disabled Persons
  • Neurological rehabilitation

Approximately 4400 athletes from 184 nations competed in 22 sports at the 2024 Paris Paralympic Games. However, it is recognised that athletes with more severe disabilities and high support needs are under-represented in sport, and strategies to increase representation are required. Focusing on individuals with cerebral palsy (CP), we present evidence that people with high support needs are also under-represented in Para sport and exercise medicine (P-SEM) research. We outline why single case experimental designs (SCEDs) are a rigorous and effective means of addressing under-representation in P-SEM research.

Cerebral palsy

CP is an eligible underlying health condition for 17 of the 22 Paralympic sports. It results from a non-progressive brain lesion and is defined as a heterogeneous group of permanent disorders affecting movement and posture. 1 CP heterogeneity is multidimensional and can be classified based on:

Neurological subtype: Spastic CP (quadriplegia/diplegia/hemiplegia); dyskinetic; ataxic; and mixed. 2 Subtypes vary in severity and anatomical distribution.

Functional effects: The Gross Motor Function Classification System (GMFCS) is the most commonly used and has five levels: individuals at GMFCS levels I (least severe) and II are able to walk independently; those at levels IV and V use wheeled mobility and typically have high support needs (CP-HSN).

Exercise training research in people with CP

At least three major reviews have analysed randomised controlled trials (RCTs) that evaluate exercise training in people with CP and all report that CP-HSN participants are under-represented. 4–6 CP-HSN constitutes approximately 30% of the CP population 7 but …

Contributors ST is the guarantor of this work. All authors made substantial contributions to conception of the work, provided critical revisions, approved the final version and agreed to be accountable for the accuracy and integrity of the work.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests ST, JC and EB work for The University of Queensland, the Official Higher Education Partner for Paralympics Australia. EB is on secondment as Head of Parasport with the Queensland Academy of Sport and Paralympics Australia. ST is a member of the International Paralympic Committee’s Classification Compliance and Oversight Committee and a member of World Para Athletics Classification Advisory Group.

Provenance and peer review Not commissioned; externally peer reviewed.

