INTRODUCTION TO BIOSTATISTICS AND RESEARCH METHODS: Edition 5

PHI Learning

Helping teachers to teach and students to learn.

EASTERN ECONOMY EDITION

Rao, P. S. S. Sundar; Richard, J.

Description:

Chapter-end exercises

Copyright © 2024 · All Rights Reserved · PHI Learning

INTRODUCTION TO BIOSTATISTICS AND RESEARCH METHODS 5TH EDITION Paperback – 1 January 2012

  • ISBN-10 8120345207
  • ISBN-13 978-8120345201
  • Edition 5th
  • Publisher PHI Learning Pvt. Ltd.
  • Publication date 1 January 2012
  • Language English
  • Dimensions 20 x 5 x 25 cm
  • Print length 280 pages
Product details

  • Item Weight: 395 g
  • Country of Origin: India
  • #40 in Biotechnology Engineering Textbooks
  • #155 in General Medicine
  • #574 in Mathematics (Books)

Customer reviews

  • 5 star: 56%
  • 4 star: 31%
  • 3 star: 4%
  • 2 star: 5%
  • 1 star: 4%

Introduction to Biostatistics and Research Methods – 2nd Edition – Muhammad Ibrahim

$  2.78

Author: Muhammad Ibrahim.

Edition: 2nd (Second Edition)

Publisher: Al Hassan Publications.

Description

  • The book “Introduction to Biostatistics and Research Methods – Muhammad Ibrahim” Second Edition is the recommended text for clinical researchers, physiotherapists, nurses, and medical students.
  • This is a frequently used book for research and biostats, particularly among students of physiotherapy and allied health professions at the University of Health Sciences Lahore, King Edward Medical University Lahore, and many others.
  • One of the books suggested for research methodology and biostats by the Higher Education Commission of Pakistan.

Included Chapters on:

  • Sampling Techniques
  • Management Of Data
  • Measure Of Location
  • Measure Of Variation
  • Probability
  • Measure Of Relationship
  • Statistical Inferences
  • Validity & Reliability
  • Review Of Literature
  • Research Problems
  • Research Design
  • Data Collection Instruments
  • Computer application

Additional information

Weight 0.9 kg

Applied Biostatistics Certificate: Methods & Applications

Online professional certificate program on the principles and methods of biostatistics

Program goals.

  • Understand the most commonly used approaches in medical statistics
  • Choose an appropriate analysis
  • Implement these techniques in statistical software
  • Assess the statistical methods chosen in a paper

This online professional certificate program offers a comprehensive introduction to biostatistics in medical research. The program includes a review of the most common techniques in the field, as well as the manner in which these techniques are applied in standard statistical software. Weekly lectures are combined with learning assessments and practicum exercises to support and engage participants. Optional discussion boards, office hours, and journal club sessions are available for those who wish to engage further with colleagues and faculty.

By the conclusion of the program, participants will be able to:

  • Choose an appropriate statistical analysis plan
  • Calculate the sample size needed to complete a study
  • Analyze the collected data
  • Communicate the results from their experiment

Wondering what a professional certificate entails? Go to our professional credits and certificate page to learn more.

Free Option for Harvard Affiliates

Participants must select one of the following program tracks: 

Track 1: Applied Biostatistics: Core Curriculum

Ideal for participants who are interested in an abbreviated curriculum and shorter program commitment.

Track 2: Applied Biostatistics: Core Curriculum and Advanced Topics

Ideal for learners who are interested in a more comprehensive program, with the opportunity to select from more advanced topics at the end of the program.

For more information, please visit Program Tracks.

Session dates

Track 1: September 9, 2024 – April 1, 2025

Track 2: September 9, 2024 – July 1, 2025

Time commitment

It is expected that program participants will dedicate up to four hours per week to this online program. 

Each week, participants will view a multi-part video lecture and will be responsible for completing a quiz and practicum exercise related to the topic for the week. The practicum will demonstrate how to apply the technique in a standard statistical package (STATA®) and provide practice problems so that the concept is reinforced.

Those participants who view all videos and complete the lecture quizzes and practicum quizzes associated with 80% of program content will earn a certificate of completion.

Professionals interested in learning and applying commonly used approaches in medical statistics.

Eligibility

MPH, MD, PhD, DMD, or doctorate-level degree.

  • Harvard affiliates: $1750.00
  • Non-Harvard affiliates: $2500.00
  • Harvard affiliates: $2800.00
  • Non-Harvard affiliates: $4000.00
  • Applied Biostatistics Certificate participants who wish to withdraw from the program for a full refund must request to do so by October 10th. Participants who wish to withdraw from the program and defer their enrollment to the following year must do so by October 10th. Enrollment will be deferred to the program offered in the next calendar year. Any paid program fees will not be eligible for refund or transfer after the deadline.  Please contact [email protected] for more information.
  • Cancellation and Refund Policy [PDF]
  • Additional 10% off for nurses and Allied Health Professionals (can be combined with other discounts)
  • Community Partners of Harvard Catalyst Programs
  • Countries with  GNI  below $13,000

Accreditation statement

The Harvard Catalyst Education Program is accredited by the Massachusetts Medical Society to provide continuing medical education for physicians.

The application is currently closed.

Introduction to Biostatistics

  • First Online: 24 June 2018

Allen M. Khakshooy & Francesco Chiappelli

By the term “biostatistics,” we mean the application of the field of probability and statistics to a wide range of topics that pertain to the biological sciences. We focus our discussion on the practical applications of fundamental biostatistics in the domain of healthcare, including experimental and clinical medicine, dentistry, and nursing.


Note that the acronym, PICOTS, originally stands for population, intervention, comparator, outcome, timeline, and setting. The latter two are parenthetical in that they are not always used or available; research questions that omit one or both can be described as PICO, PICOT, or PICOS questions.

For example, just a few years ago citizens of the United States questioned the lack of universal healthcare in their country. This was deemed a problem for the overall well-being of the United States and its constituents, a view supported by epidemiological evidence, among other sources (e.g., mortality rates, prevalence of preventable diseases, etc.). Moreover, the evidence indicated a pressing need for an affordable and accessible healthcare plan that would solve the problems resulting from the lack of universal healthcare in the United States. Hence, in 2010, the US Congress passed the Affordable Care Act, which was aimed at settling this real-world problem for the overall well-being of the healthcare field (i.e., legislative policy) and its constituents (i.e., US citizens).

Author information

Allen M. Khakshooy: Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel

Francesco Chiappelli: UCLA School of Dentistry, Los Angeles, CA, USA


© 2018 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Khakshooy, A.M., Chiappelli, F. (2018). Introduction to Biostatistics. In: Practical Biostatistics in Translational Healthcare. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-57437-9_1

Published : 24 June 2018

Publisher Name : Springer, Berlin, Heidelberg

Print ISBN : 978-3-662-57435-5

Online ISBN : 978-3-662-57437-9

eBook Packages : Medicine Medicine (R0)

Basic Concepts for Biostatistics

Lisa Sullivan, PhD, Professor of Biostatistics, Boston University School of Public Health

Introduction

Biostatistics is the application of statistical principles to questions and problems in medicine, public health or biology. One can imagine that it might be of interest to characterize a given population (e.g., adults in Boston or all children in the United States) with respect to the proportion of subjects who are overweight or the proportion who have asthma, and it would also be important to estimate the magnitude of these problems over time or perhaps in different locations. In other circumstances it would be important to make comparisons among groups of subjects in order to determine whether certain behaviors (e.g., smoking, exercise, etc.) are associated with a greater risk of certain health outcomes. It would, of course, be impossible to answer all such questions by collecting information (data) from all subjects in the populations of interest. A more realistic approach is to study samples or subsets of a population. The discipline of biostatistics provides tools and techniques for collecting data and then summarizing, analyzing, and interpreting it. If the samples one takes are representative of the population of interest, they will provide good estimates regarding the population overall. Consequently, in biostatistics one analyzes samples in order to make inferences about the population. This module introduces fundamental concepts and definitions for biostatistics.

Learning Objectives

After completing this module, the student will be able to:

  • Define and distinguish between populations and samples.
  • Define and distinguish between population parameters and sample statistics.
  • Compute a sample mean, sample variance, and sample standard deviation.
  • Compute a population mean, population variance, and population standard deviation.
  • Explain what is meant by statistical inference.

----------- 

Population Parameters versus Sample Statistics

As noted in the Introduction, a fundamental task of biostatistics is to analyze samples in order to make inferences about the population from which the samples were drawn. To illustrate this, consider the population of Massachusetts in 2010, which consisted of 6,547,629 persons. One characteristic (or variable) of potential interest might be the diastolic blood pressure of the population. There are a number of ways of reporting and analyzing this, which will be considered in the module on Summarizing Data. However, for the time being, we will focus on the mean diastolic blood pressure of all people living in Massachusetts. It is obviously not feasible to measure and record blood pressures for all of the residents, but one could take samples of the population in order to estimate the population's mean diastolic blood pressure.

Map of Massachusetts with thousands of iconic people overlayed. Three random samples are drawn from the population and each sample has a slightly different mean value.

Despite the simplicity of this example, it raises a series of concepts and terms that need to be defined: population, subjects, sample, variable, and data elements.

It is possible to select many samples from a given population, and we will see in other learning modules that there are several methods that can be used for selecting subjects from a population into a sample. The simple example above shows three small samples that were drawn to estimate the mean diastolic blood pressure of Massachusetts residents, although it doesn't specify how the samples were drawn. Note also that each of the samples provided a different estimate of the mean value for the population, and none of the estimates was the same as the actual mean for the overall population (78 mm Hg in this hypothetical example). In reality, one generally doesn't know the true mean values of the characteristics of the population, which is of course why we are trying to estimate them from samples. Consequently, it is important to define and distinguish between:

  • population size versus sample size
  • parameter versus sample statistic.
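The distinction between a parameter and a sample statistic can be sketched with a small simulation. This is only an illustration under stated assumptions: the population is simulated rather than real, its size (100,000) and spread (standard deviation of 10 mm Hg) are invented for the example, and only the mean of 78 mm Hg echoes the hypothetical value used above.

```python
import random
import statistics

random.seed(2010)  # fixed seed so the sketch is repeatable

# Hypothetical population of diastolic pressures (mm Hg).
# Size and spread are assumptions made for illustration only.
population = [random.gauss(78, 10) for _ in range(100_000)]

# The parameter: computed from every member of the population.
# In practice this value is unknown.
population_mean = statistics.mean(population)

# Three small simple random samples, each giving its own sample statistic.
sample_means = [
    statistics.mean(random.sample(population, 10)) for _ in range(3)
]

print(f"population mean (parameter): {population_mean:.1f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.1f}")
```

Each run of `random.sample` selects different subjects, so the three sample means differ from one another and from the population mean, which is the behavior described in the text.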

Sample Statistics

In order to illustrate the computation of sample statistics, we selected a small subset (n=10) of participants in the Framingham Heart Study. The data values for these ten individuals are shown in the table below. The rightmost column contains the body mass index (BMI) computed using the height and weight measurements. We will come back to this example in the module on Summarizing Data, but it provides a useful illustration of some of the terms that have been introduced and will also serve to illustrate the computation of some sample statistics.

Data Values for a Small Sample

Participant ID   Systolic Blood Pressure   Diastolic Blood Pressure   Total Serum Cholesterol   Weight   Height   Body Mass Index
1                141                       76                         199                       138      63.00    24.4
2                119                       64                         150                       183      69.75    26.4
3                122                       62                         227                       153      65.75    24.9
4                127                       81                         227                       178      70.00    25.5
5                125                       70                         163                       161      70.50    22.8
6                123                       72                         210                       206      70.00    29.6
7                105                       81                         205                       235      72.00    31.9
8                113                       63                         275                       151      60.75    28.8
9                106                       67                         208                       213      69.00    31.5
10               131                       77                         159                       142      61.00    26.8

The first summary statistic that is important to report is the sample size; in this example, n=10. Because this sample is small, it is easy to summarize the sample by inspecting the observed values, for example, by listing the diastolic blood pressures in ascending order:

62        63        64        67        70        72        76        77        81        81

Simple inspection of this small sample gives us a sense of the center of the observed diastolic pressures and also gives us a sense of how much variability there is. However, for a large sample, inspection of the individual data values does not provide a meaningful summary, and summary statistics are necessary.  The two key components of a useful summary for a continuous variable are:

  • a description of the center or 'average' of the data (i.e., what is a typical value?) and
  • an indication of the variability in the data.   

Sample Mean

There are several statistics that describe the center of the data, but for now we will focus on the sample mean, which is computed by summing all of the values for a particular variable in the sample and dividing by the sample size. For the sample of diastolic blood pressures in the table above, the sample mean is computed as follows:

\[ \text{Sample mean} = \frac{76 + 64 + 62 + 81 + 70 + 72 + 81 + 63 + 67 + 77}{10} = \frac{713}{10} = 71.3 \]

To simplify the formulas for sample statistics (and for population parameters), we usually denote the variable of interest as "X".  X is simply a placeholder for the variable being analyzed.  Here X=diastolic blood pressure. 

The general formula for the sample mean is:

\[ \bar{X} = \frac{\sum X}{n} \]

The X with the bar over it represents the sample mean, and it is read as "X bar". The Σ indicates summation (i.e., sum of the X's or sum of the diastolic blood pressures in this example). 
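The same calculation can be sketched in a few lines of Python. The variable names (`diastolic`, `x_bar`) are chosen for this illustration; the values are the ten diastolic blood pressures from the table above.

```python
# Diastolic blood pressures for the ten participants, in table order.
diastolic = [76, 64, 62, 81, 70, 72, 81, 63, 67, 77]

n = len(diastolic)          # sample size
total = sum(diastolic)      # the summation (sigma) in the formula
x_bar = total / n           # sample mean, "X bar"

print(f"n = {n}, sum = {total}, sample mean = {x_bar:.1f}")
```

The blood pressures were measured to the nearest integer, so the mean is printed to one decimal place.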

When reporting summary statistics for a continuous variable, the convention is to report one more decimal place than the number of decimal places measured. Systolic and diastolic blood pressures, total serum cholesterol and weight were measured to the nearest integer; therefore, the summary statistics are reported to the nearest tenths place. Height was measured to the nearest quarter inch (hundredths place); therefore, the summary statistics are reported to the nearest thousandths place. Body mass index was computed to the nearest tenths place, so its summary statistics are reported to the nearest hundredths place.

Sample Variance and Standard Deviation 

If there are no extreme or outlying values of the variable, the mean is the most appropriate summary of a typical value, and to summarize variability in the data we specifically estimate the variability in the sample around the sample mean. If all of the observed values in a sample are close to the sample mean, the standard deviation will be small (i.e., close to zero), and if the observed values vary widely around the sample mean, the standard deviation will be large.  If all of the values in the sample are identical, the sample standard deviation will be zero.

When discussing the sample mean, we found that the sample mean for diastolic blood pressure = 71.3. The table below shows each of the observed values along with its respective deviation from the sample mean.

Table - Diastolic Blood Pressures and Deviations from the Sample Mean

X=Diastolic Blood Pressure   Deviation from the Mean
76                            4.7
64                           -7.3
62                           -9.3
81                            9.7
70                           -1.3
72                            0.7
81                            9.7
63                           -8.3
67                           -4.3
77                            5.7

The deviations from the mean reflect how far each individual's diastolic blood pressure is from the mean diastolic blood pressure. The first participant's diastolic blood pressure is 4.7 units above the mean while the second participant's diastolic blood pressure is 7.3 units below the mean. What we need is a summary of these deviations from the mean, in particular a measure of how far, on average, each participant is from the mean diastolic blood pressure.  If we compute the mean of the deviations by summing the deviations and dividing by the sample size we run into a problem.  The sum of the deviations from the mean is zero.  This will always be the case as it is a property of the sample mean, i.e., the sum of the deviations below the mean will always equal the sum of the deviations above the mean. However, the goal is to capture the magnitude of these deviations in a summary measure. To address this problem of the deviations summing to zero, we could take absolute values or square each deviation from the mean.  Both methods would address the problem.  The more popular method to summarize the deviations from the mean involves squaring the deviations (absolute values are difficult in mathematical proofs). The table below displays each of the observed values, the respective deviations from the sample mean and the squared deviations from the mean.

X=Diastolic Blood Pressure   Deviation from the Mean   Squared Deviation from the Mean
76                            4.7                      22.09
64                           -7.3                      53.29
62                           -9.3                      86.49
81                            9.7                      94.09
70                           -1.3                       1.69
72                            0.7                       0.49
81                            9.7                      94.09
63                           -8.3                      68.89
67                           -4.3                      18.49
77                            5.7                      32.49
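A short Python sketch can confirm the two properties discussed above: the deviations from the sample mean sum to zero, while the squared deviations do not. The variable names are chosen for this illustration, and a small tolerance is used because floating-point arithmetic may leave a tiny rounding residue instead of an exact zero.

```python
# Diastolic blood pressures for the ten participants, in table order.
diastolic = [76, 64, 62, 81, 70, 72, 81, 63, 67, 77]
x_bar = sum(diastolic) / len(diastolic)  # sample mean, 71.3

# Deviation of each observation from the sample mean, and its square.
deviations = [x - x_bar for x in diastolic]
squared_deviations = [d ** 2 for d in deviations]

sum_dev = sum(deviations)          # zero, up to floating-point rounding
sum_sq = sum(squared_deviations)   # sum of squared deviations

for x, d, sq in zip(diastolic, deviations, squared_deviations):
    print(f"{x:3d} {d:6.1f} {sq:7.2f}")
print(f"sum of squared deviations = {sum_sq:.2f}")
```

The sum of squared deviations is the numerator used when computing the sample variance.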

The squared deviations are interpreted as follows. The first participant's squared deviation is 22.09, meaning that his/her diastolic blood pressure is 22.09 units squared from the mean diastolic blood pressure, and the second participant's diastolic blood pressure is 53.29 units squared from the mean diastolic blood pressure. A quantity that is often used to measure variability in a sample is called the sample variance, and it is essentially the mean of the squared deviations. The sample variance is denoted s² and is computed as follows:

\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]

Why do we divide by (n-1) instead of n?

The sample variance is not actually the mean of the squared deviations, because we divide by (n-1) instead of n. In statistical inference (described in detail in another module) we make generalizations or estimates of population parameters based on sample statistics. If we were to compute the sample variance by summing the squared deviations and dividing by n, we would consistently underestimate the true population variance. Dividing by (n-1) produces a better estimate of the population variance. The sample variance is nonetheless usually interpreted as the average squared deviation from the mean.
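The underestimation can be illustrated with a simulation. This is a sketch under stated assumptions, not a proof: the values are drawn from a simulated normal distribution with mean 78 and standard deviation 10 (so the true variance is 100), and the number of trials is arbitrary.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_SD = 10.0    # assumed SD of the simulated distribution; true variance = 100
N_SAMPLE = 10     # size of each sample
TRIALS = 20_000   # number of repeated samples (arbitrary)

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(TRIALS):
    sample = [random.gauss(78, TRUE_SD) for _ in range(N_SAMPLE)]
    x_bar = sum(sample) / N_SAMPLE
    ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
    divide_by_n.append(ss / N_SAMPLE)
    divide_by_n_minus_1.append(ss / (N_SAMPLE - 1))

avg_n = statistics.mean(divide_by_n)
avg_n_minus_1 = statistics.mean(divide_by_n_minus_1)

print(f"true variance:               {TRUE_SD ** 2:.1f}")
print(f"average when dividing by n:   {avg_n:.1f}")
print(f"average when dividing by n-1: {avg_n_minus_1:.1f}")
```

Averaged over many samples, dividing by n lands noticeably below the true variance, while dividing by (n-1) lands close to it.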

In this sample of n=10 diastolic blood pressures, the sample variance is s² = 472.10/9 = 52.46. Thus, on average diastolic blood pressures are 52.46 units squared from the mean diastolic blood pressure. Because of the squaring, the variance is not particularly interpretable. The more common measure of variability in a sample is the sample standard deviation, defined as the square root of the sample variance:

\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{52.46} = 7.2 \]
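As a cross-check, the sample variance and standard deviation can be computed with Python's standard `statistics` module, whose `variance` and `stdev` functions use the (n-1) divisor. The variable names are chosen for this illustration.

```python
import statistics

# Diastolic blood pressures for the ten participants, in table order.
diastolic = [76, 64, 62, 81, 70, 72, 81, 63, 67, 77]

s2 = statistics.variance(diastolic)  # sample variance, divides by n-1
s = statistics.stdev(diastolic)      # sample standard deviation

print(f"sample variance s^2 = {s2:.2f}")
print(f"sample standard deviation s = {s:.1f}")
```

Unlike the variance, the standard deviation is in the same units as the original measurements, which makes it easier to interpret.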


A sample of 10 women seeking prenatal care at Boston Medical Center agreed to participate in a study to assess the quality of prenatal care. At the time of study enrollment, you, the study coordinator, collected background characteristics on each of the moms, including their age (in years). The data are shown below:

24        18        28        32        26        21        22        43        27        29

A sample of 12 men has been recruited into a study on the risk factors for cardiovascular disease. The following data are HDL cholesterol levels (mg/dL) at study enrollment:

50        45        67        82        44        51        64        105      56        60        74        68 

Population Parameters

The previous page outlined the sample statistics for diastolic blood pressure measurement in our sample. If we had diastolic blood pressure measurements for all subjects in the population, we could also calculate the population parameters as follows:

Population Mean

Typically, a population mean is designated by the lower case Greek letter µ (pronounced 'mu'), and the formula is as follows:

\[ \mu = \frac{\sum X}{N} \]

where "N" is the population size.

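Population Variance and Standard Deviation

The population variance is designated by σ² and the population standard deviation by σ. Unlike the sample variance, the population variance divides the sum of squared deviations by the population size N rather than by (n-1):

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2} \]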
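To see how the divisor changes the result, the sketch below treats the ten diastolic blood pressures as if they were an entire population. That is an assumption made purely for illustration, since they are in fact a sample. Python's standard `statistics` module provides `pvariance` and `pstdev`, which divide by N.

```python
import statistics

# Treated here as a complete (hypothetical) population of N = 10 values.
diastolic = [76, 64, 62, 81, 70, 72, 81, 63, 67, 77]

mu = statistics.mean(diastolic)            # population mean
sigma2 = statistics.pvariance(diastolic)   # population variance, divides by N
sigma = statistics.pstdev(diastolic)       # population standard deviation

print(f"mu = {mu:.1f}, sigma^2 = {sigma2:.2f}, sigma = {sigma:.2f}")
```

The population variance (472.10/10) is smaller than the sample variance of the same values (472.10/9) because the divisor is N rather than (n-1).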

Statistical Inference

We usually don't have information about all of the subjects in a population of interest, so we take samples from the population in order to make inferences about unknown population parameters.

An obvious concern would be how good a given sample's statistics are in estimating the characteristics of the population from which it was drawn. There are many factors that influence diastolic blood pressure levels, such as age, body weight, fitness, and heredity.

We would ideally like the sample to be representative of the population. Intuitively, it would seem preferable to have a random sample, meaning that all subjects in the population have an equal chance of being selected into the sample; this would minimize systematic errors caused by biased sampling.

In addition, it is also intuitive that small samples might not be representative of the population just by chance, and large samples are less likely to be affected by "the luck of the draw"; this would reduce so-called random error. Since we often rely on a single sample to estimate population parameters, we never actually know how good our estimates are. However, one can use sampling methods that reduce bias, and the degree of random error in a given sample can be estimated in order to get a sense of the precision of our estimates.
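The effect of sample size on random error can be sketched with a simulation. The population below is simulated under assumed values (mean 78, standard deviation 10, size 50,000), and the helper function `spread_of_sample_means` is a hypothetical name introduced for this example.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Simulated population; mean, spread, and size are assumptions for illustration.
population = [random.gauss(78, 10) for _ in range(50_000)]


def spread_of_sample_means(n, repeats=500):
    """Draw `repeats` random samples of size n and return the standard
    deviation of their means, a rough gauge of random error."""
    means = [
        statistics.mean(random.sample(population, n)) for _ in range(repeats)
    ]
    return statistics.stdev(means)


spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(400)

print(f"spread of sample means, n=10:  {spread_small:.2f}")
print(f"spread of sample means, n=400: {spread_large:.2f}")
```

Means from the larger samples cluster much more tightly around the population mean, which is what "less affected by the luck of the draw" means in practice. Random sampling addresses bias; a larger sample addresses random error.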

Introduction to Biostatistics and Research Methods

By P. S. S. Sundar Rao and J. Richard

J Clin Transl Sci, v.5(1); 2021

Guidance for biostatisticians on their essential contributions to clinical and translational research protocol review

Jody D. Ciolino

1 Department of Preventive Medicine, Division of Biostatistics, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA

Cathie Spino

2 Department of Biostatistics, University of Michigan, Washington Heights, Ann Arbor, MI, USA

Walter T. Ambrosius

3 Department of Biostatistics and Data Science, Division of Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, NC, USA

Shokoufeh Khalatbari

4 Michigan Institute for Clinical & Health Research (MICHR), University of Michigan, Ann Arbor, MI, USA

Shari Messinger Cayetano

5 Department of Public Health, Division of Biostatistics, University of Miami, Miami, FL, USA

Jodi A. Lapidus

6 School of Public Health, Oregon Health & Sciences University, Portland, OR, USA

Paul J Nietert

7 Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA

Robert A. Oster

8 Department of Medicine, Division of Preventive Medicine, University of Alabama at Birmingham, Birmingham, AL, USA

Susan M. Perkins

9 Department of Biostatistics, Indiana University, Indianapolis, IN, USA

Brad H. Pollock

10 Department of Public Health Sciences, UC Davis School of Medicine, Davis, CA, USA

Gina-Maria Pomann

11 Duke Biostatistics, Epidemiology and Research Design (BERD) Methods Core, Duke University, Durham, NC, USA

Lori Lyn Price

12 Tufts Clinical and Translational Science Institute, Tufts University, Boston, MA, USA

13 Institute of Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA

Todd W. Rice

14 Department of Medicine, Division of Allergy, Pulmonary, and Critical Care Medicine, Medical Director, Vanderbilt Human Research Protections Program, Vice-President for Clinical Trials Innovation and Operations, Nashville, TN, USA

Tor D. Tosteson

15 Department of Biomedical Data Science, Division of Biostatistics, Geisel School of Medicine at Dartmouth, Hanover, NH, USA

Christopher J. Lindsell

16 Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA

Heidi Spratt

17 Department of Preventive Medicine and Population Health, University of Texas Medical Branch, Galveston, TX, USA

Rigorous scientific review of research protocols is critical to making funding decisions, and to the protection of both human and non-human research participants. Given the increasing complexity of research designs and data analysis methods, quantitative experts, such as biostatisticians, play an essential role in evaluating the rigor and reproducibility of proposed methods. However, there is a common misconception that a statistician’s input is relevant only to sample size/power and statistical analysis sections of a protocol. The comprehensive nature of a biostatistical review coupled with limited guidance on key components of protocol review motivated this work. Members of the Biostatistics, Epidemiology, and Research Design Special Interest Group of the Association for Clinical and Translational Science used a consensus approach to identify the elements of research protocols that a biostatistician should consider in a review, and provide specific guidance on how each element should be reviewed. We present the resulting review framework as an educational tool and guideline for biostatisticians navigating review boards and panels. We briefly describe the approach to developing the framework, and we provide a comprehensive checklist and guidance on review of each protocol element. We posit that the biostatistical reviewer, through their breadth of engagement across multiple disciplines and experience with a range of research designs, can and should contribute significantly beyond review of the statistical analysis plan and sample size justification. Through careful scientific review, we hope to prevent excess resource expenditure and risk to humans and animals on poorly planned studies.

Introduction

Rigorous scientific review of research protocols is critical to making funding decisions [ 1 , 2 ], and to the protection of both human and non-human research participants [ 3 ]. Two pillars of ethical clinical and translational research include scientific validity and independent review of the proposed research [ 4 ]. As such, the review process often emphasizes the scientific approach and the study design, along with rigor and reproducibility of data collection and analysis. The criterion score labeled “Approach” has been shown to be the strongest predictor of the overall Impact Score and the likelihood of funding for research project grants (e.g., R01s) at the National Institutes of Health (NIH) [ 5 ]. Evidence also favors scientific review as a consequential component of institutional review of human participant research [ 3 ]. Given the increasing complexity of research designs and data analysis methods, quantitative experts, such as biostatisticians, often play an essential role in evaluating the rigor and reproducibility of proposed analytic methods. However, the structure and components of formal review can vary greatly when quantitative methodologists review research protocols prior to data collection, whether for Institutional Review Boards (IRBs), scientific review committees, or intramural and extramural grant review committees.

Protocol submitters and protocol reviewers often mistakenly view a statistician’s input as relevant only to sample size/power and statistical analysis sections of a protocol. Experienced reviewers know that providing an informative and actionable review of a research protocol from a biostatistical perspective requires a comprehensive view of the research strategy. This can be a daunting task for novice quantitative methodologists, yet to our knowledge, there is little guidance on the role and crucial components of biostatistical review of a protocol before data are collected.

Members of the Biostatistics, Epidemiology, and Research Design (BERD) Special Interest Group (SIG) of the Association for Clinical and Translational Science (ACTS) sought to develop this guidance. We used a consensus approach to identify the elements of research protocols that a biostatistician should consider in a review, and provide specific guidance on how each element should be reviewed. The resulting review framework can be used as an educational tool and guideline for biostatisticians navigating review boards and panels. This article briefly describes the approach to developing the framework, provides a comprehensive checklist, and offers guidance on review of each protocol element. We are disseminating this framework to better position biostatisticians to (1) advocate for research protocols that achieve the goal of answering their proposed study questions while minimizing risk to participants, and (2) serve as stewards of resources, with the ultimate goal of preventing the pursuit of uninformative or unnecessary research activities. We hope a consequence of this work will also be improved rigor and reproducibility of research protocols at the time of submission because protocol writers will also benefit from the guidance.

Approach to Developing Guidelines

In fall of 2017, the BERD SIG of the ACTS identified the considerable variation in the expectations for, and practice of, biostatistical review of research protocols as a modifiable barrier to effectively informing funding decisions, and to weighing risks and benefits for research participants. The BERD SIG comprises biostatisticians and epidemiologists with expertise in clinical and translational research at academic medical centers across the USA. Volunteers from this group formed a working group, consisting of all 16 coauthors of this article, to develop a checklist of items a quantitative methodologist should review in a research protocol (Table  1 ). The initial checklist focused on defining essential elements for reviewing a randomized controlled trial (RCT) as this is considered the most robust design in clinical research [ 6 ]. However, RCTs are not always feasible, practical, or scientifically or ethically justified, so elements for reviewing other important types of studies were added. Protocol elements essential for an RCT may be irrelevant to other types of studies, and vice versa.

Table 1. Checklist guide of items to consider in biostatistical review of protocols

1. Objectives and hypotheses
  □ (a) Objectives articulated and consistent: specific, measurable, achievable, relevant, time bound (SMART)
  □ (b) Hypotheses follow from objectives
  □ (c) Statistical hypothesis tests are clear or easily inferred and match aims
2. General approach
  □ (a) General study design matches the objectives and hypotheses to address research question
  □ (b) Limitations on conclusions that can be drawn are evident and clear
3. Population and sample
  □ (a) Degree of generalizability is obvious
  □ (b) Inclusion and exclusion criteria are appropriate for state of knowledge
  □ (c) Screening and enrollment processes minimize bias and do not restrict diversity
4. Measurements and outcomes
  □ (a) Choice of measurements, especially the response variable, is justified and consistent with the objectives
  □ (b) Timing of assessments and measurements is clear and standardized (study schedule or visit matrix should be present)
  □ (c) Objectively measured and standardized
  □ (d) If based on subjective or patient report, use validated instruments as appropriate
  □ (e) Measurements are of maximum feasible resolution with no unnecessary categorization in data collection
  □ (f) Ranges of outcomes, distributional properties, and handling in analyses are clear
  □ (g) Algorithms used to derive variables or score outcome assessments are justified (e.g., citations, clinical meaning, etc.)
  □ (h) Measurement of important/standard explanatory variables that will describe sample or address confounding
5. Treatment assignment
  □ (a) Minimization of biases (e.g., randomization and blinding)
  □ (b) Control condition(s) allow for comparability or minimization of confounding
6. Data integrity and data management
  □ (a) Data capture and management platform is described
  □ (b) Security and control of access to study data are discussed
  □ (c) Data validation, error corrections, and query resolution processes are included
7. Statistical analysis plan
  □ (a) Statistical approach is consistent with hypothesis and objectives
  □ (b) A plan for describing the dataset is given
  □ (c) Unit of analysis is clearly described for each analysis
  □ (d) Analysis populations clearly described (e.g., intention-to-treat set, per protocol set, full analysis set)
  □ (e) Key statistical assumptions are addressed
  □ (f) Alternative approaches in the event of violations of assumptions are present
  □ (g) Discussion of control of type I error (multiple comparisons) is present
  □ (h) Description of preventing and handling missing data is given
  □ (i) Interim analyses and statistical stopping guidelines are clear and justified
8. Sample size justification
  □ (a) Type I and II error rates present for all sample size calculations and corresponding statistical tests
  □ (b) Parameter assumptions are clearly stated and justified (i.e., based on previous research and consider the population studied)
  □ (c) Statistical tests used in sample size calculations match those presented in statistical analysis plan or appropriately justify reasoning for straying from it
  □ (d) Minimum clinically important differences or required precision described
9. Reporting and reproducibility
  □ (a) Plans for data sharing and archiving are present
  □ (b) Version control or a means of ensuring rigor, transparency, reproducibility in any processes is evident
  □ (c) Plan to report results according to guidelines or law

As the checklist was finalized, working group members were assigned to draft guidance describing review essentials pertaining to each item on the checklist. The expectation was that the text should describe the biostatistical review perspective arrived at during the group discussions that occurred during development of the checklist. Another assigned reviewer then revised each section. The consensus approach involved multiple iterations of review and revision, and the final text presented in this article reflects the consensus of the working group. The co-primary (Ciolino and Spino) and senior authors (Lindsell and Spratt) synthesized all feedback from revision and review to finalize this article and corresponding checklist. Consensus was reached when all coauthors agreed with the final resultant article and checklist tool.

In the spring of 2019, a group of early career investigators (i.e., recipients of K awards) reviewed and provided comment on the checklist and article during a question-and-answer review lunch. Their feedback was that to maximize dissemination and impact beyond the statistical community, it would be more effective to emphasize why the statistical perspective matters for a protocol element rather than trying to justify one statistical argument or another. To obtain additional feedback to help focus the manuscript, we invited members and affiliates of the BERD SIG to rate the relative importance of each protocol element for different study designs (Fig.  1 ). This figure supplements the accompanying checklist of protocol items a biostatistical reviewer should consider in reviewing study protocols. The heat map illustrates the high-level summary view, among coauthors and other quantitative methodologists, of relevance for each checklist item. Individual respondents ( N = 20) rated each item from 1 (most relevance) to 4 (no relevance/not applicable). Darker cells correspond to higher importance or relevance for a given item/study type, while lighter cells indicate less relevance or importance. If we use the RCT as a benchmark, we note that the majority of the checklist items are important to consider and review in a research protocol for this study type. The dark column to the left illustrates this. As the study type strays from the RCT, we illustrate the varying degrees of relevance for each of these items. For example, a statistical reviewer should not put weight on things like interim analyses for several of these other study types (cohort studies, case-control, etc.), and the group determined that use of validated instruments and minimizing bias in enrollment in animal studies are less relevant. On the other hand, the need for clear objectives and hypotheses is consistent throughout, no matter what the study type. 
With this rich context and feedback, we finalized the guidance and checklist for presentation here.

Fig. 1. Illustration of varying degrees of relevance for protocol items across common study types. This figure supplements the accompanying checklist of protocol items a biostatistical reviewer should consider in reviewing study protocols. The heat map illustrates the high-level summary view, among coauthors and other quantitative methodologists ( N = 20 respondents), of relevance for each checklist item. Individual respondents rated each item from 1 (most relevance) to 4 (no relevance/not applicable). Darker cells correspond to higher importance or relevance for a given item/study type, while lighter cells indicate less relevance or importance. The ordering of study types from left to right reflects the order in which respondents were presented these items when completing the survey. If we use the randomized controlled trial (RCT) as a benchmark, we note that the majority of the checklist items are important to consider and review in a research protocol for this study type. The dark column to the left illustrates this. As the study type strays from the RCT, we illustrate the varying degrees of relevance for each of these items. For example, a statistical reviewer should not put weight on things like interim analyses for several of these other study types (cohort studies, case-control, etc.), and the group determined that the use of validated instruments and minimizing bias in enrollment in animal studies are less relevant. On the other hand, the need for clear objectives and hypotheses is consistent throughout, no matter what the study type.

Objectives and Hypotheses

Objectives are articulated and consistent: specific, measurable, achievable, relevant, time bound (i.e., SMART).

The first step in protocol review is understanding the research question. Objectives describe the explicit goal(s) of the study and should be clearly stated regardless of design. It is common for the objectives to be summarized in the form of “specific aims.” They should be presented in the context of the broader program of research, including a description of existing knowledge gaps and future directions. Objectives should be written so that they are easily understandable by all who read the protocol.

It is challenging to evaluate the rigor and impact of a study when objectives are diffuse. A common guide to writing objectives is the “SMART” approach [ 7 ]. That is, objectives should be specific as to exactly what will be accomplished. They should be measurable so that it can be determined whether the goals are accomplished. They should be achievable within the time, resource, and design constraints. They should be relevant to the scientific context and existing state of knowledge. Finally, objectives need to be tied to a specific time frame, often the duration of a project funding period. Biostatistical reviewers should evaluate objectives according to these criteria, as it will make them better positioned to properly evaluate the rest of the protocol.

Hypotheses Follow from Objectives

Hypotheses are statements of expected findings from the research outlined in the objectives. A study can have one or many hypotheses, or none at all. If a study is not designed to test the veracity of some assumed truth, it is not necessary – and often detrimental – to force a hypothesis statement. The biostatistical reviewer should appropriately temper criticism of studies that are “hypothesis generating” as opposed to formal statistical hypothesis testing. The observational, hypothesis generating loop of the scientific method provides an opportunity for the biostatistical reviewer to focus on evaluating the rigor and reproducibility of the proposed work in the absence of a formally testable hypothesis.

Statistical Hypothesis Tests Are Clear and Match Aims

When a hypothesis is appropriate, it should be stated in a testable framework using the data generated by the proposed study. The biostatistical reviewer should assess how the statistical approach relates to the hypothesis and is contextualized by the objectives. Our experience is that an objective with more than one or two key hypotheses has insufficient focus to allow for a rigorous, unbiased study design accompanied by a robust analytic approach. Inclusion of several supportive hypotheses is of less concern.

These same notions of cohesion between objective and analyses apply for preliminary and pilot studies. The objectives of preliminary studies should be clearly stated. They may seek to demonstrate a specific procedure can be performed, a specified number of subjects can be enrolled in a given time frame, or that a technology can be produced. A pilot study with an objective to estimate effect size should be redesigned with alternative objectives because the sample size often precludes estimating the effect size with meaningful precision [ 8 ]. The moniker of pilot study is often mistakenly used to justify an underpowered study (i.e., uninformative study) [ 9 ]. While it is important that pilot studies specify any hypothesis to be tested in a subsequent definitive study, in general they should seldom (if ever) propose to conduct statistical hypothesis tests [ 8 , 10 ]. In every case, the biostatistical reviewer should look for objectives that are specific to demonstration or estimation.
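
To illustrate why a pilot sample rarely supports effect size estimation, the following minimal Python sketch computes an approximate 95% confidence interval for a standardized mean difference using a common large-sample standard error formula. The function name, the observed effect of 0.5, and the group sizes are illustrative assumptions, not values taken from any protocol or from the cited references.

```python
import math


def smd_confidence_interval(d, n1, n2, z=1.96):
    """Approximate CI for a standardized mean difference d.

    Uses the large-sample standard error
    sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))).
    This is an approximation; exact intervals require the
    noncentral t distribution.
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# Hypothetical pilot study: 15 participants per arm, observed d = 0.5.
pilot_ci = smd_confidence_interval(0.5, 15, 15)

# Hypothetical definitive study: 200 participants per arm, same d.
definitive_ci = smd_confidence_interval(0.5, 200, 200)

print("Pilot CI:      (%.2f, %.2f)" % pilot_ci)
print("Definitive CI: (%.2f, %.2f)" % definitive_ci)
```

Under these assumptions, the pilot interval runs from roughly -0.23 to 1.23, so it is compatible with anything from a small harmful effect to a very large benefit, which is the imprecision the paragraph above warns about.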

General Approach

General study design matches the objectives and hypotheses to address research questions.

Once a study’s purpose is clear, the next goal of a biostatistical review is to confirm the general approach (i.e., type of study) matches the objectives and is consistent with the hypotheses that will be tested. RCTs are generally accepted for confirming causal effects, but there are many situations where they are neither feasible nor ethically justified, and well-designed observational (non-experimental) studies are required. For example, RCTs to evaluate parachute use in preventing death and major trauma in a gravitational challenge do not exist because of clear ethical concerns. Between the experimental and observational approaches lies a class of studies called quasi-experimental studies that evaluate interventions or exposures without randomization using design and analytical techniques such as instrumental variables (natural experiments) and propensity scores [ 11 , 12 ]. The biostatistical reviewer should consider the relative merits and tradeoffs between the experimental, non-experimental, and quasi-experimental approaches and comment on the strength of evidence for answering the study question.

We highlighted a few possible design approaches in Fig.  1 . Within each, there are innumerable design options. For example, with the RCT design, there are crossover, factorial, dose-escalating, and cluster-randomized designs, and many more [ 6 ]. The biostatistical reviewer should acknowledge the balance between rigor and feasibility, noting that the most rigorous design may not be the most efficient, least invasive, ethical, or resource preserving.

Limitations on Conclusions that Can Be Drawn Are Evident and Clear

When using innovative designs, the biostatistical reviewer must consider whether the design was selected because it is most appropriate rather than other factors such as current trends and usage in the field. There are typically multiple designs available to answer similar questions, but the protocol must note the limitations of the design proposed and justify its choice over alternative strategies. As Friedman, Furberg, DeMets, et al. note, “There is no such thing as a perfect study” [ 6 ].

When a protocol requires novel or atypical designs, it is imperative that the biostatistician’s review carefully considers potential biases and the downstream analytic implications the designs may present. For example, a dose-finding study using response-adaptive randomization will not allow for conclusions to be drawn regarding drug efficacy in comparison to placebo using classical statistical methods. It will, however, allow for estimation of a maximum tolerated dose for use in later phase studies. This imposes additional responsibilities on the biostatisticians to understand the state of the science within the field of application, the conclusions one can draw from the proposed research and their impact on subsequent studies that build upon the knowledge gained.

Population and Sample

Degree of generalizability is obvious.

We must recognize that every sample will have limits to generalizability; that is, there will be inherent biases in study design and sampling. RCTs have limits to generalizability as they require specification of eligibility criteria to define the study sample. The more restrictive these criteria are, the less generalizable the inferences become. This concept of generalizability becomes particularly important as reviewers evaluate fully translational research that moves from “bench to bedside.” Basic science and animal studies (i.e., “bench research”) occur in comparatively controlled environments, usually on samples with minimal variability or heterogeneity. The generalizability of these pre-clinical findings to heterogeneous, clinical populations in these situations is limited. For this reason, an effect size observed in pre-clinical populations cannot be generalized to that which one would expect in a clinical population.

All sample selection procedures have advantages and disadvantages, which must be considered when assessing the feasibility, validity, and interpretation of study findings. Biases may be subtle, yet they can have important implications for the interpretability and generalizability of study findings. For example, a randomized, multicenter study conducted in urban health centers evaluating implementation of a primary care quality improvement strategy will likely not allow for generalizability to rural settings. We urge statistical reviewers to evaluate sampling procedures and watch for samples of convenience that may not be merited.

Inclusion and Exclusion Criteria Appropriate for State of Knowledge

No matter the type or phase of study, the protocol should describe how eligibility of study participants is determined. The notion that therapies and diseases have differing underlying mechanisms of action or progression in different populations (e.g., children vs. adults, males vs. females) often leads to increased restrictions on inclusion and exclusion criteria. While sometimes justified scientifically as it allows for a precise estimate of effect within a specialized population, the tradeoff is less generalizability and feasibility to complete enrollment. On the other hand, sample selection or eligibility criteria may be expansive and purposefully inclusive to maximize generalizability. The tradeoff is often increased variability and potential heterogeneity of effect within specific subgroups. Biostatistical reviewers should question eligibility decisions chosen purely for practical reasons and recognize the limits they place on a study’s generalizability, noting the potential future dilemma for managing patients who would not have completely satisfied a study’s inclusion and exclusion criteria.

Screening and Enrollment Processes Minimize Bias and Do Not Restrict Diversity

It is imperative that clinical and translational research be designed for diversity, equity, and inclusion. Aside from specification of eligibility criteria, the specific way researchers plan to identify, recruit, screen, and ultimately enroll study participants may be prone to biases. For example, reading level and language of the informed consent document may impact accessibility. The timing and location of recruitment activities also restrict access both in person and by mail or electronic communication. Using email outreach or phone outreach to screen and identify patients will exclude those without easy access to technology or stable phone service. Some populations may prefer text messaging to phone calls; some may prefer messages from providers directly rather than participating study staff. Biostatistical reviewers should consider inclusive procedures and those appropriate for the target study population as they have potential to impact bias and variability within the sample. This can ameliorate or amplify both effect sizes and study generalizability.

Measurements and Outcomes

Choice of measurements, especially the response variable, is justified and consistent with the objectives.

The statistical review should ensure that outcome measures are aligned with objectives and appropriately describe the response of the experiment at the unit of analysis of the study (e.g., participant, animal, cell). Outcomes should be clinically relevant, measured or scored on an appropriate scale, valid, objective, reliable, sensitive, specific, precise, and free from bias to the extent possible [ 13 ]. Statistically, the level of specification is important. As an example, risk of death can be assessed as the proportion of participants who die within a specified period of time (binary outcome), or as time to death (i.e., survival). These outcomes require different analytic approaches with consequences on statistical power and interpretation. Ideally, the outcome should provide the maximum possible statistical information. It is common for investigators to dichotomize continuous measurements (e.g., defining a treatment responder as a participant who achieves a certain change in the outcome rather than considering the continuous response of change from baseline). Information is lost when continuous and ordinal responses are replaced with binary or categorical outcomes, and this practice is generally discouraged [ 14 – 16 ]. If the biostatistical reviewer identifies such information loss, they should consider the resulting inefficiency (i.e., increased sample size required; loss of efficacy signal) in the context of the risks to human or animal subjects and the costs of the study.
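
As a rough illustration of the information lost through dichotomization, the sketch below computes the asymptotic efficiency of comparing two groups on a binary "responder" outcome relative to comparing them on the underlying continuous outcome. It assumes a normally distributed outcome with equal variances, a small shift in means between groups, and a cut point expressed in standard deviations from the mean; the function names are hypothetical, and the result is a textbook approximation rather than a calculation from the cited references.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dichotomization_efficiency(z_cut):
    """Relative efficiency of a dichotomized vs. continuous comparison.

    z_cut is the cut point in standard deviation units from the mean.
    Assumes normality and a small mean shift (local approximation).
    """
    p = normal_cdf(z_cut)
    return normal_pdf(z_cut) ** 2 / (p * (1.0 - p))


def sample_size_inflation(z_cut):
    """Approximate factor by which sample size must grow after dichotomizing."""
    return 1.0 / dichotomization_efficiency(z_cut)


for cut in (0.0, 0.5, 1.0, 1.5):
    print("cut at %.1f SD: efficiency %.3f, sample size x %.2f"
          % (cut, dichotomization_efficiency(cut), sample_size_inflation(cut)))
```

Under these assumptions, even the most favorable cut point (the median) retains only 2/π, or about 64%, of the information, implying a sample size increase of more than 50% for the same power; cut points further from the median lose more.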

Outcome deliberations become critical in the design of clinical trials because distinction between primary, secondary, and exploratory endpoints is important. Many considerations are the topic of a large body of literature [ 17 – 20 ]. The biostatistical reviewer should be aware that the Food and Drug Administration (FDA) differentiates among the decisions that are supported by primary, secondary, and exploratory endpoints, as described in the “Discussion of control of type I error (multiple comparisons) is present” section [ 18 ]. Results of primary and secondary endpoints must be reported in clinicaltrials.gov; results of exploratory endpoints do not require reporting. Regardless of the dictates of clinicaltrials.gov, limitations on the number of secondary endpoints are prudent, and all secondary endpoints should be explicitly detailed in the protocol [ 21 ]. It is important that the issue of multiplicity in endpoints is considered and clearly delineated in the sections on sample size and statistical analysis. Beyond multiplicity, what constitutes success of the trial needs to be clear. For example, if there are several co-primary endpoints, it should be noted whether success is achieved if any endpoint is positive, or only if all endpoints are met. It should also be clear whether secondary endpoints will be analyzed if the primary endpoint is not significant. When there are longitudinal measurements, the investigators should specify whether a single time point defines the primary endpoint or whether all time points are incorporated to define the trajectory of response as the primary endpoint. Although the description above focuses on clinical trials, all protocols should describe which outcome(s) is(are) the basis for sample size or integral to defining success of the study.
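
The distinction between declaring success if any co-primary endpoint is positive and only if all are met can be made concrete with a short calculation. The sketch below assumes statistically independent endpoints, which is rarely true in practice and is used here only to keep the arithmetic simple; the function names are hypothetical, and the Bonferroni adjustment is shown as one familiar option, not as a method prescribed by the guidance above.

```python
def familywise_error_any(alpha, k):
    """Chance of at least one false positive among k independent tests,
    each run at level alpha, when success means ANY endpoint is positive."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-endpoint significance level under a Bonferroni adjustment."""
    return alpha / k


def power_all(per_endpoint_power, k):
    """Chance that ALL k independent endpoints succeed, given a true
    effect on each and the same power for each endpoint."""
    return per_endpoint_power ** k


k = 3
print("Any-endpoint rule, unadjusted type I error: %.4f"
      % familywise_error_any(0.05, k))
print("Bonferroni per-endpoint alpha: %.4f" % bonferroni_alpha(0.05, k))
print("All-endpoints rule, overall power at 90%% each: %.3f"
      % power_all(0.90, k))
```

The two success rules pull in opposite directions: the "any" rule inflates type I error unless the significance level is adjusted, while the "all" rule leaves type I error controlled but reduces overall power, so the sample size justification should reflect whichever rule the protocol adopts.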

Composite outcomes deserve special attention in a biostatistical review of a protocol [ 22 – 24 ]. Composite outcomes combine several elements into a single variable. Examples include “days alive and out of hospital” or “death or recurrent myocardial infarction.” Investigators sometimes select composite outcomes because of a very low expected event rate on any one outcome. The biostatistical reviewer should carefully consider the information in the composite endpoint for appropriateness. An example of a composite outcome that would not be appropriate is death combined with lack of cognition (e.g., neurologically intact survival) when the causal pathway is divergent such that the treatment worsens mortality but improves cognitive outcome. Some patients will care about quality of life (i.e., improved cognition) over length of life, while others will not, which yields a situation where one cannot define a single utility function for the composite outcome.

Timing of Assessments and Measurements Is Clear and Standardized (Study Schedule or Visit Matrix Should Be Present)

Regardless of whether a measurement is an outcome, predictor, or other measure, a biostatistical reviewer should evaluate each measurement in terms of who, what, when, where, how, and why, as shown in Table  2 .

Table 2. Aspects of measurement that should be considered in protocol evaluation

• Who will be assessed?
• Who will make the assessments?
• Is the assessor blinded to the intervention arms?
• What is (are) the measurement variable(s)?
• What is the analysis metric (e.g., change from baseline, end of study value, time to event)?
• Where will the assessment take place (e.g., hospital, home, doctor’s office)?
• When will the assessments take place (specific time points)?
• How are the assessments summarized (e.g., mean, median, proportion)?
• How will the measurements be used in the analysis?
• Why are the assessments clinically relevant for addressing efficacy and safety outcomes?

This information tends to be scattered throughout the protocol: who will be assessed is described in the eligibility criteria; the main predictor may be defined in interventions; what and when are given in the outcomes section; who will take assessments, where and when are described in data collection methods; and how the assessments are summarized may be in the outcomes or the statistical methods section. This makes the evaluation of measurements challenging, but it is a vital component and should not be undervalued. A schedule of evaluations (Table  3 ), or visit matrix, can be a very valuable tool to summarize such information and help reviewers easily identify what measurements are being made by whom, how, and when.

Table 3. Schedule of evaluations

Columns: Assessment procedure; Screening/baseline (Visit 1, Visit 2); Follow-up (FU) assessments (FU 6, FU 12, FU 18)

Participant consent: X
HIPAA authorization form: X
Personal information (demographics): X
Medical history: X
Current medication use: X
Primary outcome: X X X X
Secondary outcomes: X X X
Expensive secondary outcome: X X
Tertiary outcome: X X X X
Blood collection: X X X
SF-36: X X X X

Objectively Measured and Standardized

To remove potential sources of bias, from a statistical perspective each assessment should be as objective as possible. Thus, the protocol should mention measures of standardization as appropriate (e.g., central reading of images, central laboratory processing), and measures to ensure fidelity and quality control. For example, if outcomes come from a structured interview or clinical rating, there should be discussion of measuring interrater agreement. If the agreement is lower than expected, training or re-training procedures should be described.
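
One widely used measure of interrater agreement for categorical ratings is Cohen's kappa, which corrects observed agreement for agreement expected by chance. The sketch below is a minimal implementation for two raters; the ratings are made-up illustrative data, and a protocol may reasonably specify a different statistic (e.g., a weighted kappa or an intraclass correlation) depending on the measurement scale.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed
    # over categories (a category unused by rater B contributes zero).
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of 10 interviews by two assessors.
assessor_1 = ["yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"]
assessor_2 = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", "yes"]

print("Cohen's kappa: %.3f" % cohens_kappa(assessor_1, assessor_2))
```

In this made-up example the raters agree on 8 of 10 items, yet kappa is about 0.58 because roughly half of that agreement would be expected by chance alone. A protocol can state in advance the agreement threshold that would trigger re-training.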

If Based on Subjective or Patient Report, Use Validated Instruments as Appropriate

For patient-reported outcomes and questionnaires, the validity and reliability of the instrument are key. Validated instruments with good psychometric properties should be favored over unvalidated alternatives. The biostatistical reviewer should generally be cognizant of the repercussions of even small changes to the instrument, including reformatting or digitizing it.

Measurements Are of Maximum Feasible Resolution with No Unnecessary Categorization in Data Collection

Reviewers should assess whether appropriate data collection methods are used to improve the quality of the outcomes. For example, outcomes sometimes require derivation or scoring, such as body mass index (BMI). The reliability of BMI is improved if height and weight are collected and BMI is calculated in analysis programs. This reduces errors from unit conversions (e.g., between feet and centimeters) and from incorrect manual calculations. For each key variable, it should be clear how the data will be generated and recorded; a case report form, when provided for review, can be remarkably helpful.
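The BMI example can be sketched in a few lines. This is a minimal illustration that assumes weight is recorded in kilograms and height in centimeters; the function names are hypothetical.

```python
def height_cm_from_feet_inches(feet, inches):
    """Convert a height recorded in feet and inches to centimeters."""
    return (feet * 12 + inches) * 2.54


def bmi_from_raw(weight_kg, height_cm):
    """Derive BMI (kg/m^2) in the analysis program from raw weight and height."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# Raw measurements are stored; BMI is computed once, the same way for every record.
bmi = bmi_from_raw(weight_kg=70.0, height_cm=175.0)
height = height_cm_from_feet_inches(5, 9)
print(f"BMI = {bmi:.1f} kg/m^2; 5 ft 9 in = {height:.2f} cm")
```

Keeping the raw values in the dataset also lets the derivation be re-run if a scoring error is discovered later.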

Ranges of Outcomes, Distributional Properties, and Handling in Analyses Are Clear

The distributional properties of outcomes will allow a statistical reviewer to determine whether the analytic strategy is sound. If a laboratory value is known to be highly skewed or if the investigators use a count variable with several anticipated zero values, a standard t-test may not be appropriate, but rather nonparametric tests or strategies assuming a Poisson or zero-inflated Poisson distribution may be more appropriate. These will have implications on inferences and sample size calculations. Often, time-to-event variables are confused with binary outcome variables. Cumulative measures such as death by a certain time point (e.g., 12-month mortality rate) vs. time-to-death are two different outcomes that are incorrectly used interchangeably. The statistical reviewer should pay careful attention to situations where the outcome could feasibly be treated as binary (i.e., evaluated with relative risk, odds ratio, or risk difference) or as a time-to-event outcome (i.e., evaluated with hazard ratios).
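To make the binary-outcome measures concrete, the sketch below computes the risk difference, relative risk, and odds ratio from event counts at a fixed time point. The counts are invented for illustration. A hazard ratio cannot be obtained from these counts alone, because it requires each participant's event or censoring time.

```python
def binary_effect_measures(events_trt, n_trt, events_ctl, n_ctl):
    """Risk difference, relative risk, and odds ratio from a 2x2 table of counts."""
    if not (0 < events_trt < n_trt and 0 < events_ctl < n_ctl):
        raise ValueError("each arm needs at least one event and one non-event")
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    odds_trt = events_trt / (n_trt - events_trt)
    odds_ctl = events_ctl / (n_ctl - events_ctl)
    return {
        "risk_difference": risk_trt - risk_ctl,
        "relative_risk": risk_trt / risk_ctl,
        "odds_ratio": odds_trt / odds_ctl,
    }


# Hypothetical 12-month mortality: 15/100 deaths on treatment, 30/100 on control.
measures = binary_effect_measures(15, 100, 30, 100)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```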

Algorithms Used to Derive Variables or Score Outcome Assessments Are Justified (e.g., Citations, Clinical Meaning)

Just as validated surveys and questionnaires are preferred in study design, so are validated algorithms for selecting participants, allocating (or guiding) interventions, or deriving outcomes. For example, there are several algorithms used to derive estimated glomerular filtration rate [25–27], a measure of kidney function, or percent predicted forced expiratory volume in one second (FEV1), a measure of lung function [28, 29]. Some studies use predictive enrichment, using algorithms to select participants for inclusion. Although a statistical reviewer may not be in a position to argue the scientific context of the algorithm, they can ensure that the choice of algorithm or its derivation is discussed in the protocol. This should include whether investigators plan to use unvalidated algorithms or scores. If the algorithms are described according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) criteria [30], the biostatistical reviewer has the information needed to advocate for or against the algorithm as a component of the study.

Measurement of Important/Standard Explanatory Variables that Will Describe Sample or Address Confounding

In addition to outcomes, other measures such as predictors, confounders, effect modifiers, and other characteristics of the population (e.g., concomitant medications) should be listed. For example, if obesity is a confounder and will be included in analyses, the metric used to define obesity should be provided. These variables generally do not need as much detail as the outcome unless they are important in the analyses, or the lack of explanation may cause confusion.

Treatment Assignment

Minimization of Biases (e.g., Randomization and Blinding)

Any study that evaluates the effect of a treatment or intervention must consider how participants are allocated to receive the intervention or the comparator. Randomized assignments serve as the ideal study design for minimizing known and unknown differences between study groups and evaluating causality. With this approach, the only experimental condition that differs in comparing interventions is the intervention itself. A biostatistical reviewer should consider the fundamental components of the randomization process to ensure that threats to causal inference are not inadvertently introduced.

Units and methods of randomization will depend upon the goals and nature of the study. Units of randomization may be, for example, animals, patients, or clinics. There is a wide range of methods for controlling balance across study arms [ 31 – 33 ]. Simple randomization and block randomization are straightforward, but techniques such as stratification, minimization, or adaptive randomization may be more appropriate. The choice of randomization approach, including details of the randomization process, should be considered by the biostatistical reviewer in the context of the study design. The algorithm used to generate the allocation sequence should be explained (e.g., stratified blocks, minimization, simple randomization, response-adaptive, use of clusters). However, the reviewer should not ask for details that would defeat the purpose of concealment (e.g., size of blocks).
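For readers less familiar with block randomization, the following is a minimal sketch of a permuted-block allocation sequence for equal allocation across arms. The function name, arm labels, and seed are hypothetical. An actual trial would typically generate the sequence in a concealed central system and keep the block size (often varied at random) undisclosed to site staff.

```python
import random


def permuted_block_sequence(n_participants, block_size, arms=("A", "B"), seed=None):
    """Generate an allocation sequence using permuted blocks with equal allocation."""
    if block_size <= 0 or block_size % len(arms) != 0:
        raise ValueError("block_size must be a positive multiple of the number of arms")
    rng = random.Random(seed)  # seeded so the sequence can be reproduced and audited
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * per_arm  # each arm appears equally often in a block
        rng.shuffle(block)            # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]


allocation = permuted_block_sequence(n_participants=12, block_size=4, seed=2024)
print(allocation)
```

Within every completed block the arms are balanced, which is what limits imbalance if enrollment stops early.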

The integrity of a trial and its randomization process can easily be compromised if the allocation sequence is not concealed properly, and thus the biostatistical reviewer should look for a description of the concealment process, such as use of a central telephone system or centralized web-based system. Concealment is not the same as blinding (sometimes also referred to as masking); concealment of the randomization sequence is intended to prevent selection bias prior to enrollment whereas blinding is intended to prevent biases arising after enrollment. Therefore, open-label and non-blinded randomized studies should also conceal the allocation sequence. If a pre-generated sequence is not used, the biostatistical reviewer should consider how real-time randomization is deployed, as might be required in a response-adaptive design, and whether it is feasible.

Beyond considerations of generating and concealing the randomization algorithm, the biostatistical reviewer should look for biases arising from the randomization process. As an example, if there is extended time between randomization and intervention, there is a high likelihood that the participant’s baseline status has changed, and they may no longer be eligible for treatment with the intervention. This can result in an increased number of patients dropping out of the study or not receiving their assigned intervention. Under the intent-to-treat principle, this results in bias toward the null.

The use of blinding can strengthen the rigor of a study even if the participant’s treating physician cannot be blinded in the traditional sense. For example, a blinded assessor of the primary endpoint can be used. Sometimes blinding of the participant or physician is not possible, such as when intervention is a behavioral therapy in comparison to a medication. It may also be necessary to break the blind during the study in emergency cases or for study oversight by a Data and Safety Monitoring Board. The biostatistical reviewer should consider whether sufficient blinding is in place to minimize bias, and whether the process for maintaining and breaking the blind is sufficient to prevent accidentally revealing treatment allocation to those who may be positioned to introduce bias.

Control Condition(s) Allow for Comparability or Minimization of Confounding

Randomization and blinding are used in prospective interventional trials to minimize bias and maximize the ability to conclude causation of the intervention. However, in some cases randomization may be unethical or infeasible and prospective observation is proposed to assess the treatment effect. In other cases, retrospective observational studies may be proposed. In such cases, there are several biases to which a biostatistical reviewer should be attuned. These include, but are not limited to, treatment selection bias, protopathic bias [34], confounding by severity, and confounding by indication. Approaches such as multiple regression methods or propensity score matching can be used to address measurable biases, and instrumental variable analysis can address unmeasured bias. The biostatistical reviewer should rightly discount the conclusions of an observational study of treatment effects when there is no attempt to mitigate the inherent biases, but should also support a protocol when appropriate methods are proposed.

Data Integrity and Data Management

Data Capture and Management Is Described

Data integrity is critical to all research. A biostatistical reviewer should be concerned with how the investigator plans to maintain the accuracy of data as they are generated, collected, and curated. Data collection and storage procedures should be sufficiently described to ascertain the integrity of the primary measurements and, if appropriate, to adjudicate compliance with regulatory and scientific oversight requirements. The amount of detail required is often proportional to the size and complexity of the research.

For larger and more complex studies, and clinical trials in particular, a standalone Data Management Plan (DMP) might be used to augment a research protocol [35, 36]. Whether or not the DMP is separate from the protocol, a biostatistical reviewer should consider details about who is responsible for creation and maintenance of the database; who will perform the data entry; and who will perform quality checks, and how and when, to ensure data integrity. This work is often supported by a data management platform, a custom system used to manage electronic data from entry to creation of an analytic dataset. The wrong tool can undermine data integrity, and a biostatistical reviewer should examine the data management pathway to ensure the final dataset is an accurate representation of the collected data. Investigators should describe their chosen data management platform and how it will support the workflow and satisfy security requirements. The choice of platform should be scaled to the study needs and explained in the study protocol; for supporting complex clinical trials spanning multiple countries, the technical requirements of the data management platform can become extensive.

Studies often utilize data captured from multiple platforms that must eventually be merged with the clinical data for analysis. Examples include data from wearable mobile health technology (smart phones and wearable devices such as accelerometers and step counters), real-time data streams from inpatient data monitors, electronic health record data, and data from laboratory or imaging cores. If the protocol calls for multiple modes of data collection, the protocol should acknowledge the need for merging data sources and describe how data integrity will be assured during linkage (e.g., use of Globally Unique Identifiers to identify a single participant across multiple data sources, and reconciliation of such keys during the study).
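One way to picture key reconciliation during linkage is the sketch below, which compares participant identifiers between a clinical database and a second source expected to hold one row per participant. The identifiers and the function name are invented for illustration; real linkage would follow the study's own identifier scheme.

```python
from collections import Counter


def reconcile_ids(clinical_ids, secondary_ids):
    """Compare participant keys across two sources before merging them."""
    clinical, secondary = set(clinical_ids), set(secondary_ids)
    return {
        "matched": sorted(clinical & secondary),
        "clinical_only": sorted(clinical - secondary),   # no record in second source
        "secondary_only": sorted(secondary - clinical),  # unknown to clinical database
        # Keys appearing more than once where a single row per participant is expected.
        "duplicated_in_secondary": sorted(
            key for key, count in Counter(secondary_ids).items() if count > 1
        ),
    }


# Hypothetical identifiers from a clinical database and a laboratory core.
clinical_ids = ["P001", "P002", "P003", "P004"]
lab_ids = ["P001", "P003", "P003", "P009"]
report = reconcile_ids(clinical_ids, lab_ids)
print(report)
```

Running such a report at regular intervals during the study, rather than once at database lock, leaves time to resolve mismatches at the source.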

Security and Control of Access to Study Data Are Discussed

While protection of privacy and confidentiality is traditionally the purview of privacy boards or IRBs, a biostatistical reviewer should ensure the protocol describes data security measures. Expected measures will include procedures for ensuring appropriate authentication for use, storage of data on secure servers (as opposed to a local computer’s hard disk or unencrypted flash drive), and accessibility of the data or data management system. In a clinical trial, for example, the data management system might need to be available 24 hours a day, 7 days a week with appropriate backup systems in place that would pick up the workflow in the event of a major failure. Conversely, a chart review study might be supported sufficiently using a simple data capture form deployed in Research Electronic Data Capture (REDCap) [37, 38]. Increasingly, data and applications are being maintained on cloud-based systems with high reliability; security requirements also apply to cloud-based data storage.

Data Validation, Errors, Query Resolution Processes Are Included

Missing and erroneous data can have a significant impact on the analysis and results, so the biostatistical reviewer should evaluate plans to minimize missing data and to check for inaccuracies. Beyond traditional clinical trial monitoring for regulatory and protocol compliance, the protocol should consider how to prevent data values outside of allowable ranges and data inconsistencies. A query identification and resolution process that includes range and consistency checks is recommended. For more complex studies, the biostatistical reviewer should expect the investigator to describe plans for minimizing missing [ 39 ] or low-quality data during study implementation, such as routine data quality reporting with corrective action processes. Inherent to more complex research protocols are the practical challenges that result in protocol deviations, including dose modifications, study visits that occurred outside of the prescribed time window, and missed assessments during a study visit. The biostatistical reviewer may expect the DMP to describe the approach to documenting such events and how they are to be considered in subsequent analyses.
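A query identification step with range and consistency checks might look like the following sketch. The field names, allowable ranges, and records are hypothetical; in practice the rules would come from the protocol and case report forms, and many data management platforms implement such checks natively.

```python
from datetime import date

# Hypothetical allowable ranges; a real study would define these in its DMP.
RANGE_RULES = {"age_years": (18, 100), "systolic_bp": (60, 260)}


def find_queries(record):
    """Return a list of data queries (range, missingness, consistency) for one record."""
    queries = []
    for field, (low, high) in RANGE_RULES.items():
        value = record.get(field)
        if value is None:
            queries.append(f"{field}: missing")
        elif not low <= value <= high:
            queries.append(f"{field}: {value} outside [{low}, {high}]")
    # Consistency check: a study visit cannot precede consent.
    consent, visit = record.get("consent_date"), record.get("visit_date")
    if consent and visit and visit < consent:
        queries.append("visit_date precedes consent_date")
    return queries


records = [
    {"id": "P001", "age_years": 54, "systolic_bp": 128,
     "consent_date": date(2023, 1, 10), "visit_date": date(2023, 1, 17)},
    {"id": "P002", "age_years": 540, "systolic_bp": None,
     "consent_date": date(2023, 2, 1), "visit_date": date(2023, 1, 25)},
]
query_log = {record["id"]: find_queries(record) for record in records}
print(query_log)
```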

Statistical Analysis Plan

Statistical analysis plans provide a reproducible roadmap of analysis that can be very valuable for all studies [40, 41]. There are many components that could be included, and these will vary by study type and design [42–44]. A true pilot or feasibility study may not require a statistical analysis plan with the detail that would typically be required for an RCT. The objectives of such studies are often not to answer a particular research question, but to determine feasibility of study conduct. Such studies should not involve the testing of hypotheses, but they often involve quantitative thresholds for enrollment to inform the larger studies. If the analysis plan for such a study calls for hypothesis testing, the biostatistical reviewer may rightfully reject the approach as inappropriate. For large epidemiological registries, formal hypotheses may not have been developed at the time of design. Unlike feasibility studies, however, the biostatistical reviewer should expect to see the research team’s general approach to developing and testing hypotheses or modeling the data. The following sections outline key components that should be considered by biostatistical reviewers. Most statistical inference, and the focus of this article, is frequentist. We note that many questions can be better answered using Bayesian inference. Almost all of the points of this article apply equally well to a Bayesian study.

Statistical Approach Is Consistent with Hypothesis and Objectives

Statistical analyses allow for inferences from study data to address the study objectives. If the analyses are misaligned, the research question they address, although potentially important, will not be the one the researcher originally sought to explore. The biostatistical reviewer should ensure alignment between the proposed analyses and (1) the research hypotheses, and (2) the study outcomes. For example, continuous outcomes should not be analyzed using statistical methods designed for the analysis of dichotomous outcomes, such as chi-squared tests or logistic regression. A misaligned analysis plan may have implications not just for missing the study objectives but for sample size considerations. A study that measures a continuous outcome but analyzes it as a binary outcome is generally less efficient and will require more experimental units. A biostatistical reviewer should identify such inefficiencies, particularly in studies that involve extensive resources or that involve risk to human subjects.

For studies with multiple objectives, the biostatistical reviewer should expect a detailed analysis plan for each primary outcome. Secondary outcomes should be described but might be grouped together based on outcome type. Exploratory outcomes might be discussed with fewer details.

A Plan for Describing the Dataset Is Given

Describing the study sample is the first step in any analysis, and it allows those evaluating results to determine generalizability. Thus, any analytic plan should call for a description of the study sample – e.g., baseline characteristics of patients, animals, cells – regardless of importance for the primary study goals. The description should include sampling, screening, and/or randomization methods, as in a Consolidated Standards of Reporting Trials (CONSORT) diagram [ 43 ] for clinical trials (see also Mathilde et al. [ 45 ] for other study types). When a study design involves repeated measurements on the same experimental unit (e.g., patient, animal, or cell), the biostatistical reviewer should consider each experimental unit’s contribution to the analysis at each time point.

Unit of Analysis Is Clearly Described for Each Analysis

Experimental units may be clinical sites, communities, groups of participants, individual participants, cells, tissue samples, or muscle fibers. They may vary by research aim within a single study. A biostatistical reviewer’s evaluation of the analytical approach and sample size estimates depends on experimental unit. In cluster-randomized trials, for example, the analytic unit may be the cluster (e.g., clinic or site) or unit within a cluster (e.g., patients). Common mistakes in cluster-randomized trials, or studies involving analytic units that are inherently correlated with one another, involve failure to specify the units of analyses and failure to adequately account for the intraclass (or sometimes termed intracluster) correlation coefficient (ICC) in both sample size calculations and analyses [ 46 , 47 ]. The biostatistical reviewer should consider whether the analytic strategy adequately accounts for potential correlation among experimental units in such studies.
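The effect of the ICC on sample size is often summarized with the design effect, 1 + (m - 1) × ICC, where m is the cluster size. The sketch below applies this standard formula under the simplifying assumption of equal cluster sizes. The numeric inputs are illustrative only.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    if cluster_size < 1 or not 0 <= icc <= 1:
        raise ValueError("cluster_size must be >= 1 and icc must lie in [0, 1]")
    return 1 + (cluster_size - 1) * icc


def clusters_needed(n_individual, cluster_size, icc):
    """Clusters required when an individually randomized design needs n_individual."""
    n_inflated = n_individual * design_effect(cluster_size, icc)
    return math.ceil(n_inflated / cluster_size)


# Illustrative inputs: 128 participants under individual randomization,
# 20 participants per clinic, ICC = 0.05.
deff = design_effect(cluster_size=20, icc=0.05)
n_clusters = clusters_needed(n_individual=128, cluster_size=20, icc=0.05)
print(f"design effect = {deff:.2f}, clusters needed = {n_clusters}")
```

Even a small ICC nearly doubles the required sample in this example, which is why omitting it from the calculation is a common and consequential mistake.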

Analysis Populations Clearly Described (e.g., Intention-to-Treat Set, Per Protocol Set, Full Analysis Set)

Non-adherence or data anomalies are inevitable in clinical research, but decisions to exclude participants or data points from analyses present two problems: they result in a smaller overall analytic sample size, and they introduce a potential source of bias. Many large clinical trials employ the intention-to-treat principle. Under this principle, once participants are randomized, they are always included in the analysis, and participants are analyzed as originally assigned regardless of adherence. Study protocols should mention any plans to analyze participants according to this principle and any modifications of this principle. As may occur with a safety analysis for an experimental drug or treatment, if an analysis assigns patients to treatment arms based on what actually happened (i.e., a per protocol or as treated dataset), the biostatistical reviewer should ensure this is pre-defined. That is, the criteria that make a participant “adherent” should be clear, including how adherence will be measured (e.g., pill counts, diaries). The biostatistical reviewer should also assess handling of non-adherent participants. The analysis plan should discuss these ideas and describe the analytic dataset with these questions in mind. These same concepts can be applied to observational studies to reduce bias and variability in analyses while keeping true to the study’s aims. Whether it be in the context of the study population in an RCT or in determining causality in an observational context, the biostatistical reviewer should also consider whether causal inference methodology is an appropriate approach to addressing research aims.

Key Statistical Assumptions Are Addressed

Soundness of statistical analyses depends on several assumptions. It would be impractical to list all assumptions in a protocol, but the biostatistical reviewer should evaluate plans to check major assumptions, such as normality and independence, noting that sometimes specific assumptions may be relaxed depending upon study scenario. On the other hand, certain proposed methods may require clear articulation of “out-of-ordinary” or strong assumptions for them to be truly valid in a given context (e.g., the many assumptions that surround causal inference methods). An example of an appropriate way to acknowledge and plan for addressing model assumptions is a high-level statement: “We will assess the data for normality [with the appropriate methods stated here], transform as needed, and analyze using either Student’s t-test or the nonparametric equivalent as appropriate [again stating at least one specific method here].”

Alternative Approaches in the Event of Violations of Assumptions Are Present

While impossible to foresee all possible violations of assumptions and thus plan for all possible alternative approaches that may be appropriate, the investigator should have a contingency plan if violations of assumptions are likely. A strong analysis plan will mention how it will be updated using appropriate version control to address each shift in approach; this documentation will allow for greatest transparency in any unexpected changes in analyses.

Discussion of Control of Type I Error (Multiple Comparisons) Is Present

When an analysis plan calls for many statistical tests, the probability of making a false positive conclusion increases simply due to chance. The biostatistical reviewer should balance the possibility of such type I errors with the strength and context of inferences expected. For clinical trials designed to bring a drug to market, controlling type I error is extremely important and both the FDA and the European Medicines Agency have issued guidance on how to handle this [ 18 ]. For purely exploratory studies, controlling the type I error may be less important, but the possibility of false discovery should be acknowledged.
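As one illustration of type I error control, the sketch below contrasts the Bonferroni correction with Holm's step-down procedure, both of which control the family-wise error rate. These are only two of many possible approaches, and the p-values are invented for the example.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: test sorted p-values against alpha/m, alpha/(m-1), ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values for four pre-specified comparisons.
p_values = [0.03, 0.001, 0.04, 0.015]
bonferroni = bonferroni_reject(p_values)
holm = holm_reject(p_values)
print("Bonferroni:", bonferroni)
print("Holm:      ", holm)
```

Holm's procedure rejects everything Bonferroni rejects and sometimes more, at the same family-wise error rate.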

Description of Preventing and Handling Missing Data Is Given

Missing data are often inevitable, especially in human studies. Poor handling of the missing data problem can introduce significant biases. The analysis plan should discuss anticipated missing data rates, unacceptable rates of missing data, and those that would merit exploration of in-depth sensitivity analyses.

If data are missing completely at random, analyses are generally unbiased, but this is very rarely the case. Under missing at random and missing not at random scenarios, imputation or advanced statistical methodology may be proposed, and the statistical reviewer should expect these to be clearly explained. If there are sensitivity analyses involving imputations, these also require explanation. While it is impractical to anticipate all possible missingness scenarios a priori , the biostatistical reviewer should at minimum determine (1) whether the protocol mentions anticipated missing data rate(s), (2) whether the anticipated rate(s) seem reasonable given the scenario and study population, (3) whether any missingness assumptions are merited, and (4) whether the analysis plans for imputation to explore multiple scenarios, allowing for a true sensitivity analysis.

Interim Analyses and Statistical Stopping Guidelines Are Clear and Justified

The term interim analysis often signifies simple interim descriptive statistics to monitor accrual rates, process measures, and adverse events. A study protocol should pre-specify plans for interim data monitoring in this regard, but there are seldom statistical implications associated with these types of analyses. A biostatistical reviewer should pay more attention when the protocol calls for an interim analysis that involves hypothesis testing. This may occur in clinical trials or prospective studies that use interim data looks to make decisions about adapting study features (such as sample size) in some way, or to make decisions to stop a study for either futility or efficacy. If the study calls for stopping rules, the criteria should be pre-specified in the study protocol. These may be in the form of efficacy or safety boundaries, or futility thresholds [ 48 – 50 ]. The biostatistical reviewer should note that to control the type I error rate for stopping for benefit, an interim analysis for these purposes may necessitate a more conservative significance level upon final statistical analysis. The protocol should ideally state that no formal interim analyses will be conducted or explain the terms of such analyses to include the timing, the frequency or total number of interim “looks” planned, and approach to controlling type I and type II errors.

Sample Size Justification

Type I and II Error Rates Present for All Sample Size Calculations and Corresponding Statistical Tests

It is common for investigators to use conventional values of type I error rate ( α = 0.05) and power (80%). However, there may be situations when more emphasis is put on controlling the type II error or the type I error. Phase II studies often aim to determine whether to proceed to a phase III confirmatory study rather than to determine whether a drug is efficacious. In this case, a significance level of 0.20 might be acceptable. For a large, confirmatory study, investigators focus on controlling the type I error, and it may be appropriate to set desired power to 90% or the significance level to 0.01. The biostatistical reviewer should evaluate the selected power and significance levels used to justify the sample size, including decisions to deviate from convention.

Parameter Assumptions Are Clearly Stated and Justified (i.e., Based on Previous Research and Considers the Population Studied)

All power and sample size calculations require a priori assumptions and information. The more complicated the analyses, the more parameter assumptions that the investigators must suggest and justify. The statistical reviewer should be able to use the parameter assumptions provided in any proposed study to replicate the sample size and power calculations (at least to an approximate degree). The justification of the assumed parameter values (e.g., median time-to-death, variance estimates, correlation estimates, control proportions, etc.) should be supported by prior studies or literature.

Additionally, common issues in research such as attrition, loss to follow-up, and withdrawal from the study can greatly affect the final sample size. Sample size justifications usually account for these issues through inflating enrollment numbers beyond the sample size required to achieve the desired significance level and power. This will help ensure analytic sample size(s), after accounting for attrition and loss, resemble the required sample size as determined in the a priori calculations.
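The inflation for attrition is simple arithmetic: divide the required analytic sample size by the proportion expected to be retained and round up. A minimal sketch, with illustrative numbers and a hypothetical function name:

```python
import math


def enrollment_target(n_required, attrition_rate):
    """Enrollment needed so that about n_required remain after expected attrition."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must lie in [0, 1)")
    return math.ceil(n_required / (1 - attrition_rate))


# Illustrative: 128 analyzable participants needed, 15% attrition anticipated.
n_enroll = enrollment_target(n_required=128, attrition_rate=0.15)
print(n_enroll)
```

Note that multiplying by (1 + attrition rate) instead of dividing by (1 - attrition rate) understates the target.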

Statistical Tests Used in Sample Size Calculations Match Those Presented in Statistical Analysis Plan or Appropriately Justify Reasoning for Straying from It

It is critical that the statistical methods assumed for computing power or sample size match, as closely as possible, those that are proposed in the statistical analysis plan. If the primary statistical analysis methods are based on outcomes with continuous data, such as using a two-sample t-test, then the sample size justification should also assume the use of the two-sample t-test. If the primary statistical analysis methods are based on outcomes with categorical data, such as using the chi-squared test to compare proportions, then the sample size justification should also assume the use of the chi-squared test. A mismatch between the power and sample size calculations and the statistical approach essentially renders the estimations uninformative. Even making assumptions about how the data are likely to be analyzed in practice, a biostatistical reviewer may barely be able to infer even gross accuracy of the estimates. In general, it may be acceptable to plan for a complicated analysis (e.g., an analysis of covariance, adjusting for baseline), but sample size considerations may be based on a simpler statistical method (e.g., a t-test). However, when there is sufficient information to replicate the data generation mechanism, simulation presents a straightforward solution to understanding the effect of design decisions on the sample size and is desirable when the inputs can be justified.
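A reviewer replicating a sample size for a two-group comparison of means might start from the usual normal approximation, n per group = 2 × ((z for 1 - α/2) + (z for power))² × σ² / δ². The sketch below implements that closed-form approximation, not a simulation and not the exact t-distribution calculation, which typically gives a slightly larger n. The inputs are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means (normal approx.)."""
    if delta == 0 or sigma <= 0:
        raise ValueError("delta must be non-zero and sigma must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Illustrative inputs: minimally important difference 5 units, standard deviation 10.
n_80 = n_per_group(delta=5, sigma=10, power=0.80)
n_90 = n_per_group(delta=5, sigma=10, power=0.90)
print(f"80% power: {n_80} per group; 90% power: {n_90} per group")
```

Because only the ratio of the difference to the standard deviation enters the formula, halving the assumed difference quadruples the required sample size, which is why the justification of these two inputs deserves scrutiny.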

Minimum Clinically Important Differences or Required Precision Described

Beyond the statistical approach, the factors that have the most influence on the sample size calculation are the minimally important difference between interventions (or change) and the variability of the primary outcome variable. When the minimally important difference is put in context of variability, the “effect size” can be estimated, and this drives the sample size justification. Investigators may propose the minimally important difference based upon clinically meaningful differences or based upon biologically useful differences. When a protocol proposes a minimally important difference arbitrarily or based on observations in preliminary studies, then the biostatistical reviewer should expect some justification that this quantity is biologically relevant and that it will help advance knowledge or learning in the specific research area.

As with the minimally important difference, the protocol should carefully justify the expected distribution of the primary outcome(s). Preliminary studies may provide estimates of the distribution, but preliminary studies often include only small samples in very controlled settings. These samples may not represent the heterogeneity of the population of interest. Estimates of effect sizes or variability from the published literature may also be suspect for multiple reasons, including the use of populations different from that of the present study and publication bias. When investigators rely on estimates from these studies, it may result in a plan for a smaller sample size than is actually required to find the minimally important difference.

The biostatistical reviewer should recognize that the nature of sample size calculations is inherently approximate, but expect the study investigators to be realistic in estimating the parameters used in the calculations. It is helpful when the investigator provides a table or figure displaying ranges for these parameters, the power, the significance level, and the sample size. This can be especially important for less common or novel study designs that may require additional parameter assumptions and consideration, such as the use of the ICC to inflate the sample size to account for clustering or site effects; special considerations for the selection of a priori estimates of standard deviations or proportions; clearly stated parameters for the margins of non-inferiority (for non-inferiority studies) or equivalence (for equivalence studies) [ 51 ]; and an accounting of potential interaction effects of interest between confounding variables.

Powering for Subgroup Analysis

Funders and regulatory agencies increasingly require that investigators explore treatment effects within subgroups. It is impossible to power a study to detect the minimally important effect in every possible subgroup, but it might be reasonable to power the study to detect interactions between treatment effect and some subgrouping variables, as might be done for testing heterogeneity of treatment effects. More frequently, subgroup analyses may be considered exploratory. In this case, the sample size required to observe a minimally important difference within a subgroup is of less importance; however, one may expect some discussion of the magnitude of difference that might be observed within the subgroup. Biostatistical reviewers should expect to see additional considerations if subgroup analyses are planned.

Reporting and Reproducibility

Plans for data sharing and archiving are present.

Many United States federal agencies, including the NIH, now require sharing of data on completion of the research. The protocol should describe the approach to sharing data publicly in accordance with governing rules. This process can be challenging, as the de-identification of data is not trivial. Providing data in a manner that encourages secondary use requires attention to the processes for gaining access and for managing and supporting the requests. The biostatistical reviewer is well positioned to comment on the investigators' plans for curating the final dataset for public use.

Version Control or a Means of Ensuring Rigor, Transparency, and Reproducibility in Any Processes Is Evident

Every change in research protocols, analysis plans, and datasets is an opportunity for error. Protocols that specify the process for version control and change management are generally more rigorous and reproducible than those that do not.
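One lightweight way to make dataset changes detectable is to record a checksum alongside each released version. The sketch below is only an illustration of that idea, using SHA-256 from the Python standard library; the function names and the structure of the log entry are assumptions, and a real protocol would more likely rely on an established version control or data management system.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_version(path, version, note, log):
    """Append a change-log entry (hypothetical format) for a dataset file."""
    entry = {
        "file": Path(path).name,
        "version": version,
        "sha256": file_sha256(path),
        "note": note,
    }
    log.append(entry)
    return entry


def has_changed(path, entry):
    """True if the file no longer matches the checksum recorded in entry."""
    return file_sha256(path) != entry["sha256"]
```

Before an analysis is run, `has_changed` can be called against the most recent log entry to confirm that the dataset being analyzed is the version that was documented.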

Plan to Report Results According to Guidelines or Law

With the increased emphasis on transparency of research, there is a growing mandate to publicize clinical research in open databases, such as clinicaltrials.gov. While this is primarily a regulatory concern, a biostatistical reviewer should be cognizant of the effort required and timelines imposed for such reporting and expect this to be reflected in the protocol timeline and, if appropriate, budget.

The Biostatistical Reviewer’s Additional Responsibility

Berger and Matthews stated that “Biostatistics is the discipline concerned with how we ought to make decisions when analyzing biomedical data. It is the evolving discipline concerned with formulating explicit rules to compensate both for the fallibility of human intuition in general and for biases in study design in particular” [ 52 ]. As such, the core of biostatistics is trying to uncover the truth. While some scientists are implicitly biased in believing the alternative hypothesis to be true, a biostatistician’s perspective is appropriately one of “equipoise.” For example, at the root of basic frequentist statistical hypothesis testing lies the assumption that the null hypothesis (which is often of least interest to investigators in a field) is true. This perspective may lead to viewing the biostatistician in a reviewer role as a skeptic, when in reality they are necessarily neutral. This makes the biostatistician’s perspective helpful and often imperative in protocol review. As an impartial reviewer and according to the foundations of a biostatistician’s education and training, it is therefore the biostatistician’s responsibility (1) to ensure sound study design and analyses, and (2) to be critical and look for flaws in study design that may result in invalid findings. Other content-specific reviewers may have a tendency toward overly enthusiastic review of a research study given the scientific significance of the proposed research or lack of viable treatment options for an understudied disease. The biostatistician reviewer thus often provides a viewpoint that is further removed and more impartial, with the responsibility to preserve scientific rigor and integrity for all study protocols, regardless of the significance of the research.

We note that a complete, impartial review may not always warrant the same level of feedback to investigators. For example, investigators submitting a grant for review will benefit from the direction of a thorough written critique with guidance. However, for an institution considering joining a multicenter protocol, the statistical review may simply be a go/no-go statement.

As biostatistical reviewers tend to possess both specialized quantitative training and collaborative experiences, exposing them to a broad range of research across multiple disciplines, we view the biostatistician reviewer as an essential voice in any protocol review process. Biostatisticians often engage collaboratively across multiple research domains throughout the study lifecycle, not just review. Given this breadth and depth of involvement, a biostatistician can contrast a proposed study with successful approaches encountered in other disciplines. The biostatistician thus inherits the responsibility to cross-fertilize important methodologies.

A biostatistical reviewer, with sound and constructive critique of study protocols prior to their implementation, has the potential to prevent issues such as poor-quality data abstraction from medical records, high rates of loss to follow-up, lack of separation between treatment groups, insufficient blinding, failure to cleanly capture primary endpoints, and overly optimistic accrual expectations, among other preventable issues. Protocol review offers a chance to predict many such failures, thereby preventing research waste and unnecessary risks.

In this article, we have discussed components of a study protocol that a biostatistical reviewer (and, indeed, all reviewers) should evaluate when assessing whether a proposed study will answer the scientific question at hand. We posit that the biostatistical reviewer, through their breadth of engagement across multiple disciplines and experience with a broad range of research designs, can and should contribute significantly beyond review of the statistical analysis plan and sample size justification. Through careful scientific review, including biostatistical review as we outline here, we hope to prevent excess resource expenditure and risk to humans and animals on poorly planned studies.

Acknowledgments

This study was supported by the following Clinical and Translational Science Awards from the National Center for Advancing Translational Science: UL1TR001422 (J.D.C.), UL1TR002240 (C.S., S.K.), UL1TR001420 (W.T.A.), UL1 TR001450 (P.J.N), UL1TR003096 (R.A.O.), UL1TR002529 (S.M.P.), UL1 TR000002 (B.H.P.), UL1TR002553 (G.P.), UL1TR002544 (L.L.P), UL1 TR002243 (T.W.R, C.J.L), UL1TR001086 (T.D.T), UL1TR001439 (H.M.S.). Other NIH grant support includes NIAMS grant P30 AR072582, NIGMS grant U54-GM104941, and NIDDK P30 DK123704 (P.J.N). Its contents are solely the responsibility of the authors and do not necessarily represent official views of the National Center for Advancing Translational Sciences or the National Institutes of Health.

Biostatistics

We create and apply methods for quantitative research in the health sciences, and we provide innovative biostatistics education, making discoveries to improve health.

Biostatistics Headlines

Alumni Spotlight: Christopher Lo, ScM '23

Christopher Lo, ScM ’23, is a data science trainer in the Data Science Lab at the Fred Hutch Cancer Center where he teaches biomedical data science to the Fred Hutch Cancer Center community.

Noted Biostatistician and Epidemiologist Jim Tonascia Retires

Jim Tonascia, whose public health career in biostatistics and epidemiology spanned more than five decades, retired from the Bloomberg School of Public Health this August.

Student Spotlight: Alyssa Columbus

Alyssa Columbus is a second-year PhD student in the Department of Biostatistics with an interest in public health informatics and data science, including educational interventions, ethical considerations, and policy implications.

What We Do in the Department of Biostatistics

The Bloomberg School's Department of Biostatistics is the oldest department of its kind in the world and has long been considered one of the best. Our faculty conduct research across the spectrum of statistical science, from foundations of inference to the discovery of new methodologies for health applications.

Our designs and analytic methods enable health scientists and professionals across industries to efficiently acquire knowledge and draw valid conclusions from ever-expanding sources of information.

Biostatistics Highlights

First in U.S.

First freestanding statistics department in the U.S.

Data science driving health and empowering opportunity

Foundational discoveries for inference and modeling

Creative, close-knit community

Biostatistics Programs

The Department of Biostatistics offers three graduate programs to applicants with a bachelor's degree (or higher) interested in professional or academic careers at the interface of the statistical and health sciences.

We also have funded training programs in the  Epidemiology and Biostatistics of Aging for PhD students who are U.S. citizens or permanent residents.

Master of Health Science (MHS)

Our one-year MHS program provides study in biostatistical theory & methods. It is also open to students concurrently enrolled in a JHU doctoral program.

Master of Science (ScM)

Our ScM targets individuals who have demonstrated prior excellence in quantitative or biological sciences and desire a career as a professional statistician.

Doctor of Philosophy (PhD)

Our PhD graduates lead research in the foundations of statistical reasoning, data science, and their application, making discoveries to improve health.

Nilanjan Chatterjee, PhD

Bloomberg Distinguished Professor Nilanjan Chatterjee, PhD, MS, models disease risk associated with genetics, lifestyle, biomarkers, and other factors, with the goal of improving disease prevention. Chatterjee recently received a GKII-KCDH Breakthrough Research Grant on Digital Health. His winning research proposal with Saket Choudhary will involve development of the first risk prediction model and clinical tool for the Indian population.

Biostatistics Consulting Center

The Johns Hopkins Biostatistics Center is the practice arm of our Department, providing the latest in biostatistical and information science expertise to a wide range of clients both within and outside Johns Hopkins.

Alyssa Columbus, Second-Year PhD Student

Alyssa Columbus is a second-year PhD student with an interest in public health informatics and data science, including educational interventions, ethical considerations (e.g., privacy and security), and policy implications.

  • Review Article
  • Published: 21 August 2024

Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics

  • Gunsagar S. Gulati (ORCID: orcid.org/0000-0003-2798-6220),
  • Jeremy Philip D’Silva (ORCID: orcid.org/0009-0005-0053-7726),
  • Yunhe Liu (ORCID: orcid.org/0000-0002-2120-952X),
  • Linghua Wang &
  • Aaron M. Newman (ORCID: orcid.org/0000-0002-1857-8172)

Nature Reviews Molecular Cell Biology (2024)

Single-cell transcriptomics has broadened our understanding of cellular diversity and gene expression dynamics in healthy and diseased tissues. Recently, spatial transcriptomics has emerged as a tool to contextualize single cells in multicellular neighbourhoods and to identify spatially recurrent phenotypes, or ecotypes. These technologies have generated vast datasets with targeted-transcriptome and whole-transcriptome profiles of hundreds to millions of cells. Such data have provided new insights into developmental hierarchies, cellular plasticity and diverse tissue microenvironments, and spurred a burst of innovation in computational methods for single-cell analysis. In this Review, we discuss recent advancements, ongoing challenges and prospects in identifying and characterizing cell states and multicellular neighbourhoods. We discuss recent progress in sample processing, data integration, identification of subtle cell states, trajectory modelling, deconvolution and spatial analysis. Furthermore, we discuss the increasing application of deep learning, including foundation models, in analysing single-cell and spatial transcriptomics data. Finally, we discuss recent applications of these tools in the fields of stem cell biology, immunology, and tumour biology, and the future of single-cell and spatial transcriptomics in biological research and its translation to the clinic.

Quake, S. R. A decade of molecular cell atlases. Trends Genet. 38 , 805–810 (2022).

Article   CAS   PubMed   Google Scholar  

Baysoy, A., Bai, Z., Satija, R. & Fan, R. The technological landscape and applications of single-cell multi-omics. Nat. Rev. Mol. Cell Biol. 24 , 695–713 (2023).

Longo, S. K., Guo, M. G., Ji, A. L. & Khavari, P. A. Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics. Nat. Rev. Genet. 22 , 627–644 (2021).

Article   CAS   PubMed   PubMed Central   Google Scholar  

Rozenblatt-Rosen, O., Stubbington, M. J. T., Regev, A. & Teichmann, S. A. The Human Cell Atlas: from vision to reality. Nature 550 , 451–453 (2017).

Rood, J. E., Maartens, A., Hupalowska, A., Teichmann, S. A. & Regev, A. Impact of the Human Cell Atlas on medicine. Nat. Med. 28 , 2486–2496 (2022).

Ferreira, P. G. et al. The effects of death and post-mortem cold ischemia on human tissue transcriptomes. Nat. Commun. 9 , 490 (2018).

Article   PubMed   PubMed Central   Google Scholar  

Baechler, E. C. et al. Expression levels for many genes in human peripheral blood cells are highly sensitive to ex vivo incubation. Genes Immun. 5 , 347–353 (2004).

Massoni-Badosa, R. et al. Sampling time-dependent artifacts in single-cell genomics studies. Genome Biol. 21 , 112 (2020).

Zhu, Y., Wang, L., Yin, Y. & Yang, E. Systematic analysis of gene expression patterns associated with postmortem interval in human tissues. Sci. Rep. 7 , 5435 (2017).

O’Flanagan, C. H. et al. Dissociation of solid tumor tissues with cold active protease for single-cell RNA-seq minimizes conserved collagenase-associated stress responses. Genome Biol. 20 , 210 (2019).

Liu, Y. et al. Digestion of nucleic acids starts in the stomach. Sci. Rep. 5 , 11936 (2015).

Martinez-Diez, M. C., Serrano, M. A., Monte, M. J. & Marin, J. J. Comparison of the effects of bile acids on cell viability and DNA synthesis by rat hepatocytes in primary culture. Biochim. Biophys. Acta 1500 , 153–160 (2000).

Sorrentino, S. & Libonati, M. Human pancreatic-type and nonpancreatic-type ribonucleases: a direct side-by-side comparison of their catalytic properties. Arch. Biochem. Biophys. 312 , 340–348 (1994).

Denisenko, E. et al. Systematic assessment of tissue dissociation and storage biases in single-cell and single-nucleus RNA-seq workflows. Genome Biol. 21 , 130 (2020).

Cossarizza, A. et al. Guidelines for the use of flow cytometry and cell sorting in immunological studies. Eur. J. Immunol. 47 , 1584–1797 (2017).

Lahoz-Beneytez, J. et al. Human neutrophil kinetics: modeling of stable isotope labeling data supports short blood neutrophil half-lives. Blood 127 , 3431–3438 (2016).

Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14 , 865–868 (2017).

Quan, Y. et al. Impact of cell dissociation on identification of breast cancer stem cells. Cancer Biomark. 12 , 125–133 (2012).

Autengruber, A., Gereke, M., Hansen, G., Hennig, C. & Bruder, D. Impact of enzymatic tissue disintegration on the level of surface molecule expression and immune cell function. Eur. J. Microbiol. Immunol. 2 , 112–120 (2012).

Article   CAS   Google Scholar  

Peterson, V. M. et al. Multiplexed quantification of proteins and transcripts in single cells. Nat. Biotechnol. 35 , 936–939 (2017).

Butto, T. et al. Nuclei on the rise: when nuclei-based methods meet next-generation sequencing. Cells 12 , 1051 (2023).

Caglayan, E., Liu, Y. & Konopka, G. Neuronal ambient RNA contamination causes misinterpreted and masked cell types in brain single-nuclei datasets. Neuron 110 , 4043–4056.e5 (2022).

Thrupp, N. et al. Single-nucleus RNA-Seq is not suitable for detection of microglial activation genes in humans. Cell Rep. 32 , 108189 (2020).

Pitchiaya, S. et al. Dynamic recruitment of single RNAs to processing bodies depends on RNA functionality. Mol. Cell 74 , 521–533.e6 (2019).

Hicks, S. C., Townes, F. W., Teng, M. & Irizarry, R. A. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 19 , 562–578 (2018).

Article   PubMed   Google Scholar  

Koenig, A. L. et al. Single-cell transcriptomics reveals cell-type-specific diversification in human heart failure. Nat. Cardiovasc. Res. 1 , 263–280 (2022).

Chervov, A. & Zinovyev, A. Computational challenges of cell cycle analysis using single cell transcriptomics. Preprint at https://doi.org/10.48550/arXiv.2208.05229 (2022).

Osorio, D. & Cai, J. J. Systematic determination of the mitochondrial proportion in human and mice tissues for single-cell RNA-sequencing data quality control. Bioinformatics 37 , 963–967 (2021).

Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21 , 12 (2020).

Zhang, Y., Parmigiani, G. & Johnson, W. E. ComBat-seq: batch effect adjustment for RNA-seq count data. NAR Genom. Bioinform. 2 , lqaa078 (2020).

Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43 , e47 (2015).

Haghverdi, L., Lun, A. T. L., Morgan, M. D. & Marioni, J. C. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36 , 421–427 (2018).

Polański, K. et al. BBKNN: fast batch alignment of single cell transcriptomes. Bioinformatics 36 , 964–965 (2020).

Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16 , 1289–1296 (2019).

Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37 , 685–691 (2019).

Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177 , 1873–1887.e17 (2019).

Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177 , 1888–1902.e21 (2019).

Nguyen, H. C. T., Baik, B., Yoon, S., Park, T. & Nam, D. Benchmarking integration of single-cell differential expression. Nat. Commun. 14 , 1570 (2023).

Junttila, S., Smolander, J. & Elo, L. L. Benchmarking methods for detecting differential states between conditions from multi-subject single-cell RNA-seq data. Brief. Bioinform. 23 , bbac286 (2022).

Soneson, C. & Robinson, M. D. Bias, robustness and scalability in single-cell differential expression analysis. Nat. Methods 15 , 255–261 (2018).

Dann, E., Henderson, N. C., Teichmann, S. A., Morgan, M. D. & Marioni, J. C. Differential abundance testing on single-cell data using k-nearest neighbor graphs. Nat. Biotechnol. 40 , 245–253 (2022).

Lopez, R., Regier, J., Cole, M. B., Jordan, M. I. & Yosef, N. Deep generative modeling for single-cell transcriptomics. Nat. Methods 15 , 1053–1058 (2018).

Lakkis, J. et al. A joint deep learning model enables simultaneous batch effect correction, denoising, and clustering in single-cell transcriptomics. Genome Res. 31 , 1753–1766 (2021).

Li, H., McCarthy, D. J., Shim, H. & Wei, S. Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics. BMC Bioinform. 23 , 460 (2022).

Lotfollahi, M., Wolf, F. A. & Theis, F. J. scGen predicts single-cell perturbation responses. Nat. Methods 16 , 715–721 (2019).

Bahrami, M. et al. Deep feature extraction of single-cell transcriptomes by generative adversarial network. Bioinformatics 37 , 1345–1351 (2021).

Tyler, S. R., Guccione, E. & Schadt, E. E. Erasure of biologically meaningful signal by unsupervised scRNAseq batch-correction methods. Preprint at bioRxiv https://doi.org/10.1101/2021.11.15.468733 (2023).

Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19 , 41–50 (2022).

Zhang, Z. et al. Signal recovery in single cell batch integration. Preprint at bioRxiv https://doi.org/10.1101/2023.05.05.539614 (2023).

Dann, E. et al. Precise identification of cell states altered in disease using healthy single-cell references. Nat. Genet. 55 , 1998–2008 (2023).

Svensson, V., da Veiga Beltrame, E. & Pachter, L. A curated database reveals trends in single-cell transcriptomics. Database 2020 , baaa073 (2020).

Regev, A. et al. The Human Cell Atlas. eLife 6 , e27041 (2017).

Eraslan, G. et al. Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science 376 , eabl4290 (2022).

Slyper, M. et al. A single-cell and single-nucleus RNA-seq toolbox for fresh and frozen human tumors. Nat. Med. 26 , 792–802 (2020).

Zhang, A. W. et al. Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling. Nat. Methods 16 , 1007–1015 (2019).

Pliner, H. A., Shendure, J. & Trapnell, C. Supervised classification enables rapid annotation of cell atlases. Nat. Methods 16 , 983–986 (2019).

Ianevski, A., Giri, A. K. & Aittokallio, T. Fully-automated and ultra-fast cell-type identification using specific marker combinations from single-cell transcriptomic data. Nat. Commun. 13 , 1246 (2022).

Franchini, M., Pellecchia, S., Viscido, G. & Gambardella, G. Single-cell gene set enrichment analysis and transfer learning for functional annotation of scRNA-seq data. NAR Genom. Bioinform. 5 , lqad024 (2023).

Aran, D. et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunol. 20 , 163–172 (2019).

Kiselev, V. Y., Yiu, A. & Hemberg, M. scmap: projection of single-cell RNA-seq data across data sets. Nat. Methods 15 , 359–362 (2018).

de Kanter, J. K., Lijnzaad, P., Candelli, T., Margaritis, T. & Holstege, F. C. P. CHETAH: a selective, hierarchical cell type identification method for single-cell RNA sequencing. Nucleic Acids Res. 47 , e95 (2019).

Lyu, P., Zhai, Y., Li, T. & Qian, J. CellAnn: a comprehensive, super-fast, and user-friendly single-cell annotation web server. Bioinformatics 39 , btad521 (2023).

Boufea, K., Seth, S. & Batada, N. N. scID uses discriminant analysis to identify transcriptionally equivalent cell types across single-cell RNA-seq data with batch effect. iScience 23 , 100914 (2020).

Lin, Y. et al. scClassify: sample size estimation and multiscale classification of cells using single and multiple reference. Mol. Syst. Biol. 16 , e9389 (2020).

Yin, Q. et al. scGraph: a graph neural network-based approach to automatically identify cell types. Bioinformatics 38 , 2996–3003 (2022).

Alquicira-Hernandez, J., Sathe, A., Ji, H. P., Nguyen, Q. & Powell, J. E. scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data. Genome Biol. 20 , 264 (2019).

Li, C. et al. SciBet as a portable and fast single cell type identifier. Nat. Commun. 11 , 1818 (2020).

Hou, W. & Ji, Z. Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis. Nat. Methods https://doi.org/10.1038/s41592-024-02235-4 (2024).

Lotfollahi, M., Yuhan, H., Theis, F. J. & Satija, R. The future of rapid and automated single-cell data analysis using reference mapping. Cell 187 , 2343–2358 (2024).

Michielsen, L. et al. Single-cell reference mapping to construct and extend cell-type hierarchies. NAR Genom. Bioinform. 5 , lqad070 (2023).

Lotfollahi, M. et al. Mapping single-cell data to reference atlases by transfer learning. Nat. Biotechnol. 40 , 121–130 (2022).

Osumi-Sutherland, D. et al. Cell type ontologies of the Human Cell Atlas. Nat. Cell Biol. 23 , 1129–1135 (2021).

Flanagin, A., Frey, T. & Christiansen, S. L. Updated guidance on the reporting of race and ethnicity in medical and science journals. JAMA 326 , 621–627 (2021).

Brbić, M. et al. Annotation of spatially resolved single-cell data with STELLAR. Nat. Methods 19 , 1411–1418 (2022).

Wen, L. & Tang, F. Single-cell sequencing in stem cell biology. Genome Biol. 17 , 71 (2016).

Chen, H., Ye, F. & Guo, G. Revolutionizing immunology with single-cell RNA sequencing. Cell Mol. Immunol. 16 , 242–249 (2019).

Gavish, A. et al. Hallmarks of transcriptional intratumour heterogeneity across a thousand tumours. Nature 618 , 598–606 (2023).

Zhang, Y. et al. Single-cell RNA sequencing in cancer research. J. Exp. Clin. Cancer Res. 40 , 81 (2021).

Machado, L. et al. Tissue damage induces a conserved stress response that initiates quiescent muscle stem cell activation. Cell Stem Cell 28 , 1125–1135.e7 (2021).

Uniken Venema, W. T. C. et al. Gut mucosa dissociation protocols influence cell type proportions and single-cell gene expression levels. Sci. Rep. 12 , 9897 (2022).

van den Brink, S. C. et al. Single-cell sequencing reveals dissociation-induced gene expression in tissue subpopulations. Nat. Methods 14 , 935–936 (2017).

Jones, R. C. et al. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376 , eabl4896 (2022).

Tabula Muris Consortium. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562 , 367–372 (2018).

Article   Google Scholar  

Kumar, T. et al. A spatially resolved single-cell genomic atlas of the adult human breast. Nature 620 , 181–191 (2023).

Chen, J. Y. et al. Hoxb5 marks long-term haematopoietic stem cells and reveals a homogenous perivascular niche. Nature 530 , 223–227 (2016).

Rossi, L., Challen, G. A., Sirin, O., Lin, K. K. & Goodell, M. A. Hematopoietic stem cell characterization and isolation. Methods Mol. Biol. 750 , 47–59 (2011).

Ikuta, K. & Weissman, I. L. Evidence that hematopoietic stem cells express mouse c-kit but do not depend on steel factor for their generation. Proc. Natl Acad. Sci. USA 89 , 1502–1506 (1992).

Liu, D. D. et al. Purification and characterization of human neural stem and progenitor cells. Cell 186 , 1179–1194.e15 (2023).

Chan, C. K. F. et al. Identification of the human skeletal stem cell. Cell 175 , 43–56.e21 (2018).

Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Mol. Cell 65 , 631–643.e4 (2017).

Al’Khafaji, A. M. et al. High-throughput RNA isoform sequencing using programmed cDNA concatenation. Nat. Biotechnol. 42 , 582–586 (2023).

Salmen, F. et al. High-throughput total RNA sequencing in single cells using VASA-seq. Nat. Biotechnol. 40 , 1780–1793 (2022).

Herman, J. S., Sagar & Grün, D. FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data. Nat. Methods 15 , 379–386 (2018).

Jindal, A., Gupta, P., Jayadeva & Sengupta, D. Discovery of rare cells from voluminous single cell expression data. Nat. Commun. 9 , 4719 (2018).

Fa, B. et al. GapClust is a light-weight approach distinguishing rare cells from voluminous single cell expression profiles. Nat. Commun. 12 , 4197 (2021).

Dong, R. & Yuan, G. C. GiniClust3: a fast and memory-efficient tool for rare cell type identification. BMC Bioinform. 21 , 158 (2020).

Wegmann, R. et al. CellSIUS provides sensitive and specific detection of rare cell populations from complex single-cell RNA-seq data. Genome Biol. 20 , 142 (2019).

Song, D., Li, K., Hemminger, Z., Wollman, R. & Li, J. J. scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling. Bioinformatics 37 , i358–i366 (2021).

Neufeld, A., Gao, L. L., Popp, J., Battle, A. & Witten, D. Inference after latent variable estimation for single-cell RNA sequencing data. Biostatistics 25 , 270–287 (2023).

Song, D., Li, K., Ge, X. & Li, J. J. ClusterDE: a post-clustering differential expression (DE) method robust to false-positive inflation caused by double bioRxiv (2023).

Persad, S. et al. SEACells infers transcriptional and epigenomic cellular states from single-cell genomics data. Nat. Biotechnol. 41 , 1746–1757 (2023).

Singhal, V. et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat. Genet. 56 , 431–441 (2024).

Cannoodt, R., Saelens, W. & Saeys, Y. Computational methods for trajectory inference from single-cell transcriptomics. Eur. J. Immunol. 46 , 2496–2506 (2016).

Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37 , 547–554 (2019).

La Manno, G. et al. RNA velocity of single cells. Nature 560 , 494–498 (2018).

Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38 , 1408–1414 (2020).

Gorin, G., Svensson, V. & Pachter, L. Protein velocity and acceleration from single-cell multiomics experiments. Genome Biol. 21 , 39 (2020).

Hendriks, G. J. et al. NASC-seq monitors RNA synthesis in single cells. Nat. Commun. 10 , 3138 (2019).

Erhard, F. et al. scSLAM-seq reveals core features of transcription dynamics in single cells. Nature 571 , 419–423 (2019).

Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173 , 1535–1548.e16 (2018).

Pei, W. et al. Polylox barcoding reveals haematopoietic stem cell fates realized in vivo. Nature 548 , 456–460 (2017).

Sharma, R. et al. The TRACE-seq method tracks recombination alleles and identifies clonal reconstitution dynamics of gene targeted human hematopoietic stem cells. Nat. Commun. 12 , 472 (2021).

Yang, D. et al. Lineage tracing reveals the phylodynamics, plasticity, and paths of tumor evolution. Cell 185 , 1905–1923.e25 (2022).

Spencer Chapman, M. et al. Lineage tracing of human development through somatic mutations. Nature 595 , 85–90 (2021).

Muyas, F. et al. De novo detection of somatic mutations in high-throughput single-cell profiling data sets. Nat. Biotechnol. 42 , 758–767 (2024).

Ludwig, L. S. et al. Lineage tracing in humans enabled by mitochondrial mutations and single-cell genomics. Cell 176 , 1325–1339.e22 (2019).

Gabbutt, C. et al. Fluctuating methylation clocks for cell lineage tracing at high temporal resolution in human tissues. Nat. Biotechnol. 40 , 720–730 (2022).

DuPage, M. & Bluestone, J. A. Harnessing the plasticity of CD4+ T cells to treat immune-mediated disease. Nat. Rev. Immunol. 16 , 149–163 (2016).

Huyghe, A., Trajkova, A. & Lavial, F. Cellular plasticity in reprogramming, rejuvenation and tumorigenesis: a pioneer TF perspective. Trends Cell Biol. 34 , 255–267 (2024).

Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37 , 451–460 (2019).

Stassen, S. V., Yip, G. G. K., Wong, K. K. Y., Ho, J. W. K. & Tsia, K. K. Generalized and scalable trajectory inference in single-cell omics data with VIA. Nat. Commun. 12 , 5528 (2021).

Pandey, K. & Zafar, H. Inference of cell state transitions and cell fate plasticity from single-cell with MARGARET. Nucleic Acids Res. 50 , e86 (2022).

Lönnberg, T. et al. Single-cell RNA-seq and computational analysis using temporal mixture modelling resolves Th1/Tfh fate bifurcation in malaria. Sci. Immunol. 2 , eaal2192 (2017).

Velten, L. et al. Human haematopoietic stem cell lineage commitment is a continuous process. Nat. Cell Biol. 19 , 271–281 (2017).

Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19 , 159–170 (2022).

Weiler, P., Lange, M., Klein, M., Pe’er, D. & Theis, F. CellRank 2: unified fate mapping in multiview single-cell data. Nat. Methods 21 , 1196–1205 (2024).

Schiebinger, G. et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 176 , 928–943.e22 (2019).

Tong, A., Huang, J., Wolf, G., van Dijk, D. & Krishnaswamy, S. TrajectoryNet: a dynamic optimal transport network for modeling cellular dynamics. Proc. Mach. Learn. Res. 119 , 9526–9536 (2020).

Kamimoto, K. et al. Dissecting cell identity via network inference and in silico gene perturbation. Nature 614 , 742–751 (2023).

Weissman, I. L. Stem cells: units of development, units of regeneration, and units in evolution. Cell 100 , 157–168 (2000).

Senra, D., Guisoni, N. & Diambra, L. ORIGINS: a protein network-based approach to quantify cell pluripotency from scRNA-seq data. MethodsX 9 , 101778 (2022).

Malta, T. M. et al. Machine learning identifies stemness features associated with oncogenic dedifferentiation. Cell 173 , 338–354.e15 (2018).

Müller, F. J. et al. Regulatory networks define phenotypic classes of human stem cell lines. Nature 455 , 401–405 (2008).

Zhang, F. et al. FitDevo: accurate inference of single-cell developmental potential using sample-specific gene weight. Brief. Bioinform. 23 , bbac293 (2022).

Gulati, G. S. et al. Single-cell transcriptional diversity is a hallmark of developmental potential. Science 367 , 405–411 (2020).

Grün, D. et al. De novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell 19 , 266–277 (2016).

Kannan, S., Farid, M., Lin, B. L., Miyamoto, M. & Kwon, C. Transcriptomic entropy benchmarks stem cell-derived cardiomyocyte maturation against endogenous tissue at single cell level. PLoS Comput. Biol. 17 , e1009305 (2021).

Jin, S., MacLean, A. L., Peng, T. & Nie, Q. scEpath: energy landscape-based inference of transition probabilities and cellular trajectories from single-cell transcriptomic data. Bioinformatics 34 , 2077–2086 (2018).

Guo, M., Bao, E. L., Wagner, M., Whitsett, J. A. & Xu, Y. SLICE: determining cell differentiation and lineage based on single cell entropy. Nucleic Acids Res. 45 , e54 (2017).

Teschendorff, A. E. & Enver, T. Single-cell entropy for accurate estimation of differentiation potency from a cell’s transcriptome. Nat. Commun. 8 , 15599 (2017).

Teschendorff, A. E., Maity, A. K., Hu, X., Weiyan, C. & Lechner, M. Ultra-fast scalable estimation of single-cell differentiation potency from scRNA-seq data. Bioinformatics 37 , 1528–1534 (2021).

Ni, X. et al. Accurate estimation of single-cell differentiation potency based on network topology and gene ontology information. IEEE/ACM Trans. Comput. Biol. Bioinform. 19 , 3255–3262 (2022).

Kang, M. et al. Mapping single-cell developmental potential in health and disease with interpretable deep learning. Preprint at bioRxiv https://doi.org/10.1101/2024.03.19.585637 (2024).

Palla, G., Fischer, D. S., Regev, A. & Theis, F. J. Spatial components of molecular tissue biology. Nat. Biotechnol. 40 , 308–318 (2022).

Lohoff, T. et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat. Biotechnol. 40 , 74–85 (2022).

Hickey, J. W. et al. Organization of the human intestine at single-cell resolution. Nature 619 , 572–584 (2023).

Greenwald, A. C. et al. Integrative spatial analysis reveals a multi-layered organization of glioblastoma. Cell 187 , 2485–2501.e26 (2024).

Liu, J. et al. Concordance of MERFISH spatial transcriptomics with bulk and single-cell RNA sequencing. Life Sci. Alliance 6 , e202201701 (2023).

He, S. et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 40 , 1794–1806 (2022).

Eng, C. L. et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH. Nature 568 , 235–239 (2019).

Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361 , eaat5691 (2018).

Janesick, A. et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. Nat. Commun. 14 , 8353 (2023).

Liang, S. et al. Single-cell manifold-preserving feature selection for detecting rare cell populations. Nat. Comput. Sci. 1 , 374–384 (2021).

Missarova, A. et al. geneBasis: an iterative approach for unsupervised selection of targeted gene panels from scRNA-seq. Genome Biol. 22 , 333 (2021).

Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39 , 313–319 (2021).

Wei, X. et al. Single-cell Stereo-seq reveals induced progenitor cells involved in axolotl brain regeneration. Science 377 , eabp9444 (2022).

Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185 , 1777–1792.e21 (2022).

Nagendran, M. et al. 1457 Visium HD enables spatially resolved, single-cell scale resolution mapping of FFPE human breast cancer tissue. J. Immunother. Cancer 11 , A1620 (2023).

Lee, Y. et al. XYZeq: spatially resolved single-cell RNA sequencing reveals expression heterogeneity in the tumor microenvironment. Sci. Adv. 7 , eabg4755 (2021).

Srivatsan, S. R. et al. Embryo-scale, single-cell spatial transcriptomics. Science 373 , 111–117 (2021).

Russell, A. J. C. et al. Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. Nature 625 , 101–109 (2024).

Hawrylycz, M. J. et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature 489 , 391–399 (2012).

Jung, N. & Kim, T.-K. Spatial transcriptomics in neuroscience. Exp. Mol. Med. 55 , 2105–2115 (2023).

Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods 19 , 534–546 (2022).

Magoulopoulou, A. et al. Padlock probe-based targeted in situ sequencing: overview of methods and applications. Annu. Rev. Genom. Hum. Genet. 24 , 133–150 (2023).

Williams, C. G., Lee, H. J., Asatsuma, T., Vento-Tormo, R. & Haque, A. An introduction to spatial transcriptomics for biomedical research. Genome Med. 14 , 68 (2022).

Chu, T., Wang, Z., Pe’er, D. & Danko, C. G. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat. Cancer 3 , 505–517 (2022).

Newman, A. M. et al. Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37 , 773–782 (2019).

Li, B. et al. Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution. Nat. Methods 19 , 662–670 (2022).

Kleshchevnikov, V. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat. Biotechnol. 40 , 661–671 (2022).

Cable, D. M. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol. 40 , 517–526 (2022).

Vahid, M. R. et al. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat. Biotechnol. 41 , 1543–1548 (2023).

Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18 , 1352–1362 (2021).

Wei, R. et al. Spatial charting of single-cell transcriptomes in tissues. Nat. Biotechnol. 40 , 1190–1199 (2022).

Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol. 39 , 1375–1384 (2021).

Bergenstråhle, L. et al. Super-resolved spatial transcriptomics by deep data fusion. Nat. Biotechnol. 40 , 476–479 (2022).

Hu, J. et al. Deciphering tumor ecosystems at super resolution from spatial transcriptomics with TESLA. Cell Syst. 14 , 404–417.e4 (2023).

Zhang, D. et al. Inferring super-resolution tissue architecture by integrating spatial transcriptomics with histology. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-02019-9 (2024).

Miller, B. F., Huang, F., Atta, L., Sahoo, A. & Fan, J. Reference-free cell type deconvolution of multi-cellular pixel-resolution spatially resolved transcriptomics data. Nat. Commun. 13 , 2339 (2022).

Garcia-Alonso, L. et al. Single-cell roadmap of human gonadal development. Nature 607 , 540–547 (2022).

Fawkner-Corbett, D. et al. Spatiotemporal analysis of human intestinal development at single-cell resolution. Cell 184 , 810–826.e23 (2021).

Moor, A. E. et al. Spatial reconstruction of single enterocytes uncovers broad zonation along the intestinal villus axis. Cell 175 , 1156–1167.e15 (2018).

Bahar Halpern, K. et al. Lgr5+ telocytes are a signaling source at the intestinal villus tip. Nat. Commun. 11 , 1936 (2020).

Valdeolivas, A. et al. Profiling the heterogeneity of colorectal cancer consensus molecular subtypes using spatial transcriptomics. NPJ Precis. Oncol. 8 , 10 (2024).

Sibai, M. et al. The spatial landscape of Cancer Hallmarks reveals patterns of tumor ecology. Preprint at bioRxiv https://doi.org/10.1101/2022.06.18.496114 (2023).

Heiser, C. N. et al. Molecular cartography uncovers evolutionary and microenvironmental dynamics in sporadic colorectal tumors. Cell 186 , 5620–5637.e16 (2023).

Ren, Y. et al. Spatial transcriptomics reveals niche-specific enrichment and vulnerabilities of radial glial stem-like cells in malignant gliomas. Nat. Commun. 14 , 1028 (2023).

Arora, R. et al. Spatial transcriptomics reveals distinct and conserved tumor core and edge architectures that predict survival and targeted therapy response. Nat. Commun. 14 , 5029 (2023).

Meylan, M. et al. Tertiary lymphoid structures generate and propagate anti-tumor antibody-producing plasma cells in renal cell cancer. Immunity 55 , 527–541.e5 (2022).

Haviv, D. et al. The covariance environment defines cellular niches for spatial inference. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02193-4 (2024).

Abdelaal, T., Mourragui, S., Mahfouz, A. & Reinders, M. J. T. SpaGE: spatial gene enhancement using scRNA-seq. Nucleic Acids Res. 48 , e107 (2020).

Sun, E. D., Ma, R., Navarro Negredo, P., Brunet, A. & Zou, J. TISSUE: uncertainty-calibrated prediction of single-cell spatial transcriptomics improves downstream analyses. Nat. Methods 21 , 444–454 (2024).

Clifton, K. et al. STalign: alignment of spatial transcriptomics data using diffeomorphic metric mapping. Nat. Commun. 14 , 8123 (2023).

Jones, A., Townes, F. W., Li, D. & Engelhardt, B. E. Alignment of spatial genomics data using deep Gaussian processes. Nat. Methods 20 , 1379–1387 (2023).

Zhang, M. et al. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature 624 , 343–354 (2023).

Preibisch, S., Saalfeld, S. & Tomancak, P. Globally optimal stitching of tiled 3D microscopic image acquisitions. Bioinformatics 25 , 1463–1465 (2009).

Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L. A. & Ciriello, G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56 , 74–84 (2024).

Rajachandran, S. et al. Dissecting the spermatogonial stem cell niche using spatial transcriptomics. Cell Rep. 42 , 112737 (2023).

Walsh, L. A. & Quail, D. F. Decoding the tumor microenvironment with spatial technologies. Nat. Immunol. 24 , 1982–1993 (2023).

Morabito, S., Reese, F., Rahimzadeh, N., Miyoshi, E. & Swarup, V. hdWGCNA identifies co-expression networks in high-dimensional transcriptomics data. Cell Rep. Methods 3 , 100498 (2023).

Choi, J. et al. QuadST: a powerful and robust approach for identifying cell-cell interaction-changed genes on spatially resolved transcriptomics. Preprint at bioRxiv https://doi.org/10.1101/2023.12.04.570019 (2023).

Pentimalli, T. M. et al. High-resolution molecular atlas of a lung tumor in 3D. Preprint at bioRxiv https://doi.org/10.1101/2023.05.10.539644 (2023).

Schürch, C. M. et al. Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front. Cell 182 , 1341–1359.e19 (2020).

Qiu, X. et al. Spateo: multidimensional spatiotemporal modeling of single-cell spatial transcriptomics. Preprint at bioRxiv https://doi.org/10.1101/2022.12.07.519417 (2022).

Wu, Z. et al. Graph deep learning for the characterization of tumour microenvironments from spatial protein profiles in tissue specimens. Nat. Biomed. Eng. 6 , 1435–1448 (2022).

Farah, E. N. et al. Spatially organized cellular communities form the developing human heart. Nature 627 , 854–864 (2024).

Bhate, S. S., Barlow, G. L., Schürch, C. M. & Nolan, G. P. Tissue schematics map the specialization of immune tissue motifs and their appropriation by tumors. Cell Syst. 13 , 109–130.e6 (2022).

Kim, J. et al. Unsupervised discovery of tissue architecture in multiplexed imaging. Nat. Methods 19 , 1653–1661 (2022).

Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14 , 1155 (2023).

Ren, H., Walker, B. L., Cang, Z. & Nie, Q. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat. Commun. 13 , 4076 (2022).

Li, J., Chen, S., Pan, X., Yuan, Y. & Shen, H.-B. Cell clustering for spatial transcriptomics data with graph neural networks. Nat. Comput. Sci. 2 , 399–408 (2022).

Pham, D. et al. Robust mapping of spatiotemporal trajectories and cell-cell interactions in healthy and diseased tissues. Nat. Commun. 14 , 7739 (2023).

Cang, Z. et al. Screening cell-cell communication in spatial transcriptomics via collective optimal transport. Nat. Methods 20 , 218–228 (2023).

Andersson, A. et al. Spatial deconvolution of HER2-positive breast cancer delineates tumor-associated cell type interactions. Nat. Commun. 12 , 6012 (2021).

Birk, S. et al. Large-scale characterization of cell niches in spatial atlases using bio-inspired graph learning. Preprint at bioRxiv https://doi.org/10.1101/2024.02.21.581428 (2024).

Turesson, G. The genotypical response of the plant species to the habitat. Hereditas 3 , 211–350 (1922).

Ortiz, R. Göte Turesson’s research legacy to Hereditas: from the ecotype concept in plants to the analysis of landraces’ diversity in crops. Hereditas 157 , 44 (2020).

Luca, B. A. et al. Atlas of clinically distinct cell states and ecosystems across human solid tumors. Cell 184 , 5482–5496.e28 (2021).

Steen, C. B. et al. The landscape of tumor cell states and ecosystems in diffuse large B cell lymphoma. Cancer Cell 39 , 1422–1437.e10 (2021).

Luca, B. A. et al. Atlas of clinically-distinct cell states and cellular ecosystems across human solid tumors. Cancer Res. 80 , abstr. 3443 (2020).

Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53 , 1334–1347 (2021).

Jerby-Arnon, L. & Regev, A. DIALOGUE maps multicellular programs in tissue from single-cell or spatial transcriptomics data. Nat. Biotechnol. 40 , 1467–1477 (2022).

Bill, R. et al. CXCL9:SPP1 macrophage polarity identifies a network of cellular programs that control human cancers. Science 381 , 515–524 (2023).

Pelka, K. et al. Spatially organized multicellular immune hubs in human colorectal cancer. Cell 184 , 4734–4752.e20 (2021).

Ji, A. L. et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell 182 , 497–514.e22 (2020).

He, S. et al. Starfysh integrates spatial transcriptomic and histologic data to reveal heterogeneous tumor-immune hubs. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02173-8 (2024).

Liu, C. et al. Spatiotemporal mapping of gene expression landscapes and developmental trajectories during zebrafish embryogenesis. Dev. Cell 57 , 1284–1298.e5 (2022).

Gu, Y., Liu, J., Li, C. & Welch, J. D. Mapping cell fate transition in space and time. Preprint at bioRxiv https://doi.org/10.1101/2024.02.12.579941 (2024).

Zhao, T. et al. Spatial genomics enables multi-modal study of clonal heterogeneity in tissues. Nature 601 , 85–91 (2022).

Xue, Y. et al. Single-cell mitochondrial variant enrichment resolved clonal tracking and spatial architecture in human embryonic hematopoiesis. Preprint at bioRxiv https://doi.org/10.1101/2023.09.18.558215 (2023).

Ratz, M. et al. Clonal relations in the mouse brain revealed by single-cell and spatial transcriptomics. Nat. Neurosci. 25 , 285–294 (2022).

Erickson, A. et al. Spatially resolved clonal copy number alterations in benign and malignant tissue. Nature 608 , 360–367 (2022).

Lomakin, A. et al. Spatial genomics maps the structure, nature and evolution of cancer clones. Nature 611 , 594–602 (2022).

Househam, J. et al. Phenotypic plasticity and genetic control in colorectal cancer evolution. Nature 611 , 744–753 (2022).

Lim, J. et al. Transitioning single-cell genomics into the clinic. Nat. Rev. Genet. 24 , 573–584 (2023).

Van de Sande, B. et al. Applications of single-cell RNA sequencing in drug discovery and development. Nat. Rev. Drug Discov. 22 , 496–520 (2023).

Lozano, A. X. et al. T cell characteristics associated with toxicity to immune checkpoint blockade in patients with melanoma. Nat. Med. 28 , 353–362 (2022).

Kwon, M. et al. Determinants of response and intrinsic resistance to PD-1 blockade in microsatellite instability-high gastric cancer. Cancer Discov. 11 , 2168–2185 (2021).

Zhou, Y. et al. Single-cell multiomics sequencing reveals prevalent genomic alterations in tumor stromal cells of human colorectal cancer. Cancer Cell 38 , 818–828.e5 (2020).

Abe, Y. et al. A single-cell atlas of non-haematopoietic cells in human lymph nodes and lymphoma reveals a landscape of stromal remodelling. Nat. Cell Biol. 24 , 565–578 (2022).

Ajani, J. A. et al. YAP1 mediates gastric adenocarcinoma peritoneal metastases that are attenuated by YAP1 inhibition. Gut 70 , 55–66 (2021).

Beneyto-Calabuig, S. et al. Clonally resolved single-cell multi-omics identifies routes of cellular differentiation in acute myeloid leukemia. Cell Stem Cell 30 , 706–721.e8 (2023).

Miyamoto, D. T., Ting, D. T., Toner, M., Maheswaran, S. & Haber, D. A. Single-cell analysis of circulating tumor cells as a window into tumor heterogeneity. Cold Spring Harb. Symp. Quant. Biol. 81 , 269–274 (2016).

Loh, K. M. et al. Mapping the pairwise choices leading from pluripotency to human bone, heart, and other mesoderm cell types. Cell 166 , 451–467 (2016).

Karimi, E. et al. Single-cell spatial immune landscapes of primary and metastatic brain tumours. Nature 614 , 555–563 (2023).

Wang, X. Q. et al. Spatial predictors of immunotherapy response in triple-negative breast cancer. Nature 621 , 868–876 (2023).

Lin, J. R. et al. High-plex immunofluorescence imaging and traditional histology of the same tissue section for discovering image-based biomarkers. Nat. Cancer 4 , 1036–1052 (2023).

Sorin, M. et al. Single-cell spatial landscapes of the lung tumour immune microenvironment. Nature 614 , 548–554 (2023).

Lin, J. R. et al. Multiplexed 3D atlas of state transitions and immune interaction in colorectal cancer. Cell 186 , 363–381.e19 (2023).

Digre, A. & Lindskog, C. The human protein atlas-Integrated omics for single cell mapping of the human proteome. Protein Sci. 32 , e4562 (2023).

Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods https://doi.org/10.1038/s41592-024-02201-0 (2024).

Carpenter, A. E. & Singh, S. Bringing computation to biology by bridging the last mile. Nat. Cell Biol. 26 , 5–7 (2024).

Hu, Y. et al. Unsupervised and supervised discovery of tissue cellular neighborhoods from cell phenotypes. Nat. Methods 21 , 267–278 (2024).

Heimberg, G. et al. Scalable querying of human cell atlases via a foundational model reveals commonalities across fibrosis-associated macrophages. Preprint at bioRxiv https://doi.org/10.1101/2023.07.18.549537 (2023).

Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://doi.org/10.48550/arXiv.2108.07258 (2022).

Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4 , 852–866 (2022).

Bian, H. et al. scMulan: a multitask generative pre-trained language model for single-cell analysis. Preprint at bioRxiv https://doi.org/10.1101/2024.01.25.577152 (2024).

Hao, M. et al. Large-scale foundation model on single-cell transcriptomics. Nat. Methods https://doi.org/10.1038/s41592-024-02305-7 (2024).

Theodoris, C. V. et al. Transfer learning enables predictions in network biology. Nature 618 , 616–624 (2023).

Shen, H. et al. Generative pretraining from large-scale transcriptomes for single-cell deciphering. iScience 26 , 106536 (2023).

Yang, X. et al. GeneCompass: deciphering universal gene regulatory mechanisms with knowledge-informed cross-species foundation model. Preprint at bioRxiv https://doi.org/10.1101/2023.09.26.559542 (2023).

Rosen, Y. et al. Universal cell embeddings: a foundation model for cell biology. Preprint at bioRxiv https://doi.org/10.1101/2023.11.28.568918 (2023).

Zhang, R., Luo, Y., Ma, J., Zhang, M. & Wang, S. scPretrain: multi-task self-supervised learning for cell-type classification. Bioinformatics 38 , 1607–1614 (2022).

Alsabbagh, A. R. et al. Foundation models meet imbalanced single-cell data when learning cell type annotations. Preprint at bioRxiv https://doi.org/10.1101/2023.10.24.563625 (2023).

Boiarsky, R., Singh, N., Buendia, A., Getz, G. & Sontag, D. A deep dive into single-cell RNA sequencing foundation models. Preprint at bioRxiv https://doi.org/10.1101/2023.10.19.563100 (2023).

Kedzierska, K. Z., Crawford, L., Amini, A. P. & Lu, A. X. Assessing the limits of zero-shot foundation models in single-cell biology. Preprint at bioRxiv https://doi.org/10.1101/2023.10.16.561085 (2023).

Khan, S. A. et al. Reusability report: learning the transcriptional grammar in single-cell RNA-sequencing data using transformers. Nat. Mach. Intell. 5 , 1437–1446 (2023).

Schaar, A. C. et al. Nicheformer: a foundation model for single-cell and spatial omics. Preprint at bioRxiv https://doi.org/10.1101/2024.04.15.589472 (2024).

Roohani, Y., Huang, K. & Leskovec, J. Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nat. Biotechnol. 42 , 927–935 (2023).

Zhang, Y., Tiňo, P., Leonardis, A. & Tang, K. A survey on neural network interpretability. IEEE Trans. Emerg. Top. Comput. Intell. 5 , 726–742 (2021).

Swanson, K., Wu, E., Zhang, A., Alizadeh, A. A. & Zou, J. From patterns to patients: advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell 186 , 1772–1791 (2023).

Xu, C. et al. Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Mol. Syst. Biol. 17 , e9620 (2021).

Zimmerman, K. D., Espeland, M. A. & Langefeld, C. D. A practical solution to pseudoreplication bias in single-cell studies. Nat. Commun. 12 , 738 (2021).

Crowell, H. L. et al. muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nat. Commun. 11 , 6077 (2020).

He, L. et al. NEBULA is a fast negative binomial mixed model for differential or co-expression analysis of large-scale multi-subject single-cell data. Commun. Biol. 4 , 629 (2021).

Pullin, J. M. & McCarthy, D. J. A comparison of marker gene selection methods for single-cell RNA sequencing data. Genome Biol. 25 , 56 (2024).

Wang, X. et al. An R package suite for microarray meta-analysis in quality control, differentially expressed gene analysis and pathway enrichment detection. Bioinformatics 28 , 2534–2536 (2012).

Hao, Y. et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42 , 293–304 (2024).

Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566 , 496–502 (2019).

Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genom. 19 , 477 (2018).

Ellwanger, D. C., Scheibinger, M., Dumont, R. A., Barr-Gillespie, P. G. & Heller, S. Transcriptional dynamics of hair-bundle morphogenesis revealed with CellTrails. Cell Rep. 23 , 2901–2914.e13 (2018).

Cannoodt, R. et al. SCORPIUS improves trajectory inference and identifies novel modules in dendritic cell development. Preprint at bioRxiv https://doi.org/10.1101/079509 (2016).

Dong, R. & Yuan, G. C. SpatialDWLS: accurate deconvolution of spatial transcriptomic data. Genome Biol. 22 , 145 (2021).

Wan, X. et al. Integrating spatial and single-cell transcriptomics data using deep generative models with SpatialScope. Nat. Commun. 14 , 7848 (2023).

Newman, A. M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12 , 453–457 (2015).

Wang, X., Park, J., Susztak, K., Zhang, N. R. & Li, M. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat. Commun. 10 , 380 (2019).

Menden, K. et al. Deep learning-based cell composition analysis from tissue expression profiles. Sci. Adv. 6 , eaba2619 (2020).

Garmire, L. X. et al. Challenges and perspectives in computational deconvolution of genomics data. Nat. Methods 21 , 391–400 (2024).

Li, H. et al. DeconPeaker, a deconvolution model to identify cell types based on chromatin accessibility in ATAC-seq data of mixture samples. Front. Genet. 11 , 392 (2020).

Hutson, M. Hunting for the best bioscience software tool? Check this database. Nature https://doi.org/10.1038/d41586-023-00053-w (2023).

Decamps, C. et al. Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software. BMC Bioinform. 21 , 16 (2020).

Gohil, S. H., Iorgulescu, J. B., Braun, D. A., Keskin, D. B. & Livak, K. J. Applying high-dimensional single-cell technologies to the analysis of cancer immunotherapy. Nat. Rev. Clin. Oncol. 18 , 244–256 (2021).

Acknowledgements

The authors are grateful to the members of the Newman and Wang laboratories for the valuable discussions and feedback. The original figures were created with Biorender.com. This work was supported by the National Science Foundation (J.P.D., Graduate Research Fellowship DGE-1656518), the National Cancer Institute (L.W., R01CA266280 and U24CA274274; A.M.N., R01CA255450), the Cancer Prevention and Research Institute of Texas (L.W., RP200385), the Break Through Cancer Foundation (L.W.), the University Cancer Foundation via the Institutional Research Grant Program (L.W.), the Melanoma Research Alliance (A.M.N., grant number 926521), and the Virginia and D. K. Ludwig Fund for Cancer Research (A.M.N.). L.W. is an Andrew Sabin Family Foundation Fellow. A.M.N. is a Chan Zuckerberg Biohub – San Francisco Investigator.

Author information

These authors contributed equally: Gunsagar S. Gulati, Jeremy Philip D’Silva.

Authors and Affiliations

Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA

Gunsagar S. Gulati

Department of Biomedical Data Science, Stanford University, Stanford, CA, USA

Jeremy Philip D’Silva & Aaron M. Newman

Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA

Yunhe Liu & Linghua Wang

The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX, USA

Linghua Wang

Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA

Aaron M. Newman

Stanford Cancer Institute, Stanford University, Stanford, CA, USA

Chan Zuckerberg Biohub – San Francisco, San Francisco, CA, USA

Contributions

All authors discussed the content of the article, contributed to writing or editing, and reviewed the manuscript before submission. L.W. and A.M.N. jointly supervised the work.

Corresponding author

Correspondence to Aaron M. Newman.

Ethics declarations

Competing interests.

A.M.N. holds patents related to digital cytometry and cancer biomarkers and has ownership interests in CiberMed, Inc., LiquidCell Dx, Inc. and CytoTrace Biosciences, Inc. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Reviews Molecular Cell Biology thanks Tallulah Andrews and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Gene Ontology: https://geneontology.org/

Human Cell Atlas: https://www.humancellatlas.org/

Human Protein Atlas: http://www.proteinatlas.org

Human Tumor Atlas Network: https://humantumoratlas.org/

Glossary

Extraneous RNA molecules arising from lysed cells during sample processing that can contaminate gene expression measurements.

Comprehensive references of cell types and states, typically generated using single-cell omics technologies.

A technique for grouping elements of a dataset (for example, cells) by a similarity measure.

Quantitative phenotypes (for example, the abundance of a particular cell state or type within a tissue) that are statistically associated with a health outcome (known as ‘prognostic’ biomarkers) or with the likelihood of responding to a given treatment (‘predictive’ biomarkers).

A computational technique typically applied to bulk RNA admixtures to infer the proportions and characteristics of specific cell types within a complex tissue sample using gene expression signatures.

A dataset containing information on the expression levels of numerous genes across multiple samples, which in single-cell RNA-seq data are single cells.

Statistical methods to correct data by accounting for the effect of one or more linear relationships between variables.

A representation of high-dimensional data (for example, expression matrix with thousands of genes) that reduces the number of variables, while preserving important structures and relationships in the data for simplified analysis and visualization.

A local multicellular microenvironment, with an exact spatial resolution defined based on the assay and the studied tissue, for example, cells within a radius of 50 µm, or the 200 nearest neighbours of a cell.

A problem solved by determining a mapping between two distributions that minimizes a cost function; in the context of single-cell time series data, the distributions can be cell populations at different time points, and the solution finds a mapping that relates cells at a later time point to their inferred antecedents at one or more earlier time points.

Recurrent sets of multicellular neighbourhoods characterized by a co-occurring set of phenotypic states (for example, transcriptional programmes) in one or more cell types.

A cellular phenotype characterized by small or nuanced differences in gene expression compared to other states of the same cell type.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article.

Gulati, G.S., D’Silva, J.P., Liu, Y. et al. Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics. Nat Rev Mol Cell Biol (2024). https://doi.org/10.1038/s41580-024-00768-2

Accepted: 16 July 2024

Published: 21 August 2024

DOI: https://doi.org/10.1038/s41580-024-00768-2

COMMENTS

  1. Introduction to Biostatistics and Research Methods

    INTRODUCTION TO BIOSTATISTICS AND RESEARCH METHODS. P. S. S. SUNDAR RAO, J. RICHARD. PHI Learning Pvt. Ltd., Jan 9, 2012 - Medical - 280 pages. The last decade has produced many textbooks on Biostatistics, with varying emphasis and degrees of mathematical complexity. This book has stood the test of time and continues to enjoy wide ...

  2. Biostatistics Series Module 1: Basics of Biostatistics

    Basics of Biostatistics. Application of statistical methods in biomedical research began more than 150 years ago. One of the early pioneers, Florence Nightingale, the icon of nursing, worked during the Crimean war of the 1850s to improve the methods of constructing mortality tables. The conclusions from her tables helped to change the practices ...

  3. Introduction to Biostatistics and Research Methods

    Introduction to Biostatistics and Research Methods. The fourth edition of this well-accepted textbook introduces the basic principles and concepts of biostatistics in simple, straightforward terms with minimal use of statistical jargon. With the help of real-life examples, it explains the various statistical techniques that are applied to ...

  4. PDF Introduction to Biostatistics

    Introduction These notes are intended to provide the student with a conceptual overview of statistical methods with emphasis on applications commonly used in pharmaceutical and epidemiological research. We will briefly cover the topics of probability and descriptive statistics, followed by detailed descriptions of widely used inferential ...

  5. INTRODUCTION TO BIOSTATISTICS AND RESEARCH METHODS: Edition 5

    INTRODUCTION TO BIOSTATISTICS AND RESEARCH METHODS: Edition 5 - Ebook written by P. S. S. SUNDAR RAO, J. RICHARD. Read this book using the Google Play Books app on your PC, Android or iOS devices. Download for offline reading, highlight, bookmark or take notes while you read INTRODUCTION TO BIOSTATISTICS AND RESEARCH METHODS: Edition 5.

  6. Introduction to Biostatistics and Research Methods

    Introduction to Biostatistics and Research Methods. The fourth edition of this well-accepted textbook introduces the basic principles and concepts of biostatistics in simple, straightforward terms with minimal use of statistical jargon.

  7. Introductory Biostatistics Notes: Diagrams & Illustrations

    This Osmosis High-Yield Note provides an overview of Introductory Biostatistics essentials. All Osmosis Notes are clearly laid-out and contain striking images, tables, and diagrams to help visual learners understand complex topics quickly and efficiently.

  8. Introduction to Biostatistics and Research Method

    INTRODUCTION TO BIOSTATISTICS AND RESEARCH METHODS FIFTH EDITION By RAO, P. S. S. SUNDAR, RICHARD, J. - Buy only for price Rs.450.00 at PHINDIA.com. ... The chapters on Research Methods, Interventional Studies and Observational Studies provide a step-by-step guide to planning and carrying out quality research. Questions given in each chapter will help ...

  9. Introduction to Biostatistics and Research Methods 5th Edition

    Amazon.in - Buy INTRODUCTION TO BIOSTATISTICS AND RESEARCH METHODS 5TH EDITION book online at best prices in India on Amazon.in. Read INTRODUCTION TO BIOSTATISTICS AND RESEARCH METHODS 5TH EDITION book reviews & author details and more at Amazon.in. Free delivery on qualified orders.

  10. An Introduction to Biostatistics

    In this chapter, we discuss the basics of what you need to know about biostatistics in order to statistically analyze and interpret the data from your in vitro and preclinical in vivo experiments. Experiments are conducted to answer one or more specific scientific...

  11. Introduction to Biostatistics

    Biostatistics is the application of statistical methods to the biological and life sciences. Statistical methods include procedures for: (1) collecting data, (2) presenting and summarizing data, and (3) drawing inferences from sample data to a population. These methods are particularly useful in studies involving humans because the processes ...

  12. Introduction to Biostatistics and Research Methods

    The book "Introduction to Biostatistics and Research Methods - Muhammad Ibrahim" Second Edition is the recommended text for clinical researchers, physiotherapists, nurses, and medical students. This is a frequently used book for research and biostats, particularly among students of physiotherapy and allied health professions at the University of Health Sciences Lahore, King Edward ...

  13. Applied Biostatistics Certificate: Methods & Applications

    This online professional certificate program offers a comprehensive introduction to biostatistics in medical research. The program includes a review of the most common techniques in the field, as well as the manner in which these techniques are applied in standard statistical software.

  14. Introduction to Biostatistics

    1 Core Concepts. By the term "biostatistics," we mean the application of the field of probability and statistics to a wide range of topics that pertain to the biological sciences. We focus our discussion on the practical applications of fundamental biostatistics in the domain of healthcare, including experimental and clinical medicine ...

  15. Basic Concepts for Biostatistics

    Biostatistics is the application of statistical principles to questions and problems in medicine, public health or biology. One can imagine that it might be of interest to characterize a given population (e.g., adults in Boston or all children in the United States) with respect to the proportion of subjects who are overweight or the proportion ...

  16. PDF Introduction to Biostatistics

    Baek, Jonggyu, "Introduction to Biostatistics - Lecture 1: Introduction and Descriptive Statistics" (2019). PEER Liberia Project. 10. ... Biostatistics is the application of statistics in medical research, e.g. clinical trials and epidemiology ... Graphical Methods for Discrete variables

  17. Introduction to Biostatistics and Research Methods

    The last decade has produced many textbooks on Biostatistics, with varying emphasis and degrees of mathematical complexity. This book has stood the test of time and continues to enjoy wide acceptance among students of all health and allied professions, other students and even qualified health investigators, who find it practical, simple and yet precise.

  18. Introduction to Biostatistics and Research Methods

    Introduction to Biostatistics and Research Methods by P. S. S. Sundar Rao, J. Richard, 2012, Prentice Hall India Pvt., Limited, Prentice Hall of India edition, in English.

  19. Guidance for biostatisticians on their essential contributions to

    Introduction. Rigorous scientific review of research protocols is critical to making funding decisions [1, 2], and to the protection of both human and non-human research participants. Two pillars of ethical clinical and translational research include scientific validity and independent review of the proposed research. As such, the review process often emphasizes the scientific approach and ...

  20. Biostat and Research

    Biostat and Research - PSS Sunder Rao - Free ebook download as PDF File (.pdf), Text File (.txt) or read book online for free.

  21. Introduction to Biostatistics and Research Methods

    INTRODUCTION TO BIOSTATISTICS AND RESEARCH METHODS. P. S. S. SUNDAR RAO, J. RICHARD. PHI Learning Pvt. Ltd., Jan 9, 2012 - Medical - 280 pages. The last decade has produced many textbooks on Biostatistics, with varying emphasis and degrees of mathematical complexity. This book has stood the test of time and continues to enjoy wide acceptance ...

  22. Biostatistics

    What We Do in the Department of Biostatistics. The Bloomberg School's Department of Biostatistics is the oldest department of its kind in the world and has long been considered one of the best. Our faculty conduct research across the spectrum of statistical science, from foundations of inference to the discovery of new methodologies for health ...

  23. PDF INTRODUCTION TO BIOSTATISTICS AND RESEARCH METHODS

    Introduction to Biostatistics and Research Methods. Dr. P.S.S. Sundar Rao joined Christian Medical College (CMC), Vellore, when the Madras University introduced the study of Biostatistics as a part of the syllabus of Preventive and Social Medicine in 1957 for the M.B.B.S.

  24. Profiling cell identity and tissue architecture with single-cell and

    Single-cell and spatial transcriptomics are transforming our understanding of cell plasticity and tissue diversity. This Review discusses technical and computational advancements and challenges in ...