An introduction to power and sample size estimation

Emergency Medicine Journal, Volume 20, Issue 5

S R Jones¹, S Carley², M Harrison³

¹ North Manchester Hospital, Manchester, UK
² Royal Bolton Hospital, Bolton, UK
³ North Staffordshire Hospital, UK

Correspondence to: Dr S R Jones, Emergency Department, Manchester Royal Infirmary, Oxford Road, Manchester M13 9WL, UK; steve.r.jones@bigfoot.com

The importance of power and sample size estimation for study design and analysis.

  • research design
  • sample size

https://doi.org/10.1136/emj.20.5.453


Correction notice Following recent feedback from a reader, the authors have corrected this article. The original version of this paper stated that: “Strictly speaking, “power” refers to the number of patients required to avoid a type II error in a comparative study.” However, the formal definition of “power” is that it is the probability of avoiding a type II error (failing to reject the null hypothesis when it is in fact false), rather than a reference to the number of patients. Power is, however, related to sample size, as power increases as the number of patients in the study increases. This statement has therefore been corrected to: “Strictly speaking, “power” refers to the probability of avoiding a type II error in a comparative study.”

Linked articles

  • Correction. Emergency Medicine Journal 2004;21:126. Published Online First: 20 Jan 2004.
  • Correction: An introduction to power and sample size estimation. Emergency Medicine Journal 2023;40:e4. Published Online First: 27 Sep 2023. doi:10.1136/emj.20.5.453corr2

Statistical Power: What it is, How to Calculate it

In order to follow this article, you may want to read these articles first: What is a Hypothesis Test? What are Type I and Type II Errors?

What is Power?

The statistical power of a study (sometimes called sensitivity) is how likely the study is to distinguish an actual effect from one of chance. It’s the likelihood that the test correctly rejects the null hypothesis when a real effect exists. For example, a study with 80% power has an 80% chance of producing a statistically significant result when the effect it is looking for is actually present.

  • A high statistical power means that the test is likely to detect a true effect. As the power increases, the probability of making a Type II error decreases.
  • A low statistical power means that the test results are questionable: a real effect can easily go undetected.

Statistical power helps you to determine whether your sample size is large enough. It is possible to perform a hypothesis test without calculating the statistical power, but if your sample size is too small, your results may be inconclusive when they would have been conclusive with a large enough sample.

Statistical Power and Beta


Beta (β) is the probability that you won’t reject the null hypothesis when it is false; in other words, it is the probability of a Type II error. Statistical power is the complement of this probability: power = 1 − β.

How to Calculate Statistical Power

Statistical power is quite complex to calculate by hand.

Software is normally used to calculate the power.

  • Calculate power in SAS .
  • Calculate power in PASS.
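
In R, for example, the built-in power.t.test() function computes the power of a t test directly. A minimal sketch, using illustrative numbers rather than values from any real study:

# Power of a two-sample t test with 50 subjects per group, a true
# difference in means of 0.5, and a standard deviation of 1.
# Leaving `power` unspecified tells R to solve for it.
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = 0.05)
# ... reports power of roughly 0.70 for these inputs.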

Power Analysis

Power analysis is a method for finding statistical power: the probability of finding an effect, assuming that the effect is actually there. To put it another way, power is the probability of rejecting a null hypothesis when it’s false. Note that power is not the same thing as a Type II error, which happens when you fail to reject a false null hypothesis; rather, power is your probability of not making a Type II error.

A Simple Example of Power Analysis

Let’s say you were conducting a drug trial and that the drug works. You run a series of trials with the effective drug and a placebo. If you had a power of .9, that means 90% of the time you would get a statistically significant result. In 10% of the cases, your results would not be statistically significant. The power in this case tells you the probability of finding a difference between the two means, which is 90%. But 10% of the time, you wouldn’t find a difference.

Reasons to run a Power Analysis

You can run a power analysis for many reasons, including:

  • To find the number of trials needed to get an effect of a certain size. This is probably the most common use for power analysis: it tells you how many trials you need to run to have a good chance of detecting a real effect rather than missing it (a Type II error).
  • To find the power, given an effect size and the number of trials available. This is often useful when you have a limited budget for, say, 100 trials, and you want to know if that number of trials is enough to detect an effect.
  • To validate your research. Conducting power analysis is simply put–good science.

Calculating power is complex and is almost always performed with software; many online power calculators are also available.



Frequently asked questions

What is a power analysis?

A power analysis is a calculation that helps you determine a minimum sample size for your study. It’s made up of four main components. If you know or have estimates for any three of these, you can calculate the fourth component.

  • Statistical power : the likelihood that a test will detect an effect of a certain size if there is one, usually set at 80% or higher.
  • Sample size : the minimum number of observations needed to observe an effect of a certain size with a given power level.
  • Significance level (alpha) : the maximum risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Expected effect size : a standardized way of expressing the magnitude of the expected result of your study, usually based on similar studies or a pilot study.
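
If you supply any three of these components, software can solve for the fourth. As a rough sketch in base R, power.t.test() returns the per-group sample size for a two-sample t test when given power, significance level, and an assumed effect (the effect size below is an illustrative assumption, not a recommendation):

# Sample size for 80% power at alpha = .05, expecting a
# standardized difference of 0.5 (delta / sd):
power.t.test(power = 0.80, delta = 0.5, sd = 1, sig.level = 0.05)
# ... reports n of roughly 64 subjects per group.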

Frequently asked questions: Statistics

As the degrees of freedom increase, Student’s t distribution becomes less leptokurtic , meaning that the probability of extreme values decreases. The distribution becomes more and more similar to a standard normal distribution .

The three categories of kurtosis are:

  • Mesokurtosis : An excess kurtosis of 0. Normal distributions are mesokurtic.
  • Platykurtosis : A negative excess kurtosis. Platykurtic distributions are thin-tailed, meaning that they have few outliers .
  • Leptokurtosis : A positive excess kurtosis. Leptokurtic distributions are fat-tailed, meaning that they have many outliers.

Probability distributions belong to two broad categories: discrete probability distributions and continuous probability distributions . Within each category, there are many types of probability distributions.

Probability is the relative frequency over an infinite number of trials.

For example, the probability of a coin landing on heads is .5, meaning that if you flip the coin an infinite number of times, it will land on heads half the time.

Since doing something an infinite number of times is impossible, relative frequency is often used as an estimate of probability. If you flip a coin 1000 times and get 507 heads, the relative frequency, .507, is a good estimate of the probability.

Categorical variables can be described by a frequency distribution. Quantitative variables can also be described by a frequency distribution, but first they need to be grouped into interval classes .

A histogram is an effective way to tell if a frequency distribution appears to have a normal distribution .

Plot a histogram and look at the shape of the bars. If the bars roughly follow a symmetrical bell or hill shape, like the example below, then the distribution is approximately normally distributed.


You can use the CHISQ.INV.RT() function to find a chi-square critical value in Excel.

For example, to calculate the chi-square critical value for a test with df = 22 and α = .05, click any blank cell and type:

=CHISQ.INV.RT(0.05,22)

You can use the qchisq() function to find a chi-square critical value in R.

For example, to calculate the chi-square critical value for a test with df = 22 and α = .05:

qchisq(p = .05, df = 22, lower.tail = FALSE)

You can use the chisq.test() function to perform a chi-square test of independence in R. Give the contingency table as a matrix for the “x” argument. For example:

m = matrix(data = c(89, 84, 86, 9, 8, 24), nrow = 3, ncol = 2)

chisq.test(x = m)

You can use the CHISQ.TEST() function to perform a chi-square test of independence in Excel. It takes two arguments, CHISQ.TEST(observed_range, expected_range), and returns the p value.

Chi-square goodness of fit tests are often used in genetics. One common application is to check if two genes are linked (i.e., if the assortment is independent). When genes are linked, the allele inherited for one gene affects the allele inherited for another gene.

Suppose that you want to know if the genes for pea texture (R = round, r = wrinkled) and color (Y = yellow, y = green) are linked. You perform a dihybrid cross between two heterozygous ( RY / ry ) pea plants. The hypotheses you’re testing with your experiment are:

  • Null hypothesis (H0): The two genes assort independently. This would suggest that the genes are unlinked.
  • Alternative hypothesis (Ha): The two genes do not assort independently. This would suggest that the genes are linked.

You observe 100 peas:

  • 78 round and yellow peas
  • 6 round and green peas
  • 4 wrinkled and yellow peas
  • 12 wrinkled and green peas

Step 1: Calculate the expected frequencies

To calculate the expected values, you can make a Punnett square. If the two genes are unlinked, the probability of each genotypic combination is equal.

         RY     Ry     rY     ry
RY       RRYY   RRYy   RrYY   RrYy
Ry       RRYy   RRyy   RrYy   Rryy
rY       RrYY   RrYy   rrYY   rrYy
ry       RrYy   Rryy   rrYy   rryy

The expected phenotypic ratios are therefore 9 round and yellow: 3 round and green: 3 wrinkled and yellow: 1 wrinkled and green.

From this, you can calculate the expected phenotypic frequencies for 100 peas:

Phenotype             Observed   Expected
Round and yellow      78         100 × (9/16) = 56.25
Round and green       6          100 × (3/16) = 18.75
Wrinkled and yellow   4          100 × (3/16) = 18.75
Wrinkled and green    12         100 × (1/16) = 6.25

Step 2: Calculate chi-square

Phenotype             Observed (O)   Expected (E)   O − E     (O − E)²   (O − E)² / E
Round and yellow      78             56.25          21.75     473.06     8.41
Round and green       6              18.75          −12.75    162.56     8.67
Wrinkled and yellow   4              18.75          −14.75    217.56     11.60
Wrinkled and green    12             6.25           5.75      33.06      5.29

Χ² = 8.41 + 8.67 + 11.60 + 5.29 = 33.97

Step 3: Find the critical chi-square value

Since there are four groups (round and yellow, round and green, wrinkled and yellow, wrinkled and green), there are three degrees of freedom .

For a test of significance at α = .05 and df = 3, the Χ² critical value is 7.82.

Step 4: Compare the chi-square value to the critical value

Χ² = 33.97

Critical value = 7.82

The Χ² value is greater than the critical value.

Step 5: Decide whether to reject the null hypothesis

The Χ² value is greater than the critical value, so we reject the null hypothesis that the population of offspring has an equal probability of inheriting all possible genotypic combinations. There is a significant difference between the observed and expected phenotypic frequencies (p < .05).

The data support the alternative hypothesis that the offspring do not have an equal probability of inheriting all possible genotypic combinations, which suggests that the genes are linked.
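
The entire test can be reproduced in one line of R with chisq.test(), passing the observed counts and the expected 9:3:3:1 ratio:

# Observed phenotype counts and the expected 9:3:3:1 ratio;
# rescale.p = TRUE converts the ratio into probabilities.
chisq.test(x = c(78, 6, 4, 12), p = c(9, 3, 3, 1), rescale.p = TRUE)
# ... reports X-squared = 33.97, df = 3, and a p-value far below .05.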

You can use the chisq.test() function to perform a chi-square goodness of fit test in R. Give the observed values in the “x” argument, give the expected values in the “p” argument, and set “rescale.p” to true. For example:

chisq.test(x = c(22,30,23), p = c(25,25,25), rescale.p = TRUE)

You can use the CHISQ.TEST() function to perform a chi-square goodness of fit test in Excel. It takes two arguments, CHISQ.TEST(observed_range, expected_range), and returns the p value .

Both correlations and chi-square tests can test for relationships between two variables. However, a correlation is used when you have two quantitative variables and a chi-square test of independence is used when you have two categorical variables.

Both chi-square tests and t tests can test for differences between two groups. However, a t test is used when you have a dependent quantitative variable and an independent categorical variable (with two groups). A chi-square test of independence is used when you have two categorical variables.

The two main chi-square tests are the chi-square goodness of fit test and the chi-square test of independence .

A chi-square distribution is a continuous probability distribution . The shape of a chi-square distribution depends on its degrees of freedom , k . The mean of a chi-square distribution is equal to its degrees of freedom ( k ) and the variance is 2 k . The range is 0 to ∞.

As the degrees of freedom ( k ) increases, the chi-square distribution goes from a downward curve to a hump shape. As the degrees of freedom increases further, the hump goes from being strongly right-skewed to being approximately normal.

To find the quartiles of a probability distribution, you can use the distribution’s quantile function.

You can use the quantile() function to find quartiles in R. If your data is called “data”, then “quantile(data, prob=c(.25,.5,.75), type=1)” will return the three quartiles.

You can use the QUARTILE() function to find quartiles in Excel. If your data is in column A, then click any blank cell and type “=QUARTILE(A:A,1)” for the first quartile, “=QUARTILE(A:A,2)” for the second quartile, and “=QUARTILE(A:A,3)” for the third quartile.

You can use the PEARSON() function to calculate the Pearson correlation coefficient in Excel. If your variables are in columns A and B, then click any blank cell and type “PEARSON(A:A,B:B)”.

There is no function to directly test the significance of the correlation.

You can use the cor() function to calculate the Pearson correlation coefficient in R. To test the significance of the correlation, you can use the cor.test() function.

You should use the Pearson correlation coefficient when (1) the relationship is linear and (2) both variables are quantitative and (3) normally distributed and (4) have no outliers.

The Pearson correlation coefficient ( r ) is the most common way of measuring a linear correlation. It is a number between –1 and 1 that measures the strength and direction of the relationship between two variables.

This table summarizes the most important differences between normal distributions and Poisson distributions :

Characteristic     Normal                                Poisson
Type of variable   Continuous                            Discrete
Parameters         Mean (µ) and standard deviation (σ)   Lambda (λ)
Shape              Bell-shaped                           Depends on λ
Symmetry           Symmetrical                           Asymmetrical (right-skewed); as λ increases, the asymmetry decreases
Range              −∞ to ∞                               0 to ∞

When the mean of a Poisson distribution is large (>10), it can be approximated by a normal distribution.

In the Poisson distribution formula, lambda (λ) is the mean number of events within a given interval of time or space. For example, λ = 0.748 floods per year.

The e in the Poisson distribution formula stands for the number 2.718. This number is called Euler’s number. You can simply substitute e with 2.718 when you’re calculating a Poisson probability. Euler’s number is a very useful number and is especially important in calculus.

The three types of skewness are:

  • Right skew (also called positive skew ) . A right-skewed distribution is longer on the right side of its peak than on its left.
  • Left skew (also called negative skew). A left-skewed distribution is longer on the left side of its peak than on its right.
  • Zero skew. It is symmetrical and its left and right sides are mirror images.


Skewness and kurtosis are both important measures of a distribution’s shape.

  • Skewness measures the asymmetry of a distribution.
  • Kurtosis measures the heaviness of a distribution’s tails relative to a normal distribution .


A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (“ x affects y because …”).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses . In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.

The alternative hypothesis is often abbreviated as Ha or H1. When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

The null hypothesis is often abbreviated as H0. When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The t distribution was first described by statistician William Sealy Gosset under the pseudonym “Student.”

To calculate a confidence interval of a mean using the critical value of t , follow these four steps:

  • Choose the significance level based on your desired confidence level. The most common confidence level is 95%, which corresponds to α = .05 in the two-tailed t table .
  • Find the critical value of t in the two-tailed t table.
  • Multiply the critical value of t by s/√n, where s is the sample standard deviation and n is the sample size.
  • Add this value to the mean to calculate the upper limit of the confidence interval, and subtract this value from the mean to calculate the lower limit.
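
As a minimal sketch of these four steps in R, using a small made-up sample:

x <- c(4.2, 5.1, 6.3, 5.8, 4.9, 5.5)        # hypothetical data
t_crit <- qt(p = .975, df = length(x) - 1)  # critical t for a 95% CI
margin <- t_crit * sd(x) / sqrt(length(x))  # critical t times s/sqrt(n)
mean(x) - margin                            # lower limit
mean(x) + margin                            # upper limit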

To test a hypothesis using the critical value of t , follow these four steps:

  • Calculate the t value for your sample.
  • Find the critical value of t in the t table .
  • Determine if the (absolute) t value is greater than the critical value of t .
  • Reject the null hypothesis if the sample’s t value is greater than the critical value of t . Otherwise, don’t reject the null hypothesis .

You can use the T.INV() function to find the critical value of t for one-tailed tests in Excel, and you can use the T.INV.2T() function for two-tailed tests.

You can use the qt() function to find the critical value of t in R. The function gives the critical value of t for the one-tailed test. If you want the critical value of t for a two-tailed test, divide the significance level by two.

You can use the RSQ() function to calculate R² in Excel. If your dependent variable is in column A and your independent variable is in column B, then click any blank cell and type “RSQ(A:A,B:B)”.

You can use the summary() function to view the R²  of a linear model in R. You will see the “R-squared” near the bottom of the output.
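
For example, using R’s built-in mtcars data, fitting a simple linear regression and extracting R² takes two lines:

fit <- lm(mpg ~ wt, data = mtcars)  # regress fuel economy on weight
summary(fit)$r.squared              # roughly 0.75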

There are two formulas you can use to calculate the coefficient of determination (R²) of a simple linear regression. You can square the Pearson correlation coefficient between the two variables:

R² = r²

Alternatively, you can calculate it from the sums of squares, as the proportion of total variation the model explains: R² = 1 − (SSres / SStot), where SSres is the residual sum of squares and SStot is the total sum of squares.

The coefficient of determination (R²) is a number between 0 and 1 that measures how well a statistical model predicts an outcome. You can interpret the R² as the proportion of variation in the dependent variable that is predicted by the statistical model.

There are three main types of missing data .

Missing completely at random (MCAR) data are randomly distributed across the variable and unrelated to other variables .

Missing at random (MAR) data are not randomly distributed but they are accounted for by other observed variables.

Missing not at random (MNAR) data systematically differ from the observed values.

To tidy up your missing data , your options usually include accepting, removing, or recreating the missing data.

  • Acceptance: You leave your data as is
  • Listwise or pairwise deletion: You delete all cases (participants) with missing data from analyses
  • Imputation: You use other data to fill in the missing data

Missing data are important because, depending on the type, they can sometimes bias your results. This means your results may not be generalizable outside of your study because your data come from an unrepresentative sample .

Missing data , or missing values, occur when you don’t have data stored for certain variables or participants.

In any dataset, there’s usually some missing data. In quantitative research , missing values appear as blank cells in your spreadsheet.

There are two steps to calculating the geometric mean :

  • Multiply all values together to get their product.
  • Find the n th root of the product ( n is the number of values).

Before calculating the geometric mean, note that:

  • The geometric mean can only be found for positive values.
  • If any value in the data set is zero, the geometric mean is zero.

The arithmetic mean is the most commonly used type of mean and is often referred to simply as “the mean.” While the arithmetic mean is based on adding and dividing values, the geometric mean multiplies and finds the root of values.

Even though the geometric mean is a less common measure of central tendency , it’s more accurate than the arithmetic mean for percentage change and positively skewed data. The geometric mean is often reported for financial indices and population growth rates.

The geometric mean is an average that multiplies all values and finds a root of the number. For a dataset with n numbers, you find the n th root of their product.

Outliers are extreme values that differ from most values in the dataset. You find outliers at the extreme ends of your dataset.

It’s best to remove outliers only when you have a sound reason for doing so.

Some outliers represent natural variations in the population , and they should be left as is in your dataset. These are called true outliers.

Other outliers are problematic and should be removed because they represent measurement errors , data entry or processing errors, or poor sampling.

You can choose from four main ways to detect outliers :

  • Sorting your values from low to high and checking minimum and maximum values
  • Visualizing your data with a box plot and looking for outliers
  • Using the interquartile range to create fences for your data
  • Using statistical procedures to identify extreme values

Outliers can have a big impact on your statistical analyses and skew the results of any hypothesis test if they are inaccurate.

These extreme values can impact your statistical power as well, making it hard to detect a true effect if there is one.

No, the steepness or slope of the line isn’t related to the correlation coefficient value. The correlation coefficient only tells you how closely your data fit on a line, so two datasets with the same correlation coefficient can have very different slopes.

To find the slope of the line, you’ll need to perform a regression analysis .

Correlation coefficients always range between -1 and 1.

The sign of the coefficient tells you the direction of the relationship: a positive value means the variables change together in the same direction, while a negative value means they change together in opposite directions.

The absolute value of a number is equal to the number without its sign. The absolute value of a correlation coefficient tells you the magnitude of the correlation: the greater the absolute value, the stronger the correlation.

These are the assumptions your data must meet if you want to use Pearson’s r :

  • Both variables are on an interval or ratio level of measurement
  • Data from both variables follow normal distributions
  • Your data have no outliers
  • Your data is from a random or representative sample
  • You expect a linear relationship between the two variables

A correlation coefficient is a single number that describes the strength and direction of the relationship between your variables.

Different types of correlation coefficients might be appropriate for your data based on their levels of measurement and distributions . The Pearson product-moment correlation coefficient (Pearson’s r ) is commonly used to assess a linear relationship between two quantitative variables.

There are various ways to improve power:

  • Increase the potential effect size by manipulating your independent variable more strongly,
  • Increase sample size,
  • Increase the significance level (alpha),
  • Reduce measurement error by increasing the precision and accuracy of your measurement devices and procedures,
  • Use a one-tailed test instead of a two-tailed test for t tests and z tests.

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Statistical analysis is the main method for analyzing quantitative research data . It uses probabilities and models to test predictions about a population from sample data.

The risk of making a Type II error is inversely related to the statistical power of a test. Power is the extent to which a test can correctly detect a real effect when there is one.

To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the significance level to increase statistical power.

The risk of making a Type I error is the significance level (or alpha) that you choose. That’s a value that you set at the beginning of your study to assess the statistical probability of obtaining your results ( p value ).

The significance level is usually set at 0.05 or 5%. This means that, if the null hypothesis is actually true, there is only a 5% chance (or less) of obtaining results at least as extreme as yours.

To reduce the Type I error probability, you can set a lower significance level.

In statistics, a Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s actually false.

In statistics, power refers to the likelihood of a hypothesis test detecting a true effect if there is one. A statistically powerful test is more likely to reject a false negative (a Type II error).

If you don’t ensure enough power in your study, you may not be able to detect a statistically significant result even when it has practical significance. Your study might not have the ability to answer your research question.

While statistical significance shows that an effect exists in a study, practical significance shows that the effect is large enough to be meaningful in the real world.

Statistical significance is denoted by p -values whereas practical significance is represented by effect sizes .

There are dozens of measures of effect sizes . The most common effect sizes are Cohen’s d and Pearson’s r . Cohen’s d measures the size of the difference between two groups while Pearson’s r measures the strength of the relationship between two variables .

Effect size tells you how meaningful the relationship between variables or the difference between groups is.

A large effect size means that a research finding has practical significance, while a small effect size indicates limited practical applications.

Using descriptive and inferential statistics , you can make two types of estimates about the population : point estimates and interval estimates.

  • A point estimate is a single value estimate of a parameter . For instance, a sample mean is a point estimate of a population mean.
  • An interval estimate gives you a range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.

Both types of estimates are important for gathering a clear idea of where a parameter is likely to lie.

Standard error and standard deviation are both measures of variability . The standard deviation reflects variability within a sample, while the standard error estimates the variability across samples of a population.

The standard error of the mean , or simply standard error , indicates how different the population mean is likely to be from a sample mean. It tells you how much the sample mean would vary if you were to repeat a study using new samples from within a single population.

To figure out whether a given number is a parameter or a statistic , ask yourself the following:

  • Does the number describe a whole, complete population where every member can be reached for data collection ?
  • Is it possible to collect data for this number from every member of the population in a reasonable time frame?

If the answer is yes to both questions, the number is likely to be a parameter. For small populations, data can be collected from the whole population and summarized in parameters.

If the answer is no to either of the questions, then the number is more likely to be a statistic.

The arithmetic mean is the most commonly used mean. It’s often simply called the mean or the average. But there are some other types of means you can calculate depending on your research purposes:

  • Weighted mean: some values contribute more to the mean than others.
  • Geometric mean : values are multiplied rather than summed up.
  • Harmonic mean: reciprocals of values are used instead of the values themselves.
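
Each of these can be computed in a line of R; the numbers here are made up purely for illustration:

x <- c(2, 4, 8)
weighted.mean(x, w = c(1, 1, 2))  # weighted mean: the last value counts double
exp(mean(log(x)))                 # geometric mean: cube root of 2 * 4 * 8 = 4
1 / mean(1 / x)                   # harmonic mean: roughly 3.43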

You can find the mean , or average, of a data set in two simple steps:

  • Find the sum of the values by adding them all up.
  • Divide the sum by the number of values in the data set.

This method is the same whether you are dealing with sample or population data or positive or negative numbers.

The median is the most informative measure of central tendency for skewed distributions or distributions with outliers. For example, the median is often used as a measure of central tendency for income distributions, which are generally highly skewed.

Because the median only uses one or two values, it’s unaffected by extreme outliers or non-symmetric distributions of scores. In contrast, the mean and mode can vary in skewed distributions.

To find the median , first order your data. Then calculate the middle position based on n , the number of values in your data set.

(n + 1) / 2

A data set can often have no mode, one mode or more than one mode – it all depends on how many different values repeat most frequently.

Your data can be:

  • without any mode
  • unimodal, with one mode,
  • bimodal, with two modes,
  • trimodal, with three modes, or
  • multimodal, with four or more modes.

To find the mode :

  • If your data is numerical or quantitative, order the values from low to high.
  • If it is categorical, sort the values by group, in any order.

Then you simply need to identify the most frequently occurring value.
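
Base R has no built-in mode function, but a short sketch using table() does the job (and returns every mode if there is more than one):

x <- c(3, 5, 5, 7, 5, 3)
counts <- table(x)                    # frequency of each value
names(counts)[counts == max(counts)]  # "5", the most frequent value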

The interquartile range is the best measure of variability for skewed distributions or data sets with outliers. Because it’s based on values that come from the middle half of the distribution, it’s unlikely to be influenced by outliers .

The two most common methods for calculating interquartile range are the exclusive and inclusive methods.

The exclusive method excludes the median when identifying Q1 and Q3, while the inclusive method includes the median as a value in the data set in identifying the quartiles.

For each of these methods, you’ll need different procedures for finding the median, Q1 and Q3 depending on whether your sample size is even- or odd-numbered. The exclusive method works best for even-numbered sample sizes, while the inclusive method is often used with odd-numbered sample sizes.
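
In R, IQR() and quantile() compute the interquartile range and the quartiles. quantile() implements nine different quartile conventions through its type argument; which type corresponds to the exclusive or inclusive method described here is worth confirming in ?quantile:

x <- c(1, 3, 5, 7, 9, 11, 13, 15)
IQR(x)                                      # Q3 - Q1 under R's default convention
quantile(x, probs = c(.25, .75), type = 6)  # quartiles under an alternative convention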

While the range gives you the spread of the whole data set, the interquartile range gives you the spread of the middle half of a data set.

Homoscedasticity, or homogeneity of variances, is an assumption of equal or similar variances in different groups being compared.

This is an important assumption of parametric statistical tests because they are sensitive to any dissimilarities. Uneven variances in samples result in biased and skewed test results.

Statistical tests such as variance tests or the analysis of variance (ANOVA) use sample variance to assess group differences of populations. They use the variances of the samples to assess whether the populations they come from significantly differ from each other.

Variance is the average squared deviations from the mean, while standard deviation is the square root of this number. Both measures reflect variability in a distribution, but their units differ:

  • Standard deviation is expressed in the same units as the original values (e.g., minutes or meters).
  • Variance is expressed in much larger units (e.g., meters squared).

Although the units of variance are harder to intuitively understand, variance is important in statistical tests .

The empirical rule, or the 68-95-99.7 rule, tells you where most of the values lie in a normal distribution :

  • Around 68% of values are within 1 standard deviation of the mean.
  • Around 95% of values are within 2 standard deviations of the mean.
  • Around 99.7% of values are within 3 standard deviations of the mean.

The empirical rule is a quick way to get an overview of your data and check for any outliers or extreme values that don’t follow this pattern.
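
You can verify the rule directly from the standard normal distribution in R:

pnorm(1) - pnorm(-1)  # roughly 0.683: within 1 standard deviation
pnorm(2) - pnorm(-2)  # roughly 0.954: within 2 standard deviations
pnorm(3) - pnorm(-3)  # roughly 0.997: within 3 standard deviations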

In a normal distribution , data are symmetrically distributed with no skew. Most values cluster around a central region, with values tapering off as they go further away from the center.

The measures of central tendency (mean, mode, and median) are exactly the same in a normal distribution.


The standard deviation is the average amount of variability in your data set. It tells you, on average, how far each score lies from the mean .

In normal distributions, a high standard deviation means that values are generally far from the mean, while a low standard deviation indicates that values are clustered close to the mean.

No. Because the range formula subtracts the lowest number from the highest number, the range is always zero or a positive number.

In statistics, the range is the spread of your data from the lowest to the highest value in the distribution. It is the simplest measure of variability .

While central tendency tells you where most of your data points lie, variability summarizes how far apart your data points lie from each other.

Data sets can have the same central tendency but different levels of variability or vice versa . Together, they give you a complete picture of your data.

Variability is most commonly measured with the following descriptive statistics :

  • Range : the difference between the highest and lowest values
  • Interquartile range : the range of the middle half of a distribution
  • Standard deviation : average distance from the mean
  • Variance : average of squared distances from the mean

Variability tells you how far apart points lie from each other and from the center of a distribution or a data set.

Variability is also referred to as spread, scatter or dispersion.

While interval and ratio data can both be categorized, ranked, and have equal spacing between adjacent values, only ratio scales have a true zero.

For example, temperature in Celsius or Fahrenheit is at an interval scale because zero is not the lowest possible temperature. In the Kelvin scale, a ratio scale, zero represents a total lack of thermal energy.

A critical value is the value of the test statistic which defines the upper and lower bounds of a confidence interval , or which defines the threshold of statistical significance in a statistical test. It describes how far from the mean of the distribution you have to go to cover a certain amount of the total variation in the data (i.e. 90%, 95%, 99%).

If you are constructing a 95% confidence interval and are using a threshold of statistical significance of p = 0.05, then your critical value will be identical in both cases.
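
In R, for example, critical values come from the quantile function of the relevant distribution; for a two-tailed test at α = .05:

qnorm(p = .975)        # 1.96: the z critical value
qt(p = .975, df = 20)  # roughly 2.09: the t critical value is wider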

The t -distribution gives more probability to observations in the tails of the distribution than the standard normal distribution (a.k.a. the z -distribution).

In this way, the t -distribution is more conservative than the standard normal distribution: to reach the same level of confidence or statistical significance , you will need to include a wider range of the data.

A t -score (a.k.a. a t -value) is equivalent to the number of standard deviations away from the mean of the t -distribution .

The t -score is the test statistic used in t -tests and regression tests. It can also be used to describe how far from the mean an observation is when the data follow a t -distribution.

The t-distribution is a way of describing a set of observations where most observations fall close to the mean, and the rest of the observations make up the tails on either side. It is similar in shape to a normal distribution but has heavier tails, and it is used for smaller sample sizes, where the variance in the data is unknown.

The t -distribution forms a bell curve when plotted on a graph. It can be described mathematically using the mean and the standard deviation .

In statistics, ordinal and nominal variables are both considered categorical variables .

Even though ordinal data can sometimes be numerical, not all mathematical operations can be performed on them.

Ordinal data has two characteristics:

  • The data can be classified into different categories within a variable.
  • The categories have a natural ranked order.

However, unlike with interval data, the distances between the categories are uneven or unknown.

Nominal and ordinal are two of the four levels of measurement . Nominal level data can only be classified, while ordinal level data can be classified and ordered.

Nominal data is data that can be labelled or classified into mutually exclusive categories within a variable. These categories cannot be ordered in a meaningful way.

For example, for the nominal variable of preferred mode of transportation, you may have the categories of car, bus, train, tram or bicycle.

If your confidence interval for a difference between groups includes zero, that means that if you run your experiment again you have a good chance of finding no difference between groups.

If your confidence interval for a correlation or regression includes zero, that means that if you run your experiment again there is a good chance of finding no correlation in your data.

In both of these cases, you will also find a high p -value when you run your statistical test, meaning that your results could have occurred under the null hypothesis of no relationship between variables or no difference between groups.

If you want to calculate a confidence interval around the mean of data that is not normally distributed , you have two choices:

  • Find a distribution that matches the shape of your data and use that distribution to calculate the confidence interval.
  • Perform a transformation on your data to make it fit a normal distribution, and then find the confidence interval for the transformed data.

The standard normal distribution , also called the z -distribution, is a special normal distribution where the mean is 0 and the standard deviation is 1.

Any normal distribution can be converted into the standard normal distribution by turning the individual values into z -scores. In a z -distribution, z -scores tell you how many standard deviations away from the mean each value lies.
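
In R, you can standardize values by hand or with the built-in scale() function; the numbers here are arbitrary:

x <- c(52, 60, 48, 55)
(x - mean(x)) / sd(x)  # z-scores computed by hand
scale(x)               # the same standardized values, returned as a matrix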

The z -score and t -score (aka z -value and t -value) show how many standard deviations away from the mean of the distribution you are, assuming your data follow a z -distribution or a t -distribution .

These scores are used in statistical tests to show how far from the mean of the predicted distribution your statistical estimate is. If your test produces a z -score of 2.5, this means that your estimate is 2.5 standard deviations from the predicted mean.

The predicted mean and distribution of your estimate are generated by the null hypothesis of the statistical test you are using. The more standard deviations away from the predicted mean your estimate is, the less likely it is that the estimate could have occurred under the null hypothesis .

To calculate the confidence interval , you need to know:

  • The point estimate you are constructing the confidence interval for
  • The critical values for the test statistic
  • The standard deviation of the sample
  • The sample size

Then you can plug these components into the confidence interval formula that corresponds to your data. The formula depends on the type of estimate (e.g. a mean or a proportion) and on the distribution of your data.

The confidence level is the percentage of times you expect to get close to the same estimate if you run your experiment again or resample the population in the same way.

The confidence interval consists of the upper and lower bounds of the estimate you expect to find at a given level of confidence.

For example, if you are estimating a 95% confidence interval around the mean proportion of female babies born every year based on a random sample of babies, you might find an upper bound of 0.56 and a lower bound of 0.48. These are the upper and lower bounds of the confidence interval. The confidence level is 95%.

The mean is the most frequently used measure of central tendency because it uses all values in the data set to give you an average.

For data from skewed distributions, the median is better than the mean because it isn’t influenced by extremely large values.

The mode is the only measure you can use for nominal or categorical data that can’t be ordered.

The measures of central tendency you can use depends on the level of measurement of your data.

  • For a nominal level, you can only use the mode to find the most frequent value.
  • For an ordinal level or ranked data, you can also use the median to find the value in the middle of your data set.
  • For interval or ratio levels, in addition to the mode and median, you can use the mean to find the average value.

Measures of central tendency help you find the middle, or the average, of a data set.

The 3 most common measures of central tendency are the mean, median and mode.

  • The mode is the most frequent value.
  • The median is the middle number in an ordered data set.
  • The mean is the sum of all values divided by the total number of values.

Some variables have fixed levels. For example, gender and ethnicity are always nominal level data because they cannot be ranked.

However, for other variables, you can choose the level of measurement . For example, income is a variable that can be recorded on an ordinal or a ratio scale:

  • At an ordinal level , you could create 5 income groupings and code the incomes that fall within them from 1–5.
  • At a ratio level , you would record exact numbers for income.

If you have a choice, the ratio level is always preferable because you can analyze data in more ways. The higher the level of measurement, the more precise your data is.

The level at which you measure a variable determines how you can analyze your data.

Depending on the level of measurement , you can perform different descriptive statistics to get an overall summary of your data and inferential statistics to see if your results support or refute your hypothesis .

Levels of measurement tell you how precisely variables are recorded. There are 4 levels of measurement, which can be ranked from low to high:

  • Nominal : the data can only be categorized.
  • Ordinal : the data can be categorized and ranked.
  • Interval : the data can be categorized and ranked, and evenly spaced.
  • Ratio : the data can be categorized, ranked, evenly spaced and has a natural zero.

No. The p -value only tells you how likely the data you have observed is to have occurred under the null hypothesis .

If the p -value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.

The alpha value, or the threshold for statistical significance , is arbitrary – which value you use depends on your field of study.

In most cases, researchers use an alpha of 0.05, which means that there is a less than 5% chance that the data being tested could have occurred under the null hypothesis.

P -values are usually automatically calculated by the program you use to perform your statistical test. They can also be estimated using p -value tables for the relevant test statistic .

P -values are calculated from the null distribution of the test statistic. They tell you how often a test statistic is expected to occur under the null hypothesis of the statistical test, based on where it falls in the null distribution.

If the test statistic is far from the mean of the null distribution, then the p -value will be small, showing that the test statistic is not likely to have occurred under the null hypothesis.

A p -value , or probability value, is a number describing how likely it is that your data would have occurred under the null hypothesis of your statistical test .

The test statistic you use will be determined by the statistical test.

You can choose the right statistical test by looking at what type of data you have collected and what type of relationship you want to test.

The test statistic will change based on the number of observations in your data, how variable your observations are, and how strong the underlying patterns in the data are.

For example, if one data set has higher variability while another has lower variability, the first data set will produce a test statistic closer to the null hypothesis , even if the true correlation between two variables is the same in either data set.

The formula for the test statistic depends on the statistical test being used.

Generally, the test statistic is calculated as the pattern in your data (i.e. the correlation between variables or difference between groups) divided by the variance in the data (i.e. the standard deviation ).

  • Univariate statistics summarize only one variable  at a time.
  • Bivariate statistics compare two variables .
  • Multivariate statistics compare more than two variables .

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

  • Distribution refers to the frequencies of different responses.
  • Measures of central tendency give you the average for each response.
  • Measures of variability show you the spread or dispersion of your dataset.

Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.

In statistics, model selection is a process researchers use to compare the relative value of different statistical models and determine which one is the best fit for the observed data.

The Akaike information criterion is one of the most common methods of model selection. AIC weights the ability of the model to predict the observed data against the number of parameters the model requires to reach that level of precision.

AIC model selection can help researchers find a model that explains the observed variation in their data while avoiding overfitting.

In statistics, a model is the collection of one or more independent variables and their predicted interactions that researchers use to try to explain variation in their dependent variable.

You can test a model using a statistical test . To compare how well different models fit your data, you can use Akaike’s information criterion for model selection.

The Akaike information criterion is calculated from the maximum log-likelihood of the model and the number of parameters (K) used to reach that likelihood. The AIC function is 2K – 2(log-likelihood) .

Lower AIC values indicate a better-fit model. By convention, a model whose AIC is more than 2 units lower than another’s (a delta-AIC greater than 2) is considered meaningfully better than the model it is being compared to.

The Akaike information criterion is a mathematical test used to evaluate how well a model fits the data it is meant to describe. It penalizes models which use more independent variables (parameters) as a way to avoid over-fitting.

AIC is most often used to compare the relative goodness-of-fit among different models under consideration and to then choose the model that best fits the data.
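
In R, the built-in AIC() function compares fitted models directly; this sketch uses the built-in mtcars data:

fit1 <- lm(mpg ~ wt, data = mtcars)       # one predictor
fit2 <- lm(mpg ~ wt + hp, data = mtcars)  # two predictors
AIC(fit1, fit2)                           # the lower value indicates the better fit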

A factorial ANOVA is any ANOVA that uses more than one categorical independent variable . A two-way ANOVA is a type of factorial ANOVA.

Some examples of factorial ANOVAs include:

  • Testing the combined effects of vaccination (vaccinated or not vaccinated) and health status (healthy or pre-existing condition) on the rate of flu infection in a population.
  • Testing the effects of marital status (married, single, divorced, widowed), job status (employed, self-employed, unemployed, retired), and family history (no family history, some family history) on the incidence of depression in a population.
  • Testing the effects of feed type (type A, B, or C) and barn crowding (not crowded, somewhat crowded, very crowded) on the final weight of chickens in a commercial farming operation.

In ANOVA, the null hypothesis is that there is no difference among group means. If any group differs significantly from the overall group mean, then the ANOVA will report a statistically significant result.

Significant differences among group means are calculated using the F statistic, which is the ratio of the mean sum of squares (the variance explained by the independent variable) to the mean square error (the variance left over).

If the F statistic is higher than the critical value (the value of F that corresponds with your alpha value, usually 0.05), then the difference among groups is deemed statistically significant.
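
As a quick sketch in R, a one-way ANOVA on the built-in PlantGrowth data reports exactly this F statistic and its p-value:

fit <- aov(weight ~ group, data = PlantGrowth)  # plant weight by treatment group
summary(fit)                                    # F value and Pr(>F) for the group effect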

The only difference between one-way and two-way ANOVA is the number of independent variables . A one-way ANOVA has one independent variable, while a two-way ANOVA has two.

  • One-way ANOVA : Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka) and race finish times in a marathon.
  • Two-way ANOVA : Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka), runner age group (junior, senior, master’s), and race finishing times in a marathon.

All ANOVAs are designed to test for differences among three or more groups. If you are only testing for a difference between two groups, use a t-test instead.

Multiple linear regression is a regression model that estimates the relationship between a quantitative dependent variable and two or more independent variables using a straight line.

Linear regression most often uses mean-square error (MSE) to calculate the error of the model. MSE is calculated by:

  • measuring the distance of the observed y-values from the predicted y-values at each value of x;
  • squaring each of these distances;
  • calculating the mean of the squared distances.

Linear regression fits a line to the data by finding the regression coefficient that results in the smallest MSE.
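
A minimal sketch in R, again using the built-in mtcars data:

fit <- lm(mpg ~ wt, data = mtcars)  # fit the least-squares line
mean(residuals(fit)^2)              # mean-square error of the fitted model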

Simple linear regression is a regression model that estimates the relationship between one independent variable and one dependent variable using a straight line. Both variables should be quantitative.

For example, the relationship between temperature and the expansion of mercury in a thermometer can be modeled using a straight line: as temperature increases, the mercury expands. This linear relationship is so certain that we can use mercury thermometers to measure temperature.

A regression model is a statistical model that estimates the relationship between one dependent variable and one or more independent variables using a line (or a plane in the case of two or more independent variables).

A regression model can be used when the dependent variable is quantitative, except in the case of logistic regression, where the dependent variable is binary.

A t-test should not be used to measure differences among more than two groups, because the error structure for a t-test will underestimate the actual error when many groups are being compared.

If you want to compare the means of several groups at once, it’s best to use another statistical test such as ANOVA or a post-hoc test.

A one-sample t-test is used to compare a single population to a standard value (for example, to determine whether the average lifespan of a specific town is different from the country average).

A paired t-test is used to compare a single population before and after some experimental intervention or at two different points in time (for example, measuring student performance on a test before and after being taught the material).

A t-test measures the difference in group means divided by the pooled standard error of the two group means.

In this way, it calculates a number (the t-value) illustrating the magnitude of the difference between the two group means being compared, and estimates the likelihood that this difference exists purely by chance (p-value).

Your choice of t-test depends on whether you are studying one group or two groups, and whether you care about the direction of the difference in group means.

If you are studying one group, use a paired t-test to compare the group mean over time or after an intervention, or use a one-sample t-test to compare the group mean to a standard value. If you are studying two groups, use a two-sample t-test .

If you want to know only whether a difference exists, use a two-tailed test. If you want to know if one group mean is greater or less than the other, use a right-tailed or left-tailed one-tailed test, respectively.
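
These choices map onto arguments of R’s t.test() function, sketched here with the built-in sleep data (extra hours of sleep under two drugs):

t.test(extra ~ group, data = sleep)                    # two-sample, two-tailed
with(sleep, t.test(extra[group == 1],
                   extra[group == 2], paired = TRUE))  # paired: same subjects measured twice
t.test(sleep$extra, mu = 0, alternative = "greater")   # one-sample, right-tailed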

A t-test is a statistical test that compares the means of two samples . It is used in hypothesis testing , with a null hypothesis that the difference in group means is zero and an alternate hypothesis that the difference in group means is different from zero.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test . Significance is usually denoted by a p -value , or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis .

When the p -value falls below the chosen alpha value, then we say the result of the test is statistically significant.

A test statistic is a number calculated by a  statistical test . It describes how far your observed data is from the  null hypothesis  of no relationship between  variables or no difference among sample groups.

The test statistic tells you how different two or more groups are from the overall population mean , or how different a linear slope is from the slope predicted by a null hypothesis . Different test statistics are used in different statistical tests.

Statistical tests commonly assume that:

  • the data are normally distributed
  • the groups that are being compared have similar variance
  • the data are independent

If your data does not meet these assumptions, you might still be able to use a nonparametric statistical test, which has fewer requirements but also makes weaker inferences.



What is Power in Statistics?

By Jim Frost

Power in statistics is the probability that a hypothesis test can detect an effect in a sample when it exists in the population . It is the sensitivity of a hypothesis test. When an effect exists in the population, how likely is the test to detect it in your sample?


High statistical power occurs when a hypothesis test is likely to find an effect that exists in the population. A low power test is unlikely to detect that effect.

For example, if statistical power is 80%, a hypothesis test has an 80% chance of detecting an effect that actually exists. Now imagine you’re performing a study that has only 10% power. That’s not good because the test is far more likely to miss the effect.

In this post, learn about statistical power, why it matters, how to increase it, and how to calculate it for a study.

Why Power in Statistics Matters

In all hypothesis tests, the researchers are testing an effect of some sort. It can be the effectiveness of a new medication, the strength of a new product, etc. There is a relationship or difference between groups that the researchers hope to identify. Learn more about Effects in Statistics .

Unfortunately, a hypothesis test can fail to detect an effect even when it does exist. This problem happens more frequently when the test has low statistical power.

Consequently, power is a crucial concept to understand before starting a study. Imagine the scenario where an effect exists in the population, but the test fails to detect it in the sample. Not only did the researchers waste their time and money on the project, but they’ve also failed to identify an effect that exists. Consequently, they’re missing out on the benefits the effect would have provided!

Clearly, researchers want an experimental design that produces high statistical power! Unfortunately, if the design is lacking, a study can be doomed to fail from the start.

Power matters in statistics because you don’t want to spend time and money on a project only to miss an effect that exists! It is vital to estimate the power of a statistical test before beginning a study to help ensure it has a reasonable chance of detecting an effect if one exists.

Statistical Power and Hypothesis Testing Errors

To better understand power in statistics, you first need to know why and how hypothesis tests can make incorrect decisions.

Related post : Overview of Hypothesis Testing

Why do hypothesis tests make errors?

Hypothesis tests use samples to draw conclusions about entire populations. Researchers use these tests because it’s rarely possible to measure a whole population. So, they’re stuck with samples.

Unfortunately, samples don’t always accurately reflect the population. Statisticians define sampling error as the difference between a sample and the target population. Occasionally, this error can be large enough to cause hypothesis tests to draw the wrong conclusions. Consequently, statistical power becomes a crucial issue because increasing it reduces the chance of errors. Learn more about Sampling Error: Definition, Sources & Minimizing .

How do they make errors?

Samples sometimes show effects that don’t exist in the population, or they don’t display effects that do exist. Hypothesis tests try to manage these errors, but they’re not perfect. Statisticians have devised clever names for these two types of errors— Type I and Type II errors!

  • Type I : The hypothesis test rejects a true null hypothesis (false positive).
  • Type II : The test fails to reject a false null hypothesis (false negative).

Power in statistics relates only to type II errors, the false negatives. The effect exists in the population, but the test doesn’t detect it in the sample. Hence, we won’t deal with Type I errors for the rest of this post. If you want to know more about both errors, read my post, Types of Errors in Hypothesis Testing .

The Type II error rate (known as beta or β) is the probability of a false negative for a hypothesis test. Furthermore, the complement of the Type II error rate is the probability of correctly detecting an effect (i.e., a true positive), which is the definition of statistical power. In mathematical terms, 1 – β = the statistical power.

For example, if the Type II error rate is 0.2, then statistical power is 1 – 0.2 = 0.8. It logically follows that a lower Type II error rate equates to higher power.

Analysts are typically more interested in estimating power than beta.

How to Increase Statistical Power

Now that you know why power in statistics is essential, how do you ensure that your hypothesis test has high power?

Let’s start by understanding the factors that affect power in statistics. The following conditions increase a hypothesis test’s ability to detect an effect:

  • Larger sample sizes.
  • Larger effect sizes.
  • Lower variability in the population.
  • Higher significance level (alpha) (e.g., 5% → 10%).

Of these factors, researchers typically have the most control over the sample size. Consequently, that’s your go-to method for increasing statistical power.

Effect sizes and variability are often inherent to the subject area you’re studying. Researchers have less control over them than the sample size. However, there might be some steps you can take to increase the effect size (e.g., larger treatments) or reduce the variability (e.g., tightly controlled lab conditions).

Do not choose a significance level to increase statistical power. Instead, set it based on your risk tolerance for a false positive. Usually, you’ll want to leave it at 5% unless you have a compelling reason to change it. To learn more, read my post about Understanding Significance Levels .
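To see the sample-size lever in action, here is a short sketch (assuming the Python statsmodels package) that computes the power of a two-sample t-test at a fixed medium effect size and significance level for several sample sizes:

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    for n in (10, 25, 50, 100, 200):
        power = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05)
        print(f"n = {n:>3} per group -> power = {power:.2f}")

    # Power climbs from roughly 0.2 at n = 10 toward 1.0 as n grows,
    # with diminishing returns for each additional subject.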

Power Analysis

Studies typically want at least 80% power, but sometimes they need even more. How do you plan for a study to have that much power from the start? Perform a power analysis before collecting data!

A statistical power analysis helps determine how large your sample must be to detect an effect. This process requires entering the following information into your statistical software:

  • Effect size estimate
  • Population variability estimate
  • Statistical power target
  • Significance level

Notice that the effect size and population variability values are estimates. Typically, you’ll produce these estimates through literature reviews and subject-area knowledge. The quality of your power analysis depends on having reasonable estimates!

After entering the required information, your statistical software displays the sample size necessary to achieve your target value for statistical power. I recommend using G*Power for this type of analysis. It’s free!

I’ve written an article about this process in more detail, complete with examples. How to Calculate Sample Size Needed for Power .
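If you would rather stay in code than use a graphical tool like G*Power, the statsmodels package can run the same kind of calculation. A hedged sketch for a two-sample t-test, assuming a medium effect size of 0.5 for illustration:

    from statsmodels.stats.power import TTestIndPower

    # Solve for the sample size per group that reaches 80% power,
    # assuming effect size d = 0.5 and a 5% significance level.
    n_per_group = TTestIndPower().solve_power(effect_size=0.5, power=0.80, alpha=0.05)
    print(f"Need about {n_per_group:.0f} subjects per group")   # about 64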

For readers who are up for a slightly more advanced topic, failing to detect an effect is not the only problem with low power studies. When such a study happens to have a significant result, it will report an exaggerated effect size! For more information, read Low Power Tests Exaggerate Effect Sizes .


Guide to Power Analysis and Statistical Power

Power analysis is a process that involves evaluating a test’s statistical power to determine the necessary sample size for a hypothesis test. Learn more.

Anmolika Singh

Statistical power and power analysis are essential tools for any researcher or data scientist . Statistical power measures the likelihood that a hypothesis test will detect a specific effect. Power analysis is a process that researchers use to determine the necessary sample size for a hypothesis test.

Power Analysis Definition

Power analysis is a statistical method that involves calculating the sample size required for a study to detect meaningful results. It ensures that a study is neither too small, which can result in false negatives, nor too large, which wastes resources.

The article explores the factors influencing power, such as sample size, effect size, significance level and data variability. We’ll also examine power analysis, a method ensuring studies have adequate sample sizes to detect meaningful effects. Conducting power analysis before data collection can prevent errors, allow you to allocate resources effectively and design ethically sound studies.

Understanding Statistical Power

Statistical power is a vital concept in hypothesis testing, a statistical method used to determine whether sample data support a specific claim against the null hypothesis. It measures the likelihood that a test will detect an effect if there truly is one. In other words, it shows how well the test can reject a false null hypothesis.

In a study, a Type I error occurs when a true null hypothesis is mistakenly rejected, leading to a false positive result. This means that the test indicates an effect or difference when none actually exists. Conversely, a Type II error happens when a false null hypothesis is not rejected, resulting in a false negative. This error means the test fails to detect an actual effect or difference, wrongly concluding that no effect exists. 

High statistical power means there’s a lower chance of making a Type II error, which happens when a test fails to spot a real effect. 

Several factors affect a study’s power, including:

  • Sample size: The number of observations or data points in a study.
  • Effect size: The magnitude of the difference or relationship being studied.
  • Significance level: The threshold probability for rejecting the null hypothesis, often set at 0.05.
  • Data variability: The extent to which data points diverge from each other.

Ensuring sufficient power in a study is important to correctly identify and reject a false null hypothesis, thereby recognizing genuine effects and not missing them.


What Is Power Analysis?

Power analysis is a process that helps researchers determine the necessary sample size to detect an effect of a given size with a specific level of confidence. This method involves calculating the test’s statistical power for different sample sizes and effect sizes. Researchers use power analysis to design studies that aren’t too small, which might miss significant effects, or too large, which might waste resources. This ensures that they have enough participants to detect meaningful effects while managing resources wisely.

Why Is Power Analysis Important?

Power analysis is essential because it makes sure a study has the right tools to find the effects it aims to uncover. If a study lacks sufficient power, it might miss important effects, leading to false negatives. On the other hand, an overpowered study could waste resources. 

By doing a power analysis before collecting data, researchers can figure out the right sample size, use resources efficiently and boost the reliability of their findings. This step is key to producing trustworthy results in any scientific research.

5 Components of Power Analysis

Several components are essential to power analysis:

  • Effect size: This quantifies the magnitude of the difference or relationship under scrutiny. Larger effect sizes facilitate detection and necessitate smaller sample sizes, whereas smaller effect sizes mandate larger samples.
  • Sample size: This denotes the number of participants or observations in the study. A greater number of participants enhances power by furnishing more data, thereby simplifying the detection of true effects.
  • Significance level (α): Typically established at 0.05, this represents the threshold for rejecting the null hypothesis. Reducing the significance level lessens the risk of Type I errors but entails a larger sample size to uphold power.
  • Power (1-β): This signifies the probability of correctly rejecting the null hypothesis when it is false, commonly sought to be 0.80 or higher. Insufficient power heightens the likelihood of overlooking true effects, resulting in false negatives.
  • Variability: This pertains to the extent of fluctuation in the data. Greater variability diminishes power, rendering the detection of true effects more challenging. Researchers can mitigate variability by utilizing precise measurement tools and managing extraneous variables.

Power Analysis Example

Imagine a team of researchers embarking on a clinical trial to assess a new drug’s efficacy in combating a specific disease. They theorize that the new medication will reduce symptoms by 30 percent compared to the standard treatment. However, before commencing the trial, they must determine the sample size required to detect this effect size with ample power.

Conducting a power analysis, the researchers factor in the anticipated effect size, the desired power level, typically 80 percent or higher, and the significance level, usually 0.05. They also account for the variability in treatment response and potential dropouts or losses to follow-up.

Based on the power analysis, the researchers deduce that they must enlist 100 patients in each group, treatment and control, to achieve 80 percent power. This requires recruiting a total of 200 patients for the trial.
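As a rough sketch of that calculation in code (assuming the statsmodels package, and assuming purely for illustration that the hypothesized 30 percent reduction corresponds to a standardized effect size of about 0.4):

    from statsmodels.stats.power import TTestIndPower

    # Assumed standardized effect size for the hypothesized symptom reduction.
    effect_size = 0.4
    n = TTestIndPower().solve_power(effect_size=effect_size, power=0.80, alpha=0.05)
    print(f"About {n:.0f} patients per group")   # about 100 per group, 200 total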

Benefits of Power Analysis

  • Optimal resource allocation: Power analysis ensures that studies utilize appropriate sample sizes, preventing wastage of resources on excessively large samples. This optimization allows for more efficient resource allocation, potentially enabling more studies to be conducted with the same resources.
  • Enhanced study validity: By diminishing the risk of Type II errors, power analysis bolsters the reliability and validity of study outcomes. This confidence in results and conclusions can lead to more impactful research outcomes.
  • Ethical research practices: Power analysis aids in designing ethically sound studies by avoiding unnecessary participant involvement. By determining the minimum sample size required to detect an effect, researchers can minimize participant exposure to experimental conditions without compromising the study’s validity.
  • Informed decision making: Power analysis equips researchers with data to substantiate sample sizes and study designs. This information enables researchers to make informed decisions regarding the feasibility and potential impact of their studies, leading to more successful research outcomes.


Applications of Power Analysis

  • Medical research: Power analysis is crucial in medical research for determining sample sizes in clinical trials. By ensuring studies have sufficient power, researchers can more accurately detect treatment effects and improve patient outcomes.
  • Psychology: In psychology, power analysis is utilized to design experiments that detect behavioral effects. By determining the necessary sample size, researchers can ensure studies are adequately powered to detect meaningful effects, leading to more robust conclusions.
  • Education: Power analysis is employed in education to evaluate the effectiveness of educational interventions. By determining the sample size necessary to detect a desired effect size, researchers can design studies that offer valuable insights into the impact of educational programs.
  • Business: In business, power analysis is used in market research to assess consumer preferences and behaviors. By determining the sample size required to detect differences in consumer behavior, businesses can make informed decisions about marketing strategies and product development.

Frequently Asked Questions

What comprises power analysis?

The primary components of power analysis encompass effect size, sample size, significance level (α), power (1-β) and variability. Effect size denotes the magnitude of the difference or relationship under scrutiny, while sample size represents the number of observations or participants. The significance level serves as the probability threshold for rejecting the null hypothesis, and power is the probability of correctly rejecting the null hypothesis when the alternative hypothesis holds true. Variability indicates the extent of variation in the data, which can impact the study’s power.

What insights does power analysis offer?

Power analysis provides the necessary sample size for detecting an effect of a specified magnitude with a particular level of confidence. It aids researchers in devising studies that possess adequate power to identify significant effects, thereby diminishing the risk of Type II errors and optimizing resource utilization. Additionally, power analysis enlightens researchers about the probability of detecting true effects in their studies, enriching the validity and dependability of their conclusions.


Power Analysis, Statistical Significance, & Effect Size

If you plan to use inferential statistics (e.g., t-tests, ANOVA, etc.) to analyze your evaluation results, you should first conduct a power analysis to determine what size sample you will need. This page describes what power is as well as what you will need to calculate it.

What is power?

To understand power, it is helpful to review what inferential statistics test. When you conduct an inferential statistical test, you are often comparing two hypotheses:

  • The null hypothesis – This hypothesis predicts that your program will not have an effect on your variable of interest. For example, if you are measuring students’ level of concern for the environment before and after a field trip, the null hypothesis is that their level of concern will remain the same.
  • The alternative hypothesis – This hypothesis predicts that you will find a difference between groups. Using the example above, the alternative hypothesis is that students’ post-trip level of concern for the environment will differ from their pre-trip level of concern.

Statistical tests look for evidence that you can reject the null hypothesis and conclude that your program had an effect. With any statistical test, however, there is always the possibility that you will find a difference between groups when one does not actually exist. This is called a Type I error. Likewise, it is possible that when a difference does exist, the test will not be able to identify it. This type of mistake is called a Type II error.

Power refers to the probability that your test will find a statistically significant difference when such a difference actually exists. In other words, power is the probability that you will reject the null hypothesis when you should (and thus avoid a Type II error). It is generally accepted that power should be .8 or greater; that is, you should have an 80% or greater chance of finding a statistically significant difference when there is one.

Increase your sample size to be on the safe side!

How do I use power calculations to determine my sample size?

Generally speaking, as your sample size increases, so does the power of your test. This should intuitively make sense as a larger sample means that you have collected more information -- which makes it easier to correctly reject the null hypothesis when you should.

To ensure that your sample size is big enough, you will need to conduct a power analysis calculation. Unfortunately, these calculations are not easy to do by hand, so unless you are a statistics whiz, you will want the help of a software program. Several software programs are available for free on the Internet and are described below.

For any power calculation, you will need to know:

  • What type of test you plan to use (e.g., independent t-test, paired t-test, ANOVA, regression, etc. See Step 6 if you are not familiar with these tests.),
  • The alpha value or significance level you are using (usually 0.01 or 0.05. See the next section of this page for more information.),
  • The expected effect size (See the last section of this page for more information.),
  • The sample size you are planning to use

When these values are entered, a power value between 0 and 1 will be generated. If the power is less than 0.8, you will need to increase your sample size.
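For instance, here is a minimal sketch of this check in Python (assuming the statsmodels package, an independent t-test, an expected effect size of 0.5, and a planned 40 participants per group):

    from statsmodels.stats.power import TTestIndPower

    power = TTestIndPower().power(effect_size=0.5, nobs1=40, alpha=0.05)
    print(f"power = {power:.2f}")     # about 0.60 for these inputs
    if power < 0.8:
        print("Underpowered: increase your sample size.")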

What is statistical significance?

There is always some likelihood that the changes you observe in your participants’ knowledge, attitudes, and behaviors are due to chance rather than to the program. Testing for statistical significance helps you learn how likely it is that these changes occurred randomly and do not represent differences due to the program.

To learn whether the difference is statistically significant, you will have to compare the probability number you get from your test (the p-value) to the critical probability value you determined ahead of time (the alpha level). If the p-value is less than the alpha value, you can conclude that the difference you observed is statistically significant.

P-value : the probability of obtaining results at least as extreme as those you observed if the null hypothesis were true (that is, if your program had no effect). P-values range from 0 to 1. The lower the p-value, the stronger the evidence that the difference resulted from your program rather than from chance.

Alpha (α) level : the Type I error rate that you are willing to accept. Alpha is often set at .05 or .01. An alpha of .05 means that you are willing to accept a 5% chance of concluding that your program had an effect when it actually did not.

What alpha value should I use to calculate power?

An alpha level of .05 is accepted in most social science fields as the threshold for statistical significance, and it is the most common alpha level used in EE evaluations.

The following resources provide more information on statistical significance:

Statistical Significance Creative Research Systems, (2000). Beginner This page provides an introduction to what statistical significance means in easy-to-understand language, including descriptions and examples of p-values and alpha values, and several common errors in statistical significance testing. Part 2 provides a more advanced discussion of the meaning of statistical significance numbers.

Statistical Significance Statpac, (2005). Beginner This page introduces statistical significance and explains the difference between one-tailed and two-tailed significance tests. The site also describes the procedure used to test for significance (including the p-value).

What is effect size?

When a difference is statistically significant, it does not necessarily mean that it is big, important, or helpful in decision-making. It simply means you can be confident that there is a difference. Let’s say, for example, that you evaluate the effect of an EE activity on student knowledge using pre and posttests. The mean score on the pretest was 83 out of 100 while the mean score on the posttest was 84. Although you find that the difference in scores is statistically significant (because of a large sample size), the difference is very slight, suggesting that the program did not lead to a meaningful increase in student knowledge.

To know if an observed difference is not only statistically significant but also important or meaningful, you will need to calculate its effect size. Rather than reporting the difference in terms of, for example, the number of points earned on a test or the number of pounds of recycling collected, effect size is standardized. In other words, all effect sizes are calculated on a common scale -- which allows you to compare the effectiveness of different programs on the same outcome.

How do I calculate effect size?

There are different ways to calculate effect size depending on the evaluation design you use. Generally, effect size is calculated by taking the difference between the two groups (e.g., the mean of treatment group minus the mean of the control group) and dividing it by the standard deviation of one of the groups. For example, in an evaluation with a treatment group and control group, effect size is the difference in means between the two groups divided by the standard deviation of the control group.

effect size = (mean of treatment group – mean of control group) / (standard deviation of control group)

To interpret the resulting number, most social scientists use this general guide developed by Cohen:

  • < 0.1 = trivial effect
  • 0.1 - 0.3 = small effect
  • 0.3 - 0.5 = moderate effect
  • > 0.5 = large effect
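Putting the formula and the guide together, here is a small sketch in Python (with invented data; the division uses the control group’s standard deviation, as described above):

    import statistics

    treatment = [84, 78, 90, 81, 87, 75, 88, 80]
    control   = [80, 74, 86, 78, 84, 72, 85, 77]

    # Difference in group means divided by the control group's standard deviation.
    effect_size = (statistics.mean(treatment) - statistics.mean(control)) / statistics.stdev(control)
    print(f"effect size = {effect_size:.2f}")   # about 0.65: a large effect by the guide above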

How do I estimate effect size for calculating power?

Because effect size can only be calculated after you collect data from program participants, you will have to use an estimate for the power analysis. Common practice is to use a value of 0.5 as it indicates a moderate to large difference.

For more information on effect size, see:

Effect Size Resources Coe, R. (2000). Curriculum, Evaluation, and Management Center Intermediate Advanced This page offers three useful resources on effect size: 1) a brief introduction to the concept, 2) a more thorough guide to effect size, which explains how to interpret effect sizes, discusses the relationship between significance and effect size, and discusses the factors that influence effect size, and 3) an effect size calculator with an accompanying user's guide.

Effect Size (ES) Becker, L. (2000). Intermediate Advanced This website provides an overview of what effect size is (including Cohen’s definition of effect size). It also discusses how to measure effect size for two independent groups, for two dependent groups, and when conducting Analysis of Variance. Several effect size calculators are also provided.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New Jersey: Lawrence Erlbaum.

Smith, M. (2004). Is it the sample size of the sample as a fraction of the population that matters? Journal of Statistics Education. 12:2. Retrieved September 14, 2006 from http://www.amstat.org/publications/jse/v12n2/smith.html

Patton, M. Q. (1990). Qualitative research and evaluation methods. London: Sage Publications.


Introduction to Power Analysis

This seminar treats power and the various factors that affect power on both a conceptual and a mechanical level. While we will not cover the formulas needed to actually run a power analysis, later on we will discuss some of the software packages that can be used to conduct power analyses.

OK, let’s start off with a basic definition of what power is.  Power is the probability of detecting an effect, given that the effect is really there.  In other words, it is the probability of rejecting the null hypothesis when it is in fact false.  For example, let’s say that we have a simple study with drug A and a placebo group, and that the drug truly is effective; the power is the probability of finding a difference between the two groups.  So, imagine that we had a power of .8 and that this simple study was conducted many times.  Having power of .8 means that 80% of the time, we would get a statistically significant difference between the drug A and placebo groups.  This also means that 20% of the times that we run this experiment, we will not obtain a statistically significant effect between the two groups, even though there really is an effect in reality.

There are several reasons why one might do a power analysis.  Perhaps the most common use is to determine the necessary number of subjects needed to detect an effect of a given size.  Note that trying to find the absolute, bare minimum number of subjects needed in the study is often not a good idea.  Additionally, power analysis can be used to determine power, given an effect size and the number of subjects available.  You might do this when you know, for example, that only 75 subjects are available (or that you only have the budget for 75 subjects), and you want to know if you will have enough power to justify actually doing the study.  In most cases, there is really no point to conducting a study that is seriously underpowered.  Besides the issue of the number of necessary subjects, there are other good reasons for doing a power analysis.  For example, a power analysis is often required as part of a grant proposal.  And finally, doing a power analysis is often just part of doing good research.  A power analysis is a good way of making sure that you have thought through every aspect of the study and the statistical analysis before you start collecting data.

Despite these advantages of power analyses, there are some limitations.  One limitation is that power analyses do not typically generalize very well.  If you change the methodology used to collect the data or change the statistical procedure used to analyze the data, you will most likely have to redo the power analysis.  In some cases, a power analysis might suggest a number of subjects that is inadequate for the statistical procedure.  For example, a power analysis might suggest that you need 30 subjects for your logistic regression, but logistic regression, like all maximum likelihood procedures, requires much larger sample sizes.  Perhaps the most important limitation is that a standard power analysis gives you a “best case scenario” estimate of the necessary number of subjects needed to detect the effect.  In most cases, this “best case scenario” is based on assumptions and educated guesses.  If any of these assumptions or guesses are incorrect, you may have less power than you need to detect the effect.  Finally, because power analyses are based on assumptions and educated guesses, you often get a range of the number of subjects needed, not a precise number.  For example, if you do not know what the standard deviation of your outcome measure will be, you guess at this value, run the power analysis and get X number of subjects.  Then you guess a slightly larger value, rerun the power analysis and get a slightly larger number of necessary subjects.  You repeat this process over the plausible range of values of the standard deviation, which gives you a range of the number of subjects that you will need.

After all of this discussion of power analyses and the necessary number of subjects, we need to stress that power is not the only consideration when determining the necessary sample size.  For example, different researchers might have different reasons for conducting a regression analysis.  One might want to see if the regression coefficient is different from zero, while the other wants to get a very precise estimate of the regression coefficient with a very small confidence interval around it.  This second purpose requires a larger sample size than does merely seeing if the regression coefficient is different from zero.  Another consideration when determining the necessary sample size is the assumptions of the statistical procedure that is going to be used.  The number of statistical tests that you intend to conduct will also influence your necessary sample size:  the more tests that you want to run, the more subjects that you will need.  You will also want to consider the representativeness of the sample, which, of course, influences the generalizability of the results.  Unless you have a really sophisticated sampling plan, the greater the desired generalizability, the larger the necessary sample size.  Finally, please note that most of what is in this presentation does not readily apply to people who are developing a sampling plan for a survey or psychometric analyses.

Definitions

Before we move on, let’s make sure we are all using the same definitions.  We have already defined power as the probability of detecting a “true” effect, when the effect exists.  Most recommendations for power fall between .8 and .9.  We have also been using the term “effect size”, and while intuitively it is an easy concept, there are lots of definitions and lots of formulas for calculating effect sizes.  For example, the current APA manual has a list of more than 15 effect sizes, and there are more than a few books mostly dedicated to the calculation of effect sizes in various situations.  For now, let’s stick with one of the simplest definitions, which is that an effect size is the difference of two group means divided by the pooled standard deviation.  Going back to our previous example, suppose the mean of the outcome variable for the drug A group was 10 and it was 5 for the placebo group.  If the pooled standard deviation was 2.5, we would have an effect size equal to (10-5)/2.5 = 2 (which is a large effect size).

We also need to think about “statistical significance” versus “clinical relevance”.  This issue comes up often when considering effect sizes. For example, for a given number of subjects, you might only need a small effect size to have a power of .9.  But that effect size might correspond to a difference between the drug and placebo groups that isn’t clinically meaningful, say reducing blood pressure by two points.  So even though you would have enough power, it still might not be worth doing the study, because the results would not be useful for clinicians.

There are a few other definitions that we will need later in this seminar.  A Type I error occurs when the null hypothesis is true (in other words, there really is no effect), but you reject the null hypothesis.  A Type II error occurs when the alternative hypothesis is correct, but you fail to reject the null hypothesis (in other words, there really is an effect, but you failed to detect it).  Alpha inflation refers to the increase in the nominal alpha level when the number of statistical tests conducted on a given data set is increased.

When discussing statistical power, we have four inter-related concepts: power, effect size, sample size and alpha.  These four things are related such that each is a function of the other three.  In other words, if three of these values are fixed, the fourth is completely determined (Cohen, 1988, page 14).  We mention this because, by increasing one, you can decrease (or increase) another.  For example, if you can increase your effect size, you will need fewer subjects, given the same power and alpha level.  Specifically, increasing the effect size, the sample size and/or alpha will increase your power.
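Power software exploits exactly this relationship: fix any three of the four quantities and it solves for the remaining one. A sketch with Python’s statsmodels (assumed installed), where the argument left unspecified is the one being solved for:

    from statsmodels.stats.power import TTestIndPower

    solver = TTestIndPower()

    # Fix effect size, alpha, and power -> solve for sample size per group.
    n = solver.solve_power(effect_size=0.5, alpha=0.05, power=0.80)

    # Fix effect size, sample size, and alpha -> solve for power.
    p = solver.solve_power(effect_size=0.5, nobs1=64, alpha=0.05)

    print(f"n per group = {n:.0f}; power at n = 64 is {p:.2f}")   # about 64 and 0.80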

While we are thinking about these related concepts and the effect of increasing things, let’s take a quick look at a standard power graph.  (This graph was made in SPSS Sample Power, and for this example, we’ve used .6 and .4 for our two proportion positive values.)

We like these kinds of graphs because they make clear the diminishing returns you get for adding more and more subjects.  For example, let’s say that we have only 10 subjects per group.  We can see that we have a power of about .15, which is really, really low.  We add 50 subjects per group, now we have a power of about .6, an increase of .45.  However, if we started with 100 subjects per group (power of about .8) and added 50 per group, we would have a power of .95, an increase of only .15.  So each additional subject gives you less additional power.  This curve also illustrates the “cost” of increasing your desired power from .8 to .9.

Knowing your research project

As we mentioned before, one of the big benefits of doing a power analysis is making sure that you have thought through every detail of your research project.

Now most researchers have thought through most, if not all, of the substantive issues involved in their research.  While this is absolutely necessary, it often is not sufficient.  Researchers also need to carefully consider all aspects of the experimental design, the variables involved, and the statistical analysis technique that will be used.  As you will see in the next sections of this presentation, a power analysis is the union of substantive knowledge (i.e., knowledge about the subject matter), experimental or quasi-experimental design issues, and statistical analysis.  Almost every aspect of the experimental design can affect power.  For example, the type of control group that is used or the number of time points that are collected will affect how much power you have.  So knowing about these issues and carefully considering your options is important.  There are plenty of excellent books that cover these issues in detail, including Shadish, Cook and Campbell (2002); Cook and Campbell (1979); Campbell and Stanley (1963); Brickman (2000a, 2000b); Campbell and Russo (2001); Webb, Campbell, Schwartz and Sechrest (2000); and Anderson (2001).

Also, you want to know as much as possible about the statistical technique that you are going to use.  If you learn that you need to use a binary logistic regression because your outcome variable is 0/1, don’t stop there; rather, get a sample data set (there are plenty of sample data sets on our web site) and try it out.  You may discover that the statistical package that you use doesn’t do the type of analysis that you need to do.  For example, if you are an SPSS user and you need to do a weighted multilevel logistic regression, you will quickly discover that SPSS doesn’t do that (as of version 25), and you will have to find (and probably learn) another statistical package that will do that analysis.  Maybe you want to learn another statistical package, or maybe that is beyond what you want to do for this project.  If you are writing a grant proposal, maybe you will want to include funds for purchasing the new software.  You will also want to learn what the assumptions are and what the “quirks” are with this particular type of analysis.  Remember that the number of necessary subjects given to you by a power analysis assumes that all of the assumptions of the analysis have been met, so knowing what those assumptions are is important in deciding if they are likely to be met or not.

The point of this section is to make clear that knowing your research project involves many things, and you may find that you need to do some research about experimental design or statistical techniques before you do your power analysis.

We want to emphasize that this is time and effort well spent.  We also want to remind you that for almost all researchers, this is a normal part of doing good research.  UCLA researchers are welcome and encouraged to come by walk-in consulting at this stage of the research process to discuss issues and ideas, check out books and try out software.

What you need to know to do a power analysis

In the previous section, we discussed in general terms what you need to know to do a power analysis.  In this section we will discuss some of the actual quantities that you need to know to do a power analysis for some simple statistics.  Although we understand very few researchers test their main hypothesis with a t-test or a chi-square test, our point here is only to give you a flavor of the types of things that you will need to know (or guess at) in order to be ready for a power analysis.

– For an independent samples t-test, you will need to know the population means of the two groups (or the difference between the means), and the population standard deviations of the two groups.  So, using our example of drug A and placebo, we would need to know the difference in the means of the two groups, as well as the standard deviation for each group (because the group means and standard deviations are the best estimate that we have of those population values).  Clearly, if we knew all of this, we wouldn’t need to conduct the study.  In reality, researchers make educated guesses at these values.  We always recommend that you use several different values, such as decreasing the difference in the means and increasing the standard deviations, so that you get a range of values for the number of necessary subjects.

In SPSS Sample Power, we would have a screen that looks like the one below, and we would fill in the necessary values.  As we can see, we would need a total of 70 subjects (35 per group) to have a power of .91 if we had a mean of 5 and a standard deviation of 2.5 in the drug A group, and a mean of 3 and a standard deviation of 2.5 in the placebo group.  If we decreased the difference in the means and increased the standard deviations such that for the drug A group, we had a mean of 4.5 and a standard deviation of 3, and for the placebo group a mean of 3.5 and a standard deviation of 3, we would need 190 subjects per group, or a total of 380 subjects, to have a power of .90.  In other words, seemingly small differences in means and standard deviations can have a huge effect on the number of subjects required.

[Image: SPSS Sample Power screen for the t-test example]
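If you do not have SPSS Sample Power, the same numbers can be reproduced (to within rounding) in Python with statsmodels, since the two scenarios above imply standardized effect sizes of (5 - 3)/2.5 = 0.8 and (4.5 - 3.5)/3 ≈ 0.33:

    from statsmodels.stats.power import TTestIndPower

    solver = TTestIndPower()
    n1 = solver.solve_power(effect_size=0.8, power=0.91, alpha=0.05)
    n2 = solver.solve_power(effect_size=1/3, power=0.90, alpha=0.05)
    print(f"About {n1:.0f} and {n2:.0f} subjects per group")   # about 35 and 190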

– For a correlation, you need to know/guess at the correlation in the population.  This is a good time to remember back to an early stats class where they emphasized that correlation is a large N procedure (Chen and Popovich, 2002).  If you guess that the population correlation is .6, a power analysis would suggest (with an alpha of .05 and for a power of .8) that you would need only 16 subjects.  There are several points to be made here.  First, common sense suggests that N = 16 is pretty low.  Second, a population correlation of .6 is pretty high, especially in the social sciences.  Third, the power analysis assumes that all of the assumptions of the correlation have been met.  For example, we are assuming that there is no restriction of range issue, which is common with Likert scales; the sample data for both variables are normally distributed; the relationship between the two variables is linear; and there are no serious outliers.  Also, whereas you might be able to say that the sample correlation does not equal zero, you likely will not have a very precise estimate of the population correlation coefficient.

[Image: SPSS Sample Power screen for the correlation example]
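As a rough cross-check in code, here is the standard Fisher z approximation for the correlation case. This is not necessarily the exact routine the seminar’s software used, so it lands slightly above the 16 quoted:

    from math import atanh, ceil
    from scipy.stats import norm

    r, alpha, power = 0.6, 0.05, 0.80
    z_alpha = norm.ppf(1 - alpha / 2)             # 1.96 for a two-tailed test
    z_beta = norm.ppf(power)                      # 0.84 for 80% power
    n = ((z_alpha + z_beta) / atanh(r)) ** 2 + 3  # Fisher z approximation
    print(ceil(n))                                # about 20 subjects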

– For a chi-square test, you will need to know the proportion positive for both populations (i.e., rows and columns).  Let’s assume that we will have a 2 x 2 chi-square, and let’s think of both variables as 0/1.  Let’s say that we wanted to know if there was a relationship between drug group (drug A/placebo) and improved health.  In SPSS Sample Power, you would see a screen like this.

[Image: SPSS Sample Power screen for the chi-square example]

In order to get the .60 and the .30, we would need to know (or guess at) the number of people whose health improved in both the drug A and placebo groups.

We would also need to know (or guess at) either the number of people whose health did not improve in those two groups, or the total number of people in each group.

                   Improved health (positive)   Not improved health   Row total
Drug A (positive)  33 (33/55 = .6)              22                    55
Placebo            17 (17/55 = .3)              38                    55
Column total       50                           60                    Grand total = 110
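In Python, a hedged sketch of the same setup converts the two proportion positive values (.60 and .30) into Cohen’s h and then solves for the per-group sample size (assuming statsmodels):

    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    h = proportion_effectsize(0.60, 0.30)   # Cohen's h for the two proportions
    n = NormalIndPower().solve_power(effect_size=h, power=0.80, alpha=0.05)
    print(f"h = {h:.2f}; about {n:.0f} subjects per group")   # about 42 per group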

– For an ordinary least squares regression, you would need to know things like the R² for the full and reduced model.  For a simple logistic regression analysis with only one continuous predictor variable, you would need to know the probability of a positive outcome (i.e., the probability that the outcome equals 1) at the mean of the predictor variable and the probability of a positive outcome at one standard deviation above the mean of the predictor variable.  Especially for the various types of logistic models (e.g., binary, ordinal and multinomial), you will need to think very carefully about your sample size, and information from a power analysis will only be part of your considerations.  For example, according to Long (1997, pages 53-54), 100 is a minimum sample size for logistic regression, and you want *at least* 10 observations per predictor.  This does not mean that if you have only one predictor you need only 10 observations.

Also, if you have categorical predictors, you may need to have more observations to avoid computational difficulties caused by empty cells or cells with few observations.  More observations are needed when the outcome variable is very lopsided; in other words, when there are very few 1s and lots of 0s, or vice versa.  These cautions emphasize the need to know your data set well, so that you know if your outcome variable is lopsided or if you are likely to have a problem with empty cells.

The point of this section is to give you a sense of the level of detail about your variables that you need to be able to estimate in order to do a power analysis. Also, when doing power analyses for regression models, power programs will start to ask for values that most researchers are not accustomed to providing.  Guessing at the mean and standard deviation of your response variable is one thing, but increments to R² are a metric in which few researchers are used to thinking.  In our next section we will discuss how you can guestimate these numbers.

Obtaining the necessary numbers to do a power analysis

There are at least three ways to guestimate the values that are needed to do a power analysis: a literature review, a pilot study and using Cohen’s recommendations.  We will review the pros and cons of each of these methods.  For this discussion, we will focus on finding the effect size, as that is often the most difficult number to obtain and often has the strongest impact on power.

Literature review: Sometimes you can find one or more published studies that are similar enough to yours that you can get an idea of the effect size.  If you can find several such studies, you might be able to use meta-analysis techniques to get a robust estimate of the effect size.  However, oftentimes there are no studies similar enough to your study to get a good estimate of the effect size.  Even if you can find such a study, the necessary effect sizes or other values are often not clearly stated in the article and need to be calculated (if they can be) based on the information provided.

Pilot studies:  There are lots of good reasons to do a pilot study prior to conducting the actual study.  From a power analysis perspective, a pilot study can give you a rough estimate of the effect size, as well as a rough estimate of the variability in your measures.  You can also get some idea about where missing data might occur, and as we will discuss later, how you handle missing data can greatly affect your power.  Other benefits of a pilot study include allowing you to identify coding problems, setting up the database, and inputting the data for a practice analysis.  This will allow you to determine if the data are input in the correct shape, etc.

Of course, there are some limitations to the information that you can get from a pilot study.  (Many of these limitations apply to small samples in general.)  First of all, when estimating effect sizes based on nonsignificant results, the effect size estimate will necessarily have an increased error; in other words, the standard error of the effect size estimate will be larger than when the result is significant. The effect size estimate that you obtain may be unduly influenced by some peculiarity of the small sample.  Also, you often cannot get a good idea of the degree of missingness and attrition that will be seen in the real study.  Despite these limitations, we strongly encourage researchers to conduct a pilot study.  The opportunity to identify and correct “bugs” before collecting the real data is often invaluable.  Also, because of the number of values that need to be guestimated in a power analysis, the precision of any one of these values is not that important.  If you can estimate the effect size to within 10% or 20% of the true value, that is probably sufficient for you to conduct a meaningful power analysis, and such fluctuations can be taken into account during the power analysis.

Cohen’s recommendations:  Jacob Cohen has many well-known publications regarding issues of power and power analyses, including some recommendations about effect sizes that you can use when doing your power analysis.  Many researchers (including Cohen) consider the use of such recommendations as a last resort, when a thorough literature review has failed to reveal any useful numbers and a pilot study is either not possible or not feasible.  From Cohen (1988, pages 24-27):

– Small effect:  1% of the variance; d = 0.25 (too small to detect other than statistically; lower limit of what is clinically relevant)

– Medium effect:  6% of the variance; d = 0.5 (apparent with careful observation)

– Large effect: at least 15% of the variance; d = 0.8 (apparent with a superficial glance; unlikely to be the focus of research because it is too obvious)

Lipsey and Wilson (1993) did a meta-analysis of 302 meta-analyses of over 10,000 studies and found that the average effect size was .5, adding support to Cohen’s recommendation that, as a last resort, guess that the effect size is .5 (cited in Bausell and Li, 2002).  Sedlmeier and Gigerenzer (1989) found that the average effect size for articles in The Journal of Abnormal Psychology was a medium effect.  According to Keppel and Wickens (2004), when you really have no idea what the effect size is, go with the smallest effect size of practical value.  In other words, you need to know how small of a difference is meaningful to you.  Keep in mind that research suggests that most researchers are overly optimistic about the effect sizes in their research, and that most research studies are underpowered (Keppel and Wickens, 2004; Tversky and Kahneman, 1971).  This is part of the reason why we stress that a power analysis gives you a lower limit to the number of necessary subjects.

Factors that affect power

From the preceding discussion, you might be starting to think that the number of subjects and the effect size are the most important factors, or even the only factors, that affect power.  Although effect size is often the largest contributor to power, saying it is the only important issue is far from the truth.  There are at least a dozen other factors that can influence the power of a study, and many of these factors should be considered not only from the perspective of doing a power analysis, but also as part of doing good research.  The first couple of factors that we will discuss are more “mechanical” ways of increasing power (e.g., alpha level, sample size and effect size). After that, the discussion will turn to more methodological issues that affect power.

1.  Alpha level:  One obvious way to increase your power is to increase your alpha (from .05 to say, .1).  Whereas this might be an advisable strategy when doing a pilot study, increasing your alpha usually is not a viable option.  We should point out here that many researchers are starting to prefer to use .01 as an alpha level instead of .05 as a crude attempt to assure results are clinically relevant; this alpha reduction reduces power.

1a.  One- versus two-tailed tests:  In some cases, you can test your hypothesis with a one-tailed test.  For example, if your hypothesis was that drug A is better than the placebo, then you could use a one-tailed test.  However, you would fail to detect a difference, even if it was a large difference, if the placebo was better than drug A.  The advantage of one-tailed tests is that they put all of your power “on one side” to test your hypothesis.  The disadvantage is that you cannot detect differences that are in the opposite direction of your hypothesis.  Moreover, many grant and journal reviewers frown on the use of one-tailed tests, believing it is a way to feign significance (Stratton and Neil, 2004).

2.  Sample size:  A second obvious way to increase power is simply to collect data on more subjects.  In some situations, though, the subjects are difficult to get or extremely costly to run.  For example, you may have access to only 20 autistic children or only have enough funding to interview 30 cancer survivors.  If possible, you might try increasing the number of subjects in groups that do not have these restrictions, for example, if you are comparing to a group of normal controls.  While it is true that, in general, it is often desirable to have roughly the same number of subjects in each group, this is not absolutely necessary.  However, you get diminishing returns for additional subjects in the control group:  adding an extra 100 subjects to the control group might not be much more helpful than adding 10 extra subjects to the control group.

3.  Effect size:  Another obvious way to increase your power is to increase the effect size.  Of course, this is often easier said than done. A common way of increasing the effect size is to increase the experimental manipulation.  Going back to our example of drug A and placebo, increasing the experimental manipulation might mean increasing the dose of the drug. While this might be a realistic option more often than increasing your alpha level, there are still plenty of times when you cannot do this.  Perhaps the human subjects committee will not allow it, it does not make sense clinically, or it doesn’t allow you to generalize your results the way you want to.  Many of the other issues discussed below indirectly increase effect size by providing a stronger research design or a more powerful statistical analysis.

4.  Experimental task:  Well, maybe you cannot increase the experimental manipulation, but perhaps you can change the experimental task, if there is one.  If a variety of tasks have been used in your research area, consider which of these tasks provides the most power (compared to other important issues, such as relevancy, participant discomfort, and the like).  However, if various tasks have not been reviewed in your field, designing a more sensitive task might be beyond the scope of your research project.

5.  Response variable:  How you measure your response variable(s) is just as important as what task you have the subject perform.  When thinking about power, you want to use a measure that is as high in sensitivity and low in measurement error as is possible.  Researchers in the social sciences often have a variety of measures from which they can choose, while researchers in other fields may not.  For example, there are numerous established measures of anxiety, IQ, attitudes, etc.  Even if there are not established measures, you still have some choice.  Do you want to use a Likert scale, and if so, how many points should it have?  Modifications to procedures can also help reduce measurement error.  For example, you want to make sure that each subject knows exactly what he or she is supposed to be rating.  Oral instructions need to be clear, and items on questionnaires need to be unambiguous to all respondents.  When possible, use direct instead of indirect measures.  For example, asking people what tax bracket they are in is a more direct way of determining their annual income than asking them about the square footage of their house.  Again, this point may be more applicable to those in the social sciences than those in other areas of research.  We should also note that minimizing the measurement error in your predictor variables will also help increase your power.

Just as an aside, most texts on experimental design strongly suggest collecting more than one measure of the response in which you are interested. While this is very good methodologically and provides marked benefits for certain analyses and missing data, it does complicate the power analysis.

6.  Experimental design:  Another thing to consider is that some types of experimental designs are more powerful than others.  For example, repeated measures designs are virtually always more powerful than designs in which you only get measurements at one time.  If you are already using a repeated measures design, increasing the number of time points a response variable is collected to at least four or five will also provide increased power over fewer data collections.  There is a point of diminishing return when a researcher collects too many time points, though this depends on many factors such as the response variable, statistical design, age of participants, etc.

7.  Groups:  Another point to consider is the number and types of groups that you are using.  Reducing the number of experimental conditions will reduce the number of subjects that is needed, or you can keep the same number of subjects and just have more per group.  When thinking about which groups to exclude from the design, you might want to leave out those in the middle and keep the groups with the more extreme manipulations.  Going back to our drug A example, let’s say that we were originally thinking about having a total of four groups: the first group will be our placebo group, the second group would get a small dose of drug A, the third group a medium dose, and the fourth group a large dose.  Clearly, much more power is needed to detect an effect between the medium and large dose groups than to detect an effect between the large dose group and the placebo group.  If we found that we were unable to increase the power enough such that we were likely to find an effect between small and medium dose groups or between the medium and the large dose groups, then it would probably make more sense to run the study without these groups.  In some cases, you may even be able to change your comparison group to something more extreme.  For example, we once had a client who was designing a study to compare people with clinical levels of anxiety to a group that had subclinical levels of anxiety.  However, while doing the power analysis and realizing how many subjects she would need to detect the effect, she found that she needed far fewer subjects if she compared the group with the clinical levels of anxiety to a group of “normal” people (a number of subjects she could reasonably obtain).

8.  Statistical procedure:  Changing the type of statistical analysis may also help increase power, especially when some of the assumptions of the test are violated.  For example, as Maxwell and Delaney (2004) noted, “Even when ANOVA is robust, it may not provide the most powerful test available when its assumptions have been violated.”  In particular, violations of the assumptions of independence, normality and homogeneity of variance can reduce power.  In such cases, nonparametric alternatives may be more powerful.
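As a rough illustration of this point (ours, not Maxwell and Delaney's), a small simulation can compare the power of the t test with its nonparametric counterpart when the data are heavy-tailed; every number below is an assumption made for the example.

```python
# Simulated power of the t test vs the Mann-Whitney U test under
# heavy-tailed data (t distribution with 3 df) with a true shift of 0.5.
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(42)
n, shift, reps = 30, 0.5, 2000
t_hits = u_hits = 0
for _ in range(reps):
    a = rng.standard_t(df=3, size=n)           # control group
    b = rng.standard_t(df=3, size=n) + shift   # treated group
    t_hits += ttest_ind(a, b).pvalue < 0.05
    u_hits += mannwhitneyu(a, b, alternative='two-sided').pvalue < 0.05
print(f"t test power ~ {t_hits / reps:.2f}")
print(f"Mann-Whitney power ~ {u_hits / reps:.2f}")
```

With heavy tails, the rank-based test typically comes out ahead, which is exactly the point of the quotation above.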

9.  Statistical model:  You can also modify the statistical model.  For example, interactions often require more power than main effects.  Hence, you might find that you have reasonable power for a main effects model, but not enough power when the model includes interactions.  Many (perhaps most?) power analysis programs do not have an option to include interaction terms when describing the proposed analysis, so you need to keep this in mind when using these programs to help you determine how many subjects will be needed.  When thinking about the statistical model, you might want to consider using covariates or blocking variables.  Ideally, both covariates and blocking variables reduce the variability in the response variable.  However, it can be challenging to find such variables.  Moreover, your statistical model should use as many of the response variable time points as possible when examining longitudinal data.  Using a change-score analysis when one has collected five time points makes little sense and ignores the added power from these additional time points.  The more the statistical model “knows” about how a person changes over time, the more variance that can be pulled out of the error term and ascribed to an effect.

9a. Correlation between time points:  Understanding the expected correlation between a response variable measured at one time in your study and the same response variable measured at another time can provide important, power-saving information.  As noted previously, when the statistical model has information about the manner in which people change over time, it can enhance the effect size estimate.  This is largely dependent on the correlation of the response measure over time.  For example, in a before-after data collection scenario, a response variable with a .00 correlation from before the treatment to after the treatment would provide no extra benefit to the statistical model, as we can’t better understand a subject’s score by knowing how he or she changes over time.  Rarely, however, do variables have a .00 correlation on the same outcomes measured at different times.  It is important to know that outcome variables with larger correlations over time provide enhanced power when used in a complementary statistical model.

10.  Modify response variable:  Besides modifying your statistical model, you might also try modifying your response variable.  Possible benefits of this strategy include reducing extreme scores and/or meeting the assumptions of the statistical procedure.  For example, some response variables might need to be log transformed.  However, you need to be careful here.  Transforming variables often makes the results more difficult to interpret, because now you are working in, say, a logarithm metric instead of the metric in which the variable was originally measured.  Moreover, if you use a transformation that adjusts the model too much, you can lose more power than is necessary.  Categorizing continuous response variables (sometimes used as a way of handling extreme scores) can also be problematic, because logistic or ordinal logistic regression often requires many more subjects than does OLS regression.  It makes sense that categorizing a response variable will lead to a loss of power, as information is being “thrown away.”

11.  Purpose of the study:  Different researchers have different reasons for conducting research.  Some are trying to determine if a coefficient (such as a regression coefficient) is different from zero.  Others are trying to get a precise estimate of a coefficient.  Still others are replicating research that has already been done.  The purpose of the research can affect the necessary sample size.  Going back to our drug A and placebo study, let’s suppose our purpose is to test the difference in means to see if it equals zero.   In this case, we need a relatively small sample size.  If our purpose is to get a precise estimate of the means (i.e., minimizing the standard errors), then we will need a larger sample size.  If our purpose is to replicate previous research, then again we will need a relatively large sample size.  Tversky and Kahneman (1971) pointed out that we often need more subjects in a replication study than were in the original study.  They also noted that researchers are often too optimistic about how much power they really have.  They claim that researchers too readily assign “causal” reasons to explain differences between studies, instead of sampling error. They also mentioned that researchers tend to underestimate the impact of sampling and think that results will replicate more often than is the case.

12.  Missing data:  A final point that we would like to make here regards missing data.  Almost all researchers have issues with missing data.  When designing your study and selecting your measures, you want to do everything possible to minimize missing data.  Handling missing data via imputation methods can be very tricky and very time-consuming.  If the data set is small, the situation can be even more difficult.  In general, missing data reduces power; poor imputation methods can greatly reduce power.  If you have to impute, you want to have as few missing data points on as few variables as possible.  When designing the study, you might want to collect data specifically for use in an imputation model (which usually involves a different set of variables than the model used to test your hypothesis).  It is also important to note that the default technique for handling missing data by virtually every statistical program is to remove the entire case from an analysis (i.e., listwise deletion).  This process is undertaken even if the analysis involves 20 variables and a subject is missing only one datum of the 20.  Listwise deletion is one of the biggest contributors to loss of power, both because of the omnipresence of missing data and because of the omnipresence of this default setting in statistical programs (Graham et al., 2003).

This ends the section on the various factors that can influence power.  We know that was a lot, and we understand that much of this can be frustrating because there is very little that is “black and white”.  We hope that this section made clear the close relationship between the experimental design, the statistical analysis and power.

Cautions about small sample sizes and sampling variation

We want to take a moment here to mention some issues that frequently arise when using small samples.  (We aren’t going to put a lower limit on what we mean by “small sample size.”)  While there are situations in which a researcher can either only get or afford a small number of subjects, in most cases, the researcher has some choice in how many subjects to include.  Considerations of time and effort argue for running as few subjects as possible, but there are some difficulties associated with small sample sizes, and these may outweigh any gains from the saving of time, effort or both.  One obvious problem with small sample sizes is that they have low power.  This means that you need to have a large effect size to detect anything.  You will also have fewer options with respect to appropriate statistical procedures, as many common procedures, such as correlations, logistic regression and multilevel modeling, are not appropriate with small sample sizes.  It may also be more difficult to evaluate the assumptions of the statistical procedure that is used (especially assumptions like normality).  In most cases, the statistical model must be smaller when the data set is small.  Interaction terms, which often test interesting hypotheses, are frequently the first casualties.  Generalizability of the results may also be compromised, and it can be difficult to argue that a small sample is representative of a large and varied population.  Missing data are also more problematic; fewer imputation methods are available to you, and those that remain (such as mean imputation) are not considered desirable.  Finally, with a small sample size, alpha inflation issues can be more difficult to address, and you are more likely to run as many tests as you have subjects.

While the issue of sampling variability is relevant to all research, it is especially relevant to studies with small sample sizes.  To quote Murphy and Myors (2004, page 59), “The lack of attention to power analysis (and the deplorable habit of placing too much weight on the results of small sample studies) are well documented in the literature, and there is no good excuse to ignore power in designing studies.”  In an early article entitled Belief in the Law of Small Numbers , Tversky and Kahneman (1971) stated that many researchers act as if the Law of Large Numbers applies to small numbers as well.  People often believe that small samples are more representative of the population than they really are.

The last two points to be made here are that there is usually no point in conducting an underpowered study, and that underpowered studies can cause chaos in the literature, because studies that are similar methodologically may report conflicting results.

We will briefly discuss some of the programs that you can use to assist you with your power analysis.  Most programs are fairly easy to use, but you still need to know effect sizes, means, standard deviations, etc.

Among the programs specifically designed for power analysis, we use SPSS Sample Power, PASS and G*Power.  These programs have a friendly point-and-click interface and will do power analyses for things like correlations, OLS regression and logistic regression.  We have also started using Optimal Design for repeated measures, longitudinal and multilevel designs.  We should note that Sample Power is a stand-alone program that is sold by SPSS; it is not part of SPSS Base or an add-on module.  PASS can be purchased directly from NCSS at http://www.ncss.com/index.htm .  G*Power and Optimal Design (please see http://sitemaker.umich.edu/group-based/home for details) are free.

Several general use stat packages also have procedures for calculating power.  SAS has proc power , which has a lot of features and is pretty nice.  Stata has the sampsi command, as well as many user-written commands, including fpower , powerreg and aipe (written by our IDRE statistical consultants).  Statistica has an add-on module for power analysis.  There are also many programs online that are free.

For more advanced/complicated analyses, Mplus is a good choice.  It will allow you to do Monte Carlo simulations, and there are some examples at http://www.statmodel.com/power.shtml and http://www.statmodel.com/ugexcerpts.shtml .

Most of the programs that we have mentioned do roughly the same things, so when selecting a power analysis program, the real issue is your comfort; all of the programs require you to provide the same kind of information.

Multiplicity

This issue of multiplicity arises when a researcher has more than one outcome of interest in a given study.  While it is often good methodological practice to have more than one measure of the response variable of interest, additional response variables mean more statistical tests need to be conducted on the data set, and this leads to the question of experimentwise alpha control.  Returning to our example of drug A and placebo, if we have only one response variable, then only one t test is needed to test our hypothesis.  However, if we have three measures of our response variable, we would want to do three t tests, hoping that each would show results in the same direction.  The question is how to control the Type I error (AKA false alarm) rate.  Most researchers are familiar with the Bonferroni correction, which calls for dividing the prespecified alpha level (usually .05) by the number of tests to be conducted.  In our example, we would have .05/3 = .0167.  Hence, .0167 would be our new critical alpha level, and statistics with a p-value greater than .0167 would be classified as not statistically significant.  It is well known that the Bonferroni correction is very conservative; there are other ways of adjusting the alpha level.
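For concreteness, here is the Bonferroni arithmetic done in code; the three p-values are invented for the example.

```python
# Bonferroni correction for three t tests on three response measures.
from statsmodels.stats.multitest import multipletests

pvals = [0.010, 0.020, 0.400]                    # hypothetical p-values
reject, p_adj, _, alpha_bonf = multipletests(pvals, alpha=0.05,
                                             method='bonferroni')
print(f"critical alpha = {alpha_bonf:.4f}")      # .05 / 3 = .0167
print(reject)                                    # [ True False False ]
```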

Afterthoughts:  A post-hoc power analysis

In general, just say “No!” to post-hoc analyses.  There are many reasons, both mechanical and theoretical, why most researchers should not do post-hoc power analyses.  Excellent summaries can be found in Hoenig and Heisey (2001) The Abuse of Power:  The Pervasive Fallacy of Power Calculations for Data Analysis and Levine and Ensom (2001) Post Hoc Power Analysis:  An Idea Whose Time Has Passed? .  As Hoenig and Heisey show, power is mathematically directly related to the p-value; hence, calculating power once you know the p-value associated with a statistic adds no new information.  Furthermore, as Levine and Ensom clearly explain, the logic underlying post-hoc power analysis is fundamentally flawed.

However, there are some things that you should look at after your study is completed.  Have a look at the means and standard deviations of your variables and see how close they are (or are not) to the values that you used in the power analysis.  Many researchers do a series of related studies, and this information can aid in making decisions in future research.  For example, if you find that your outcome variable had a standard deviation of 7, and in your power analysis you were guessing it would have a standard deviation of 2, you may want to consider using a different measure that has less variance in your next study.

The point here is that in addition to answering your research question(s), your current research project can also assist with your next power analysis.

Conclusions

Conducting research is kind of like buying a car.  While buying a car isn’t the biggest purchase that you will make in your life, few of us enter into the process lightly.  Rather, we consider a variety of things, such as need and cost, before making a purchase.  You would do your research before you went and bought a car, because once you drove the car off the dealer’s lot, there is nothing you can do about it if you realize this isn’t the car that you need.  Choosing the type of analysis is like choosing which kind of car to buy.  The number of subjects is like your budget, and the model is like your expenses.  You would never go buy a car without first having some idea about what the payments will be.  This is like doing a power analysis to determine approximately how many subjects will be needed.  Imagine signing the papers for your new Maserati only to find that the payments will be twice your monthly take-home pay.  This is like wanting to do a multilevel model with a binary outcome, 10 predictors and lots of cross-level interactions and realizing that you can’t do this with only 50 subjects.  You don’t have enough “currency” to run that kind of model.  You need to find a model that is “more in your price range.”  If you had $530 a month budgeted for your new car, you probably wouldn’t want exactly $530 in monthly payments. Rather you would want some “wiggle-room” in case something cost a little more than anticipated or you were running a little short on money that month. Likewise, if your power analysis says you need about 300 subjects, you wouldn’t want to collect data on exactly 300 subjects.  You would want to collect data on 300 subjects plus a few, just to give yourself some “wiggle-room” just in case.

Don’t be afraid of what you don’t know.  Get in there and try it BEFORE you collect your data.  Correcting things is easy at this stage; after you collect your data, all you can do is damage control.  If you are in a hurry to get a project done, perhaps the worst thing that you can do is start collecting data now and worry about the rest later.  The project will take much longer if you do this than if you do what we are suggesting and do the power analysis and other planning steps.  If you have everything all planned out, things will go much smoother and you will have fewer and/or less intense panic attacks.  Of course, something unexpected will always happen, but it is unlikely to be as big of a problem.  UCLA researchers are always welcome and strongly encouraged to come into our walk-in consulting and discuss their research before they begin the project.

Power analysis = planning.  You will want to plan not only for the test of your main hypothesis, but also for follow-up tests and tests of secondary hypotheses.  You will want to make sure that “confirmation” checks will run as planned (for example, checking to see that interrater reliability was acceptable).  If you intend to use imputation methods to address missing data issues, you will need to become familiar with the issues surrounding the particular procedure as well as including any additional variables in your data collection procedures.  Part of your planning should also include a list of the statistical tests that you intend to run and consideration of any procedure to address alpha inflation issues that might be necessary.

The number output by any power analysis program is often more a starting point for thought than a final answer to the question of how many subjects will be needed.  As we have seen, you also need to consider the purpose of the study (coefficient different from 0, precise point estimate, replication), the type of statistical test that will be used (t test versus maximum likelihood technique), the total number of statistical tests that will be performed on the data set, generalizability from the sample to the population, and probably several other things as well.

The take-home message from this seminar is “do your research before you do your research.”

Anderson, N. H.  (2001).  Empirical Direction in Design and Analysis.  Mahwah, New Jersey:  Lawrence Erlbaum Associates.

Bausell, R. B. and Li, Y.  (2002).  Power Analysis for Experimental Research:  A Practical Guide for the Biological, Medical and Social Sciences.  Cambridge University Press, New York, New York.

Bickman, L., Editor.  (2000).  Research Design:  Donald Campbell’s Legacy, Volume 2.  Thousand Oaks, CA:  Sage Publications.

Bickman, L., Editor.  (2000).  Validity and Social Experimentation. Thousand Oaks, CA:  Sage Publications.

Campbell, D. T. and Russo, M. J.  (2001).  Social Measurement. Thousand Oaks, CA:  Sage Publications.

Campbell, D. T. and Stanley, J. C.  (1963).  Experimental and Quasi-experimental Designs for Research.  Reprinted from Handbook of Research on Teaching .  Palo Alto, CA:  Houghton Mifflin Co.

Chen, P. and Popovich, P. M.  (2002).  Correlation: Parametric and Nonparametric Measures.  Thousand Oaks, CA:  Sage Publications.

Cohen, J. (1988).  Statistical Power Analysis for the Behavioral Sciences, Second Edition.  Hillsdale, New Jersey:  Lawrence Erlbaum Associates.

Cook, T. D. and Campbell, D. T.  (1979).  Quasi-experimentation:  Design and Analysis Issues for Field Settings.  Palo Alto, CA:  Houghton Mifflin Co.

Graham, J. W., Cumsille, P. E., and Elek-Fisk, E. (2003). Methods for handling missing data. In J. A. Schinka and W. F. Velicer (Eds.), Handbook of psychology (Vol. 2, pp. 87-114). New York: Wiley.

Green, S. B.  (1991).  How many subjects does it take to do a regression analysis?  Multivariate Behavioral Research, 26(3), 499-510.

Hoenig, J. M. and Heisey, D. M.  (2001).  The Abuse of Power:  The Pervasive Fallacy of Power Calculations for Data Analysis.  The American Statistician, 55(1), 19-24.

Kelley, K. and Maxwell, S. E.  (2003).  Sample size for multiple regression:  Obtaining regression coefficients that are accurate, not simply significant.  Psychological Methods, 8(3), 305-321.

Keppel, G. and Wickens, T. D. (2004).  Design and Analysis:  A Researcher’s Handbook, Fourth Edition.  Pearson Prentice Hall:  Upper Saddle River, New Jersey.

Kline, R. B.  (2004).  Beyond Significance Testing:  Reforming Data Analysis Methods in Behavioral Research.  American Psychological Association:  Washington, D.C.

Levine, M. and Ensom, M. H. H.  (2001).  Post Hoc Power Analysis:  An Idea Whose Time Has Passed?  Pharmacotherapy, 21(4), 405-409.

Lipsey, M. W. and Wilson, D. B.  (1993).  The Efficacy of Psychological, Educational, and Behavioral Treatment:  Confirmation from Meta-analysis.  American Psychologist, 48(12), 1181-1209.

Long, J. S. (1997).  Regression Models for Categorical and Limited Dependent Variables.  Thousand Oaks, CA:  Sage Publications.

Maxwell, S. E.  (2000).  Sample size and multiple regression analysis.  Psychological Methods, 5(4), 434-458.

Maxwell, S. E. and Delaney, H. D.  (2004).  Designing Experiments and Analyzing Data:  A Model Comparison Perspective, Second Edition.  Lawrence Erlbaum Associates, Mahwah, New Jersey.

Murphy, K. R. and Myors, B.  (2004).  Statistical Power Analysis: A Simple and General Model for Traditional and Modern Hypothesis Tests. Mahwah, New Jersey:  Lawrence Erlbaum Associates.

Publication Manual of the American Psychological Association, Fifth Edition. (2001).  Washington, D.C.:  American Psychological Association.

Sedlmeier, P. and Gigerenzer, G.  (1989).  Do Studies of Statistical Power Have an Effect on the Power of Studies?  Psychological Bulletin, 105(2), 309-316.

Shadish, W. R., Cook, T. D. and Campbell, D. T.  (2002). Experimental and Quasi-experimental Designs for Generalized Causal Inference. Boston:  Houghton Mifflin Co.

Stratton, I. M. and Neil, A.  (2004).  How to ensure your paper is rejected by the statistical reviewer.  Diabetic Medicine, 22, 371-373.

Tversky, A. and Kahneman, D.  (1971).  Belief in the Law of Small Numbers.  Psychological Bulletin, 76(2), 105-110.

Webb, E., Campbell, D. T., Schwartz, R. D., and Sechrest, L.  (2000). Unobtrusive Measures, Revised Edition.  Thousand Oaks, CA:  Sage Publications.



Power calculation

Steve Roberts, Centre for Biostatistics.

What are power and sample size calculations?

Choosing an appropriate size for an experimental study is one major component in the design of any research study. The study needs to be large enough to have an acceptable chance of answering the research question, but not larger than necessary.

For studies with explicit pre-specified hypotheses it is in principle possible to estimate the probability that a study of a given size will answer the question - the power of the study - and many reviewers, funders and ethics committees ask for such calculations.

Whilst power calculations can be helpful, they are not a panacea. The parameters required to estimate study power are rarely known with great precision. Any study design is a compromise between gaining information and practical constraints of time, availability of participants or samples and funding.

Power calculations become part of the iterative dialogue that leads to the eventual compromise study design, and consideration of the uncertainty of the assumptions and their impact contributes to understanding the robustness of the design. There is no right size for any given experiment.

Power calculations are often not appropriate – there may not be a simple hypothesis (e.g. in a prevalence study or a pilot study), or for early stage work there may be no information on which to base a power calculation.

Once you have agreed your compromise you need to communicate your reasoning and justify your choice and power calculations are part (but only part) of this justification.

Graph: power versus sample size – two groups, unpaired t-test
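A minimal sketch of how such a graph can be produced in Python; the effect sizes 0.2, 0.5 and 0.8 are the conventional small/medium/large benchmarks, not values taken from the slides.

```python
# Power versus sample size for an unpaired t-test at several effect sizes.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.power import TTestIndPower

TTestIndPower().plot_power(dep_var='nobs',
                           nobs=np.arange(5, 150),
                           effect_size=np.array([0.2, 0.5, 0.8]),
                           alpha=0.05,
                           title='Power vs sample size: unpaired t-test')
plt.show()
```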

  • PS: Power and Sample Size Calculation - Vanderbilt University
  • Stats Direct
  • NQuery Advisor
  • Power and Sample Size Programs - University of California

Your friendly local statistician is the person to talk to. For complex designs you need to find a good statistical collaborator who understands your particular application.

Download PDF slides of the presentation ' What is power calculation? '

Statistical Power: What It Is and How It Is Used in Practice

Statistical power is a measure of study efficiency. It is calculated before conducting the study to estimate the chance of discovering a true effect rather than obtaining a false negative result or, worse, overestimating the effect by mistaking noise in the data for a real signal.

Here are 5 seemingly different, but actually similar, ways of describing statistical power:

Definition #1: Statistical power is the probability of detecting an effect, assuming that one exists. The effect can be, for example, an association between 2 variables of interest.

Definition #2: Statistical power is the probability of achieving statistical significance (typically a p-value < 0.05), when testing a real phenomenon or effect.

Definition #3: Statistical power is the probability of rejecting the null hypothesis H0, assuming that the alternative hypothesis HA is true.

Definition #4: Statistical power is the probability of making the correct conclusion of rejecting a false null hypothesis.

Definition #5: Statistical power is the probability of avoiding a type II error. A type II error happens if we fail to reject the null hypothesis when we actually should.
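All five definitions describe the same long-run frequency, which a short simulation makes concrete; the 0.5 SD effect and the group size of 64 are assumptions chosen to land near 80% power.

```python
# Estimate power by simulation: the fraction of experiments with p < .05
# when the alternative hypothesis is true (a real 0.5 SD mean difference).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
reps, n, effect = 5000, 64, 0.5
hits = sum(
    ttest_ind(rng.normal(0.0, 1.0, n),
              rng.normal(effect, 1.0, n)).pvalue < 0.05
    for _ in range(reps)
)
print(f"Estimated power: {hits / reps:.2f}")   # close to 0.80
```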

Factors that influence statistical power

The factors that affect the power of a study are:

  • The sample size: A larger sample reduces the noise in the data, and therefore increases the chance of detecting an effect, assuming that one exists — so increasing the sample size will increase statistical power.
  • The effect size: The larger the effect size (for instance, the difference between 2 treatments A and B), the easier it would be to detect it — so increasing the effect size (e.g. by increasing the dose of the treatment) will increase statistical power.
  • The level of statistical significance α: The level of statistical significance is generally set at 0.05. The lower the level of statistical significance, the harder it would be to detect an effect — so choosing a higher level of statistical significance will increase statistical power.

[Figure: an example of the information needed to calculate the statistical power]
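The same three levers can also be pulled numerically; this sketch uses made-up baseline values to show each factor's effect on power.

```python
# How sample size, effect size and alpha each move statistical power.
from statsmodels.stats.power import TTestIndPower

p = TTestIndPower()
base = dict(effect_size=0.5, nobs1=50, alpha=0.05)
print(f"baseline:      {p.power(**base):.2f}")
print(f"larger sample: {p.power(**{**base, 'nobs1': 100}):.2f}")
print(f"larger effect: {p.power(**{**base, 'effect_size': 0.8}):.2f}")
print(f"higher alpha:  {p.power(**{**base, 'alpha': 0.10}):.2f}")
```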

How statistical power is used in research

In the previous section, we established that there is a relationship between power, sample size, effect size, and the level of statistical significance.

Any of these quantities can be calculated if we know the values of the other 3. But, since the level of statistical significance is usually fixed and not calculated, we are left with 3 types of analyses that we can perform:

  • A-priori power analysis: Which consists of calculating the minimum sample size required before conducting the study.
  • Sensitivity analysis: Which consists of calculating the smallest detectable effect size after conducting the study.
  • Post-hoc power analysis: Which consists of calculating the statistical power after conducting the study.

Note that these calculations can be quite complicated to do by hand, therefore, I recommend using G*Power, R, or other statistical software to perform these analyses.

1. A-priori power analysis

An a-priori power analysis is done before conducting the study, with the objective of calculating the minimum sample size needed to detect an effect with a certain specified power.

In order to perform an a-priori analysis, we need to specify:

  • The statistical power: A fixed number that we aim to achieve (most researchers consider 80% a good threshold to aim for).
  • The level of statistical significance: The alpha level that we will use in the study (usually set at 0.05).
  • The effect size: A guess of what the true effect size is in the population of interest.

In practice, guessing the true effect size in the population before conducting the study is quite difficult, since if we already know the size of the effect, we wouldn’t be doing the study in the first place.

Common solutions to this problem are:

Solution #1: Perform a pilot study in order to determine the effect size.

The problem with this solution is that the estimate of the effect size will be highly unstable, since the sample size in the pilot study is very small by definition.

Solution #2: Use the smallest effect size instead of the true effect size.

Specifying the smallest effect size that actually matters will be easier than guessing the true effect size in the population. Using this approach, the sample size calculated in the a-priori analysis is said to be “the minimum sample size needed to detect the smallest relevant effect with an 80% power”.
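Here is a minimal sketch of Solution #2 in Python rather than G*Power; the smallest relevant effect (d = 0.5) is an assumption for the example.

```python
# A-priori sample size for a two-sample t-test: leave the sample size
# unspecified and solve_power returns the required number per group.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5,
                                          alpha=0.05, power=0.80)
print(f"Minimum sample size per group: {round(n_per_group)}")   # about 64
```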

2. Sensitivity analysis

A sensitivity analysis is done after the study is finished, with the objective of calculating the smallest effect that the study can detect given a certain specified power.

In order to perform a sensitivity analysis, we need to specify:

  • The level of statistical significance: The alpha level used in the study (usually set at 0.05).
  • The sample size: The number of participants included in the study.
  • The statistical power: The power the study is required to have had (often the same 80% aimed for at the design stage).
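In code, a sensitivity analysis solves the same power equation for the effect size instead; the sample of 40 per group is hypothetical.

```python
# Sensitivity analysis: the smallest detectable effect for the achieved
# sample, at 80% power. effect_size is left unspecified and solved for.
from statsmodels.stats.power import TTestIndPower

d_min = TTestIndPower().solve_power(nobs1=40, alpha=0.05, power=0.80)
print(f"Smallest detectable effect: d = {d_min:.2f}")   # roughly 0.6
```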

3. Post-hoc power analysis

A post-hoc power analysis is done after the study is finished, with the objective of calculating the statistical power in order to determine the chance that a study had to detect a true effect.

In order to perform a post-hoc analysis, we need to specify:

  • The effect size: As reported in the study.
  • The level of statistical significance: The alpha level chosen in the study (usually set at 0.05).
  • The sample size: The number of participants actually included in the study.
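Mechanically, this is the same equation solved for power; the inputs below are hypothetical, and the caveats about post-hoc power raised earlier in this document still apply.

```python
# Post-hoc (observed) power for a finished study.
from statsmodels.stats.power import TTestIndPower

observed_power = TTestIndPower().power(effect_size=0.45,  # d reported in the study (hypothetical)
                                       nobs1=30,          # participants per group (hypothetical)
                                       alpha=0.05)
print(f"Post-hoc power: {observed_power:.2f}")
```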

Consequences of low statistical power

A low-power study has a high probability of missing out on a scientific discovery, and therefore will have a hard time being approved by funding agencies.

But the problem with a low-power study is more serious than having a high risk of missing out on detecting a true effect. A low-power study also has a high risk of publishing noise — an overestimation of the true effect. In other words, its results will be biased and unlikely to replicate.




Statistical Power – A Complete Guide

Published by Carmen Troy at September 20th, 2021 , Revised On August 3, 2023

Definition 

In research, a researcher sometimes might or might not find something ‘statistically significant’ in the data that has been collected. Statistical power is the probability that a study will find a statistically significant effect in the data, given that the effect truly exists in the research population.

In other words, statistical power reflects how likely it is that the results of a study/experiment can be explained by a real effect rather than by chance alone. The statistical power of a study is also referred to as its sensitivity in some cases.

Explaining Statistical Power

Statistical power is denoted as 1 – β . 

Here, β is the probability of a type II error: failing to detect an effect that really exists. What this means is that if the power is high, the probability of claiming that there is no effect when there actually is one becomes low. Contrary to this, when the power is low, the researcher is likely to miss statistical significance that genuinely exists in the data; when the power is high, the researcher will usually succeed in detecting it.

Importance of Statistical Power

Firstly, knowing the statistical power helps researchers decide whether to go back to the sample, re-evaluate, re-analyze, and so on. Second, statistical power analysis may be used to estimate the minimum sample size required for an experiment, research study, etc.

Standard Statistical Power

Researchers generally claim statistical power ought to be equal to or greater than .8. In other words, one should have an 80% (or greater) chance of detecting a statistically significant effect in a sample when there is one.

This should not be confused with the confidence level, which expresses the degree of confidence a researcher has that a result did not occur by chance alone. A confidence level commonly used in statistical tests is 95%, which corresponds to a significance level (alpha) of 5%.

Factors Influencing Statistical Power

Four main factors might influence statistical power. They are:

  • Sample size – has a direct relationship with statistical power; the bigger the sample size, the higher the power and vice versa (given other parameters are kept constant).
  • Data collection method – certain data collection methods, such as stratified sampling, cause low error variance and, consequently, increase power. To increase the power even further, a different design can be adopted. Bear in mind, though, that statistical power calculation, as well as its analysis, depends on the kind of experimental design used.
  • The difference between group means – has a direct relationship with power; the larger the difference, the higher the power and vice versa. 
  • Variability among subjects – has an inverse relationship with power; the larger the variance, the smaller the power.



How to Increase Statistical Power

To increase statistical power in a research/experiment, 5 different factors can be altered, and they are: 

  • Increasing alpha (α): The standard value used in most research is .05; raising it increases power. However, alpha should not be increased so much that a true null hypothesis is too easily rejected; that would be a type I error. 
  • Selecting a larger value for differences: This increases power because it is easier to identify larger differences in population means.
  • Decreasing random error: There are two ways to decrease random error. Firstly, a researcher can try to prevent it in the first place. Or, if it has occurred, account for it through a control variable so that it becomes explained variation. 
  • Secondly, a researcher can use some kind of repeated-measures design. Since there are multiple measurements on each subject, error variance can be separated from variance between subjects. 
  • Increase sample size: This is a very practical and commonly used method to increase statistical power. A larger sample provides more information about the population, and that, in turn, increases the power. 
  • Using a directional/one-tailed hypothesis: This kind of hypothesis has more power to detect an effect in the hypothesized direction. If the effect runs in the opposite direction, though, a directional hypothesis will decrease power. 

Statistical Power and P-value

With regards to whether they are the same or not: the significance level (the threshold applied to the p-value) is the probability of rejecting a null hypothesis that is actually true, while power is the probability of rejecting a null hypothesis that is actually false. Both, then, concern the probability of rejecting a null hypothesis. 

Whether that hypothesis is false or true is what differentiates the two (among other factors). In the same context, just as an 80% chance or greater is considered ‘good’ statistical power, a p-value of 5% or lower is considered statistically significant.

What ‘Isn’t’ Statistically Significant

If there is such a thing as statistical power and it corresponds to statistical significance , is there such a thing as statistically NOT significant? As it turns out, yes, there is. It describes results where a difference as large as (or larger than) the observed difference would be expected to occur by chance more than 1 time out of 20. In other words, the p-value is > 0.05 for the data to be statistically not significant.

Calculating Statistical Power

Calculating statistical power is a complex method and is usually done using computers and other online calculators. The kind of method or tool to use is determined by the type of tests, sample, data collection method and so on.

PowerAndSampleSize provides at least 19 interactive calculators for statistical power calculations. There are also other online platforms offering tools to calculate statistical power. For instance, on StatPages , numerous types of tools are offered to determine power, depending on sample size, type of test, and many other factors.

Almost every researcher and/or statistician can make use of these online tools to calculate statistical power by inputting different parameters into the calculators. The entire process is called power analysis . And even for power analysis, different software is used, the most common ones being SAS and PASS .

Furthermore, calculating sample size with statistical power analysis is also a part of this process.  

What does underpowered statistics mean?

While reading about statistical power, you may come across the term ‘underpowered statistics’. The term is mainly used for samples in research. An ‘underpowered’ study is one that lacks a sufficiently large sample size; it is not large enough to answer the research question(s) at hand. Contrarily, an ‘overpowered’ research study is one with a sample size so large that more resources are consumed than the question requires.

How do type I and type II errors fit in with statistical power?

In simple terms, a low statistical power implies a significant chance of committing a Type II error, that is, of failing to reject a false null hypothesis and thereby missing a real effect (a false negative). On the other hand, a high statistical power implies a small chance of committing a Type II error.

What are the four main elements of statistical power?

The four elements of statistical power are the same as the four ‘factors’ influencing statistical power. And they are sample size, the method used to collect data, the difference between group means, and variability among subjects. The first three have a direct relationship with statistical power; one increases and so does the other. The last factor, variability among subjects, has an inverse relationship with statistical power.


Statistical power and underpowered statistics

We’ve seen that it’s possible to miss a real effect simply by not taking enough data. In most cases, this is a problem: we might miss a viable medicine or fail to notice an important side-effect. How do we know how much data to collect?

Statisticians provide the answer in the form of “statistical power.” The power of a study is the likelihood that it will distinguish an effect of a certain size from pure luck. A study might easily detect a huge benefit from a medication, but detecting a subtle difference is much less likely. Let’s try a simple example.

Suppose a gambler is convinced that an opponent has an unfair coin: rather than getting heads half the time and tails half the time, the proportion is different, and the opponent is using this to cheat at incredibly boring coin-flipping games. How to prove it?

You can’t just flip the coin a hundred times and count the heads. Even with a perfectly fair coin, you don’t always get fifty heads:

[Figure: distribution of the number of heads in 100 flips of a fair coin]

This shows the likelihood of getting different numbers of heads, if you flip a coin a hundred times.

You can see that 50 heads is the most likely option, but it’s also reasonably likely to get 45 or 57. So if you get 57 heads, the coin might be rigged, but you might just be lucky.

Let’s work out the math. Let’s say we look for a p value of 0.05 or less, as scientists typically do. That is, if I count up the number of heads after 10 or 100 trials and find a deviation from what I’d expect – half heads, half tails – I call the coin unfair if there’s only a 5% chance of getting a deviation that size or larger with a fair coin. Otherwise, I can conclude nothing: the coin may be fair, or it may be only a little unfair. I can’t tell.

So, what happens if I flip a coin ten times and apply these criteria?

[Figure: power curve for 10 coin flips]

This is called a power curve. Along the horizontal axis, we have the different possibilities for the coin’s true probability of getting heads, corresponding to different levels of unfairness. On the vertical axis is the probability that I will conclude the coin is rigged after ten tosses, based on the p value of the result.

You can see that if the coin is rigged to give heads 60% of the time, and I flip the coin 10 times, I only have a 20% chance of concluding that it’s rigged. There’s just too little data to separate rigging from random variation. The coin would have to be incredibly biased for me to always notice.

But what if I flip the coin 100 times?

[Figure: power curve for 100 coin flips]

Or 1,000 times?

[Figure: power curve for 1,000 coin flips]

With one thousand flips, I can easily tell if the coin is rigged to give heads 60% of the time. It’s just overwhelmingly unlikely that I could flip a fair coin 1,000 times and get more than 600 heads.
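These power figures can be checked directly from the binomial distribution. The sketch below uses a conservative two-sided rejection rule, so, because the binomial is discrete, its numbers will not match the book's curves exactly.

```python
# Exact power of a two-sided fair-coin test when the coin really lands
# heads 60% of the time.
from scipy.stats import binom

def coin_power(n, p_true=0.6, alpha=0.05):
    # Rejection region under the fair-coin null: counts at or beyond the
    # alpha/2 tails of Binomial(n, 0.5).
    lo = binom.ppf(alpha / 2, n, 0.5)        # reject if heads <= lo - 1
    hi = binom.ppf(1 - alpha / 2, n, 0.5)    # reject if heads >  hi
    return binom.cdf(lo - 1, n, p_true) + binom.sf(hi, n, p_true)

for n in (10, 100, 1000):
    print(f"{n:>4} flips: power = {coin_power(n):.2f}")
```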

The power of being underpowered

After hearing all this, you might think calculations of statistical power are essential to medical trials. A scientist might want to know how many patients are needed to test if a new medication improves survival by more than 10%, and a quick calculation of statistical power would provide the answer. Scientists are usually satisfied when the statistical power is 0.8 or higher, corresponding to an 80% chance of concluding there’s a real effect.

However, few scientists ever perform this calculation, and few journal articles ever mention the statistical power of their tests.

Consider a trial testing two different treatments for the same condition. You might want to know which medicine is safer, but unfortunately, side effects are rare. You can test each medicine on a hundred patients, but only a few in each group suffer serious side effects.

Obviously, you won’t have terribly much data to compare side effect rates. If four people have serious side effects in one group, and three in the other, you can’t tell if that’s the medication’s fault.

Unfortunately, many trials conclude with “There was no statistically significant difference in adverse effects between groups” without noting that there was insufficient data to detect any but the largest differences. 57 And so doctors erroneously think the medications are equally safe, when one could well be much more dangerous than the other.

You might think this is only a problem when the medication only has a weak effect. But no: in one sample of studies published between 1975 and 1990 in prestigious medical journals, 27% of randomized controlled trials gave negative results, but 64% of these didn’t collect enough data to detect a 50% difference in primary outcome between treatment groups. Fifty percent! Even if one medication decreases symptoms by 50% more than the other medication, there’s insufficient data to conclude it’s more effective. And 84% of the negative trials didn’t have the power to detect a 25% difference. 17 , 4 , 11 , 16

In neuroscience the problem is even worse. Suppose we aggregate the data collected by numerous neuroscience papers investigating one particular effect and arrive at a strong estimate of the effect’s size. The median study has only a 20% chance of being able to detect that effect. Only after many studies were aggregated could the effect be discerned. Similar problems arise in neuroscience studies using animal models – which raises a significant ethical concern. If each individual study is underpowered, the true effect will only likely be discovered after many studies using many animals have been completed and analyzed, using far more animal subjects than if the study had been done properly the first time. 12

That’s not to say scientists are lying when they state they detected no significant difference between groups. You’re just misleading yourself when you assume this means there is no real difference. There may be a difference, but the study was too small to notice it.

Let’s consider an example we see every day.

The wrong turn on red

In the 1970s, many parts of the United States began to allow drivers to turn right at a red light. For many years prior, road designers and civil engineers argued that allowing right turns on a red light would be a safety hazard, causing many additional crashes and pedestrian deaths. But the 1973 oil crisis and its fallout spurred politicians to consider allowing right turn on red to save fuel wasted by commuters waiting at red lights.

Several studies were conducted to consider the safety impact of the change. For example, a consultant for the Virginia Department of Highways and Transportation conducted a before-and-after study of twenty intersections which began to allow right turns on red. Before the change there were 308 accidents at the intersections; after, there were 337 in a similar length of time. However, this difference was not statistically significant, and so the consultant concluded there was no safety impact.

Several subsequent studies had similar findings: small increases in the number of crashes, but not enough data to conclude these increases were significant. As one report concluded,

There is no reason to suspect that pedestrian accidents involving RT operations (right turns) have increased after the adoption of [right turn on red]…

Based on this data, more cities and states began to allow right turns at red lights. The problem, of course, is that these studies were underpowered. More pedestrians were being run over and more cars were involved in collisions, but nobody collected enough data to show this conclusively until several years later, when studies arrived clearly showing the results: significant increases in collisions and pedestrian accidents (sometimes up to 100% increases). 27 , 48 The misinterpretation of underpowered studies cost lives.



[Types of studies, power of study and choice of test]


In scientific research different study designs are used to provide answers to different questions (e.g., What is the cause of a disease? Is medication A better than medication B? What is the prevalence of a disease?). A cross-sectional study is appropriate when we are interested in the prevalence of a disease or a certain feature, a cohort study provides an answer about incidence or relative risk, whereas a case-control study is conducted to obtain an odds ratio. While these studies are observational, the randomized controlled trial is the only experimental study design which can answer questions of causative relationships between variables, since the researcher himself or herself assigns subjects to different groups (experimental or control). Power of a study represents the probability of finding a difference that exists in a population. It depends on the chosen level of significance, the difference that we look for (effect size), the variability of the measured variables, and the sample size. Most often sample size is the only element under the researcher's direct control. Therefore, methods have been developed to approximate the necessary sample size to obtain the desired power. Choosing an appropriate statistical procedure is a complicated task that demands specific knowledge and training. To determine an appropriate statistical procedure, it is important to keep in mind the following characteristics of the study: (1) purpose of the study (description, investigation of differences or associations); (2) type of sample (dependent or independent samples); (3) number of groups (one, two, or more); (4) level of measurement (nominal, ordinal, interval or ratio); and (5) distribution of the data (normal or not).



  • Election 2024
  • Entertainment
  • Newsletters
  • Photography
  • AP Buyline Personal Finance
  • AP Buyline Shopping
  • Press Releases
  • Israel-Hamas War
  • Russia-Ukraine War
  • Global elections
  • Asia Pacific
  • Latin America
  • Middle East
  • Delegate Tracker
  • AP & Elections
  • 2024 Paris Olympic Games
  • Auto Racing
  • Movie reviews
  • Book reviews
  • Financial Markets
  • Business Highlights
  • Financial wellness
  • Artificial Intelligence
  • Social Media

US, India, Russia, Japan are building out wind power much too slowly for climate change, report says

Image

FILE - Windmills are seen at Ilocos Norte province, northern Philippines on Monday, May 6, 2024. (AP Photo/Aaron Favila)

  • Copy Link copied

The world is falling well short of a promise made at global climate talks last year to triple the amount of wind power, according to a report by an energy think tank released Thursday.

Last December, countries at the U.N. COP28 climate conference committed to tripling all renewable electricity by 2030. Wind power specifically must triple to achieve that, according to the International Energy Agency and others.

Examining national targets set by 70 countries that account for 99% of existing wind power, Ember, an energy nonprofit based in London, projects that over the next six years, wind power will double, not triple, compared to the 2022 baseline.

The report looked at wind turbines both onshore and offshore.

“Governments are lacking ambition on wind, and especially onshore wind,” said Katye Altieri, electricity analyst at Ember. “Wind is not getting enough attention.”

Wind often blows hardest when the sun is not high in the sky, making it a good complement to solar energy in efforts to make clean electricity 24 hours per day.

The report also measured countries’ progress towards their own goals. The U.S. ranked worst on this, falling 100 gigawatts short, or enough to power more than 30 million homes. The target used for the U.S. comes from the National Renewable Energy Laboratory, part of the Energy Department. In an email exchange, the department declined to comment.

Image

The second-biggest gap between national targets and wind projects in development was in India, at more than 30 gigawatts. Despite having considerable wind potential , only 4% of electricity in India comes from wind, said Altieri. Several officials at India’s energy ministry did not reply to emailed requests for comment.

Ranking best by this measure were Brazil and Finland which are on track to exceed their wind targets by 15 and 11 gigawatts respectively. They were among just 10 countries due to surpass their goals. Seven of the 10 were in Europe, including Turkey.

Brian O’Callaghan, a lead researcher at the Smith School of Enterprise and the Environment at the University of Oxford, England, pointed out technology is key. Wind is stronger at altitude, so taller turbines can produce more electricity.

The past two decades have seen “dramatic technological improvements leading to taller turbines, especially offshore,” he said.

That means there is a big opportunity for countries willing to grasp it.

Wind speed also matters. Doubling wind speed results in an eightfold increase in power.

“Most coastal nations have barely tapped into their offshore wind resources,” he said. “The UK is a prime example.”

Some countries have strong wind but have barely begun building out wind turbines. Altieri pointed to Russia, Japan and South Korea in this category.

Russia has among the greatest wind potential of any country, according to the NREL, but Ember said it generated less than 1% of its electricity from wind in 2023. John Reilly of the Massachusetts Institute of Technology, who’s studied energy policy and climate change for 45 years, said Russia is not committed to reducing greenhouse gas emissions.

“It has huge amounts of natural gas and coal, so there’s no real economic incentive for them to develop wind,” he said.

Russia’s Energy Ministry did not respond to emails seeking comment.

Like many islands, Japan is also very windy, but generates just over 1% of its power from wind, Altieri said.

“The ocean is very deep just a little bit offshore of Japan, so that makes it more difficult,” said Reilly. The country is also steeply mountainous, making it hard to place turbines, he said.

Heavy regulation in South Korea makes it difficult to build wind turbines and public opinion had slowed development further, he said. Worldwide, there has often been resistance to wind turbines.

Japan’s Trade and Economy Ministry did not respond to email requests for comment. South Korea’s Energy Agency could not be reached for comment.

More broadly, the tumbling price of solar power may help explain the comparative lack of interest in wind, said Reilly.

“When many of these big commitments were made,” he said, “wind looked like the cheapest renewable energy source.”

But since 2020 the price of solar has fallen dramatically, he said.

While some countries are lagging behind, the study’s lead author Altieri says there’s reasons for encouragement.

“Europe is doing great,” she said, and that’s with the North Sea, an incredible wind resource, barely tapped.

She predicted Europe and China will continue to be dominant in the expansion of electricity made from the wind.

———————————

The Associated Press’ climate and environmental coverage receives financial support from multiple private foundations. AP is solely responsible for all content. Find AP’s standards for working with philanthropies, a list of supporters and funded coverage areas at AP.org .

J Hum Reprod Sci. 2012 Jan-Apr;5(1).

This article has been retracted.

Sample size estimation and power analysis for clinical research studies

Department of Biostatistics, National Institute of Animal Nutrition and Physiology, Bangalore, India

S Chandrashekara

1 Department of Immunology and Rheumatology, ChanRe Rheumatology and Immunology Center and Research, Bangalore, India

Determining the optimal sample size for a study ensures adequate power to detect statistical significance; it is therefore a critical step in designing a planned research protocol. Using too many participants is expensive and exposes more subjects than necessary to the study procedures, while an underpowered study will be statistically inconclusive and may render the whole protocol a failure. This paper covers the essentials of calculating power and sample size for a variety of applied study designs. Sample size computations are presented in detail for a single group mean, survey-type studies, two-group studies based on means or on proportions and rates, correlation studies, and case-control studies assessing a categorical outcome.

INTRODUCTION

Clinical research studies can be classified into surveys, experiments, observational studies, and so on. They need to be carefully planned to achieve their objectives. Planning good research has many aspects. The first step is to define the problem, which should be operational. The second step is to define the experimental or observational units and the appropriate subjects and controls. The inclusion and exclusion criteria must be defined meticulously, taking account of all possible variables that could influence the observations and the units being measured. The study design must be clear, and the procedures should follow the best available methodology.

Based on these factors, the study must have an adequate sample size relative to its goals and to the expected variability. The sample must be big enough that an effect of the magnitude expected to be scientifically significant is also statistically significant. At the same time, the sample should not be so big that an effect of little scientific importance is nevertheless statistically detectable. Sample size also matters for economic reasons: an undersized study can be a waste of resources, since it may not produce useful results, while an oversized study uses more resources than necessary. In experiments involving human or animal subjects, sample size is a critical ethical issue, since an ill-designed experiment exposes subjects to potentially harmful treatments without advancing knowledge.[1,2] Thus, a fundamental step in the design of clinical research is the computation of power and sample size.

Power is the probability of correctly rejecting the null hypothesis that sample estimates (e.g., mean, proportion, odds, correlation coefficient) do not differ between study groups in the underlying population. Large values of power, at least 80%, are desirable given the available resources and ethical considerations. Power increases as the sample size increases; accordingly, an investigator can control the study power by adjusting the sample size, and vice versa.[3,4]

A clinical study's results will be expressed as an estimate of effect, an appropriate confidence interval, and a P value. The confidence interval indicates the likely range of values for the true effect in the population, while the P value indicates how likely it is that the observed effect in the sample is due to chance. A related quantity is the statistical power: the probability of identifying a difference between two groups in the study samples when one genuinely exists in the populations from which the samples were drawn.

Factors that affect the sample size

The calculation of an appropriate sample size relies on the choice of certain factors and, in some instances, on crude estimates. Three factors should be considered in calculating an appropriate sample size, summarized in Table 1. Each of these factors influences the sample size independently, but it is important to combine them all in order to arrive at an appropriate sample size.

Table 1. Factors that affect sample size calculations. [The original table was presented as an image; the three factors, each described in the sections below, are the level of significance (alpha), the variance or standard deviation of the outcome, and the minimum detectable difference (effect size).]

The normal deviates for different significance levels (Type I error, or alpha) for one-tailed and two-tailed alternative hypotheses are shown in Table 2.

Table 2. The normal deviates for Type I error (alpha). [The original table was presented as an image; reconstructed from the standard values quoted in the text.]

    Alpha    One-tailed Z_α    Two-tailed Z_α/2
    0.10     1.28              1.64
    0.05     1.64              1.96
    0.01     2.33              2.58

The normal deviates for different values of power, the probability of rejecting the null hypothesis when it is false (i.e., one minus the probability of a Type II error), are shown in Table 3.

Table 3. The normal deviates for statistical power. [The original table was presented as an image; reconstructed from standard values.]

    Power    Z_(1−β)
    0.80     0.84
    0.90     1.28
    0.95     1.64
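Because the deviates in Tables 2 and 3 are simply quantiles of the standard normal distribution, they can be regenerated in a few lines; a minimal sketch using only the Python standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Table 2: deviates for Type I error (alpha)
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}   one-tailed Z = {z.inv_cdf(1 - alpha):.2f}"
          f"   two-tailed Z = {z.inv_cdf(1 - alpha / 2):.2f}")

# Table 3: deviates for statistical power (1 - beta)
for power in (0.80, 0.90, 0.95):
    print(f"power = {power:.2f}   Z = {z.inv_cdf(power):.2f}")
```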

Study design, outcome variable and sample size

Study design has a major impact on the sample size. Descriptive studies need hundreds of subjects to give acceptable confidence intervals for small effects. Experimental studies generally need a smaller sample, while cross-over designs need one-quarter of the number required for a parallel-group comparison, because every subject receives the experimental treatment in a cross-over study. An evaluation study in a single group with a pre-post design needs half the number required for a similar study with a control group. A design with a one-tailed hypothesis requires 20% fewer subjects than a two-tailed design. Non-randomized studies need 20% more subjects than randomized studies in order to accommodate confounding factors. An additional 10-20% of subjects is required to allow for withdrawals, missing data, losses to follow-up, and so on.

The "outcome" expected under study should also be considered. There are three possible categories of outcome. The first is the simple case where two alternatives exist: yes/no, dead/alive, vaccinated/not vaccinated, etc. The second category covers multiple, mutually exclusive alternatives, such as religious beliefs or blood groups. For these two categories of outcome, the data are generally expressed as percentages or rates.[5-7] The third category covers continuous response variables such as weight, height, blood pressure, VAS score, IL-6, TNF-α, and homocysteine, which are continuous measures summarized as means and standard deviations. The statistical method for estimating the sample size depends on which of these outcome measures is critical for the study; for example, a larger sample size is required to assess a categorical variable than a continuous outcome variable.

Alpha level

Alpha is defined as the probability of detecting a significant difference when the treatments are in fact equally effective, i.e., the risk of a false-positive finding. The alpha level used in determining sample size in most academic research studies is either 0.05 or 0.01.[7] The lower the alpha level, the larger the sample size: for example, a study with an alpha of 0.01 requires more subjects than a study with an alpha of 0.05 for the same outcome variable. An alpha of 0.01 or less is used when the decisions based on the research are critical and errors may cause substantial financial or personal harm.

Variance or standard deviation

The variance or standard deviation for the sample size calculation is obtained either from previous studies or from a pilot study. The larger the standard deviation, the larger the sample size required. For example, a study whose primary outcome variable is TNF-α needs more subjects than one measuring birth weight or a 10-point VAS score, because the natural variability of TNF-α is wider.

Minimum detectable difference

This is the expected difference or relationship between two independent samples, also known as the effect size. The obvious question is how to know the difference in a study that has not yet been conducted. If available, the effect size found in prior studies may be used. Where no previous study exists, the effect size is determined from a literature review, logical assertion, and conjecture.

The difference between two groups in a study will be explored in terms of an estimate of effect, an appropriate confidence interval, and a P value. The confidence interval indicates the likely range of values for the true effect in the population, while the P value indicates how likely it is that the observed effect in the sample is due to chance. A related quantity is the statistical power of the study: the probability of detecting a predefined clinically significant difference. The ideal study is one with high power. This means the study has a high chance of detecting a difference between groups if one exists; consequently, if the study demonstrates no difference between groups, the researcher can be reasonably confident in concluding that none exists. The ideal power for any study is considered to be at least 80%.[8]

In research, statistical power is generally calculated with two objectives: (1) it can be calculated before data collection, based on information from previous studies, to decide the sample size needed for the current study; and (2) it can be calculated after data analysis. The second situation arises when the result turns out to be non-significant; statistical power is then calculated to verify whether the non-significant result is due to a genuine lack of relationship between the groups or to a lack of statistical power.

Statistical power is positively correlated with sample size: given the levels of the other factors (alpha and the minimum detectable difference), a larger sample size gives greater power. However, researchers should distinguish between statistical and scientific significance. Although a larger sample size enables researchers to find smaller differences statistically significant, the difference found may not be scientifically meaningful. It is therefore recommended that researchers decide in advance what they would consider a scientifically meaningful difference before doing a power analysis and determining the actual sample size needed. Power analysis is now integral to the health and behavioral sciences, and its use is steadily increasing wherever empirical studies are performed.

Withdrawals, missing data and losses to follow-up

The calculated sample size is the total number of subjects required for the final study analysis. There are a few practical issues to consider when calculating the number of subjects required. Not all eligible subjects may be willing to take part, so it may be necessary to screen more subjects than the final number entering the study. In addition, even in well-designed and well-conducted studies, it is unusual to finish with a dataset that is complete, in a usable format, for all the subjects recruited. The reason may be subject-related: subjects may fail or refuse to give valid responses to particular questions, physical measurements may suffer from technical problems, and in studies involving follow-up (e.g., trials or cohort studies) there will be some degree of attrition. The reason may also be technical or procedural, such as contamination or failure to get an assessment or test performed in time. It may therefore be necessary to consider these issues before calculating the number of subjects to be recruited, in order to achieve the final desired sample size.

For example, suppose a total of N subjects, with complete data, are required at the end of the study for analysis, but a proportion q are expected to refuse to participate or to drop out before the study ends. In that case, the following total number of subjects (N1) would have to be recruited to ensure that the final sample size N is achieved:

N1 = N / (1 − q)

The proportion of eligible subjects who will refuse to participate or provide inadequate information will be unknown at the beginning of the study. Approximate estimates are often possible using information from similar studies in comparable populations or from an appropriate pilot study.[9]
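As a quick illustration of this adjustment (all numbers hypothetical):

```python
from math import ceil

n_final = 200                        # N: completers needed for the analysis
q = 0.15                             # anticipated refusal/drop-out proportion
n_recruit = ceil(n_final / (1 - q))  # round up to a whole subject
print(n_recruit)                     # 236 subjects must be recruited
```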

Sample size estimation for a proportion in survey-type studies

A common goal of survey research is to collect data representative of a population. The researcher uses information gathered from the survey to generalize findings from the drawn sample back to the population, within the limits of random error. The general rule for acceptable margins of error in survey research is 5-10%. The sample size can be estimated using the following formula:

N = (Z_α/2)² × P(1 − P) × D / E²

Here P is the prevalence or proportion of the event of interest, and E is the precision (or margin of error) with which the researcher wants to measure it. Generally, E will be 10% of P. Z_α/2 is the normal deviate for a two-tailed alternative hypothesis at a given level of significance; for example, Z_α/2 is 1.96 for a 5% level of significance and 2.58 for 1%, as shown in Table 2. D is the design effect, which reflects the sampling design used in the survey. It is 1 for simple random sampling and higher (usually 1 to 2) for other designs such as stratified, systematic, or cluster random sampling, to compensate for the deviation from simple random sampling. The design effect for cluster random sampling is taken as 1.5 to 2; for purposive, convenience, or judgment sampling, D can exceed 10. The higher the D, the larger the sample size required. Simple random sampling is unlikely to be the method used in an actual field survey; if another method such as systematic, stratified, or cluster sampling is used, a larger sample size is likely to be needed because of the design effect.[10-12] In an impact study, P may be set at 50% to reflect the assumption that an impact is expected in 50% of the population; a P of 50% is also a conservative estimate. Example: a researcher wants to know the sample size for a survey measuring the prevalence of obesity in a certain community. Previous literature estimates obesity at 20% in the population to be surveyed. Assuming a 95% confidence interval (5% level of significance) and a 10% margin of error, the sample size is calculated as follows:

N = (Z_α/2)² × P(1 − P) × D / E² = (1.96)² × 0.20 × (1 − 0.20) × 1 / (0.10 × 0.20)² = 3.8416 × 0.16 / (0.02)² ≈ 1537 for a simple random sampling design. Hence, a sample size of 1537 is required to conduct a community-based survey to estimate the prevalence of obesity. (Note: E, the margin of error, is 10% × 0.20 = 0.02 in this example.)

To find the final adjusted sample size, allowing for a non-response rate of 10% in the above example, the adjusted sample size is 1537/(1 − 0.10) = 1537/0.90 ≈ 1708.
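The whole calculation, including the design effect and the non-response inflation, fits in a few lines of code. This is a sketch of the formula above; the function name and defaults are ours:

```python
from math import ceil
from statistics import NormalDist

def survey_sample_size(p, e, alpha=0.05, deff=1.0, nonresponse=0.0):
    """N = (Z_alpha/2)^2 * P(1 - P) * D / E^2, inflated for non-response."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    n = z ** 2 * p * (1 - p) * deff / e ** 2
    return ceil(n / (1 - nonresponse))

# Obesity example: P = 0.20, E = 10% of P = 0.02, simple random sampling
print(survey_sample_size(0.20, 0.02))                    # 1537
print(survey_sample_size(0.20, 0.02, nonresponse=0.10))  # 1708
```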

Sample size estimation with single group mean

If a researcher is conducting a study in a single group, such as an outcome assessment in a group of patients subjected to a certain treatment, or in patients with a particular illness, and the primary outcome is a continuous variable whose results are expressed as a mean and standard deviation, the sample size can be estimated using the following formula:

N = (Z_α/2)² s² / d²,

where s is the standard deviation obtained from a previous study or a pilot study, and d is the accuracy of the estimate, i.e., how close to the true mean the estimate should be. Z_α/2 is the normal deviate for a two-tailed alternative hypothesis at the chosen level of significance.

For research studies with a one-tailed hypothesis, the above formula can be rewritten as

N = (Z_α)² s² / d², where the Z_α values are 1.64 and 2.33 for the 5% and 1% levels of significance, respectively.

Example: In a study estimating the mean weight of a population, the researcher wants the error of estimation to be less than 2 kg from the true mean (that is, an expected difference of 2 kg). The sample standard deviation is 5, and the confidence level is 95% (an error rate of 5%). The sample size is estimated as N = (1.96)² (5)² / 2² ≈ 24 subjects. If an allowance of 10% for missing data, losses to follow-up, and withdrawals is assumed, the corrected sample size is 24/(1 − 0.10) = 24/0.9 ≈ 27 subjects; with a 20% allowance, the corrected sample size is 30.
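The same arithmetic in code, using the values from the example:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)   # 1.96 for a two-tailed 5% test
s, d = 5, 2                       # SD from a prior sample; tolerated error (kg)
n = (z * s / d) ** 2              # about 24 subjects
print(round(n), round(n / (1 - 0.10)), round(n / (1 - 0.20)))  # 24 27 30
```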

Sample size estimation with two means

Consider a study with null hypothesis H0: m1 = m2 versus alternative hypothesis Ha: m1 = m2 + d, where d is the difference between the two means. Let n1 and n2 be the sample sizes for Group I and Group II, so that N = n1 + n2. The ratio r = n1/n2 is used whenever the researcher needs unequal sample sizes for ethical, cost, availability, or other reasons.

Then, the total sample size for the study is as follows:

N = [(r + 1)² / r] × σ² × (Z_α/2 + Z_β)² / d²

where σ is the (assumed common) standard deviation of the outcome and Z_β is the normal deviate for the desired power (Table 3). [The original equation was presented as an image; this is the standard formula consistent with the definitions above.]
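A sketch of this calculation in code; note that it implements the standard-formula reconstruction above, since the original equation was an image, and the example values are hypothetical:

```python
from math import ceil
from statistics import NormalDist

def two_mean_total_n(sd, d, alpha=0.05, power=0.80, r=1.0):
    """Total N = ((r + 1)^2 / r) * sd^2 * (Z_alpha/2 + Z_beta)^2 / d^2."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return ceil((r + 1) ** 2 / r * sd ** 2 * (za + zb) ** 2 / d ** 2)

# Detect a 5-unit difference with SD 10, at 5% alpha and 80% power
print(two_mean_total_n(sd=10, d=5))          # 126 in total (63 per group)
print(two_mean_total_n(sd=10, d=5, r=2.0))   # 142 in total with 2:1 allocation
```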

Sample size estimation with two proportions

In studies whose outcome is the proportion of an event in two populations (groups), such as the percentage of complications, mortality, improvement, awareness, or surgical or medical outcomes, the sample size estimation is based on the proportions of the outcome, obtained from a previous literature review or from a pilot study on a smaller sample. For a study with null hypothesis H0: π1 = π2 versus Ha: π1 = π2 + d, where π1 and π2 are the population proportions and p1 and p2 the corresponding sample estimates, the sample size can be estimated using the following formula:

n per group = [Z_α/2 × √(2p̄(1 − p̄)) + Z_β × √(p1(1 − p1) + p2(1 − p2))]² / (p1 − p2)²,

where p̄ = (p1 + p2)/2. [The original equation was presented as an image; this is the standard pooled two-proportion formula consistent with the definitions above.]

If a researcher is planning a study with unequal groups, he or she should calculate N as if using equal groups and then compute the modified sample size. If r = n1/n2 is the ratio of the sample sizes in the two groups, the required total is N1 = N(1 + r)²/4r; if n1 = 2n2, i.e., a 2:1 ratio, then N1 = 9N/8, a fairly small increase in total sample size.

Example: It is believed that the proportion of patients who develop complications after one type of surgery is 5%, while the proportion after a second type of surgery is 15%. How large should the sample be in each of the two groups if an investigator wishes to detect, with a power of 90%, whether the second procedure has a significantly higher complication rate than the first at the 5% level of significance?

In the example,

  • a) Test value of difference in complication rate 0%
  • b) Anticipated complication rate 5%, 15% in 2 groups
  • c) Level of significance 5%
  • d) Power of the test 90%
  • e) Alternative hypothesis (one-tailed): (p1 − p2) < 0

The total sample size required is 74 for equal allocation; for unequal allocation with a 1.5:1 ratio (r = 1.5), the total sample size is 77, with 46 in Group I and 31 in Group II.
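A sketch of the pooled formula given above; the original article does not state exactly which variant of the formula it used, so this need not reproduce its quoted totals:

```python
from math import ceil
from statistics import NormalDist

def two_prop_n_per_group(p1, p2, alpha=0.05, power=0.90, one_sided=True):
    """Subjects per group for comparing two proportions (pooled formula)."""
    z = NormalDist()
    za = z.inv_cdf(1 - alpha) if one_sided else z.inv_cdf(1 - alpha / 2)
    zb = z.inv_cdf(power)
    pbar = (p1 + p2) / 2
    n = (za * (2 * pbar * (1 - pbar)) ** 0.5
         + zb * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2 / (p1 - p2) ** 2
    return ceil(n)

# Complication rates of 5% vs 15%, one-sided 5% alpha, 90% power
print(two_prop_n_per_group(0.05, 0.15))   # about 120 per group
```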

Sample size estimation with a correlation coefficient

In observational studies that aim to estimate a correlation (r) between two variables of interest, say X and Y, with a typical hypothesis of the form H0: r = 0 against Ha: r ≠ 0, the sample size can be obtained by computing:

N = [(Z_α/2 + Z_β) / C]² + 3, where C = 0.5 × ln[(1 + r)/(1 − r)] is Fisher's transformation of the anticipated correlation r. [The original equation was presented as an image; this is the standard formula.]

Example: According to the literature, the correlation between salt intake and systolic blood pressure is around 0.30. A study is conducted to test this correlation in a population, with a significance level of 1% and power of 90%. The sample size for such a study can be estimated as follows:

C = 0.5 × ln(1.30/0.70) ≈ 0.310, so N = [(2.58 + 1.28)/0.310]² + 3 ≈ 158 subjects. [The original computation was presented as an image; this reconstruction applies the formula above, assuming a two-sided test.]
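In code, with Fisher's transformation spelled out (two-sided test assumed, as in the reconstruction above):

```python
from math import ceil, log
from statistics import NormalDist

def correlation_sample_size(r, alpha=0.05, power=0.80):
    """N = ((Z_alpha/2 + Z_beta) / C)^2 + 3, with C = Fisher's z of r."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    c = 0.5 * log((1 + r) / (1 - r))
    return ceil(((za + zb) / c) ** 2 + 3)

# Salt intake vs systolic blood pressure: r = 0.30, 1% alpha, 90% power
print(correlation_sample_size(0.30, alpha=0.01, power=0.90))  # ~158, rounded up to 159
```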

Sample size estimation with odds ratio

In a case-control study, data are usually summarized as an odds ratio rather than as a difference between two proportions when the outcome variable of interest is categorical. If P1 and P2 are the proportions of cases and controls, respectively, exposed to a risk factor, then:

OR = [P1/(1 − P1)] / [P2/(1 − P2)] = P1(1 − P2) / [P2(1 − P1)], which can be rearranged to give P1 = OR × P2 / [1 + P2(OR − 1)]. [The original equation was presented as an image; this is the standard definition.]

Example: The prevalence of vertebral fracture in a population is 25%. A study aims to estimate the effect of smoking on fracture, with an odds ratio of 2, at a significance level of 5% (one-sided test) and power of 80%. The total sample size for the study with equal group sizes can be estimated as follows:

With P2 = 0.25 and OR = 2, the implied exposure proportion among cases is P1 = 2 × 0.25 / (1 + 0.25 × (2 − 1)) = 0.40. Entering these proportions into the two-proportion formula above with Z_α = 1.64 (one-sided 5%) and Z_β = 0.84 (80% power) gives roughly 120 subjects per group. [The original computation was presented as an image; this reconstruction applies the formulas above.]
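A sketch combining the OR-to-proportion conversion with the pooled two-proportion formula from earlier; the exact variant used by the original is unknown, so treat the output as approximate:

```python
from math import ceil
from statistics import NormalDist

def case_control_n_per_group(p2, odds_ratio, alpha=0.05, power=0.80,
                             one_sided=True):
    """Cases (and controls) needed to detect a given odds ratio."""
    z = NormalDist()
    za = z.inv_cdf(1 - alpha) if one_sided else z.inv_cdf(1 - alpha / 2)
    zb = z.inv_cdf(power)
    # Exposure proportion among cases implied by OR and control exposure p2
    p1 = odds_ratio * p2 / (1 + p2 * (odds_ratio - 1))
    pbar = (p1 + p2) / 2
    n = (za * (2 * pbar * (1 - pbar)) ** 0.5
         + zb * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2 / (p1 - p2) ** 2
    return ceil(n)

# Vertebral fracture example: control exposure 25%, OR = 2, one-sided 5%, 80% power
print(case_control_n_per_group(0.25, 2.0))   # about 120 per group
```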

The equations in this paper assume that the selection of individuals is random and unbiased; the decision to include a subject in the study must not depend on whether or not that subject has the characteristic or outcome studied. Second, in studies in which a mean is calculated, the measurements are assumed to have a normal distribution.[13,14]

The concept of statistical power is closely associated with sample size: the power of a study increases as the sample size increases. Ideally, the minimum power required of a study is 80%. Hence, sample size calculation is a critical and fundamental step in designing a study protocol. Even after completion of a study, a retrospective power analysis can be useful, especially when a statistically non-significant result is obtained.[15] Here, the actual sample size and alpha level are known, and the variance observed in the sample provides an estimate of the population variance. Such a retrospective power analysis helps establish whether a negative finding is a true negative.

The ideal study for the researcher is one in which the power is high. This means the study has a high chance of detecting a difference between groups if one exists; consequently, if the study demonstrates no difference between groups, the researcher can be reasonably confident in concluding that none exists. The power of a study depends on several factors, but as a general rule, higher power is achieved by increasing the sample size.[16] Many apparently null studies may be under-powered rather than genuinely demonstrating no difference between groups; absence of evidence is not evidence of absence.[9]

A sample size calculation is an essential step in research protocols and is needed to justify the size of clinical studies in papers, reports, and so on. Nevertheless, one of the most common errors in papers reporting clinical trials is a lack of justification of the sample size, and it is a major concern that important therapeutic effects are being missed because of inadequately sized studies.[17,18] The purpose of this review is to make available a collection of formulas for sample size calculations, with examples, for a variety of situations likely to be encountered.

Often, researchers face various constraints that may force them to use an inadequate sample size, for both practical and statistical reasons. These constraints may include budget, time, personnel, and other resource limitations. In such cases, researchers should report both the appropriate sample size and the sample size actually used, the reasons for using an inadequate sample size, and a discussion of the effect the inadequate sample size may have on the results of the study. The researcher should exercise caution when making pragmatic recommendations based on research with an inadequate sample size.

Sample size determination is a major step in the design of a research study. Appropriately sized samples are essential to infer with confidence that sample estimates reflect the underlying population parameters. The sample size required to reject or accept a study hypothesis is determined by the power of the test. A study that is sufficiently powered has a reasonable chance of answering the questions put forth at the beginning of the research. Inadequately sized studies often result from investigators' unrealistic assumptions about the effectiveness of the study treatment. Misjudging the underlying variability of parameter estimates, wrongly estimating the follow-up period needed to observe the intended effects of treatment, failing to predict a lack of compliance with the study regimen, high drop-out rates, and failure to account for the multiplicity of study endpoints are common errors in clinical research. Conducting a study that has little chance of answering the hypothesis at hand is a misuse of time and valuable resources and may unnecessarily expose participants to potential harm or to unwarranted expectations of therapeutic benefit. As scientific and ethical issues go hand in hand, awareness of the minimum required sample size and application of appropriate sampling methods are extremely important in achieving scientifically and statistically sound results. Using an adequate sample size along with high-quality data collection will yield more reliable, valid, and generalizable results, and can also save resources. This paper was designed as a tool that a researcher can use in planning and conducting quality research.

Source of Support: Nil

Conflict of Interest: None declared.
