
Z Test: Uses, Formula & Examples

By Jim Frost

What is a Z Test?

Use a Z test when you need to compare group means. Use the 1-sample analysis to determine whether a population mean is different from a hypothesized value. Or use the 2-sample version to determine whether two population means differ.

A Z test is a form of inferential statistics. It uses samples to draw conclusions about populations.

For example, use Z tests to assess the following:

  • One sample: Do students in an honors program have a mean IQ score different from the hypothesized value of 100?
  • Two sample: Do two IQ-boosting programs produce different mean scores?

In this post, learn about when to use a Z test vs T test. Then we’ll review the Z test’s hypotheses, assumptions, interpretation, and formula. Finally, we’ll use the formula in a worked example.

Related post : Difference between Descriptive and Inferential Statistics

Z test vs T test

Z tests and t tests are similar. They both assess the means of one or two groups, have similar assumptions, and allow you to draw the same conclusions about population means.

However, there is one critical difference.

Z tests require you to know the population standard deviation, while t tests use a sample estimate of the standard deviation. Learn more about Population Parameters vs. Sample Statistics .

In practice, analysts rarely use Z tests because it’s rare that they’ll know the population standard deviation. It’s even rarer that they’ll know it and yet need to assess an unknown population mean!

A Z test is often the first hypothesis test students learn because its results are easier to calculate by hand and it builds on the standard normal distribution that they probably already understand. Additionally, students don’t need to know about the degrees of freedom .

Z and T test results converge as the sample size approaches infinity. Indeed, for sample sizes greater than 30, the differences between the two analyses become small.

William Sealy Gosset developed the t test specifically to account for the additional uncertainty associated with smaller samples. Conversely, Z tests are too sensitive to mean differences in smaller samples and can produce statistically significant results incorrectly (i.e., false positives).

When to use a T Test vs Z Test

Let’s put a button on it.

When you know the population standard deviation, use a Z test.

When you have a sample estimate of the standard deviation, which will be the vast majority of the time, the best statistical practice is to use a t test regardless of the sample size.

However, the difference between the two analyses becomes trivial when the sample size exceeds 30.

Learn more about a T-Test Overview: How to Use & Examples and How T-Tests Work .

Z Test Hypotheses

This analysis uses sample data to evaluate hypotheses that refer to population means (µ). The hypotheses depend on whether you’re assessing one or two samples.

One-Sample Z Test Hypotheses

  • Null hypothesis (H₀): The population mean equals a hypothesized value (µ = µ₀).
  • Alternative hypothesis (Hₐ): The population mean DOES NOT equal a hypothesized value (µ ≠ µ₀).

When the p-value is less than or equal to your significance level (e.g., 0.05), reject the null hypothesis. The difference between your sample mean and the hypothesized value is statistically significant. Your sample data support the notion that the population mean does not equal the hypothesized value.

Related posts : Null Hypothesis: Definition, Rejecting & Examples and Understanding Significance Levels

Two-Sample Z Test Hypotheses

  • Null hypothesis (H₀): Two population means are equal (µ₁ = µ₂).
  • Alternative hypothesis (Hₐ): Two population means are not equal (µ₁ ≠ µ₂).

Again, when the p-value is less than or equal to your significance level, reject the null hypothesis. The difference between the two means is statistically significant. Your sample data support the idea that the two population means are different.

These hypotheses are for two-sided analyses. You can use one-sided, directional hypotheses instead. Learn more in my post, One-Tailed and Two-Tailed Hypothesis Tests Explained .

Related posts : How to Interpret P Values and Statistical Significance

Z Test Assumptions

For reliable results, your data should satisfy the following assumptions:

You have a random sample

Drawing a random sample from your target population helps ensure that the sample represents the population. Representative samples are crucial for accurately inferring population properties. The Z test results won’t be valid if your data do not reflect the population.

Related posts : Random Sampling and Representative Samples

Continuous data

Z tests require continuous data . Continuous variables can assume any numeric value, and the scale can be divided meaningfully into smaller increments, such as fractional and decimal values. For example, weight, height, and temperature are continuous.

Other analyses can assess additional data types. For more information, read Comparing Hypothesis Tests for Continuous, Binary, and Count Data .

Your sample data follow a normal distribution, or you have a large sample size

All Z tests assume your data follow a normal distribution . However, due to the central limit theorem, you can ignore this assumption when your sample is large enough.

The following sample size guidelines indicate when normality becomes less of a concern:

  • One-Sample : 20 or more observations.
  • Two-Sample : At least 15 in each group.

Related posts : Central Limit Theorem and Skewed Distributions

Independent samples

For the two-sample analysis, the groups must contain different sets of items. This analysis compares two distinct samples.

Related post : Independent and Dependent Samples

Population standard deviation is known

As I mention in the Z test vs T test section, use a Z test when you know the population standard deviation. However, when n > 30, the difference between the analyses becomes trivial.

Related post : Standard Deviations

Z Test Formula

These Z test formulas allow you to calculate the test statistic. Use the Z statistic to determine statistical significance by comparing it to the appropriate critical values and use it to find p-values.

The correct formula depends on whether you’re performing a one- or two-sample analysis. Both formulas require sample means (x̅) and sample sizes (n) from your sample. Additionally, you specify the population standard deviation (σ) or variance (σ²), which does not come from your sample.

I present a worked example using the Z test formula at the end of this post.

Learn more about Z-Scores and Test Statistics .

One Sample Z Test Formula

One-sample Z test formula:

Z = (x̅ − µ₀) / (σ / √n)

The one sample Z test formula is a ratio.

The numerator is the difference between your sample mean and a hypothesized value for the population mean (µ₀). This value is often a straw man that you hope to disprove.

The denominator is the standard error of the mean. It represents the uncertainty in how well the sample mean estimates the population mean.

Learn more about the Standard Error of the Mean .
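For readers who like to see the formula in code, here is a minimal Python sketch of the one-sample statistic (assuming SciPy is available; the function and variable names are illustrative, not from the original post):

```python
# Minimal sketch of the one-sample Z test statistic described above.
from math import sqrt
from scipy.stats import norm

def one_sample_z(x_bar, mu_0, sigma, n):
    """Z = (sample mean - hypothesized mean) / standard error of the mean."""
    standard_error = sigma / sqrt(n)      # denominator: SE of the mean
    z = (x_bar - mu_0) / standard_error   # numerator: observed difference
    p_two_sided = 2 * norm.sf(abs(z))     # two-tailed p-value
    return z, p_two_sided
```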

Two Sample Z Test Formula

Two-sample Z test formula:

Z = (x̅₁ − x̅₂) / √(σ₁²/n₁ + σ₂²/n₂)

The two sample Z test formula is also a ratio.

The numerator is the difference between your two sample means.

The denominator calculates the pooled standard error of the mean by combining both samples. In this Z test formula, enter the population variances (σ²) for each sample.
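As with the one-sample case, a minimal Python sketch of this two-sample statistic may help (again assuming SciPy; names are illustrative):

```python
# Sketch of the two-sample Z statistic: difference of sample means over the
# pooled standard error built from the known population variances.
from math import sqrt
from scipy.stats import norm

def two_sample_z(x_bar1, x_bar2, var1, var2, n1, n2):
    pooled_se = sqrt(var1 / n1 + var2 / n2)   # pooled standard error
    z = (x_bar1 - x_bar2) / pooled_se
    return z, 2 * norm.sf(abs(z))             # two-tailed p-value
```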

Z Test Critical Values

As I mentioned in the Z vs T test section, a Z test does not use degrees of freedom. It evaluates Z-scores in the context of the standard normal distribution. Unlike the t-distribution , the standard normal distribution doesn’t change shape as the sample size changes. Consequently, the critical values don’t change with the sample size.

To find the critical value for a Z test, you need to know the significance level and whether it is one- or two-tailed.

Significance Level | Tails      | Critical Value(s)
0.01               | Two-Tailed | ±2.576
0.01               | Left Tail  | −2.326
0.01               | Right Tail | +2.326
0.05               | Two-Tailed | ±1.960
0.05               | Left Tail  | −1.645
0.05               | Right Tail | +1.645

Learn more about Critical Values: Definition, Finding & Calculator .
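If you'd rather compute these critical values than look them up, the inverse normal CDF reproduces the table above. A short Python sketch, assuming SciPy:

```python
# Reproducing the critical-value table with SciPy's inverse normal CDF.
from scipy.stats import norm

for alpha in (0.01, 0.05):
    print(f"alpha={alpha}: two-tailed ±{norm.ppf(1 - alpha / 2):.3f}, "
          f"left tail {norm.ppf(alpha):.3f}, "
          f"right tail +{norm.ppf(1 - alpha):.3f}")
# alpha=0.01: two-tailed ±2.576, left tail -2.326, right tail +2.326
# alpha=0.05: two-tailed ±1.960, left tail -1.645, right tail +1.645
```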

Z Test Worked Example

Let’s close this post by calculating the results for a Z test by hand!

Suppose we randomly sampled subjects from an honors program. We want to determine whether their mean IQ score differs from the general population. The general population’s IQ scores are defined as having a mean of 100 and a standard deviation of 15.

We’ll determine whether the difference between our sample mean and the hypothesized population mean of 100 is statistically significant.

Specifically, we’ll use a two-tailed analysis with a significance level of 0.05. Looking at the table above, you’ll see that this Z test has critical values of ± 1.960. Our results are statistically significant if our Z statistic is below –1.960 or above +1.960.

The hypotheses are the following:

  • Null (H₀): µ = 100
  • Alternative (Hₐ): µ ≠ 100

Entering Our Results into the Formula

Here are the values from our study that we need to enter into the Z test formula:

  • IQ score sample mean (x̅): 107
  • Sample size (n): 25
  • Hypothesized population mean (µ₀): 100
  • Population standard deviation (σ): 15

Using the formula to calculate the results:

Z = (107 − 100) / (15 / √25) = 7 / 3 = 2.333

The Z-score is 2.333. This value is greater than the critical value of 1.960, making the results statistically significant. Below is a graphical representation of our Z test results showing how the Z statistic falls within the critical region.

Graph displaying the Z statistic falling in the critical region.

We can reject the null and conclude that the mean IQ score for the population of honors students does not equal 100. Based on the sample mean of 107, we know their mean IQ score is higher.

Now let’s find the p-value. We could use technology to do that, such as an online calculator. However, let’s go old school and use a Z table.

To find the p-value that corresponds to a Z-score from a two-tailed analysis, we need to find the negative value of our Z-score (even when it’s positive) and double it.

In the truncated Z-table below, I highlight the cell corresponding to a Z-score of -2.33.

Using a Z-table to find the p-value.

The cell value of 0.00990 represents the area or probability to the left of the Z-score -2.33. We need to double it to include the area > +2.33 to obtain the p-value for a two-tailed analysis.

P-value = 0.00990 * 2 = 0.0198

That p-value is an approximation because it uses a Z-score of 2.33 rather than 2.333. Using an online calculator, the p-value for our Z test is a more precise 0.0196. This p-value is less than our significance level of 0.05, which reconfirms the statistically significant results.

See my full Z-table , which explains how to use it to solve other types of problems.
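For anyone who wants to check the worked example with software instead of a table, this short Python sketch (assuming SciPy) reproduces the Z statistic and the more precise p-value:

```python
# Reproducing the worked example: x̄ = 107, n = 25, µ0 = 100, σ = 15.
from math import sqrt
from scipy.stats import norm

z = (107 - 100) / (15 / sqrt(25))    # = 7 / 3 ≈ 2.333
p = 2 * norm.sf(z)                   # two-tailed p-value ≈ 0.0196
print(round(z, 3), round(p, 4))      # 2.333 0.0196
```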


Z-test Calculator


This Z-test calculator is a tool that helps you perform a one-sample Z-test on the population’s mean. Two forms of this test, a two-tailed Z-test and a one-tailed Z-test, exist and can be used depending on your needs. You can also choose whether the calculator should determine the p-value from the Z-test or whether you’d rather use the critical value approach!

Read on to learn more about the Z-test in statistics and, in particular, when to use Z-tests, what the Z-test formula is, and whether to use a Z-test vs. a t-test. As a bonus, we give some step-by-step examples of how to perform Z-tests!

Or you may also check our t-statistic calculator, where you can learn about another essential statistic. If you are also interested in the F-test, check our F-statistic calculator.

What is a Z-test?

A one-sample Z-test is one of the most popular location tests. The null hypothesis is that the population mean value is equal to a given number, μ₀:

H₀: μ = μ₀

We perform a two-tailed Z-test if we want to test whether the population mean is not μ₀:

H₁: μ ≠ μ₀

and a one-tailed Z-test if we want to test whether the population mean is less/greater than μ₀:

H₁: μ < μ₀ (left-tailed test) or H₁: μ > μ₀ (right-tailed test)

Let us now discuss the assumptions of a one-sample Z-test.

When do I use Z-tests?

You may use a Z-test if your sample consists of independent data points and:

  • the data is normally distributed, and you know the population variance; or

  • the sample is large, and the data follows a distribution which has a finite mean and variance. You don’t need to know the population variance.

The reason these two possibilities exist is that we want a test statistic that follows the standard normal distribution N(0, 1). In the former case, it is an exact standard normal distribution, while in the latter, it is approximately so, thanks to the central limit theorem.

The question remains, "When is my sample considered large?" Well, there's no universal criterion. In general, the more data points you have, the better the approximation works. Statistics textbooks recommend having no fewer than 50 data points, while 30 is considered the bare minimum.

Z-test formula

Let x₁, ..., xₙ be an independent sample following the normal distribution N(μ, σ²), i.e., with a mean equal to μ and a variance equal to σ².

We pose the null hypothesis, H₀: μ = μ₀.

We define the test statistic, Z, as:

Z = (x̄ − μ₀) √n / σ

where:

  • x̄ is the sample mean, i.e., x̄ = (x₁ + ... + xₙ) / n;

  • μ₀ is the mean postulated in H₀;

  • n is the sample size; and

  • σ is the population standard deviation.

In what follows, the uppercase Z stands for the test statistic (treated as a random variable), while the lowercase z will denote an actual value of Z, computed for a given sample drawn from N(μ, σ²).

If H₀ holds, then the sum Sₙ = x₁ + ... + xₙ follows the normal distribution with mean nμ₀ and variance nσ². As Z is the standardization (z-score) of Sₙ/n, we can conclude that the test statistic Z follows the standard normal distribution N(0, 1), provided that H₀ is true. By the way, we have the z-score calculator if you want to focus on this value alone.

If our data does not follow a normal distribution, or if the population standard deviation is unknown (and thus in the formula for Z we substitute the population standard deviation σ with the sample standard deviation), then the test statistic Z is not necessarily normal. However, if the sample is sufficiently large, then the central limit theorem guarantees that Z is approximately N(0, 1).

In the sections below, we will explain how to use the value of the test statistic, z, to decide whether or not you should reject the null hypothesis. Two approaches can be used to arrive at that decision: the p-value approach and the critical value approach, and we cover both of them! Which one should you use? In the past, the critical value approach was more popular because it was difficult to calculate the p-value from a Z-test. However, with the help of modern computers, we can do it fairly easily and with decent precision. In general, you are strongly advised to report the p-value of your tests!

p-value from Z-test

Formally, the p-value is the smallest level of significance at which the null hypothesis could be rejected. More intuitively, the p-value answers the question: provided that I live in a world where the null hypothesis holds, how probable is it that the value of the test statistic will be at least as extreme as the z-value I’ve got for my sample? Hence, a small p-value means that your result is very improbable under the null hypothesis, and so there is strong evidence against the null hypothesis; the smaller the p-value, the stronger the evidence.

To find the p-value, you have to calculate the probability that the test statistic, Z, is at least as extreme as the value we’ve actually observed, z, provided that the null hypothesis is true. (The probability of an event calculated under the assumption that H₀ is true will be denoted as Pr(event | H₀).) It is the alternative hypothesis which determines what “more extreme” means:

  • Two-tailed Z-test: extreme values are those whose absolute value exceeds |z|, so those smaller than −|z| or greater than |z|. Therefore, we have:

p-value = Pr(Z ≤ −|z| | H₀) + Pr(Z ≥ |z| | H₀)

The symmetry of the normal distribution gives:

p-value = 2 · Pr(Z ≤ −|z| | H₀)

  • Left-tailed Z-test: extreme values are those smaller than z, so p-value = Pr(Z ≤ z | H₀).
  • Right-tailed Z-test: extreme values are those greater than z, so p-value = Pr(Z ≥ z | H₀).

To compute these probabilities, we can use the cumulative distribution function (cdf) of N(0, 1), which for a real number x is defined as:

Φ(x) = Pr(Z ≤ x) = (1/√(2π)) ∫ from −∞ to x of exp(−t²/2) dt

Also, p-values can be nicely depicted as areas under the probability density function (pdf) of N(0, 1), since Φ(x) equals the area under the pdf to the left of x.

Two-tailed Z-test and one-tailed Z-test

With all the knowledge you've got from the previous section, you're ready to learn about Z-tests.

  • Two-tailed Z-test:

p-value = 2 · Φ(−|z|)

From the fact that Φ(−z) = 1 − Φ(z), we deduce that

p-value = 2 − 2 · Φ(|z|)

The p-value is the area under the probability density function (pdf) both to the left of −|z| and to the right of |z|.

  • Left-tailed Z-test:

p-value = Φ(z)

The p-value is the area under the pdf to the left of our z.

  • Right-tailed Z-test:

p-value = 1 − Φ(z)

The p-value is the area under the pdf to the right of z.

The decision as to whether or not you should reject the null hypothesis can now be made at any significance level, α, you desire!

  • If the p-value is less than, or equal to, α, the null hypothesis is rejected at this significance level; and

  • If the p-value is greater than α, then there is not enough evidence to reject the null hypothesis at this significance level.
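As an illustration, here is a small Python sketch (assuming SciPy; the helper name is mine) implementing the three p-value rules and the decision rule above:

```python
# Sketch of the three p-value rules, with Φ as the standard normal cdf.
from scipy.stats import norm

def p_value(z, tail="two-tailed"):
    if tail == "two-tailed":
        return 2 * norm.cdf(-abs(z))   # area beyond -|z| and +|z|
    if tail == "left":
        return norm.cdf(z)             # area to the left of z
    return norm.sf(z)                  # right tail: area to the right of z

alpha = 0.05
z = -2.0
print(p_value(z) <= alpha)             # True -> reject H0 at this level
```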

Z-test critical values & critical regions

The critical value approach involves comparing the value of the test statistic obtained for our sample, z, to the so-called critical values. These values constitute the boundaries of regions where the test statistic is highly improbable to lie. Those regions are often referred to as the critical regions, or rejection regions. The decision of whether or not you should reject the null hypothesis is then based on whether or not our z belongs to the critical region.

The critical regions depend on the significance level, α, of the test and on the alternative hypothesis. The choice of α is arbitrary; in practice, the values 0.1, 0.05, and 0.01 are most commonly used as α.

Once we agree on the value of α, we can easily determine the critical regions of the Z-test:

  • Two-tailed test: (−∞, −Φ⁻¹(1 − α/2)] ∪ [Φ⁻¹(1 − α/2), ∞)
  • Left-tailed test: (−∞, Φ⁻¹(α)]
  • Right-tailed test: [Φ⁻¹(1 − α), ∞)

To decide the fate of H₀, check whether or not your z falls in the critical region:

  • If yes, then reject H₀ and accept H₁; and

  • If no, then there is not enough evidence to reject H₀.

As you can see, the formulae for the critical values of Z-tests involve the inverse, Φ⁻¹, of the cumulative distribution function (cdf) of N(0, 1).
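A hypothetical Python helper (assuming SciPy, whose norm.ppf plays the role of Φ⁻¹) makes the same critical regions concrete:

```python
# The critical regions above, computed with the inverse cdf Φ⁻¹ (norm.ppf).
from scipy.stats import norm

def critical_region(alpha, tail="two-tailed"):
    if tail == "two-tailed":
        z_crit = norm.ppf(1 - alpha / 2)
        return f"(-inf, {-z_crit:.3f}] U [{z_crit:.3f}, inf)"
    if tail == "left":
        return f"(-inf, {norm.ppf(alpha):.3f}]"
    return f"[{norm.ppf(1 - alpha):.3f}, inf)"

print(critical_region(0.05))           # (-inf, -1.960] U [1.960, inf)
```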

How to use the one-sample Z-test calculator?

Our calculator reduces all the complicated steps:

  • Choose the alternative hypothesis: two-tailed or left/right-tailed.

  • In our Z-test calculator, you can decide whether to use the p-value or critical regions approach. In the latter case, set the significance level, α.

  • Enter the value of the test statistic, z. If you don’t know it, then you can enter some data that will allow us to calculate your z for you:

  • sample mean x̄ (if you have raw data, go to the average calculator to determine the mean);
  • tested mean μ₀;
  • sample size n; and
  • population standard deviation σ (or sample standard deviation if your sample is large).

Results appear immediately below the calculator.

If you want to find z based on the p-value, please remember that in the case of two-tailed tests there are two possible values of z: one positive and one negative, and they are opposite numbers. This Z-test calculator returns the positive value in such a case. In order to find the other possible value of z for a given p-value, just take the number opposite to the value of z displayed by the calculator.

Z-test examples

To make sure that you've fully understood the essence of Z-test, let's go through some examples:

  • A bottle-filling machine pours volumes that follow a normal distribution. The standard deviation, as declared by the manufacturer, is equal to 30 ml. A juice seller claims that the volume poured in each bottle is, on average, one liter, i.e., 1000 ml, but we suspect that in fact the average volume is smaller than that...

Formally, the hypotheses that we set are the following:

H₀: μ = 1000 ml

H₁: μ < 1000 ml

We went to a shop and bought a sample of 9 bottles. After carefully measuring the volume of juice in each bottle, we've obtained the following sample (in milliliters):

1020, 970, 1000, 980, 1010, 930, 950, 980, 980.

Sample size: n = 9;

Sample mean: x̄ = 980 ml;

Population standard deviation: σ = 30 ml;

Test statistic: z = (980 − 1000) √9 / 30 = −2;

And, therefore, p-value = Φ(−2) ≈ 0.0228.

As 0.0228 < 0.05, we conclude that our suspicions aren’t groundless; at the most common significance level, 0.05, we would reject the producer’s claim, H₀, and accept the alternative hypothesis, H₁.
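If you'd like to verify this example numerically, a quick Python sketch (assuming SciPy) reproduces z = −2 and p ≈ 0.0228:

```python
# Juice-bottle example: n = 9, x̄ = 980 ml, σ = 30 ml, left-tailed test.
from math import sqrt
from scipy.stats import norm

sample = [1020, 970, 1000, 980, 1010, 930, 950, 980, 980]
x_bar = sum(sample) / len(sample)              # 980.0
z = (x_bar - 1000) * sqrt(len(sample)) / 30    # -2.0
p = norm.cdf(z)                                # Φ(-2) ≈ 0.0228
print(z, round(p, 4))                          # -2.0 0.0228
```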

We tossed a coin 50 times. We got 20 tails and 30 heads. Is there sufficient evidence to claim that the coin is biased?

Clearly, our data follows a Bernoulli distribution, with some success probability p and variance σ² = p(1 − p). However, the sample is large, so we can safely perform a Z-test. We adopt the convention that getting tails is a success.

Let us state the null and alternative hypotheses:

H₀: p = 0.5 (the coin is fair; the probability of tails is 0.5)

H₁: p ≠ 0.5 (the coin is biased; the probability of tails differs from 0.5)

In our sample we have 20 successes (denoted by ones) and 30 failures (denoted by zeros), so:

Sample size: n = 50;

Sample mean: x̄ = 20/50 = 0.4;

Population standard deviation is given by σ = √(0.5 × 0.5) (because 0.5 is the proportion p hypothesized in H₀). Hence, σ = 0.5;

  • And, therefore, z = (0.4 − 0.5) √50 / 0.5 ≈ −1.41, which gives a two-tailed p-value = 2 · Φ(−1.41) ≈ 0.1573.

Since 0.1573 > 0.1, we don’t have enough evidence to reject the claim that the coin is fair, even at such a large significance level as 0.1. In that case, you may safely toss it to your Witcher or use the coin flip probability calculator to find your chances of getting, e.g., 10 heads in a row (which are extremely low!).
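The coin example can be checked the same way; a Python sketch under the same assumptions:

```python
# Coin example: 20 tails in 50 tosses, H0: p = 0.5, two-tailed test.
from math import sqrt
from scipy.stats import norm

n, x_bar, sigma = 50, 20 / 50, 0.5
z = (x_bar - 0.5) * sqrt(n) / sigma    # ≈ -1.4142
p = 2 * norm.cdf(-abs(z))              # two-tailed p-value ≈ 0.1573
print(round(z, 4), round(p, 4))        # -1.4142 0.1573
```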

What is the difference between a Z-test and a t-test?

We use a t-test for testing the population mean of a normally distributed dataset that has an unknown population standard deviation. We get this by replacing the population standard deviation in the Z-test statistic formula with the sample standard deviation, which means that this new test statistic follows (provided that H₀ holds) the Student’s t-distribution with n − 1 degrees of freedom instead of N(0, 1).

When should I use a t-test over a Z-test?

For large samples, the Student’s t-distribution with n degrees of freedom approaches N(0, 1). Hence, as long as there is a sufficient number of data points (at least 30), it does not really matter whether you use the Z-test or the t-test, since the results will be almost identical. However, for small samples with unknown variance, remember to use the t-test instead of the Z-test.

How do I calculate the Z test statistic?

To calculate the Z test statistic:

  • Compute the arithmetic mean of your sample .
  • From this mean subtract the mean postulated in null hypothesis .
  • Multiply by the square root of the sample size.
  • Divide by the population standard deviation .
  • That's it, you've just computed the Z test statistic!


Z Test: Definition & Two Proportion Z-Test

What is a Z test?

For example, if someone said they had found a new drug that cures cancer, you would want to be sure it was probably true. A hypothesis test will tell you if it’s probably true, or probably not true. A Z test is used when your data is approximately normally distributed (i.e., the data has the shape of a bell curve when you graph it).

When you can run a Z Test.

Several different types of tests are used in statistics (i.e. f test , chi square test , t test ). You would use a Z test if:

  • Your sample size is greater than 30. Otherwise, use a t test.
  • Data points should be independent from each other. In other words, one data point isn’t related to and doesn’t affect another data point.
  • Your data should be normally distributed. However, for large sample sizes (over 30) this doesn’t always matter.
  • Your data should be randomly selected from a population, where each item has an equal chance of being selected.
  • Sample sizes should be equal if at all possible.

How do I run a Z Test?

Running a Z test on your data requires five steps:

  • State the null hypothesis and alternate hypothesis .
  • Choose an alpha level .
  • Find the critical value of z in a z table .
  • Calculate the z test statistic (see below).
  • Compare the test statistic to the critical z value and decide if you should support or reject the null hypothesis .

You could perform all these steps by hand. For example, you could find a critical value by hand, or calculate a z value by hand. You could also use technology, for example:

  • Two sample z test in Excel .
  • Find a critical z value on the TI 83 .
  • Find a critical value on the TI 89 (left-tail) .

Two Proportion Z-Test


A Two Proportion Z-Test (or Z-interval) allows you to calculate the true difference in proportions of two independent groups to a given confidence interval .

There are a few familiar conditions that need to be met for the Two Proportion Z-Interval to be valid.

  • The groups must be independent. Subjects can be in one group or the other, but not both – like teens and adults.
  • The data must be selected randomly and independently from a homogenous population. A survey is a common example.
  • The population should be at least ten times bigger than the sample size. If the population is teenagers for example, there should be at least ten times as many total teenagers as the number of teenagers being surveyed.
  • The null hypothesis (H 0 ) for the test is that the proportions are the same.
  • The alternate hypothesis (H 1 ) is that the proportions are not the same.

Example question: let’s say you’re testing two flu drugs A and B. Drug A works on 41 people out of a sample of 195. Drug B works on 351 people in a sample of 605. Are the two drugs comparable? Use a 5% alpha level .

Step 1: Find the two proportions:

  • P 1 = 41/195 = 0.21 (that’s 21%)
  • P 2 = 351/605 = 0.58 (that’s 58%).

Set these numbers aside for a moment.

Step 2: Find the overall sample proportion . The numerator will be the total number of “positive” results for the two samples and the denominator is the total number of people in the two samples.

  • p = (41 + 351) / (195 + 605) = 0.49.

Set this number aside for a moment.

Step 3: Insert the numbers into the two-proportion z formula, using the proportions from Step 1 and the pooled proportion p from Step 2:

z = (P₁ − P₂) / √( p(1 − p)(1/n₁ + 1/n₂) )

Solving the formula, we get Z = 8.99 (in absolute value).

Step 4: Find the z-score associated with your alpha level. For a two-tailed test with a 5% alpha level, the table z-score is 1.96. We need to find out if the calculated z-score falls into the “rejection region.”

Step 5: Compare the calculated z-score from Step 3 with the table z-score from Step 4. If the calculated z-score is larger, you can reject the null hypothesis.

8.99 > 1.96, so we can reject the null hypothesis .
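To double-check the arithmetic, here is a Python sketch of the pooled two-proportion z calculation (variable names are mine):

```python
# Flu-drug example: pooled two-proportion z statistic.
from math import sqrt

x1, n1 = 41, 195     # drug A: successes, sample size
x2, n2 = 351, 605    # drug B: successes, sample size
p1, p2 = x1 / n1, x2 / n2                             # 0.21 and 0.58
p_pool = (x1 + x2) / (n1 + n2)                        # overall proportion 0.49
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # pooled standard error
z = (p1 - p2) / se
print(round(abs(z), 2))                               # 8.99 > 1.96 -> reject H0
```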

Example 2:  Suppose that in a survey of 700 women and 700 men, 35% of women and 30% of men indicated that they support a particular presidential candidate. Let’s say we wanted to find the true difference in proportions of these two groups to a 95% confidence interval .

At first glance the survey indicates that women support the candidate more than men by about 5% . However, for this statistical inference to be valid we need to construct a range of values to a given confidence interval.

To do this, we use the formula for the Two Proportion Z-Interval:

(P₁ − P₂) ± z* √( P₁(1 − P₁)/n₁ + P₂(1 − P₂)/n₂ )

Plugging in values, we find the true difference in proportions to be

(0.35 − 0.30) ± 1.96 × 0.025 = 0.05 ± 0.049

Based on the results of the survey, we are 95% confident that the difference in proportions of women and men that support the presidential candidate is between about 0% and 10%.
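A short Python sketch (assuming SciPy for the critical value) reproduces this interval:

```python
# Survey example: 95% two-proportion z-interval.
from math import sqrt
from scipy.stats import norm

p1, n1 = 0.35, 700   # women supporting the candidate
p2, n2 = 0.30, 700   # men supporting the candidate
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # standard error = 0.025
z_star = norm.ppf(0.975)                             # ≈ 1.96 for 95% confidence
margin = z_star * se
print(p1 - p2 - margin, p1 - p2 + margin)            # ≈ (0.001, 0.099)
```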


Z-Test for Statistical Hypothesis Testing Explained


Egor Howell

The Z-test is a statistical hypothesis test used to determine whether the distribution of the test statistic we are measuring, like the mean, is part of the normal distribution.

There are multiple types of Z-tests; however, we’ll focus on the easiest and most well-known one, the one-sample mean test. This is used to determine whether the difference between the mean of a sample and the mean of a population is statistically significant.

What Is a Z-Test?

A Z-test is a type of statistical hypothesis test where the test-statistic follows a normal distribution.  

The name Z-test comes from the Z-score of the normal distribution. This is a measure of how many standard deviations away a raw score or sample statistic is from the population mean.

Z-tests are among the most common statistical tests conducted in fields such as healthcare and data science. Therefore, it’s an essential concept to understand.

Requirements for a Z-Test

In order to conduct a Z-test, your statistics need to meet a few requirements, including:

  • A sample size that’s greater than 30. This is because we want to ensure our sample mean comes from a distribution that is approximately normal. By the central limit theorem, the distribution of the sample mean approaches a normal distribution once the sample contains more than about 30 data points.
  • The standard deviation and mean of the population is known .
  • The sample data is collected/acquired randomly .

More on Data Science:   What Is Bootstrapping Statistics?

Z-Test Steps

There are four steps to complete a Z-test. Let’s examine each one.

4 Steps to a Z-Test

  • State the null hypothesis.
  • State the alternate hypothesis.
  • Choose your critical value.
  • Calculate your Z-test statistics. 

1. State the Null Hypothesis

The first step in a Z-test is to state the null hypothesis, H_0. This is what you believe to be true about the population, which could be the mean of the population, μ_0:

H_0: μ = μ_0

2. State the Alternate Hypothesis

Next, state the alternate hypothesis, H_1. This is what you observe from your sample. If the sample mean is different from the population’s mean, then we say the mean is not equal to μ_0:

H_1: μ ≠ μ_0

3. Choose Your Critical Value

Then, choose your critical value, α, which determines whether you accept or reject the null hypothesis. Typically for a Z-test we would use a statistical significance level of 5 percent, which for a two-tailed test corresponds to z = ±1.96 standard deviations from the population’s mean in the normal distribution.

This critical value is based on confidence intervals.

4. Calculate Your Z-Test Statistic

Compute the Z-test statistic using the sample mean, μ_1, the population mean, μ_0, the number of data points in the sample, n, and the population’s standard deviation, σ:

Z = (μ_1 − μ_0) / (σ / √n)

If the test statistic is greater (or lower depending on the test we are conducting) than the critical value, then the alternate hypothesis is true because the sample’s mean is statistically significant enough from the population mean.

Another way to think about this is if the sample mean is so far away from the population mean, the alternate hypothesis has to be true or the sample is a complete anomaly.

More on Data Science: Basic Probability Theory and Statistics Terms to Know

Z-Test Example

Let’s go through an example to fully understand the one-sample mean Z-test.

A school says that its pupils are, on average, smarter than those at other schools. It takes a sample of 50 students whose average IQ is measured to be 110. The population, or the rest of the schools, has an average IQ of 100 and a standard deviation of 20. Is the school’s claim correct?

The null and alternate hypotheses are:

H_0: μ = 100

H_1: μ > 100

where we are saying that our sample, the school, has a higher mean IQ than the population mean.

Now, this is what’s called a right-sided, one-tailed test, as our sample mean is greater than the population’s mean. So, choosing a significance level of 5 percent, which for a right-tailed test equals a critical Z-score of 1.645, we can only reject the null hypothesis if our Z-test statistic is greater than 1.645.

If the school claimed its students’ IQs were an average of 90, then we would use a left-tailed test. We would then only reject the null hypothesis if our Z-test statistic is less than −1.645.

Computing our Z-test statistic, we see:

Z = (110 − 100) / (20 / √50) ≈ 3.54

Since 3.54 is greater than the critical value of 1.645, we have sufficient evidence to reject the null hypothesis, and the school’s claim is supported.
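A two-line Python sketch confirms the arithmetic of this example:

```python
# School-IQ example: n = 50, sample mean 110, population mean 100,
# population sd 20, right-tailed test at the 5% level.
from math import sqrt

z = (110 - 100) / (20 / sqrt(50))   # ≈ 3.54
print(round(z, 2), z > 1.645)       # 3.54 True -> reject H0
```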

Hope you enjoyed this article on Z-tests. In this post, we only addressed the simplest case, the one-sample mean test. There are other types of Z-tests, but they all follow the same process, just with some small nuances.



Null & Alternative Hypotheses | Definitions, Templates & Examples

Published on May 6, 2022 by Shaun Turney . Revised on June 22, 2023.

The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test :

  • Null hypothesis ( H 0 ): There’s no effect in the population .
  • Alternative hypothesis ( H a or H 1 ) : There’s an effect in the population.


The null and alternative hypotheses offer competing answers to your research question . When the research question asks “Does the independent variable affect the dependent variable?”:

  • The null hypothesis ( H 0 ) answers “No, there’s no effect in the population.”
  • The alternative hypothesis ( H a ) answers “Yes, there is an effect in the population.”

The null and alternative are always claims about the population. That’s because the goal of hypothesis testing is to make inferences about a population based on a sample . Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample. It’s critical for your research to write strong hypotheses .

You can use a statistical test to decide whether the evidence favors the null or alternative hypothesis. Each type of statistical test comes with a specific way of phrasing the null and alternative hypothesis. However, the hypotheses can also be phrased in a general way that applies to any test.


The null hypothesis is the claim that there’s no effect in the population.

If the sample provides enough evidence against the claim that there’s no effect in the population ( p ≤ α), then we can reject the null hypothesis . Otherwise, we fail to reject the null hypothesis.

Although “fail to reject” may sound awkward, it’s the only wording that statisticians accept . Be careful not to say you “prove” or “accept” the null hypothesis.

Null hypotheses often include phrases such as “no effect,” “no difference,” or “no relationship.” When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

You can never know with complete certainty whether there is an effect in the population. Some percentage of the time, your inference about the population will be incorrect. When you incorrectly reject the null hypothesis, it’s called a type I error . When you incorrectly fail to reject it, it’s a type II error.

Examples of null hypotheses

The table below gives examples of research questions and null hypotheses. There’s always more than one way to answer a research question, but these null hypotheses can help you get started.

Research question: Does tooth flossing affect the number of cavities?
Null hypothesis (H₀): Tooth flossing has no effect on the number of cavities.
Test-specific (two-sample t test): The mean number of cavities per person does not differ between the flossing group (µ₁) and the non-flossing group (µ₂) in the population; µ₁ = µ₂.

Research question: Does the amount of text highlighted in the textbook affect exam scores?
Null hypothesis (H₀): The amount of text highlighted in the textbook has no effect on exam scores.
Test-specific (linear regression): There is no relationship between the amount of text highlighted and exam scores in the population; β = 0.

Research question: Does daily meditation decrease the incidence of depression?
Null hypothesis (H₀): Daily meditation does not decrease the incidence of depression.*
Test-specific (two-proportions test): The proportion of people with depression in the daily-meditation group (p₁) is greater than or equal to the no-meditation group (p₂) in the population; p₁ ≥ p₂.

*Note that some researchers prefer to always write the null hypothesis in terms of “no effect” and “=”. It would be fine to say that daily meditation has no effect on the incidence of depression and p₁ = p₂.

The alternative hypothesis ( H a ) is the other answer to your research question . It claims that there’s an effect in the population.

Often, your alternative hypothesis is the same as your research hypothesis. In other words, it’s the claim that you expect or hope will be true.

The alternative hypothesis is the complement to the null hypothesis. Null and alternative hypotheses are exhaustive, meaning that together they cover every possible outcome. They are also mutually exclusive, meaning that only one can be true at a time.

Alternative hypotheses often include phrases such as “an effect,” “a difference,” or “a relationship.” When alternative hypotheses are written in mathematical terms, they always include an inequality (usually ≠, but sometimes < or >). As with null hypotheses, there are many acceptable ways to phrase an alternative hypothesis.

Examples of alternative hypotheses

The table below gives examples of research questions and alternative hypotheses to help you get started with formulating your own.

Research question: Does tooth flossing affect the number of cavities?
Alternative hypothesis (Hₐ): Tooth flossing has an effect on the number of cavities.
Test-specific (two-sample t test): The mean number of cavities per person differs between the flossing group (µ₁) and the non-flossing group (µ₂) in the population; µ₁ ≠ µ₂.

Research question: Does the amount of text highlighted in a textbook affect exam scores?
Alternative hypothesis (Hₐ): The amount of text highlighted in the textbook has an effect on exam scores.
Test-specific (linear regression): There is a relationship between the amount of text highlighted and exam scores in the population; β ≠ 0.

Research question: Does daily meditation decrease the incidence of depression?
Alternative hypothesis (Hₐ): Daily meditation decreases the incidence of depression.
Test-specific (two-proportions test): The proportion of people with depression in the daily-meditation group (p₁) is less than the no-meditation group (p₂) in the population; p₁ < p₂.

Null and alternative hypotheses are similar in some ways:

  • They’re both answers to the research question.
  • They both make claims about the population.
  • They’re both evaluated by statistical tests.

However, there are important differences between the two types of hypotheses, summarized in the following table.

Null hypothesis (H₀): A claim that there is no effect in the population. Written with an equality symbol (=, ≥, or ≤). When your test is statistically significant, it is rejected; when it is not, you fail to reject it.

Alternative hypothesis (Hₐ): A claim that there is an effect in the population. Written with an inequality symbol (≠, <, or >). When your test is statistically significant, it is supported; when it is not, it is not supported.


To help you write your hypotheses, you can use the template sentences below. If you know which statistical test you’re going to use, you can use the test-specific template sentences. Otherwise, you can use the general template sentences.

General template sentences

The only thing you need to know to use these general template sentences are your dependent and independent variables. To write your research question, null hypothesis, and alternative hypothesis, fill in the following sentences with your variables:

Does independent variable affect dependent variable ?

  • Null hypothesis ( H 0 ): Independent variable does not affect dependent variable.
  • Alternative hypothesis ( H a ): Independent variable affects dependent variable.

Test-specific template sentences

Once you know the statistical test you’ll be using, you can write your hypotheses in a more precise and mathematical way specific to the test you chose. The table below provides template sentences for common statistical tests.

Statistical test: t test with two groups
Null hypothesis (H₀): The mean dependent variable does not differ between group 1 (µ₁) and group 2 (µ₂) in the population; µ₁ = µ₂.
Alternative hypothesis (Hₐ): The mean dependent variable differs between group 1 (µ₁) and group 2 (µ₂) in the population; µ₁ ≠ µ₂.

Statistical test: ANOVA with three groups
Null hypothesis (H₀): The mean dependent variable does not differ between group 1 (µ₁), group 2 (µ₂), and group 3 (µ₃) in the population; µ₁ = µ₂ = µ₃.
Alternative hypothesis (Hₐ): The mean dependent variables of group 1 (µ₁), group 2 (µ₂), and group 3 (µ₃) are not all equal in the population.

Statistical test: correlation test
Null hypothesis (H₀): There is no correlation between the independent variable and the dependent variable in the population; ρ = 0.
Alternative hypothesis (Hₐ): There is a correlation between the independent variable and the dependent variable in the population; ρ ≠ 0.

Statistical test: simple linear regression
Null hypothesis (H₀): There is no relationship between the independent variable and the dependent variable in the population; β = 0.
Alternative hypothesis (Hₐ): There is a relationship between the independent variable and the dependent variable in the population; β ≠ 0.

Statistical test: two-proportions test
Null hypothesis (H₀): The dependent variable expressed as a proportion does not differ between group 1 (p₁) and group 2 (p₂) in the population; p₁ = p₂.
Alternative hypothesis (Hₐ): The dependent variable expressed as a proportion differs between group 1 (p₁) and group 2 (p₂) in the population; p₁ ≠ p₂.

Note: The template sentences above assume that you’re performing two-tailed tests. Two-tailed tests are appropriate for most studies.

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

The null hypothesis is often abbreviated as H 0 . When the null hypothesis is written using mathematical symbols, it always includes an equality symbol (usually =, but sometimes ≥ or ≤).

The alternative hypothesis is often abbreviated as H a or H 1 . When the alternative hypothesis is written using mathematical symbols, it always includes an inequality symbol (usually ≠, but sometimes < or >).

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (“ x affects y because …”).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses . In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.

Cite this Scribbr article

Turney, S. (2023, June 22). Null & Alternative Hypotheses | Definitions, Templates & Examples. Scribbr. https://www.scribbr.com/statistics/null-and-alternative-hypotheses/


Chapter 10: Hypothesis Testing with Z

Setting up the hypotheses

When setting up the hypotheses with z, the parameter is associated with a sample mean (in the previous chapter’s examples, the null value for the parameter was 0). With z, the null hypothesis can be a value other than 0. For example, if we are working with mothers in the U.S. whose children are at risk of low birth weight, we can use 7.47 pounds, the average birth weight in the US, as our null value and test for differences against that. For now, we will focus on testing a value of a single mean against what we expect from the population.

Using birth weight as an example, our null hypothesis takes the form: H₀: μ = 7.47. Notice that we are testing the value for μ, the population parameter, NOT the sample statistic X̄ (or M). We are referring to the data right now in raw form (we have not standardized it using z yet). Again, using inferential statistics, we are interested in understanding the population by drawing on our sample observations. For the research question, we have a mean value from the sample to use; it is observed and compared against a set point.

As mentioned earlier, the alternative hypothesis is simply the reverse of the null hypothesis, and there are three options, depending on where we expect the difference to lie. We will set the criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative.

If we expect our obtained sample mean to be above or below the null hypothesis value (knowing which direction), we set a directional hypothesis. Our alternative hypothesis takes its form from the research question itself. In our example with birth weight, this could be presented as Hₐ: μ > 7.47 or Hₐ: μ < 7.47.

Note that we should only use a directional hypothesis if we have a good reason, based on prior observations or research, to suspect a particular direction. When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative hypothesis. In our birth weight example, this could be set as Hₐ: μ ≠ 7.47.

In working with data for this course we will need to set a critical value of the test statistic for alpha (α) for use with the test statistic tables in the back of the book. This determines the critical rejection region, which has a set critical value based on α.

Determining Critical Value from α

We set alpha (α) before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use.

When a research hypothesis predicts an effect but does not predict a direction for the effect, it is called a non-directional hypothesis . To test the significance of a non-directional hypothesis, we have to consider the possibility that the sample could be extreme at either tail of the comparison distribution. We call this a two-tailed test .

Figure 1. A two-tailed test for a non-directional hypothesis for z; area C is the critical rejection region.

When a research hypothesis predicts a direction for the effect, it is called a directional hypothesis . To test the significance of a directional hypothesis, we have to consider the possibility that the sample could be extreme at one-tail of the comparison distribution. We call this a one-tailed test .

Figure 2. A one-tailed test for a directional hypothesis (predicting an increase) for z; area C is the critical rejection region.

Determining Cutoff Scores with Two-Tailed Tests

Typically we specify an α level before analyzing the data. If the data analysis results in a probability value below the α level, then the null hypothesis is rejected; if it is not, then the null hypothesis is not rejected. In other words, if our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis ; if not, we fail to reject the null (we never “accept” the null). According to this perspective, if a result is significant, then it does not matter how significant it is. Moreover, if it is not significant, then it does not matter how close to being significant it is. Therefore, if the 0.05 level is being used, then probability values of 0.049 and 0.001 are treated identically. Similarly, probability values of 0.06 and 0.34 are treated identically. Note we will discuss ways to address effect size (which is related to this challenge of NHST).

When setting the probability value, there is a special complication in a two-tailed test. We have to divide the significance percentage between the two tails. For example, with a 5% significance level, we reject the null hypothesis only if the sample is so extreme that it is in either the top 2.5% or the bottom 2.5% of the comparison distribution. This keeps the overall level of significance at a total of 5%. A one-tailed test places the entire significance level in a single extreme region, so only one side of the distribution is considered.

Figure 3. Critical value differences in one- and two-tailed tests.

Let’s review the set critical values for Z.

We discussed z-scores and probability in chapter 8.  If we revisit the z-score for 5% and 1%, we can identify the critical regions for the critical rejection areas from the unit standard normal table.

  • A two-tailed test at the 5% level has a critical boundary Z score of +1.96 and -1.96
  • A one-tailed test at the 5% level has a critical boundary Z score of +1.64 or -1.64
  • A two-tailed test at the 1% level has a critical boundary Z score of +2.58 and -2.58
  • A one-tailed test at the 1% level has a critical boundary Z score of +2.33 or -2.33.

Review: Critical values, p-values, and significance level

There are two criteria we use to assess whether our data meet the thresholds established by our chosen significance level, and they both have to do with our discussions of probability and distributions. Recall that probability refers to the likelihood of an event, given some situation or set of conditions. In hypothesis testing, that situation is the assumption that the null hypothesis value is the correct value, or that there is no effect. The value laid out in H₀ is our condition under which we interpret our results. To reject this assumption, and thereby reject the null hypothesis, we need results that would be very unlikely if the null was true.

Now recall that values of z which fall in the tails of the standard normal distribution represent unlikely values. That is, the proportion of the area under the curve as or more extreme than z is very small as we get into the tails of the distribution. Our significance level corresponds to the area under the tail that is exactly equal to α: if we use our normal criterion of α = .05, then 5% of the area under the curve becomes what we call the rejection region (also called the critical region) of the distribution. This is illustrated in Figure 4.


Figure 4: The rejection region for a one-tailed test

The shaded rejection region takes up 5% of the area under the curve. Any result which falls in that region is sufficient evidence to reject the null hypothesis.

The rejection region is bounded by a specific z-value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value, z crit (“z-crit”) or z* (hence the other name “critical region”). Finding the critical value works exactly the same as finding the z-score corresponding to any area under the curve like we did in Unit 1. If we go to the normal table, we will find that the z-score corresponding to 5% of the area under the curve is equal to 1.645 (z = 1.64 corresponds to 0.0405 and z = 1.65 corresponds to 0.0495, so .05 is exactly in between them) if we go to the right and -1.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing then shading the distribution is helpful for keeping directionality straight.

Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For α = .05, this means 2.5% of the area is in each tail, which, based on the z-table, corresponds to critical values of z* = ±1.96. This is shown in Figure 5.


Figure 5: Two-tailed rejection region

Thus, any z-score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the rejection region. When we use z-scores in this way, the obtained value of z (sometimes called z-obtained) is something known as a test statistic, which is simply an inferential statistic used to test a null hypothesis.

Calculate the test statistic: Z

Now that we understand setting up the hypotheses and determining the outcome, let’s examine hypothesis testing with z! The next step is to carry out the study and get the actual results for our sample. Central to the hypothesis test is the comparison of the population and sample means. To make our calculation and determine where the sample falls in the hypothesized distribution, we calculate the Z for the sample data.

Make a decision

To decide whether to reject the null hypothesis, we compare our sample’s Z score to the critical Z score that marks the boundary of the rejection region. If our sample Z score falls inside the rejection region of the comparison distribution (is more extreme than the critical value), we reject the null hypothesis.

The formula for our z-statistic has not changed:

\[ z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]

To formally test our hypothesis, we compare our obtained z-statistic to our critical z-value. If z obt > z crit , that means it falls in the rejection region (to see why, draw a line for z = 2.5 on Figure 1 or Figure 2), and so we reject H 0 . If z obt < z crit , we fail to reject. Remember that as z gets larger, the corresponding area under the curve beyond z gets smaller. Thus, the proportion of area beyond z obt , which is the p-value, will be smaller than the area for α. Specifically, the probability of obtaining that result, or a more extreme result, under the condition that the null hypothesis is true gets smaller.

Conversely, if we fail to reject, we know that the proportion will be larger than α because the z-statistic will not be as far into the tail. This is illustrated for a one- tailed test in Figure 6.

image

Figure 6. Relation between α, z obt , and p
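The agreement between the two decision rules can also be checked numerically. A short sketch, using a hypothetical obtained z-statistic of 2.5 (the same value used in the comparison above):

```python
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha)  # one-tailed critical value, ≈ 1.645
z_obt = 2.5                   # hypothetical obtained z-statistic

p_value = norm.sf(z_obt)      # area beyond z_obt, ≈ 0.0062

# The two rules agree: z_obt exceeds z_crit exactly when p < alpha.
print(z_obt > z_crit, p_value < alpha)  # True True
```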

When the null hypothesis is rejected, the effect is said to be statistically significant . Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.

Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation.

Review: Steps of the Hypothesis Testing Process

The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remainder of the textbook and course, and though the hypotheses and statistics we use will change, this process will not.

Step 1: State the Hypotheses

Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated mathematically as they were presented above AND in words, explaining in normal English what each one means in terms of the research question.

Step 2: Find the Critical Values

Next, we formally lay out the criteria we will use to test our hypotheses. There are two pieces of information that inform our critical values: α, which determines how much of the area under the curve composes our rejection region, and the directionality of the test, which determines where the region will be.

Step 3: Compute the Test Statistic

Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic, in this case z . This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same.

Step 4: Make the Decision

Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.

Example: Movie Popcorn

Let’s see how hypothesis testing works in action by working through an example. Say that a movie theater owner likes to keep a very close eye on how much popcorn goes into each bag sold, so he knows that the average bag has 8 cups of popcorn and that this varies a little bit, about half a cup. That is, the known population mean is μ = 8.00 and the known population standard deviation is σ = 0.50. The owner wants to make sure that the newest employee is filling bags correctly, so over the course of a week he randomly assesses 25 bags filled by the employee to test for a difference (n = 25). He doesn’t want bags overfilled or underfilled, so he looks for differences in both directions. This scenario has all of the information we need to begin our hypothesis testing procedure.

Step 1: State the Hypotheses

The owner is looking for a difference in the mean cups of popcorn per bag compared to the population mean of 8. We will need both a null and an alternative hypothesis written both mathematically and in words. We’ll always start with the null hypothesis:

H 0 : There is no difference in the cups of popcorn bags from this employee H 0 : μ = 8.00

Notice that we phrase the hypothesis in terms of the population parameter μ, which in this case would be the true average cups of bags filled by the new employee.

Our assumption of no difference, the null hypothesis, is that this mean is exactly the same as the known population mean value we want it to match, 8.00. Now let’s do the alternative:

H A : There is a difference in the cups of popcorn bags from this employee H A : μ ≠ 8.00

In this case, we don’t know if the bags will be too full or not full enough, so we do a two-tailed alternative hypothesis that there is a difference.

Step 2: Find the Critical Values

Our critical values are based on two things: the directionality of the test and the level of significance. We decided in step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that α = 0.05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed z-test at α = 0.05 are z* = ±1.96. This will be the criterion we use to test our hypothesis. We can now draw out our distribution so we can visualize the rejection region and make sure it makes sense.


Figure 7: Rejection region for z* = ±1.96

Step 3: Calculate the Test Statistic

Now we come to our formal calculations. Let’s say that the owner collects data and finds that the average cups of this employee’s popcorn bags is X̄ = 7.75 cups. We can now plug this value, along with the values presented in the original problem, into our equation for z:

\[ z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{7.75 - 8.00}{0.50/\sqrt{25}} = \frac{-0.25}{0.10} = -2.50 \]

So our test statistic is z = -2.50, which we can draw onto our rejection region distribution:


Figure 8: Test statistic location

Step 4: Make the Decision

Looking at Figure 8, we can see that our obtained z-statistic falls in the rejection region. We can also directly compare it to our critical value: in absolute value, 2.50 > 1.96, so we reject the null hypothesis. We can now write our conclusion:

Based on the sample of 25 bags, we can conclude that the average cups of popcorn in this employee’s bags (X̄ = 7.75) is statistically significantly different from the population average of 8.00 cups, z = -2.50, p < .05.

When we write our conclusion, we write out the words to communicate what it actually means, but we also include the sample mean we calculated (the exact location doesn’t matter, just somewhere that flows naturally and makes sense) and the z-statistic and p-value. We don’t know the exact p-value, but we do know that because we rejected the null, it must be less than α.
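The entire popcorn test can be reproduced in a few lines. A sketch using the values given above (the exact p-value, which we did not need for the decision, comes out for free):

```python
import math
from scipy.stats import norm

mu0, sigma, n = 8.00, 0.50, 25  # hypothesized mean, known SD, sample size
x_bar = 7.75                    # observed sample mean
alpha = 0.05

z = (x_bar - mu0) / (sigma / math.sqrt(n))  # -2.50
z_crit = norm.ppf(1 - alpha / 2)            # 1.96 for a two-tailed test
p_value = 2 * norm.sf(abs(z))               # ≈ 0.0124

print(f"z = {z:.2f}, reject H0: {abs(z) > z_crit}, p = {p_value:.4f}")
```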

Effect Size

When we reject the null hypothesis, we are stating that the difference we found was statistically significant, but we have mentioned several times that this tells us nothing about practical significance. To get an idea of the actual size of what we found, we can compute a new statistic called an effect size. Effect sizes give us an idea of how large, important, or meaningful a statistically significant effect is.

For mean differences like we calculated here, our effect size is Cohen’s d :

\[ d = \frac{\bar{X} - \mu}{\sigma} \]

Effect sizes are incredibly useful and provide important information and clarification that overcomes some of the weaknesses of hypothesis testing. Whenever you find a significant result, you should always calculate an effect size.

d            Interpretation
0.0 – 0.2    negligible
0.2 – 0.5    small
0.5 – 0.8    medium
0.8 +        large

Table 1. Interpretation of Cohen’s d
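A small sketch computing Cohen’s d for the popcorn example; the function name `cohens_d` is our own, for illustration:

```python
def cohens_d(x_bar, mu, sigma):
    """Effect size for a one-sample z test: the mean difference in SD units."""
    return (x_bar - mu) / sigma

d = cohens_d(7.75, 8.00, 0.50)  # -0.50 for the popcorn example
print(abs(d))                   # 0.5: right at the small/medium boundary in Table 1
```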

Example: Office Temperature

Let’s do another example to solidify our understanding. Let’s say that the office building you work in is supposed to be kept at 74 degrees Fahrenheit but is allowed to vary by 1 degree in either direction. You suspect that, as a cost-saving measure, the temperature was secretly set higher. You set up a formal way to test your hypothesis.

You start by laying out the null hypothesis:

H 0 : There is no difference in the average building temperature H 0 : μ = 74

Next you state the alternative hypothesis. You have reason to suspect a specific direction of change, so you make a one-tailed test:

H A : The average building temperature is higher than claimed H A : μ > 74


Now that you have everything set up, you spend one week collecting temperature data:

Day          Temp
Monday       77
Tuesday      76
Wednesday    74
Thursday     78
Friday       78

You calculate the average of these scores to be 𝑋̅ = 76.6 degrees. You use this to calculate the test statistic, using μ = 74 (the supposed average temperature), σ = 1.00 (how much the temperature should vary), and n = 5 (how many data points you collected):

\[ z = \frac{76.60 - 74.00}{1.00/\sqrt{5}} = \frac{2.60}{0.45} = 5.78 \]

This value falls so far into the tail that it cannot even be plotted on the distribution!


Figure 9: Obtained z-statistic

You compare your obtained z-statistic, z = 5.78, to the critical value, z* = 1.645, and find that z > z*. Therefore you reject the null hypothesis, concluding: Based on 5 observations, the average temperature (X̄ = 76.6 degrees) is statistically significantly higher than it is supposed to be, z = 5.78, p < .05.

d = (76.60 − 74.00)/1.00 = 2.60

The effect size you calculate is definitely large, meaning someone has some explaining to do!
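A sketch reproducing this example; note that carrying full precision gives z ≈ 5.81, while the 5.78 above comes from rounding the standard error to 0.45:

```python
import math
from scipy.stats import norm

temps = [77, 76, 74, 78, 78]                # one week of readings
mu0, sigma = 74.0, 1.00                     # supposed mean and allowed variation
n = len(temps)                              # 5

x_bar = sum(temps) / n                      # 76.6
z = (x_bar - mu0) / (sigma / math.sqrt(n))  # ≈ 5.81 at full precision
z_crit = norm.ppf(0.95)                     # one-tailed critical value, ≈ 1.645
d = (x_bar - mu0) / sigma                   # effect size, 2.60

print(f"z = {z:.2f} > {z_crit:.3f}: reject H0; d = {d:.2f}")
```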

Example: Different Significance Level

Now let’s take a look at an example phrased in generic terms, rather than in the context of a specific research question, to see the individual pieces one more time. This time, however, we will use a stricter significance level, α = 0.01, to test the hypothesis.

We will use 60 as an arbitrary null hypothesis value: H 0 : The average score does not differ from the population H 0 : μ = 60

We will assume a two-tailed test: H A : The average score does differ H A : μ ≠ 60

We have seen the critical values for z-tests at the α = 0.05 level of significance several times. To find the values for α = 0.01, we will go to the standard normal table and find the z-score cutting off 0.005 (0.01 divided by 2 for a two-tailed test) of the area in the tail, which is z crit * = ±2.575. Notice that this cutoff is much higher than it was for α = 0.05. This is because we need much less of the area in the tail, so we need to go very far out to find the cutoff. As a result, this will require a much larger effect or much larger sample size in order to reject the null hypothesis.

We can now calculate our test statistic. The average of 10 scores is M = 60.40 with µ = 60. We will use σ = 10 as our known population standard deviation. From this information, we calculate our z-statistic as:

\[ z = \frac{60.40 - 60.00}{10/\sqrt{10}} = \frac{0.40}{3.16} = 0.13 \]

Our obtained z-statistic, z = 0.13, is very small. It is much less than our critical value of 2.575. Thus, this time, we fail to reject the null hypothesis. Our conclusion would look something like:

Based on the sample of 10 scores, we fail to reject the null hypothesis; the sample average (M = 60.40) is not statistically significantly different from the population mean of 60, z = 0.13, p > 0.01.

Notice two things about the end of the conclusion. First, we wrote that p is greater than instead of p is less than, like we did in the previous two examples. This is because we failed to reject the null hypothesis. We don’t know exactly what the p-value is, but we know it must be larger than the α level we used to test our hypothesis. Second, we used 0.01 instead of the usual 0.05, because this time we tested at a different level. The number you compare to the p-value should always be the significance level you test at. Because we did not detect a statistically significant effect, we do not need to calculate an effect size. Note: some statisticians suggest always calculating an effect size, to help judge whether a Type II error may have occurred. Although the result was not significant, d = (60.4 − 60)/10 = 0.04, which suggests a negligible effect (and thus little reason to suspect a Type II error).

Review Considerations in Hypothesis Testing

Errors in hypothesis testing.

Keep in mind that rejecting the null hypothesis is not an all-or-nothing decision. The Type I error rate is affected by the α level: the lower the α level the lower the Type I error rate. It might seem that α is the probability of a Type I error. However, this is not correct. Instead, α is the probability of a Type I error given that the null hypothesis is true. If the null hypothesis is false, then it is impossible to make a Type I error. The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error.

Statistical Power

The statistical power of a research design is the probability of rejecting the null hypothesis given the sample size and expected relationship strength. Statistical power is the complement of the probability of committing a Type II error. Clearly, researchers should be interested in the power of their research designs if they want to avoid making Type II errors. In particular, they should make sure their research design has adequate power before collecting data. A common guideline is that a power of .80 is adequate. This means that there is an 80% chance of rejecting the null hypothesis for the expected relationship strength.

Given that statistical power depends primarily on relationship strength and sample size, there are essentially two steps you can take to increase statistical power: increase the strength of the relationship or increase the sample size. Increasing the strength of the relationship can sometimes be accomplished by using a stronger manipulation or by more carefully controlling extraneous variables to reduce the amount of noise in the data (e.g., by using a within-subjects design rather than a between-subjects design). The usual strategy, however, is to increase the sample size. For any expected relationship strength, there will always be some sample large enough to achieve adequate power.
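To make the dependence of power on effect size and sample size concrete, here is an illustrative sketch for a two-tailed one-sample z test; the function and the d = 0.5 scenario are our own assumptions, not values from the chapter:

```python
import math
from scipy.stats import norm

def z_test_power(d, n, alpha=0.05):
    """Approximate power of a two-tailed one-sample z test.

    d is the true effect size (mean shift in SD units); n is the sample size.
    """
    z_crit = norm.ppf(1 - alpha / 2)
    shift = d * math.sqrt(n)  # expected value of Z under the alternative
    # Probability that Z lands beyond either critical value under the alternative.
    return norm.sf(z_crit - shift) + norm.cdf(-z_crit - shift)

# Power grows with sample size for a fixed effect size of d = 0.5:
for n in (10, 20, 30, 40):
    print(n, round(z_test_power(0.5, n), 3))  # reaches ≈ .80 near n = 32
```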

Inferential statistics uses data from a sample of individuals to reach conclusions about the whole population. The degree to which our inferences are valid depends upon how we selected the sample (sampling technique) and the characteristics (parameters) of population data. Statistical analyses assume that sample(s) and population(s) meet certain conditions called statistical assumptions.

It is easy to check assumptions when using statistical software and it is important as a researcher to check for violations; if violations of statistical assumptions are not appropriately addressed then results may be interpreted incorrectly.

Learning Objectives

Having read the chapter, students should be able to:

  • Conduct a hypothesis test using the z statistic, locate the critical region, and make a statistical decision.
  • Explain the purpose of measuring effect size and power, and be able to compute Cohen’s d.

Exercises – Ch. 10

1. List the main steps for hypothesis testing with the z-statistic. When and why do you calculate an effect size?
2. For each of the following obtained z-statistics, determine whether you would reject or fail to reject the null hypothesis:
   a. z = 1.99, two-tailed test at α = 0.05
   b. z = 1.99, two-tailed test at α = 0.01
   c. z = 1.99, one-tailed test at α = 0.05
3. You are part of a trivia team and have tracked your team’s performance since you started playing, so you know that your scores are normally distributed with μ = 78 and σ = 12. Recently, a new person joined the team, and you think the scores have gotten better. Use hypothesis testing to see if the average score has improved based on the following 9 weeks’ worth of score data: 82, 74, 62, 68, 79, 94, 90, 81, 80.
4. A study examines self-esteem and depression in teenagers. A sample of 25 teens with low self-esteem are given the Beck Depression Inventory. The average score for the group is 20.9. For the general population, the average score is 18.3 with σ = 12. Use a two-tail test with α = 0.05 to examine whether teenagers with low self-esteem show significant differences in depression.
5. You get hired as a server at a local restaurant, and the manager tells you that servers’ tips are $42 on average but vary by about $12 (μ = 42, σ = 12). You decide to track your tips to see if you make a different amount, but because this is your first job as a server, you don’t know if you will make more or less in tips. After working 16 shifts, you find that your average nightly amount is $44.50 from tips. Test for a difference between this value and the population mean at the α = 0.05 level of significance.

Answers to Odd-Numbered Exercises – Ch. 10

1. List hypotheses. Determine the critical region. Calculate z. Compare z to the critical region. Draw a conclusion. We calculate an effect size when we find a statistically significant result to see if our result is practically meaningful or important.

5. Step 1: H 0 : μ = 42 “My average tips do not differ from those of other servers”, H A : μ ≠ 42 “My average tips do differ from those of others”

Introduction to Statistics for Psychology Copyright © 2021 by Alisa Beyer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Hypothesis Testing for Means & Proportions


The procedure for hypothesis testing is based on the ideas described above. Specifically, we set up competing hypotheses, select a random sample from the population of interest and compute summary statistics. We then determine whether the sample data supports the null or alternative hypotheses. The procedure can be broken down into the following five steps.  

  • Step 1. Set up hypotheses and select the level of significance α.

H 0 : Null hypothesis (no change, no difference);  

H 1 : Research hypothesis (investigator's belief); α =0.05

 

Upper-tailed, Lower-tailed, Two-tailed Tests

The research or alternative hypothesis can take one of three forms. An investigator might believe that the parameter has increased, decreased or changed. For example, an investigator might hypothesize:  

H 1 : μ > μ 0 , where μ 0 is the comparator or null value (e.g., μ 0 = 191 in our example about weight in men in 2006) and an increase is hypothesized - this type of test is called an upper-tailed test; H 1 : μ < μ 0 , where a decrease is hypothesized and this is called a lower-tailed test; or H 1 : μ ≠ μ 0 , where a difference is hypothesized and this is called a two-tailed test.

The exact form of the research hypothesis depends on the investigator's belief about the parameter of interest and whether it has possibly increased, decreased or is different from the null value. The research hypothesis is set up by the investigator before any data are collected.

 

  • Step 2. Select the appropriate test statistic.  

The test statistic is a single number that summarizes the sample information. An example of a test statistic is the Z statistic computed as follows:

\[ Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \]

When the sample size is small, we will use t statistics (just as we did when constructing confidence intervals for small samples). As we present each scenario, alternative test statistics are provided along with conditions for their appropriate use.

  • Step 3.  Set up decision rule.  

The decision rule is a statement that tells under what circumstances to reject the null hypothesis. The decision rule is based on specific values of the test statistic (e.g., reject H 0 if Z > 1.645). The decision rule for a specific test depends on 3 factors: the research or alternative hypothesis, the test statistic and the level of significance. Each is discussed below.

  • The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is proposed. In an upper-tailed test the decision rule has investigators reject H 0 if the test statistic is larger than the critical value. In a lower-tailed test the decision rule has investigators reject H 0 if the test statistic is smaller than the critical value.  In a two-tailed test the decision rule has investigators reject H 0 if the test statistic is extreme, either larger than an upper critical value or smaller than a lower critical value.
  • The exact form of the test statistic is also important in determining the decision rule. If the test statistic follows the standard normal distribution (Z), then the decision rule will be based on the standard normal distribution. If the test statistic follows the t distribution, then the decision rule will be based on the t distribution. The appropriate critical value will be selected from the t distribution again depending on the specific alternative hypothesis and the level of significance.  
  • The third factor is the level of significance. The level of significance which is selected in Step 1 (e.g., α =0.05) dictates the critical value.   For example, in an upper tailed Z test, if α =0.05 then the critical value is Z=1.645.  

The following figures illustrate the rejection regions defined by the decision rule for upper-, lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper, lower and both tails of the curves, respectively. The decision rules are written below each figure.

Rejection Region for Upper-Tailed Z Test (H 1 : μ > μ 0 ) with α=0.05

The decision rule is: Reject H 0 if Z > 1.645.

 

 

α        Z
0.10     1.282
0.05     1.645
0.025    1.960
0.010    2.326
0.005    2.576
0.001    3.090
0.0001   3.719


Rejection Region for Lower-Tailed Z Test (H 1 : μ < μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.645.

α        Z
0.10     -1.282
0.05     -1.645
0.025    -1.960
0.010    -2.326
0.005    -2.576
0.001    -3.090
0.0001   -3.719


Rejection Region for Two-Tailed Z Test (H 1 : μ ≠ μ 0 ) with α =0.05

The decision rule is: Reject H 0 if Z < -1.960 or if Z > 1.960.

α        Z
0.20     1.282
0.10     1.645
0.05     1.960
0.010    2.576
0.001    3.291
0.0001   3.891

The complete table of critical values of Z for upper, lower and two-tailed tests can be found in the table of Z values to the right in "Other Resources."

Critical values of t for upper, lower and two-tailed tests can be found in the table of t values in "Other Resources."

  • Step 4. Compute the test statistic.  

Here we compute the test statistic by substituting the observed sample data into the test statistic identified in Step 2.

  • Step 5. Conclusion.  

The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule. The final conclusion will be either to reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely).  

If the null hypothesis is rejected, then an exact significance level is computed to describe the likelihood of observing the sample data assuming that the null hypothesis is true. The exact level of significance is called the p-value and it will be less than the chosen level of significance if we reject H 0 .

Statistical computing packages provide exact p-values as part of their standard output for hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can be abbreviated. The hypotheses (step 1) should always be set up in advance of any analysis and the significance criterion should also be determined (e.g., α =0.05). Statistical computing packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The investigator can then determine statistical significance using the following: If p < α then reject H 0 .

 

 

  • Step 1. Set up hypotheses and determine level of significance

H 0 : μ = 191     H 1 : μ > 191     α = 0.05

The research hypothesis is that weights have increased, and therefore an upper tailed test is used.

  • Step 2. Select the appropriate test statistic.

Because the sample size is large (n > 30) the appropriate test statistic is

\[ Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \]

  • Step 3. Set up decision rule.  

In this example, we are performing an upper tailed test (H 1 : μ> 191), with a Z test statistic and selected α =0.05.   Reject H 0 if Z > 1.645.

We now substitute the sample data into the formula for the test statistic identified in Step 2.  

We reject H 0 because 2.38 > 1.645. We have statistically significant evidence at α = 0.05 to show that the mean weight in men in 2006 is more than 191 pounds. Because we rejected the null hypothesis, we now approximate the p-value, which is the likelihood of observing the sample data if the null hypothesis is true. An alternative definition of the p-value is the smallest level of significance where we can still reject H 0 . In this example, we observed Z=2.38 and for α=0.05, the critical value was 1.645. Because 2.38 exceeded 1.645 we rejected H 0 . In our conclusion we reported a statistically significant increase in mean weight at a 5% level of significance. Using the table of critical values for upper tailed tests, we can approximate the p-value. If we select α=0.025, the critical value is 1.960, and we still reject H 0 because 2.38 > 1.960. If we select α=0.010 the critical value is 2.326, and we still reject H 0 because 2.38 > 2.326. However, if we select α=0.005, the critical value is 2.576, and we cannot reject H 0 because 2.38 < 2.576. Therefore, the smallest α where we still reject H 0 is 0.010. This is the p-value. A statistical computing package would produce a more precise p-value which would be in between 0.005 and 0.010. Here we are approximating the p-value and would report p < 0.010.
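A statistical computing package gives the exact p-value directly; a one-line sketch:

```python
from scipy.stats import norm

p_value = norm.sf(2.38)   # upper-tail area beyond Z = 2.38
print(round(p_value, 4))  # ≈ 0.0087, indeed between 0.005 and 0.010
```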

In all tests of hypothesis, there are two types of errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H 0 when in fact it is true. This is also called a false positive result (as we incorrectly conclude that the research hypothesis is true when in fact it is not). When we run a test of hypothesis and decide to reject H 0 (e.g., because the test statistic exceeds the critical value in an upper tailed test) then either we make a correct decision because the research hypothesis is true or we commit a Type I error. The different conclusions are summarized in the table below. Note that we will never know whether the null hypothesis is really true or false (i.e., we will never know which row of the following table reflects reality).

Table - Conclusions in Test of Hypothesis

                   Do Not Reject H 0     Reject H 0
H 0 is True        Correct Decision      Type I Error
H 0 is False       Type II Error         Correct Decision

In the first step of the hypothesis test, we select a level of significance, α, and α= P(Type I error). Because we purposely select a small value for α, we control the probability of committing a Type I error. For example, if we select α=0.05, and our test tells us to reject H 0 , then there is a 5% probability that we commit a Type I error. Most investigators are very comfortable with this and are confident when rejecting H 0 that the research hypothesis is true (as it is the more likely scenario when we reject H 0 ).

When we run a test of hypothesis and decide not to reject H 0 (e.g., because the test statistic is below the critical value in an upper tailed test) then either we make a correct decision because the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a Type II error and is defined as follows: β=P(Type II error) = P(Do not Reject H 0 | H 0 is false). Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of committing a Type II error because β depends on several factors including the sample size, α, and the research hypothesis. When we do not reject H 0 , it may be very likely that we are committing a Type II error (i.e., failing to reject H 0 when in fact it is false). Therefore, when tests are run and the null hypothesis is not rejected we often make a weak concluding statement allowing for the possibility that we might be committing a Type II error. If we do not reject H 0 , we conclude that we do not have significant evidence to show that H 1 is true. We do not conclude that H 0 is true.

Note: The most common reason for a Type II error is a small sample size.


Content ©2017. All Rights Reserved. Date last modified: November 6, 2017. Wayne W. LaMorte, MD, PhD, MPH

Approximate Hypothesis Tests: the z Test and the t Test

This chapter presents two common tests of the hypothesis that a population mean equals a particular value and of the hypothesis that two population means are equal: the z test and the t test. These tests are approximate: They are based on approximations to the probability distribution of the test statistic when the null hypothesis is true, so their significance levels are not exactly what they claim to be. If the sample size is reasonably large and the population from which the sample is drawn has a nearly normal distribution (a notion defined in this chapter), the nominal significance levels of the tests are close to their actual significance levels. If these conditions are not met, the significance levels of the approximate tests can differ substantially from their nominal values. The z test is based on the normal approximation; the t test is based on Student's t curve, which approximates some probability histograms better than the normal curve does. The chapter also presents the deep connection between hypothesis tests and confidence intervals, and shows how to compute approximate confidence intervals for the population mean of nearly normal populations using Student's t curve.

where \(\phi\) is the pooled sample percentage of the two samples. The estimate of \(SE(\phi^{t-c})\) under the null hypothesis is

\[ se = s^*\times(1/n_t + 1/n_c)^{1/2}, \]

where \(n_t\) and \(n_c\) are the sizes of the two samples. If the null hypothesis is true, the Z statistic,

\[ Z=\phi^{t-c}/se, \]

is the original test statistic \(\phi^{t-c}\) in approximately standard units , and Z has a probability histogram that is approximated well by the normal curve , which allowed us to select the rejection region for the approximate test.

This strategy—transforming a test statistic approximately to standard units under the assumption that the null hypothesis is true, and then using the normal approximation to determine the rejection region for the test—works to construct approximate hypothesis tests in many other situations, too. The resulting hypothesis test is called a z test. Suppose that we are testing a null hypothesis using a test statistic \(X\), and the following conditions hold:

  • We have a probability model for how the observations arise, assuming the null hypothesis is true. Typically, the model is that under the null hypothesis, the data are like random draws with or without replacement from a box of numbered tickets.
  • Under the null hypothesis, the test statistic \(X\) , converted to standard units, has a probability histogram that can be approximated well by the normal curve.
  • Under the null hypothesis, we can find the expected value of the test statistic, \(E(X)\) .
  • Under the null hypothesis, either we can find the SE of the test statistic, \(SE(X)\) , or we can estimate \(SE(X)\) accurately enough to ignore the error of the estimate of the SE. Let se denote either the exact SE of \(X\) under the null hypothesis, or the estimated value of \(SE(X)\) under the null hypothesis.

Then, under the null hypothesis, the probability histogram of the Z statistic

\[ Z = (X-E(X))/se \]

is approximated well by the normal curve, and we can use the normal approximation to select the rejection region for the test using \(Z\) as the test statistic. If the null hypothesis is true,

\[ P(Z < z_a) \approx a \]

\[ P(Z > z_{1-a} ) \approx a, \]

\[ P(|Z| > z_{1-a/2} ) \approx a. \]

These three approximations yield three different z tests of the hypothesis that \(\mu = \mu_0\) at approximate significance level \(a\) :

  • Reject the null hypothesis whenever \(Z < z_a\) (left-tail z test)
  • Reject the null hypothesis whenever \(Z > z_{1-a}\) (right-tail z test)
  • Reject the null hypothesis whenever \(|Z|> z_{1-a/2}\) (two-tail z test)

The word "tail" refers to the tails of the normal curve: In a left-tail test, the probability of a Type I error is approximately the area of the left tail of the normal curve, from minus infinity to \(z_a\) . In a right-tail test, the probability of a Type I error is approximately the area of the right tail of the normal curve, from \(z_{1-a}\) to infinity. In a two-tail test, the probability of a Type I error is approximately the sum of the areas of both tails of the normal curve, the left tail from minus infinity to \(z_{a/2}\) and the right tail from \(z_{1-a/2}\) to infinity. All three of these tests are called z tests. The observed value of Z is called the z score .

Which of these three tests, if any, should one use? The answer depends on the probability distribution of \(Z\) when the alternative hypothesis is true. As a rule of thumb, if, under the alternative hypothesis, \(E(Z) < 0\), use the left-tail test. If, under the alternative hypothesis, \(E(Z) > 0\), use the right-tail test. If, under the alternative hypothesis, it is possible that \(E(Z) < 0\) and it is possible that \(E(Z) > 0\), use the two-tail test. If, under the alternative hypothesis, \(E(Z) = 0\), consult a statistician. Generally (but not always), this rule of thumb selects the test with the most power for a given significance level.

P values for z tests

Each of the three z tests gives us a family of procedures for testing the null hypothesis at any (approximate) significance level \(a\) between 0 and 100%—we just use the appropriate quantile of the normal curve. This makes it particularly easy to find the P value for a z test. Recall that the P value is the smallest significance level for which we would reject the null hypothesis, among a family of tests of the null hypothesis at different significance levels.

Suppose the z score (the observed value of \(Z\)) is \(x\). In a left-tail test, the P value is the area under the normal curve to the left of \(x\): Had we chosen the significance level \(a\) so that \(z_a=x\), we would have rejected the null hypothesis, but we would not have rejected it for any smaller value of \(a\), because for all smaller values of \(a\), \(z_a < x\). Similarly, for a right-tail z test, the P value is the area under the normal curve to the right of \(x\): If \(x=z_{1-a}\) we would reject the null hypothesis at approximate significance level \(a\), but not at smaller significance levels. For a two-tail z test, the P value is the sum of the area under the normal curve to the left of \(-|x|\) and the area under the normal curve to the right of \(|x|\).

Finding P values and specifying the rejection region for the z test involves the probability distribution of \(Z\) under the assumption that the null hypothesis is true. Rarely is the alternative hypothesis sufficiently detailed to specify the probability distribution of \(Z\) completely, but often the alternative does help us choose intelligently among left-tail, right-tail, and two-tail z tests. This is perhaps the most important issue in deciding which hypothesis to take as the null hypothesis and which as the alternative: We calculate the significance level under the null hypothesis, and that calculation must be tractable.

However, to construct a z test, we need to know the expected value and SE of the test statistic under the null hypothesis. Usually it is easy to determine the expected value, but often the SE must be estimated from the data. Later in this chapter we shall see what to do if the SE cannot be estimated accurately, but the shape of the distribution of the numbers in the population is known. The next section develops z tests for the population percentage and mean, and for the difference between two population means.

Examples of z tests

The central limit theorem assures us that the probability histogram of the sample mean of random draws with replacement from a box of tickets—transformed to standard units—can be approximated increasingly well by a normal curve as the number of draws increases. In the previous section, we learned that the probability histogram of a sum or difference of independent sample means of draws with replacement also can be approximated increasingly well by a normal curve as the two sample sizes increase. We shall use these facts to derive z tests for population means and percentages and differences of population means and percentages.

z Test for a Population Percentage

Suppose we have a population of \(N\) units of which \(G\) are labeled "1" and the rest are labeled "0." Let \(p = G/N\) be the population percentage. Consider testing the null hypothesis that \(p = p_0\) against the alternative hypothesis that \(p \ne p_0\) , using a random sample of \(n\) units drawn with replacement. (We could assume instead that \(N >> n\) and allow the draws to be without replacement.)

Under the null hypothesis, the sample percentage

\[ \phi = \frac{\mbox{# tickets labeled "1" in the sample}}{n} \]

has expected value \(E(\phi) = p_0\) and standard error

\[ SE(\phi) = \sqrt{\frac{p_0 \times (1 - p_0)}{n}}. \]

Let \(Z\) be \(\phi\) transformed to standard units :

\[ Z = (\phi - p_0)/SE(\phi). \]

Provided \(n\) is large and \(p_0\) is not too close to zero or 100% (say \(n \times p > 30\) and \(n \times (1-p) > 30\)), the probability histogram of \(Z\) will be approximated reasonably well by the normal curve, and we can use it as the Z statistic in a z test. For example, if we reject the null hypothesis when \(|Z| > 1.96\), the significance level of the test will be about 5%.
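A sketch of this test with hypothetical counts (520 tickets labeled "1" in 1,000 draws, testing \(p_0 = 0.5\)):

```python
import math
from scipy.stats import norm

n, successes, p0 = 1000, 520, 0.5      # hypothetical sample and null value

phi = successes / n                    # sample percentage, 0.52
se = math.sqrt(p0 * (1 - p0) / n)      # SE under the null, ≈ 0.0158
Z = (phi - p0) / se                    # ≈ 1.26

p_value = 2 * norm.sf(abs(Z))          # two-tailed
print(round(Z, 2), round(p_value, 3))  # 1.26 0.206
```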

z Test for a Population Mean

The approach in the previous subsection applies, mutatis mutandis , to testing the hypothesis that the population mean equals a given value, even when the population contains numbers other than just 0 and 1. However, in contrast to the hypothesis that the population percentage equals a given value, the null hypothesis that a more general population mean equals a given value does not specify the SD of the population, which poses difficulties that are surmountable (by approximation and estimation) if the sample size is large enough. (There are also nonparametric methods that can be used.)

Consider testing the null hypothesis that the population mean \(\mu\) is equal to a specific null value \(\mu_0\), against the alternative hypothesis that \(\mu \ne \mu_0\), on the basis of a random sample with replacement of size \(n\). Recall that the sample mean \(M\) of \(n\) random draws with or without replacement from a box of numbered tickets is an unbiased estimator of the population mean \(\mu\): If

\[ M = \frac{\mbox{sum of sample values}}{n}, \]

\[ E(M) = \mu = \frac{\mbox{sum of population values}}{N}, \]

where \(N\) is the size of the population. The population mean determines the expected value of the sample mean. The SE of the sample mean of a random sample with replacement is

\[ \frac{SD(\mbox{box})}{\sqrt{n}}, \]

where SD(box) is the SD of the list of all the numbers in the box, and \(n\) is the sample size. As a special case, the sample percentage \(\phi\) of \(n\) independent random draws from a 0-1 box is an unbiased estimator of the population percentage \(p\), with SE equal to

\[ \sqrt{\frac{p\times(1-p)}{n}}. \]

In testing the null hypothesis that a population percentage \(p\) equals \(p_0\), the null hypothesis specifies not only the expected value of the sample percentage \(\phi\), it automatically specifies the SE of the sample percentage as well, because the SD of the values in a 0-1 box is determined by the population percentage \(p\):

\[ SD(box) = \sqrt{p\times(1-p)}. \]

The null hypothesis thus gives us all the information we need to standardize the sample percentage under the null hypothesis. In contrast, the SD of the values in a box of tickets labeled with arbitrary numbers bears no particular relation to the mean of the values, so the null hypothesis that the population mean \(\mu\) of a box of tickets labeled with arbitrary numbers equals a specific value \(\mu_0\) determines the expected value of the sample mean, but not the standard error of the sample mean. To standardize the sample mean to construct a z test for the value of a population mean, we need to estimate the SE of the sample mean under the null hypothesis. When the sample size is large, the sample standard deviation \(s\) is likely to be close to the SD of the population, and

\[ se=\frac{s}{\sqrt{n}} \]

is likely to be an accurate estimate of \(SE(M)\) . The central limit theorem tells us that when the sample size \(n\) is large, the probability histogram of the sample mean, converted to standard units, is approximated well by the normal curve. Under the null hypothesis,

\[ E(M) = \mu_0, \]

and thus when \(n\) is large

\[ Z = \frac{M-\mu_0}{s/\sqrt{n}} \]

has expected value zero, and its probability histogram is approximated well by the normal curve, so we can use \(Z\) as the Z statistic in a z test. If the alternative hypothesis is true, the expected value of \(Z\) could be either greater than zero or less than zero, so it is appropriate to use a two-tail z test. If the alternative hypothesis is \(\mu > \mu_0\), then under the alternative hypothesis, the expected value of \(Z\) is greater than zero, and it is appropriate to use a right-tail z test. If the alternative hypothesis is \(\mu < \mu_0\), then under the alternative hypothesis, the expected value of \(Z\) is less than zero, and it is appropriate to use a left-tail z test.
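A sketch of the resulting two-tail z test for a population mean; the sample values are hypothetical:

```python
import math
from scipy.stats import norm

def z_test_mean(sample, mu0):
    """Two-tail z test for a population mean, estimating the SE from the sample."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))  # sample SD
    Z = (m - mu0) / (s / math.sqrt(n))
    return Z, 2 * norm.sf(abs(Z))

# Hypothetical measurements; n should be large for the approximation to be good.
sample = [12.1, 9.8, 11.4, 10.9, 10.2, 11.7, 9.5, 10.8, 11.1, 10.4] * 5  # n = 50
print(z_test_mean(sample, mu0=10.0))
```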

z Test for a Difference of Population Means

Consider the problem of testing the hypothesis that two population means are equal, using random samples from the two populations. Different sampling designs lead to different hypothesis testing procedures. In this section, we consider two kinds of random samples from the two populations: paired samples and independent samples , and construct z tests appropriate for each.

Paired Samples

Consider a population of \(N\) individuals, each of whom is labeled with two numbers. For example, the \(N\) individuals might be a group of doctors, and the two numbers that label each doctor might be the annual payments to the doctor by an HMO under the terms of the current contract and under the terms of a proposed revision of the contract. Let the two numbers associated with individual \(i\) be \(c_i\) and \(t_i\) . (Think of \(c\) as control and \(t\) as treatment . In this example, control is the current contract, and treatment is the proposed contract.) Let \(\mu_c\) be the population mean of the \(N\) values

\[ \{c_1, c_2, \ldots, c_N \}, \]

and let \(\mu_t\) be the population mean of the \(N\) values

\[ \{t_1, t_2, \ldots, t_N\}. \]

Suppose we want to test the null hypothesis that

\[ \mu = \mu_t - \mu_c = \mu_0 \]

against the alternative hypothesis that \(\mu < \mu_0\). With \(\mu_0=\$0\), this null hypothesis is that the average annual payment to doctors under the proposed revision would be the same as the average payment under the current contract, and the alternative is that on average doctors would be paid less under the new contract than under the current contract. With \(\mu_0=-\$5,000\), this null hypothesis is that the proposed contract would save the HMO an average of $5,000 per doctor, compared with the current contract; the alternative is that under the proposed contract, the HMO would save even more than that. With \(\mu_0=\$1,000\), this null hypothesis is that doctors would be paid an average of $1,000 more per year under the new contract than under the old one; the alternative hypothesis is that on average doctors would be paid less than an additional $1,000 per year under the new contract—perhaps even less than they are paid under the current contract. For the remainder of this example, we shall take \(\mu_0=\$1,000\).

The data on which we shall base the test are observations of both \(c_i\) and \(t_i\) for a sample of \(n\) individuals chosen at random with replacement from the population of \(N\) individuals (or a simple random sample of size \(n \ll N\)): We select \(n\) doctors at random from the \(N\) doctors under contract to the HMO, record the current annual payments to them, and calculate what the payments to them would be under the terms of the new contract. This is called a paired sample , because the samples from the population of control values and from the population of treatment values come in pairs: one value for control and one for treatment for each individual in the sample. Testing the hypothesis that the difference between two population means is equal to \(\mu_0\) using a paired sample is just the problem of testing the hypothesis that the population mean \(\mu\) of the set of differences

\[ d_i = t_i - c_i, \;\; i= 1, 2, \ldots, N, \]

is equal to \(\mu_0\) . Denote the \(n\) (random) observed values of \(c_i\) and \(t_i\) by \(\{C_1, C_2, \ldots, C_n\}\) and \(\{T_1, T_2, \ldots, T_n \}\) , respectively. The sample mean \(M\) of the differences between the observed values of \(t_i\) and \(c_i\) is the difference of the two sample means:

\[ M = \frac{(T_1-C_1)+(T_2-C_2) + \cdots + (T_n-C_n)}{n} = \frac{T_1+T_2+ \cdots + T_n}{n} - \frac{C_1+C_2+ \cdots + C_n}{n} \]

\[ = (\mbox{sample mean of observed values of } t_i) - (\mbox{sample mean of observed values of } c_i). \]

\(M\) is an unbiased estimator of \(\mu\) , and if n is large, the normal approximation to its probability histogram will be accurate. The SE of \(M\) is the population standard deviation of the \(N\) values \(\{d_1, d_2, \ldots, d_N\}\) , which we shall denote \(SD_d\) , divided by the square root of the sample size, \(n^{1/2}\) . Let \(sd\) denote the sample standard deviation of the \(n\) observed differences \((T_i - C_i), \;\; i=1, 2, \ldots, n\) :

\[ sd = \sqrt{\frac{(T_1-C_1-M)^2 + (T_2-C_2-M)^2 + \cdots + (T_n-C_n-M)^2}{n-1}} \]

(recall that \(M\) is the sample mean of the observed differences). If the sample size \(n\) is large, \(sd\) is very likely to be close to \(SD_d\), and so, under the null hypothesis,

\[ Z = \frac{M-\mu_0}{sd/n^{1/2}} \]

has expected value zero, and when \(n\) is large the probability histogram of \(Z\) can be approximated well by the normal curve. Thus we can use \(Z\) as the Z statistic in a z test of the null hypothesis that \(\mu=\mu_0\). Under the alternative hypothesis that \(\mu < \mu_0\) (doctors on the average are paid less than an additional $1,000 per year under the new contract), the expected value of \(Z\) is less than zero, so we should use a left-tail z test. Under the alternative hypothesis \(\mu\ne\mu_0\) (on average, the difference in average annual payments to doctors is not an increase of $1,000, but some other number instead), the expected value of \(Z\) could be positive or negative, so we would use a two-tail z test. Under the alternative hypothesis that \(\mu>\mu_0\) (on average, under the new contract, doctors are paid more than an additional $1,000 per year), the expected value of \(Z\) would be greater than zero, so we should use a right-tail z test.
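A sketch of the paired-sample z test; the payment figures are hypothetical, and the sample size is far too small for the normal approximation to be trustworthy in practice, so this only illustrates the arithmetic:

```python
import math
from scipy.stats import norm

def paired_z(t_vals, c_vals, mu0=0.0):
    """Z statistic for a paired-sample test of H0: mean(t - c) = mu0."""
    diffs = [t - c for t, c in zip(t_vals, c_vals)]
    n = len(diffs)
    m = sum(diffs) / n
    sd = math.sqrt(sum((d - m) ** 2 for d in diffs) / (n - 1))
    return (m - mu0) / (sd / math.sqrt(n))

# Hypothetical annual payments under the current (c) and proposed (t) contracts.
c = [100_000, 120_000, 90_000, 110_000, 105_000, 98_000, 115_000, 102_000]
raises = [700, 900, 850, 600, 1100, 800, 750, 950]  # proposed increase per doctor
t = [ci + ri for ci, ri in zip(c, raises)]

Z = paired_z(t, c, mu0=1000)                # left-tail test of H0: mu = $1,000
print(round(Z, 2), round(norm.cdf(Z), 4))   # p-value is the left-tail area
```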

Independent Samples

Consider two separate populations of numbers, with population means \(\mu_t\) and \(\mu_c\), respectively. Let \(\mu=\mu_t-\mu_c\) be the difference between the two population means. We would like to test the null hypothesis that \(\mu=\mu_0\) against the alternative hypothesis that \(\mu>\mu_0\). For example, let \(\mu_t\) be the average annual payment by an HMO to doctors in the Los Angeles area, and let \(\mu_c\) be the average annual payment by the same HMO to doctors in the San Francisco area. Then the null hypothesis with \(\mu_0=0\) is that the HMO pays doctors in the two regions the same amount annually, on average; the alternative hypothesis is that, on average, the HMO pays doctors in the Los Angeles area more than it pays doctors in the San Francisco area. Suppose we draw a random sample of size \(n_t\) with replacement from the first population, and independently draw a random sample of size \(n_c\) with replacement from the second population. Let \(M_t\) and \(M_c\) be the sample means of the two samples, respectively, and let

\[ M = M_t - M_c \]

be the difference between the two sample means. Because the expected value of \(M_t\) is \(\mu_t\) and the expected value of \(M_c\) is \(\mu_c\) , the expected value of \(M\) is

\[ E(M) = E(M_t - M_c) = E(M_t) - E(M_c) = \mu_t - \mu_c = \mu. \]

Because the two random samples are independent , \(M_t\) and \(-M_c\) are independent random variables, and the SE of their sum is

\[ SE(M) = (SE^2(M_t) + SE^2(M_c))^{1/2}. \]

Let \(s_t\) and \(s_c\) be the sample standard deviations of the two samples, respectively. If \(n_t\) and \(n_c\) are both very large, the two sample standard deviations are likely to be close to the standard deviations of the corresponding populations, and so \(s_t/n_t^{1/2}\) is likely to be close to \(SE(M_t)\) , and \(s_c/n_c^{1/2}\) is likely to be close to \(SE(M_c)\) . Therefore, the pooled estimate of the standard error

\[ se_\mbox{diff} = ( (s_t/n_t^{1/2})^2 + (s_c/n_c^{1/2})^2)^{1/2} = \sqrt{ s_t^2/n_t + s_c^2/n_c} \]

is likely to be close to \(SE(M)\) . Under the null hypothesis, the statistic

\[ Z = \frac{M - \mu_0}{se_\mbox{diff}} = \frac{M_t - M_c - \mu_0}{\sqrt{ s_t^2/n_t + s_c^2/n_c}} \]

has expected value zero and its probability histogram is approximated well by the normal curve, so we can use it as the Z statistic in a z test.

Under the alternative hypothesis

\[ \mu = \mu_t - \mu_c > \mu_0, \]

the expected value of \(Z\) is greater than zero, so it is appropriate to use a right-tail z test.

If the alternative hypothesis were \(\mu \ne \mu_0\), under the alternative the expected value of \(Z\) could be greater than zero or less than zero, so it would be appropriate to use a two-tail z test. If the alternative hypothesis were \(\mu < \mu_0\), under the alternative the expected value of \(Z\) would be less than zero, so it would be appropriate to use a left-tail z test.
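A sketch of the independent-samples Z statistic; the payment samples are hypothetical and too small for the approximation to be reliable, so again this only illustrates the computation:

```python
import math
from scipy.stats import norm

def two_sample_z(t_sample, c_sample, mu0=0.0):
    """Z statistic for H0: mu_t - mu_c = mu0, from independent samples."""
    def mean_sd(xs):
        n = len(xs)
        m = sum(xs) / n
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        return m, s, n

    mt, st, nt = mean_sd(t_sample)
    mc, sc, nc = mean_sd(c_sample)
    se_diff = math.sqrt(st**2 / nt + sc**2 / nc)
    return (mt - mc - mu0) / se_diff

# Hypothetical annual payments to doctors in the two regions.
la_pay = [101_000, 98_500, 110_000, 103_000, 99_000, 107_500]
sf_pay = [97_000, 95_500, 102_000, 99_500, 96_000, 100_500]

Z = two_sample_z(la_pay, sf_pay)
print(round(Z, 2), round(norm.sf(Z), 4))  # right-tail p-value
```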


For the nominal significance level of the z test for a population mean to be approximately correct, the sample size typically must be large. When the sample size is small, two factors limit the accuracy of the z test: the normal approximation to the probability distribution of the sample mean can be poor, and the sample standard deviation can be an inaccurate estimate of the population standard deviation, so se is not an accurate estimate of the SE of the test statistic Z . For nearly normal populations , defined in the next subsection, the probability distribution of the sample mean is nearly normal even when the sample size is small, and the uncertainty of the sample standard deviation as an estimate of the population standard deviation can be accounted for by using a curve that is broader than the normal curve to approximate the probability distribution of the (approximately) standardized test statistic. The broader curve is Student's t curve . Student's t curve depends on the sample size: The smaller the sample size, the more spread out the curve.

Nearly Normally Distributed Populations

A list of numbers is nearly normally distributed if the fraction of values in any range is close to the area under the normal curve for the corresponding range of standard units—that is, if the list has mean \(\mu\) and standard deviation SD, and for every pair of values \(a < b\) ,

\[ \mbox{ the fraction of numbers in the list between } a \mbox{ and } b \approx \mbox{the area under the normal curve between } (a - \mu)/SD \mbox{ and } (b - \mu)/SD. \]

A list is nearly normally distributed if the normal curve is a good approximation to the histogram of the list transformed to standard units. The histogram of a list that is approximately normally distributed is (nearly) symmetric about some point, and is (nearly) bell-shaped.

No finite population can be exactly normally distributed, because the area under the normal curve between every two distinct values is strictly positive—no matter how large or small the values nor how close together they are. No population that contains only a finite number of distinct values can be exactly normally distributed, for the same reason. In particular, populations that contain only zeros and ones are not approximately normally distributed, so results for the sample mean of samples drawn from nearly normally distributed populations need not apply to the sample percentage of samples drawn from 0-1 boxes. Such results will be more accurate for the sample percentage when the population percentage is close to 50% than when the population percentage is close to 0% or 100%, because then the histogram of population values is more nearly symmetric.

Suppose a population is nearly normally distributed. Then a histogram of the population is approximately symmetric about the mean of the population. The fraction of numbers in the population within ±1 SD of the mean of the population is about 68%, the fraction of numbers within ±2 SD of the mean of the population is about 95%, and the fraction of numbers in the population within ±3 SD of the mean of the population is about 99.7%.


Student's t -curve

Student's t curve is similar to the normal curve, but broader. It is positive, has a single maximum, and is symmetric about zero. The total area under Student's t curve is 100%. Student's t curve approximates some probability histograms more accurately than the normal curve does. There are actually infinitely many Student t curves, one for each positive integer value of the degrees of freedom. As the degrees of freedom increases, the difference between Student's t curve and the normal curve decreases.

Consider a population of \(N\) units labeled with numbers. Let \(\mu\) denote the population mean of the \(N\) numbers, and let SD denote the population standard deviation of the \(N\) numbers. Let \(M\) denote the sample mean of a random sample of size \(n\) drawn with replacement from the population, and let \(s\) denote the sample standard deviation of the sample. The expected value of \(M\) is \(\mu\), and the SE of \(M\) is \(SD/n^{1/2}\). Let

\[ Z = (M - \mu)/(SD/n^{1/2}). \]

Then the expected value of \(Z\) is zero, the SE of \(Z\) is 1, and if \(n\) is large enough, the normal curve is a good approximation to the probability histogram of \(Z\) . The closer to normal the distribution of values in the population is, the smaller \(n\) needs to be for the normal curve to be a good approximation to the distribution of \(Z\) . Consider the statistic

\[ T = \frac{M - \mu}{s/n^{1/2}}, \]

which replaces SD by its estimated value (the sample standard deviation \(s\) ). If \(n\) is large enough, \(s\) is very likely to be close to SD, so \(T\) will be close to \(Z\) ; the normal curve will be a good approximation to the probability histogram of \(T\) ; and we can use \(T\) as the Z statistic in a z test of hypotheses about \(\mu\) .

For many populations, when the sample size is small—say less than 25, but the accuracy depends on the population—the normal curve is not a good approximation to the probability histogram of \(T\). For nearly normally distributed populations, when the sample size is intermediate—say 25–100, but again this depends on the population—the normal curve is a good approximation to the probability histogram of \(Z\), but not to the probability histogram of \(T\), because of the variability of the sample standard deviation \(s\) from sample to sample, which tends to broaden the probability distribution of \(T\) (i.e., to make \(SE(T)>1\)).

The area under the normal curve between ±1.96 is 95%, but for Student's t curve with 25 degrees of freedom, the area between ±1.96 is only about 93.9%: Student's t curve with d.f.=25 is broader than the normal curve. With 200 degrees of freedom, Student's t curve is slightly narrower, and the area between ±1.96 is about 94.9%.

We define quantiles of Student t curves in the same way we defined quantiles of the normal curve: for any number \(a\) between 0 and 100%, the \(a\) quantile of Student's t curve with \(d.f.=d\), \(t_{d,a}\), is the unique value such that the area under the Student t curve with \(d\) degrees of freedom from minus infinity to \(t_{d,a}\) is equal to \(a\). For example, \(t_{d,0.5} = 0\) for all values of \(d\). In general, the value of \(t_{d,a}\) depends on the degrees of freedom \(d\). A probability calculator or statistical software will find quantiles of Student's t curve.
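For instance, scipy (assumed available) can reproduce the areas and quantiles discussed above:

from scipy import stats

# Area under Student's t curve between -1.96 and 1.96, for two values of d.f.
for df in (25, 200):
    area = stats.t.cdf(1.96, df) - stats.t.cdf(-1.96, df)
    print(f"d.f. = {df}: area between ±1.96 is {area:.1%}")

# Quantiles t_{d,a}: the 97.5% quantile for d.f. = 25, and the 50% quantile
print("t_{25, 0.975} =", stats.t.ppf(0.975, 25))
print("t_{25, 0.5}   =", stats.t.ppf(0.5, 25))   # 0 for every d.f.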

t test for the Mean of a Nearly Normally Distributed Population

We can use Student's t curve to construct approximate tests of hypotheses about the population mean \(\mu\) when the population standard deviation is unknown, for intermediate values of the sample size \(n\). The approach is directly analogous to the z test, but instead of using a quantile of the normal curve, we use the corresponding quantile of Student's t curve (with the appropriate number of degrees of freedom). However, for the test to have approximately its nominal level when \(n\) is small or intermediate, the distribution of values in the population must be nearly normal. This is a somewhat bizarre restriction: it may require a very large sample to detect that the population is not nearly normal, but if the sample is very large, we can use the z test instead of the t test, so we don't need to rely as much on the assumption. It is my opinion that the t test is over-taught and overused, because its assumptions are not verifiable in the situations where it is potentially useful.

Consider testing the null hypothesis that \(\mu=\mu_0\) using the sample mean \(M\) and sample standard deviation \(s\) of a random sample of size \(n\) drawn with replacement from a population that is known to have a nearly normal distribution. Define

\[ T = \frac{M - \mu_0}{s/n^{1/2}}. \]

Under the null hypothesis, if \(n\) is not too small, Student's t curve with \(n-1\) degrees of freedom will be an accurate approximation to the probability histogram of \(T\) , so

\[ P(T < t_{n-1,a}), \]

\[ P(T > t_{n-1,1-a}), \]

\[ P(|T| > t_{n-1,1-a/2}) \]

all are approximately equal to \(a\) . As we saw earlier in this chapter for the Z statistic, these three approximations give three tests of the null hypothesis \(\mu=\mu_0\) at approximate significance level \(a\) —a left-tail t test, a right-tail t test, and a two-tail t test:

  • Reject the null hypothesis if \(T < t_{n-1,a}\) (left-tail)
  • Reject the null hypothesis if \(T > t_{n-1,1-a}\) (right-tail)
  • Reject the null hypothesis if \(|T| > t_{n-1,1-a/2}\) (two-tail)

To decide which t test to use, we can apply the same rule of thumb we used for the z test:

  • Use a left-tail t test if, under the alternative hypothesis, the expected value of \(T\) is less than zero.
  • Use a right-tail t test if, under the alternative hypothesis, the expected value of \(T\) is greater than zero.
  • Use a two-tail t test if, under the alternative hypothesis, the expected value of \(T\) is not zero, but could be less than or greater than zero.
  • Consult a statistician for a more appropriate test if, under the alternative hypothesis, the expected value of \(T\) is zero.

P-values for t tests are computed in much the same way as P-values for z tests. Let t be the observed value of \(T\) (the t score). In a left-tail t test, the P-value is the area under Student's t curve with \(n-1\) degrees of freedom, from minus infinity to \(t\) . In a right-tail t test, the P-value is the area under Student's t curve with \(n-1\) degrees of freedom, from \(t\) to infinity. In a two-tail t test, the P-value is the total area under Student's t curve with \(n-1\) degrees of freedom between minus infinity and \(-|t|\) and between \(|t|\) and infinity.
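As a minimal sketch of these calculations in Python (scipy assumed; the data and the hypothesized mean \(\mu_0\) are invented for illustration):

import numpy as np
from scipy import stats

data = np.array([9.2, 10.1, 9.8, 10.4, 9.5, 10.0, 9.9, 10.6])  # hypothetical sample
mu0 = 10.0                                                      # hypothesized mean

n = len(data)
M = data.mean()                   # sample mean
s = data.std(ddof=1)              # sample standard deviation
T = (M - mu0) / (s / np.sqrt(n))  # t score, n-1 degrees of freedom

p_left  = stats.t.cdf(T, n - 1)          # area from minus infinity to t
p_right = stats.t.sf(T, n - 1)           # area from t to infinity
p_two   = 2 * stats.t.sf(abs(T), n - 1)  # total area beyond -|t| and |t|

print(f"T = {T:.3f}; P-values: left {p_left:.3f}, right {p_right:.3f}, two-tail {p_two:.3f}")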

There are versions of the t test for comparing two means, as well. Just like for the z test, the method depends on how the samples from the two populations are drawn. For example, if the two samples are paired (if we are sampling individuals labeled with two numbers and for each individual in the sample, we observe both numbers), we may base the t test on the sample mean of the paired differences and the sample standard deviation of the paired differences. Let \(\mu_1\) and \(\mu_2\) be the means of the two populations, and let

\[ \mu = \mu_1 - \mu_2. \]

The \(T\) statistic to test the null hypothesis that \(\mu=\mu_0\) is

\[ T = \frac{(\mbox{sample mean of differences}) - \mu_0 }{(\mbox{sample standard deviation of differences})/n^{1/2}}, \]

and the appropriate curve to use to find the rejection region for the test is Student's t curve with \(n-1\) degrees of freedom, where \(n\) is the number of individuals (differences) in the sample.
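Here is a minimal Python sketch of the paired t test for the null hypothesis \(\mu_0 = 0\) (no mean difference); the before/after measurements are invented for illustration, and scipy's ttest_rel is used only as a cross-check of the hand computation:

import numpy as np
from scipy import stats

# Hypothetical paired observations (c_i, t_i) for each sampled individual
c = np.array([12.0, 15.1, 11.8, 14.0, 13.2, 12.7])
t = np.array([12.9, 15.8, 12.1, 14.6, 13.0, 13.5])

d = t - c     # paired differences
n = len(d)

# Test the null hypothesis mu = mu_0 = 0
T = d.mean() / (d.std(ddof=1) / np.sqrt(n))
p_two = 2 * stats.t.sf(abs(T), n - 1)
print(f"T = {T:.3f}, two-tail P = {p_two:.4f}")

# Cross-check with scipy's built-in paired t test
print(stats.ttest_rel(t, c))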

Two-sample t tests for a difference of means using independent samples depend on additional assumptions, such as equality of the two population standard deviations; we shall not present such tests here.

Hypothesis Tests and Confidence Intervals

There is a deep connection between hypothesis tests about parameters, and confidence intervals for parameters. If we have a procedure for constructing a level \(100\% \times (1-a)\) confidence interval for a parameter \(\mu\) , then the following rule is a two-sided significance level \(a\) test of the null hypothesis that \(\mu = \mu_0\) :

reject the null hypothesis if the confidence interval does not contain \(\mu_0\).

Similarly, suppose we have an hypothesis-testing procedure that lets us test the null hypothesis that \(\mu=\mu_0\) for any value of \(\mu_0\) , at significance level \(a\) . Define

\(A\) = (all values of \(\mu_0\) for which we would not reject the null hypothesis that \(\mu = \mu_0\)).

Then \(A\) is a \(100\% \times (1-a)\) confidence set for \(\mu\) :

\[ P( A \mbox{ contains the true value of } \mu ) = 100\% \times (1-a). \]

(A confidence set is a generalization of the idea of a confidence interval: a \(1-a\) confidence set for the parameter \(\mu\) is a random set that has probability \(1-a\) of containing \(\mu\) . As is the case with confidence intervals, the probability makes sense only before collecting the data.) The set \(A\) might or might not be an interval, depending on the nature of the test. If one starts with a two-tail z test or two-tail t test, one ends up with a confidence interval rather than a more general confidence set.

Confidence Intervals Using Student's t curve

The t test lets us test the hypothesis that the population mean \(\mu\) is equal to \(\mu_0\) at approximate significance level a using a random sample with replacement of size n from a population with a nearly normal distribution. If the sample size n is small, the actual significance level is likely to differ considerably from the nominal significance level. Consider a two-sided t test of the hypothesis \(\mu=\mu_0\) at significance level \(a\) . If the sample mean is \(M\) and the sample standard deviation is \(s\) , we would not reject the null hypothesis at significance level \(a\) if

\[ \frac{|M-\mu_0|}{s/n^{1/2}} \le t_{n-1,1-a/2}. \]

We rearrange this inequality:

\[ -t_{n-1,1-a/2} \le \frac{M-\mu_0}{s/n^{1/2}} \le t_{n-1,1-a/2} \]

\[ -t_{n-1,1-a/2} \times s/n^{1/2} \le M - \mu_0 \le t_{n-1,1-a/2} \times s/n^{1/2} \]

\[ -M - t_{n-1,1-a/2} \times s/n^{1/2} \le - \mu_0 \le -M + t_{n-1,1-a/2} \times s/n^{1/2} \]

\[ M - t_{n-1,1-a/2} \times s/n^{1/2} \le \mu_0 \le M + t_{n-1,1-a/2} \times s/n^{1/2} \]

That is, we would not reject the hypothesis \(\mu = \mu_0\) provided \(\mu_0\) is in the interval

\[ [M - t_{n-1,1-a/2} \times s/n^{1/2}, M + t_{n-1,1-a/2} \times s/n^{1/2}]. \]

Therefore, that interval is a \(100\% \times (1-a)\) confidence interval for \(\mu\):

\[ P([M - t_{n-1,1-a/2} \times s/n^{1/2}, M + t_{n-1,1-a/2} \times s/n^{1/2}] \mbox{ contains } \mu) \approx 1-a. \]
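The following Python sketch (sample data invented for illustration) computes this confidence interval directly and, to illustrate the duality described in the previous section, checks that it agrees with the set of values \(\mu_0\) that the two-sided t test does not reject:

import numpy as np
from scipy import stats

data = np.array([4.1, 5.0, 4.6, 5.3, 4.8, 4.4, 5.1, 4.9])  # hypothetical sample
a = 0.05
n, M, s = len(data), data.mean(), data.std(ddof=1)

# Closed-form confidence interval
t_crit = stats.t.ppf(1 - a / 2, n - 1)
half = t_crit * s / np.sqrt(n)
print(f"{1 - a:.0%} confidence interval: [{M - half:.3f}, {M + half:.3f}]")

# Duality check: the mu_0 values the two-sided t test does not reject
grid = np.linspace(M - 1, M + 1, 2001)
T = (M - grid) / (s / np.sqrt(n))
kept = grid[np.abs(T) <= t_crit]
print(f"Non-rejected mu_0 values: [{kept.min():.3f}, {kept.max():.3f}]")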


In hypothesis testing, a Z statistic is a random variable whose probability histogram is approximated well by the normal curve if the null hypothesis is correct: If the null hypothesis is true, the expected value of a Z statistic is zero, the SE of a Z statistic is approximately 1, and the probability that a Z statistic is between \(a\) and \(b\) is approximately the area under the normal curve between \(a\) and \(b\). Suppose that the random variable \(Z\) is a Z statistic. If, under the alternative hypothesis, \(E(Z)<0\), the appropriate z test to test the null hypothesis at approximate significance level \(a\) is the left-tailed z test: Reject the null hypothesis if \(Z<z_a\), where \(z_a\) is the \(a\) quantile of the normal curve. If, under the alternative hypothesis, \(E(Z)>0\), the appropriate z test to test the null hypothesis at approximate significance level \(a\) is the right-tailed z test: Reject the null hypothesis if \(Z>z_{1-a}\). If, under the alternative hypothesis, \(E(Z)\ne 0\) but could be greater than 0 or less than 0, the appropriate z test to test the null hypothesis at approximate significance level \(a\) is the two-tailed z test: reject the null hypothesis if \(|Z|>z_{1-a/2}\). If, under the alternative hypothesis, \(E(Z)=0\), a z test probably is not appropriate; consult a statistician. The exact significance levels of these tests differ from \(a\) by an amount that depends on how closely the normal curve approximates the probability histogram of \(Z\).

Z statistics often are constructed from other statistics by transforming approximately to standard units, which requires knowing the expected value and SE of the original statistic on the assumption that the null hypothesis is true. Let \(X\) be a test statistic; let \(E(X)\) be the expected value of \(X\) if the null hypothesis is true, and let \(se\) be approximately equal to the SE of \(X\) if the null hypothesis is true. If \(X\) is a sample sum of a large random sample with replacement, a sample mean of a large random sample with replacement, or a sum or difference of independent sample means of large samples with replacement,

\[ Z = \frac{X-E(X)}{se} \]

is a Z statistic.

Consider testing the null hypothesis that a population percentage \(p\) is equal to the value \(p_0\) on the basis of the sample percentage \(\phi\) of a random sample of size \(n\) with replacement. Under the null hypothesis, \(E(\phi)=p_0\) and

\[ SE(\phi) = \sqrt{\frac{p_0\times(1-p_0)}{n}}, \]

and if \(n\) is sufficiently large (say \(n \times p_0 > 30\) and \(n \times (1-p_0)>30\), but this depends on the desired accuracy), the normal approximation to

\[ Z = \frac{\phi-p_0}{\sqrt{(p_0 \times (1-p_0))/n}} \]

will be reasonably accurate, so \(Z\) can be used as the Z statistic in a z test of the null hypothesis \(p=p_0\) .
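A minimal Python sketch of this z test for a population percentage (the counts and the null value \(p_0\) are invented for illustration):

import numpy as np
from scipy import stats

n, successes = 400, 222   # hypothetical sample counts
p0 = 0.5                  # null hypothesis: p = p0
phi = successes / n       # sample percentage, as a proportion

se = np.sqrt(p0 * (1 - p0) / n)    # SE of phi under the null hypothesis
Z = (phi - p0) / se
p_two = 2 * stats.norm.sf(abs(Z))  # two-tailed P-value

print(f"phi = {phi:.3f}, Z = {Z:.2f}, two-tail P = {p_two:.4f}")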

Consider testing the null hypothesis that a population mean \(\mu\) is equal to the value \(\mu_0\) , on the basis of the sample mean \(M\) of a random sample of size \(n\) with replacement. Let \(s\) denote the sample standard deviation. Under the null hypothesis, \(E(M)=\mu_0\) , and if \(n\) is large,

\[ SE(M)=SD/n^{1/2} \approx s/n^{1/2}, \]

and the normal approximation to

\[ Z = \frac{M-\mu_0}{s/n^{1/2}} \]

will be reasonably accurate, so \(Z\) can be used as the Z statistic in a z test of the null hypothesis \(\mu=\mu_0\) .

Consider a population of \(N\) individuals, each labeled with two numbers. The \(i\) th individual is labeled with the numbers \(c_i\) and \(t_i\) , \(i=1, 2, \ldots, N\) . Let \(\mu_c\) be the population mean of the \(N\) values \(\{c_1, \ldots, c_N\}\) and let \(\mu_t\) be the population mean of the \(N\) values \(\{t_1, \ldots, t_N \}\) . Let \(\mu=\mu_t-\mu_c\) be the difference between the two population means. Consider testing the null hypothesis that \(\mu=\mu_0\) on the basis of a paired random sample of size \(n\) with replacement from the population: that is, a random sample of size \(n\) is drawn with replacement from the population, and for each individual \(i\) in the sample, \(c_i\) and \(t_i\) are observed. This is equivalent to testing the hypothesis that the population mean of the \(N\) values \(\{(t_1-c_1), \ldots, (t_N-c_N)\}\) is equal to \(\mu_0\) , on the basis of the random sample of size \(n\) drawn with replacement from those \(N\) values. Let \(M_t\) be the sample mean of the \(n\) observed values of \(t_i\) and let \(M_c\) be the sample mean of the \(n\) observed values of \(c_i\) . Let \(sd\) denote the sample standard deviation of the \(n\) observed differences \(\{(t_i-c_i)\}\) . Under the null hypothesis, the expected value of \(M_t-M_c\) is \(\mu_0\) , and if \(n\) is large,

\[ SE(M_t-M_c) \approx sd/n^{1/2}, \]

and the normal approximation to the probability histogram of

\[ Z = \frac{M_t-M_c-\mu_0}{sd/n^{1/2}} \]

will be reasonably accurate, so \(Z\) can be used as the Z statistic in a z test of the null hypothesis that \(\mu_t-\mu_c=\mu_0\) .

Consider testing the hypothesis that the difference ( \(\mu_t-\mu_c\) ) between two population means, \(\mu_c\) and \(\mu_t\) , is equal to \(\mu_0\) , on the basis of the difference ( \(M_t-M_c\) ) between the sample mean \(M_c\) of a random sample of size \(n_c\) with replacement from the first population and the sample mean \(M_t\) of an independent random sample of size \(n_t\) with replacement from the second population. Let \(s_c\) denote the sample standard deviation of the sample of size \(n_c\) from the first population and let \(s_t\) denote the sample standard deviation of the sample of size \(n_t\) from the second population. If the null hypothesis is true,

\[ E(M_t-M_c)=\mu_0, \]

and if \(n_c\) and \(n_t\) are both large,

\[ SE(M_t-M_c) \approx \sqrt{s_t^2/n_t + s_c^2/n_c}, \]

and the normal approximation to

\[ Z = \frac{M_t-M_c-\mu_0}{\sqrt{s_t^2/n_t + s_c^2/n_c}} \]

will be reasonably accurate, so \(Z\) can be used as the Z statistic in a z test of the null hypothesis that \(\mu_t-\mu_c=\mu_0\).

A list of numbers is nearly normally distributed if the fraction of numbers between any pair of values \(a < b\) is approximately equal to the area under the normal curve between \((a-\mu)/SD\) and \((b-\mu)/SD\), where \(\mu\) is the mean of the list and SD is the standard deviation of the list.

Student's t curve with \(d\) degrees of freedom is symmetric about 0, has a single bump centered at 0, and is broader and flatter than the normal curve. The total area under Student's t curve is 1, no matter what \(d\) is; as \(d\) increases, Student's t curve gets narrower, its peak gets higher, and it becomes closer and closer to the normal curve.

Let \(M\) be the sample mean of a random sample of size \(n\) with replacement from a population with mean \(\mu\) and a nearly normal distribution, and let \(s\) be the sample standard deviation of the random sample. For moderate values of \(n\) (\(n < 100\) or so), Student's t curve approximates the probability histogram of \((M-\mu)/(s/n^{1/2})\) better than the normal curve does, which can lead to an approximate hypothesis test about \(\mu\) that is more accurate than the z test.

Consider testing the null hypothesis that the mean \(\mu\) of a population with a nearly normal distribution is equal to \(\mu_0\) from a random sample of size \(n\) with replacement. Let

\[ T=\frac{M-\mu_0}{s/n^{1/2}}, \]

where \(M\) is the sample mean and \(s\) is the sample standard deviation. The tests that reject the null hypothesis if \(T < t_{n-1,a}\) (left-tail t test), if \(T>t_{n-1,1-a}\) (right-tail t test), or if \(|T|>t_{n-1,1-a/2}\) (two-tail t test) all have approximate significance level \(a\). How close the nominal significance level \(a\) is to the true significance level depends on the distribution of the numbers in the population, the sample size \(n\), and \(a\). The same rule of thumb for selecting whether to use a left, right, or two-tailed z test (or not to use a z test at all) works to select whether to use a left, right, or two-tailed t test: If, under the alternative hypothesis, \(E(T)<0\), use a left-tail test. If, under the alternative hypothesis, \(E(T)>0\), use a right-tail test. If, under the alternative hypothesis, \(E(T)\) could be less than zero or greater than zero, use a two-tail test. If, under the alternative hypothesis, \(E(T)=0\), consult an expert. Because the t test differs from the z test only when the sample size is small, and from a small sample it is not possible to tell whether the population has a nearly normal distribution, the t test should be used with caution.

A \(1-a\) confidence set for a parameter \(\mu\) is like a \(1-a\) confidence interval for a parameter \(\mu\) : It is a random set of values that has probability \(1-a\) of containing the true value of \(\mu\) . The difference is that the set need not be an interval.

There is a deep duality between hypothesis tests about a parameter \(\mu\) and confidence sets for \(\mu\). Given a procedure for constructing a \(1-a\) confidence set for \(\mu\), the rule reject the null hypothesis that \(\mu=\mu_0\) if the confidence set does not contain \(\mu_0\) is a significance level \(a\) test of the null hypothesis that \(\mu=\mu_0\). Conversely, given a family of significance level \(a\) hypothesis tests that allow one to test the hypothesis that \(\mu=\mu_0\) for any value of \(\mu_0\), the set of all values \(\mu_0\) for which the test does not reject the null hypothesis that \(\mu=\mu_0\) is a \(1-a\) confidence set for \(\mu\).

  • alternative hypothesis
  • central limit theorem
  • confidence interval
  • confidence set
  • expected value
  • independent
  • independent random variable
  • mutatis mutandis
  • nearly normal distribution
  • normal approximation
  • normal curve
  • null hypothesis
  • pooled bootstrap estimate of the population SD
  • pooled bootstrap estimate of the SE
  • population mean
  • population percentage
  • population standard deviation
  • probability
  • probability distribution
  • probability histogram
  • random sample
  • random variable
  • rejection region
  • sample mean
  • sample percentage
  • sample size
  • sample standard deviation
  • significance level
  • simple random sample
  • standard deviation (SD)
  • standard error (SE)
  • standard unit
  • Student's t curve
  • test statistic
  • two-tailed test
  • Type I error
  • Z statistic

Z test is a statistical test that is conducted on data that approximately follows a normal distribution. The z test can be performed on one sample, two samples, or on proportions for hypothesis testing. It checks if the means of two large samples are different or not when the population variance is known.

A z test can further be classified into left-tailed, right-tailed, and two-tailed hypothesis tests depending upon the parameters of the data. In this article, we will learn more about the z test, its formula, the z test statistic, and how to perform the test for different types of data using examples.


What is Z Test?

A z test is a test that is used to check if the means of two populations are different or not provided the data follows a normal distribution. For this purpose, the null hypothesis and the alternative hypothesis must be set up and the value of the z test statistic must be calculated. The decision criterion is based on the z critical value.

Z Test Definition

A z test is conducted on a population that follows a normal distribution with independent data points and has a sample size that is greater than or equal to 30. It is used to check whether the means of two populations are equal to each other when the population variance is known. The null hypothesis of a z test can be rejected if the z test statistic is statistically significant when compared with the critical value.

Z Test Formula

The z test formula compares the z statistic with the z critical value to test whether there is a difference in the means of two populations. In hypothesis testing , the z critical value divides the distribution graph into the acceptance and the rejection regions. If the test statistic falls in the rejection region then the null hypothesis can be rejected otherwise it cannot be rejected. The z test formula to set up the required hypothesis tests for a one sample and a two-sample z test are given below.

One-Sample Z Test

A one-sample z test is used to check if there is a difference between the sample mean and the population mean when the population standard deviation is known. The formula for the z test statistic is given as follows:

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation and n is the sample size.

The algorithm to set up a one-sample z test based on the z test statistic is given as follows:

Left Tailed Test:

Null Hypothesis: \(H_{0}\) : \(\mu = \mu_{0}\)

Alternate Hypothesis: \(H_{1}\) : \(\mu < \mu_{0}\)

Decision Criteria: If the z statistic < z critical value then reject the null hypothesis.

Right Tailed Test:

Null Hypothesis: \(H_{0}\) : \(\mu = \mu_{0}\)

Alternate Hypothesis: \(H_{1}\) : \(\mu > \mu_{0}\)

Decision Criteria: If the z statistic > z critical value then reject the null hypothesis.

Two Tailed Test:

Null Hypothesis: \(H_{0}\) : \(\mu = \mu_{0}\)

Alternate Hypothesis: \(H_{1}\) : \(\mu \neq \mu_{0}\)

Decision Criteria: If the absolute value of the z statistic > z critical value then reject the null hypothesis.
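The three decision rules above can be written compactly in code. The following is a minimal Python sketch (the sample summary numbers are invented for illustration):

import numpy as np
from scipy import stats

# Hypothetical inputs: sample mean, hypothesized mean, population SD, sample size
x_bar, mu0, sigma, n = 52.3, 50.0, 8.0, 64
alpha = 0.05

z = (x_bar - mu0) / (sigma / np.sqrt(n))
print(f"z statistic = {z:.3f}")

z_left  = stats.norm.ppf(alpha)          # left-tailed critical value (negative)
z_right = stats.norm.ppf(1 - alpha)      # right-tailed critical value
z_two   = stats.norm.ppf(1 - alpha / 2)  # two-tailed critical value

print("Left-tailed: ", "reject H0" if z < z_left else "fail to reject H0")
print("Right-tailed:", "reject H0" if z > z_right else "fail to reject H0")
print("Two-tailed:  ", "reject H0" if abs(z) > z_two else "fail to reject H0")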

Two Sample Z Test

A two sample z test is used to check if there is a difference between the means of two samples. The z test statistic formula is given as follows:

z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\). \(\overline{x_{1}}\), \(\mu_{1}\), \(\sigma_{1}^{2}\) are the sample mean, population mean and population variance respectively for the first sample. \(\overline{x_{2}}\), \(\mu_{2}\), \(\sigma_{2}^{2}\) are the sample mean, population mean and population variance respectively for the second sample.

The two-sample z test can be set up in the same way as the one-sample test. However, this test will be used to compare the means of the two samples. For example, the null hypothesis is given as \(H_{0}\) : \(\mu_{1} = \mu_{2}\).


Z Test for Proportions

A z test for proportions is used to check the difference in proportions. A z test can either be used for one proportion or two proportions. The formulas are given as follows.

One Proportion Z Test

A one proportion z test is used to compare an observed proportion from a single sample with a theoretical (hypothesized) proportion. The z test statistic for a one proportion z test is given as follows:

z = \(\frac{p-p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}}\). Here, p is the observed value of the proportion, \(p_{0}\) is the theoretical proportion value and n is the sample size.

The null hypothesis is that the population proportion equals the theoretical proportion, while the alternative hypothesis is that they are not equal.

Two Proportion Z Test

A two proportion z test is conducted on two proportions to check if they are the same or not. The test statistic formula is given as follows:

z =\(\frac{p_{1}-p_{2}-0}{\sqrt{p(1-p)\left ( \frac{1}{n_{1}} +\frac{1}{n_{2}}\right )}}\)

where p = \(\frac{x_{1}+x_{2}}{n_{1}+n_{2}}\)

\(p_{1}\) is the proportion of sample 1 with sample size \(n_{1}\) and \(x_{1}\) number of successes.

\(p_{2}\) is the proportion of sample 2 with sample size \(n_{2}\) and \(x_{2}\) number of successes.

How to Calculate Z Test Statistic?

The most important step in calculating the z test statistic is to interpret the problem correctly. It is necessary to determine which tailed test needs to be conducted and which type of z test is appropriate. Suppose a teacher claims that his section's students will score higher than his colleague's section. The mean score is 22.1 for 60 students belonging to his section with a standard deviation of 4.8. For his colleague's section, the mean score is 18.8 for 40 students and the standard deviation is 8.1. Test his claim at \(\alpha\) = 0.05. The steps to calculate the z test statistic are as follows:

  • Identify the type of test. In this example, the means of two populations have to be compared in one direction thus, the test is a right-tailed two-sample z test.
  • Set up the hypotheses. \(H_{0}\): \(\mu_{1} = \mu_{2}\), \(H_{1}\): \(\mu_{1} > \mu_{2}\).
  • Find the critical value at the given alpha level using the z table. The critical value is 1.645.
  • Determine the z test statistic using the appropriate formula. This is given by z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\). Substitute values in this equation. \(\overline{x_{1}}\) = 22.1, \(\sigma_{1}\) = 4.8, \(n_{1}\) = 60, \(\overline{x_{2}}\) = 18.8, \(\sigma_{2}\) = 8.1, \(n_{2}\) = 40 and \(\mu_{1} - \mu_{2} = 0\). Thus, z = 2.32
  • Compare the critical value and test statistic to arrive at a conclusion. As 2.32 > 1.645 thus, the null hypothesis can be rejected. It can be concluded that there is enough evidence to support the teacher's claim that the scores of students are better in his class.
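A quick Python check of this calculation (treating the reported standard deviations as the population values, as the example does):

import numpy as np
from scipy import stats

x1, s1, n1 = 22.1, 4.8, 60   # teacher's section
x2, s2, n2 = 18.8, 8.1, 40   # colleague's section
alpha = 0.05

z = (x1 - x2) / np.sqrt(s1**2 / n1 + s2**2 / n2)
z_crit = stats.norm.ppf(1 - alpha)   # right-tailed critical value

print(f"z = {z:.2f}, critical value = {z_crit:.3f}")
print("Reject H0" if z > z_crit else "Fail to reject H0")
# Prints z = 2.32 and critical value = 1.645, so H0 is rejected.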

Z Test vs T-Test

Both z test and t-test are univariate tests used on the means of two datasets. The differences between both tests are outlined in the table given below:

Z Test | T-Test
A z test is a statistical test that is used to check if the means of two data sets are different when the population variance is known. | A t-test is used to check if the means of two data sets are different when the population variance is not known.
The sample size is greater than or equal to 30. | The sample size is less than 30.
The data follows a normal distribution. | The data follows a Student's t distribution.
The one-sample z test statistic is given by \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). | The t test statistic is given by \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\), where s is the sample standard deviation.

Related Articles:

  • Probability and Statistics
  • Data Handling
  • Summary Statistics

Important Notes on Z Test

  • Z test is a statistical test that is conducted on normally distributed data to check if there is a difference in means of two data sets.
  • The sample size should be greater than 30 and the population variance must be known to perform a z test.
  • The one-sample z test checks if there is a difference between the sample mean and the population mean.
  • The two sample z test checks if the means of two different groups are equal.

Examples on Z Test

Example 1: A teacher claims that the mean score of students in his class is greater than 82 with a standard deviation of 20. If a sample of 81 students was selected with a mean score of 90 then check if there is enough evidence to support this claim at a 0.05 significance level.

Solution: As the sample size is 81 and population standard deviation is known, this is an example of a right-tailed one-sample z test.

\(H_{0}\) : \(\mu = 82\)

\(H_{1}\) : \(\mu > 82\)

From the z table, the critical value at \(\alpha\) = 0.05 is 1.645.

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\)

\(\overline{x}\) = 90, \(\mu\) = 82, n = 81, \(\sigma\) = 20, so z = (90 - 82) / (20 / √81) = 8 / 2.22 = 3.6.

As 3.6 > 1.645 thus, the null hypothesis is rejected and it is concluded that there is enough evidence to support the teacher's claim.

Answer: Reject the null hypothesis

Example 2: An online medicine shop claims that the mean delivery time for medicines is less than 120 minutes with a standard deviation of 30 minutes. Is there enough evidence to support this claim at a 0.05 significance level if 49 orders were examined with a mean of 100 minutes?

Solution: As the sample size is 49 and population standard deviation is known, this is an example of a left-tailed one-sample z test.

\(H_{0}\) : \(\mu = 120\)

\(H_{1}\) : \(\mu < 120\)

From the z table, the critical value at \(\alpha\) = 0.05 is -1.645. A negative sign is used as this is a left tailed test.

\(\overline{x}\) = 100, \(\mu\) = 120, n = 49, \(\sigma\) = 30, so z = (100 - 120) / (30 / √49) = -20 / 4.29 = -4.66.

As -4.66 < -1.645 thus, the null hypothesis is rejected and it is concluded that there is enough evidence to support the medicine shop's claim.

Example 3: A company wants to improve the quality of products by reducing defects and monitoring the efficiency of assembly lines. In assembly line A, there were 18 defects reported out of 200 samples while in line B, 25 defects out of 600 samples were noted. Is there a difference in the procedures at a 0.05 alpha level?

Solution: This is an example of a two-tailed two proportion z test.

\(H_{0}\): The two proportions are the same.

\(H_{1}\): The two proportions are not the same.

As this is a two-tailed test the alpha level needs to be divided by 2 to get 0.025.

Using this, the critical value from the z table is 1.96.

\(n_{1}\) = 200, \(n_{2}\) = 600

\(p_{1}\) = 18 / 200 = 0.09

\(p_{2}\) = 25 / 600 = 0.0417

p = (18 + 25) / (200 + 600) = 0.0537

z =\(\frac{p_{1}-p_{2}-0}{\sqrt{p(1-p)\left ( \frac{1}{n_{1}} +\frac{1}{n_{2}}\right )}}\) = 2.62

As 2.62 > 1.96 thus, the null hypothesis is rejected and it is concluded that there is a significant difference between the two lines.
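A quick Python check of this two proportion z test, using the defect counts from the example:

import numpy as np
from scipy import stats

x1, n1 = 18, 200   # defects and sample size, line A
x2, n2 = 25, 600   # defects and sample size, line B
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
p = (x1 + x2) / (n1 + n2)                      # pooled proportion
se = np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # pooled standard error
z = (p1 - p2) / se

z_crit = stats.norm.ppf(1 - alpha / 2)         # two-tailed critical value
print(f"z = {z:.2f}, critical value = {z_crit:.2f}")
print("Reject H0" if abs(z) > z_crit else "Fail to reject H0")
# Prints z = 2.62 and critical value = 1.96, so H0 is rejected.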


FAQs on Z Test

What is a Z Test in Statistics?

A z test in statistics is conducted on data that is normally distributed to test if the means of two datasets are equal. It can be performed when the sample size is greater than 30 and the population variance is known.

What is a One-Sample Z Test?

A one-sample z test is used when the population standard deviation is known, to compare the sample mean and the population mean. The z test statistic is given by the formula \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).

What is the Two-Sample Z Test Formula?

The two sample z test is used when the means of two populations have to be compared. The z test formula is given as \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

What is a One Proportion Z test?

A one proportion z test is used to check if the value of the observed proportion is different from the value of the theoretical proportion. The z statistic is given by \(\frac{p-p_{0}}{\sqrt{\frac{p_{0}(1-p_{0})}{n}}}\).

What is a Two Proportion Z Test?

When the proportions of two samples have to be compared then the two proportion z test is used. The formula is given by \(\frac{p_{1}-p_{2}-0}{\sqrt{p(1-p)\left ( \frac{1}{n_{1}} +\frac{1}{n_{2}}\right )}}\).

How Do You Find the Z Test?

The steps to perform the z test are as follows:

  • Set up the null and alternative hypotheses.
  • Find the critical value using the alpha level and z table.
  • Calculate the z statistic.
  • Compare the critical value and the test statistic to decide whether to reject or not to reject the null hypothesis.

What is the Difference Between the Z Test and the T-Test?

A z test is used on large samples (n ≥ 30) of normally distributed data, while a t-test is used on small samples (n < 30) following a Student's t distribution. Both tests are used to check if the means of two datasets are the same.


Z-test: Formula, Types, Examples

Z-test is especially useful when you have a large sample size and know the population's standard deviation. Different tests are used in statistics to compare distinct samples or groups and to draw conclusions about populations. These tests, also referred to as statistical tests, focus on the probability of obtaining the observed data under particular premises or hypotheses. They offer a framework for evaluating the evidence for or against a given hypothesis.

What is Z-Test?

Z-test is a statistical test that is used to determine whether the mean of a sample is significantly different from a known population mean when the population standard deviation is known. It is particularly useful when the sample size is large (>30).

Z-test can also be defined as a statistical method that is used to determine whether the distribution of the test statistics can be approximated using the normal distribution or not. It is the method to determine whether two sample means are approximately the same or different when their variance is known and the sample size is large (should be >= 30).

The Z-test compares the difference between the sample mean and the population means by considering the standard deviation of the sampling distribution. The resulting Z-score represents the number of standard deviations that the sample mean deviates from the population mean. This Z-Score is also known as Z-Statistics, and can be formulated as:

\[ \text{Z-Score} = \frac{\bar{x}-\mu}{\sigma} \]

  • \(\bar{x}\) : mean of the sample.
  • \(\mu\) : mean of the population.
  • \(\sigma\) : standard deviation of the population.

The z-test assumes that the test statistic (z-score) follows the standard normal distribution.

The average family annual income in India is 200k, with a standard deviation of 5k, and the average family annual income in Delhi is 300k.

Then Z-Score for Delhi will be.

\[ \begin{aligned} \text{Z-Score}&=\frac{\bar{x}-\mu}{\sigma} \\&=\frac{300-200}{5} \\&=20 \end{aligned} \]

This indicates that the average family’s annual income in Delhi is 20 standard deviations above the mean of the population (India).

When to Use Z-test?

  • The sample size should be greater than 30. Otherwise, we should use the t-test.
  • Samples should be drawn at random from the population.
  • The standard deviation of the population should be known.
  • Samples that are drawn from the population should be independent of each other.
  • The data should be normally distributed; however, for a large sample size, it is assumed to have a normal distribution because of the central limit theorem.

Hypothesis Testing

A hypothesis is an educated guess/claim about a particular property of an object. Hypothesis testing is a way to validate the claim of an experiment.

  • Null Hypothesis: The null hypothesis is a statement that the value of a population parameter (such as proportion, mean, or standard deviation) is equal to some claimed value. We either reject or fail to reject the null hypothesis. The null hypothesis is denoted by H 0 .
  • Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has a value that is different from the claimed value. It is denoted by H A .
  • Level of significance: This is the degree of significance with which we accept or reject the null hypothesis. Since 100% accuracy is not possible for accepting or rejecting a hypothesis, we therefore select a level of significance. It is denoted by alpha (∝).

Steps to Perform Z-test

  • First, identify the null and alternate hypotheses.
  • Determine the level of significance (∝).
  • Find the critical value of z in the z-test using the z table.
  • Calculate the z-test statistic using the appropriate formula (n denotes the sample size).
  • Now compare with the hypothesis and decide whether to reject or not to reject the null hypothesis.

Left-tailed Test

In this test, our region of rejection is located to the extreme left of the distribution. Here our null hypothesis is that the population mean is greater than or equal to the hypothesized value, and the alternative hypothesis is that it is less.


Right-tailed Test

In this test, our region of rejection is located to the extreme right of the distribution. Here our null hypothesis is that the population mean is less than or equal to the hypothesized value, and the alternative hypothesis is that it is greater.


One-Tailed Test

A school claimed that its students are more intelligent than the average school's. On calculating the IQ scores of 50 students, the average turns out to be 110. The mean of the population IQ is 100 and the standard deviation is 15. State whether the claim of the principal is right or not at a 5% significance level.

  • First, we define the null hypothesis and the alternate hypothesis. Our null hypothesis will be \(H_0 : \mu = 100\) and our alternate hypothesis \(H_A : \mu > 100\).
  • State the level of significance. Here, our level of significance is given in this question (\(\alpha\) = 0.05); if not given, then we take ∝ = 0.05 in general.
  • Now, we compute the Z-Score using \(\bar{x}\) = 110, \(\mu\) = 100, \(\sigma\) = 15, and n = 50: \[ \text{Z-Score}=\frac{\bar{x}-\mu}{\sigma/\sqrt{n}} =\frac{110-100}{15/\sqrt{50}} =\frac{10}{2.12} =4.71 \]
  • Now, we look up the z-table. For the value of ∝ = 0.05, the z-score for the right-tailed test is 1.645.
  • Here 4.71 > 1.645, so we reject the null hypothesis.
  • If the z-test statistic were less than the critical z-score, we would not reject the null hypothesis.

Code Implementations of One-Tailed Z-Test

# Import the necessary libraries
import numpy as np
import scipy.stats as stats

# Given information
sample_mean = 110
population_mean = 100
population_std = 15
sample_size = 50
alpha = 0.05

# Compute the z-score
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
print('Z-Score :', z_score)

# Approach 1: Using Critical Z-Score
z_critical = stats.norm.ppf(1 - alpha)
print('Critical Z-Score :', z_critical)
if z_score > z_critical:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

# Approach 2: Using P-value
# P-value: probability of observing a z-score at least this large under H0
p_value = 1 - stats.norm.cdf(z_score)
print('p-value :', p_value)
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

Z-Score : 4.714045207910317
Critical Z-Score : 1.6448536269514722
Reject Null Hypothesis
p-value : 1.2142337364462463e-06
Reject Null Hypothesis

Two-tailed Test

In this test, our region of rejection is located to both extremes of the distribution. Here our null hypothesis is that the claimed value is equal to the mean population value.


Two-sampled z-test

In this test, we have provided 2 normally distributed and independent populations, and we have drawn samples at random from both populations. Here, we consider \(\mu_1\) and \(\mu_2\) to be the population means, and \(\overline{X_1}\) and \(\overline{X_2}\) to be the observed sample means. Here, our null hypothesis could be like this:

\[ H_{0} : \mu_{1} - \mu_{2} = 0 \]

and alternative hypothesis

\[ H_{1} : \mu_{1} - \mu_{2} \ne 0 \]

and the formula for calculating the z-test score:

\[ Z = \frac{\left( \overline{X_{1}} - \overline{X_{2}} \right) - \left( \mu_{1} - \mu_{2} \right)}{\sqrt{\frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}}} \]

where \(\sigma_1\) and \(\sigma_2\) are the standard deviations and \(n_1\) and \(n_2\) are the sample sizes of the populations corresponding to \(\mu_1\) and \(\mu_2\).

There are two groups of students preparing for a competition: Group A and Group B. Group A has studied offline classes, while Group B has studied online classes. After the examination, the score of each student comes. Now we want to determine whether the online or offline classes are better.

Group A: Sample size = 50, Sample mean = 75, Sample standard deviation = 10

Group B: Sample size = 60, Sample mean = 80, Sample standard deviation = 12

Assuming a 5% significance level, perform a two-sample z-test to determine if there is a significant difference between the online and offline classes.

Step 1: Null & Alternate Hypothesis

  • Null Hypothesis: There is no significant difference between the mean scores of the online and offline classes: \(\mu_1 -\mu_2 = 0\)
  • Alternate Hypothesis: There is a significant difference in the mean scores between the online and offline classes: \(\mu_1 -\mu_2 \neq 0\)

Step 2: Significance Level

  • Significance level: 5% (\(\alpha = 0.05\))

Step 3: Z-Score

\[ \begin{aligned} \text{Z-score} &= \frac{(x_1-x_2)-(\mu_1 -\mu_2)} {\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} \\ &= \frac{(75-80)-0} {\sqrt{\frac{10^2}{50}+\frac{12^2}{60}}} \\ &= \frac{-5} {\sqrt{2+2.4}} \\ &= \frac{-5} {2.0976} \\&=-2.384 \end{aligned} \]

Step 4: Check the critical Z-Score value in the Z-Table for alpha/2 = 0.025

  •  Critical Z-Score = 1.96

Step 5: Compare with the absolute Z-Score value

  • absolute(Z-Score) > Critical Z-Score
  • Reject the null hypothesis. There is a significant difference between the online and offline classes.

Code Implementations on Two-sampled Z-test

import numpy as np
import scipy.stats as stats

# Group A (Offline Classes)
n1 = 50
x1 = 75
s1 = 10

# Group B (Online Classes)
n2 = 60
x2 = 80
s2 = 12

# Null hypothesis: mu_1 - mu_2 = 0
# Hypothesized difference (under the null hypothesis)
D = 0

# Set the significance level
alpha = 0.05

# Calculate the test statistic (z-score)
z_score = ((x1 - x2) - D) / np.sqrt((s1**2 / n1) + (s2**2 / n2))
print('Z-Score:', np.abs(z_score))

# Approach 1: Using Critical Z-Score
z_critical = stats.norm.ppf(1 - alpha / 2)
print('Critical Z-Score:', z_critical)
if np.abs(z_score) > z_critical:
    print("Reject the null hypothesis. There is a significant difference between the online and offline classes.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to suggest a significant difference between the online and offline classes.")

# Approach 2: Using P-value (two-tailed)
p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))
print('P-Value :', p_value)
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference between the online and offline classes.")
else:
    print("Fail to reject the null hypothesis. There is not enough evidence to suggest a significant difference between the online and offline classes.")

Z-Score: 2.3836564731139807
Critical Z-Score: 1.959963984540054
Reject the null hypothesis. There is a significant difference between the online and offline classes.
P-Value : 0.01714159544079563
Reject the null hypothesis. There is a significant difference between the online and offline classes.

Solved Examples

Example 1: One-sample Z-test

Problem: A company claims that the average battery life of their new smartphone is 12 hours. A consumer group tests 100 phones and finds the average battery life to be 11.8 hours with a population standard deviation of 0.5 hours. At a 5% significance level, is there evidence to refute the company’s claim?

Solution:

Step 1: State the hypotheses. H₀: μ = 12 (null hypothesis); H₁: μ ≠ 12 (alternative hypothesis).

Step 2: Calculate the Z-score. Z = (x̄ - μ) / (σ / √n) = (11.8 - 12) / (0.5 / √100) = -0.2 / 0.05 = -4

Step 3: Find the critical value (two-tailed test at 5% significance). Z₀.₀₂₅ = ±1.96

Step 4: Compare Z-score with critical value. |-4| > 1.96, so we reject the null hypothesis.

Conclusion: There is sufficient evidence to refute the company's claim about battery life.

Example 2: Two-sample Z-test

Problem: A researcher wants to compare the effectiveness of two different medications for reducing blood pressure. Medication A is tested on 50 patients, resulting in a mean reduction of 15 mmHg with a standard deviation of 3 mmHg. Medication B is tested on 60 patients, resulting in a mean reduction of 13 mmHg with a standard deviation of 4 mmHg. At a 1% significance level, is there a significant difference between the two medications?

Solution:

Step 1: State the hypotheses. H₀: μ₁ - μ₂ = 0 (null hypothesis); H₁: μ₁ - μ₂ ≠ 0 (alternative hypothesis).

Step 2: Calculate the Z-score. Z = (x̄₁ - x̄₂) / √((σ₁²/n₁) + (σ₂²/n₂)) = (15 - 13) / √((3²/50) + (4²/60)) = 2 / √(0.18 + 0.2667) = 2 / 0.6683 ≈ 2.99

Step 3: Find the critical value (two-tailed test at 1% significance). Z₀.₀₀₅ = ±2.576

Step 4: Compare Z-score with critical value. 2.99 > 2.576, so we reject the null hypothesis.

Conclusion: There is a significant difference between the effectiveness of the two medications at the 1% significance level.

Problem 3 : A polling company claims that 60% of voters support a new policy. In a sample of 1000 voters, 570 support the policy. At a 5% significance level, is there evidence to support the company’s claim?

Solution:

Step 1: State the hypotheses. H₀: p = 0.60 (null hypothesis); H₁: p ≠ 0.60 (alternative hypothesis).

Step 2: Calculate the Z-score. p̂ = 570/1000 = 0.57 (sample proportion); Z = (p̂ - p) / √(p(1-p)/n) = (0.57 - 0.60) / √(0.60(1-0.60)/1000) = -0.03 / √(0.24/1000) = -0.03 / 0.0155 = -1.94

Step 3: Find the critical value (two-tailed test at 5% significance). Z₀.₀₂₅ = ±1.96

Step 4: Compare Z-score with critical value. |-1.94| < 1.96, so we fail to reject the null hypothesis.

Conclusion: There is not enough evidence to refute the polling company's claim at the 5% significance level.

Problem 4 : A manufacturer claims that their light bulbs last an average of 1000 hours. A sample of 100 bulbs has a mean life of 985 hours. The population standard deviation is known to be 50 hours. At a 5% significance level, is there evidence to reject the manufacturer’s claim?

Solution:

H₀: μ = 1000; H₁: μ ≠ 1000

Z = (x̄ - μ) / (σ / √n) = (985 - 1000) / (50 / √100) = -15 / 5 = -3

Critical value (α = 0.05, two-tailed): ±1.96

|-3| > 1.96, so reject H₀.

Conclusion: There is sufficient evidence to reject the manufacturer's claim at the 5% significance level.

Example 5 : Two factories produce semiconductors. Factory A’s chips have a mean resistance of 100 ohms with a standard deviation of 5 ohms. Factory B’s chips have a mean resistance of 98 ohms with a standard deviation of 4 ohms. Samples of 50 chips from each factory are tested. At a 1% significance level, is there a difference in mean resistance between the two factories?

Solution:

H₀: μA - μB = 0; H₁: μA - μB ≠ 0

Z = (x̄A - x̄B) / √((σA²/nA) + (σB²/nB)) = (100 - 98) / √((5²/50) + (4²/50)) = 2 / √(0.5 + 0.32) = 2 / 0.9055 ≈ 2.21

Critical value (α = 0.01, two-tailed): ±2.576

|2.21| < 2.576, so fail to reject H₀.

Conclusion: There is not enough evidence to conclude a difference in mean resistance at the 1% significance level.

Problem 6 : A political analyst claims that 40% of voters in a certain district support a new tax policy. In a random sample of 500 voters, 220 support the policy. At a 5% significance level, is there evidence to reject the analyst’s claim?

Solution:

H₀: p = 0.40; H₁: p ≠ 0.40

p̂ = 220/500 = 0.44

Z = (p̂ - p) / √(p(1-p)/n) = (0.44 - 0.40) / √(0.40(1-0.40)/500) = 0.04 / 0.0219 = 1.83

Critical value (α = 0.05, two-tailed): ±1.96

|1.83| < 1.96, so fail to reject H₀.

Conclusion: There is not enough evidence to reject the analyst's claim at the 5% significance level.

Problem 7 : Two advertising methods are compared. Method A results in 150 sales out of 1000 contacts. Method B results in 180 sales out of 1200 contacts. At a 5% significance level, is there a difference in the effectiveness of the two methods?

Solution:

H₀: pA - pB = 0; H₁: pA - pB ≠ 0

p̂A = 150/1000 = 0.15; p̂B = 180/1200 = 0.15; p̂ = (150 + 180) / (1000 + 1200) = 0.15

Z = (p̂A - p̂B) / √(p̂(1-p̂)(1/nA + 1/nB)) = (0.15 - 0.15) / 0.0153 = 0

Critical value (α = 0.05, two-tailed): ±1.96

|0| < 1.96, so fail to reject H₀.

Conclusion: There is no significant difference in the effectiveness of the two advertising methods at the 5% significance level.

Problem 8 : A new treatment for a disease is tested in two cities. In City A, 120 out of 400 patients recover. In City B, 140 out of 500 patients recover. At a 5% significance level, is there a difference in the recovery rates between the two cities?

Solution:

H₀: pA - pB = 0; H₁: pA - pB ≠ 0

p̂A = 120/400 = 0.30; p̂B = 140/500 = 0.28; p̂ = (120 + 140) / (400 + 500) = 0.2889

Z = (p̂A - p̂B) / √(p̂(1-p̂)(1/nA + 1/nB)) = (0.30 - 0.28) / √(0.2889(1-0.2889)(1/400 + 1/500)) = 0.02 / 0.0304 ≈ 0.66

Critical value (α = 0.05, two-tailed): ±1.96

|0.66| < 1.96, so fail to reject H₀.

Conclusion: There is not enough evidence to conclude a difference in recovery rates between the two cities at the 5% significance level.

Problem 10 : A company claims that their product weighs 500 grams on average. A sample of 64 products has a mean weight of 498 grams. The population standard deviation is known to be 8 grams. At a 1% significance level, is there evidence to reject the company’s claim?

Solution:

H₀: μ = 500; H₁: μ ≠ 500

Z = (x̄ - μ) / (σ / √n) = (498 - 500) / (8 / √64) = -2 / 1 = -2

Critical value (α = 0.01, two-tailed): ±2.576

|-2| < 2.576, so fail to reject H₀.

Conclusion: There is not enough evidence to reject the company's claim at the 1% significance level.

Practice Problems

1). A cereal company claims that their boxes contain an average of 350 grams of cereal. A consumer group tests 100 boxes and finds a mean weight of 345 grams with a known population standard deviation of 15 grams. At a 5% significance level, is there evidence to refute the company's claim?

2).A study compares the effect of two different diets on cholesterol levels. Diet A is tested on 50 people, resulting in a mean reduction of 25 mg/dL with a standard deviation of 8 mg/dL. Diet B is tested on 60 people, resulting in a mean reduction of 22 mg/dL with a standard deviation of 7 mg/dL. At a 1% significance level, is there a significant difference between the two diets?

3).A politician claims that 60% of voters in her district support her re-election. In a random sample of 1000 voters, 570 support her. At a 5% significance level, is there evidence to reject the politician’s claim?

4).Two different teaching methods are compared. Method A results in 80 students passing out of 120 students. Method B results in 90 students passing out of 150 students. At a 5% significance level, is there a difference in the effectiveness of the two methods?

5).A company claims that their new energy-saving light bulbs last an average of 10,000 hours. A sample of 64 bulbs has a mean life of 9,800 hours. The population standard deviation is known to be 500 hours. At a 1% significance level, is there evidence to reject the company’s claim?

6).The mean salary of employees in a large corporation is said to be $75,000 per year. A union representative suspects this is too high and surveys 100 randomly selected employees, finding a mean salary of $72,500. The population standard deviation is known to be $8,000. At a 5% significance level, is there evidence to support the union representative’s suspicion?

7).Two factories produce computer chips. Factory A’s chips have a mean processing speed of 3.2 GHz with a standard deviation of 0.2 GHz. Factory B’s chips have a mean processing speed of 3.3 GHz with a standard deviation of 0.25 GHz. Samples of 100 chips from each factory are tested. At a 5% significance level, is there a difference in mean processing speed between the two factories?

8).A new vaccine is claimed to be 90% effective. In a clinical trial with 500 participants, 440 develop immunity. At a 1% significance level, is there evidence to reject the claim about the vaccine’s effectiveness?

9).Two different advertising campaigns are tested. Campaign A results in 250 sales out of 2000 views. Campaign B results in 300 sales out of 2500 views. At a 5% significance level, is there a difference in the effectiveness of the two campaigns?

10).A quality control manager claims that the defect rate in a production line is 5%. In a sample of 1000 items, 65 are found to be defective. At a 5% significance level, is there evidence to suggest that the actual defect rate is different from the claimed 5%?

Type I Error and Type II Error

  • Type I error: A Type I error occurs when we reject the null hypothesis even though the null hypothesis is true. The probability of this error is denoted by alpha.
  • Type II error: A Type II error occurs when we fail to reject the null hypothesis even though the null hypothesis is false. The probability of this error is denoted by beta.
 | Null Hypothesis is TRUE | Null Hypothesis is FALSE
Reject Null Hypothesis | Type I Error (False Positive) | Correct decision
Fail to Reject the Null Hypothesis | Correct decision | Type II Error (False Negative)

Conclusion

Z-tests are statistical tools used to determine whether there is a statistically significant difference between a sample statistic and a population parameter, or between two population parameters. They are applicable when dealing with large sample sizes (typically n > 30) and known population standard deviations. Z-tests can be used for analyzing means or proportions in both one-sample and two-sample scenarios. The process involves stating hypotheses, calculating a Z-score, comparing it to a critical value based on the chosen significance level (often 5% or 1%), and then deciding to reject or fail to reject the null hypothesis.

What is the main limitation of the z-test?

The limitation of Z-tests is that the population standard deviation is usually not known. In that case, when the population's variability is unknown, we assume that the sample's variability is a good basis for estimating the population's variability.

What is the minimum sample for z-test?

A z-test can only be used if the population standard deviation is known and the sample size is 30 data points or larger. Otherwise, a t-test should be employed.

What is the application of z-test?

The z-test is used to compare the mean of a sample to a hypothesized population mean. It is also used to determine if there is a significant difference between the means of two independent samples. The z-test can also be used to compare the population proportion to an assumed proportion or to determine the difference between the population proportions of two samples.

What is the theory of the z-test?

The z test is a commonly used hypothesis test in inferential statistics that allows us to compare two populations using the mean values of samples from those populations, or to compare the mean of one population to a hypothesized value, when what we are interested in comparing is a continuous variable.


Z-Test

A Z-test is a type of statistical hypothesis test used to test the mean of a normally distributed test statistic. It tests whether there is a significant difference between an observed sample mean and the population mean under the null hypothesis, H 0 .

A Z-test can only be used when the population variance is known (or can be estimated with a high degree of accuracy), or if the sample size of the experiment is large (typically n>30). Also, the test statistic must exhibit a normal distribution; if it exhibits a distribution that is clearly not normal, the Z-test is not applicable. In many cases, population parameters may not be known, or it may not be possible to estimate them accurately. In such cases, or in cases where the sample size is small, a Student's t-test is more appropriate.

How to conduct a Z-test

The procedure for conducting a Z-test is similar to that of other statistical hypothesis tests, and is generally as follows:

  • State the null (H 0 ) and alternative hypotheses (H a ).
  • Select a significance level, α.
  • Calculate the Z-score.
  • Determine the critical value(s) of Z or the p-value.
  • Compare the Z-score of the observed value to the critical value of Z (or compare the p-value to α) to decide whether the null hypothesis should be rejected in favor of the alternative hypothesis, or should not be rejected. (A code sketch of these steps follows this list.)
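
The five steps can be sketched in a few lines of Python. This is an illustrative sketch, not a canonical implementation; the SAT-style inputs at the bottom (σ = 100, n = 50) are assumptions made for the example:

from scipy.stats import norm

def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test following the five steps above."""
    se = sigma / n ** 0.5                 # standard error of the mean
    z = (x_bar - mu0) / se                # step 3: z-score
    if tail == "two":                     # step 4: critical value / p-value
        z_crit = norm.ppf(1 - alpha / 2)
        p = 2 * norm.sf(abs(z))
        reject = abs(z) > z_crit
    elif tail == "right":
        z_crit = norm.ppf(1 - alpha)
        p = norm.sf(z)
        reject = z > z_crit
    else:                                 # left-tailed
        z_crit = norm.ppf(alpha)
        p = norm.cdf(z)
        reject = z < z_crit
    return z, p, reject                   # step 5: decision

# Hypothetical SAT example: sample mean 1230 vs. state average 1200
z, p, reject = one_sample_z_test(x_bar=1230, mu0=1200, sigma=100, n=50, tail="right")
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")   # z = 2.12, p = 0.0169, reject H0: True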

H 0 and H a

The null hypothesis is typically a statement of no difference. For example, assume that the average score received on the SAT by high schoolers in a given state was a 1200 with a known standard deviation. If the average score of students in a given high school is a 1230, we may use a Z-test to determine whether this result is better, statistically, than the state average. The null hypothesis in this case would be that the average score of students in the high school is not better than the state average, or H 0 : μ ≤ μ 0 , or μ ≤ 1200.

The alternative hypothesis is a statement of difference from the null hypothesis. It can take one of three forms:

  • Given H 0 : μ ≤ μ 0 , H a : μ > μ 0
  • Given H 0 : μ ≥ μ 0 , H a : μ < μ 0
  • Given H 0 : μ = μ 0 , H a : μ ≠ μ 0

In this example, it is believed that a score of 1230 is statistically significant, and that students in this high school performed better than the state average. Therefore, the alternative hypothesis takes on the first form in the list, H a : μ > μ 0 , or μ > 1200.

Significance level

The significance level, α, is the probability of a study rejecting the null hypothesis when the null hypothesis is true. Commonly used significance levels include 0.01, 0.05, and 0.10. A significance level of 0.05, or 5%, means that there is a 5% chance of concluding that a difference exists (thus rejecting H 0 ) when there is no actual difference. The lower the significance level, the more evidence required before the null hypothesis can be rejected. The significance level is compared to the p-value: if a p-value is less than the significance level, the null hypothesis is rejected in favor of the alternative hypothesis.

Calculating a Z-score is a necessary part of conducting a Z-test. A Z-score indicates the number of standard deviations that an observed value is from the mean in a standard normal distribution. For example, an observed value with a Z-score of 1.2 indicates that the observed value is 1.2 standard deviations from the mean. If the population mean and standard deviation are known, the Z-score is calculated using the following formula:

\(Z=\dfrac{x-\mu}{\sigma}\)

where μ is the mean of the population, σ is the standard deviation of the population, and x is the observed value. In many cases the population mean and standard deviation are not known. In such cases, these population parameters can be estimated using a sample mean and sample standard deviation, and the Z-score can be computed as follows:

\(Z=\dfrac{x-\bar{x}}{s}\)

where \(\bar{x}\) is the sample mean, s is the sample standard deviation, and x is the observed value.
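
A short sketch of both calculations in Python; the sample values below are made up to illustrate estimating the parameters from data:

import numpy as np

def z_score(x, mu, sigma):
    """z-score with known population mean and standard deviation."""
    return (x - mu) / sigma

# When mu and sigma are unknown, estimate them from a sample:
sample = np.array([98, 104, 99, 101, 107, 95, 102, 100])   # hypothetical data
x = 110
z_est = (x - sample.mean()) / sample.std(ddof=1)   # ddof=1 gives the sample std s
print(round(z_score(x, 100, 15), 2), round(z_est, 2))   # 0.67 2.5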

Critical value and p-value

Once a Z-score has been calculated, there are two methods for drawing conclusions about the test statistic: using the critical value(s), or using a p-value. To form a conclusion for a hypothesis test using a critical value, the Z-score of the observed value is compared to the critical value(s) of the selected significance level; to use a p-value, the p-value of the observed value is compared to the significance level.

Critical value

A critical value is a value that indicates the critical region(s) (or rejection region) of the standard normal distribution, where a critical region is the area of the distribution in which a value must lie in order to reject the null hypothesis.

The critical value depends on the significance level as well as on whether a one-tailed or two-tailed test is being conducted. A one-tailed test is used when the alternative hypothesis specifies a direction, i.e. when we only want to know whether the observed value is significantly larger, or significantly smaller, than the hypothesized value. There is only one critical region in a one-tailed Z-test: it is either a left-tailed (or lower-tailed) or a right-tailed (or upper-tailed) test, based on the position of the critical region.

In both the one-tailed and two-tailed cases, if the test statistic lies within a critical region, the null hypothesis is rejected in favor of the alternative hypothesis; otherwise, the null hypothesis is not rejected.

After selecting the significance level and type of test, the critical Z value can be determined using a Z table by finding the Z value that corresponds to the selected significance level. For example, for a one-tailed test and a significance level of 0.05, find the probability closest to 0.05 and read the Z value that results in this probability; the critical value for α = 0.05 is -1.645 for a left-tailed Z-test and 1.645 for a right-tailed Z-test. For a two-tailed Z-test, divide α by 2, then determine the corresponding Z-value. For α = 0.05, each tail will comprise an area of 0.025 in the standard normal distribution, which corresponds to Z-values of -1.96 and 1.96. Thus, the critical regions are Z < -1.96 and Z > 1.96. The critical values for common significance levels are shown in the table below:

Critical value
α Left-tailed Right-tailed Two-tailed
0.01 -2.326 2.326 ± 2.576
0.05 -1.645 1.645 ± 1.96
0.10 -1.282 1.282 ± 1.645
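
The table can be reproduced with the inverse CDF (percent-point function) of the standard normal distribution, for instance with scipy in Python:

from scipy.stats import norm

for alpha in (0.01, 0.05, 0.10):
    left = norm.ppf(alpha)          # left-tailed critical value
    right = norm.ppf(1 - alpha)     # right-tailed critical value
    two = norm.ppf(1 - alpha / 2)   # two-tailed critical values are +/- this
    print(f"{alpha:.2f}  {left:+.3f}  {right:+.3f}  ±{two:.3f}")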

The p-value indicates the probability of obtaining test results that are at least as extreme as the observed results, assuming that the null hypothesis is true. For example, a p-value of 0.05 means that, if the null hypothesis were true, there would be only a 5% chance of obtaining a result at least as extreme as the one observed. The smaller the p-value, the less likely the outcome is to occur solely by chance, and the more evidence there is to reject the null hypothesis.

Like critical values, a p-value can be determined using a Z table. For a left-tailed Z-test, the p-value is the area under the standard normal distribution to the left of the Z-score of the observed value; for a right-tailed Z-test, it is the area to the right of the Z-score; for a two-tailed Z-test, it is the sum of the area to the left and right of the Z-score. If the p-value is less than or equal to the significance level, the null hypothesis is rejected in favor of the alternative hypothesis. Otherwise, the null hypothesis is not rejected.
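
In code, these three cases reduce to the normal CDF and its complement (the survival function); the example values below match numbers used elsewhere on this page:

from scipy.stats import norm

def p_value(z, tail="two"):
    """p-value for an observed z-score."""
    if tail == "left":
        return norm.cdf(z)        # area to the left of z
    if tail == "right":
        return norm.sf(z)         # area to the right of z
    return 2 * norm.sf(abs(z))    # area in both tails

print(round(p_value(-1.75, "left"), 4))   # 0.0401
print(round(p_value(2.604, "two"), 6))    # 0.009214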

It is important to note that the p-value is not the probability that the null hypothesis is true. It is the probability that the data could deviate from the null hypothesis as much as, or more than, they did. The calculation of the p-value assumes that the null hypothesis is true, so it is not a measure of whether or not the null hypothesis is correct. Rather, it is a measure of how well the data fit the null hypothesis. Also, the p-value (or critical value) may provide evidence that the null hypothesis should be rejected in favor of the alternative hypothesis at the chosen level of significance. This does not mean that the alternative hypothesis is being accepted, because it is possible that the null hypothesis would not be rejected at a different significance level. Similarly, if the p-value is greater than the significance level, this does not mean that the null hypothesis is being accepted, just that the null hypothesis is not rejected.

Finally, p-values and critical values only indicate statistical significance, and may not necessarily indicate that the study's findings are significant within their context. For example, if a new medicine and a placebo are tested on different populations, and the medicine is found to have a statistically significant effect, it may not necessarily mean that there is clinical significance. It is possible for a finding to be both statistically and clinically significant, or only one or the other. For large sample sizes, it is possible for results to indicate statistical significance even when the effect is actually small and unimportant. Conversely, a small sample may not exhibit statistical significance even when the effect is large and potentially important. Thus, it is important to fully understand the scope of a study, as well as the statistical methods used, in order to effectively interpret the results and draw accurate, unbiased conclusions.

The average score on a national mathematics exam taken by high school seniors is an 82 with a standard deviation of 8. A sample of 100 seniors achieved an average score of 80.6. Perform a Z-test to determine whether there is a statistically significant difference between the national average and that of the sample of seniors at a significance level of 0.05.

We want to determine whether there is any difference, so the null hypothesis is that there is no difference, or

H 0 : μ = 82

and the alternative hypothesis is:

H a : μ ≠ 82

Thus, a two-tailed Z-test should be conducted since differences on either side of the distribution must be accounted for.

The selected significance level is:

α = 0.05

This value must be greater than the p-value in order to conclude that the difference in scores is statistically significant.

Since the population standard deviation and mean are known, the Z-score can be computed as:

\(Z=\dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}=\dfrac{80.6-82}{8/\sqrt{100}}=\dfrac{-1.4}{0.8}=-1.75\)

Based on the selected significance level and the use of a two-tailed Z-test, the critical values are Z = ±1.96. Since the Z-score of the observed value lies between both tails (rather than within one of them), we fail to reject the null hypothesis.


Thus, we conclude that the difference between the observed mean and the population mean is not statistically significant for a significance level of 0.05.

However, had we selected a significance level of 0.10, the critical values would be Z = ±1.645, and Z = -1.75 would lie within the left tail of the distribution. In this case, we would reject the null hypothesis in favor of the alternative hypothesis, and conclude that the observed value is statistically significant for a significance level of 0.10.
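
A quick numerical check of this example in Python (the two-tailed p-value of about 0.08 is exactly why the decision flips between α = 0.05 and α = 0.10):

from scipy.stats import norm

z = (80.6 - 82) / (8 / 100 ** 0.5)   # standard error = 0.8, so z = -1.75
p = 2 * norm.sf(abs(z))              # two-tailed p-value
print(round(z, 2), round(p, 4))      # -1.75 0.0801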

The above discussion involved hypothesis testing for one sample, where an observed value was compared to the expected population parameter. In certain cases, scientists may want to compare the means of two samples. In such cases, a two-sample Z-test is used instead.

Two-sample Z-test

A two-sample Z-test is conducted using the same procedures described above for a one-sample Z-test, with the exception that the Z-score is computed using the following formula:

\(Z=\dfrac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}}\)

where μ 1 and μ 2 are the means of the two respective populations, \(\bar{x}_1\) and \(\bar{x}_2\) are the sample means, σ 1 and σ 2 are the population standard deviations, and n 1 and n 2 are the sample sizes.

Researchers want to test whether a certain drug has any effect on the scores received by patients who are administered the drug prior to performing a physical stress test. The researchers place patients into 2 groups: 100 are placed into the experimental group and are administered the drug; 150 are placed into the control group and are administered a placebo. Both groups then perform the physical stress test, the results of which are as follows:

Experimental group:  \(\bar{x}_1\) = 50; σ 1 = 16; n 1 = 100
Control group:  \(\bar{x}_2\) = 45; σ 2 = 13; n 2 = 150

Determine whether or not there is a statistically significant difference between the two groups at a significance level of 0.05.

The null hypothesis is that there is no difference, so:

H 0 : μ 1 = μ 2

Also, since it is assumed that the null hypothesis is true, μ 1 - μ 2 = 0.

The alternative hypothesis is that there is a difference, so:

H a : μ 1 ≠ μ 2

The selected significance level is 0.05, and we conduct a two-tailed test since we are looking for any observable difference.

The Z-score is then calculated as follows:

\(Z=\dfrac{(50-45)-0}{\sqrt{\dfrac{16^2}{100}+\dfrac{13^2}{150}}}=\dfrac{5}{1.92}\approx 2.604\)

Using a Z table (or a p-value calculator), the p-value for a two-tailed Z-test for a Z-score of 2.604 is 0.009214. Since the p-value is less than the selected significance level, we reject the null hypothesis in favor of the alternative hypothesis, and conclude that the drug has a statistically significant effect on the performance of the patients. Since the Z-score lies in the right tail, we may conclude that patients who received the drug scored significantly better than those who received the placebo. If the Z-score were to lie in the left tail, we would conclude the opposite: that patients who received the drug performed significantly worse.

We could also have used the critical values Z = ±1.96 for a significance level of 0.05 to reach the same conclusion, since 2.604 lies within the critical region denoted by the right tail of the distribution.
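
A sketch of the same two-sample calculation in Python, reproducing the Z-score and p-value above:

from math import sqrt
from scipy.stats import norm

def two_sample_z(x1, x2, s1, s2, n1, n2):
    """Two-sample z-test of H0: mu1 = mu2 (two-tailed)."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # combined standard error
    z = (x1 - x2) / se
    return z, 2 * norm.sf(abs(z))

z, p = two_sample_z(50, 45, 16, 13, 100, 150)
print(round(z, 3), round(p, 6))   # 2.604 0.009214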



10.1 - Z-Test: When Population Variance is Known

Let's start by acknowledging that it is completely unrealistic to think that we'd find ourselves in the situation of knowing the population variance, but not the population mean. Therefore, the hypothesis testing method that we learn on this page has limited practical use. We study it only because we'll use it later to learn about the "power" of a hypothesis test (by learning how to calculate Type II error rates). As usual, let's start with an example.

Example 10-1


Boys of a certain age are known to have a mean weight of \(\mu=85\) pounds. A complaint is made that the boys living in a municipal children's home are underfed. As one bit of evidence, \(n=25\) boys (of the same age) are weighed and found to have a mean weight of \(\bar{x}\) = 80.94 pounds. It is known that the population standard deviation \(\sigma\) is 11.6 pounds (the unrealistic part of this example!). Based on the available data, what should be concluded concerning the complaint?

The null hypothesis is \(H_0:\mu=85\), and the alternative hypothesis is \(H_A:\mu<85\). In general, we know that if the weights are normally distributed, then:

\(Z=\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}\)

follows the standard normal \(N(0,1)\) distribution. It is actually a bit irrelevant here whether or not the weights are normally distributed, because the sample size \(n=25\) is large enough for the Central Limit Theorem to apply. In that case, we know that \(Z\), as defined above, follows at least approximately the standard normal distribution. At any rate, it seems reasonable to use the test statistic:

\(Z=\dfrac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\)

for testing the null hypothesis

\(H_0:\mu=\mu_0\)

against any of the possible alternative hypotheses \(H_A:\mu \neq \mu_0\), \(H_A:\mu<\mu_0\), and \(H_A:\mu>\mu_0\).

For the example in hand, the value of the test statistic is:

\(Z=\dfrac{80.94-85}{11.6/\sqrt{25}}=-1.75\)

The critical region approach tells us to reject the null hypothesis at the \(\alpha=0.05\) level if \(Z<-1.645\). We reject the null hypothesis because \(Z=-1.75<-1.645\), and it therefore falls in the rejection region.

As always, we draw the same conclusion by using the \(p\)-value approach. Recall that the \(p\)-value approach tells us to reject the null hypothesis at the \(\alpha=0.05\) level if the \(p\)-value \(\le \alpha=0.05\). In this case, the \(p\)-value is \(P(Z<-1.75)=0.0401\).

As expected, we reject the null hypothesis because the \(p\)-value \(=0.0401<\alpha=0.05\).

By the way, we'll learn how to ask Minitab to conduct the \(Z\)-test for a mean \(\mu\) in a bit, but this is what the Minitab output for this example looks like:

Test of mu = 85 vs  < 85
The assumed standard deviation = 11.6
N Mean SE Mean 95% Upper Bound Z P
25 80.9400 2.3200 84.7561 -1.75 0.040
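
The same numbers can be reproduced without Minitab; for instance, a Python sketch using scipy:

from scipy.stats import norm

n, x_bar, sigma, mu0 = 25, 80.94, 11.6, 85
se = sigma / n ** 0.5                  # 2.32, matches "SE Mean"
z = (x_bar - mu0) / se                 # -1.75
p = norm.cdf(z)                        # left-tailed p-value, about 0.040
upper = x_bar + norm.ppf(0.95) * se    # 95% upper bound, about 84.7561
print(se, round(z, 2), round(p, 3), round(upper, 4))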


Data analysis: hypothesis testing


1.1 Formulating null and alternative hypotheses

In the world of scientific inquiry, you often begin with a null hypothesis (H 0 ), which expresses the currently accepted value for a parameter in the population. The alternative hypothesis (H a ), on the other hand, is the opposite of the null hypothesis and challenges the currently accepted value.

To illustrate this concept of null and alternative hypotheses, you will look at some well-known stories and examples.

In ancient and medieval times, the widely held belief was that all planets orbited around the Earth, as the Earth was considered the centre of the universe. This idea can be considered the null hypothesis, as it represents the currently accepted value for a parameter in the population. Thus, it can be written as:

H 0 : All planets orbit around the Earth.

The heliocentric model, in which the planets revolve around the Sun, challenged this accepted view and can be considered the alternative hypothesis:

H a : Not all planets orbit around the Earth.

In the world of business and finance, the idea that paper money must be backed by gold (the gold standard) was also a commonly held belief for a long time. This belief can be considered a null hypothesis. However, following the Great Depression, people began to question this belief and broke the link between banknotes and gold. This alternative hypothesis challenged the gold standard, and it eventually became widely accepted that the value of paper money is not necessarily equal to a fixed amount of gold. Thus, H 0 and H a statements can be written as:

H 0 : The value of paper money is equal to a fixed amount of gold.

H a : The value of paper money is not equal to a fixed amount of gold.

In modern times, people generally place their trust in the value of banknotes issued by central banks or monetary authorities, which are backed by a strong government. This belief can be considered a null hypothesis. However, digital currency, such as Bitcoin, has emerged as an alternative to traditional paper money. Bitcoin is not backed by any central bank or monetary authority, and transactions involving Bitcoin are verified by network nodes using cryptography and recorded in a blockchain. This alternative hypothesis challenges the belief that the value of paper money is solely based on people's trust in central banks or monetary authorities. Thus, H 0 and H a statements can be written as:

H 0 : The value of paper money is equal to people’s trust in central banks or monetary authorities.

H a : The value of paper money is not equal to people’s trust in central banks or monetary authorities.

In conclusion, the alternative hypothesis always challenges the idea expressed in the null hypothesis. By testing the null hypothesis against the alternative hypothesis, you can determine which idea is more supported by the available data. The alternative hypothesis is often referred to as a ‘research hypothesis’ because it initiates the motivation and opportunities for further research.

Let’s return to the first example given in Section 1. If you see that your friends and relatives make more or less than £26,000 annually on average, perhaps you should question the widely accepted proposition of £26,000 as the average annual salary in the UK. This will enable you to develop an alternative hypothesis:

H a : Average annual salary in the UK is not equal to £26,000.

The following activity will test your knowledge of null and alternative hypotheses.

Activity 1 Null hypothesis versus alternative hypothesis

Read the following statements. Can you develop a null hypothesis and an alternative hypothesis?

‘It is believed that a high-end coffee machine produces a cup of caffè latte with an average of 1 cm of foam. The hotel employee claims that after the machine has been repaired, it is no longer able to produce a cup of caffè latte with 1cm foam.’

H 0 : A coffee machine makes a cup of caffè latte with 1 cm of foam on average.

H a : A coffee machine cannot make a cup of caffè latte with 1 cm of foam on average.

If you have developed the hypotheses H 0 and H a as mentioned in the discussion to Activity 1, you have shown that you are familiar with the structure of different types of hypotheses. However, in the next section you will explore the concept of hypothesis formulation further.


9.1 Null and Alternative Hypotheses

The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints.

H 0 : The null hypothesis: It is a statement of no difference between the variables—they are not related. This can often be considered the status quo; as a result, if you cannot accept the null, it requires some action.

H a : The alternative hypothesis: It is a claim about the population that is contradictory to H 0 and what we conclude when we reject H 0 . This is usually what the researcher is trying to prove.

Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.

After you have determined which hypothesis the sample supports, you make a decision. There are two options for a decision. They are "reject H 0 " if the sample information favors the alternative hypothesis or "do not reject H 0 " or "decline to reject H 0 " if the sample information is insufficient to reject the null hypothesis.

Mathematical Symbols Used in H 0 and H a :

H 0                              H a
equal (=)                        not equal (≠) or greater than (>) or less than (<)
greater than or equal to (≥)     less than (<)
less than or equal to (≤)        more than (>)

H 0 always has a symbol with an equal in it. H a never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.

Example 9.1

H 0 : No more than 30% of the registered voters in Santa Clara County voted in the primary election. p ≤ .30
H a : More than 30% of the registered voters in Santa Clara County voted in the primary election. p > .30

A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25%. State the null and alternative hypotheses.

Example 9.2

We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null and alternative hypotheses are:
H 0 : μ = 2.0
H a : μ ≠ 2.0

We want to test whether the mean height of eighth graders is 66 inches. State the null and alternative hypotheses. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : μ __ 66
  • H a : μ __ 66

Example 9.3

We want to test if college students take less than five years to graduate from college, on the average. The null and alternative hypotheses are:
H 0 : μ ≥ 5
H a : μ < 5

We want to test if it takes fewer than 45 minutes to teach a lesson plan. State the null and alternative hypotheses. Fill in the correct symbol ( =, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : μ __ 45
  • H a : μ __ 45

Example 9.4

In an issue of U.S. News and World Report, an article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third pass. The same article stated that 6.6% of U.S. students take advanced placement exams and 4.4% pass. Test if the percentage of U.S. students who take advanced placement exams is more than 6.6%. State the null and alternative hypotheses.
H 0 : p ≤ 0.066
H a : p > 0.066

On a state driver’s test, about 40% pass the test on the first try. We want to test if more than 40% pass on the first try. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses.

  • H 0 : p __ 0.40
  • H a : p __ 0.40

Collaborative Exercise

Bring to class a newspaper, some news magazines, and some Internet articles . In groups, find articles from which your group can write null and alternative hypotheses. Discuss your hypotheses with the rest of the class.



Reevaluating the Neural Noise Hypothesis in Dyslexia: Insights from EEG and 7T MRS Biomarkers

Agnieszka Glica, Katarzyna Wasilewska, Julia Jurkowska, Jarosław Żygierewicz, Bartosz Kossowski, Katarzyna Jednoróg

  • Laboratory of Language Neurobiology, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Pasteur 3 Street, 02-093 Warsaw, Poland
  • Faculty of Physics, University of Warsaw, Pasteur 5 Street, 02-093 Warsaw, Poland
  • Laboratory of Brain Imaging, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Pasteur 3 Street, 02-093 Warsaw, Poland
  • https://doi.org/10.7554/eLife.99920.1

The neural noise hypothesis of dyslexia posits an imbalance between excitatory and inhibitory (E/I) brain activity as an underlying mechanism of reading difficulties. This study provides the first direct test of this hypothesis using both indirect EEG power spectrum measures in 120 Polish adolescents and young adults (60 with dyslexia, 60 controls) and direct glutamate (Glu) and gamma-aminobutyric acid (GABA) concentrations from magnetic resonance spectroscopy (MRS) at 7T MRI scanner in half of the sample. Our results, supported by Bayesian statistics, show no evidence of E/I balance differences between groups, challenging the hypothesis that cortical hyperexcitability underlies dyslexia. These findings suggest alternative mechanisms must be explored and highlight the need for further research into the E/I balance and its role in neurodevelopmental disorders.

eLife assessment

The authors combined neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) measures to empirically evaluate the neural noise hypothesis of developmental dyslexia. Their results are solid, supported by consistent findings from the two complementary methodologies and Bayesian statistics. Additional analyses, particularly on the neurochemical measures, are necessary to further substantiate the results. This study is useful for understanding the neural mechanisms of dyslexia and neural development in general.


Introduction

According to the neural noise hypothesis of dyslexia, reading difficulties stem from an imbalance between excitatory and inhibitory (E/I) neural activity ( Hancock et al., 2017 ). The hypothesis predicts increased cortical excitation leading to more variable and less synchronous neural firing. This instability supposedly results in disrupted sensory representations and impedes phonological awareness and multisensory integration skills, crucial for learning to read ( Hancock et al., 2017 ). Yet, studies testing this hypothesis are lacking.

The non-invasive measurement of the E/I balance can be derived through assessment of glutamate (Glu) and gamma-aminobutyric acid (GABA) neurotransmitters concentration via magnetic resonance spectroscopy (MRS) ( Finkelman et al., 2022 ) or through global, indirect estimations from the electroencephalography (EEG) signal ( Ahmad et al., 2022 ).

Direct measurements of Glu and GABA yielded conflicting findings. Higher Glu concentrations in the midline occipital cortex correlated with poorer reading performance in children ( Del Tufo et al., 2018 ; Pugh et al., 2014 ), while elevated Glu levels in the anterior cingulate cortex (ACC) corresponded to greater phonological skills ( Lebel et al., 2016 ). Elevated GABA in the left inferior frontal gyrus was linked to reduced verbal fluency in adults ( Nakai and Okanoya, 2016 ), and increased GABA in the midline occipital cortex in children was associated with slower reaction times in a linguistic task ( Del Tufo et al., 2018 ). However, notable null findings exist regarding dyslexia status and Glu levels in the ACC among children ( Horowitz-Kraus et al., 2018 ) as well as Glu and GABA levels in the visual and temporo-parietal cortices in both children and adults ( Kossowski et al., 2019 ).

Both beta (∼13-28 Hz) and gamma (> 30 Hz) oscillations may serve as E/I balance indicators ( Ahmad et al., 2022 ), as greater GABA-ergic activity has been associated with greater beta power ( Jensen et al., 2005 ; Porjesz et al., 2002 ) and gamma power or peak frequency ( Brunel and Wang, 2003 ; Chen et al., 2017 ). Resting-state analyses often reported nonsignificant beta power associations with dyslexia ( Babiloni et al., 2012 ; Fraga González et al., 2018 ; Xue et al., 2020 ), however, one study indicated lower beta power in dyslexic compared to control boys ( Fein et al., 1986 ). Mixed results were also observed during tasks. One study found decreased beta power in the dyslexic group ( Spironelli et al., 2008 ), while the other increased beta power relative to the control group ( Rippon and Brunswick, 2000 ). Insignificant relationship between resting gamma power and dyslexia was reported ( Babiloni et al., 2012 ; Lasnick et al., 2023 ). When analyzing auditory steady-state responses, the dyslexic group had a lower gamma peak frequency, while no significant differences in gamma power were observed ( Rufener and Zaehle, 2021 ). Essentially, the majority of studies in dyslexia examining gamma frequencies evaluated cortical entrainment to auditory stimuli ( Lehongre et al., 2011 ; Marchesotti et al., 2020 ; Van Hirtum et al., 2019 ). Therefore, the results from these tasks do not provide direct evidence of differences in either gamma power or peak frequency between the dyslexic and control groups.

The EEG signal comprises both oscillatory, periodic activity, and aperiodic activity, characterized by a gradual decrease in power as frequencies rise (1/f signal) ( Donoghue et al., 2020 ). Recently recognized as a biomarker of E/I balance, a lower exponent of signal decay (flatter slope) indicates a greater dominance of excitation over inhibition in the brain, as shown by the simulation models of local field potentials, ratio of AMPA/GABA a synapses in the rat hippocampus ( Gao et al., 2017 ) and recordings under propofol or ketamine in macaques and humans ( Gao et al., 2017 ; Waschke et al., 2021 ). However, there are also pharmacological studies providing mixed results ( Colombo et al., 2019 ; Salvatore et al., 2024 ). Nonetheless, the 1/f signal has shown associations with various conditions putatively characterized by changes in E/I balance, such as early development in infancy ( Schaworonkow and Voytek, 2021 ), healthy aging ( Voytek et al., 2015 ) and neurodevelopmental disorders like ADHD ( Ostlund et al., 2021 ), autism spectrum disorder ( Manyukhina et al., 2022 ) or schizophrenia ( Molina et al., 2020 ). Despite its potential relevance, the evaluation of the 1/f signal in dyslexia remains limited to one study, revealing flatter slopes among dyslexic compared to control participants at rest ( Turri et al., 2023 ), thereby lending support to the notion of neural noise in dyslexia.

Here, we examined both indirect (1/f signal, beta, and gamma oscillations during both rest and a spoken language task) and direct (Glu and GABA) biomarkers of E/I balance in participants with dyslexia and age-matched controls. The neural noise hypothesis predicts flatter slopes of 1/f signal, decreased beta and gamma power, and higher Glu concentrations in the dyslexic group. Furthermore, we tested the relationships between different E/I measures. Flatter slopes of 1/f signal should be related to higher Glu level, while enhanced beta and gamma power to increased GABA level.

No evidence for group differences in the EEG E/I biomarkers

We recruited 120 Polish adolescents and young adults – 60 with dyslexia diagnosis and 60 controls matched in sex, age, and family socio-economic status. The dyslexic group scored lower in all reading and reading-related tasks and higher in the Polish version of the Adult Reading History Questionnaire (ARHQ-PL) ( Bogdanowicz et al., 2015 ), where a higher score indicates a higher risk of dyslexia (see Table S1 in the Supplementary Material). Although all participants were within the intellectual norm, the dyslexic group scored lower on the IQ scale (including nonverbal subscale only) than the control group. However, the Bayesian statistics did not provide evidence for the difference between groups in the nonverbal IQ.

We analyzed the aperiodic (exponent and offset) components of the EEG signal at rest and during a spoken language task, where participants listened to a sentence and had to indicate its veracity. Due to a technical error, the signal from one person (a female from the dyslexic group) was not recorded during most of the language task and was excluded from the analyses. Hence, the results are provided for 119 participants – 59 in the dyslexic and 60 in the control group.

First, aperiodic parameter values were averaged across all electrodes and compared between groups (dyslexic, control) and conditions (resting state, language task) using a 2×2 repeated measures ANOVA. Age negatively correlated both with the exponent ( r = -.27, p = .003, BF 10 = 7.96) and offset ( r = -.40, p < .001, BF 10 = 3174.29) in line with previous investigations ( Cellier et al., 2021 ; McSweeney et al., 2021 ; Schaworonkow and Voytek, 2021 ; Voytek et al., 2015 ), therefore we included age as a covariate. Post-hoc tests are reported with Bonferroni corrected p -values.

For the mean exponent, we found a significant effect of age ( F (1,116) = 8.90, p = .003, η 2 p = .071, BF incl = 10.47), while the effects of condition ( F (1,116) = 2.32, p = .131, η 2 p = .020, BF incl = 0.39) and group ( F (1,116) = 0.08, p = .779, η 2 p = .001, BF incl = 0.40) were not significant and Bayes Factor did not provide evidence for either inclusion or exclusion. Interaction between group and condition ( F (1,116) = 0.16, p = .689, η 2 p = .001, BF incl = 0.21) was not significant and Bayes Factor indicated against including it in the model.

For the mean offset, we found significant effects of age ( F (1,116) = 22.57, p < .001, η 2 p = .163, BF incl = 1762.19) and condition ( F (1,116) = 23.04, p < .001, η 2 p = .166, BF incl > 10000) with post-hoc comparison indicating that the offset was lower in the resting state condition ( M = -10.80, SD = 0.21) than in the language task ( M = -10.67, SD = 0.26, p corr < .001). The effect of group ( F (1,116) = 0.00, p = .964, η 2 p = .000, BF incl = 0.54) was not significant while Bayes Factor did not provide evidence for either inclusion or exclusion. Interaction between group and condition was not significant ( F (1,116) = 0.07, p = .795, η 2 p = .001, BF incl = 0.22) and Bayes Factor indicated against including it in the model.

Next, we restricted analyses to language regions and averaged exponent and offset values from the frontal electrodes corresponding to the left (F7, FT7, FC5) and right inferior frontal gyrus (F8, FT8, FC6), as well as temporal electrodes, corresponding to the left (T7, TP7, TP9) and right superior temporal sulcus, STS (T8, TP8, TP10) ( Giacometti et al., 2014 ; Scrivener and Reader, 2022 ). A 2×2×2×2 (group, condition, hemisphere, region) repeated measures ANOVA with age as a covariate was applied. Power spectra from the left STS at rest and during the language task are presented in Figure 1A and C , while the results for the exponent, offset, and beta power are presented in Figure 1B and D .


Overview of the main results obtained in the study. (A) Power spectral densities averaged across 3 electrodes (T7, TP7, TP9) corresponding to the left superior temporal sulcus (STS) separately for dyslexic (DYS) and control (CON) groups at rest and (C) during the language task. (B) Plots illustrating results for the exponent, offset, and the beta power from the left STS electrodes at rest and (D) during the language task. (E) Group results (CON > DYS) from the fMRI localizer task for words compared to the control stimuli (p < .05 FWE cluster threshold) and overlap of the MRS voxel placement across participants. (F) MRS spectra separately for DYS and CON groups. (G) Plots illustrating results for the Glu, GABA, Glu/GABA ratio and the Glu/GABA imbalance. (H) Semi-partial correlation between offset at rest (left STS electrodes) and Glu controlling for age and gray matter volume (GMV).

For the exponent, there were significant effects of age ( F (1,116) = 14.00, p < .001, η 2 p = .108, BF incl = 11.46) and condition F (1,116) = 4.06, p = .046, η 2 p = .034, BF incl = 1.88), however, Bayesian statistics did not provide evidence for either including or excluding the condition factor. Furthermore, post-hoc comparisons did not reveal significant differences between the exponent at rest ( M = 1.51, SD = 0.17) and during the language task ( M = 1.51, SD = 0.18, p corr = .546). There was also a significant interaction between region and group, although Bayes Factor indicated against including it in the model ( F (1,116) = 4.44, p = .037, η 2 p = .037, BF incl = 0.25). Post-hoc comparisons indicated that the exponent was higher in the frontal than in the temporal region both in the dyslexic ( M frontal = 1.54, SD frontal = 0.15, M temporal = 1.49, SD temporal = 0.18, p corr < .001) and in the control group ( M frontal = 1.54, SD frontal = 0.17, M temporal = 1.46, SD temporal = 0.20, p corr < .001). The difference between groups was not significant either in the frontal ( p corr = .858) or temporal region ( p corr = .441). The effects of region ( F (1,116) = 1.17, p = .282, η 2 p = .010, BF incl > 10000) and hemisphere ( F (1,116) = 1.17, p = .282, η 2 p = .010, BF incl = 12.48) were not significant, although Bayesian statistics indicated in favor of including them in the model. Furthermore, the interactions between condition and group ( F (1,116) = 0.18, p = .673, η 2 p = .002, BF incl = 3.70), and between region, hemisphere, and condition ( F (1,116) = 0.11, p = .747, η 2 p = .001, BF incl = 7.83) were not significant, however Bayesian statistics indicated in favor of including these interactions in the model. The effect of group ( F (1,116) = 0.12, p = .733, η 2 p = .001, BF incl = 1.19) was not significant, while Bayesian statistics did not provide evidence for either inclusion or exclusion. Any other interactions were not significant and Bayes Factor indicated against including them in the model.

In the case of offset, there were significant effects of condition ( F (1,116) = 20.88, p < .001, η 2 p = .153, BF incl > 10000) and region ( F (1,116) = 6.18, p = .014, η 2 p = .051, BF incl > 10000). For the main effect of condition, post-hoc comparison indicated that the offset was lower in the resting state condition ( M = -10.88, SD = 0.33) than in the language task ( M = -10.76, SD = 0.38, p corr < .001), while for the main effect of region, post-hoc comparison indicated that the offset was lower in the temporal ( M = -10.94, SD = 0.37) as compared to the frontal region ( M = -10.69, SD = 0.34, p corr < .001). There was also a significant effect of age ( F (1,116) = 20.84, p < .001, η 2 p = .152, BF incl = 0.23) and interaction between condition and hemisphere, ( F (1,116) = 4.35, p = .039, η 2 p = .036, BF incl = 0.21), although Bayes Factor indicated against including these factors in the model. Post-hoc comparisons for the condition*hemisphere interaction indicated that the offset was lower in the resting state condition than in the language task both in the left ( M rest = -10.85, SD rest = 0.34, M task = -10.73, SD task = 0.40, p corr < .001) and in the right hemisphere ( M rest = -10.91, SD rest = 0.31, M task = -10.79, SD task = 0.37, p corr < .001) and that the offset was lower in the right as compared to the left hemisphere both at rest ( p corr < .001) and during the language task ( p corr < .001). The interactions between region and condition ( F (1,116) = 1.76, p = .187, η 2 p = .015, BF incl > 10000), hemisphere and group ( F (1,116) = 1.58, p = .211, η 2 p = .013, BF incl = 1595.18), region and group ( F (1,116) = 0.27, p = .605, η 2 p = .002, BF incl = 9.32), as well as between region, condition, and group ( F (1,116) = 0.21, p = .651, η 2 p = .002, BF incl = 2867.18) were not significant, although Bayesian statistics indicated in favor of including them in the model. The effect of group ( F (1,116) = 0.18, p = .673, η 2 p = .002, BF incl < 0.00001) was not significant and Bayesian statistics indicated against including it in the model. Any other interactions were not significant and Bayesian statistics indicated against including them in the model or did not provide evidence for either inclusion or exclusion.

Then, we analyzed the aperiodic-adjusted brain oscillations. Since the algorithm did not find the gamma peak (30-43 Hz) above the aperiodic component in the majority of participants, we report the results only for the beta (14-30 Hz) power. We performed a similar regional analysis as for the exponent and offset with a 2×2×2×2 (group, condition, hemisphere, region) repeated measures ANOVA. However, we did not include age as a covariate, as it did not correlate with any of the periodic measures. The sample size was 117 (DYS n = 57, CON n = 60) since in 2 participants the algorithm did not find the beta peak above the aperiodic component in the left frontal electrodes during the task.

The analysis revealed a significant effect of condition ( F (1,115) = 8.58, p = .004, η 2 p = .069, BF incl = 5.82) with post-hoc comparison indicating that the beta power was greater during the language task ( M = 0.53, SD = 0.22) than at rest ( M = 0.50, SD = 0.19, p corr = .004). There were also significant effects of region ( F (1,115) = 10.98, p = .001, η 2 p = .087, BF incl = 23.71), and hemisphere ( F (1,115) = 12.08, p < .001, η 2 p = .095, BF incl = 23.91). For the main effect of region, post-hoc comparisons indicated that the beta power was greater in the temporal ( M = 0.52, SD = 0.21) as compared to the frontal region ( M = 0.50, SD = 0.19, p corr = .001), while for the main effect of hemisphere, post-hoc comparisons indicated that the beta power was greater in the right ( M = 0.52, SD = 0.20) than in the left hemisphere ( M = 0.51, SD = 0.20, p corr < .001). There was a significant interaction between condition and region ( F (1,115) = 12.68, p < .001, η 2 p = .099, BF incl = 55.26) with greater beta power during the language task as compared to rest significant in the temporal ( M rest = 0.50, SD rest = 0.20, M task = 0.55, SD task = 0.24, p corr < .001), while not in the frontal region ( M rest = 0.49, SD rest = 0.18, M task = 0.51, SD task = 0.22, p corr = .077). Also, greater beta power in the temporal as compared to the frontal region was significant during the language task ( p corr < .001), while not at rest ( p corr = .283). The effect of group ( F (1,115) = 0.05, p = .817, η 2 p = .000, BF incl < 0.00001) was not significant and Bayes Factor indicated against including it in the model. Any other interactions were not significant and Bayesian statistics indicated against including them in the model or did not provide evidence for either inclusion or exclusion.

Additionally, building upon previous findings which demonstrated differences in dyslexia in aperiodic and periodic components within the parieto-occipital region ( Turri et al., 2023 ), we have included analyses for the same cluster of electrodes in the Supplementary Material. However, in this region, we also did not find evidence for group differences either in the exponent, offset or beta power.

No evidence for group differences in Glu and GABA concentrations in the left STS

In total, 59 out of 120 participants underwent an MRS session at the 7T MRI scanner - 29 from the dyslexic group (13 females, 16 males) and 30 from the control group (14 females, 16 males). The MRS voxel was placed in the left STS, in a region showing the highest activation for both visual and auditory words (compared to control stimuli) localized individually in each participant, based on an fMRI task (see Figure 1E for overlap of the MRS voxel placement across participants and Figure 1F for MRS spectra). We decided to analyze the neurometabolites' levels derived from the left STS, as this region is consistently related to functional and structural differences in dyslexia across languages ( Yan et al., 2021 ).

Due to insufficient magnetic homogeneity or interruption of the study by the participants, 5 participants from the dyslexic group had to be excluded. We excluded a further 4 participants due to poor quality of the obtained spectra, thus the results for Glu are reported for 50 participants - 21 in the dyslexic (12 females, 9 males) and 29 in the control group (13 females, 16 males). In the case of GABA, we additionally excluded 3 participants based on the Cramér-Rao Lower Bounds (CRLB) > 20%. Therefore, the results for GABA, Glu/GABA ratio and Glu/GABA imbalance are reported for 47 participants - 20 in the dyslexic (12 females, 8 males) and 27 in the control group (11 females, 16 males). Demographic and behavioral characteristics for the subsample of 47 participants are provided in the Table S2.

For each metabolite, we performed a separate univariate ANCOVA with the effect of group being tested and voxel's gray matter volume (GMV) as a covariate (see Figure 1G ). For the Glu analysis, we also included age as a covariate, due to negative correlation between variables ( r = -.35, p = .014, BF 10 = 3.41). The analysis revealed significant effect of GMV ( F (1,46) = 8.18, p = .006, η 2 p = .151, BF incl = 12.54), while the effects of age ( F (1,46) = 3.01, p = .090, η 2 p = .061, BF incl = 1.15) and group ( F (1,46) = 1.94, p = .170, η 2 p = .040, BF incl = 0.63) were not significant and Bayes Factor did not provide evidence for either inclusion or exclusion.

Conversely, GABA did not correlate with age ( r = -.11, p = .481, BF 10 = 0.23), thus age was not included as a covariate. The analysis revealed a significant effect of GMV ( F (1,44) = 4.39, p = .042, η 2 p = .091, BF incl = 1.64), however Bayes Factor did not provide evidence for either inclusion or exclusion. The effect of group was not significant ( F (1,44) = 0.49, p = .490, η 2 p = .011, BF incl = 0.35) although Bayesian statistics did not provide evidence for either inclusion or exclusion.

Also, Glu/GABA ratio did not correlate with age ( r = -.05, p = .744, BF 10 = 0.19), therefore age was not included as a covariate. The results indicated that the effect of GMV was not significant ( F (1,44) = 0.95, p = .335, η 2 p = .021, BF incl = 0.43) while Bayes Factor did not provide evidence for either inclusion or exclusion. The effect of group was not significant ( F (1,44) = 0.01, p = .933, η 2 p = .000, BF incl = 0.29) and Bayes Factor indicated against including it in the model.

Following a recent study examining developmental changes in both EEG and MRS E/I biomarkers ( McKeon et al., 2024 ), we calculated an additional measure of Glu/GABA imbalance, computed as the absolute residual value from the linear regression of Glu predicted by GABA, with greater values indicating greater Glu/GABA imbalance. As in the previous work ( McKeon et al., 2024 ), we took the square root of this value to ensure a normal distribution of the data. This measure did not correlate with age ( r = -.05, p = .719, BF 10 = 0.19); thus, age was not included as a covariate. The results indicated that the effect of GMV was not significant ( F (1,44) = 0.63, p = .430, η 2 p = .014, BF incl = 0.37) while Bayes Factor did not provide evidence for either inclusion or exclusion. The effect of group was not significant ( F (1,44) = 0.74, p = .396, η 2 p = .016, BF incl = 0.39) although Bayesian statistics did not provide evidence for either inclusion or exclusion.
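
For readers who want the imbalance measure concretely, here is a minimal numpy sketch of the computation as described above; the Glu/GABA values are synthetic stand-ins, not the study's data:

import numpy as np

rng = np.random.default_rng(0)
gaba = rng.normal(2.0, 0.3, size=47)                # hypothetical GABA levels
glu = 1.5 * gaba + rng.normal(7.0, 0.8, size=47)    # hypothetical Glu levels

# Linear regression of Glu predicted by GABA, then absolute residuals
slope, intercept = np.polyfit(gaba, glu, deg=1)
residuals = glu - (slope * gaba + intercept)

# Square root of the absolute residual, as in McKeon et al. (2024)
imbalance = np.sqrt(np.abs(residuals))              # greater = more Glu/GABA imbalance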

Correspondence between Glu and GABA concentrations and EEG E/I biomarkers is limited

Next, we investigated correlations between Glu and GABA concentrations in the left STS and EEG markers of E/I balance. Semi-partial correlations were performed ( Table 1 ) to control for confounding variables - for Glu the effects of age and GMV were regressed, for GABA, Glu/GABA ratio and Glu/GABA imbalance the effect of GMV was regressed, while for exponents and offsets the effect of age was regressed. For zero-order correlations between variables see Table S3.


Semi-partial Correlations Between Direct and Indirect Markers of Excitatory-Inhibitory Balance. For Glu the Effects of Age and Gray Matter Volume (GMV) Were Regressed, for GABA, Glu/GABA Ratio and Glu/GABA Imbalance the Effect of GMV was Regressed, While for Exponents and Offsets the Effect of Age was Regressed

Glu negatively correlated with offset in the left STS both at rest ( r = -.38, p = .007, BF 10 = 6.28; Figure 1H ) and during the language task ( r = -.37, p = .009, BF 10 = 5.05), while any other correlations between Glu and EEG markers were not significant and Bayesian statistics indicated in favor of null hypothesis or provided absence of evidence for either hypothesis. Furthermore, Glu/GABA imbalance positively correlated with exponent at rest both averaged across all electrodes ( r = .29, p = .048, BF 10 = 1.21), as well as in the left STS electrodes ( r = .35, p = .017, BF 10 = 2.87) although Bayes Factor provided absence of evidence for either alternative or null hypothesis. Conversely, GABA and Glu/GABA ratio were not significantly correlated with any of the EEG markers and Bayesian statistics indicated in favor of null hypothesis or provided absence of evidence for either hypothesis.

Testing the paths from neural noise to reading

The neural noise hypothesis of dyslexia predicts impact of the neural noise on reading through the impairment of 1) phonological awareness, 2) lexical access and generalization and 3) multisensory integration ( Hancock et al., 2017 ). Therefore, we analyzed correlations between these variables, reading skills and direct and indirect markers of E/I balance. For the composite score of phonological awareness, we averaged z-scores from phoneme deletion, phoneme and syllable spoonerisms tasks. For the composite score of lexical access and generalization we averaged z-scores from objects, colors, letters and digits subtests from rapid automatized naming (RAN) task, while for the composite score of reading we averaged z-scores from words and pseudowords read per minute, and text reading time in reading comprehension task. The outcomes from the RAN and reading comprehension task have been transformed from raw time scores to items/time scores in order to provide the same direction of relationships for all z-scored measures, with greater values indicating better skills. For the multisensory integration score we used results from the redundant target effect task reported in our previous work ( Glica et al., 2024 ), with greater values indicating a greater magnitude of multisensory integration.

Age positively correlated with multisensory integration ( r = .38, p < .001, BF 10 = 87.98), composite scores of reading ( r = .22, p = .014, BF 10 = 2.24) and phonological awareness ( r = .21, p = .021, BF 10 = 1.59), while not with the composite score of RAN ( r = .13, p = .151, BF 10 = 0.32). Hence, we regressed the effect of age from multisensory integration, reading and phonological awareness scores and performed semi-partial correlations ( Table 2 , for zero-order correlations see Table S4).


Semi-partial Correlations Between Reading, Phonological Awareness, Rapid Automatized Naming, Multisensory Integration and Markers of Excitatory-Inhibitory Balance. For Reading, Phonological Awareness and Multisensory Integration the Effect of Age was Regressed, for Glu the Effects of Age and Gray Matter Volume (GMV) Were Regressed, for GABA, Glu/GABA Ratio and Glu/GABA Imbalance the Effect of GMV was Regressed, While for Exponents and Offsets the Effect of Age was Regressed

Phonological awareness positively correlated with offset in the left STS at rest ( r = .18, p = .049, BF 10 = 0.77) and with beta power in the left STS both at rest ( r = .23, p = .011, BF 10 = 2.73; Figure 2A ) and during the language task ( r = .23, p = .011, BF 10 = 2.84; Figure 2B ), although Bayes Factor provided absence of evidence for either alternative or null hypothesis. Furthermore, multisensory integration positively correlated with GABA concentration ( r = .31, p = .034, BF 10 = 1.62) and negatively with Glu/GABA ratio ( r = -.32, p = .029, BF 10 = 1.84), although Bayes Factor provided absence of evidence for either alternative or null hypothesis. Any other correlations between reading skills and E/I balance markers were not significant and Bayesian statistics indicated in favor of null hypothesis or provided absence of evidence for either hypothesis.


Associations between beta power, phonological awareness and reading. (A) Semi-partial correlation between phonological awareness controlling for age and beta power (in the left STS electrodes) at rest and (B) during the language task. (C) Partial correlation between phonological awareness and reading controlling for age. (D) Mediation analysis results. Unstandardized b regression coefficients are presented. Age was included in the analysis as a covariate. 95% CI - 95% confidence intervals. left STS - values averaged across 3 electrodes corresponding to the left superior temporal sulcus (T7, TP7, TP9).

Given that beta power correlated with phonological awareness, and considering the prediction that neural noise impedes reading by affecting phonological awareness, we examined this relationship through a mediation model. Since phonological awareness correlated with beta power in the left STS both at rest and during the language task, the outcomes from these two conditions were averaged prior to the mediation analysis. The PROCESS macro v4.2 ( Hayes, 2017 ) for IBM SPSS Statistics v29 was employed, using model 4 (simple mediation) with 5000 bootstrap samples to assess the significance of the indirect effect. Since age correlated both with phonological awareness and reading, we also included age as a covariate.

The results indicated that both effects of beta power in the left STS ( b = .96, t (116) = 2.71, p = .008, BF incl = 7.53) and age ( b = .06, t (116) = 2.55, p = .012, BF incl = 5.98) on phonological awareness were significant. The effect of phonological awareness on reading was also significant ( b = .69, t (115) = 8.16, p < .001, BF incl > 10000), while the effects of beta power ( b = -.42, t (115) = -1.25, p = .213, BF incl = 0.52) and age ( b = .03, t (115) = 1.18, p = .241, BF incl = 0.49) on reading were not significant when controlling for phonological awareness. Finally, the indirect effect of beta power on reading through phonological awareness was significant ( b = .66, SE = .24, 95% CI = [.24, 1.18]), while the total effect of beta power was not significant ( b = .24, t (116) = 0.61, p = .546, BF incl = 0.41). The results from the mediation analysis are presented in Figure 2D .
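
As a rough illustration of the bootstrap logic behind such an analysis (not the authors' PROCESS implementation, and omitting the age covariate they included), a percentile-bootstrap estimate of the indirect effect might look like this in Python:

import numpy as np

def bootstrap_indirect(x, m, y, n_boot=5000, seed=1):
    """Percentile-bootstrap 95% CI for the indirect effect a*b in a
    simple mediation x -> m -> y (analogous to PROCESS model 4, no covariates)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    effects = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)             # resample cases with replacement
        xb, mb, yb = x[idx], m[idx], y[idx]
        a = np.polyfit(xb, mb, 1)[0]            # path a: x -> m
        X = np.column_stack([np.ones(n), xb, mb])
        b = np.linalg.lstsq(X, yb, rcond=None)[0][2]   # path b: m -> y given x
        effects[i] = a * b
    return np.percentile(effects, [2.5, 97.5])

Called with hypothetical arrays such as bootstrap_indirect(beta_power, phonological_awareness, reading), it returns an interval; an interval excluding zero indicates a significant indirect effect.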

Although a similar mediation analysis could have been conducted for the Glu/GABA ratio, multisensory integration, and reading based on the correlations between these variables, we did not test this model due to the small sample size (47 participants), which resulted in insufficient statistical power.

The current study aimed to validate the neural noise hypothesis of dyslexia ( Hancock et al., 2017 ) utilizing E/I balance biomarkers from EEG power spectra and ultra-high-field MRS. Contrary to its predictions, we did not observe differences in the 1/f slope, beta power, or Glu and GABA concentrations in participants with dyslexia. Relations between E/I balance biomarkers were limited to significant correlations between Glu and the offset when controlling for age, and between Glu/GABA imbalance and the exponent.

In terms of indirect markers, our study found no evidence of group differences in the aperiodic components of the EEG signal. In most of the models, we did not find evidence for either including or excluding the effect of the group when Bayesian statistics were evaluated. The only exception was the regional analysis for the offset, where results indicated against including the group factor in the model. These findings diverge from previous research on an Italian cohort, which reported decreased exponent and offset in the dyslexic group at rest, specifically within the parieto-occipital region, but not the frontal region ( Turri et al., 2023 ). Despite our study involving twice the number of participants and utilizing a longer acquisition time, we observed no group differences, even in the same cluster of electrodes (refer to Supplementary Material). The participants in both studies were of similar ages. The only methodological difference – EEG acquisition with eyes open in our study versus both eyes-open and eyes-closed in the work by Turri and colleagues (2023) – cannot fully account for the overall lack of group differences observed. The diverging study outcomes highlight the importance of considering potential inflation of effect sizes in studies with smaller samples.

Although a lower exponent of the EEG power spectrum has been associated with other neurodevelopmental disorders, such as ADHD ( Ostlund et al., 2021 ) or ASD (but only in children with IQ below average) ( Manyukhina et al., 2022 ), our study suggests that this is not the case for dyslexia. Considering the frequent comorbidity of dyslexia and ADHD ( Germanò et al., 2010 ; Langer et al., 2019 ), increased neural noise could serve as a common underlying mechanism for both disorders. However, our specific exclusion of participants with a comorbid ADHD diagnosis indicates that the EEG spectral exponent cannot serve as a neurobiological marker for dyslexia in isolation. No information regarding such exclusion criteria was provided in the study by Turri et al. (2023) ; thus, potential comorbidity with ADHD may explain the positive findings related to dyslexia reported therein.

Regarding the aperiodic-adjusted oscillatory EEG activity, Bayesian statistics for beta power indicated in favor of excluding the group factor from the model. Non-significant group differences in beta power at rest have been previously reported in studies that did not account for aperiodic components ( Babiloni et al., 2012 ; Fraga González et al., 2018 ; Xue et al., 2020 ). This again contrasts with the study by Turri et al. (2023) , which observed lower aperiodic-adjusted beta power (at 15-25 Hz) in the dyslexic group. Concerning beta power during the task, our results also contrast with previous studies which showed either reduced ( Spironelli et al., 2008 ) or increased ( Rippon and Brunswick, 2000 ) beta activity in participants with dyslexia. Nevertheless, since both of these studies employed phonological tasks and involved children’s samples, their relevance to our work is limited.

In terms of direct neurometabolite concentrations derived from the MRS, we found no evidence for group differences in either Glu, GABA or Glu/GABA imbalance in the language-sensitive left STS. Conversely, the Bayes Factor suggested against including the group factor in the model for the Glu/GABA ratio. While no previous study has localized the MRS voxel based on the individual activation levels, nonsignificant group differences in Glu and GABA concentrations within the temporo-parietal and visual cortices have been reported in both children and adults ( Kossowski et al., 2019 ), as well as in the ACC in children ( Horowitz-Kraus et al., 2018 ). Although our MRS sample size was half that of the EEG sample, previous research reporting group differences in Glu concentrations involved an even smaller dyslexic cohort (10 participants with dyslexia and 45 typical readers in Pugh et al., 2014 ). Consistent with earlier studies that identified group differences in Glu and GABA concentrations ( Del Tufo et al., 2018 ; Pugh et al., 2014 ) we reported neurometabolite levels relative to total creatine (tCr), indicating that the absence of corresponding results cannot be ascribed to reference differences. Notably, our analysis of the fMRI localizer task revealed greater activation in the control group as compared to the dyslexic group within the left STS for words than control stimuli (see Figure 1E and the Supplementary Material) in line with previous observations ( Blau et al., 2009 ; Dębska et al., 2021 ; Yan et al., 2021 ).

Irrespective of dyslexia status, we found negative correlations between age and exponent and offset, consistent with previous research ( Cellier et al., 2021 ; McSweeney et al., 2021 ; Schaworonkow and Voytek, 2021 ; Voytek et al., 2015 ) and providing further evidence for maturational changes in the aperiodic components (indicative of increased E/I ratio). At the same time, in line with previous MRS works ( Kossowski et al., 2019 ; Marsman et al., 2013 ), we observed a negative correlation between age and Glu concentrations. This suggests a contrasting pattern to EEG results, indicating a decrease in neuronal excitation with age. We also found a condition-dependent change in offset, with a lower offset observed at rest than during the language task. The offset value represents the uniform shift in power across frequencies ( Donoghue et al., 2020 ), with a higher offset linked to increased neuronal spiking rates ( Manning et al., 2009 ). Change in offset between conditions is consistent with observed increased alpha and beta power during the task, indicating elevated activity in both broadband (offset) and narrowband (alpha and beta oscillations) frequency ranges during the language task.

In regard to relationships between EEG and MRS E/I balance biomarkers, we observed a negative correlation between the offset in the left STS (both at rest and during the task) and Glu levels, after controlling for age and GMV. This correlation was not observed in zero-order correlations (see Supplementary Material). Contrary to our predictions, informed by previous studies linking the exponent to E/I ratio ( Colombo et al., 2019 ; Gao et al., 2017 ; Waschke et al., 2021 ), we found the correlation with Glu levels to involve the offset rather than the exponent. This outcome was unexpected, as none of the referenced studies reported results for the offset. However, given the strong correlation between the exponent and offset observed in our study ( r = .68, p < .001, BF 10 > 10000 and r = .72, p < .001, BF 10 > 10000 at rest and during the task respectively), it is conceivable that a similar association might have emerged for the offset in those studies had it been analyzed.

Nevertheless, previous studies examining relationships between EEG and MRS E/I balance biomarkers ( McKeon et al., 2024 ; van Bueren et al., 2023 ) did not identify a similar negative association between Glu and the offset. Instead, one study noted a positive correlation between the Glu/GABA ratio and the exponent ( van Bueren et al., 2023 ), which was significant in the intraparietal sulcus but not in the middle frontal gyrus. This finding presents counterintuitive evidence, suggesting that an increased E/I balance, as indicated by MRS, is associated with a higher aperiodic exponent, considered indicative of decreased E/I balance. In line with this pattern, another study discovered a positive relationship between the exponent and Glu levels in the dorsolateral prefrontal cortex ( McKeon et al., 2024 ). Furthermore, they observed a positive correlation between the exponent and the Glu/GABA imbalance measure, calculated as the absolute residual value of a linear relationship between Glu and GABA ( McKeon et al., 2024 ), a finding replicated in the current work. This implies that a higher spectral exponent might not be directly linked to MRS-derived Glu or GABA levels, but rather to a greater disproportion (in either direction) between these neurotransmitters. These findings, alongside the contrasting relationships between EEG and MRS biomarkers and age, suggest that these methods may reflect distinct biological mechanisms of E/I balance.

Evidence regarding associations between neurotransmitter levels and oscillatory activity also remains mixed. One study found a positive correlation between gamma peak frequency and GABA concentration in the visual cortex ( Muthukumaraswamy et al., 2009 ), a finding later challenged by a study with a larger sample ( Cousijn et al., 2014 ). Similarly, while one study noted a positive correlation between GABA in the left STS and gamma power ( Balz et al., 2016 ), another found a nonsignificant relation between these measures ( Wyss et al., 2017 ). Moreover, in a simultaneous EEG and MRS study, an event-related increase in Glu following visual stimulation was found to correlate with greater gamma power ( Lally et al., 2014 ). We could not investigate such associations, as the algorithm failed to identify a gamma peak above the aperiodic component for the majority of participants. Also, contrary to previous findings showing associations between GABA in the motor and sensorimotor cortices and beta power ( Cheng et al., 2017 ; Gaetz et al., 2011 ) or beta peak frequency ( Baumgarten et al., 2016 ), we observed no correlation between Glu or GABA levels and beta power. However, these studies placed MRS voxels in motor regions, which are typically linked to movement-related beta activity ( Baker et al., 1999 ; Rubino et al., 2006 ; Sanes and Donoghue, 1993 ), and did not adjust beta power for aperiodic components, making direct comparisons with our findings limited.

Finally, we examined pathways posited by the neural noise hypothesis of dyslexia, through which increased neural noise may impact reading: phonological awareness, lexical access and generalization, and multisensory integration ( Hancock et al., 2017 ). Phonological awareness was positively correlated with the offset in the left STS at rest, and with beta power in the left STS, both at rest and during the task. Additionally, multisensory integration showed correlations with GABA and the Glu/GABA ratio. Since the Bayes Factor did not provide conclusive evidence supporting either the alternative or null hypothesis, these associations appear rather weak. Nonetheless, given the hypothesis’s prediction of a causal link between these variables, we further examined a mediation model involving beta power, phonological awareness, and reading skills. The results suggested a positive indirect effect of beta power on reading via phonological awareness, whereas both the direct (controlling for phonological awareness and age) and total effects (without controlling for phonological awareness) were not significant. This finding is noteworthy, considering that participants with dyslexia exhibited reduced phonological awareness and reading skills, despite no observed differences in beta power. Given the cross-sectional nature of our study, further longitudinal research is necessary to confirm the causal relation among these variables. The effects of GABA and the Glu/GABA ratio on reading, mediated by multisensory integration, warrant further investigation. Additionally, considering our finding that only males with dyslexia showed deficits in multisensory integration ( Glica et al., 2024 ), sex should be considered as a potential moderating factor in future analyses. We did not test this model here due to the smaller sample size for GABA measurements.

Our findings suggest that the neural noise hypothesis, as proposed by Hancock and colleagues (2017) , does not fully explain the reading difficulties observed in dyslexia. Despite the innovative use of both EEG and MRS biomarkers to assess excitatory-inhibitory (E/I) balance, neither method provided evidence supporting an E/I imbalance in dyslexic individuals. Importantly, our study focused on adolescents and young adults, and the EEG recordings were conducted during rest and a spoken language task. These factors may limit the generalizability of our results. Future research should include younger populations and incorporate a broader array of tasks, such as reading and phonological processing, to provide a more comprehensive evaluation of the E/I balance hypothesis. Additionally, our findings are consistent with another study by Tan et al. (2022) which found no evidence for increased variability (’noise’) in behavioral and fMRI response patterns in dyslexia. Together, these results highlight the need to explore alternative neural mechanisms underlying dyslexia and suggest that cortical hyperexcitability may not be the primary cause of reading difficulties.

In conclusion, while our study challenges the neural noise hypothesis as a sole explanatory framework for dyslexia, it also underscores the complexity of the disorder and the necessity for multifaceted research approaches. By refining our understanding of the neural underpinnings of dyslexia, we can better inform future studies and develop more effective interventions for those affected by this condition.

Materials and methods

Participants

A total of 120 Polish participants aged between 15.09 and 24.95 years ( M = 19.47, SD = 3.06) took part in the study. This included 60 individuals with a clinical diagnosis of dyslexia performed by the psychological and pedagogical counseling centers (28 females and 32 males) and 60 control participants without a history of reading difficulties (28 females and 32 males). All participants were right-handed, born at term, without any reported neurological/psychiatric diagnosis and treatment (including ADHD), without hearing impairment, with normal or corrected-to-normal vision, and IQ higher than 80 as assessed by the Polish version of the Abbreviated Battery of the Stanford-Binet Intelligence Scale-Fifth Edition (SB5) ( Roid et al., 2017 ).

The study was approved by the institutional review board at the University of Warsaw, Poland (reference number 2N/02/2021). All participants (or their parents in the case of underaged participants) provided written informed consent and received monetary remuneration for taking part in the study.

Reading and Reading-Related Tasks

Participants’ reading skills were assessed with multiple paper-and-pencil tasks described in detail in our previous work ( Glica et al., 2024 ). Briefly, we evaluated words and pseudowords read in one minute ( Szczerbiński and Pelc-Pękała, 2013 ), rapid automatized naming ( Fecenec et al., 2013 ), and reading comprehension speed. We also assessed phonological awareness with a phoneme deletion task ( Szczerbiński and Pelc-Pękała, 2013 ) and spoonerisms tasks ( Bogdanowicz et al., 2016 ), as well as orthographic awareness (Awramiuk and Krasowicz-Kupis, 2013). Furthermore, we evaluated non-verbal perception speed ( Ciechanowicz and Stańczak, 2006 ) and short-term and working memory with the forward and backward conditions of the Digit Span subtest from the WAIS-R ( Wechsler, 1981 ). We also assessed participants’ multisensory audiovisual integration with a redundant target effect task, the results of which have been reported in our previous work ( Glica et al., 2024 ).

Electroencephalography Acquisition and Procedure

EEG was recorded from 62 scalp and 2 ear electrodes using the Brain Products system (actiCHamp Plus, Brain Products GmbH, Gilching, Germany). Data were recorded in BrainVision Recorder Software (Vers. 1.22.0002, Brain Products GmbH, Gilching, Germany) with a 500 Hz sampling rate. Electrodes were positioned in line with the extended 10-20 system. Electrode Cz served as an online reference, while Fpz served as the ground electrode. All electrode impedances were kept below 10 kΩ. Participants sat in a chair with their heads on a chin-rest in a dark, sound-attenuated, and electrically shielded room while the EEG was recorded during both a 5-minute eyes-open resting state and the spoken language comprehension task. The paradigm was prepared in the Presentation software (Version 20.1, Neurobehavioral Systems, Inc., Berkeley, CA, www.neurobs.com ).

During rest, participants were instructed to relax and fixate their eyes on a white cross presented centrally on a black background. After 5 minutes, the spoken language comprehension task automatically started. The task consisted of 3 to 5 word-long sentences recorded in a speech synthesizer which were presented binaurally through sound-isolating earphones. After hearing a sentence, participants were asked to indicate whether the sentence was true or false by pressing a corresponding button. In total, there were 256 sentences – 128 true (e.g., “Plants need water”) and 128 false (e.g., “Dogs can fly”).

Sentences were presented in a random order in two blocks of 128 trials. At the beginning of each trial, a white fixation cross was presented centrally on a black background for 500 ms, then a blank screen appeared for either 500, 600, 700, or 800 ms (durations set randomly and equiprobably) followed by an auditory sentence presentation. The length of sentences ranged between 1.17 and 2.78 seconds and was balanced between true ( M = 1.82 seconds, SD = 0.29) and false sentences ( M = 1.82 seconds, SD = 0.32; t (254) = -0.21, p = .835; BF 10 = 0.14). After a sentence presentation, a blank screen was displayed for 1000 ms before starting the next trial. To reduce participants’ fatigue, a 1-minute break between two blocks of trials was introduced, and it took approximately 15 minutes to complete the task.
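
The duration-balance check reported above is a plain two-sample t test; a sketch with hypothetical duration arrays (in seconds) might look like this.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
dur_true = rng.uniform(1.17, 2.78, 128)   # placeholder durations, true sentences
dur_false = rng.uniform(1.17, 2.78, 128)  # placeholder durations, false sentences

t_stat, p_val = ttest_ind(dur_true, dur_false)  # df = 128 + 128 - 2 = 254
print(f"t(254) = {t_stat:.2f}, p = {p_val:.3f}")
```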

fMRI Acquisition and Procedure

MRI data were acquired using a Siemens 3T Trio system with a 32-channel head coil. Structural data were acquired using a whole-brain 3D T1-weighted image (MP_RAGE, TI = 1100 ms, GRAPPA parallel imaging with acceleration factor PE = 2, voxel resolution = 1 mm³, dimensions = 256×256×176). Functional data were acquired using a whole-brain echo planar imaging sequence (TE = 30ms, TR = 1410 ms, flip angle FA = 90°, FOV = 212 mm, matrix size = 92×92, 60 axial slices 2.3mm thick, 2.3×2.3 mm in-plane resolution, multiband acceleration factor = 3). Due to a technical issue, data from two participants were acquired with a 12-channel coil (see Supplementary Material).

The fMRI task served as a localizer for later MRS voxel placement in the language-sensitive left STS. The task was prepared using Presentation software (Version 20.1, Neurobehavioral Systems, Inc., Berkeley, CA, www.neurobs.com ) and consisted of three runs, each lasting 5 minutes and 9 seconds. Two runs involved the presentation of visual stimuli, while the third run involved auditory stimuli. In each run, stimuli were presented in 12 blocks, with 14 stimuli per block. In visual runs, there were four blocks from each category: 1) words 3 to 4 letters long, 2) the same words presented as false font strings (BACS font) ( Vidal et al., 2017 ), and 3) strings of 3 to 4 consonants. Similarly, in the auditory run, there were four blocks from each category: 1) words recorded in a speech synthesizer, 2) the same words presented backward, and 3) consonant strings recorded in a speech synthesizer. Stimuli within each block were presented for 800 ms with a 400 ms break in between. The duration of each block was 16.8 seconds. Between blocks, a fixation cross was displayed for 8 seconds. Participants performed a 1-back task to maintain focus. The blocks were presented in a pseudorandom order and each block included 2 to 3 repeated stimuli.

MRS Acquisition and Procedure

The GE 7T system with a 32-channel coil was utilized. Structural data were acquired using whole brain 3D T1-weighted image (3D-SPGR BRAVO, TI = 450ms, TE = 2.6ms, TR = 6.6ms, flip angle = 12 deg, bandwidth = ±32.5kHz, ARC acceleration factor PE = 2, voxel resolution = 1mm, dimensions = 256 x 256 x 180). MRS spectra with 320 averages were acquired from the left STS using single-voxel spectroscopy semiLaser sequence ( Deelchand et al., 2021 ) (voxel size = 15 x 15 x 15 mm, TE = 28ms, TR = 4000ms, 4096 data points, water suppressed using VAPOR). Eight averages with unsuppressed water as a reference were collected.

To localize left STS, T1-weighted images from fMRI and MRS sessions were coregistered and fMRI peak coordinates were used as a center of voxel volume for MRS. Voxels were then adjusted to include only the brain tissue. During the acquisition, participants took part in a simple orthographic task.

Statistical Analyses

The continuous EEG signal was preprocessed in EEGLAB ( Delorme and Makeig, 2004 ). The data were filtered between 0.5 and 45 Hz (Butterworth filter, 4th order) and re-referenced to the average of both ear electrodes. The data recorded during the break between blocks, as well as bad channels, were manually rejected. The number of rejected channels ranged between 0 and 4 ( M = 0.19, SD = 0.63). Next, independent component analysis (ICA) was applied. Components were automatically labeled by ICLabel ( Pion-Tonachini et al., 2019 ), and those classified with 50-100% source probability as eye blinks, muscle activity, heart activity, channel noise, and line noise, or with 0-50% source probability as brain activity, were excluded. Components labeled as “other” were visually inspected, and those identified as eye blinks and muscle activity were also rejected. The number of rejected components ranged between 11 and 46 ( M = 28.43, SD = 7.26). Previously rejected bad channels were interpolated using the nearest neighbor spline ( Perrin et al., 1989 , 1987 ).
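
The labeling step was performed with EEGLAB's ICLabel; a rough Python analogue using MNE and the mne-icalabel package (which expects an extended-infomax decomposition) is sketched below for a hypothetical preprocessed raw object, applying the same 50% probability rule.

```python
from mne.preprocessing import ICA
from mne_icalabel import label_components

def reject_components(raw):
    """Fit extended-infomax ICA, label components with ICLabel, and remove
    those labeled as artifact with >= 50% probability or as brain activity
    with < 50% probability (approximating the criterion described above)."""
    ica = ICA(method="infomax", fit_params=dict(extended=True), random_state=0)
    ica.fit(raw)
    out = label_components(raw, ica, method="iclabel")
    exclude = [
        idx for idx, (label, prob)
        in enumerate(zip(out["labels"], out["y_pred_proba"]))
        if (label != "brain" and prob >= 0.5) or (label == "brain" and prob < 0.5)
    ]
    ica.apply(raw, exclude=exclude)
    return raw
```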

The preprocessed data were divided into a 5-minute resting-state signal and a signal recorded during a spoken language comprehension task using MNE ( Gramfort, 2013 ) and custom Python scripts. The task signal was segmented based on the event markers indicating the beginning and end of each sentence. Only trials with correct responses given between 0 and 1000 ms after the end of a sentence were included. The signal from each trial was then multiplied by a Tukey window with α = 0.01 to normalize signal amplitudes at the beginning and end of every trial. This allowed a smooth concatenation of the task-trial signals, resulting in a continuous signal covering only the periods when participants were listening to the sentences.
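
A minimal sketch of the tapering-and-concatenation step, assuming each trial is an (n_channels, n_samples) NumPy array:

```python
import numpy as np
from scipy.signal.windows import tukey

def concatenate_trials(trials, alpha=0.01):
    """Taper each trial with a Tukey window (alpha = 0.01) so the amplitudes
    are normalized toward zero at trial edges, then concatenate the trials
    into one continuous signal along the time axis."""
    tapered = [trial * tukey(trial.shape[-1], alpha) for trial in trials]
    return np.concatenate(tapered, axis=-1)
```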

The continuous signal from the resting state and the language task was epoched into 2-second-long segments. An automatic rejection criterion of +/-200 μV was applied to exclude epochs with excessive amplitudes. The number of epochs retained in the analysis ranged between 140–150 ( M = 149.66, SD = 1.20) in the resting state condition and between 102–226 ( M = 178.24, SD = 28.94) in the spoken language comprehension task.
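
Epoching and the amplitude criterion can be expressed compactly; this sketch assumes a continuous (n_channels, n_samples) array in microvolts sampled at 500 Hz.

```python
import numpy as np

def epoch_and_reject(signal, sfreq=500, epoch_sec=2.0, thresh_uv=200.0):
    """Cut the continuous signal into non-overlapping 2-s epochs and drop
    any epoch whose absolute amplitude exceeds +/-200 uV on any channel."""
    n_samp = int(epoch_sec * sfreq)
    n_epochs = signal.shape[1] // n_samp
    epochs = signal[:, : n_epochs * n_samp].reshape(signal.shape[0], n_epochs, n_samp)
    keep = np.abs(epochs).max(axis=(0, 2)) <= thresh_uv
    return epochs[:, keep, :]
```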

Power spectral density (PSD) for 0.5-45 Hz in 0.5 Hz increments was calculated for every artifact-free epoch using Welch’s method for 2-second-long data segments windowed with a Hamming window with no overlap. The estimated PSDs were averaged for each participant and each channel separately for the resting state condition and the language task. Aperiodic and periodic (oscillatory) components were parameterized using the FOOOF method ( Donoghue et al., 2020 ). For each PSD, we extracted parameters for the 1-43 Hz frequency range using the following settings: peak_width_limits = [1, 12], max_n_peaks = infinite, peak_threshold = 2.0, min_peak_height = 0.0, aperiodic_mode = ‘fixed’. Apart from broad-band aperiodic parameters (exponent and offset), we also extracted power, bandwidth, and the center frequency parameters for the theta (4-7 Hz), alpha (7-14 Hz), beta (14-30 Hz) and gamma (30-43 Hz) bands. Since in the majority of participants the algorithm did not find a peak above the aperiodic component in the theta and gamma bands, we calculated the results only for the alpha and beta bands. Results for periodic parameters other than beta power are reported in the Supplementary Material.
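
For a single channel, the PSD estimation and spectral parameterization reduce to a few calls to scipy and the fooof package with the settings listed above; the (n_epochs, n_samples) input layout is an assumption of this sketch.

```python
import numpy as np
from scipy.signal import welch
from fooof import FOOOF
from fooof.analysis import get_band_peak_fm

def parameterize_channel(epochs, sfreq=500):
    """Welch PSD per 2-s epoch (Hamming window, no overlap, 0.5 Hz bins),
    averaged over epochs, then FOOOF parameterization over 1-43 Hz."""
    freqs, psds = welch(epochs, fs=sfreq, window="hamming",
                        nperseg=2 * sfreq, noverlap=0)
    psd = psds.mean(axis=0)

    fm = FOOOF(peak_width_limits=[1, 12], max_n_peaks=np.inf,
               peak_threshold=2.0, min_peak_height=0.0,
               aperiodic_mode="fixed")
    fm.fit(freqs, psd, freq_range=[1, 43])

    offset, exponent = fm.aperiodic_params_
    beta_peak = get_band_peak_fm(fm, [14, 30])  # [center freq, power, bandwidth]
    return offset, exponent, beta_peak
```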

Apart from the frequentist statistics, we also performed Bayesian statistics using JASP ( JASP Team, 2023 ). For Bayesian repeated measures ANOVA, we reported the Bayes Factor for the inclusion of a given effect (BF incl ) with the ’across matched model’ option, as suggested by Keysers and colleagues (2020) , calculated as a likelihood ratio of models with a presence of a specific factor to equivalent models differing only in the absence of the specific factor. For Bayesian t -tests and correlations, we reported the BF 10 value, indicating the ratio of the likelihood of an alternative hypothesis to a null hypothesis. We considered BF incl/10 > 3 and BF incl/10 < 1/3 as evidence for alternative and null hypotheses respectively, while 1/3 < BF incl/10 < 3 as the absence of evidence ( Keysers et al., 2020 ).
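
These cut-offs translate into a trivial helper for labeling the evidence provided by a Bayes Factor:

```python
def interpret_bf(bf):
    """Evidence categories following Keysers et al. (2020): BF > 3 supports
    the alternative, BF < 1/3 supports the null, otherwise the data are
    insensitive (absence of evidence)."""
    if bf > 3:
        return "evidence for the alternative hypothesis"
    if bf < 1 / 3:
        return "evidence for the null hypothesis"
    return "absence of evidence"
```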

MRS voxel localization in the native space

The data were analyzed using Statistical Parametric Mapping (SPM12, Wellcome Trust Centre for Neuroimaging, London, UK) run on MATLAB R2020b (The MathWorks Inc., Natick, MA, USA). First, all functional images were realigned to the participant’s mean. Then, T1-weighted images were coregistered to functional images for each subject. Finally, fMRI data were smoothed with a 6mm isotropic Gaussian kernel.

In each subject, the left STS was localized in the native space as a cluster in the middle and posterior left superior temporal sulcus, exhibiting higher activation for visual words versus false font strings and auditory words versus backward words (logical AND conjunction) at p < .01 uncorrected. For 6 participants, the threshold was lowered to p < .05 uncorrected, while for another 6 participants, the contrast from the auditory run was changed to auditory words versus fixation cross due to a lack of activation for other contrasts.
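
Conceptually, the conjunction amounts to thresholding both contrast t-maps and intersecting them; a schematic NumPy version with hypothetical t-maps and degrees of freedom is:

```python
import numpy as np
from scipy.stats import t as t_dist

def conjunction_mask(t_visual, t_auditory, df, p=0.01):
    """Logical AND conjunction: voxels exceeding the one-sided p < .01
    (uncorrected) threshold in BOTH contrasts, e.g. visual words > false
    fonts AND auditory words > backward words."""
    thr = t_dist.ppf(1 - p, df)
    return (t_visual > thr) & (t_auditory > thr)
```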

In the Supplementary Material, we also performed the group-level analysis of the fMRI data (Tables S5-S7 and Figure S1).

MRS data were analyzed using fsl-mrs version 2.0.7 ( Clarke et al., 2021 ). Data stored in pfile format were converted into NIfTI-MRS using the spec2nii tool. We then used the fsl_mrs_preproc function to automatically perform coil combination, frequency and phase alignment, bad average removal, combination of spectra, eddy current correction, shifting frequency to the reference peak, and phase correction.

To obtain information about the percentage of WM, GM and CSF in the voxel, we used svs_segmentation with the results of fsl_anat as input. Voxel segmentation was performed on structural images from the 3T scanner, coregistered to the 7T structural images in SPM12. Next, quantitative fitting was performed using the fsl_mrs function. As a basis set, we utilized a collection of 27 metabolite spectra simulated using FID-A ( Simpson et al., 2017 ) and a script tailored for our experiment. We supplemented this with synthetic macromolecule spectra provided by fsl_mrs . Signals acquired with unsuppressed water served as the water reference.

Spectra underwent quantitative assessment and visual inspection and those with linewidth higher than 20Hz, %CRLB higher than 20%, and poor fit to the model were excluded from the analysis (see Table S8 in the Supplementary Material for a detailed checklist). Glu and GABA concentrations were expressed as a ratio to total-creatine (tCr; Creatine + Phosphocreatine).
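
Assuming the fit diagnostics are tabulated per spectrum (column names hypothetical), the exclusion rule and tCr scaling reduce to:

```python
import pandas as pd

def qc_and_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Drop spectra with linewidth > 20 Hz or %CRLB > 20, then express Glu
    and GABA relative to total creatine (Creatine + Phosphocreatine)."""
    keep = (df["linewidth_hz"] <= 20) & (df["crlb_pct"] <= 20)
    out = df.loc[keep].copy()
    tcr = out["Cr"] + out["PCr"]
    out["Glu_tCr"] = out["Glu"] / tcr
    out["GABA_tCr"] = out["GABA"] / tcr
    return out
```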

Data Availability Statement

Behavioral data, raw and preprocessed EEG data, 2 nd level fMRI data, preprocessed MRS data and Python script for the analysis of preprocessed EEG data can be found at OSF: https://osf.io/4e7ps/

Acknowledgements

This study was supported by the National Science Centre grant (2019/35/B/HS6/01763) awarded to Katarzyna Jednoróg.

We gratefully acknowledge valuable discussions with Ralph Noeske from GE Healthcare for his support in setting up the protocol for an ultra-high field MR spectroscopy and sharing the set-up for basis set simulation in FID-A.


Article and author information

Corresponding author: Katarzyna Jednoróg.

Version history

  • Sent for peer review : June 11, 2024
  • Preprint posted : June 12, 2024
  • Reviewed Preprint version 1 : September 5, 2024

© 2024, Glica et al.

This article is distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use and redistribution provided that the original author and source are credited.

Inference for Partially Linear Quantile Regression Models in Ultrahigh Dimension

  • Published: 06 September 2024

  • Hongwei Shi 1,
  • Weichao Yang 1,
  • Niwen Zhou 2,
  • Xu Guo 1 (ORCID: orcid.org/0000-0001-7676-8364)

Conditional quantile regression provides a useful statistical tool for modeling and inferring the relationship between the response and covariates in heterogeneous data. In this paper, we develop a novel testing procedure for the ultrahigh-dimensional partially linear quantile regression model to investigate the significance of the ultrahigh-dimensional covariates of interest in the presence of ultrahigh-dimensional nuisance covariates. The proposed test statistic is an \(L_2\) -type statistic. We estimate the nonparametric component with flexible machine learners to handle the complexity and ultrahigh dimensionality of the considered models. We establish the asymptotic normality of the proposed test statistic under the null and local alternative hypotheses. A screening-based testing procedure is further provided to make our test more powerful in practice under the ultrahigh-dimensional regime. We evaluate the finite-sample performance of the proposed method via extensive simulation studies. A real application to a breast cancer dataset is presented to illustrate the proposed method.

Belloni, A., Chernozhukov, V.: \(l_1\) -penalized quantile regression in high-dimensional sparse models. Ann. Stat. 39 (1), 82–130 (2011)

Beyerlein, A., Kries, R., Ness, A.R., Ong, K.K.: Genetic markers of obesity risk: stronger associations with body composition in overweight compared to normal-weight children. PLoS ONE 6 (4), 19057 (2011)

Cai, L., Guo, X., Li, G., Tan, F.: Tests for high-dimensional single-index models. Electron. J. Stat. 17 (1), 429–463 (2023)

Chen, S.X., Qin, Y.-L.: A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Stat. 38 (2), 808–835 (2010)

Chen, J., Li, Q., Chen, H.Y.: Testing generalized linear models with high-dimensional nuisance parameters. Biometrika 110 (1), 83–99 (2023)

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., Robins, J.: Double/debiased machine learning for treatment and structural parameters. Econometr. J. 21 (1), 1–68 (2018)

Cui, H., Guo, W., Zhong, W.: Test for high-dimensional regression coefficients using refitted cross-validation variance estimation. Ann. Stat. 46 (3), 958–988 (2018)

Cui, H., Zou, F., Ling, L.: Feature screening and error variance estimation for ultrahigh-dimensional linear model with measurement errors. Commun. Math. Stat., pp. 1–33 (2023)

Dezeure, R., Bühlmann, P., Zhang, C.-H.: High-dimensional simultaneous inference with the bootstrap. TEST 26 , 685–719 (2017)

Du, L., Guo, X., Sun, W., Zou, C.: False discovery rate control under general dependence by symmetrized data aggregation. J. Am. Stat. Assoc. 118 (541), 607–621 (2023)

Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B 70 (5), 849–911 (2008)

Fan, J., Guo, S., Hao, N.: Variance estimation using refitted cross-validation in ultrahigh dimensional regression. J. R. Stat. Soc. Ser. B 74 (1), 37–65 (2012)

Guo, B., Chen, S.X.: Tests for high dimensional generalized linear models. J. R. Stat. Soc. Ser. B 78 (5), 1079–1102 (2016)

Guo, W., Zhong, W., Duan, S., Cui, H.: Conditional test for ultrahigh dimensional linear regression coefficients. Stat. Sin. 32 , 1381–1409 (2022)

Hall, P., Heyde, C.C.: Martingale Limit Theory and Its Application. Academic Press, UK (2014)

Khaled, W., Lin, J., Han, Z., Zhao, Y., Hao, H.: Test for heteroscedasticity in partially linear regression models. J. Syst. Sci. Complex. 32 , 1194–1210 (2019)

Koenker, R.: Quantile Regression. Cambridge University Press, Cambridge (2005)

Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46 (1), 33–50 (1978)

Koenker, R., Chernozhukov, V., He, X., Peng, L.: Handbook of Quantile Regression. CRC Press, New York (2017)

Lu, W., Zhu, Z., Lian, H.: Sparse and low-rank matrix quantile estimation with application to quadratic regression. Stat. Sin. 33 (2), 945–959 (2023)

Ma, R., Cai, T., Li, H.: Global and simultaneous hypothesis testing for high-dimensional logistic regression models. J. Am. Stat. Assoc. 116 (534), 984–998 (2021)

Meinshausen, N., Meier, L., Bühlmann, P.: P-values for high-dimensional regression. J. Am. Stat. Assoc. 104 (488), 1671–1681 (2009)

Méndez Civieta, Á., Aguilera-Morillo, M.C., Lillo, R.E.: Asgl: a python package for penalized linear and quantile regression. arXiv preprint arXiv:2111.00472 (2021)

Ning, Y., Liu, H.: A general theory of hypothesis tests and confidence regions for sparse high dimensional models. Ann. Stat. 45 (1), 158–195 (2017)

Parker, J.S., Mullins, M., Cheang, M.C., Leung, S., Voduc, D., Vickery, T., Davies, S., Fauron, C., He, X., Hu, Z.: Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27 (8), 1160–1167 (2009)

Prat, A., Bianchini, G., Thomas, M., Belousov, A., Cheang, M.C., Koehler, A., Gómez, P., Semiglazov, V., Eiermann, W., Tjulandin, S.: Research-based PAM50 subtype predictor identifies higher responses and improved survival outcomes in HER2-positive breast cancer in the NOAH study. Clin. Cancer Res. 20 (2), 511–521 (2014)

Sherwood, B., Wang, L.: Partially linear additive quantile regression in ultra-high dimension. Ann. Stat. 44 (1), 288–317 (2016)

Shi, H., Sun, B., Yang, W., Guo, X.: Tests for ultrahigh-dimensional partially linear regression models. arXiv preprint arXiv:2304.07546 (2023)

Song, X., Li, G., Zhou, Z., Wang, X., Ionita-Laza, I., Wei, Y.: QRank: a novel quantile regression tool for eQTL discovery. Bioinformatics 33 (14), 2123–2130 (2017)

Tan, F., Jiang, X., Guo, X., Zhu, L.: Testing heteroscedasticity for regression models based on projections. Stat. Sin. 31 (2), 625–646 (2021)

Tang, Y., Wang, Y., Judy Wang, H., Pan, Q.: Conditional marginal test for high dimensional quantile regression. Stat. Sin. 32 , 869–892 (2022)

Wang, H.J., Zhu, Z., Zhou, J.: Quantile regression in partially linear varying coefficient models. Ann. Stat. 37 (6B), 3841–3866 (2009)

Wang, H., Jin, H., Jiang, X.: Feature selection for high-dimensional varying coefficient models via ordinary least squares projection. Commun. Math. Stat., pp. 1–42 (2023)

Wu, Y., Yin, G.: Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika 102 (1), 65–76 (2015)

Yang, W., Guo, X., Zhu, L.: Score function-based tests for ultrahigh-dimensional linear models. arXiv preprint arXiv:2212.08446 (2022)

Zhang, X., Cheng, G.: Simultaneous inference for high-dimensional linear models. J. Am. Stat. Assoc. 112 (518), 757–768 (2017)

Zhang, Y., Lian, H., Yu, Y.: Ultra-high dimensional single-index quantile regression. J. Mach. Learn. Res. 21 (1), 9212–9236 (2020)

Zhong, P.-S., Chen, S.X.: Tests for high-dimensional regression coefficients with factorial designs. J. Am. Stat. Assoc. 106 (493), 260–274 (2011)

Acknowledgements

The authors would like to thank the editor, the Associate Editor, and the two anonymous reviewers for their valuable comments and constructive suggestions, which led to significant improvements in the paper. Xu Guo was supported by National Natural Science Foundation of China (Nos. 12071038, 12322112); Niwen Zhou was supported by National Natural Science Foundation of China (No. 12301331) and Natural Science Foundation of Guangdong Province (No. 2023A1515010026).

Author information

Authors and Affiliations

School of Statistics, Beijing Normal University, 19 Xinjiekouwai Street, Haidian District, 100875, Beijing, People’s Republic of China

Hongwei Shi, Weichao Yang & Xu Guo

Center for Statistics and Data Science, Beijing Normal University, 18 Jinfeng Road, Zhuhai City, 519087, Guangdong Province, People’s Republic of China

Niwen Zhou

Corresponding author

Correspondence to Xu Guo .

Ethics declarations

Conflict of interest.

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

7.1 Appendix A: Proofs of Main Results

This section contains the proofs of the theoretical results in the main paper. We first describe some additional notations used in the proofs. For two sequences of positive constants \(a_n\) and \(b_n\) , we write \(a_n \lesssim b_n\) if there exists some universal constant \(c>0\) and positive integer N independent of n such that \(a_n \le c b_n\) for all \(n \ge N\) . \(a_n \gtrsim b_n\) is equivalent to \(b_n \lesssim a_n\) . We use \(a_n \asymp b_n\) to denote that \(a_n \lesssim b_n\) and \(a_n \gtrsim b_n\) hold simultaneously. For a \(d\times q\) -dimensional matrix \({\textbf{M}}\) , we define \(\Vert {\textbf{M}}\Vert _{F} = \{\textrm{tr}({\textbf{M}}{\textbf{M}}^\top )\}^{1/2}\) and \(\Vert {\textbf{M}}\Vert _2 = \lambda _{\max }({\textbf{M}})\) . If the matrix \({\textbf{M}}\) is symmetric, \(\lambda _{\max }({\textbf{M}})\) is the maximal eigenvalue of \({\textbf{M}}\) .

For brevity and without risk of confusion, we substitute \(n_1\) and \(\hat{m}_{\tau 2}(\cdot )\) in \(U_{n1}\) with n and \(\hat{m}(\cdot )\) , respectively. Then we can write \(U_{n1}\) as \(U_n\) , which is given by

\(U_n = \frac{1}{n} \sum _{i \ne j} \varphi _{\tau }\{Y_i - \hat{m}({\textbf{Z}}_i)\} \varphi _{\tau }\{Y_j - \hat{m}({\textbf{Z}}_j)\} {\textbf{X}}_i^{\top } {\textbf{X}}_j .\)

The above \(U_n\) suggests that we estimate \(m_{\tau }(\cdot )\) using the data \({\mathcal {D}}_2\) , while the construction of the test statistic relies on the data \({\mathcal {D}}_1\) . Thus, \(\{\hat{m}({\textbf{Z}}_i), i \in {\mathcal {D}}_{1}\}\) are i.i.d. random variables given the data \({\mathcal {D}}_2\) . For simplicity, we write \(F_{Y}(a|{\textbf{X}}, {\textbf{Z}}) = \Pr (Y \le a | {\textbf{X}}, {\textbf{Z}})\) and \(f_{Y}(\cdot |{\textbf{X}}, {\textbf{Z}})\) as \(F_{Y}(\cdot )\) and \(f_{Y}(\cdot )\) , respectively. Here \(F_{Y}(\cdot )\) is the conditional cumulative distribution function of Y given \(({\textbf{X}}, {\textbf{Z}})\) , while \(f_{Y}(\cdot )\) is the conditional density function of Y given \(({\textbf{X}}, {\textbf{Z}})\) . Furthermore, we denote \(m_i:=m({\textbf{Z}}_i)\) , \(\hat{m}_i:=\hat{m}({\textbf{Z}}_i)\) , \(F_Y(m_i):= F_{Y,i}\) , \(e_i:= F_{Y,i} - I(Y_i < m_i)\) , and \(\hat{e}_i:= F_{Y,i} - I(Y_i < \hat{m}_i)\) .
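
Numerically, \(U_n\) does not require a double loop; a NumPy sketch under the definitions above (array names hypothetical), using the null-hypothesis form \(\hat{e}_i = \varphi _{\tau }(Y_i - \hat{m}_i) = \tau - I(Y_i < \hat{m}_i)\) , is:

```python
import numpy as np

def u_stat(y, m_hat, X, tau):
    """U_n = (1/n) * sum_{i != j} e_i e_j X_i^T X_j with
    e_i = tau - I(y_i < m_hat_i). Uses the identity
    sum_{i,j} e_i e_j X_i^T X_j = ||X^T e||^2 and removes the i == j terms."""
    e = tau - (y < m_hat).astype(float)
    s = X.T @ e                                  # d-vector
    total = s @ s                                # includes the i == j terms
    diag = np.sum(e ** 2 * np.einsum("ij,ij->i", X, X))
    return (total - diag) / len(y)
```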

Proof of Theorem 2.7

Before giving the proof of Theorem 2.7 , we introduce some useful technical lemmas, whose proofs are deferred to Appendix B.

Let \({\textbf{M}}_{1}\) and \({\textbf{M}}_2\) be two \(d \times d\) positive semidefinite matrices. Suppose that \({\mathbb {E}}(\prod _{i=1}^{2}{\textbf{X}}^\top {\textbf{M}}_{i}{\textbf{X}}) \le C\prod _{i=1}^{2}\textrm{tr}({\textbf{M}}_{i}\varvec{\Sigma }_{{\textbf{X}}})\) , where \({\textbf{X}}\) is a d -dimensional random vector with \({\mathbb {E}}({\textbf{X}}) = 0\) and \(\varvec{\Sigma }_{{\textbf{X}}} = {\mathbb {E}}({\textbf{X}}{\textbf{X}}^{\top })\) . Let \(e \bot \!\!\!\bot {\textbf{X}}\) be a bounded variable with mean 0 and variance \(\tau (1-\tau )\) , where \(\tau \in (0, 1)\) . Assume that \(\textrm{tr}(\varvec{\Sigma }^4_{{\textbf{X}}})=o\{\textrm{tr}^2(\varvec{\Sigma }_{{\textbf{X}}}^2)\}\) and \(\textrm{tr}(\varvec{\Sigma }_{{\textbf{X}}}^2)\rightarrow \infty \) as \((n, d)\rightarrow \infty \) , then the following holds.

Under conditions in Theorem 2.7 , it can be shown that

Here \(\varvec{\Sigma }_{{\textbf{X}}} = {\mathbb {E}}({\textbf{X}}{\textbf{X}}^{\top })\) and \(\varvec{r}_{e{\textbf{X}}}={\mathbb {E}}[(\hat{e}-e) {\textbf{X}}]={\mathbb {E}}[\{I(Y<m_{\tau }({\textbf{Z}}))-I(Y<\hat{m}_{\tau }({\textbf{Z}}))\} {\textbf{X}}]\) .

Under the null hypothesis, \(\tau = \Pr (Y_i \le m_i | {\textbf{X}}, {\textbf{Z}})=F_{Y,i}\) , then \(e_i = F_{Y,i} - I(Y_i< m_i) = \tau - I(Y_i < m_i) = \varphi _{\tau }(Y_i - m_i)\) . Similarly, it follows that \(\hat{e}_i = \varphi _{\tau }(Y_i - \hat{m}_i)\) . We rewrite \(U_n\) as the sum of three terms, that is, \(U_n = U_1 + U_2 + U_3\) , where \(U_1 = \frac{1}{n}\sum _{i\ne j}e_ie_j{\textbf{X}}_i^{\top }{\textbf{X}}_j\) , \(U_2 = \frac{1}{n}\sum _{i\ne j}(\hat{e}_i-e_i)(\hat{e}_j-e_j){\textbf{X}}_i^{\top }{\textbf{X}}_j\) , and \(U_3 = \frac{1}{n}\sum _{i\ne j}[e_i(\hat{e}_j-e_j)+e_j(\hat{e}_i-e_i)]{\textbf{X}}_i^{\top }{\textbf{X}}_j\) , which follows from the identity \(\hat{e}_i\hat{e}_j = e_ie_j + (\hat{e}_i-e_i)(\hat{e}_j-e_j) + e_i(\hat{e}_j-e_j) + e_j(\hat{e}_i-e_i)\) .

As \(U_1\) , \(U_2\) and \(U_3\) are U -statistics, we can prove the theorem based on the properties of U -statistic.

Denote \(u_{n,2}=\frac{1}{n-1} U_{2} = \frac{1}{n(n-1)} \sum _{i \ne j} u_{ij,2}\) with the kernel \(u_{ij,2}=(\hat{e}_i-e_i)(\hat{e}_j-e_j) {\textbf{X}}_i^{\top } {\textbf{X}}_j\) , and further define \(\varvec{r}_{e{\textbf{X}}}={\mathbb {E}}[(\hat{e}-e) {\textbf{X}}]={\mathbb {E}}[\{I(Y<m_{\tau }({\textbf{Z}}))-I(Y<\hat{m}_{\tau }({\textbf{Z}}))\} {\textbf{X}}]\) . We yield that

which holds by the equality ( 7.1 ) in Lemma 7.2 .

By the Hoeffding decomposition, the variance of \(u_{n,2}\) is \(\textrm{Var}(u_{n,2}) = \frac{2}{n(n-1)}\{2(n-2)\textrm{Var}(u_{1i,2}) + \textrm{Var}(u_{ij,2})\}\) ,

where \(u_{1i,2} = {\mathbb {E}}(u_{ij,2} | {\textbf{X}}_i, {\textbf{Z}}_i, Y_i) = (\hat{e}_i-e_i) {\textbf{X}}_i^{\top } \varvec{r}_{e{\textbf{X}}}\) is the projection of \(u_{ij,2}\) to the space \(\{{\textbf{X}}_i, {\textbf{Z}}_i, Y_i\}\) . We further have

We derive that

where the first inequality holds based on the definition of variance, the second inequality follows from the Cauchy–Schwarz inequality and the third inequality holds by the inequality ( 7.2 ) in Lemma 7.2 .

Moreover, we have

where the last inequality holds because \((\hat{e}_1-e_1)\) is independent of \((\hat{e}_2-e_2)\) , together with the inequality ( 7.3 ) in Lemma 7.2 .

Combining equations ( 7.7 ), ( 7.8 ) and ( 7.9 ), it then follows that

where the last equality holds by Condition 2.4 along with the equalities ( 7.4 ) and ( 7.5 ) in Lemma 7.2 . Therefore, from the results ( 7.6 ) and ( 7.10 ), we conclude that

Secondly, we turn to consider \(U_3\) . Similar to the term \(U_2\) , we write \(u_{n,3}=\frac{1}{n-1} U_3 = \frac{1}{n(n-1)} \sum _{i \ne j} u_{ij,3}\) with \(u_{ij,3}=[e_i(\hat{e}_j-e_j)+e_j(\hat{e}_i-e_i)] {\textbf{X}}_i^{\top } {\textbf{X}}_j\) . We have \({\mathbb {E}}(u_{n,3})={\mathbb {E}}(u_{ij,3})=0\) ; then,

Furthermore, the projection of \(u_{ij,3}\) to the space \(\{{\textbf{X}}_i, {\textbf{Z}}_i, Y_i\}\) is

Here the last equality holds by the equality ( 7.4 ) in Lemma 7.2 , while the fourth equality follows from the facts that \({\mathbb {E}}[({\textbf{X}}^{\top } \varvec{r}_{e{\textbf{X}}})^2]={\mathbb {E}}({\varvec{r}_{e{\textbf{X}}}}^{\top } {\textbf{X}}{\textbf{X}}^{\top } \varvec{r}_{e{\textbf{X}}})={\varvec{r}_{e{\textbf{X}}}}^{\top } \varvec{\Sigma }_{{\textbf{X}}} \varvec{r}_{e{\textbf{X}}}\) , and

What is more, we derive that

where the second inequality holds by the equation ( 7.14 ) and Cauchy–Schwarz inequality, the third inequality follows from the inequality ( 7.3 ) in Lemma 7.2 , while the last equality holds based on the equality ( 7.5 ) in Lemma 7.2 .

Combining equations ( 7.13 ) and ( 7.15 ), we have

and then conclude that

From the conclusions presented in ( 7.11 ) for \(U_2\) and ( 7.16 ) for \(U_3\) along with the results for \(U_1\) in Lemma 7.1 , we complete the proof, that is,

\(\square \)

Proof of Theorem 2.8

The proof of the following lemma is deferred to Appendix B.

Under conditions in Theorem 2.8 , it can be shown that

Without loss of generality, we suppose that the intercept term \(b_{\tau 0}\) within the model ( 1.1 ) is zero in the proof. Under the alternative hypotheses,

where the third equality follows from the mean value theorem, and \(|{\textbf{X}}_i^{\top }\tilde{\varvec{\beta }_{\tau }}| \le |{\textbf{X}}_i^{\top } \varvec{\beta }_{\tau }|\) .

Under the alternative hypotheses, \(U_n\) can be written as

Following the proof of Theorem 2.7 , we have

For the term \(U_4\) , we denote \(u_{n,4}=\frac{1}{n-1} U_4 = \frac{1}{n(n-1)} \sum _{i \ne j} u_{ij,4}\) , where \(u_{ij,4} = {\textbf{X}}_i^{\top }\varvec{\beta }_{\tau } \tilde{f}_{Y,i} {\textbf{X}}_j^{\top }\varvec{\beta }_{\tau } \tilde{f}_{Y,j} {\textbf{X}}_i^{\top } {\textbf{X}}_j\) . Note that \(\tilde{f}_{Y,i} = f_Y(m_i + {\textbf{X}}_i^{\top }\tilde{\varvec{\beta }_{\tau }}) = f_Y(m_i) +f^{\prime }_{Y}(m_i + {\textbf{X}}_i^{\top } \breve{\varvec{\beta }_{\tau }}){\textbf{X}}_i^{\top }\tilde{\varvec{\beta }_{\tau }} =: f_{Y,i} + \breve{f}^{\prime }_{Y,i}{\textbf{X}}_i^{\top } \tilde{\varvec{\beta }_{\tau }}\) , where \(|{\textbf{X}}_i^{\top } \breve{\varvec{\beta }_{\tau }}| < |{\textbf{X}}_i^{\top }\tilde{\varvec{\beta }_{\tau }}|\) . It follows that

where we denote \(\varvec{\Sigma }_{{\textbf{X}}_f} = {\mathbb {E}}\{f(m_{\tau }({\textbf{Z}})){\textbf{X}}{\textbf{X}}^{\top }\}\) . Further, define \(\varvec{\Sigma }_{{\textbf{X}}_{f^{\prime }}} = {\mathbb {E}}\{f^{\prime }(m_{\tau }({\textbf{Z}})){\textbf{X}}{\textbf{X}}^{\top }\}\) , and we calculate that

where the first inequality holds by the fact that \(|{\textbf{X}}_i^{\top }\tilde{\varvec{\beta }_{\tau }}| \le |{\textbf{X}}_i^{\top } \varvec{\beta }_{\tau }|\) , the second inequality follows from the Cauchy–Schwarz inequality, the third inequality holds by Condition 2.2 and the last equality holds based on the condition \(\varvec{\beta }_{\tau }^\top \varvec{\Sigma }_{{\textbf{X}}}\varvec{\beta }_{\tau } = o(1)\) in \(\varvec{\beta }_{\tau } \in {\mathscr {L}}^{I}(\varvec{\beta }_{\tau })\) . Moreover, by the similar derivation as ( 7.23 ), we can obtain that

Here the second inequality follows from the fact that \(2ab \le a^2 + b^2\) , and the last inequality holds by the inequality ( 7.17 ) in Lemma 7.3 . Combining equations ( 7.21 )–( 7.24 ), we derive that

where the last equality holds by the boundedness of \(f_{Y}(\cdot )\) and \(f^{\prime }_Y(\cdot )\) in Condition 2.5 .

The projection of \(u_{ij,4}\) to the space \(\{{\textbf{X}}_i, {\textbf{Z}}_i, Y_i\}\) is

Here the first inequality follows from the boundedness of \(f_{Y}(\cdot )\) in Condition 2.5 . We further derive that

Here the third inequality holds by the Cauchy–Schwarz inequality, and the last inequality follows from inequalities ( 7.17 ) and ( 7.18 ) in Lemma 7.3 .

We then calculate that

where the second inequality follows from Condition 2.5 , the third inequality holds by the Cauchy–Schwarz inequality and the last inequality follows based on the inequality ( 7.17 ) in Lemma 7.3 and the inequality ( 7.3 ) in Lemma 7.2 .

Combining equations ( 7.26 ) and ( 7.27 ), we have

Here the last equality holds by the conditions \(\varvec{\beta }_{\tau }^\top \varvec{\Sigma }_{{\textbf{X}}}\varvec{\beta }_{\tau } = o(1)\) and \(\varvec{\beta }_{\tau }^\top \varvec{\Sigma }_{{\textbf{X}}}^3\varvec{\beta }_{\tau } = o\{\textrm{tr}(\varvec{\Sigma }_{{\textbf{X}}}^2)/{n}\}\) in \(\varvec{\beta }_{\tau } \in {\mathscr {L}}^{I}(\varvec{\beta }_{\tau })\) . In conclusion, from the results ( 7.25 ) and ( 7.28 ), we obtain that

Here the second equality follows from Condition 2.5 , and the last equality holds by the conditions \(\varvec{\beta }_{\tau }^\top \varvec{\Sigma }_{{\textbf{X}}}^{2}\varvec{\beta }_{\tau } = o\left\{ {\textrm{tr}(\varvec{\Sigma }_{{\textbf{X}}}^2)}/(n^2{\varvec{r}_{m{\textbf{X}}}}^\top \varvec{r}_{m{\textbf{X}}})\right\} \) in \(\varvec{\beta }_{\tau } \in {\mathscr {L}}^{I}(\varvec{\beta }_{\tau })\) and \(n{\varvec{r}_{m{\textbf{X}}}}^{\top }{\varvec{r}_{m{\textbf{X}}}} = o\{\textrm{tr}^{1/2}(\varvec{\Sigma }_{{\textbf{X}}}^2)\}\) in Condition 2.3 .

Now we turn to consider \(U_5\) . We define \(u_{n,5}= \frac{1}{n-1} U_5 = \frac{1}{n(n-1)} \sum _{i \ne j} u_{ij,5}\) , where

Observe that \({\mathbb {E}}(U_5) = {\mathbb {E}}(u_{ij,5}) = 0\) . The projection of \(u_{ij,5}\) to the space \(\{{\textbf{X}}_i, {\textbf{Z}}_i, Y_i \}\) is

Here the last inequality follows from the boundedness of \(f_{Y}(\cdot )\) in Condition 2.5 . In addition,

where the last equality follows from the equation ( 7.14 ) and the last inequality holds by \(\tau \in (0,1)\) .

Additionally, it follows that

Here the second inequality holds by Condition 2.5 , and the last inequality follows from the inequality ( 7.17 ) in Lemma 7.3 and the inequality ( 7.3 ) in Lemma 7.2 .

Combining equations ( 7.30 ) and ( 7.31 ), we derive that

Here the last equality holds by the conditions \(\varvec{\beta }_{\tau }^\top \varvec{\Sigma }_{{\textbf{X}}}\varvec{\beta }_{\tau } = o(1)\) and \(\varvec{\beta }_{\tau }^\top \varvec{\Sigma }_{{\textbf{X}}}^3\varvec{\beta }_{\tau } = o\{\textrm{tr}(\varvec{\Sigma }_{{\textbf{X}}}^2)/{n}\}\) in \(\varvec{\beta }_{\tau } \in {\mathscr {L}}^{I}(\varvec{\beta }_{\tau })\) . Then, from the results \({\mathbb {E}}(U_5) = 0\) and ( 7.32 ), we have

Lastly, we consider the term \(U_6\) . Define \(u_{n,6} = \frac{1}{n-1}U_6 = \frac{1}{n(n-1)} \sum _{i \ne j} u_{ij,6}\) with the kernel

We obtain that

where the first inequality holds by Condition 2.5 , and thus,

Here the last equality follows from the equality ( 7.19 ) in Lemma 7.3 .

Note that the projection of \(u_{ij,6}\) to the space \(\{{\textbf{X}}_i, {\textbf{Z}}_i, Y_i \}\) is

where the first inequality holds by Condition 2.5 . We further derive that

where the third inequality holds by the Cauchy–Schwarz inequality, and the last inequality follows from the inequality ( 7.17 ) in Lemma 7.3 and the inequality ( 7.2 ) in Lemma 7.2 . Similarly,

Then combining equations ( 7.34 )–( 7.36 ), we have

Furthermore, similar to the derivation of the equation ( 7.31 ), we calculate that

Here the third inequality holds by Condition 2.5 , the fourth inequality follows from the Cauchy–Schwarz inequality, and the last inequality follows based on the inequality ( 7.17 ) in Lemma 7.3 and the inequality ( 7.3 ) in Lemma 7.2 .

Accordingly, combining equations ( 7.37 )–( 7.38 ), we have

where the last equality holds by conditions \(\varvec{\beta }_{\tau }^\top \varvec{\Sigma }_{{\textbf{X}}}\varvec{\beta }_{\tau } = o(1)\) and \(\varvec{\beta }_{\tau }^\top \varvec{\Sigma }_{{\textbf{X}}}^3\varvec{\beta }_{\tau } = o\{\textrm{tr}(\varvec{\Sigma }_{{\textbf{X}}}^2)/{n}\}\) in \(\varvec{\beta }_{\tau } \in {\mathscr {L}}^{I}(\varvec{\beta }_{\tau })\) , along with the equalities ( 7.4 ) and ( 7.5 ) in Lemma 7.2 . Thus, we conclude that

In sum, following the results ( 7.20 ), ( 7.29 ), ( 7.33 ) and ( 7.39 ), we verify that

Proof of Theorem 2.9

Under conditions in Theorem 2.9 , it can be shown that

Here \(\hat{\zeta } = F_{Y}(m_{\tau }({\textbf{Z}})) + {\textbf{X}}^{\top }\varvec{\beta }_{\tau } {f}_{Y}(m_{\tau }({\textbf{Z}}) + {\textbf{X}}^{\top } \tilde{\varvec{\beta }_{\tau }})-I\{Y < \hat{m}_{\tau }({\textbf{Z}})\}\) and \(\varvec{r}_{\zeta {\textbf{X}}} = {\mathbb {E}}(\hat{\zeta }{\textbf{X}})\) .

Under the alternative hypotheses, recall that

where we denote \(\hat{\zeta }_i=F_{Y,i} + {\textbf{X}}_i^{\top }\varvec{\beta }_{\tau } \tilde{f}_{Y,i}-I(Y_i<\hat{m}_i)\) , and \(u_n = \frac{1}{n-1} U_n = \frac{1}{n(n-1)} \sum _{i \ne j} u_{ij}\) is a U -statistic with the kernel \(u_{ij} = \hat{\zeta }_i\hat{\zeta }_j{\textbf{X}}_i^{\top } {\textbf{X}}_j\) . We denote \(\varvec{r}_{\zeta {\textbf{X}}} = {\mathbb {E}}(\hat{\zeta }{\textbf{X}})\) and obtain that

Here the last inequality follows from the facts that \(\varvec{r}_{\zeta {\textbf{X}}} \asymp \varvec{\Sigma }_{{\textbf{X}}}\varvec{\beta }_{\tau }+\varvec{r}_{m{\textbf{X}}}\) , which holds based on the boundedness of \(f_{Y}(\cdot )\) in Condition 2.5 , along with the condition \(\textrm{tr}^{1/2}(\varvec{\Sigma }_{{\textbf{X}}}^2)=o\big (n\Vert \varvec{\Sigma }_{{\textbf{X}}}\varvec{\beta }_{\tau }+\varvec{r}_{m{\textbf{X}}}\Vert _2^2\big )\) in \(\varvec{\beta }_{\tau } \in {\mathscr {L}}^{II}(\varvec{\beta }_{\tau })\) .

The projection of \(u_n\) to the space \(\{{\textbf{X}}_i, {\textbf{Z}}_i, Y_i\}\) is

We then derive that

Here the last inequality holds by the inequality ( 7.17 ) in Lemma 7.2 . As a result, from the technical results in Lemma 7.4 and the condition \(n{\varvec{r}_{\zeta {\textbf{X}}}}^{\top }{\varvec{r}_{\zeta {\textbf{X}}}} \gg \textrm{tr}^{1/2}(\varvec{\Sigma }_{{\textbf{X}}}^2)\) , we obtain that

Similarly, we calculate that

where the second inequality holds by the Cauchy–Schwarz inequality, and the third inequality holds based on the inequality ( 7.3 ) in Lemma 7.2 and the equality ( 7.40 ) in Lemma 7.4 .

Combining equations ( 7.43 )–( 7.44 ), we derive that

Following the results ( 7.42 ) and ( 7.45 ), we calculate that

From the condition \(n{\varvec{r}_{\zeta {\textbf{X}}}}^{\top }{\varvec{r}_{\zeta {\textbf{X}}}} \gg \textrm{tr}^{1/2}(\varvec{\Sigma }_{{\textbf{X}}}^2)\) , it follows that

7.2 Appendix B: Proofs of Technical Lemmas

This section contains some useful lemmas.

Proof of Lemma 7.1

We define \(U_1 = \frac{1}{n}\sum _{i\ne j}e_ie_j{\textbf{X}}_i^{\top }{\textbf{X}}_j\) for brevity. Denote

Define \(S_{n k}=\sum _{i=2}^k \eta _{ni}=\frac{2}{n} \sum _{i=2}^k \sum _{j=1}^{i-1} e_i e_j {\textbf{X}}_i^{\top }{\textbf{X}}_j\) with \(S_{nk}-S_{n(k-1)}=\eta _{n k}\) defined as martingale differences, and \({\mathscr {F}}_k = \sigma \{({\textbf{X}}_i, e_i), i=1, \ldots , k\}\) . Obviously, we have \({\mathbb {E}}(\eta _{nk} | {\mathscr {F}}_{k-1})=0\) , which implies that \((S_{nk}, {\mathscr {F}}_{k})\) is a zero-mean martingale sequence. Define \(v_{ni} = \textrm{Var}(\eta _{ni}|{\mathscr {F}}_{i-1})\) and \(V_{n} = \sum _{i=2}^{n}v_{ni}\) . Note that

Therefore, by the martingale central limit theorem [ 15 ], it is sufficient to show that the following two conditions hold.

and for all \(\iota >0\) ,

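For orientation, the two requirements verified by ( 7.46 ) and ( 7.47 ) are, in the standard form of the martingale central limit theorem, a conditional-variance condition and a conditional Lindeberg condition. This is the textbook statement; the paper's own displays may be normalized differently:
\[
\frac{V_n}{\operatorname{Var}(S_{nn})}
  = \frac{\sum _{i=2}^{n}\textrm{Var}(\eta _{ni}\mid {\mathscr {F}}_{i-1})}{\operatorname{Var}(S_{nn})}
  {\mathop {\rightarrow }\limits ^{p}} 1,
\qquad
\frac{\sum _{i=2}^{n}{\mathbb {E}}\{\eta _{ni}^{2}\, I(|\eta _{ni}|>\iota \,\sigma _{n})\mid {\mathscr {F}}_{i-1}\}}{\sigma _{n}^{2}}
  {\mathop {\rightarrow }\limits ^{p}} 0
\quad \text{for all } \iota >0,
\]
where \(\sigma _n^2 = \operatorname{Var}(S_{nn})\) .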
We first establish the equation ( 7.46 ). Observe that

Note that \(S_{nn}=\frac{1}{n} \sum _{i \ne j} e_i e_j {\textbf{X}}_i^{\top }{\textbf{X}}_j\) , so \(u_{n,s}=\frac{1}{n-1} S_{n n} = \frac{1}{n(n-1)} \sum _{i \ne j} u_{ij,s}\) is a U -statistic with the kernel \(u_{i j, s}=e_i e_j {\textbf{X}}_i^{\top }{\textbf{X}}_j\) . The projection of \(u_{i j, s}\) onto the space \(\{{\textbf{X}}_i, e_i\}\) is \(u_{1i, s}={\mathbb {E}}(u_{i j, s} | {\textbf{X}}_i, e_i)={\mathbb {E}}(e_i e_j {\textbf{X}}_i^{\top } {\textbf{X}}_j | {\textbf{X}}_i, e_i)=0\) . Furthermore, by the Hoeffding decomposition,

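For completeness, recall the general form of the Hoeffding decomposition being invoked (standard U-statistic theory, stated here only as a reminder): for a U-statistic \(u_n = \{n(n-1)\}^{-1}\sum _{i\ne j}u_{ij}\) with symmetric kernel,
\[
u_n = {\mathbb {E}}(u_{12}) + \frac{2}{n}\sum _{i=1}^{n}\{u_{1i} - {\mathbb {E}}(u_{12})\} + \Delta _n,
\]
where \(u_{1i} = {\mathbb {E}}(u_{ij}\mid {\textbf{X}}_i, e_i)\) and \(\Delta _n\) is a degenerate remainder. Since \({\mathbb {E}}(u_{12,s}) = 0\) and \(u_{1i,s} = 0\) here, the kernel is completely degenerate and the behavior of \(u_{n,s}\) is driven entirely by the second-order term.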
Then combining equations ( 7.48 ) and ( 7.49 ), we write

Now we need to show that \(R_1 {\mathop {\rightarrow }\limits ^{p}} 1\) and \(R_2 {\mathop {\rightarrow }\limits ^{p}} 0\) . It can be derived that

Here the last equality holds by the equality \({\mathbb {E}}({\textbf{X}}^{\top } \varvec{\Sigma }_{{\textbf{X}}} {\textbf{X}}) = \textrm{tr}(\varvec{\Sigma }_{{\textbf{X}}}^2)\) .
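This is the usual quadratic-form identity: for a mean-zero random vector \({\textbf{X}}\) with covariance \(\varvec{\Sigma }_{{\textbf{X}}}\) and any conformable matrix \({\textbf{M}}\) ,
\[
{\mathbb {E}}({\textbf{X}}^{\top }{\textbf{M}}{\textbf{X}})
= {\mathbb {E}}\,\textrm{tr}({\textbf{M}}{\textbf{X}}{\textbf{X}}^{\top })
= \textrm{tr}({\textbf{M}}\varvec{\Sigma }_{{\textbf{X}}}),
\]
and taking \({\textbf{M}} = \varvec{\Sigma }_{{\textbf{X}}}\) gives \({\mathbb {E}}({\textbf{X}}^{\top } \varvec{\Sigma }_{{\textbf{X}}} {\textbf{X}}) = \textrm{tr}(\varvec{\Sigma }_{{\textbf{X}}}^2)\) .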

Observe the fact that

where the first inequality follows from the boundedness of e , and the second inequality holds by the condition \({\mathbb {E}}(\prod _{i=1}^{2}{\textbf{X}}^\top {\textbf{M}}_{i}{\textbf{X}}) \le C\prod _{i=1}^{2}\textrm{tr}({\textbf{M}}_{i}\varvec{\Sigma }_{{\textbf{X}}})\) together with the equality \({\mathbb {E}}({\textbf{X}}^{\top } \varvec{\Sigma }_{{\textbf{X}}} {\textbf{X}}) = \textrm{tr}(\varvec{\Sigma }_{{\textbf{X}}}^2)\) .

Similar to the derivation of \(\textrm{Var}(R_1)\) , we obtain that

Here the last equality holds by the condition \(\textrm{tr}(\varvec{\Sigma }^4_{{\textbf{X}}})=o\{\textrm{tr}^2(\varvec{\Sigma }_{{\textbf{X}}}^2)\}\) . Since \({\mathbb {E}}(R_2) = 0\) , combining equations ( 7.50 )–( 7.52 ) with the Chebyshev inequality yields \(R_1 {\mathop {\rightarrow }\limits ^{p}} 1\) and \(R_2 {\mathop {\rightarrow }\limits ^{p}} 0\) . This verifies the equation ( 7.46 ).
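The normality that this argument delivers for the degenerate U-statistic can be eyeballed numerically. Below is a minimal Monte Carlo sketch under the simplest admissible design, assuming \(\varvec{\Sigma }_{{\textbf{X}}} = {\textbf{I}}_p\) and bounded errors \(e_i = \pm 1\) , so that \(\textrm{Var}(S_{nn}) \approx 2\,\textrm{tr}(\varvec{\Sigma }_{{\textbf{X}}}^2) = 2p\) :

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 200, 50, 2000
stats = []
for _ in range(reps):
    X = rng.standard_normal((n, p))        # covariance I_p, so tr(Sigma^2) = p
    e = rng.choice([-1.0, 1.0], size=n)    # bounded, mean-zero errors
    v = X.T @ e
    S_nn = (v @ v - np.sum(X ** 2)) / n    # (1/n) sum_{i != j} e_i e_j X_i^T X_j
    stats.append(S_nn / np.sqrt(2 * p))    # normalize by sqrt{2 tr(Sigma^2)}
stats = np.asarray(stats)
print(stats.mean(), stats.std())           # should be close to 0 and 1
```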

Next, we establish the equation ( 7.47 ). For all \(\iota > 0\) , we have

which holds by the Markov inequality. Furthermore, we obtain that

where the first inequality holds by the boundedness of e . The last inequality holds by equations ( 7.55 ) and ( 7.56 ) as follows,

where the last inequality holds by the inequality ( 7.3 ) in Lemma 7.2 , and similarly,

Consequently, ( 7.47 ) can also be established by combining equations ( 7.53 ) and ( 7.54 ). \(\square \)

Proof of Lemma 7.2

We first prove the equality ( 7.1 ). By the mean value theorem, we calculate that

Here \(\tilde{m}({\textbf{Z}})\) is a value between \(m_{\tau }({\textbf{Z}})\) and \(\hat{m}_{\tau }({\textbf{Z}})\) and the last inequality follows from Condition 2.5 . It is clear that \(n{\varvec{r}_{e{\textbf{X}}}}^{\top }{\varvec{r}_{e{\textbf{X}}}} = o\{\textrm{tr}^{1/2}(\varvec{\Sigma }_{{\textbf{X}}}^2)\}\) under Condition 2.3 .
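Concretely, the mean value theorem step presumably takes the following shape (a reconstruction of the standard expansion from the surrounding definitions; the paper's display may carry additional terms):
\[
F_Y\{\hat{m}_{\tau }({\textbf{Z}})\} - F_Y\{m_{\tau }({\textbf{Z}})\}
= f_Y\{\tilde{m}({\textbf{Z}})\}\,\{\hat{m}_{\tau }({\textbf{Z}}) - m_{\tau }({\textbf{Z}})\}
\le C\,|\hat{m}_{\tau }({\textbf{Z}}) - m_{\tau }({\textbf{Z}})|,
\]
with the constant \(C\) coming from the bound on \(f_Y(\cdot )\) in Condition 2.5 .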

Under Condition 2.2 , we derive that

which verifies the inequality ( 7.2 ). Similarly, we prove the inequality ( 7.3 ),

The first and second inequalities hold by Condition 2.2 . Furthermore, following the equality ( 7.1 ), we verify the equality ( 7.4 ):

where the second inequality holds based on the fact that the Frobenius norm is an upper bound on the spectral norm, and the last equality follows from the equation ( 7.57 ) and Condition 2.3 .
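The norm comparison used here is the elementary fact that, for any symmetric matrix \({\textbf{A}}\) , \(\Vert {\textbf{A}}\Vert _2 \le \Vert {\textbf{A}}\Vert _F = \textrm{tr}^{1/2}({\textbf{A}}^2)\) . This yields bounds of the kind being applied, for instance
\[
\varvec{\beta }^{\top }\varvec{\Sigma }_{{\textbf{X}}}^{2}\varvec{\beta }
\le \Vert \varvec{\Sigma }_{{\textbf{X}}}\Vert _2^{2}\,\varvec{\beta }^{\top }\varvec{\beta }
\le \textrm{tr}(\varvec{\Sigma }_{{\textbf{X}}}^{2})\,\varvec{\beta }^{\top }\varvec{\beta }.
\]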

Now we prove the equality ( 7.5 ). Without loss of generality, we assume \(\hat{m}_{\tau }({\textbf{Z}}) < m_{\tau }({\textbf{Z}})\) given \(({\textbf{X}}, {\textbf{Z}})\) . Similar to the derivation in the equality ( 7.57 ), we obtain that

Here the last equality holds by Condition 2.4 . \(\square \)

Proof of Lemma 7.3

The inequality ( 7.17 ) can be verified similarly to the inequality ( 7.2 ), and its proof is thus omitted here. By arguments similar to those in the proof of ( 7.58 ), we establish the inequality ( 7.18 ),

Next, we prove the equality ( 7.19 ). We observe that

Here the first equality holds by the condition \(\varvec{\beta }_{\tau }^\top \varvec{\Sigma }_{{\textbf{X}}}^{2}\varvec{\beta }_{\tau } = o\{{\textrm{tr}(\varvec{\Sigma }_{{\textbf{X}}}^2)}/(n^2{\varvec{r}_{m{\textbf{X}}}}^\top \varvec{r}_{m{\textbf{X}}})\}\) in \(\varvec{\beta }_{\tau } \in {\mathscr {L}}^{I}(\varvec{\beta }_{\tau })\) , and the second inequality follows from the equation ( 7.57 ). \(\square \)

Proof of Lemma 7.4

Firstly, we prove the equality ( 7.40 ). Note that \(F_Y(m_{\tau }({\textbf{Z}}))\) , \(I\{Y < \hat{m}_{\tau }({\textbf{Z}})\}\) and \(f_Y(m_{\tau }({\textbf{Z}}) + {\textbf{X}}^{\top }\tilde{\varvec{\beta }_{\tau }})\) are all bounded; thus, we derive that

Here the second inequality follows by the inequality ( 7.17 ) in Lemma 7.3 , and the last equality holds based on the condition \(\varvec{\beta }_{\tau }^\top \varvec{\Sigma }_{{\textbf{X}}}\varvec{\beta }_{\tau } = O(1)\) in \(\varvec{\beta }_{\tau } \in {\mathscr {L}}^{II}(\varvec{\beta }_{\tau })\) .

Secondly, we establish the inequality ( 7.41 ). By arguments similar to those in the proof of ( 7.59 ), we obtain that

7.3 Appendix C: Test Based on Multiple Data Splitting

In the main text, we propose the \(L_2\) -type test statistic \(S_n\) and the screening-based test statistic \(\tilde{S}_n\) based on a single data split. However, sample splitting may introduce additional randomness into the analysis. To make our methods more stable in practice, we further propose an algorithm based on multiple data splitting, adopting the approach in [ 22 ]. The detailed procedure is summarized in Algorithm 3.

Algorithm 3: Testing procedure for \({\mathbb {H}}_0\) based on multiple data splitting.
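The aggregation step can be sketched as follows. This is a generic illustration of multiple-split p-value aggregation via the Cauchy combination rule, offered as one plausible instantiation; it is not claimed to be the exact rule of [ 22 ] or of Algorithm 3. The helper single_split_pvalue is a hypothetical user-supplied function returning the p-value of one split-based test such as \(S_n\) :

```python
import numpy as np

def multiple_split_test(data, single_split_pvalue, n_splits=50, seed=0):
    """Aggregate p-values over repeated random sample splits.

    single_split_pvalue : hypothetical callable mapping (data, rng) to the
        p-value of one split-based test (e.g., the S_n statistic).
    Aggregation uses the Cauchy combination rule, whose level is
    (asymptotically) valid under arbitrary dependence across splits.
    """
    rng = np.random.default_rng(seed)
    pvals = np.array([single_split_pvalue(data, rng) for _ in range(n_splits)])
    t = np.mean(np.tan((0.5 - pvals) * np.pi))  # Cauchy combination statistic
    return 0.5 - np.arctan(t) / np.pi           # aggregated p-value
```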

7.4 Appendix D: Simulation Results for ML Estimators

We conduct further simulation studies to evaluate the performance of ML estimators (based on the Lasso and neural networks) for the unknown smooth function \(m_{\tau }(\cdot )\) . Following the settings in Example 4.1 of the main text, we consider the null hypothesis and generate the random error from Case 1 . The estimation quality of the ML estimators is measured by the mean absolute error (MAE) over 500 repetitions, with results summarized in Table 6 . From this table, we can see that the Lasso outperforms the MLP, with smaller MAEs. Additionally, the MAE decreases as the sample size n increases. Due to the high dimension q of the nuisance covariates \({\textbf{Z}}\) and the small sample size n , the MAEs are not very small. However, from the results in Table 1 , it is clear that our procedure controls the empirical size well.
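For concreteness, the evaluation loop can be sketched as below. This is a toy stand-in, not the design of Example 4.1: the true \(m_{\tau }(\cdot )\) is taken to be a hypothetical sparse linear function, and the Lasso penalty level is an arbitrary placeholder:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error

def lasso_mae(n=100, q=500, reps=100, seed=0):
    """MAE of a Lasso estimate of a (toy) sparse function m(Z), averaged
    over Monte Carlo repetitions; mirrors only the Appendix D metric."""
    rng = np.random.default_rng(seed)
    maes = []
    for _ in range(reps):
        Z = rng.standard_normal((n, q))
        m = Z[:, 0] + 0.5 * Z[:, 1]           # hypothetical sparse truth
        Y = m + rng.standard_normal(n)        # responses with noise
        m_hat = Lasso(alpha=0.1).fit(Z, Y).predict(Z)
        maes.append(mean_absolute_error(m, m_hat))
    return float(np.mean(maes))

print(lasso_mae())  # smaller values indicate better estimation of m
```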


About this article

Shi, H., Yang, W., Zhou, N. et al. Inference for Partially Linear Quantile Regression Models in Ultrahigh Dimension. Commun. Math. Stat. (2024). https://doi.org/10.1007/s40304-023-00389-9

Received : 19 June 2023

Revised : 09 August 2023

Accepted : 12 November 2023

Published : 06 September 2024

DOI : https://doi.org/10.1007/s40304-023-00389-9

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords

  • Semiparametric model
  • Significance testing
  • Quantile regression
  • Ultrahigh dimensionality

