Exploring Standard Deviation: Statistics and Data Analysis Made Easy

You’ll learn the fundamentals of standard deviation and its importance in data analysis, interpretation techniques, and practical applications.

  • Assess data spread using standard deviation around the mean.
  • Lower standard deviations reveal consistent datasets.
  • Evaluate risk in finance with standard deviation.
  • Some interpretations assume normal data distribution.
  • Outliers impact the reliability of standard deviation.

What is Standard Deviation?

Standard deviation measures the variation or dispersion in a set of values. It is used to quantify the spread of data points in a dataset relative to the mean (average) value. For example, a low standard deviation indicates that the data points are close to the mean. In contrast, a high standard deviation shows that the data points are more widely spread.

By providing insights into the variability of a dataset, standard deviation helps researchers and analysts assess the reliability and consistency of data, identify patterns and trends, and make informed decisions based on the data’s distribution.

Standard Deviation Importance

Standard deviation is crucial in statistics and data analysis for understanding the variability of a dataset. It helps identify trends, assess data reliability, detect outliers, compare datasets, and evaluate risk. A high standard deviation indicates a larger spread of values. In contrast, a low standard deviation shows that the values are more tightly clustered around the mean.

Standard Deviation Applications

The standard deviation has multiple applications across various industries and fields. For example, it is used in finance and investment to measure volatility, in manufacturing to monitor product quality, in social sciences to analyze data from surveys or experiments, in sports to assess athlete performance, in medicine to evaluate treatment outcomes, and in weather and climate analysis to identify patterns and trends.


How to Calculate Standard Deviation

Calculating standard deviation can be broken down into the following steps:

1. Compute the mean (average) of the dataset:

  • Add up all the values in the dataset.
  • Divide the sum by the number of values in the dataset.

2. Subtract the mean from each data point:

  • For each value in the dataset, subtract the mean calculated in step 1.

3. Square the differences:

  • Take the difference calculated in step 2 for each data point and square it.

4. Calculate the mean of the squared differences:

  • Add up all the squared differences from step 3.
  • Divide the sum by the number of squared differences.

Note: If you’re working with a sample rather than an entire population, divide by the number of squared differences minus 1, instead of the number of squared differences, to get an unbiased estimate of the population variance.

5. Take the square root of the mean of squared differences:

  • The square root of the result from step 4 is the standard deviation of the dataset.

Consider the dataset: [3, 6, 9, 12, 15]

Step 1: Calculate the mean.  Mean = (3 + 6 + 9 + 12 + 15) / 5 = 45 / 5 = 9

Step 2: Subtract the mean from each data point. Differences: [-6, -3, 0, 3, 6]

Step 3: Square the differences. Squared differences: [36, 9, 0, 9, 36]

Step 4: Calculate the mean of the squared differences. Mean of squared differences = (36 + 9 + 0 + 9 + 36) / 5 = 90 / 5 = 18

Step 5: Take the square root of the mean of squared differences. Standard deviation = √18 ≈ 4.24

So, the standard deviation for this dataset is approximately 4.24.
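As a minimal Python sketch of the same five steps (the function name std_dev and the sample/population switch are my own, not from the article):

```python
import math

def std_dev(values, sample=False):
    """Standard deviation via the five steps described above."""
    n = len(values)
    mean = sum(values) / n                             # Step 1: the mean
    squared_diffs = [(x - mean) ** 2 for x in values]  # Steps 2-3
    divisor = n - 1 if sample else n                   # n - 1 for samples
    variance = sum(squared_diffs) / divisor            # Step 4: the variance
    return math.sqrt(variance)                         # Step 5

print(std_dev([3, 6, 9, 12, 15]))  # 4.2426..., matching the 4.24 above
```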

How to Interpret Standard Deviation?

Interpreting standard deviation involves understanding what the value represents in the context of the data being analyzed. Here are some general guidelines for interpreting standard deviation:

Measure of dispersion

Standard deviation quantifies the spread of the data points in a dataset. A higher standard deviation indicates a greater degree of dispersion or variability in the data, while a lower standard deviation suggests that the data points are more tightly clustered around the mean.

Context-dependent interpretation

The interpretation of standard deviation depends on the context and domain in which it is being used. A high standard deviation may be acceptable in specific fields. In contrast, a low standard deviation may be more desirable in other areas. For example, in finance, a high standard deviation may indicate higher risk, while in quality control, a low standard deviation indicates consistency in the production process.

Relative to the mean

The standard deviation value should be interpreted relative to the mean of the dataset. In some cases, it may be useful to compute the coefficient of variation (CV), the ratio of the standard deviation to the mean. The CV is a dimensionless measure that helps compare the degree of variation across datasets with different units or widely varying means.
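As a quick sketch, reusing the dataset from the calculation example above, the CV is simply the standard deviation divided by the mean:

```python
import statistics

data = [3, 6, 9, 12, 15]
cv = statistics.pstdev(data) / statistics.mean(data)  # population SD / mean
print(round(cv, 2))  # 0.47: the SD is about 47% of the mean
```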

Empirical rule

The empirical rule (also known as the 68-95-99.7 rule) can help interpret standard deviation for datasets that follow a normal distribution. According to this rule, approximately 68% of the data falls within one standard deviation from the mean, about 95% within two standard deviations, and around 99.7% within three standard deviations.
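One way to see the rule in action is to simulate normally distributed data and count how much of it falls in each band; a rough sketch (the exact percentages vary slightly from run to run, and the mean of 100 and SD of 15 are arbitrary choices):

```python
import random

random.seed(1)
data = [random.gauss(100, 15) for _ in range(100_000)]  # simulated normal data

mean = sum(data) / len(data)
sd = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5

for k in (1, 2, 3):
    share = sum(abs(x - mean) <= k * sd for x in data) / len(data)
    print(f"within {k} SD: {share:.1%}")  # roughly 68.3%, 95.4%, 99.7%
```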

Identifying outliers

When interpreting standard deviation, it’s essential to consider the presence of outliers, which can significantly impact the value. Outliers are data points that deviate substantially from the mean and may require further investigation to determine their cause.
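A common screening heuristic (a choice for illustration, not a rule from this article) is to flag values more than k standard deviations from the mean:

```python
import statistics

def flag_outliers(values, k=2.0):
    """Return values more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in values if abs(x - mean) > k * sd]

print(flag_outliers([9, 10, 11, 10, 9, 10, 45]))  # [45]
```

Note that the flagged point itself inflates the mean and SD it is judged against, which is exactly the sensitivity to outliers discussed above.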

Standard Deviation Limitations

Standard deviation is a valuable measure of dispersion. Still, it has limitations, including sensitivity to outliers, the assumption of normal distribution (when applicable), incomparability across different units, and interpretation challenges. Other measures of dispersion, graphical methods, or additional descriptive statistics may be necessary for specific situations.

Standard Deviation Considerations

When using standard deviation for certain statistical analyses, it is crucial to consider the following factors to ensure accurate and meaningful insights into the data’s variability:

Scale of measurement: The data is measured on an interval or ratio scale.

Validity of the mean: The mean is a valid measure of central tendency.

Normal distribution assumption (when applicable): The data follows a normal distribution. This assumption is relevant for specific statistical tests and methods that involve the standard deviation.

Independence of observations: The data points are independent of each other.

Homoscedasticity (when applicable): The variability in the data is constant across different levels of the independent variable(s). This assumption is relevant when using the standard deviation in linear regression and other parametric analyses.

Reviewing these factors before relying on the standard deviation helps keep the resulting insights into the data’s variability accurate and meaningful.

When to Use Standard Deviation

Consider using standard deviation when quantifying dispersion or comparing the variability between datasets with similar means. It’s also helpful for assessing data consistency, evaluating risk or volatility in finance, and analyzing normally distributed data. However, keep in mind its limitations and use other measures of dispersion for skewed or non-interval data. Use standard deviation alongside other statistical tools to comprehensively understand your data.

Key Information on Standard Deviation


Standard deviation is a fundamental measure in statistics and data analysis that quantifies the dispersion of data points in a dataset relative to the mean. It plays a vital role in various fields and industries, helping professionals understand the variability of data, assess its reliability, and make informed decisions. However, it’s essential to be aware of its limitations and the importance of context when interpreting standard deviation. By combining standard deviation with other statistical tools and methods, researchers and analysts can gain a comprehensive understanding of their data and derive valuable insights for decision-making processes.



How to Interpret Standard Deviation and Standard Error in Survey Research

Presented by DataStar, Inc.

Standard Deviation and Standard Error are perhaps the two least understood statistics commonly shown in data tables. The following article is intended to explain their meaning and provide additional insight on how they are used in data analysis. Both statistics are typically shown with the mean of a variable, and in a sense, they both speak about the mean. They are often referred to as the "standard deviation of the mean" and the "standard error of the mean." However, they are not interchangeable and represent very different concepts.

Standard Deviation

Standard Deviation (often abbreviated as "Std Dev" or "SD") provides an indication of how far the individual responses to a question vary or "deviate" from the mean. SD tells the researcher how spread out the responses are -- are they concentrated around the mean, or scattered far & wide? Did all of your respondents rate your product in the middle of your scale, or did some love it and some hate it?

Let's say you've asked respondents to rate your product on a series of attributes on a 5-point scale. The mean for a group of ten respondents (labeled 'A' through 'J' below) for "good value for the money" was 3.2 with a SD of 0.4 and the mean for "product reliability" was 3.4 with a SD of 2.1. At first glance (looking at the means only) it would seem that reliability was rated higher than value. But the higher SD for reliability could indicate (as shown in the distribution below) that responses were very polarized, where most respondents had no reliability issues (rated the attribute a "5"), but a smaller, but important segment of respondents, had a reliability problem and rated the attribute "1". Looking at the mean alone tells only part of the story, yet all too often, this is what researchers focus on. The distribution of responses is important to consider and the SD provides a valuable descriptive measure of this.

Respondent | Good Value for the Money | Product Reliability
A          | 3                        | 1
B          | 3                        | 1
C          | 3                        | 1
D          | 3                        | 1
E          | 4                        | 5
F          | 4                        | 5
G          | 3                        | 5
H          | 3                        | 5
I          | 3                        | 5
J          | 3                        | 5
Mean       | 3.2                      | 3.4
Std Dev    | 0.4                      | 2.1

Two very different distributions of responses to a 5-point rating scale can yield the same mean. Consider the following example showing response values for two different ratings. In the first example (Rating "A") the Standard Deviation is zero because ALL responses were exactly the mean value. The individual responses did not deviate at all from the mean. In Rating "B", even though the group mean is the same (3.0) as the first distribution, the Standard Deviation is higher. The Standard Deviation of 1.15 shows that the individual responses, on average*, were a little over 1 point away from the mean.

Respondent | Rating "A" | Rating "B"
A          | 3          | 1
B          | 3          | 2
C          | 3          | 2
D          | 3          | 3
E          | 3          | 3
F          | 3          | 3
G          | 3          | 3
H          | 3          | 4
I          | 3          | 4
J          | 3          | 5
Mean       | 3.0        | 3.0
Std Dev    | 0.00       | 1.15
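Both ratings can be reproduced in a few lines; a sketch using Python's statistics module, whose stdev applies the n − 1 sample formula behind the 1.15 shown:

```python
import statistics

rating_a = [3] * 10                        # every respondent answered 3
rating_b = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]

for name, data in (("A", rating_a), ("B", rating_b)):
    print(name, statistics.mean(data), round(statistics.stdev(data), 2))
# A 3 0.0
# B 3 1.15
```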

Another way of looking at Standard Deviation is by plotting the distribution as a histogram of responses. A distribution with a low SD would display as a tall narrow shape, while a large SD would be indicated by a wider shape.

SD generally does not indicate "right or wrong" or "better or worse" -- a lower SD is not necessarily more desirable. It is used purely as a descriptive statistic. It describes the distribution in relation to the mean.

*Technical disclaimer: thinking of the Standard Deviation as an "average deviation" is an excellent way of conceptually understanding its meaning. However, it is not actually calculated as an average (if it were, we would call it the "average deviation"). Instead, it is "standardized," a somewhat complex method of computing the value using the sum of the squares. For practical purposes, the computation is not important. Most tabulation programs, spreadsheets or other data management tools will calculate the SD for you. More important is to understand what the statistics convey.

Standard Error

The Standard Error ("Std Err" or "SE") is an indication of the reliability of the mean. A small SE is an indication that the sample mean is a more accurate reflection of the actual population mean. A larger sample size will normally result in a smaller SE (while SD is not directly affected by sample size).

Most survey research involves drawing a sample from a population. We then make inferences about the population from the results obtained from that sample. If a second sample were drawn, the results probably wouldn't exactly match the first sample. If the mean value for a rating attribute was 3.2 for one sample, it might be 3.4 for a second sample of the same size. If we were to draw an infinite number of samples (of equal size) from our population, we could display the observed means as a distribution. We could then calculate an average of all of our sample means. This mean would equal the true population mean. We can also calculate the Standard Deviation of the distribution of sample means. The Standard Deviation of this distribution of sample means is the Standard Error of each individual sample mean. Put another way, the Standard Error is the Standard Deviation of the distribution of sample means.

Sample   | Mean
1st      | 3.2
2nd      | 3.4
3rd      | 3.3
4th      | 3.2
5th      | 3.1
...      | ...
Mean     | 3.3
Std Dev  | 0.13

Think about this. If the SD of this distribution helps us to understand how far a sample mean is from the true population mean, then we can use this to understand how accurate any individual sample mean is in relation to the true mean. That is the essence of the Standard Error. In actuality we have only drawn a single sample from our population, but we can use this result to provide an estimate of the reliability of our observed sample mean.

In fact, SE tells us that we can be 95% confident that our observed sample mean is plus or minus roughly 2 (actually 1.96) Standard Errors from the population mean.

The table below shows the distribution of responses from our first (and only) sample used for our research. The SE of 0.13, being relatively small, gives us an indication that our mean is relatively close to the true mean of our overall population. The margin of error (at 95% confidence) for our mean is (roughly) twice that value (+/- 0.26), telling us that the true mean is most likely between 2.94 and 3.46.

Respondent | Rating
A          | 3
B          | 3
C          | 3
D          | 3
E          | 4
F          | 4
G          | 3
H          | 3
I          | 3
J          | 3
Mean       | 3.2
Std Err    | 0.13
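The SE and the resulting margin of error can be reproduced from the raw ratings; a sketch (1.96 is the 95% z-value mentioned above):

```python
import statistics

ratings = [3, 3, 3, 3, 4, 4, 3, 3, 3, 3]
n = len(ratings)
mean = statistics.mean(ratings)
se = statistics.stdev(ratings) / n ** 0.5       # standard error of the mean

low, high = mean - 1.96 * se, mean + 1.96 * se  # 95% margin of error
print(round(se, 2), round(low, 2), round(high, 2))  # 0.13 2.94 3.46
```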

Summary

Many researchers fail to understand the distinction between Standard Deviation and Standard Error, even though they are commonly included in data analysis. While the actual calculations for Standard Deviation and Standard Error look very similar, they represent two very different, but complementary, measures. SD tells us about the shape of our distribution, how close the individual data values are to the mean value. SE tells us how close our sample mean is to the true mean of the overall population. Together, they help to provide a more complete picture than the mean alone can tell us.


A beginner’s guide to standard deviation and standard error

Posted on 26th September 2018 by Eveliina Ilola


What is standard deviation?

Standard deviation tells you how spread out the data is. It is a measure of how far each observed value is from the mean. In a normal distribution, about 95% of values will be within 2 standard deviations of the mean.


How to calculate standard deviation

Standard deviation is rarely calculated by hand. It can, however, be done using the formula below, where x represents a value in a data set, μ represents the mean of the data set and N represents the number of values in the data set.

\[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]

The steps in calculating the standard deviation are as follows:

  • For each value, find its distance to the mean
  • For each value, find the square of this distance
  • Find the sum of these squared values
  • Divide the sum by the number of values in the data set
  • Find the square root of this
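These steps correspond to ready-made functions in Python's standard library; a quick sketch with made-up data (pstdev divides by N, as in the formula above, while stdev divides by n − 1 for samples):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.pstdev(data))  # population SD (divide by N): 2.0
print(statistics.stdev(data))   # sample SD (divide by n - 1): 2.138...
```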

What is standard error?

When you are conducting research, you often only collect data from a small sample of the whole population. Because of this, you are likely to end up with slightly different sets of values, with slightly different means, each time you sample.

If you take enough samples from a population, the means will be arranged into a distribution around the true population mean. The standard deviation of this distribution, i.e. the standard deviation of sample means, is called the standard error.

The standard error tells you how accurate the mean of any given sample from that population is likely to be compared to the true population mean. When the standard error increases, i.e. the means are more spread out, it becomes more likely that any given mean is an inaccurate representation of the true population mean.

How to calculate standard error

Standard error can be calculated using the formula below, where σ represents standard deviation and n represents sample size.

\[ SE = \frac{\sigma}{\sqrt{n}} \]

Standard error increases when the standard deviation, i.e. the spread of the population, increases. Standard error decreases when the sample size increases – as the sample size gets closer to the true size of the population, the sample means cluster more and more around the true population mean.
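A small simulation illustrates this; an illustrative sketch that is not part of the original guide (the population mean of 50 and SD of 10 are arbitrary choices):

```python
import random
import statistics

random.seed(0)
POP_MEAN, POP_SD = 50, 10

for n in (10, 100, 1000):
    # draw 2,000 samples of size n and look at the spread of their means
    means = [statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))
             for _ in range(2000)]
    print(n, round(statistics.stdev(means), 2), round(POP_SD / n ** 0.5, 2))
# the spread of sample means tracks sigma / sqrt(n): ~3.16, ~1.00, ~0.32
```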



Mean & Standard Deviation

Descriptive statistics summarize data. To aid in comprehension, we can reorganize scores into lists. For example, we might put test scores in order so that we can quickly see the lowest and highest scores in a group (this is called an ordinal variable, by the way; you can learn more about scales of measure here). After arranging data, we can determine frequencies, which are the basis of such descriptive measures as mean, median, mode, range, and standard deviation. Let’s walk through an example using twelve test scores:

97, 100, 78, 93, 88, 100, 87, 78, 100, 86, 97, 100

As you can see, these scores are not in a user-friendly, interpretable format. The first thing to do is order them:

100, 100, 100, 100, 97, 97, 93, 88, 87, 86, 78, 78

Now you’ve got an ordered list that is much easier to interpret at first glance. You can go a step further and put like numbers together:

Score | Frequency
100   | 4
97    | 2
93    | 1
88    | 1
87    | 1
86    | 1
78    | 2

This is called a frequency distribution.

Now, we can take those same scores and get some more useful information. Recall that the mean is the arithmetic average of the scores, calculated by adding all the scores and dividing by the total number of scores. Excel will perform this function for you using the command =AVERAGE(Number:Number).

Now we know the average score, but maybe knowing the range would help. Recall that the range is the difference between the highest and lowest scores in a distribution, calculated by subtracting the lowest score from the highest. You can calculate this one by simple subtraction.

Understanding range may lead you to wonder how most students scored. In other words, you know what they scored, but maybe you want to know about where the majority of student scores fell – in other words, the variance of scores. Standard Deviation introduces two important things, The Normal Curve (shown below) and the 68/95/99.7 Rule. We’ll return to the rule soon.

[Figure: the normal curve]

The Normal Curve tells us that numerical data will be distributed in a pattern around an average (the center line).

Standard deviation is considered the most useful index of variability. It is a single number that tells us the variability, or spread, of a distribution (group of scores). Standard Deviation is calculated by:

Step 1. Determine the mean.

Step 2. Subtract the mean from each score.

Step 3. Square each of those differences.

Step 4. Sum the squared differences, divide by the number of scores minus one, and take the square root of the result.

Excel will perform this function for you using the command =STDEV(Number:Number).

With the scores entered in cells A2 through A13, click on an empty cell and type in the formula for Standard Deviation:

=STDEV(A2:A13)

The formula returns the number, in this case 8.399134, which we round to 8.40.
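The same calculation can be checked outside Excel. As a minimal Python sketch using the ordered scores from above (statistics.stdev applies the same n − 1 formula as Excel's =STDEV):

```python
import statistics

scores = [100, 100, 100, 100, 97, 97, 93, 88, 87, 86, 78, 78]
print(statistics.stdev(scores))  # 8.3991..., which rounds to 8.40
```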

That number, 8.40, is 1 unit of standard deviation. The 68/95/99.7 Rule tells us that standard deviations can be converted to percentages, so that:

  • 68% of scores fall within 1 SD of the mean.
  • 95% of all scores fall within 2 SD of the mean.
  • 99.7% of all scores fall within 3 SD of the mean.

For the visual learners, you can put those percentages directly into the standard curve:

[Figure: the normal curve with percentages]

Since 1 SD in our example is 8.40, and we know that the mean is 92, we can be sure that 68% of the scores on this test fall between 83.6 and 100.4. To get this range, I simply added 1 SD (8.40) to the mean (92), and took 1 SD away from the mean. Sometimes you see SD referred to as +/- in a journal article, which is indicating the same thing. Note: Quick thinkers will notice that since 50% of the sample is below the mean (to the left of 0 on the curve), you can add percentages. In other words, 84.13% of the scores fall at or below 1 SD above the mean. To get that number, I took the percentage between -3 SD and 0 on the left (which equals 50), then added the percentage from 0 to 1 SD on the right (which is 34.13).


How to Interpret Standard Deviation Results

In the realm of statistical analysis, Standard Deviation is a term that often emerges. It is a measure used in statistics that quantifies the amount of dispersion or variation in a set of values. But how does one interpret the results derived from calculating the standard deviation?

If you're keen to unravel the mysteries of standard deviation, then you've come to the right place. This article is dedicated to helping you interpret standard deviation results with confidence and clarity.

What is Standard Deviation?

Before delving into the interpretation, let's brush up on what standard deviation is. Standard deviation is a statistic that measures the dispersion of a dataset relative to its mean. The standard deviation is calculated as the square root of variance by determining each data point's deviation relative to the mean.

If the data points are further from the mean, there is a higher deviation within the data set; thus, the more spread out the data, the higher the standard deviation.

Understanding the Importance of Standard Deviation:

Standard Deviation is a crucial statistical tool because it allows us to comprehend the degree of variation in a dataset. It provides insights into how much variation or "dispersion" exists from the average (mean), or expected value.

A low standard deviation signifies that the values tend to be close to the mean, whereas a high standard deviation indicates that the values are spread out over a wider range.

How to Calculate Standard Deviation?

The standard deviation can be calculated using various tools and techniques. These range from manual calculations to sophisticated statistical software. However, one of the simplest and most convenient methods is using an online tool, such as the Standard Deviation Calculator.

This user-friendly platform offers an effortless way to compute the standard deviation for any given data set.

Interpreting Standard Deviation Results:

Interpreting the results of standard deviation involves understanding its relationship with the mean and the overall data set.

Low Standard Deviation:

A low standard deviation indicates that the data points are generally close to the mean or the expected value. This implies that there is less variability in the data set, and the values are relatively consistent.

High Standard Deviation:

In contrast, a high standard deviation indicates that data points are spread out over a large range of values. This spread signifies a higher level of variability or volatility within the dataset.
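To see the two cases side by side, here is a small sketch with two invented datasets that share a mean of 50:

```python
import statistics

consistent = [48, 49, 50, 50, 51, 52]  # values cluster near the mean
volatile = [20, 35, 50, 50, 65, 80]    # values spread over a wide range

for data in (consistent, volatile):
    print(statistics.mean(data), round(statistics.stdev(data), 1))
# 50 1.4   <- low SD: consistent
# 50 21.2  <- high SD: variable
```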

Standard Deviation in Real-World Scenarios:

The interpretation of standard deviation becomes much more relatable when applied to real-world scenarios. For instance, in finance, a high standard deviation of stock returns would imply higher volatility and, thus, a riskier investment. In research studies, a high standard deviation might reflect a larger spread of data, which could influence the study's reliability and validity.

Understanding and interpreting standard deviation results is a skill that proves valuable across multiple disciplines, from finance to scientific research. Armed with the Standard Deviation Calculator and the knowledge from this article, you're well on your way to interpreting standard deviation results with ease.

Remember, a low standard deviation signifies consistency, while a high standard deviation denotes variability. Here's to making confident, data-driven decisions!



How do the mean and standard deviation describe data?

The standard deviation is a measurement made in reference to the mean, which means:

  • A large standard deviation indicates that the data points are far from the mean, and a small standard deviation indicates that they are clustered closely around the mean.
  • When deciding whether sample measurements are suitable inferences for the population, the standard deviation of those measurements is of crucial importance.
  • Standard deviations are often used as a measure of risk in finance, associated with price fluctuations of stocks, bonds, etc.

Chebyshev's rule approximates the percentage of data points captured within a given number of standard deviations of the mean, for any data set: for any number \(k\) greater than 1, at least \(1-\frac{1}{k^2}\) of the measurements will fall within \(k\) standard deviations of the mean.


Example: A sample of size \(n=50\) has mean \(\bar{x}=28\) and standard deviation \(s=3\). Without knowing anything else about the sample, what can be said about the number of observations that lie in the interval \((22,34)\)? What can be said about the number of observations that lie outside the interval?

The interval \((22,34)\) is formed by adding and subtracting two standard deviations from the mean. By Chebyshev's Theorem, at least \(\frac{3}{4}\) of the data are within this interval. Since \(\frac{3}{4}\) of \(50\) is \(37.5\), this means that at least 37.5 observations are in the interval. But \(.5\) of a measurement does not make sense, so we conclude that at least 38 observations must lie inside the interval \((22,34)\).

If at least \(\frac{3}{4}\) of the observations are inside the interval, then at most \(\frac{1}{4}\) of them are outside. We conclude that at most 12 \((50-38=12)\) observations lie outside the interval \((22,34)\).
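A small Python sketch of the same bound (the variable names are mine; math.ceil rounds the 37.5 up to 38, as in the text):

```python
import math

n, mean, s, k = 50, 28, 3, 2

interval = (mean - k * s, mean + k * s)   # (22, 34)
at_least = math.ceil((1 - 1 / k**2) * n)  # Chebyshev: at least 3/4 of n
print(interval, at_least, n - at_least)   # (22, 34) 38 12
```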

There are more accurate ways of calculating the percentage or number of observations that fall within a given number of standard deviations of the mean. Chebyshev's Theorem and the empirical rule we'll introduce next are just approximations.

If the histogram of a data set is approximately bell-shaped, we can approximate the percentage of data within a given number of standard deviations of the mean using the empirical rule.


Example: Heights of 18-year-old males have a bell-shaped distribution with mean \(69.6\) inches and standard deviation \(1.4\) inches. About what proportion of all such men are between 68.2 and 71 inches tall? And what interval centered on the mean should contain about 95% of those heights?

Since the interval \((68.2,71.0)\) extends one standard deviation on either side of the mean, by the empirical rule about 68% of all 18-year-old males have heights in this range.

95% by the empirical rule represents plus/minus two standard deviations from the mean.

\[\bar{x} \pm 2s = 69.6 \pm 2(1.4) = 66.8,\,72.4\]

Therefore, about 95% of such heights fall between 66.8 inches and 72.4 inches.


What to use to express the variability of data: Standard deviation or standard error of mean?

Mohini P. Barde

Shrimohini Centre for Medical Writing and Biostatistics Pune, Maharashtra, India

Prajakt J. Barde

1 Glenmark Pharmaceutical Ltd., Mumbai, Maharashtra, India

Statistics plays a vital role in biomedical research. It helps present data precisely and draw meaningful conclusions. While presenting data, one should be aware of using adequate statistical measures. In biomedical journals, the Standard Error of Mean (SEM) and Standard Deviation (SD) are used interchangeably to express variability, though they measure different parameters. SEM quantifies uncertainty in the estimate of the mean, whereas SD indicates dispersion of the data from the mean. As readers are generally interested in knowing the variability within the sample, descriptive data should be precisely summarized with SD. Use of SEM should be limited to computing CIs, which measure the precision of a population estimate. Journals can avoid such errors by requiring authors to adhere to their guidelines.

INTRODUCTION

Statistics plays a vital role in biomedical research. It helps present data precisely and draw meaningful conclusions. A large number of biomedical articles have statistical errors either in presentation[1–3] or analysis of data. The scathing remark by Yates, “It is depressing to find how much good biological work is in danger of being wasted through incompetent and misleading analysis,” highlights the need for a proper understanding of statistics and its appropriate use in the medical literature.

In the late nineties, biomedical journals made a concerted effort to improve the quality of statistics.[4–6] Despite this, errors are still present in published articles. One such common error is the use of SEM instead of SD to express the variability of data.[7–10] Negele et al. also showed clearly that a significant number of published articles in leading journals had misused SEM in descriptive statistics.[11] In this article, we discuss the concept and use of SD and SEM.

CONCEPT OF SD AND SEM

Studying an entire population is time- and resource-intensive and not always feasible; therefore, studies are often done on a sample, and the data are summarized using descriptive statistics. These findings are further generalized to the larger, unobserved population using inferential statistics.

For example, in order to understand cholesterol levels of the population, cholesterol levels of a study sample, drawn from the same population, are measured. The findings of this sample are best described by two parameters: mean and SD. The sample mean is the average of these observations and is denoted by X̄. It is the center of the distribution of observations (central tendency). The other parameter, SD, tells us the dispersion of individual observations about the mean. In other words, it characterizes the typical distance of an observation from the distribution center or middle value. If observations are more dispersed, then there will be more variability. Thus, a low SD signifies less variability, while a high SD indicates that the data are more spread out. Mathematically, the SD is[12]

\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]

where s = sample SD; X = individual value; X̄ = sample mean; n = sample size.

Figure 1a shows cholesterol levels of a population of 200 healthy individuals. The cholesterol of most individuals is between 190 and 210 mg/dl, with a mean (μ) of 200 mg/dl and SD (σ) of 10 mg/dl. A study in 10 individuals drawn from the same population with cholesterol levels of 180, 200, 190, 180, 220, 190, 230, 190, 190, 180 mg/dl gives X̄ = 195 mg/dl and SD (s) = 17.1 mg/dl.

[Figure 1: If one draws three different groups of 10 individuals each, one will obtain three different means and SDs. (Adapted from Glantz, 2002)]
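Those sample figures can be verified directly; a quick sketch in Python (statistics.stdev applies the same n − 1 formula given above):

```python
import statistics

chol = [180, 200, 190, 180, 220, 190, 230, 190, 190, 180]  # mg/dl
print(statistics.mean(chol))   # 195
print(statistics.stdev(chol))  # 17.159..., reported above as 17.1
```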

These sample results are used to make inferences based on the premise that what is true for a randomly selected sample will be true, more or less, for the population from which the sample is chosen. This means the sample mean (X̄) estimates the true but unknown population mean (μ), and the sample SD (s) estimates the population SD (σ). However, the precision with which sample results determine population parameters needs to be addressed. Thus, in the above case, X̄ = 195 mg/dl estimates the population mean μ = 200 mg/dl. If other samples of 10 individuals are selected, because of intrinsic variability it is unlikely that exactly the same mean and SD [Figures 1b, c and d] would be observed; therefore, we may expect a different estimate of the population mean every time.

Figure 2 shows the means of 25 groups of 10 individuals each drawn from the population shown in Figure 1. If these 25 group means are treated as 25 observations, then as per the statistical “Central Limit Theorem” these observations will be normally distributed regardless of the nature of the original population. The mean of all these sample means will equal the mean of the original population, and the standard deviation of all these sample means is called the SEM, as explained below.

[Figure 2: The means of 25 groups of 10 individuals each, drawn from the population of 200 individuals shown in Figure 1. The means of the three groups shown in Figure 1 are marked with circles filled with the corresponding patterns.]

SEM is the standard deviation of the means of random samples drawn from the original population. Just as the sample SD (s) is an estimate of the variability of observations, SEM is an estimate of the variability of possible values of sample means. As mean values are considered for the calculation of SEM, it is expected that there will be less variability in the values of the sample mean than in the original population. This shows that SEM is a measure of the precision with which the sample mean X̄ estimates the population mean μ. The precision increases as the sample size increases [Figure 3].

[Figure 3: The SEM is a function of the sample size.]

Thus, SEM quantifies uncertainty in the estimate of the mean.[13,14] Mathematically, the best estimate of SEM from a single sample is[15]

\[ \sigma_M = \frac{s}{\sqrt{n}} \]

where σ_M = SEM; s = SD of the sample; n = sample size.

However, SEM by itself doesn’t convey much useful information. Its main function is to help construct confidence intervals (CI).[16] CI is the range of values that is believed to encompass the actual (“true”) population value. This true population value usually is not known, but can be estimated from an appropriately selected sample. If samples are drawn repeatedly from a population and a CI is constructed for every sample, then a certain percentage of CIs will include the value of the true population while a certain percentage will not. Wider CIs indicate lesser precision, while narrower ones indicate greater precision.[17]

CI is calculated for any desired degree of confidence by using the sample size and variability (SD) of the sample, although 95% CIs are by far the most commonly used, indicating that the level of certainty to include the true parameter value is 95%. The CI for the true population mean μ is given by[12]

\[ \bar{X} \pm z \cdot \frac{s}{\sqrt{n}} \]

where s = SD of the sample; n = sample size; and z (the standardized score) is the value of the standard normal distribution for the desired level of confidence. For a 95% CI, z = 1.96.

A 95% CI for the population, as per the first sample with mean 195 mg/dl and SD 17.1 mg/dl, is 184.4 to 205.6 mg/dl, indicating that the interval includes the true population mean μ = 200 mg/dl with 95% confidence. In essence, a confidence interval is a range that we expect, with some level of confidence, to include the actual value of the population mean.[17]
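As a sketch, the SEM and CI formulas above applied to the same ten cholesterol values:

```python
import statistics

chol = [180, 200, 190, 180, 220, 190, 230, 190, 190, 180]
n = len(chol)
mean = statistics.mean(chol)
sem = statistics.stdev(chol) / n ** 0.5      # sigma_M = s / sqrt(n)

ci = (mean - 1.96 * sem, mean + 1.96 * sem)  # 95% CI with z = 1.96
print(round(sem, 1), [round(x, 1) for x in ci])  # 5.4 [184.4, 205.6]
```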

APPLICATION

As explained above, SD and SEM estimate quite different things. But in many articles, SEM and SD are used interchangeably, and authors summarize their data with SEM as it makes the data seem less variable and more representative. However, unlike SD, which quantifies the variability, SEM quantifies uncertainty in the estimate of the mean.[13] As readers are generally interested in knowing the variability within the sample and not the proximity of the mean to the population mean, data should be precisely summarized with SD and not with SEM.[18,19]

The importance of SD in clinical settings is discussed below. In an atherosclerotic disease study, an investigator reports mean peak systolic velocity (PSV) in the carotid artery, a measure of stenosis, as 220 cm/sec with an SD of 10 cm/sec.[20] In this case it would be unusual to observe a PSV less than 200 cm/sec or greater than 240 cm/sec, as 95% of the population falls within 2 SD of the mean, assuming that the population follows a normal distribution. Thus, there is a quick summary of the population and the range against which to compare the specific findings. Unfortunately, investigators are quite likely to report the PSV as 220 cm/sec ± 1.6 (SEM). If one confused the SEM with the SD, one would believe that the range of the population is narrow (216.8 to 223.2 cm/sec), which is not the case.

Additionally, when two groups are compared (e.g. treatment and control groups), SD helps in visualizing the effect size, which is an index of how much difference there is between the two groups.[12] Effect size gives an idea of the magnitude of the difference, to help differentiate between statistical significance and practical importance. Effect size is determined by calculating the difference between the means divided by the pooled or average standard deviation of the two groups. Generally, an effect size of 0.8 or more is considered a large effect and indicates that the means of the two groups are separated by 0.8 SD; effect sizes of 0.5 and 0.2 are considered moderate and small, respectively, and indicate that the means of the two groups are separated by 0.5 and 0.2 SD.[12] However, the same can’t be interpreted with SEM. More importantly, SEMs do not provide a direct visual impression of the effect size if the number of subjects differs between groups.
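A minimal sketch of that effect-size calculation (Cohen’s d with a pooled SD; the two groups below are invented purely for illustration):

```python
import statistics

def cohens_d(group1, group2):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled

treatment = [14, 15, 16, 17, 18]
control = [12, 13, 14, 15, 16]
print(round(cohens_d(treatment, control), 2))  # 1.26 -> a large effect
```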

The SD may, however, be a deceptive index of variability in experimental situations where the biological variable differs grossly from a normal distribution (e.g. the distribution of plasma creatinine, the growth rate of a tumor, or the plasma concentration of immune or inflammatory mediators). In these cases, because of the skewed distribution, SD will be an inflated measure of variability. In such cases, data can be presented using other measures of variability (e.g. the mean absolute deviation and the interquartile range), or can be transformed (common transformations include the logarithmic, inverse, square root, and arc sine transformations).[17]

Some journal editors require their authors to use the SD and not the SEM. There are two reasons for this trend. First, the SEM is a function of the sample size, so it can be made smaller simply by increasing the sample size (n) [Figure 3]. Second, the interval (mean ± 2 SEM) will contain approximately 95% of the means of samples, but will never contain 95% of the observations on individuals; in the latter situation, mean ± 2 SD is needed.[21]

In general, the use of the SEM should be limited to inferential statistics where the author explicitly wants to inform the reader about the precision of the study, and how well the sample truly represents the entire population.[22] In graphs and figures too, use of SD is preferable to the SEM. Further, in every case, standard deviations should preferably be reported in parentheses [i.e., mean (SD)] rather than using mean ± SD expressions, as the latter specification can be confused with a 95% CI.[17]

Proper understanding and use of fundamental statistics such as SD and SEM will allow more reliable analysis, interpretation, and communication of data to readers. Though SEM and SD are often used interchangeably to express variability, they measure different parameters. SEM, an inferential parameter, quantifies uncertainty in the estimate of the mean, whereas SD is a descriptive parameter that quantifies the variability. As readers are generally interested in knowing the variability within the sample, descriptive data should be precisely summarized with SD. Use of SEM should be limited to computing CIs, which measure the precision of a population estimate.

Source of Support: Nil.

Conflict of Interest: Dr. Prajakt Barde (Earlier was employee of Serum Institute of India, Ltd) is currently employee of Glenmark Pharmaceutical Ltd. Views and opinions presented in this article are solely those of the author and do not necessarily represent those of the author's present or past employers.


Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation

By Jim Frost

A measure of variability is a summary statistic that represents the amount of dispersion in a dataset. How spread out are the values? While a measure of central tendency describes the typical value, measures of variability define how far away the data points tend to fall from the center. We talk about variability in the context of a distribution of values. A low dispersion indicates that the data points tend to be clustered tightly around the center. High dispersion signifies that they tend to fall further away.

In statistics, variability, dispersion, and spread are synonyms that denote the width of the distribution. Just as there are multiple measures of central tendency, there are several measures of variability. In this blog post, you’ll learn why understanding the variability of your data is critical. Then, I explore the most common measures of variability—the range, interquartile range, variance, and standard deviation. I’ll help you determine which one is best for your data.

The two plots below show the difference graphically for distributions with the same mean but more and less dispersion. The panel on the left shows a distribution that is tightly clustered around the average, while the distribution in the right panel is more spread out.

[Graph: two distributions with the same mean but more and less variability]

Related post: Measures of Central Tendency: Mean, Median, and Mode

Why Understanding Variability is Important

Let’s take a step back and first get a handle on why understanding variability is so essential. Analysts frequently use the mean to summarize the center of a population or a process. While the mean is relevant, people often react to variability even more. When a distribution has lower variability, the values in a dataset are more consistent. However, when the variability is higher, the data points are more dissimilar and extreme values become more likely. Consequently, understanding variability helps you grasp the likelihood of unusual events.

In some situations, extreme values can cause problems! Have you seen a weather report where the meteorologist shows extreme heat and drought in one area and flooding in another? It would be nice to average those together! Frequently, we feel discomfort at the extremes more than the mean. Understanding that variability around the mean provides critical information.

Variability is everywhere. Your commute time to work varies a bit every day. When you order a favorite dish at a restaurant repeatedly, it isn’t exactly the same each time. The parts that come off an assembly line might appear to be identical, but they have subtly different lengths and widths.

These are all examples of real-life variability. Some degree of variation is unavoidable. However, too much inconsistency can cause problems. If your morning commute takes much longer than the mean travel time, you will be late for work. If the restaurant dish is much different than how it is usually, you might not like it at all. And, if a manufactured part is too much out of spec, it won’t function as intended.

Some variation is inevitable, but problems occur at the extremes. Distributions with greater variability produce observations with unusually large and small values more frequently than distributions with less variability.

Variability can also help you assess the sample’s heterogeneity.

Example of Different Amounts of Variability

Let’s take a look at two hypothetical pizza restaurants. They both advertise a mean delivery time of 20 minutes. When we’re ravenous, they both sound equally good! However, this equivalence can be deceptive! To determine the restaurant that you should order from when you’re hungry, we need to analyze their variability.

Suppose we study their delivery times, calculate the variability for each place, and determine that their variabilities are different. We’ve computed the standard deviations for both restaurants—which is a measure that we’ll come back to later in this post. How significant is this difference in getting pizza to their customers promptly?

The graphs below display the distribution of delivery times and provide the answer. The restaurant with more variable delivery times has the broader distribution curve. I’ve used the same scales in both graphs so you can visually compare the two distributions.

[Graphs: distributions of delivery times for the high and low variability restaurants]

In these graphs, we consider a 30-minute wait or longer to be unacceptable. We’re hungry after all! The shaded area in each chart represents the proportion of delivery times that surpass 30 minutes. Nearly 16% of the deliveries for the high variability restaurant exceed 30 minutes. On the other hand, only 2% of the deliveries take too long with the low variability restaurant. They both have an average delivery time of 20 minutes, but I know where I’d place my order when I’m hungry!

As this example shows, the central tendency doesn’t provide complete information. We also need to understand the variability around the middle of the distribution to get the full picture. Now, let’s move on to the different ways of measuring variability!

The Range

Let’s start with the range because it is the most straightforward measure of variability to calculate and the simplest to understand. The range of a dataset is the difference between the largest and smallest values in that dataset. For example, in the two datasets below, dataset 1 has a range of 38 – 20 = 18 while dataset 2 has a range of 52 – 11 = 41. Dataset 2 has a broader range and, hence, more variability than dataset 1.

Worksheet that shows two datasets that we'll use to calculate the range of the data as a measure of variability.

While the range is easy to understand, it is based on only the two most extreme values in the dataset, which makes it very susceptible to outliers. If one of those numbers is unusually high or low, it affects the entire range even if it is atypical.

Additionally, the size of the dataset affects the range. In general, you are less likely to observe extreme values. However, as you increase the sample size, you have more opportunities to obtain them. Consequently, when you draw random samples from the same population, the range tends to increase as the sample size increases. For that reason, use the range to compare variability only when the sample sizes are similar.
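To make the arithmetic concrete, here is a minimal sketch in Python. The two datasets are hypothetical stand-ins whose smallest and largest values match the numbers quoted above, since the worksheet's full values aren't reproduced here.

```python
# Hypothetical datasets whose extremes match the example above.
dataset_1 = [20, 24, 25, 28, 31, 36, 38]
dataset_2 = [11, 19, 24, 30, 37, 45, 52]

for name, data in (("Dataset 1", dataset_1), ("Dataset 2", dataset_2)):
    # The range is simply the largest value minus the smallest value.
    print(f"{name}: range = {max(data)} - {min(data)} = {max(data) - min(data)}")

# Dataset 1: range = 38 - 20 = 18
# Dataset 2: range = 52 - 11 = 41
```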

For more details, read my post, The Range in Statistics.

Learn how you can use the range to estimate the standard deviation using the range rule of thumb.

The Interquartile Range (IQR) . . . and other Percentiles

The interquartile range is the middle half of the data. To visualize it, think about the median value that splits the dataset in half. Similarly, you can divide the data into quarters. Statisticians refer to these quarters as quartiles and denote them from low to high as Q1, Q2, Q3, and Q4. The lowest quartile (Q1) contains the quarter of the dataset with the smallest values. The upper quartile (Q4) contains the quarter of the dataset with the highest values. The interquartile range is the middle half of the data that falls between the upper and lower quartiles. In other words, it includes the middle 50% of the data points, those in Q2 and Q3. The IQR is the red area in the graph below.

Graph that illustrates the interquartile range as a measure of variability.

The interquartile range is a robust measure of variability in a similar manner that the median is a robust measure of central tendency. Neither measure is influenced dramatically by outliers because they don’t depend on every value. Additionally, the interquartile range is excellent for skewed distributions, just like the median. As you’ll learn, when you have a normal distribution, the standard deviation tells you the percentage of observations that fall specific distances from the mean. However, this doesn’t work for skewed distributions, and the IQR is a great alternative.

I've divided the dataset below into quartiles. The interquartile range (IQR) extends from the low end of Q2 to the upper limit of Q3. For this dataset, the IQR is 39 – 20 = 19.

Dataset that shows how to find the interquartile range (IQR)
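As a quick sketch, NumPy's percentile function finds the quartile boundaries directly. The data below are illustrative rather than the worksheet's actual values, and different percentile conventions can shift the boundaries slightly.

```python
import numpy as np

# Illustrative data only; not the worksheet's actual values.
data = np.array([14, 17, 20, 22, 24, 26, 28, 31, 34, 36, 39, 41])

q1, q3 = np.percentile(data, [25, 75])  # boundaries of the middle half
iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```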

Related posts: Quartile: Definition, Finding, and Using; Interquartile Range: Definition and Uses; and What are Robust Statistics?

Using other percentiles

When you have a skewed distribution, I find that reporting the median with the interquartile range is a particularly good combination. The interquartile range is equivalent to the region between the 75th and 25th percentiles (75 – 25 = 50% of the data). You can also use other percentiles to determine the spread of different proportions. For example, the range between the 97.5th percentile and the 2.5th percentile covers 95% of the data. The broader these ranges, the higher the variability in your dataset.
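The same idea extends to any pair of percentiles. A small sketch, again with NumPy on simulated data:

```python
import numpy as np

# Simulated data purely for illustration.
data = np.random.default_rng(1).normal(loc=50, scale=10, size=1_000)

# The middle 95% of the data lies between the 2.5th and 97.5th percentiles.
low, high = np.percentile(data, [2.5, 97.5])
print(f"95% of the values fall between {low:.1f} and {high:.1f}")
```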

Related post: Percentiles: Interpretations and Calculations

Variance

Variance is the average squared difference of the values from the mean. Unlike the previous measures of variability, the variance includes all values in the calculation by comparing each value to the mean. To calculate this statistic, you find the squared difference between each data point and the mean, sum them, and then divide by the number of observations. Hence, it's the average squared difference.

There are two formulas for the variance depending on whether you are calculating the variance for an entire population or using a sample to estimate the population variance. The equations are below, and then I work through an example in a table to help bring it to life.

Population variance

The formula for the variance of an entire population is the following:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

In the equation, σ² is the population parameter for the variance, μ is the parameter for the population mean, and N is the number of data points, which should include the entire population.

Statisticians refer to the numerator portion of the variance formula as the sum of squares.

Sample variance

To use a sample to estimate the variance for a population, use the following formula. Using the previous equation with sample data tends to underestimate the variability. Because it's usually impossible to measure an entire population, statisticians use the equation for sample variances much more frequently.

$$s^2 = \frac{\sum (X - M)^2}{N - 1}$$

In the equation, s² is the sample variance, and M is the sample mean. The N − 1 in the denominator corrects for the tendency of a sample to underestimate the population variance.

Example of calculating the sample variance

I’ll work through an example using the formula for a sample on a dataset with 17 observations in the table below. The numbers in parentheses represent the corresponding table column number. The procedure involves taking each observation (1), subtracting the sample mean (2) to calculate the difference (3), and squaring that difference (4). Then, I sum the squared differences at the bottom of the table. Finally, I take the sum and divide by 16 because I’m using the sample variance equation with 17 observations (17 – 1 = 16) . The variance for this dataset is 201.

Worksheet that shows the dataset and calculations we'll use to find the sample variance as a measure of variability.
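Here is a minimal sketch of that procedure in Python. The 17 observations below are hypothetical placeholders rather than the table's actual values, so the result won't be exactly 201.

```python
def sample_variance(data):
    n = len(data)
    mean = sum(data) / n                              # the sample mean, column (2)
    squared_diffs = [(x - mean) ** 2 for x in data]   # difference and square, columns (3)-(4)
    return sum(squared_diffs) / (n - 1)               # divide by n - 1 for a sample

# Hypothetical observations standing in for the table's 17 data points.
observations = [34, 51, 29, 47, 60, 25, 38, 55, 42, 31, 58, 44, 27, 49, 36, 53, 40]
print(sample_variance(observations))
```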

Because the calculations use the squared differences, the variance is in squared units rather than the original units of the data. While higher values of the variance indicate greater variability, there is no intuitive interpretation for specific values. Despite this limitation, various statistical tests use the variance in their calculations. For an example, read my post about the F-test and ANOVA.

While it is difficult to interpret the variance itself, the standard deviation resolves this problem!

For more details, read my post about the Variance.

Standard Deviation

The standard deviation is the standard or typical difference between each data point and the mean. When the values in a dataset are grouped closer together, you have a smaller standard deviation. On the other hand, when the values are spread out more, the standard deviation is larger because the standard distance is greater.

Conveniently, the standard deviation uses the original units of the data, which makes interpretation easier. Consequently, the standard deviation is the most widely used measure of variability. For example, in the pizza delivery example, a standard deviation of 5 indicates that the typical delivery time is plus or minus 5 minutes from the mean. It’s often reported along with the mean: 20 minutes (s.d. 5).

The standard deviation is just the square root of the variance. Recall that the variance is in squared units. Hence, the square root returns the value to the natural units. The symbol for the standard deviation as a population parameter is σ while s represents it as a sample estimate. To calculate the standard deviation, calculate the variance as shown above, and then take the square root of it. Voila! You have the standard deviation!

In the variance section, we calculated a variance of 201 in the table.

Standard deviation calculations.

Therefore, the standard deviation for that dataset is 14.177.
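In code, that last step is just a square root. Python's statistics module also computes the standard deviation directly from raw data, as this sketch shows:

```python
import math
import statistics

variance = 201              # the sample variance from the table above
print(math.sqrt(variance))  # 14.177... -- the standard deviation

# Given raw observations instead, statistics.stdev() uses the n - 1 (sample)
# denominator, while statistics.pstdev() uses n (population).
data = [12, 15, 17, 20, 26]    # illustrative values
print(statistics.stdev(data))  # ~5.34
```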

The standard deviation is similar to the mean absolute deviation. Both use the original data units, and both compare the data values to the mean to assess variability. However, there are differences. To learn more, read my post about the mean absolute deviation (MAD).

People often confuse the standard deviation with the standard error of the mean. Both measures assess variability, but they have extremely different purposes. To learn more, read my post The Standard Error of the Mean .

Related post: Using the Standard Deviation

The Empirical Rule for the Standard Deviation of a Normal Distribution

When you have normally distributed data, or approximately so, the standard deviation becomes particularly valuable. You can use it to determine the proportion of the values that fall within a specified number of standard deviations from the mean. For example, in a normal distribution, 68% of the values will fall within +/- 1 standard deviation from the mean. This property is part of the Empirical Rule. This rule describes the percentage of the data that fall within specific numbers of standard deviations from the mean for bell-shaped curves.

  • ±1 standard deviation: 68% of the data
  • ±2 standard deviations: 95% of the data
  • ±3 standard deviations: 99.7% of the data

Let’s take another look at the pizza delivery example where we have a mean delivery time of 20 minutes and a standard deviation of 5 minutes. Using the Empirical Rule, we can use the mean and standard deviation to determine that 68% of the delivery times will fall between 15-25 minutes (20 +/- 5) and 95% will fall between 10-30 minutes (20 +/- 2*5).
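Under the normality assumption, these percentages come straight from the normal distribution's cumulative distribution function. A sketch with SciPy, using the pizza example's mean of 20 minutes and standard deviation of 5:

```python
from scipy.stats import norm

mean, sd = 20, 5  # pizza delivery times, assumed normally distributed

# Proportions within +/- 1 and +/- 2 standard deviations of the mean.
within_1 = norm.cdf(mean + sd, mean, sd) - norm.cdf(mean - sd, mean, sd)
within_2 = norm.cdf(mean + 2 * sd, mean, sd) - norm.cdf(mean - 2 * sd, mean, sd)
print(f"within 1 sd: {within_1:.3f}")  # ~0.683
print(f"within 2 sd: {within_2:.3f}")  # ~0.954

# Share of deliveries longer than 30 minutes, the shaded region discussed earlier.
print(f"over 30 minutes: {1 - norm.cdf(30, mean, sd):.3f}")  # ~0.023, about 2%
```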

Related posts: The Normal Distribution and Empirical Rule

Which is Best—the Range, Interquartile Range, or Standard Deviation?

First off, you probably notice that I didn’t include the variance as one of the options in the heading above. That’s because the variance is in squared units and doesn’t provide an intuitive interpretation. So, I’ve crossed that off the list. Let’s go over the other three measures of variability.

When you are comparing samples that are the same size, consider using the range as the measure of variability. It’s a reasonably intuitive statistic. Just be aware that a single outlier can throw the range off. The range is particularly suitable for small samples when you don’t have enough data to calculate the other measures reliably, and the likelihood of obtaining an outlier is also lower.

When you have a skewed distribution, the median is a better measure of central tendency, and it makes sense to pair it with either the interquartile range or other percentile-based ranges because all of these statistics divide the dataset into groups with specific proportions.

For normally distributed data, or even data that aren't terribly skewed, the tried and true combination of reporting the mean and the standard deviation is the way to go. This combination is by far the most common. You can still supplement this approach with percentile-based ranges as needed.

Except for the variance, the statistics in this post are absolute measures of variability because they use the original variable's measurement units. Read my post about the coefficient of variation to learn about a relative measure of variability that can be advantageous in some circumstances.

Analysts frequently use measures of variability to describe their datasets. Learn how to Analyze Descriptive Statistics in Excel.

If you’re learning about statistics and like the approach I use in my blog, check out my Introduction to Statistics book! It’s available at Amazon and other retailers.

Cover of my Introduction to Statistics: An Intuitive Guide ebook.


Reader Interactions


April 7, 2022 at 6:55 pm

Well narrated calculations and explanations. Big up


April 7, 2022 at 9:46 pm

Thanks so much!


April 3, 2022 at 11:58 pm

Jim, how can I buy your e-book, which costs USD 9?

April 4, 2022 at 1:50 am

Hi Lewis, just go to My Webstore, which is where I sell my ebooks. Scroll down past the Amazon links to find them. You'll see my Introduction to Statistics ebook is available for USD 9.


October 7, 2021 at 1:04 am

Thank you for taking the time to write these amazing posts. I’ve got time-series data on GDP as a measure of output for one country, and I’d like to see how volatile it is. My question is: how can I measure output volatility?


June 24, 2021 at 10:50 pm

Hello, sir, I am happy to come here to find the answer to my question. Sir, I used NMDS, PERMANOVA, and PERMDISP to examine my data, and the results show that there is a location effect rather than a dispersion effect. Now I don't know what the impact of the location effect is on my data. Please help me understand the location effect in a dataset and its importance. Thank you very much.


June 15, 2021 at 1:14 am

On the measures of central tendency page, you told us that the mode is the only parameter for finding the center of categorical data. I want to ask: what about categorical data in measures of dispersion? Can we calculate measures of dispersion for categorical data?

June 15, 2021 at 1:28 am

You’re correct, the mode is the only measure of central tendency for categorical data.

Variability for categorical variables is rarely used, but a form of it does exist. It's fairly different from dispersion for continuous data. There is a coefficient of unalikeability, which measures how similar or dissimilar the outcome values are for categorical data. Unalikeability assesses how often observations differ from one another. For example, if all the outcomes are in one category, they are very similar (identical in the extreme case). However, if they're spread out among the other categories more evenly, they become dissimilar. The coefficient of unalikeability measures that aspect.

I have never used it myself. But, it does exist! Perhaps I’ll write a post about it at some point. However, it’s not a commonly used measure.

You should note that you can’t use the standard measures of variability for categorical data. Typically, you won’t report variability for categorical data. You can report things like the mode and percentages for the various categories.


April 30, 2021 at 4:14 pm

What do you mean by weighted average? All 3 tests use the same sample size (10 measurements per test point). What I would like to do is combine these 3 standard deviations from each test (with different magnitudes) and determine an average value of the standard deviation that can represent the 3 tests. Is that possible and valid from a statistical point of view?

May 1, 2021 at 12:19 am

Hi Ricardo, you didn’t explain those extra details in your original question. I still don’t fully understand what you want to do so I’m unable to answer your question.

April 30, 2021 at 11:54 am

Continuing with the subject of the estimated standard deviation, would it be possible to calculate the average of 3 standard deviations calculated from 3 different samples with different measurement values for each one (e.g., at 10, 50, and 90% of the instrument range)? If so, what is the method to do it?

April 30, 2021 at 3:23 pm

Hi Ricardo,

You’d calculate a standard deviation for each dataset. If for some reason you wanted to calculate a pooled standard deviation for all three, there are several approaches.

You could combine all the data into one larger dataset and calculate the standard deviation for it.

Or, you can calculate the separate standard deviations and then calculate a weighted average. The weights are based on sample sizes such that larger samples have more weight.

The first approach calculates the variability of all data points around the grand mean of the combined dataset. Conversely, the weighted average approach calculates the average variability of the data points in a group around their group's mean rather than the grand mean. So, the method depends on which measure you need: variability around the grand mean or around the group means.
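To illustrate the weighted-average idea, here is a sketch of one common convention that pools the variances weighted by their degrees of freedom; the sample sizes and standard deviations are made-up values.

```python
import math

# (sample size, sample standard deviation) for each test -- made-up values.
groups = [(10, 2.1), (10, 3.4), (10, 2.8)]

# Pool the variances, weighting each by its degrees of freedom (n - 1),
# then take the square root to return to the original units.
numerator = sum((n - 1) * s ** 2 for n, s in groups)
denominator = sum(n - 1 for n, s in groups)
pooled_sd = math.sqrt(numerator / denominator)
print(f"pooled standard deviation: {pooled_sd:.2f}")
```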

April 30, 2021 at 11:18 am

Hi Mr. Frost

Congratulations on your website, it is an excellent resource!

I have read in several guides related to the field of metrology that when a Type A measurement uncertainty evaluation (repeatability test) is carried out, the way to statistically calculate the standard uncertainty (68%) from a series of repeated measurements on a normally distributed population is to use the estimated standard deviation (s). In other guides, however, it is specified that for a normally distributed population, the best estimate of the standard uncertainty is the standard error of the mean.

My question is, what is the correct way to determine the estimated standard uncertainty (68%) in a Type A uncertainty evaluation assuming a normal distribution? Or when should one or the other be used?

Thanks in advance! RV

April 30, 2021 at 3:13 pm

I have not focused on metrology, so I’m not claiming to be an expert in that area. However, from my understanding you would use the standard deviation. For type A uncertainty, you are estimating the standard deviation from an observed frequency distribution. Whereas, for type B, it’s from an assumed distribution.

You almost certainly would not use the standard error of the mean. That measure is the standard deviation of the sampling distribution of the mean. It's used to calculate p-values and confidence intervals. The standard error of the mean is not used to estimate the dispersion of observations but rather the dispersion of sample means.

I hope that helps!


April 28, 2021 at 10:42 pm

When calculating the standard deviation, what is X?

April 28, 2021 at 10:52 pm

Hi Nimusiima,

X in the equation represents the individual observations/data points of your variable of interest. In the table in the section where I show an example of how to calculate the sample variance, you'll see a column called data point. Each one of those values in that column is an X in the equation. As I explain in that section, you take each individual observation (data point, X), subtract the mean of the variable from it, square that difference, and then sum all the squared differences.

For the standard deviation, which you're asking about, all the above applies (including the part about X), but you add one step at the end: take the square root of the result.


March 10, 2021 at 2:13 am

What are the various methods to measure variability between two data sets with the same variables?


March 4, 2021 at 3:36 pm

Hi Jim, I have bought 2 or 3 books now, mostly for my junior team members and for myself as a refresher. What I like most about your approach and the blog is the simplicity of explaining the concepts with real examples. That is more valuable than reading other dry textbooks. My team is self-learning a lot as well. A book on the use of Excel and its functions (step by step), with example templates to download and a key, would be an excellent addition.

March 4, 2021 at 11:54 pm

Thanks so much for your kind words! And, I’m glad my books have been helpful.

Yes! An Excel book is very high on my priority list! I’m not sure when I’ll be able to release it but it’s a book I want to write!


March 3, 2021 at 11:13 pm

Please explain to me in simple terms why we use (N-1) in the denominator for the sample variance.

Thanking you in advance.

March 3, 2021 at 11:17 pm

Hi Anubhab,

That’s known as degrees of freedom (DF). I’ve written a post about degrees of freedom and why you need to use N-1. That’ll explain it for you.

Thanks for writing!


January 27, 2021 at 1:01 am

What is the importance of measures of variability in educational assessment?


December 5, 2020 at 10:48 am

Hi. My name is Nana Zahid. Love the way it is explained in very simple English. I am from Malaysia, and your write-up helped me a lot in doing my assignment for my Ed.D. Thank you so much. Love from Malaysia.

December 5, 2020 at 10:53 pm

I’m so happy to hear that this was helpful! Best of luck on your assignment! 🙂


November 29, 2020 at 11:50 am

Thank you so much, Mr. Frost! I appreciate your help on both fronts.


November 28, 2020 at 7:04 pm

This is super helpful! I was hoping to cite your work in a paper I’m writing for my statistics course at Colorado State University. Do you happen to have a date of publication available? I can’t seem to find one.

November 29, 2020 at 2:24 am

Hi Megan, I'm so glad it was helpful! With electronic resources, the date you accessed the URL is more important because pages can change. Please refer to Purdue's Citing Electronic Sources guidelines for details.


November 23, 2020 at 4:01 am

thank you! this is very helpful


November 23, 2020 at 3:57 am

Thank you so much, Sir! This has been helpful for me and my friends in understanding this topic of statistics! Thank you again and have a great day 😊

November 23, 2020 at 2:36 am

I have a question. How do we compare the two standard deviations when the units of two sets of data are different?

Thank you so much, Sir! I love your book and your explanations. They have helped me immensely. Thank you from a high school student ☺️

November 23, 2020 at 2:50 am

It's great that you're taking statistics in high school! I think statistics often doesn't get enough attention early on, often not until college!

Unfortunately, when your data use different units, you can’t compare the standard deviations because those too will use different units. Sometimes, you can convert the original units to common units. For example, if you have weights in pounds and kilograms, you can convert the pounds into kilograms and you’ll have the same units. Then calculate the standard deviations while they’re in the same units and compare.

In some cases, you can’t convert the units because they’re inherently different. For example, you might want to compare the variability of weight measurements to strength measurements. In that case, you can use the coefficient of variation. I write about that measure in its own blog post: using the coefficient of variation .
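In brief, the coefficient of variation is just the standard deviation divided by the mean, which cancels the units. A minimal sketch with invented weight and strength measurements:

```python
import statistics

def coefficient_of_variation(data):
    # Unitless: dividing the standard deviation by the mean cancels the units.
    return statistics.stdev(data) / statistics.mean(data)

weights_kg = [61.0, 72.5, 68.2, 80.1, 75.4]        # invented values
strengths_newtons = [410.0, 520.0, 475.0, 630.0]   # invented values

print(coefficient_of_variation(weights_kg))
print(coefficient_of_variation(strengths_newtons))
```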

Thanks for writing with the great question! Jim


September 24, 2020 at 7:41 am

I have purchased your three books and I am trying to get an understanding of indices of dispersion. The standard deviation or Interquartile Range in isolation do not mean very much which is why your histograms and box plots are so useful.

Could you give me a description of some of the indices of dispersion such as Karl Pearson’s coefficient of dispersion (ratio of the standard deviation to the mean) or the Quartile Coefficient of Dispersion (Q3-Q1)/(Q3+Q1).

Are these useful and how are they interpreted? Are there other useful indices of dispersion?

September 25, 2020 at 1:01 am

Hi Richard,

Thanks so much for supporting my books. I really appreciate that!

I guess there are several things I’d point out. First, yes, I think graphs and numeric statistics often work best when they’re used together. The graphs can present the data in a way that’s very easy to understand in just a glance. Meanwhile, the numeric values provide objective measurements that don’t depend on scaling issues. So, I’m not surprised that the histograms added some much needed context! Ideally, that should be standard practice when presenting results.

I think the measures of variability provide more information than you might be giving them credit for. I agree that the range is often not helpful. For one thing, sample size affects the range. However, if you regularly collect a standard sample size, the range can be meaningful. Suppose you collect a daily sample of a certain size. The daily sample ranges will be relatively consistent between samples. However, if you see an unusually large range on a given day, it would be a signal to investigate that day's unusual variability. I've seen that approach often used in the quality control context, and it provides a statistic that is easy to understand in that context.

The standard deviation is meaningful because it’s in the units of the variable and represents the standard difference between the observed values and the mean. That would seem to make it a relatively intuitive measure by itself. But then consider the Empirical Rule. If your data follow a normal distribution, you can easily determine where most of the values fall. Take the IQ distribution for example, mean of 100, standard deviation of 15. Based on that alone, we know that 95% of IQs fall between 100 +/- 2 *15 or [70, 130].

I also like the IQR because it doesn’t depend on the distribution being normal. You know that half the values fall within the range and half fall outside. Additionally, you can use other percentiles if that’s more meaningful.

I have not used the coefficient of dispersion or quartile coefficient of dispersion much. They are less useful for understanding your own dataset. In my mind, they have a very specific use–for comparing the variability across datasets when measurements use different units. When data use different units, you need a method for standardizing the different units, and that’s where these other unitless measures of coefficients of dispersion come into play. These measures answer the question, “relative to some measure of central tendency, how large is the variability.” So, these coefficients of dispersion are relative measures rather than an absolute measure, like the ones I discuss in my book (and earlier in this comment).

For example if your dataset has a coefficient of dispersion of 0.1 and another study has 0.2, you know that your study has less variable data. That’s handy if you’re measuring different characteristics. However, if you’re measuring the same characteristic (say IQ again), then just compare the standard deviations because they’re in the same units.

I’m sure there are certain instances where the coefficient of dispersion can be particularly helpful, but I think that’s less common overall than the absolute measures for the reasons I discuss.

Was there some type of information or interpretation you were hoping to gain relating to variability?


August 26, 2020 at 11:36 am

This is a great explanation, many thanks.


August 4, 2020 at 2:40 am

Thanks, Jim, for taking the time to write such a helpful article. It's very well explained.

I am hoping to get one of my queries resolved.

If I have a test observation value along with its statistical standard deviation, but the observation value is near zero and, once the standard deviation is included, the value ranges between the negative and positive zones, how do I report the value with the statistically correct sign?

E.g., I have a value of -5 +/- 7 MPa stress, and repetitions at the same location give values of +2 +/- 3 MPa, -5 +/- 2 MPa, etc. Should I report the sign? And what should be the rule of thumb for reporting? Standard deviation? Equipment repeatability?

Thanks in advance… Suhail Mulla

August 5, 2020 at 12:18 am

I'm not sure exactly what your concern is. I don't see a problem with using standard deviations in your scenario.


May 19, 2020 at 4:01 pm

Why would the standard deviation likely not be a reliable measure of variability for a distribution of data that includes at least one extreme outlier?

May 19, 2020 at 4:30 pm

You have to think about the calculations for the variance, which is the foundation for the standard deviation. To calculate the variance, you sum the squared differences between the data points and the mean. The key is the squaring of the distance. So, think of squared values: 1, 4, 9, 16, 25, etc. If you suddenly have a value of 10 and square it, that's 100! The squared value is so large compared to the other values that it skews the results all by itself.

I’ve written a post about outliers and include an example for how a single outlier can affect both the mean and standard deviation greatly.
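A quick numeric sketch of that effect:

```python
import statistics

data = [10, 11, 12, 13, 14]
print(statistics.stdev(data))          # ~1.58 for tightly clustered values

# A single extreme value dominates the sum of squared differences.
print(statistics.stdev(data + [100]))  # jumps to ~35.9
```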


May 18, 2020 at 2:19 am

Well done! It was easy to understand and very concise. Enjoyed your article.


May 9, 2020 at 10:06 am

Excellent session. Thanks a lot.


April 22, 2020 at 3:21 am

CV, interquartile range, or range: which is most affected by an outlier?

April 23, 2020 at 12:57 am

I’d imagine it’s the range because a single value can change the entire range. The effect of that one value is not “diluted” by the other values.


January 31, 2020 at 11:18 pm

Hey Jim, I just found out about your blog. Very sound explanations. I have a question. I am faced with a client I don't know much about. As I immerse myself in their initial reports, I notice there is a tendency to compare the median to Q3 of lead times. This is reported as a "variance", which semantically speaking is incorrect. My question is: is this comparison relevant? Does it translate to variability? Isn't it worth analyzing the capability of the process, its sigma scale? Any insights would be welcome! Thanks a lot, Carlos


September 25, 2019 at 9:31 am

In the section where you are discussing interquartile ranges, you say that Q1 has the lowest values and Q3 has the highest. However, should this not be Q4 that has the highest values, as there are 4 quartiles?

September 25, 2019 at 2:39 pm

Hi, that’s a typo. Thank you so much for catching it! You’re absolutely correct. I’ve updated the text to indicate that Q4 has the highest values.


May 8, 2019 at 10:50 pm

Dear Mr Jim, thank you for your answers! Maybe it is a choice. I have read some other references in terms of the CV, but it seems there are few references related to the CV when the data include both negative and positive values. It seems these problems are yet to be solved; maybe I have not found the related references. Whatever the case, thank you for your advice. Could you please recommend several references or books related to the CV when the data are negative? Thank you very much in advance!

May 8, 2019 at 4:25 am

Mr Jim, thank you for the description of the statistics in terms of variability. They are very useful for understanding these indices for people whose majors are not statistics. I have a question. As you have mentioned, the coefficient of variation (CV) can be used when all the data are positive. However, in practice there can be negative data besides positive data, so how should the CV be used? If it is not suitable to use the CV, what are the alternatives in terms of variability? I am confused about this question. Could you please give me some advice about these problems? I am looking forward to your advice! Thank you very much in advance!

May 8, 2019 at 3:42 pm

Hello Mingming,

Yes, you can use the coefficient of variation with negative data. If the mean is negative, you’ll have a negative percentage for your CV, which you can interpret as if it was positive.

I hope this helps!


April 5, 2019 at 2:27 am

Hi Jim, great content. In what sequence should your articles be read for starters, and which are the more helpful articles for understanding VaR (Value at Risk)? Thank you

April 5, 2019 at 4:47 pm

As it is now, the articles can be read in the order that makes sense to you and click the links to additional articles as needed. However, I do plan to write an ebook that serves as an introduction to statistics. This ebook will present content in a logical order and greatly expand the content. I did that with regression by writing an ebook for that analysis, and that has worked really well. So, stay tuned!


March 26, 2019 at 4:43 pm

Hi Jim, I too have only praise for your blog.

I have another question regarding variance. Is there a way to compute the variance from the interquartile range? I am trying to perform a meta-analysis in Comprehensive Meta-Analysis (CMA) with medians and interquartile ranges, but as per the CMA developer, this is only possible if I can enter the median plus the variance. The data I am trying to meta-analyze are by nature very skewed, which is why the mean +/- SD would be biased.

Thanks a lot for your insight.

March 27, 2019 at 9:49 am

Hi Christoph,

Thanks for the kind words! I appreciate it!

To calculate the precise variance, you’d need the raw data. If your data had followed the normal distribution, you could use that to estimate the variance. You could find the normal distribution that produces 50% of the values falling within the interquartile range. Then square the standard deviation of that distribution to obtain the variance. I’ve also read that you can estimate the SD for a normal distribution by taking the IQR/1.35.

However, those approaches don’t work with nonnormal data. In fact, that’s probably why the authors of the original study presented the results in the manner they did. With very skewed data, you’d need to know the distribution to estimate the variance. I don’t know if you have that information. Offhand, I don’t know of another approach for estimating the variance.

I did find the reference to this study which might provide some helpful techniques. I haven’t read the article myself to know. The title doesn’t mention IQR, but perhaps you have the other information?

Hozo SP. et al. Estimating the mean and variance from the median, range, and the size of a sample. BMC Medical Research Methodology 2005, 5:13.


February 3, 2019 at 1:25 am

Hi sir, thanks a lot for your blogs. They are really awesome; I literally have no words to explain how helpful they are. I have one question: we square the differences between the mean and the observation values because the resultant value (the sum of all differences) is zero if we don't square them. Why does this sum equal zero, and what can we interpret from that?

February 4, 2019 at 2:47 pm

Hi Raja, thank you so much! I’m glad to hear that my blog has been helpful!

When you have a symmetric distribution, you’ll have an equal number of values above the mean as below the mean, and at the same distances. So, imagine you have one distribution where you have many observations that are say equally at +10 and -10. And, another distribution where many scores are near +1 and -1 equally. Clearly the first distribution is much more spread out. However, both distributions will sum to approximately zero, and have an average of approximately zero. So, summing the difference doesn’t allow you to differentiate between these distributions. You want the variability score for the first distribution to be larger to accurately reflect the fact that it is more spread out.

That’s why we use the squared differences, because you can add them up without the plusses and minuses cancelling each other out.


January 3, 2019 at 7:53 pm

Hi Jim, First of all, thanks a lot for taking time out to share your Statistical knowledge with the world. I have a question about Variance vs. Standard Deviation. Why do we even have Variance as a measure of dispersion when we know that it gives squared values which are big and we have to use standard deviation as the easy and more interpretative measure of dispersion anyways?

January 4, 2019 at 2:39 pm

That's a very good question! While variance really doesn't mean much to us humans, it turns out that it is important in various statistical tests. ANOVA is, after all, the analysis of variance. The F-test assesses the ratio of variances to determine whether they are equal. Additionally, in linear models, we have the key notion of sums of squares, which is a concept similar to variance (being squared differences from the mean). So, it's a useful measure behind the scenes for statistical tests. However, I can't think of a real-world situation where the actual value of the variance conveys anything meaningful.


August 1, 2018 at 5:01 am

That's wonderful and lucid! I hope it will clarify many of my statistical confusions.


April 10, 2018 at 5:10 am

So beautifully explained! Students all around the world would really benefit from your teachings!

April 12, 2018 at 3:01 pm

Thank you, Kai. That means so much to me!


March 16, 2018 at 12:10 pm

I am a data science student. I have started following your articles. They give me a proper idea of statistics. They're very beneficial to everyone from a non-statistical background who really wants to learn proper statistics.

Thanks and regards, Hiral Godhania

March 16, 2018 at 1:12 pm

Hi Hiral, I’m so happy that you’ve found my articles to be helpful. Thank you so much for taking the time to write such a nice message!


March 7, 2018 at 1:46 pm

First of all, thank you for your time and help in spreading stats in a simple way. My question is: I have a dataset with a mean of 2000 seconds and an sd of 1950 seconds. What should I do when I see such a big sd?

March 7, 2018 at 5:04 pm

Hi Sundar, I’m glad you’ve found my blog to be helpful!

On to your question. Because all of your values are going to be greater than zero, it makes sense to compare the mean and standard deviation. If you could have negative values, then it doesn’t make sense.

You can say that your standard deviation is large compared to the mean. This indicates that while you have an estimate of the central tendency, you really can’t say for any given observation that it is likely to be near the mean. Your data have a lot of variability.

Additionally, I can virtually guarantee you that your data are skewed because you can’t have values less than zero seconds. You tend to get skewed data when you are near a limit. The limit here is zero seconds. And, how near you are to it is defined by the distance between the limit and the central tendency as measured by standard deviations, which is ~1 s.d. in your example. That’s close–so your data are skewed.

You can also think about it in terms of the Empirical Rule. For a bell-shaped curve, you'd expect 95% of the values to fall within +/- 2 standard deviations of the mean. However, that range includes negative values. The values that you'd expect to fall below zero must actually fall above zero. Hence, your distribution is skewed.

You should graph your data! As for what else you should do, it depends on your goal.


March 6, 2018 at 1:18 pm

I saw your post shared on social media by Carmen and noticed you're at PSU! I'm a stats PhD student there. I found your blog interesting and intuitive and wanted to reach out to see if there are any resources you could share to help me improve my written and oral communication skills. I'm TAing for the first time this summer and want my class to be as interesting as your blog.

March 7, 2018 at 4:50 pm

Hi Isaac! First, thanks so much for your kind words about my blog. That means a lot to me!

As for resources, I don’t have any about written and oral communications skills. I wish I had something helpful to point you towards. I’ve been explaining statistics for several decades and that’s helped me refine my approach.

For starters, the fact that you're asking about it indicates that you're already placing a value on clear communication, which is great! I always imagine someone trying to learn this material for the first time. Some of it is very complex. But you can often find a simple way to explain it. When you find ways that work better at communicating a concept than others, make note and use them, always refining along the way.

I’ll often go out of my way to read material that teaches statistics and look for things that are missing or not clear, and it gives me an angle on my own writing. A deep understanding of statistics really helps this process. When I read something where I know a certain aspect is particularly important but perhaps the import isn’t clearly conveyed in the material, it stands out to me. Then, when I write about it, I’ll focus on that aspect more. I really try to hone it into something that a novice can grasp. It’s a process and involves refinement, trying new approaches, and seeing how others approach it (for better and worse).

I'm not at PSU anymore. It's been quite a while. Your class is lucky to have a TA who really values clear communication!


March 3, 2018 at 10:28 pm

Yes it helps a bit. Thank you for a detailed explanation.

March 4, 2018 at 4:15 pm

You’re very welcome! Best of luck with your studies!

March 3, 2018 at 11:46 am

I did not understand why we subtract ‘-1’ from the sample size in the formula for sample variance. Why ‘-1’ ?

March 3, 2018 at 10:07 pm

Statisticians have found that samples tend to underestimate the variance when you simply divide by n. It turns out that the data points in a sample are closer together than they are in the population. Dividing by n-1, rather than n, solves this problem.

Reducing the denominator counteracts the tendency for underestimation. By dividing by a smaller number, the end result is a bit larger.

For example, let’s say that you have sum of squared differences of 100 and a sample size of 10. Dividing by n (10), you obtain a variance of 100/10 = 10. However, if you divide by n-1 (9), you obtain 100/9 = 11.1. Statisticians have found that using n tends to underestimate the variance (a biased estimator in statistical speak). However, n-1 is unbiased.
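A small simulation makes the bias visible. The sketch below repeatedly samples from a standard normal distribution, whose true variance is 1:

```python
import numpy as np

rng = np.random.default_rng(0)

biased, unbiased = [], []
for _ in range(100_000):
    sample = rng.normal(size=10)              # true population variance is 1
    biased.append(np.var(sample))             # divides by n
    unbiased.append(np.var(sample, ddof=1))   # divides by n - 1

print(np.mean(biased))    # ~0.9: dividing by n underestimates on average
print(np.mean(unbiased))  # ~1.0: dividing by n - 1 is unbiased
```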


March 3, 2018 at 3:19 am

Hello Sir, thank you so much for this post. Last year I faced a huge problem in my paper due to my lack of attention to variation measurement. This post has cleared up some of my confusion. Thank you so much. Keep up the good work. Your explanation is easier to understand for a non-statistics/math student like me. I will explore your other blog posts.

March 3, 2018 at 10:08 pm

Hi Mahmudul, you’re very welcome! I’m very glad that my blog posts have helped you. And, thank you for taking the time to write such a nice comment. It means a lot to me!


March 2, 2018 at 3:09 pm

Great, thank you Jim. I'll look forward to that article.


March 2, 2018 at 2:50 pm

As usual Jim, very clear explanation. Thank you!

March 2, 2018 at 2:53 pm

Thanks, Roy!

March 2, 2018 at 11:43 am

Thank you so much for taking the time to post these awesome articles. I have never seen statistics articles as intuitive as the ones you post. Thank you for taking the time to do this.

I have a request: could you please post an article on sample sizes and the details around when we need to take type-I error (alpha) and type-II error (beta) into consideration for both the mean and proportion cases? Please make sure to include formulas as well.

Thank you, Ashwini

March 2, 2018 at 3:04 pm

Hi Ashwini, thanks so much! I strive to make the articles as intuitive as possible.

That sounds like a great idea for an article. For the time being, I’ve written about some of the issues you mention but not all together in one place. Take a look at the following:

Comparing Hypothesis Tests, where I cover various tests, including those for means and proportions, and a little about the differences in required sample sizes.

As for your question about the significance level (alpha), I write about significance levels and p-values . Whenever you perform a hypothesis test, you need to worry about alpha. Beta is important too, but harder to quantify. I need to write about beta specifically at some point!

In regards to your question about sample sizes, sometime in March I’ll write about power and sample size analyses.

I hope this helps for now!


March 2, 2018 at 9:07 am

Hi. You did explain why the (n-1) denominator in the sample variance is used. My question is: why do many social science and education textbooks not use the (n-1) in descriptive statistics calculations?

March 2, 2018 at 10:21 am

Hi Jerry, the sample variance formula is used when using a sample to estimate a population. With descriptive statistics, the goal is not to estimate a population but only to describe the data in hand. In a sense, you’re treating that sample as a population and not using it to estimate one. At least that’s what I’m guessing the textbook’s rationale is based on your description. However, once analysts decide they want to estimate the population parameter, they should use the sample variance equation.


March 2, 2018 at 7:07 am

Thanks, sir... I love your work. I will be waiting for your next article, and what you have written helps me a lot.

March 2, 2018 at 10:12 am

Thank you, Khursheed! I appreciate your kind words! And, I'm glad that they have been helpful for you!


How to Interpret Standard Deviation in a Statistical Data Set


The standard deviation measures how concentrated the data are around the mean; the more concentrated, the smaller the standard deviation.

But in situations where you just observe and record data, a large standard deviation isn’t necessarily a bad thing; it just reflects a large amount of variation in the group that is being studied.

For example, if you look at salaries for everyone in a certain company, including everyone from the student intern to the CEO, the standard deviation may be very large. On the other hand, if you narrow the group down by looking only at the student interns, the standard deviation is smaller, because the individuals within this group have salaries that are similar and less variable. The second data set isn’t better, it’s just less variable.

Similar to the mean, outliers affect the standard deviation (after all, the formula for standard deviation includes the mean). Here’s an example: the salaries of the L.A. Lakers in the 2009–2010 season range from the highest, $23,034,375 (Kobe Bryant) down to $959,111 (Didier Ilunga-Mbenga and Josh Powell). Lots of variation, to be sure!

The standard deviation of the salaries for this team turns out to be $6,567,405; it’s almost as large as the average. However, as you may guess, if you remove Kobe Bryant’s salary from the data set, the standard deviation decreases because the remaining salaries are more concentrated around the mean. The standard deviation becomes $4,671,508.

Here are some properties that can help you when interpreting a standard deviation:

The standard deviation can never be a negative number, due to the way it’s calculated and the fact that it measures a distance (distances are never negative numbers).

The smallest possible value for the standard deviation is 0, and that happens only in contrived situations where every single number in the data set is exactly the same (no deviation).

The standard deviation is affected by outliers (extremely low or extremely high numbers in the data set). That's because the standard deviation is based on the distance from the mean. And remember, the mean is also affected by outliers.

The standard deviation has the same units of measure as the original data. If you're talking about inches, the standard deviation will be in inches.

About This Article

This article is from the book Statistics For Dummies.

About the book author:

Deborah J. Rumsey, PhD, is an Auxiliary Professor and Statistics Education Specialist at The Ohio State University. She is the author of Statistics For Dummies, Statistics II For Dummies, Statistics Workbook For Dummies, and Probability For Dummies.


Statology

6 Examples of Using Standard Deviation in Real Life

The standard deviation is used to measure the spread of values in a dataset.

Individuals and companies use standard deviation all the time in different fields to gain a better understanding of datasets.

The following examples explain how the standard deviation is used in different real life scenarios.

Example 1: Standard Deviation in Weather Forecasting

Standard deviation is widely used in weather forecasting to understand how much variation exists in daily and monthly temperatures in different cities.

For example:

  • A weatherman who works in a city with a small standard deviation in temperatures year-round can confidently predict what the weather will be on a given day since temperatures don’t vary much from one day to the next.
  • A weatherman who works in a city with a high standard deviation in temperatures will be less confident in his predictions because there is much more variation in temperatures from one day to the next.

Example 2: Standard Deviation in Healthcare

Standard deviation is widely used by insurance analysts and actuaries in the healthcare industry.

  • Insurance analysts often calculate the standard deviation of the ages of the individuals they insure so they can understand how much variation exists among their clients' ages.
  • Actuaries calculate standard deviation of healthcare usage so they can know how much variation in usage to expect in a given month, quarter, or year.

Example 3: Standard Deviation in Real Estate

Standard deviation is a metric that is used often by real estate agents.

  • Real estate agents calculate the standard deviation of house prices in a particular area so they can inform their clients of the type of variation in house prices they can expect.
  • Real estate agents also calculate the standard deviation of the square footage of houses in certain areas so they can inform their clients of what type of variation to expect in terms of square footage in a particular area.

Example 4: Standard Deviation in Human Resources

Standard deviation is often used by individuals who work in Human Resource departments at companies.

  • Human Resource managers often calculate the standard deviation of salaries in a certain field so that they can know what type of variation in salaries to offer to new employees.

Example 5: Standard Deviation in Marketing

Standard deviation is often used by marketers to gain an understanding of how their advertisements perform.

  • Marketers often calculate the standard deviation of revenue earned per advertisement so they can understand how much variation to expect in revenue from a given ad.
  • Marketers also calculate the standard deviation of the number of ads used by competitors to understand whether competitors are running more or fewer ads than normal during a given period.

Example 6: Standard Deviation in Test Scores

Standard deviation is used by professors at universities to calculate the spread of test scores among students.

  • Professors can calculate the standard deviation of test scores on a final exam to better understand whether most students score close to the average or if there is a wide spread in test scores.
  • Professors can also calculate the standard deviation of test scores for multiple classes to understand which classes had the highest variation in test scores among students.

Additional Resources

The following tutorials offer more details on how standard deviation is used in real life.

  • Why is Standard Deviation Important? (Explanation + Examples)
  • What is Considered a Good Standard Deviation?
  • Range vs. Standard Deviation: When to Use Each


Hey there. My name is Zach Bobbitt. I have a Masters of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike.  My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.

One Reply to “6 Examples of Using Standard Deviation in Real Life”

How is standard deviation used in chemistry?

Standard Deviation Formula and Uses vs. Variance


Amanda Bellucco-Chatham is an editor, writer, and fact-checker with years of experience researching personal finance topics. Specialties include general financial planning, career development, lending, retirement, tax preparation, and credit.

Standard deviation is a statistical measurement that looks at how far individual points in a dataset are dispersed from the mean of that set. If data points are further from the mean, there is a higher deviation within the data set. It is calculated as the square root of the variance.

Key Takeaways:

  • Standard deviation measures the dispersion of a dataset relative to its mean.
  • It is calculated as the square root of the variance.
  • Standard deviation, in finance, is often used as a measure of the relative riskiness of an asset.
  • A volatile stock has a high standard deviation, while the deviation of a stable blue-chip stock is usually rather low.
  • Standard deviation is also used by businesses to assess risk, manage business operations, and plan cash flows based on seasonal changes and volatility.

How Standard Deviation Works

Standard deviation is a statistical measurement that is often used in finance, particularly in investing. When applied to the annual rate of return of an investment, it can provide information on that investment's historical volatility. This means that it shows how much the price of that investment has fluctuated over time.

The greater the standard deviation of a security, the greater the variance between each price and the mean, which shows a larger price range. For example, a volatile stock has a high standard deviation, meaning that its price goes up and down frequently. The standard deviation of a stable blue-chip stock, on the other hand, is usually rather low, meaning that its price is usually stable.

Standard deviation can also be used to predict performance trends. In investing, for example, an index fund is designed to replicate a benchmark index. This means that the fund will have a low standard deviation from the value of the benchmark.

On the other hand, aggressive growth funds often have a high standard deviation from relative stock indices. This is because their portfolio managers make aggressive bets to generate higher-than-average returns. This higher standard deviation correlates with the level of risk investors can expect from that index.

Standard deviation is one of the key fundamental risk measures that analysts, portfolio managers, and advisors use. Investment firms report the standard deviation of their mutual funds and other products. A large dispersion shows how much the return on the fund is deviating from the expected normal returns. Because it is easy to understand, this statistic is regularly reported to the end clients and investors.

Standard deviation calculates all uncertainty as risk, even when it’s in the investor's favor—such as above-average returns.

Standard Deviation Formula

Standard deviation is calculated by taking the square root of a value derived from comparing data points to a collective mean of a population. The formula is:

$$\text{Standard Deviation} = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \overline{x}\right)^2}{n-1}}$$

where:
x_i = the value of the i-th point in the data set
x̄ = the mean value of the data set
n = the number of data points in the data set

Calculating Standard Deviation

Standard deviation is calculated as follows:

  • Calculate the mean of all data points. The mean is calculated by adding all the data points and dividing by the number of data points.
  • Calculate the deviation of each data point by subtracting the mean from the value of the data point.
  • Square the deviation of each data point (from Step 2).
  • Sum the squared deviations (from Step 3).
  • Divide the sum of squared deviations (from Step 4) by the number of data points in the data set less 1.
  • Take the square root of the quotient (from Step 5). The sketch below walks through these steps in code.
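As a sanity check, NumPy reproduces these steps in one call when ddof=1; the data values below are arbitrary:

```python
import numpy as np

data = np.array([4.0, 8.0, 6.0, 5.0, 3.0, 7.0])

# ddof=1 applies the n - 1 denominator from Step 5; ddof=0 would divide by n.
print(np.std(data, ddof=1))  # ~1.87
```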

Key Properties of Standard Deviation

One key property of standard deviation is how it combines. For independent random variables, variances add: the variance of a sum equals the sum of the individual variances, so the standard deviation of the sum is the square root of that total. This lets analysts and researchers reason about aggregated quantities from the variability of their components, rather than drawing conclusions from single points of data, which leads to a higher degree of accuracy.
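A quick simulation illustrates this combining rule; this is a sketch, assuming two independent normal variables:

```python
import math
import random
import statistics

random.seed(0)
x = [random.gauss(0, 3) for _ in range(100_000)]  # SD ≈ 3
y = [random.gauss(0, 4) for _ in range(100_000)]  # SD ≈ 4, independent of x
z = [a + b for a, b in zip(x, y)]

print(statistics.stdev(z))         # ≈ 5 (simulation noise aside)
print(math.sqrt(3 ** 2 + 4 ** 2))  # sqrt(9 + 16) = 5
```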

Another property of standard deviation is how it responds to scale: multiplying every value by a constant multiplies the standard deviation by the absolute value of that constant. Standard deviation is therefore expressed in the units of the data, so datasets with different units of measurement cannot be compared directly. For example, if one dataset is measured in inches and another in centimeters, either convert to a common unit or compare the unit-free coefficient of variation (standard deviation divided by the mean).
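The following sketch shows both points: scaling the data scales the standard deviation, while the coefficient of variation stays the same (up to floating-point rounding):

```python
import statistics

inches = [10.0, 12.0, 11.0, 13.0]
cm = [x * 2.54 for x in inches]  # same measurements, different unit

print(statistics.stdev(cm) / statistics.stdev(inches))  # 2.54: SD scales with the data

cv_in = statistics.stdev(inches) / statistics.mean(inches)
cv_cm = statistics.stdev(cm) / statistics.mean(cm)
print(abs(cv_in - cv_cm) < 1e-12)  # True: coefficient of variation is unit-free
```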

Last, standard deviation treats deviations symmetrically and is never negative. Because each deviation is squared, values above the mean and values below the mean contribute in the same way, and the deviations themselves always sum to zero around the mean. Standard deviation is zero only when every value is identical, and positive otherwise, which makes it readily comparable across data sets.

Variance and standard deviation are related statistics. Variance is derived by taking the mean of the data points, subtracting the mean from each data point individually, squaring each of these results, and then taking another mean of these squares. Standard deviation is the square root of the variance.

Variance helps determine the data's spread size when compared to the mean value. As the variance gets bigger, more variation in data values occurs, and there may be a larger gap between one data value and another. If the data values are all close together, the variance will be smaller. However, this is more difficult to grasp than the standard deviation because variances represent a squared result that may not be meaningfully expressed on the same graph as the original dataset.

Standard deviations are usually easier to picture and apply. The standard deviation is expressed in the same unit of measurement as the data, which isn't necessarily the case with the variance. Using the standard deviation, statisticians may determine if the data has a normal curve or other mathematical relationship.

If the data follows a normal curve, then about 68% of the data points will fall within one standard deviation of the average, or mean, data point. Larger standard deviations mean more data points fall farther from the mean; smaller ones mean more data is close to average.
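The 68% figure can be checked with Python's standard library; a short sketch:

```python
from statistics import NormalDist

# Probability that a normally distributed value lands within 1 SD of the mean
within_one_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(round(within_one_sd, 4))  # 0.6827, i.e., about 68%
```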

The standard deviation is graphically depicted as a bell curve's width around the mean of a data set. The wider the curve, the larger a data set's standard deviation from the mean.

How Standard Deviation Is Used in Business

Standard deviation isn't only used in investing. Business analysts or companies can use standard deviation in a variety of ways to assess risk, make predictions, and manage company operations.

Risk Management

Standard deviation is widely used in business for risk management. It helps businesses quantify and manage various types of risk. By calculating the standard deviation of certain outcomes, businesses can assess the volatility or uncertainty associated with how they operate. For example, a company can use standard deviation to measure the risk of different products being returned.

Financial Analysis

In finance and accounting, standard deviation is used to analyze financial data and assess the variability of financial performance metrics. For example, standard deviation is employed to measure the volatility of investment returns. This can be used to determine risk-return tradeoffs and how a company wants to deploy capital.

Forecasting

Standard deviation is used in sales forecasting to assess the variability of sales data and predict future sales trends. Standard deviation helps businesses identify seasonality, trends, and patterns in sales data that allow them to plan for cash needs in the near future.

Quality Control

In manufacturing and operations management, standard deviation is used to monitor and improve product quality. Standard deviation is also used in quality control processes such as Six Sigma methodologies to measure process capability, reduce defects, and optimize manufacturing processes for improved quality and customer satisfaction.

Project Management

Standard deviation is used in project management to assess project performance and manage risk. For example, it can be applied to critical path analysis and earned value management: it can be used to gauge variances, track progress, and quantify the risk of a critical path or earned value target not being achieved.

Strengths and Limitations of Standard Deviation

Like any statistical measurement for analyzing data, standard deviation has both strengths and limitations that should be considered before it is used.

Strengths

  • Commonly used: Standard deviation is a commonly used measure of dispersion. Many analysts are more familiar with standard deviation than with other statistical measures of deviation. For this reason, standard deviation is used across a variety of professions, from investors to actuaries.
  • Includes all data points: Standard deviation is all-inclusive of observations. Each data point is included in the analysis. Other measures of deviation, such as range, consider only the most dispersed points without regard for the points in between. For this reason, standard deviation is often considered a more robust, accurate measurement than those alternatives.
  • Can combine datasets: The standard deviations of two data sets can be combined using a specific combined standard deviation formula (see the sketch after this list). There are no similar formulas for most other dispersion measures in statistics.
  • Further computational uses: Unlike some other measures of dispersion, the standard deviation can be used in further algebraic computations, meaning there's some versatility to standard deviation.
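One common way to combine two summarized samples is shown below; this is a sketch, assuming each sample is described by its size, mean, and sample standard deviation:

```python
import math
import statistics

def combined_std_dev(n1, mean1, sd1, n2, mean2, sd2):
    """Sample SD of the pooled data, given summary statistics of two samples."""
    n = n1 + n2
    grand_mean = (n1 * mean1 + n2 * mean2) / n
    # Total sum of squared deviations about the grand mean:
    # within-sample pieces plus between-sample shift pieces.
    ss = ((n1 - 1) * sd1 ** 2 + n1 * (mean1 - grand_mean) ** 2
          + (n2 - 1) * sd2 ** 2 + n2 * (mean2 - grand_mean) ** 2)
    return math.sqrt(ss / (n - 1))

# Check against the SD of the concatenated data computed directly:
a, b = [1, 2, 3, 4], [10, 12, 14]
print(combined_std_dev(len(a), statistics.mean(a), statistics.stdev(a),
                       len(b), statistics.mean(b), statistics.stdev(b)))
print(statistics.stdev(a + b))  # same value, ≈ 5.287
```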

Limitations

  • Doesn't measure dispersion directly: The standard deviation does not directly measure how far each data point is from the mean. Instead, it works with the squares of the differences, a subtle but notable distinction from the actual average distance from the mean.
  • Impact of outliers: Outliers have a heavier impact on standard deviation. Because each difference from the mean is squared, an extreme value contributes a disproportionately large quantity compared to other data points. Therefore, be mindful that standard deviation naturally gives more weight to extreme values.
  • Difficult to calculate manually: As opposed to other measures of dispersion, such as range (the highest value minus the lowest value), standard deviation requires several cumbersome steps and is more prone to computational errors. This hurdle can be circumvented through the use of software, such as a spreadsheet or a Bloomberg terminal.

Excel can be used to calculate standard deviation. After entering your data, use the STDEV.S formula for a sample of numeric data, or STDEVA when you want to include text or logical values. There are also specific formulas, such as STDEV.P, to calculate the standard deviation for an entire population.
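Outside of Excel, Python's standard library offers the same split between sample and population formulas; a minimal sketch:

```python
import statistics

data = [5, 7, 3, 7]

print(statistics.stdev(data))   # sample SD (divides by n - 1), analogous to STDEV.S
print(statistics.pstdev(data))  # population SD (divides by n), analogous to STDEV.P
```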

Examples of Standard Deviation

If you have the data points 5, 7, 3, and 7 and want to find the standard deviation, start by adding them together:

5 + 7 + 3 + 7 = 22

Find the mean of the dataset by dividing the total by the number of data points (in this case, 4).

22 / 4 = 5.5

This gives you x̄ = 5.5 and N = 4.

To find the variance, subtract the mean value from each data point, then square each of those values:

5 - 5.5 = -0.5; (-0.5) x (-0.5) = 0.25
7 - 5.5 = 1.5; 1.5 x 1.5 = 2.25
3 - 5.5 = -2.5; (-2.5) x (-2.5) = 6.25
7 - 5.5 = 1.5; 1.5 x 1.5 = 2.25

Add the squared values, then divide the result by N-1 to give the variance.

(0.25 + 2.25 + 6.25 + 2.25) / (4-1) = 3.67

Take the square root of 3.67 to find the standard deviation, which is approximately 1.915.

Or consider shares of Apple (AAPL) over five years. Historical returns for Apple’s stock were 88.97% for 2019, 82.31% for 2020, 34.65% for 2021, -26.41% for 2022 and 28.32% in April 2023. The average return over the five years was thus 41.57%.

The value of each year's return minus the mean was then 47.40%, 40.74%, -6.92%, -67.98%, and -13.25%, respectively. Squaring those values (as decimals) yields 0.2247, 0.1660, 0.0048, 0.4621, and 0.0176. The sum of these values is 0.8752. Divide that value by 4 (N minus 1) to get the variance: 0.8752 / 4 = 0.2188.

The square root of the variance is taken to obtain the standard deviation of about 0.4677, or 46.77%.
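These figures can be reproduced in a few lines of Python; a sketch of the same calculation:

```python
import math

returns = [0.8897, 0.8231, 0.3465, -0.2641, 0.2832]  # AAPL annual returns as decimals

mean = sum(returns) / len(returns)                 # ≈ 0.4157
squared_devs = [(r - mean) ** 2 for r in returns]
variance = sum(squared_devs) / (len(returns) - 1)  # divide by N - 1: ≈ 0.2188
print(math.sqrt(variance))                         # ≈ 0.4677, i.e., about 46.77%
```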

What Does a High Standard Deviation Mean?

A large standard deviation indicates that there is a lot of variance in the observed data around the mean. This indicates that the data observed is quite spread out. A small or low standard deviation would indicate instead that much of the data observed is clustered tightly around the mean.

What Does Standard Deviation Tell You?

Standard deviation describes how dispersed a set of data is. It compares each data point to the mean of all data points, and standard deviation returns a calculated value that describes whether the data points are in close proximity or whether they are spread out. In a normal distribution, standard deviation tells you how far values are from the mean.

How Do You Find the Standard Deviation Quickly?

If you look at the distribution of some observed data visually, you can see if the shape is relatively skinny vs. fat. Fatter distributions have bigger standard deviations. Alternatively, Excel has built-in standard deviation functions depending on the data set.

Is Lower Standard Deviation Better In Investing?

A lower standard deviation isn't necessarily better. A higher standard deviation indicates more risk, which investors may or may not prefer. When assessing the amount of deviation in their portfolios, investors should consider their tolerance for volatility and their overall investment objectives. More aggressive investors may be comfortable with an investment strategy that opts for vehicles with higher-than-average volatility, while more conservative investors may not.

Standard deviation is a way to assess risk, especially in business and investing. It uses the distance of points in a dataset from the mean of that dataset to find how dispersed the set is, and thus, how volatile it tends to be over time.

Investors can use standard deviation to determine how stable or predictable an investment is likely to be. Businesses use standard deviation to assess risk, manage operations, and plan cash flows. Like any other statistical measurement, standard deviation has strengths and limitations, which should be taken into account when it is used.

Netcials. "Apple Inc (AAPL) Stock 5 Years History."


Examples of Standard Deviation and How It’s Used


Standard deviation is a statistical measurement of the amount a number varies from the average number in a series. A low standard deviation means that the data points are closely clustered around the average, so the average is highly representative. A high standard deviation means that there is a large variance between the data and the statistical average, so the average is not as reliable. Keep reading for standard deviation examples and the different ways it appears in daily life.

Calculating Standard Deviation

Standard deviation measures how far results spread from the average value. You can find the standard deviation by first squaring the differences from the mean, averaging those squares to get the variance, and then taking the square root of the variance. If you're wondering, "What is the formula for standard deviation?" it looks like this:

\[
\sigma = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n}}
\]

(For a sample, divide by n - 1 instead of n.)

In order to determine standard deviation:

  • Determine the mean (the average of all the numbers) by adding up all the data pieces (xi) and dividing by the number of pieces of data (n).
  • Subtract the mean (x̄) from each value.
  • Square each of those differences.
  • Determine the average of the squared numbers calculated in step 3 to find the variance. (For samples, subtract 1 from the total number of values when finding the average.)
  • Find the square root of the variance. That's the standard deviation!

For example: Take the values 2, 1, 3, 2 and 4.

1. Determine the mean (average):

2 + 1 + 3 + 2 + 4 = 12
12 ÷ 5 = 2.4 (mean)

2. Subtract the mean from each value:

2 - 2.4 = -0.4
1 - 2.4 = -1.4
3 - 2.4 = 0.6
2 - 2.4 = -0.4
4 - 2.4 = 1.6

3. Square each of those differences:

-0.4 x -0.4 = 0.16
-1.4 x -1.4 = 1.96
0.6 x 0.6 = 0.36
-0.4 x -0.4 = 0.16
1.6 x 1.6 = 2.56

4. Determine the average of those squared numbers to get the variance.

0.16 + 1.96 + 0.36 + 0.16 + 2.56 = 5.2
5.2 ÷ 5 = 1.04 (variance)

5. Find the square root of the variance.

Square root of 1.04 ≈ 1.02

The standard deviation of the values 2, 1, 3, 2 and 4 is about 1.02.
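If NumPy is available, the same population calculation is a single call; a sketch:

```python
import numpy as np

values = [2, 1, 3, 2, 4]
print(np.std(values))          # population SD (ddof=0): ≈ 1.02
print(np.std(values, ddof=1))  # sample SD (ddof=1): ≈ 1.14
```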

Examples of Standard Deviation

Unless you’re sitting in a statistics class, you may think that standard deviation doesn’t affect your everyday life. But you’d be wrong! Even though most statisticians calculate standard deviation with computer programs and spreadsheets, it’s helpful to know how to do it by hand.

Here are some examples of situations that demonstrate how standard deviation is used.

Grading Tests

A class of students took a math test. Their teacher wants to know whether most students are performing at the same level, or if there is a high standard deviation.

1. The scores for the test were 85, 86, 100, 76, 81, 93, 84, 99, 71, 69, 93, 85, 81, 87, and 89. When the teacher adds them together, she gets 1279. She divides by the number of scores (15) to get the mean score.

1279 ÷ 15 = 85.27, which the teacher rounds to 85.2 (mean)

2. 85.2 is a high score, but is everyone performing at that level? To find out, the teacher subtracts the mean from every test score.

85 - 85.2 = -0.2
86 - 85.2 = 0.8
100 - 85.2 = 14.8
76 - 85.2 = -9.2
81 - 85.2 = -4.2
93 - 85.2 = 7.8
84 - 85.2 = -1.2
99 - 85.2 = 13.8
71 - 85.2 = -14.2
69 - 85.2 = -16.2
93 - 85.2 = 7.8
85 - 85.2 = -0.2
81 - 85.2 = -4.2
87 - 85.2 = 1.8
89 - 85.2 = 3.8

3. She squares each difference:

-0.2 x -0.2 = 0.04
0.8 x 0.8 = 0.64
14.8 x 14.8 = 219.04
-9.2 x -9.2 = 84.64
-4.2 x -4.2 = 17.64
7.8 x 7.8 = 60.84
-1.2 x -1.2 = 1.44
13.8 x 13.8 = 190.44
-14.2 x -14.2 = 201.64
-16.2 x -16.2 = 262.44
7.8 x 7.8 = 60.84
-0.2 x -0.2 = 0.04
-4.2 x -4.2 = 17.64
1.8 x 1.8 = 3.24
3.8 x 3.8 = 14.44

4. The teacher finds the variance, which is the average of the squares:

0.04 + 0.64 + 219.04 + 84.64 + 17.64 + 60.84 + 1.44 + 190.44 + 201.64 + 262.44 + 60.84 + 0.04 + 17.64 + 3.24 + 14.44 = 1,135
1,135 ÷ 15 ≈ 75.7 (variance)

5. Last, the teacher finds the square root of the variance:

Square root of 75.7 ≈ 8.7 (standard deviation)

The standard deviation of these tests is 8.7 points out of 100. Since the standard deviation is fairly low relative to the 100-point scale, the teacher knows that most students are performing around the same level.

Results of a Survey

A market researcher is analyzing the results of a recent customer survey that ranks a product from 1 to 10. He wants to have some measure of the reliability of the answers received in the survey in order to predict how a larger group of people might answer the same questions.

Because this is a sample, the researcher needs to subtract 1 from the total number of values in step 4.

  • The scores for the survey are 9, 7, 10, 8, 9, 7, 8, and 9. The mean is 8.4.
  • The researcher subtracts the mean from every score (differences: 0.6, -1.4, 1.6, -0.4, 0.6, -1.4, -0.4, 0.6).
  • He squares each number (0.36, 1.96, 2.56, 0.16, 0.36, 1.96, 0.16, 0.36).
  • Because this is a sample of responses, the researcher subtracts one from the number of values (8 values - 1 = 7) when averaging the squares to find the variance: 1.12 (variance)
  • Last, the researcher finds the square root of the variance: 1.06 (standard deviation)

The standard deviation is 1.06, which is somewhat low. The researcher now knows that the results of the sample size are probably reliable.

Weather Forecasting

You can also use standard deviation to compare two sets of data. For example, a weather reporter is analyzing the high temperature forecasted for two different cities. A low standard deviation would show a reliable weather forecast.

  • The reporter compares a week of high temperatures (in Fahrenheit) in two different seasons. The data looks like this:

Day        City A    City B
Monday     95        90
Tuesday    93        81
Wednesday  95        95
Thursday   94        91
Friday     96        86
Saturday   94        82
Sunday     95        78

The mean temperature for City A is 94.6 degrees, and the mean for City B is 86.1 degrees.

  • The reporter subtracts City A’s mean from every City A temperature (differences: 0.4, -1.6, 0.4, -0.6, 1.4, -0.6, 0.4).
  • He squares each number (0.16, 2.56, 0.16, 0.36, 1.96, 0.36, 0.16).
  • He averages the squares and finds the variance: 0.8 (variance)
  • The reporter finds the square root of the variance for City A: 0.89 (standard deviation) .
  • Next, the reporter subtracts City B’s mean from its temperatures (3.9, -5.1, 8.9, 4.9, -0.1, -4.1, -8.1)
  • He squares them (15.21, 26.01, 79.21, 24.01, 0.01, 16.81, 65.61).
  • The average of these squares is 32.41 (variance) .
  • The square root is 5.7 (standard deviation) .
  • City A’s standard deviation is 0.89 degrees , while City B’s standard deviation is 5.7 degrees . City A’s forecasts are more reliable than City B’s forecasts.

Now you see how standard deviation works. Even if you usually perform standard deviation equations on a calculator or spreadsheet formula, it’s good to see how the math works step by step.

Playing the Odds of Mathematics

Standard deviation is an important part of any statistical analysis. But it’s just one part of a wider study that includes probability exercises as well. Check out these examples of probability to further increase your mathematical understanding. You can also apply standard deviation to these random sampling exercises .

Open access | Published: 16 September 2024

Blood metabolites, neurocognition and psychiatric disorders: a Mendelian randomization analysis to investigate causal pathways

Jing Guo (ORCID: 0000-0003-3662-7254), Ping Yang, Jia-Hao Wang, Shi-Hao Tang, Ji-Zhou Han, Shi Yao (ORCID: 0000-0002-4574-8106), Cong-Cong Liu, Shan-Shan Dong, Kun Zhang, Yuan-Yuan Duan, Tie-Lin Yang & Yan Guo (ORCID: 0000-0002-7364-2392)

Translational Psychiatry volume 14, Article number: 376 (2024)

Subjects: Neuroscience, Psychiatric disorders

Neurocognitive dysfunction is observationally associated with the risk of psychiatric disorders. Blood metabolites, which are readily accessible, may become highly promising biomarkers for brain disorders. However, the causal role of blood metabolites in neurocognitive function, and the biological pathways underlying their association with psychiatric disorders remain unclear.

To explore their putative causalities, we conducted bidirectional two-sample Mendelian randomization (MR) using genetic variants associated with 317 human blood metabolites ( n max  = 215,551), g-Factor (an integrated index of multiple neurocognitive tests with n max  = 332,050), and 10 different psychiatric disorders ( n  = 9,725 to 807,553) from the large-scale genome-wide association studies of European ancestry. Mediation analysis was used to assess the potential causal pathway among the candidate metabolite, neurocognitive trait and corresponding psychiatric disorder.

MR evidence indicated that genetically predicted acetylornithine was positively associated with g-Factor (0.035 standard deviation units increase in g-Factor per one standard deviation increase in acetylornithine level; 95% confidence interval, 0.021 to 0.049; P  = 1.15 × 10 −6 ). Genetically predicted butyrylcarnitine was negatively associated with g-Factor (0.028 standard deviation units decrease in g-Factor per one standard deviation increase in genetically proxied butyrylcarnitine; 95% confidence interval, −0.041 to −0.015; P  = 1.31 × 10 −5 ). There was no evidence of associations between genetically proxied g-Factor and metabolites. Furthermore, the mediation analysis via two-step MR revealed that the causal pathway from acetylornithine to bipolar disorder was partly mediated by g-Factor, with a mediated proportion of 37.1%. Besides, g-Factor mediated the causal pathway from butyrylcarnitine to schizophrenia, with a mediated proportion of 37.5%. Other neurocognitive traits from different sources provided consistent findings.

Our results provide genetic evidence that acetylornithine protects against bipolar disorder through neurocognitive abilities, while butyrylcarnitine has an adverse effect on schizophrenia through neurocognition. These findings may provide insight into interventions at the metabolic level for risk of neurocognitive and related disorders.


Introduction

Although neurocognitive abilities are considered indispensable in the assessment of psychiatric disorders [ 1 , 2 ], the pathogenesis underlying this relationship has not been well established. Observational clinical evidence, for example, has suggested that neurocognitive impairment occurs prior to the onset of schizophrenia and exacerbates following the episode [ 3 , 4 ]. Neurocognitive deficits may be inherent to psychiatric disorders, independent of other psychotic symptom domains [ 5 ]. Medical guidelines have recommended cognitive remediation as a therapeutic strategy for psychiatric patients [ 6 ]. The effectiveness of antipsychotics in improving neurocognition in patients with psychiatric disorders is controversial. Some studies show that taking antipsychotics such as clozapine, olanzapine, and aripiprazole significantly improves cognitive performance in psychotic patients [ 7 , 8 ], but not all antipsychotics have a uniformly positive cognitive profile [ 9 , 10 ]. This inconsistency in neurocognition is likely due to the varying degrees of metabolic discrepancies induced by antipsychotics [ 11 ]. For instance, antipsychotic medications have been associated with disrupted lipid metabolism [ 12 , 13 ], with concentrations of these lipid metabolites shown to correlate with cognitive functions such as verbal memory and processing speed [ 14 ]. The use of anticholinergic medication may adversely affect cognitive performance in patients with schizophrenia [ 15 ]. Antipsychotic medications can have a negative impact on cognitive processes by increasing the occupancy of dopamine D2 receptors [ 16 ]. This implies that metabolites could serve as potentially modifiable therapeutic targets whose regulation may yield better clinical effects.

Metabolites are promising biomarkers reflecting biological and physiological processes [ 17 ]. Several studies have shown that metabolic abnormalities may worsen cognitive impairments in both the general population and individuals with psychiatric disorders [ 18 , 19 , 20 , 21 , 22 , 23 , 24 ]. For example, in untargeted metabolomics research with elderly subjects, β-cryptoxanthin plasma levels were associated with improved cognitive function, while N-acetylisoleucine and tyramine O-sulfate concentrations were linked to poorer cognitive function [ 19 ]. Metabolic syndrome, characterized by abnormal serum glucose and dyslipidemia, negatively impacts memory and executive function [ 21 ]. Docosahexaenoic acid plays a vital role in brain development and cognitive function from pregnancy to childhood [ 25 ]. Elevated kynurenine levels in brain parenchyma caused by peripheral inflammation are associated with depression and schizophrenia risk [ 26 ]. Sarcosine supplementation to antipsychotics can improve cognitive symptoms in patients with schizophrenia [ 27 ]. High triglyceride levels in female patients with major depressive disorder may lead to decreased neurocognitive functions in terms of memory, language, and attention [ 28 , 29 ]. Some randomized controlled trials (RCTs) support the impact of N-acetylcysteine [ 30 ], vitamin D3 [ 31 ], folic acid [ 32 , 33 ], choline and betaine [ 34 ] on cognitive function in individuals with psychiatric disorders. However, the causal relationships between metabolites and neurocognitive function remain unclear, and the pathways involved in such effects during pathology require further investigation.

Mendelian Randomization (MR) is an alternative method that uses genetic variants robustly associated with the exposure as instrumental variables to uncover the potential causal effect of an exposure on an outcome [ 35 ]. While RCTs are still the gold standard for causal inference when properly designed, MR methods can use observational data to provide causal estimates given that certain assumptions are met. Additionally, MR methods offer advantages in terms of sample size, study duration, and economic costs. With the accessibility of data from large-scale genome-wide association studies (GWASs), there is an opportunity to explore causal associations between human blood metabolites, neurocognitive traits, and psychiatric disorders using two-sample MR studies [ 36 ].

In this study, we collected human blood metabolites with the benefits of small-molecule permeability, heritability and detectability. Given the high correlation among diverse neurocognitive domains, a general cognitive factor score (g-Factor), also known as a general intelligence, can be obtained statistically by modeling multiple neurocognitive specific tests [ 37 ]. Using bidirectional two-sample MR analysis, we evaluated which of the 317 human blood metabolites had a putative causal relationship with g-Factor. Furthermore, we performed mediation analysis to investigate the causal pathways mediated by g-Factor from risk metabolite to 10 different psychiatric disorders, such as schizophrenia, bipolar disorder, anorexia nervosa, attention deficit hyperactivity disorder (ADHD), major depressive disorder (MDD), autism spectrum disorder (ASD), posttraumatic stress disorder (PTSD), anxiety, obsessive–compulsive disorder (OCD) and Tourette syndrome. We also used cognitive traits from other sources to validate the putative causal pathways. Our findings may provide new insights into the prediction or improvement of neurocognitive decline in psychiatric disorders through the regulation of endogenous metabolites.

Materials and methods

GWAS data sources

Blood metabolites

Given that the study was based on summary-level data, we selected the human blood metabolites from the publicly available GWAS summary statistics [ 38 , 39 , 40 ]. Among these, the GWAS results with the largest sample size to date, including 174 metabolites and ranging from 8,569 to 86,507 individuals, were published by Lotta et al. [ 38 ]. In addition, Klarin et al. [ 40 ] provided data on four lipid cholesterol classes, derived from more than 200,000 participants in the US Million Veteran Program (MVP) database. For lipid metabolism, we supplemented the study with five selected data points from Kettunen et al. [ 39 ], with sample sizes ranging from 13,476 to 24,871. To enrich the dataset, we included 134 metabolites from the smaller-scaled metabolic data published by Shin et al. [ 41 ], with sample sizes ranging from 1,163 to 7,822. In total, 317 metabolites were collected in the study, which can be classified into six super-pathway-based categories: amino acids (67), lipids (208), carbohydrates (14), cofactors and vitamins (11), energy metabolites (6), and nucleotides (11). Samples for all metabolic data were exclusively of European ancestry, and detailed information can be found in Supplementary Table 1 .

Neurocognitive traits

The most recent publicly available GWAS summary statistics for neurocognitive traits and psychiatric disorders were obtained from individuals of European ancestry (Supplementary Table 2 ). A total of six neurocognitive traits were analyzed in this study. The data set included the following traits: g-Factor ( n  = 332,050) [ 42 ], intelligence ( n  = 269,867) [ 43 ], cognitive performance ( n  = 257,828) [ 44 ], general cognitive function ( n  = 282,014) [ 45 ], reaction time ( n  = 330,069) [ 45 ], and verbal numerical reasoning ( n  = 168,033) [ 45 ]. Among these traits, the g-Factor was the most discriminating. It represents a unified phenotype resulting from the integration of multiple neurocognitive tests [ 37 ] and is characterized by the largest sample size. Consequently, we selected g-Factor as the representative neurocognitive trait for our study, while utilizing the other neurocognitive traits from different sample sources to validate our findings. It is crucial to highlight that no instances of overlapping samples were observed between any metabolite and neurocognitive trait in our analysis.

We considered ten of the most prevalent psychiatric disorders known to impact cognitive function. These disorders include schizophrenia [ 46 ] ( n  = 130,644), bipolar disorder [ 47 ] ( n  = 413,466), anorexia nervosa [ 48 ] ( n  = 72,517), ADHD [ 49 ] ( n  = 53,293), MDD [ 50 ] ( n  = 807,553), ASD [ 51 ] ( n  = 46,350), PTSD [ 52 ] ( n  = 146,660), anxiety [ 53 ] ( n  = 17,310), OCD [ 54 ] ( n  = 9,725) and Tourette syndrome [ 55 ] ( n  = 14,307). To ensure that no overlap existed between the neurocognitive traits and psychiatric disorders within our sample participants, we employed external GWAS statistics for bipolar disorder [ 47 ] ( n  = 353,899), anorexia nervosa [ 48 ] ( n  = 68,684), and MDD [ 56 ] ( n  = 142,646) that did not include individuals from UK Biobank. Detailed information and the specific release link are provided in Supplementary Table 2 .

Overall study design

The overall study workflow is depicted in Fig. 1 . Prior to conducting MR analysis, we first utilized genetic correlation analysis to identify blood metabolites that are genetically associated with g-Factor. Subsequently, we performed bidirectional two-sample MR analyses between these blood metabolites and g-Factor to estimate potential causal relationships. Concurrently, we conducted genetic correlation and MR analyses between g-Factor and psychiatric disorders to gather genetic evidence of neurocognitive associations with these psychiatric disorders. In the final stage, we employed mediation analysis to explore potential causal pathways connecting the identified metabolites, g-Factor, and psychiatric disorders. To validate the putative causal pathways, we also incorporated other neurocognitive traits, including intelligence, cognitive performance, general cognitive function, reaction time, and verbal numerical reasoning. The additional traits were employed to further substantiate the observed associations and provide further corroboration for the putative causal relationships.

Figure 1: Workflow of overall study design.

Genetic correlation

The LD-score regression software ( https://github.com/bulik/ldsc ) was employed to calculate the genetic correlation with the default parameters [ 57 ]. The reference variants were taken from the HapMap3 dataset, excluding the major histocompatibility complex regions ( https://ibg.colorado.edu/cdrom2021/Day06-nivard/GenomicSEM_practical/eur_w_ld_chr/w_hm3.snplist ). Precalculated LD scores were based on the 1000 Genomes (1KG) European reference panel ( https://ibg.colorado.edu/cdrom2021/Day06-nivard/GenomicSEM_practical/eur_w_ld_chr/ ).

Two-sample MR analysis

Selection of instrumental variants (IVs)

The MR analysis was performed in accordance with the previously described procedure [ 58 ], and in strict adherence to the STROBE-MR checklist [ 59 ]. In brief, SNPs with MAF > 0.01 and P value < 5 × 10 −8 were selected from the GWAS datasets. The palindromic SNPs were removed according to the default parameters of the "harmonise_data" function in the TwoSampleMR R package (version 0.4.26, https://mrcieu.github.io/TwoSampleMR/ ) [ 60 ]. SNPs in the long-range LD regions ( https://genome.sph.umich.edu/wiki/Regions_of_high_linkage_disequilibrium_ (LD)#cite_note-3) were removed [ 61 ]. We then used data from 315,147 European UK Biobank participants as the LD reference genome to clump conditionally independent SNPs using PLINK software [ 62 ] (r 2  = 0.001, window size = 1 Mb and p- value = 5 × 10 −8 ). After obtaining SNPs independently associated with the exposure, the selection of IVs was also subject to the following conditions: (i) no correlation with the outcome except through the exposure; (ii) if a SNP was not present in the outcome data, a highly correlated proxy SNP (r 2  > 0.8) could be selected as a replacement; and (iii) SNPs associated with confounders were removed.

Removing confounders

We considered alcohol consumption and smoking as confounders affecting the relationship between blood metabolites [ 63 , 64 ], neurocognition [ 65 , 66 ], and psychiatric disorders [ 67 , 68 ]. We removed instrumental SNPs associated with alcohol- and smoking- related traits ( P  < 5 × 10 −8 ) by using the NHGRI GWAS catalog database [ 69 ] (v1.0.2-associtions_e104, release in 22 October 2021; https://www.ebi.ac.uk/gwas/docs/file-downloads ).

Heterogeneity, F-statistics and statistical power for IVs

We performed a heterogeneity test for IVs using the RadialMR R package ("ivw_radial" and "egger_radial" functions with default parameters, https://github.com/WSpiller/RadialMR/ ) [ 70 ], setting P values < 0.05 to filter out the outliers. We used the F-statistic to assess the strength of the IVs. The specific formula is F = (R² × (N − K − 1)) / ((1 − R²) × K), where R² denotes the variance in the exposure explained by the IVs (for each SNP, R² = β² / (β² + SE² × N)), N denotes the sample size of the exposure GWAS, K denotes the number of IVs, and β and SE denote the genetic effect size and standard error from the exposure GWAS data. A threshold of F > 10 is usually used to indicate strong IVs. In addition, we estimated the statistical power of the IVs based on the sample size for each MR test according to the method proposed by Burgess [ 71 ]. We calculated the estimated effect size for each MR test with 80% power at a significance level of 0.05.
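A small helper mirroring these formulas (a sketch; the function name and example values are illustrative, not from the study):

```python
def instrument_strength(betas, ses, n):
    """R-squared and F-statistic for a set of instrumental SNPs.

    betas, ses: per-SNP effect sizes and standard errors on the exposure
    n: exposure GWAS sample size
    """
    k = len(betas)
    # Per-SNP explained variance, summed over independent instruments
    r2 = sum(b ** 2 / (b ** 2 + se ** 2 * n) for b, se in zip(betas, ses))
    f = (r2 * (n - k - 1)) / ((1 - r2) * k)
    return r2, f

r2, f = instrument_strength([0.05, 0.04], [0.005, 0.006], 200_000)
print(round(r2, 4), round(f, 1))  # F > 10 is the usual threshold for strong IVs
```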

Two-sample MR models

The primary method used to estimate causality was inverse variance weighted regression (IVW) [ 72 ]. To complement the IVW results, we used five additional MR models. These included MR-robust adjusted profile score (MR-RAPS) [ 73 ], weighted median [ 74 ], weighted mode [ 75 ], MR-Egger regression [ 76 ], and Wald ratio [ 77 ]. The Wald ratio is particularly applicable when there is only one genetic variant in the instrumental variable. To implement these methods, all of the above approaches can be invoked using the corresponding functions available in the TwoSampleMR R package (“mr_ivw”, “mr_raps”, “mr_weighted_median”, “mr_weighted_mode”, “mr_egger_regression”, and “mr_wald_ratio” with default parameters).
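For orientation, a fixed-effect IVW estimate can be assembled from per-SNP Wald ratios; the following is a simplified sketch under the first-order standard error approximation, not the TwoSampleMR implementation:

```python
import math

def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_exp: per-SNP effects on the exposure
    beta_out, se_out: per-SNP effects and standard errors on the outcome
    """
    ratios = [bo / be for bo, be in zip(beta_out, beta_exp)]  # Wald ratios
    ses = [so / abs(be) for so, be in zip(se_out, beta_exp)]  # first-order SEs
    weights = [1 / s ** 2 for s in ses]                       # inverse variance
    est = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return est, se

# Hypothetical per-SNP summary statistics:
est, se = ivw_estimate([0.05, 0.04, 0.06], [0.002, 0.001, 0.003], [0.001, 0.001, 0.001])
print(est, se)
```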

Sensitivity analyses

The objective of these analyses was to address potential concerns, such as outlier IVs and pleiotropy, and to assess the robustness of the causal hypothesis under different scenarios. Leave-one-out (LOO) analysis was used to ascertain the potential for an outlying IV. In the event that an outlier was identified, it was removed, and the subsequent IV selection and MR tests were repeated accordingly. The MR-PRESSO [ 78 ] (Mendelian randomization pleiotropy residual sum and outlier) global test was used to identify any horizontal pleiotropy in the MR test. We performed MR-Egger regression to assess whether the Egger intercept was close to zero, which would indicate the absence of potential pleiotropy [ 76 ].

Since the metabolites we analyzed for MR were genetically correlated with cognitive function, this could bias the causal results. A latent causal variable (LCV) model [ 79 , 80 ] and causal analysis using summary effect estimates (CAUSE) method [ 81 ] were constructed between each significantly correlated causal pair to estimate partial genetic causality. The LCV differs from other MR methods in that it does not offer a direct test for causal effects. In contrast, LCV assesses the proportion of each trait that is influenced by a shared factor, quantified as the posterior mean genetic causality proportion (GCP), with |GCP | > 0.6 considered as strong evidence of partial genetic causality. LCV model scripts are at https://github.com/lukejoconnor/LCV . The CAUSE method is employed to calculate the posterior probabilities of the causal effect, shared effect, and the proportion of variants that show correlated horizontal pleiotropy, known as the q value. The causal effect reflects how the variants influence the outcome through the exposure, whereas the shared effect indicates the presence of correlated horizontal pleiotropy. In the CAUSE method, we set all the parameters at their default values ( https://jean997.github.io/cause/ldl_cad.html ).

To address pleiotropy among metabolites, which poses a challenge for MR selection, we examined other metabolites associated with their IVs for the putative causal metabolites. We annotated the relevant genes of genetic instruments using annovar software ( http://www.openbioinformatics.org/annovar/annovar_download.html ) and expression quantitative trait loci (eQTL) data from whole blood samples in GTExV8 ( http://www.gtexportal.org ). This investigation aimed to identify IVs that are directly and unambiguously linked to the metabolites through molecules such as transporters or metabolizing enzyme. The MR effect size of each single SNP in the IVs was estimated using the function (“mr_singlesnp”) in the TwoSampleMR R package.

Mediation analysis

The principle of mediation analysis is to calculate the product of two-step MR coefficients [ 82 ]. The procedure is as follows:

Estimate the causal effect (β S1 ) of the exposure on the mediator.

Estimate the causal effect (β S2 ) of the mediator on the outcome.

Multiply these two estimates together to calculate the mediation effect ( \({\beta }_{M}={\beta }_{S1}\times {\beta }_{S2}\) ), also known as the indirect effect.

The standard error for the mediation effect is calculated using the delta-method (Sobel) formula:

\(SE_{M}=\sqrt{\beta_{S1}^{2}\times SE_{S2}^{2}+\beta_{S2}^{2}\times SE_{S1}^{2}}\)

where SE S1 , SE S2 , β S1 , β S2 represent the standard errors and coefficients, respectively. The P value is then calculated from the standard normal distribution for a two-tailed test. The total effect refers to the effect of the exposure on the outcome estimated directly through the MR analysis. The mediated proportion indicates the ratio of the indirect effect to the total effect. It is important to note that both the indirect effect and the total effect should be in the same direction. The MR Steiger test is employed to ascertain the absence of an inverse relationship between exposure and outcome, thereby ensuring the validity of the causal pathway hypothesis [ 82 , 83 ]. This test employs the "directionality_test" function within the TwoSampleMR R package.
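Under the formulas above, the two-step computation is short; a sketch with hypothetical numbers (the function name and inputs are illustrative, not estimates from the study):

```python
import math
from statistics import NormalDist

def mediation_effect(b_s1, se_s1, b_s2, se_s2):
    """Indirect effect and delta-method (Sobel) SE from two-step MR estimates."""
    indirect = b_s1 * b_s2                                        # beta_S1 * beta_S2
    se = math.sqrt(b_s1 ** 2 * se_s2 ** 2 + b_s2 ** 2 * se_s1 ** 2)
    z = indirect / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))                        # two-tailed P value
    return indirect, se, p

# Hypothetical estimates: exposure -> mediator (S1), mediator -> outcome (S2)
indirect, se, p = mediation_effect(0.035, 0.007, -0.67, 0.12)
print(indirect, se, p)  # mediated proportion = indirect effect / total effect
```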

Statistical analysis

In our genetic correlation analysis, we used a threshold of nominal p- values < 0.05 for metabolic-cognitive and cognitive-psychiatric pairs to assess potential causality. For specific MR analysis, we conducted Bonferroni correction to adjust for multiple tests. Consequently, to establish causal relationships between metabolites and the g-Factor, we set a significant p -value threshold at 7.89 × 10 −5 (0.05/317/2), where 317 represents the number of metabolites and 2 denotes forward and reverse MR tests. To determine causal relationships between the g-Factor and psychiatric disorders, the threshold was 2.50 × 10 −3 (0.05/10/2), with 10 representing the number of psychiatric disorders and 2 indicating bidirectional MR tests.

Upon analysis, we found causal relationships between two metabolites and three diseases and g-Factor. For mediation analysis, we used a significance threshold of 8.33 × 10 −3 (0.05/2/3). In tests for pleiotropy, such as MR-PRESSO, MR-Egger regression, CAUSE, LCV and additional MR models, a p -value < 0.05 indicated moderate support. All statistical tests, except for CAUSE, used two-tailed p -values; CAUSE used a one-tailed p -value to test whether the sharing model fit the data.

Results

Genetic correlations between g-Factor, blood metabolites and psychiatric disorders

Before making any causal inferences, we used LD-score regression to examine whether there was a potential common genetic basis between g-Factor and 317 blood metabolites. We assessed their genetic correlations and identified 18 metabolites that showed nominal and weaker correlations with g-Factor (|r g  | ≤ 0.23, P  < 0.05) (Fig. 2A and Supplementary Table 3 ). Moreover, genetic correlation analyses between g-Factor and 10 psychiatric disorders revealed that seven of these disorders have potential genetic associations with g-Factor (Fig. 2B ). These included robust negative correlations with schizophrenia (r g  = −0.35, P  = 7.30 × 10 −70 ), PTSD (r g  = −0.44, P  = 3.40 × 10 −31 ), and bipolar disorder (r g  = −0.22, P  = 3.67 × 10 −25 ). MR analyses were then performed using these 18 metabolites and seven disorders.

Figure 2: (A) Genetic correlations between g-Factor and metabolites. (B) Genetic correlations between g-Factor and psychiatric disorders. Genetic correlation is estimated by LD score regression. The statistical tests were two-sided. P -value < 0.05 was considered significant. The asterisk indicates that the GWAS summary-level data contain samples from UK Biobank.

Causal inferences between blood metabolites and g-Factor

In a bidirectional two-sample MR analysis, we defined the causal effect of metabolite on g-Factor as a forward direction and vice versa. The SNPs associated with confounders such as alcohol consumption were excluded from the genetic instruments, as detailed in Supplementary Table 4 . For each of the MR tests with F -statistic values exceeding 10, the minimum effect size with sufficient statistical power exceeding 80% was calculated (Supplementary Tables 5 , 6 ).

In the forward MR results, two blood metabolites, acetylornithine and butyrylcarnitine, were identified as statistically significant putative causal factors for g-Factor (Fig. 3 and Supplementary Table 7 ). A 1-standard deviation (SD) increase in acetylornithine was found to be significantly associated with a 0.035 SD increase in g-Factor (β = 0.035, 95% CI 0.021 to 0.049, P  = 1.15 × 10 −6 ), suggesting a protective effect of acetylornithine on neurocognition. Conversely, a 1-SD increase in butyrylcarnitine was significantly associated with a 0.028 SD decrease in g-Factor (β = −0.028, 95% CI −0.041 to −0.015, P  = 1.31 × 10 −5 ), indicating a detrimental effect of butyrylcarnitine on neurocognition. Scatter plots illustrating the genetic associations between acetylornithine and g-Factor, and butyrylcarnitine and g-Factor are presented in Supplementary Fig. 1 . In the reverse direction, the analysis did not provide evidence to suggest that g-Factor has a causal effect on any of the metabolites (Supplementary Table 8 ).

Figure 3: The forest plot shows significant associations. The effect estimates (β) indicate the change in mean g-Factor per 1-SD change in metabolite level, and the error bars indicate 95% confidence intervals. All statistical tests were two-sided. A P -value < 7.89 × 10 −5 after Bonferroni correction was considered significant.

The sensitivity analyses demonstrated the reliability of these two putative causalities. Estimates from MR-RAPS (acetylornithine β = 0.035, 95% CI 0.020 to 0.049, P  = 2.43 × 10 −6 ; butyrylcarnitine β = −0.028, 95% CI −0.044 to −0.014, P  = 2.13 × 10 −4 ), weighted median (acetylornithine β = 0.031, 95% CI 0.017 to 0.046, P  = 2.88 × 10 −5 ; butyrylcarnitine β = −0.030, 95% CI −0.045 to −0.016, P  = 4.40 × 10 −5 ), and weighted mode (acetylornithine β = 0.031, 95% CI 0.013 to 0.049, P  = 4.92 × 10 −3 ; butyrylcarnitine β = −0.030, 95% CI −0.046 to −0.014, P  = 1.51 × 10 −3 ) methods were generally consistent with those of the IVW method in terms of the effect size and direction (Fig. 3 ). Confidence intervals obtained from MR-Egger were wider than those obtained from IVW, probably due to the lower power of the MR-Egger. Leave-one-out analyses showed that the estimates from IVW remained similar after excluding each SNP from the instrumental variables, suggesting that no single SNP drove the causal estimates (Supplementary Fig. 2 ). MR-PRESSO (acetylornithine P  = 0.558; butyrylcarnitine P  = 0.338) and MR-Egger intercept analyses (acetylornithine P  = 0.242; butyrylcarnitine P  = 0.998) provided no evidence of pleiotropy (Supplementary Table 9 ).

Given that the risk causal metabolites were genetically correlated with g-Factor, this could potentially bias the causal results. To address this, we conducted an LCV model, which revealed that acetylornithine → g-Factor (GCP = 0.50) and butyrylcarnitine → g-Factor (GCP = 0.62) exhibited a tendency towards strong evidence of partial genetic causality (Supplementary Table 10 ). The CAUSE method did not reject the sharing model for acetylornithine → g-Factor (ELPD Sharing vs Causal  = −2.2, P  = 0.098), estimating that 5% of acetylornithine variants act through a shared factor (Supplementary Table 10 ). Additionally, we found latent causal evidence supporting butyrylcarnitine → g-Factor using the CAUSE method (ELPD Sharing vs Causal  = −4.1, P  = 0.035) (Supplementary Table 10 ).

In addition, we considered pleiotropy among metabolites, which poses a challenge for MR selection. For butyrylcarnitine, three IVs (rs1171617, rs662138, rs77010315) were associated with 17 other metabolites, including various carnitine derivatives, total cholesterol, and low-density lipoprotein cholesterol (Supplementary Table 11 ). However, no associations were found between the IVs of acetylornithine and any other metabolite. We identified the relevant genes of the genetic instruments through annotation of their physical locations and eQTL signals in blood tissues (Supplementary Table 12 ), some of which are involved in functions such as transporters and metabolizing enzymes. Next, we estimated the MR effect size of each individual SNP in the IVs (Supplementary Fig. 3 ). The results suggest that specific IVs may play crucial roles in the causal relationships. Specifically, rs17349049 is an important IV for the acetylornithine → g-Factor relationship, but the MR estimate remained unchanged after its removal (β = 0.059, 95% CI 0.022 to 0.096, P  = 1.93 × 10 −3 ). Additionally, rs71454652 and rs1151874 appeared to be crucial for the butyrylcarnitine → g-Factor relationship; removing them resulted in a non-significant MR estimate (β = −0.018, 95% CI −0.041 to 0.005, P  = 0.125). These SNPs are located within genes involved in biological functions such as microtubule organization, ribonuclease activity, and calcium-mediated cellular signal transduction.

Causal inferences between g-Factor and psychiatric disorders

Furthermore, we conducted bidirectional two-sample MR analyses between the g-Factor and the seven psychiatric disorders previously identified. We defined the MR analysis from g-Factor to disorder as the forward direction and the converse as the reverse inference. Detailed information on confounding SNPs, F-statistic values and statistical power of each MR test can be found in Supplementary Tables 4 , 13 and 14 .

With regard to the forward MR results, three putative causal relationships were identified between g-Factor and schizophrenia (IVW OR = 0.38, 95% CI 0.30 to 0.48, P  = 1.72 × 10 −15 ), PTSD (IVW OR = 0.38, 95% CI 0.25 to 0.57, P  = 3.38 × 10 −6 ), and bipolar disorder (IVW OR = 0.51, 95% CI 0.41 to 0.64, P  = 3.80 × 10 −9 ) (Table 1 ). Specifically, a 1-SD increase in the g-Factor was associated with a 62% lower risk of schizophrenia, a 62% lower risk of PTSD, and a 49% lower risk of bipolar disorder. In the reverse MR analyses, a higher risk of schizophrenia was associated with a decreased g-Factor (IVW β = −0.062, 95% CI −0.071 to −0.052, P  = 3.78 × 10 −38 ) (Table 1 ).

A series of sensitivity analyses were conducted, including MR-RAPS, weighted median, weighted mode, and MR-Egger methods. These yielded patterns of similar estimates in size, although the confidence intervals were wider than those of the IVW (Table 1 ). The scatter plots (Supplementary Fig. 4 ) and leave-one-out plots (Supplementary Fig. 5 ) provided further evidence of unbiased estimates. The MR-PRESSO and MR-Egger intercept analyses were conducted to examine the presence of pleiotropy, but no evidence of pleiotropy was detected (Supplementary Table 15 ).

Mediation analysis between metabolites, g-Factor and psychiatric disorders

Metabolites like acetylornithine and butyrylcarnitine, as well as psychiatric disorders such as schizophrenia, PTSD and bipolar disorder, have been found to be genetically related to g-Factor in the study. The next step involves examining whether the relationship between these metabolites and disorders is mediated through g-Factor. Mediation analyses were conducted to investigate potential pathways linking the identified metabolites to the g-Factor, and psychiatric disorders.

MR analyses provided moderate support for the causal relationships between acetylornithine and bipolar disorder, as well as between butyrylcarnitine and schizophrenia, using the IVW method (Supplementary Fig. 6 ). Specifically, a 1-SD increment in acetylornithine was associated with a 6% lower odds of bipolar disorder risk (IVW OR = 0.94, 95% CI 0.90 to 0.98, P  = 8.09 × 10 −3 ), while a 1-SD increase in butyrylcarnitine was associated with a 9% higher odds of schizophrenia risk (IVW OR = 1.09, 95% CI 1.04 to 1.14, P  = 2.05 × 10 −4 ). The effect estimates showed consistent direction and magnitude across the MR-RAPS, weighted median, and weighted mode methods. Notably, when the weighted median method was used instead of the IVW method, the results indicated a significant inverse association between acetylornithine levels and schizophrenia risk (OR = 0.94, 95% CI 0.90 to 0.98, P  = 5.09 × 10 −3 ) (Supplementary Fig. 6 ).

Our results are consistent with those of a previous MR study [ 84 ], which also reported negative associations between N -acetylornithine and bipolar disorder (IVW OR = 0.72, 95% CI 0.66 to 0.79, P  = 1.08 × 10 −13 ) as well as schizophrenia (IVW OR = 0.74, 95% CI 0.64 to 0.84, P  = 5.14 × 10 −6 ), and a positive association between butyrylcarnitine and schizophrenia (IVW OR = 1.22, 95% CI 1.12 to 1.32, P  = 1.10 × 10 −6 ). The confidence intervals of our MR estimates differed from theirs, which may be due to the use of different GWAS data and more stringent criteria for IVs selection. To ensure the reliability of the estimates, sensitivity analyses were conducted, including scatter plots (Supplementary Fig. 7 ), leave-one-out plots (Supplementary Fig. 8 ), MR-PRESSO (Supplementary Table 16 ), and MR-Egger intercept (Supplementary Table 16 ). These results revealed that the estimates were free from bias. A summary of the IVs for acetylornithine and butyrylcarnitine in relation to schizophrenia, PTSD, and bipolar disorder is presented in Supplementary Table 17 .

Furthermore, mediation analysis was conducted to investigate the causal pathways from acetylornithine to bipolar disorder and from butyrylcarnitine to schizophrenia via the g-Factor. Two potential regulatory networks were identified (Fig. 4 ): a pathway from acetylornithine on bipolar disorder, mediated by g-Factor with a mediated effect of −0.023 (95% CI −0.036 to −0.011, P  = 1.76 × 10 −4 ) and accounting for a mediated proportion of 37.3% (Fig. 4A ); and a pathway from butyrylcarnitine to schizophrenia, also mediated by the g-Factor with a mediation effect of 0.027 (95% CI 0.013 to 0.042, P  = 1.32 × 10 −4 ), representing approximately 32.8% of the total effect (Fig. 4B ). The Steiger test [ 82 , 83 ] was performed to confirm the absence of evidence for reverse causality from acetylornithine or butyrylcarnitine to bipolar disorder, schizophrenia, and g-Factor (Supplementary Table 18 ).

Figure 4: (A) Pathway from acetylornithine to bipolar disorder via the mediator of g-Factor. (B) Pathway from butyrylcarnitine to schizophrenia via the mediator of g-Factor. The indirect effect was calculated by mediation analysis via the two-step MR framework. The inverse-variance weighted method was used as the MR test. All statistical tests were two-sided from the normal distribution. A P -value < 8.33 × 10 −3 was considered significant after correction. Abbreviations: g-Factor, general cognitive factor score; CI, confidence interval. 1 Total effect S0 indicates the causal effect of the exposure on the outcome. 2 Direct effect S1 indicates the causal effect of the exposure on the mediator. 3 Direct effect S2 indicates the causal effect of the mediator on the outcome. 4 Mediation effect indicates the indirect effect of the exposure on the outcome through the mediator. The indirect effect and total effect should be in a consistent direction. 5 Mediated proportion indicates the ratio of the indirect effect to the total effect of the exposure on the outcome. 6 Direct effect S3 indicates the total effect minus the indirect effect of the exposure on the outcome.

Additionally, considering the reverse causal direction from schizophrenia to the g-Factor, we found support for the causal pathway from butyrylcarnitine to the g-Factor via schizophrenia (Supplementary Fig. 9 ). The mediation effect of butyrylcarnitine on the g-Factor, estimated at −0.005 (95% CI −0.008 to −0.002, P  = 3.60 × 10 −4 ), accounted for approximately 18.4% of the total effect. This suggests that butyrylcarnitine may serve as a promising risk factor, either directly or indirectly, influencing both schizophrenia and g-Factor. Specifically, it implies that during the early stages of schizophrenia, butyrylcarnitine affects schizophrenia through cognitive modulation. However, as schizophrenia progresses, butyrylcarnitine exacerbates its effects on neurocognition.

Validating the causal pathways by other relevant neurocognitive traits

To validate the causal pathways, we examined the findings using summary statistics from other sources of neurocognitive data (Supplementary Table 2 ). The results showed that the causal pathway from acetylornithine to bipolar disorder was mediated through three neurocognitive traits (cognitive performance, general cognitive function, and verbal numerical reasoning) with varying mediated proportions (5.9%, 7.9% and 21.1%, respectively) (Supplementary Table 19 ). Similarly, the causal pathway from butyrylcarnitine to schizophrenia was found to be mediated through four neurocognitive traits (intelligence, cognitive performance, general cognitive function, and verbal numerical reasoning) with varying mediated proportions (7.9%, 6.9%, 13.7% and 11.0%, respectively) (Supplementary Table 19 ). The MR Steiger test did not provide evidence of reverse causality from these two metabolites on neurocognitive traits (Supplementary Table 20 ).

In this study, we aimed to investigate the relationships between blood metabolites and neurocognitive traits by using genetic variants as unconfounded proxies. Given the observed genetic associations between neurocognitive traits and psychiatric disorders, we conducted mediation analyses to uncover the causal pathways from blood metabolites to these disorders through cognition. Our findings suggested that acetylornithine has a protective effect on the g-Factor, a measure of general cognitive ability. We observed that the g-Factor partially mediates the association between acetylornithine and bipolar disorder. Similarly, we identified a deleterious causal effect of butyrylcarnitine on the g-Factor, with the g-Factor acting as a partial mediator in the association between butyrylcarnitine and schizophrenia. These results were robust across a variety of sensitivity analyses designed to address potential horizontal pleiotropy. To further validate the reliability of our findings, we corroborated them with cognitive phenotypes derived from independent sources, which consistently supported our conclusions.

Acetylornithine, a member of the class of biogenic amines, is an intermediate in the biosynthesis of arginine from glutamate. The acetylornithine pathway may facilitate the polyamine-mediated stress response, which regulates intracellular polyamine homeostasis and metabolic processes in organisms [85, 86]. Evidence supports a cognitive protective effect of acetylornithine: a prior study showed that individuals with Alzheimer's disease exhibit higher serum acetylornithine levels than those with mild cognitive impairment [87]. Acetylornithine has also been shown to be highly stable over a 10-year period in a study assessing metabolite stability in humans [88]. It can be obtained through dietary sources such as fruits and legumes [89, 90, 91, 92], making it a potential target for healthcare interventions and a basis for appropriate dietary advice to patients.

Conversely, butyrylcarnitine, a plasma metabolite belonging to the acylcarnitine class, has been associated with detrimental effects on neurocognition. Abnormalities in acylcarnitine metabolism have been linked to impaired fatty acid oxidation and mitochondrial dysfunction, which can affect the brain's energy supply [93, 94]. Elevated plasma concentrations of butyrylcarnitine have been observed in individuals with developmental and cognitive impairment [95], and one study suggested that individuals with schizophrenia exhibit elevated levels of butyrylcarnitine compared to healthy individuals [96]. Elevated butyrylcarnitine levels have also been shown to regulate accelerated neuronal differentiation in aged subjects [97]. Butyric acid, a precursor of butyrylcarnitine, is derived primarily from microbial fermentation of dietary fiber in the intestine [98]. Consequently, butyrylcarnitine is intimately linked with dietary intake and can facilitate the transfer of metabolites from food to the brain via the circulatory system.

It has been shown that there is significant genetic overlap between cognitive traits and psychiatric disorders [99, 100, 101]. For instance, while the majority of schizophrenia risk variants are associated with poorer cognitive performance, bipolar disorder risk variants are associated with either poorer or better cognitive performance [102]. Moreover, gene set enrichment analyses revealed shared loci for biological processes related to neural development, synaptic integrity, and neurotransmission between schizophrenia and intelligence [103]. The relationship between psychiatric disorders and neurocognition is therefore complex and multifaceted. The findings of this study suggest that butyrylcarnitine may increase the risk of schizophrenia by impairing neurocognitive function; this impairment may, in turn, contribute to the onset of schizophrenia. Clinical studies have consistently shown that neurocognitive deficits precede the onset of schizophrenia [3] and persist even after the onset of the disorder [4]. Recent MR studies also support our findings, indicating bidirectional genetic associations between schizophrenia and neurocognition [43, 104], as well as associations of the metabolites acetylornithine and butyrylcarnitine with schizophrenia and bipolar disorder [84, 105]. Nevertheless, no study to date has investigated the interrelationship between metabolites, cognition, and psychiatric disorders.

Studies of plasma pharmacometabolomics have revealed that the concentration of acetylornithine changes significantly following the administration of psychiatric or neurodegenerative drugs [106, 107]. To date, no studies have elucidated the functional role of acetylornithine in the central nervous system. Acetylornithine generates the metabolites ornithine and citrulline via deacetylase and carbamoyltransferase, which ultimately participate in the metabolic pathway of arginine synthesis. Arginine is known to influence nitric oxide synthesis in the brain, as well as vasodilatation, neuronal conduction, and brain cell protection [108, 109, 110], effects that have been suggested to impact cognitive function and psychiatric symptoms. Since the pathway from acetylornithine to arginine synthesis is not unidirectional, our findings imply that acetylornithine may also play an important role in brain cognition and its associated psychiatric symptoms. A deficiency of short-chain acyl-CoA dehydrogenase, resulting from variations in genes encoding ACAD family members, may be responsible for elevated butyrylcarnitine in the blood [111]. Butyrylcarnitine belongs to the acylcarnitine family, which is involved in fatty acid metabolism, particularly mitochondrial fatty acid beta-oxidation [112]. Such abnormalities may indicate mitochondrial dysfunction, which affects the energy supply to the brain and consequently disturbs brain function [113]. Impaired fatty acid and glucose oxidation due to mitochondrial dysfunction is strongly associated with cognitive dysfunction and the development of psychiatric disorders [114, 115]. At present, mechanistic studies of butyrylcarnitine are scarce, whereas considerable evidence supports the neuroprotective effects of acetylcarnitine against cognitive impairment [116].

Our findings highlight the impact of blood metabolite levels on cognitive performance, particularly in relation to the risk of mental illness. However, several limitations should be acknowledged. Firstly, while we ensured the independence of IVs in terms of physical location, we cannot exclude the possibility of bio-functional interactions among genetic instruments. Secondly, the GWAS data for the majority of metabolites were summary-level statistics derived from various meta-analyses, which may introduce bias from population stratification. Thirdly, environmental and social factors, including assortative mating, lifestyle, and economic status, can bias MR estimates [117, 118]. Fourthly, caution should be exercised in applying MR estimates to clinical interventions and health-care decisions, because MR primarily examines the long-term effects of lifetime exposures rather than short-term interventions [119]. Lastly, the use of a binary outcome in the mediation approach is a potential source of bias [82]. Since both schizophrenia and bipolar disorder are relatively rare (<10%) [120], this is likely to be a minor issue, as the odds ratio will approximate the risk ratio reasonably well. If the outcome were common, the product method used in this study would be invalid for the direct and indirect effects; one way to address this would be to estimate them using log-binomial models [82].
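As a quick check on the rare-outcome argument, the toy calculation below (assumed risks, not study data) shows that the odds ratio tracks the risk ratio when the outcome is rare and drifts away from it when the outcome is common:

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

def or_vs_rr(risk_exposed, risk_unexposed):
    """Odds ratio and risk ratio for the same pair of outcome risks."""
    return (odds(risk_exposed) / odds(risk_unexposed),
            risk_exposed / risk_unexposed)

print(or_vs_rr(0.02, 0.01))  # rare outcome: OR ~2.02 vs RR 2.0 (close)
print(or_vs_rr(0.40, 0.20))  # common outcome: OR ~2.67 vs RR 2.0 (diverge)
```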

Conclusions

In conclusion, this study used large-scale GWAS data, MR, and mediation analysis to uncover causal pathways between blood metabolites, neurocognitive traits, and psychiatric disorders. The results suggested a protective role of acetylornithine and a detrimental role of butyrylcarnitine on neurocognition, linking acetylornithine to bipolar disorder and butyrylcarnitine to schizophrenia. These findings offer insights into the pathophysiology of these disorders and highlight potential metabolic targets for prevention and treatment. Further research is needed to explore these metabolic factors in schizophrenia and bipolar disorder.

Data availability

All GWAS summary statistics used in our study are publicly available online. Detailed information and specific release links are provided in Supplementary Tables 1 and 2. Briefly, summary-level data for the metabolite GWASs were obtained from the following sources: https://omicscience.org/apps/crossplatform/, http://metabolomics.helmholtz-muenchen.de/gwas, https://gwas.mrcieu.ac.uk, and NCBI dbGaP (phs001672.v4.p1). The g-Factor GWAS summary data were downloaded from https://datashare.ed.ac.uk/handle/10283/3756, and other cognitive traits were sourced from http://www.thessgac.org, https://ctg.cncr.nl, and http://www.psy.ed.ac.uk/ccace/downloads/. Data for nine psychiatric disorders were downloaded from https://pgc.unc.edu/for-researchers/download-results/, with PTSD data available from NCBI dbGaP (phs001672.v1.p1).

References

1. Catalan A, Salazar de Pablo G, Aymerich C, Damiani S, Sordi V, Radua J, et al. Neurocognitive functioning in individuals at clinical high risk for psychosis: a systematic review and meta-analysis. JAMA Psychiatry. 2021;78:859–67.
2. Sachdev PS, Blacker D, Blazer DG, Ganguli M, Jeste DV, Paulsen JS, et al. Classifying neurocognitive disorders: the DSM-5 approach. Nat Rev Neurol. 2014;10:634–42.
3. Reichenberg A, Caspi A, Harrington H, Houts R, Keefe RS, Murray RM, et al. Static and dynamic cognitive deficits in childhood preceding adult schizophrenia: a 30-year study. Am J Psychiatry. 2010;167:160–9.
4. Zanelli J, Mollon J, Sandin S, Morgan C, Dazzan P, Pilecka I, et al. Cognitive change in schizophrenia and other psychoses in the decade following the first episode. Am J Psychiatry. 2019;176:811–9.
5. Joshua N, Gogos A, Rossell S. Executive functioning in schizophrenia: a thorough examination of performance on the Hayling sentence completion test compared to psychiatric and non-psychiatric controls. Schizophr Res. 2009;114:84–90.
6. Keepers GA, Fochtmann LJ, Anzia JM, Benjamin S, Lyness JM, Mojtabai R, et al. The American Psychiatric Association practice guideline for the treatment of patients with schizophrenia. Am J Psychiatry. 2020;177:868–72.
7. Woodward ND, Purdon SE, Meltzer HY, Zald DH. A meta-analysis of neuropsychological change to clozapine, olanzapine, quetiapine, and risperidone in schizophrenia. Int J Neuropsychopharmacol. 2005;8:457–72.
8. Goozee R, Reinders A, Handley R, Marques T, Taylor H, O'Daly O, et al. Effects of aripiprazole and haloperidol on neural activation during the n-back in healthy individuals: a functional MRI study. Schizophr Res. 2016;173:174–81.
9. Nielsen RE, Levander S, Kjaersdam Telleus G, Jensen SO, Ostergaard Christensen T, Leucht S. Second-generation antipsychotic effect on cognition in patients with schizophrenia: a meta-analysis of randomized clinical trials. Acta Psychiatr Scand. 2015;131:185–96.
10. Desamericq G, Schurhoff F, Meary A, Szoke A, Macquin-Mavier I, Bachoud-Levi AC, et al. Long-term neurocognitive effects of antipsychotics in schizophrenia: a network meta-analysis. Eur J Clin Pharmacol. 2014;70:127–34.
11. Allott K, Chopra S, Rogers J, Dauvermann MR, Clark SR. Advancing understanding of the mechanisms of antipsychotic-associated cognitive impairment to minimise harm: a call to action. Mol Psychiatry. 2024. https://doi.org/10.1038/s41380-024-02503-x.
12. Pillinger T, McCutcheon RA, Vano L, Mizuno Y, Arumuham A, Hindley G, et al. Comparative effects of 18 antipsychotics on metabolic function in patients with schizophrenia, predictors of metabolic dysregulation, and association with psychopathology: a systematic review and network meta-analysis. Lancet Psychiatry. 2020;7:64–77.
13. Kaddurah-Daouk R, McEvoy J, Baillie RA, Lee D, Yao JK, Doraiswamy PM, et al. Metabolomic mapping of atypical antipsychotic effects in schizophrenia. Mol Psychiatry. 2007;12:934–45.
14. Proitsi P, Kuh D, Wong A, Maddock J, Bendayan R, Wulaningsih W, et al. Lifetime cognition and late midlife blood metabolites: findings from a British birth cohort. Transl Psychiatry. 2018;8:203.
15. Joshi YB, Thomas ML, Braff DL, Green MF, Gur RC, Gur RE, et al. Anticholinergic medication burden-associated cognitive impairment in schizophrenia. Am J Psychiatry. 2021;178:838–47.
16. Sakurai H, Bies RR, Stroup ST, Keefe RS, Rajji TK, Suzuki T, et al. Dopamine D2 receptor occupancy and cognition in schizophrenia: analysis of the CATIE data. Schizophr Bull. 2013;39:564–74.
17. Wishart DS. Metabolomics for investigating physiological and pathophysiological processes. Physiol Rev. 2019;99:1819–75.
18. Smith PJ, Mabe SM, Sherwood A, Doraiswamy PM, Welsh-Bohmer KA, Burke JR, et al. Metabolic and neurocognitive changes following lifestyle modification: examination of biomarkers from the ENLIGHTEN randomized clinical trial. J Alzheimers Dis. 2020;77:1793–803.
19. Palacios N, Lee JS, Scott T, Kelly RS, Bhupathiraju SN, Bigornia SJ, et al. Circulating plasma metabolites and cognitive function in a Puerto Rican cohort. J Alzheimers Dis. 2020;76:1267–80.
20. Lindenmayer JP, Khan A, Kaushik S, Thanju A, Praveen R, Hoffman L, et al. Relationship between metabolic syndrome and cognition in patients with schizophrenia. Schizophr Res. 2012;142:171–6.
21. Yates KF, Sweat V, Yau PL, Turchiano MM, Convit A. Impact of metabolic syndrome on cognition and brain: a selected review of the literature. Arterioscler Thromb Vasc Biol. 2012;32:2060–7.
22. Mitchell AJ, Vancampfort D, Sweers K, van Winkel R, Yu W, De Hert M. Prevalence of metabolic syndrome and metabolic abnormalities in schizophrenia and related disorders: a systematic review and meta-analysis. Schizophr Bull. 2013;39:306–18.
23. Bosia M, Buonocore M, Bechi M, Santarelli L, Spangaro M, Cocchi F, et al. Improving cognition to increase treatment efficacy in schizophrenia: effects of metabolic syndrome on cognitive remediation's outcome. Front Psychiatry. 2018;9:647.
24. De Hert M, Schreurs V, Vancampfort D, van Winkel R. Metabolic syndrome in people with schizophrenia: a review. World Psychiatry. 2009;8:15–22.
25. Weiser MJ, Butt CM, Mohajeri MH. Docosahexaenoic acid and cognition throughout the lifespan. Nutrients. 2016;8:99.
26. Cervenka I, Agudelo LZ, Ruas JL. Kynurenines: tryptophan's metabolites in exercise, inflammation, and mental health. Science. 2017;357:eaaf9794.
27. Tsai G, Lane HY, Yang P, Chong MY, Lange N. Glycine transporter I inhibitor, N-methylglycine (sarcosine), added to antipsychotics for the treatment of schizophrenia. Biol Psychiatry. 2004;55:452–6.
28. Avgerinos KI, Egan JM, Mattson MP, Kapogiannis D. Medium chain triglycerides induce mild ketosis and may improve cognition in Alzheimer's disease. A systematic review and meta-analysis of human studies. Ageing Res Rev. 2020;58:101001.
29. Guan LY, Hou WL, Zhu ZH, Cao JQ, Tang Z, Yin XY, et al. Associations among gonadal hormone, triglycerides and cognitive decline in female patients with major depressive disorders. J Psychiatr Res. 2021;143:580–6.
30. Sepehrmanesh Z, Heidary M, Akasheh N, Akbari H, Heidary M. Therapeutic effect of adjunctive N-acetyl cysteine (NAC) on symptoms of chronic schizophrenia: a double-blind, randomized clinical trial. Prog Neuropsychopharmacol Biol Psychiatry. 2018;82:289–96.
31. Zajac IT, Barnes M, Cavuoto P, Wittert G, Noakes M. The effects of vitamin D-enriched mushrooms and vitamin D3 on cognitive performance and mood in healthy elderly adults: a randomised, double-blinded, placebo-controlled trial. Nutrients. 2020;12:3847.
32. Chen H, Liu S, Ge B, Zhou D, Li M, Li W, et al. Effects of folic acid and vitamin B12 supplementation on cognitive impairment and inflammation in patients with Alzheimer's disease: a randomized, single-blinded, placebo-controlled trial. J Prev Alzheimers Dis. 2021;8:249–56.
33. Ma F, Li Q, Zhou X, Zhao J, Song A, Li W, et al. Effects of folic acid supplementation on cognitive function and Abeta-related biomarkers in mild cognitive impairment: a randomized controlled trial. Eur J Nutr. 2019;58:345–56.
34. Zhong C, Lu Z, Che B, Qian S, Zheng X, Wang A, et al. Choline pathway nutrients and metabolites and cognitive impairment after acute ischemic stroke. Stroke. 2021;52:887–95.
35. Smith GD, Ebrahim S. 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol. 2003;32:1–22.
36. Burgess S, Scott RA, Timpson NJ, Davey Smith G, Thompson SG, EPIC-InterAct Consortium. Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur J Epidemiol. 2015;30:543–52.
37. Deary IJ, Penke L, Johnson W. The neuroscience of human intelligence differences. Nat Rev Neurosci. 2010;11:201–11.
38. Lotta LA, Pietzner M, Stewart ID, Wittemans LBL, Li C, Bonelli R, et al. A cross-platform approach identifies genetic regulators of human metabolism and health. Nat Genet. 2021;53:54–64.
39. Kettunen J, Demirkan A, Wurtz P, Draisma HH, Haller T, Rawal R, et al. Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA. Nat Commun. 2016;7:11122.
40. Klarin D, Damrauer SM, Cho K, Sun YV, Teslovich TM, Honerlaw J, et al. Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program. Nat Genet. 2018;50:1514–23.
41. Shin SY, Fauman EB, Petersen AK, Krumsiek J, Santos R, Huang J, et al. An atlas of genetic influences on human blood metabolites. Nat Genet. 2014;46:543–50.
42. de la Fuente J, Davies G, Grotzinger AD, Tucker-Drob EM, Deary IJ. A general dimension of genetic sharing across diverse cognitive traits inferred from molecular data. Nat Hum Behav. 2021;5:49–58.
43. Savage JE, Jansen PR, Stringer S, Watanabe K, Bryois J, de Leeuw CA, et al. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat Genet. 2018;50:912–9.
44. Lee JJ, Wedow R, Okbay A, Kong E, Maghzian O, Zacher M, et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat Genet. 2018;50:1112–21.
45. Davies G, Lam M, Harris SE, Trampush JW, Luciano M, Hill WD, et al. Study of 300,486 individuals identifies 148 independent genetic loci influencing general cognitive function. Nat Commun. 2018;9:2098.
46. Trubetskoy V, Pardinas AF, Qi T, Panagiotaropoulou G, Awasthi S, Bigdeli TB, et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature. 2022;604:502–8.
47. Mullins N, Forstner AJ, O'Connell KS, Coombes B, Coleman JRI, Qiao Z, et al. Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat Genet. 2021;53:817–29.
48. Watson HJ, Yilmaz Z, Thornton LM, Hubel C, Coleman JRI, Gaspar HA, et al. Genome-wide association study identifies eight risk loci and implicates metabo-psychiatric origins for anorexia nervosa. Nat Genet. 2019;51:1207–14.
49. Demontis D, Walters RK, Martin J, Mattheisen M, Als TD, Agerbo E, et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat Genet. 2019;51:63–75.
50. Howard DM, Adams MJ, Clarke TK, Hafferty JD, Gibson J, Shirali M, et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat Neurosci. 2019;22:343–52.
51. Grove J, Ripke S, Als TD, Mattheisen M, Walters RK, Won H, et al. Identification of common genetic risk variants for autism spectrum disorder. Nat Genet. 2019;51:431–44.
52. Gelernter J, Sun N, Polimanti R, Pietrzak R, Levey DF, Bryois J, et al. Genome-wide association study of post-traumatic stress disorder reexperiencing symptoms in >165,000 US veterans. Nat Neurosci. 2019;22:1394–401.
53. Otowa T, Hek K, Lee M, Byrne EM, Mirza SS, Nivard MG, et al. Meta-analysis of genome-wide association studies of anxiety disorders. Mol Psychiatry. 2016;21:1391–9.
54. International Obsessive Compulsive Disorder Foundation Genetics Collaborative (IOCDF-GC) and OCD Collaborative Genetics Association Studies (OCGAS). Revealing the complex genetic architecture of obsessive-compulsive disorder using meta-analysis. Mol Psychiatry. 2018;23:1181–8.
55. Yu D, Sul JH, Tsetsos F, Nawaz MS, Huang AY, Zelaya I, et al. Interrogating the genetic determinants of Tourette's syndrome and other tic disorders through genome-wide association studies. Am J Psychiatry. 2019;176:217–27.
56. Wray NR, Ripke S, Mattheisen M, Trzaskowski M, Byrne EM, Abdellaoui A, et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat Genet. 2018;50:668–81.
57. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh PR, et al. An atlas of genetic correlations across human diseases and traits. Nat Genet. 2015;47:1236–41.
58. Guo J, Yu K, Dong SS, Yao S, Rong Y, Wu H, et al. Mendelian randomization analyses support causal relationships between brain imaging-derived phenotypes and risk of psychiatric disorders. Nat Neurosci. 2022;25:1519–27.
59. Skrivankova VW, Richmond RC, Woolf BAR, Yarmolinsky J, Davies NM, Swanson SA, et al. Strengthening the reporting of observational studies in epidemiology using Mendelian randomization: the STROBE-MR statement. JAMA. 2021;326:1614–21.
60. Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018;7:e34408.
61. Price AL, Weale ME, Patterson N, Myers SR, Need AC, Shianna KV, et al. Long-range LD can confound genome scans in admixed populations. Am J Hum Genet. 2008;83:132–5.
62. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
63. Klop B, do Rego AT, Cabezas MC. Alcohol and plasma triglycerides. Curr Opin Lipidol. 2013;24:321–6.
64. Gu F, Derkach A, Freedman ND, Landi MT, Albanes D, Weinstein SJ, et al. Cigarette smoking behaviour and blood metabolomics. Int J Epidemiol. 2016;45:1421–32.
65. Jacobus J, Tapert SF. Neurotoxic effects of alcohol in adolescence. Annu Rev Clin Psychol. 2013;9:703–21.
66. Swan GE, Lessov-Schlaggar CN. The effects of tobacco smoke and nicotine on cognition and the brain. Neuropsychol Rev. 2007;17:259–73.
67. Walters RK, Polimanti R, Johnson EC, McClintick JN, Adams MJ, Adkins AE, et al. Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders. Nat Neurosci. 2018;21:1656–69.
68. Gurillo P, Jauhar S, Murray RM, MacCabe JH. Does tobacco use cause psychosis? Systematic review and meta-analysis. Lancet Psychiatry. 2015;2:718–25.
69. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47:D1005–D12.
70. Bowden J, Spiller W, Del Greco MF, Sheehan N, Thompson J, Minelli C, et al. Improving the visualization, interpretation and analysis of two-sample summary data Mendelian randomization via the Radial plot and Radial regression. Int J Epidemiol. 2018;47:2100.
71. Burgess S. Sample size and power calculations in Mendelian randomization with a single instrumental variable and a binary outcome. Int J Epidemiol. 2014;43:922–9.
72. Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37:658–65.
73. Zhao QY, Wang JS, Hemani G, Bowden J, Small DS. Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score. Ann Stat. 2020;48:1742–69.
74. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in Mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol. 2016;40:304–14.
75. Hartwig FP, Davey Smith G, Bowden J. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int J Epidemiol. 2017;46:1985–98.
76. Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44:512–25.
77. Wald A. The fitting of straight lines if both variables are subject to error. Ann Math Stat. 1940;11:284–300.
78. Verbanck M, Chen CY, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet. 2018;50:693–8.
79. Reay WR, Kiltschewskij DJ, Geaghan MP, Atkins JR, Carr VJ, Green MJ, et al. Genetic estimates of correlation and causality between blood-based biomarkers and psychiatric disorders. Sci Adv. 2022;8:eabj8969.
80. O'Connor LJ, Price AL. Distinguishing genetic correlation from causation across 52 diseases and complex traits. Nat Genet. 2018;50:1728–34.
81. Morrison J, Knoblauch N, Marcus JH, Stephens M, He X. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nat Genet. 2020;52:740–7.
82. VanderWeele TJ. Mediation analysis: a practitioner's guide. Annu Rev Public Health. 2016;37:17–32.
83. Richiardi L, Bellocco R, Zugna D. Mediation analysis in epidemiology: methods, interpretation and bias. Int J Epidemiol. 2013;42:1511–9.
84. Jia Y, Hui L, Sun L, Guo D, Shi M, Zhang K, et al. Association between human blood metabolome and the risk of psychiatric disorders. Schizophr Bull. 2023;49:428–43.
85. Tavladoraki P, Cona A, Federico R, Tempera G, Viceconte N, Saccoccio S, et al. Polyamine catabolism: target for antiproliferative therapies in animals and stress tolerance strategies in plants. Amino Acids. 2012;42:411–26.
86. Casero RA Jr., Pegg AE. Spermidine/spermine N1-acetyltransferase: the turning point in polyamine metabolism. FASEB J. 1993;7:653–61.
87. Weng WC, Huang WY, Tang HY, Cheng ML, Chen KH. The differences of serum metabolites between patients with early-stage Alzheimer's disease and mild cognitive impairment. Front Neurol. 2019;10:1223.
88. Zeleznik OA, Wittenbecher C, Deik A, Jeanfavre S, Avila-Pacheco J, Rosner B, et al. Intrapersonal stability of plasma metabolomic profiles over 10 years among women. Metabolites. 2022;12:372.
89. Lau CE, Siskos AP, Maitre L, Robinson O, Athersuch TJ, Want EJ, et al. Determinants of the urinary and serum metabolome in children from six European populations. BMC Med. 2018;16:202.
90. Armstrong MD. N-delta-acetylornithine and S-methylcysteine in blood plasma. Biochim Biophys Acta. 1979;587:638–42.
91. Yuan L, Muli S, Huybrechts I, Nothlings U, Ahrens W, Scalbert A, et al. Assessment of fruit and vegetables intake with biomarkers in children and adolescents and their level of validation: a systematic review. Metabolites. 2022;12:126.
92. Perera T, Young MR, Zhang Z, Murphy G, Colburn NH, Lanza E, et al. Identification and monitoring of metabolite markers of dry bean consumption in parallel human and mouse studies. Mol Nutr Food Res. 2015;59:795–806.
93. Jones LL, McDonald DA, Borum PR. Acylcarnitines: role in brain. Prog Lipid Res. 2010;49:61–75.
94. Wajner M, Amaral AU. Mitochondrial dysfunction in fatty acid oxidation disorders: insights from human and animal studies. Biosci Rep. 2015;36:e00281.
95. van Maldegem BT, Duran M, Wanders RJ, Niezen-Koning KE, Hogeveen M, Ijlst L, et al. Clinical, biochemical, and genetic heterogeneity in short-chain acyl-coenzyme A dehydrogenase deficiency. JAMA. 2006;296:943–52.
96. Cao B, Wang D, Pan Z, Brietzke E, McIntyre RS, Musial N, et al. Characterizing acyl-carnitine biosignatures for schizophrenia: a longitudinal pre- and post-treatment study. Transl Psychiatry. 2019;9:19.
97. Du Preez A, Lefevre-Arbogast S, Gonzalez-Dominguez R, Houghton V, de Lucia C, Low DY, et al. Impaired hippocampal neurogenesis in vitro is modulated by dietary-related endogenous factors and associated with depression in a longitudinal ageing cohort study. Mol Psychiatry. 2022;27:3425–40.
98. Akagawa S, Akagawa Y, Nakai Y, Yamagishi M, Yamanouchi S, Kimata T, et al. Fiber-rich barley increases butyric acid-producing bacteria in the human gut microbiota. Metabolites. 2021;11:559.
99. Andreassen OA, Hindley GFL, Frei O, Smeland OB. New insights from the last decade of research in psychiatric genetics: discoveries, challenges and clinical implications. World Psychiatry. 2023;22:4–24.
100. Richards AL, Pardinas AF, Frizzati A, Tansey KE, Lynham AJ, Holmans P, et al. The relationship between polygenic risk scores and cognition in schizophrenia. Schizophr Bull. 2020;46:336–44.
101. Mistry S, Escott-Price V, Florio AD, Smith DJ, Zammit S. Investigating associations between genetic risk for bipolar disorder and cognitive functioning in childhood. J Affect Disord. 2019;259:112–20.
102. Smeland OB, Bahrami S, Frei O, Shadrin A, O'Connell K, Savage J, et al. Genome-wide analysis reveals extensive genetic overlap between schizophrenia, bipolar disorder, and intelligence. Mol Psychiatry. 2020;25:844–53.
103. Smeland OB, Frei O, Kauppi K, Hill WD, Li W, Wang Y, et al. Identification of genetic loci jointly influencing schizophrenia risk and the cognitive traits of verbal-numerical reasoning, reaction time, and general cognitive function. JAMA Psychiatry. 2017;74:1065–75.
104. Smeland OB, Bahrami S, Frei O, Shadrin A, O'Connell K, Savage J, et al. Erratum: Genome-wide analysis reveals extensive genetic overlap between schizophrenia, bipolar disorder, and intelligence. Mol Psychiatry. 2020;25:914.
105. Yang J, Yan B, Zhao B, Fan Y, He X, Yang L, et al. Assessing the causal effects of human serum metabolites on 5 major psychiatric disorders. Schizophr Bull. 2020;46:804–13.
106. McClay JL, Vunck SA, Batman AM, Crowley JJ, Vann RE, Beardsley PM, et al. Neurochemical metabolomics reveals disruption to sphingolipid metabolism following chronic haloperidol administration. J Neuroimmune Pharmacol. 2015;10:425–34.
107. Napoli E, Schneider A, Wang JY, Trivedi A, Carrillo NR, Tassone F, et al. Allopregnanolone treatment improves plasma metabolomic profile associated with GABA metabolism in fragile X-associated tremor/ataxia syndrome: a pilot study. Mol Neurobiol. 2019;56:3702–13.
108. Pervin M, Unno K, Konishi T, Nakamura Y. L-arginine exerts excellent anti-stress effects on stress-induced shortened lifespan, cognitive decline and depression. Int J Mol Sci. 2021;22:508.
109. Marcinkowska AB, Biancardi VC, Winklewski PJ. Arginine vasopressin, synaptic plasticity, and brain networks. Curr Neuropharmacol. 2022;20:2292–302.
110. Yin B, Cai Y, Teng T, Wang X, Liu X, Li X, et al. Identifying plasma metabolic characteristics of major depressive disorder, bipolar disorder, and schizophrenia in adolescents. Transl Psychiatry. 2024;14:163.
111. Vianey-Saban C, Guffon N, Fouilhoux A, Acquaviva C. Fifty years of research on mitochondrial fatty acid oxidation disorders: the remaining challenges. J Inherit Metab Dis. 2023;46:848–73.
112. Schooneman MG, Vaz FM, Houten SM, Soeters MR. Acylcarnitines: reflecting or inflicting insulin resistance? Diabetes. 2013;62:1–8.
113. Nalecz KA, Miecz D, Berezowski V, Cecchelli R. Carnitine: transport and physiological functions in the brain. Mol Aspects Med. 2004;25:551–67.
114. Klein IL, van de Loo KFE, Smeitink JAM, Janssen MCH, Kessels RPC, van Karnebeek CD, et al. Cognitive functioning and mental health in mitochondrial disease: a systematic scoping review. Neurosci Biobehav Rev. 2021;125:57–77.
115. Rezin GT, Amboni G, Zugno AI, Quevedo J, Streck EL. Mitochondrial dysfunction and psychiatric disorders. Neurochem Res. 2009;34:1021–9.
116. Ferreira GC, McKenna MC. L-carnitine and acetyl-L-carnitine roles and neuroprotection in developing brain. Neurochem Res. 2017;42:1661–75.
117. Hartwig FP, Davies NM, Davey Smith G. Bias in Mendelian randomization due to assortative mating. Genet Epidemiol. 2018;42:608–20.
118. Yang Q, Sanderson E, Tilling K, Borges MC, Lawlor DA. Exploring and mitigating potential bias when genetic instrumental variables are associated with multiple non-exposure traits in Mendelian randomization. Eur J Epidemiol. 2022;37:683–700.
119. Davies NM, Holmes MV, Davey Smith G. Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ. 2018;362:k601.
120. GBD 2019 Mental Disorders Collaborators. Global, regional, and national burden of 12 mental disorders in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet Psychiatry. 2022;9:137–50.


Acknowledgements

This work was supported by grants from the Innovation Capability Support Program of Shaanxi Province (2022TD-44), the Key Research and Development Project of Shaanxi Province (2022GXLH-01-22), the National Natural Science Foundation of China (82401762 and 32370653), the China Postdoctoral Science Foundation (2023M732810), the Postdoctoral Fellowship Program of CPSF (GZC20232113), the Provincial Natural Science Foundation of Hunan (2023JJ60292), the Promotion Project of the Health and Family Planning Commission of Hunan Province (2022011), the Hunan Province Innovation Guidance Project (2020SK50804), and the Fundamental Research Funds for the Central Universities. We thank the High-Performance Computing Platform and Instrument Analysis Center of Xi'an Jiaotong University for their support, and the UK Biobank Resource (application number 46387). The GWAS summary data of blood lipids (HDL-C, LDL-C, TG and TC) used in this study were obtained from dbGaP (http://www.ncbi.nlm.nih.gov/gap) under accession number phs001672.v4.p1.

Author information

These authors contributed equally: Jing Guo, Ping Yang.

Authors and Affiliations

Key Laboratory of Biomedical Information Engineering of Ministry of Education, Biomedical Informatics & Genomics Center, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi, 710049, P. R. China

Jing Guo, Jia-Hao Wang, Shi-Hao Tang, Ji-Zhou Han, Ke Yu, Cong-Cong Liu, Shan-Shan Dong, Kun Zhang, Yuan-Yuan Duan, Tie-Lin Yang & Yan Guo

Hunan Brain Hospital, Clinical Medical School of Hunan University of Chinese Medicine, Changsha, Hunan, 410007, P. R. China

Guangdong Key Laboratory of Age-Related Cardiac and Cerebral Diseases, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong, 524000, China


Contributions

YG and JG designed this project. JG, PY, J-HW, S-HT, KY, and C-CL conducted the computational work. JG wrote the manuscript. YG revised the manuscript. JG, J-ZH and SY summarized the tables and figures. SY, S-SD, KZ, Y-YD and T-LY summarized the public data and offered advice. YG and T-LY supported and supervised this project.

Corresponding author

Correspondence to Yan Guo .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Ethics approval and consent to participate

This study was approved by the Ethics committee of Xi’an Jiaotong University (Shaanxi, China). All datasets were publicly available, and ethical approval and informed consent were acquired for all original studies.


Supplementary information

Supplementary figures and supplementary tables.


Cite this article

Guo, J., Yang, P., Wang, JH. et al. Blood metabolites, neurocognition and psychiatric disorders: a Mendelian randomization analysis to investigate causal pathways. Transl Psychiatry 14 , 376 (2024). https://doi.org/10.1038/s41398-024-03095-4


Received : 31 January 2024

Revised : 30 August 2024

Accepted : 05 September 2024

Published : 16 September 2024





What Is Standard Error? | How to Calculate (Guide with Examples)

Published on December 11, 2020 by Pritha Bhandari. Revised on June 22, 2023.

The standard error of the mean, or simply standard error, indicates how different the population mean is likely to be from a sample mean. It tells you how much the sample mean would vary if you were to repeat a study using new samples from within a single population.

The standard error of the mean (SE or SEM) is the most commonly reported type of standard error. But you can also find the standard error for other statistics, like medians or proportions. The standard error is a common measure of sampling error: the difference between a population parameter and a sample statistic.

Table of contents

  • Why standard error matters
  • Standard error vs standard deviation
  • Standard error formula
  • How should you report the standard error?
  • Other standard errors
  • Other interesting articles
  • Frequently asked questions about standard error

In statistics, data from samples is used to understand larger populations. Standard error matters because it helps you estimate how well your sample data represents the whole population.

With probability sampling, where elements of a sample are randomly selected, you can collect data that is likely to be representative of the population. However, even with probability samples, some sampling error will remain. That's because a sample will never perfectly match the population it comes from in terms of measures like means and standard deviations.

By calculating standard error, you can estimate how representative your sample is of your population and make valid conclusions.

A high standard error shows that sample means are widely spread around the population mean—your sample may not closely represent your population. A low standard error shows that sample means are closely distributed around the population mean—your sample is representative of your population.

You can decrease standard error by increasing sample size. Using a large, random sample is the best way to minimize sampling bias .


Standard error and standard deviation are both measures of variability:

  • The standard deviation describes variability within a single sample.
  • The standard error estimates the variability across multiple samples of a population.

Standard error vs standard deviation

The standard deviation is a descriptive statistic that can be calculated from sample data. In contrast, the standard error is an inferential statistic that can only be estimated (unless the real population parameter is known).

For example, say you collect the SAT math scores of a sample of 200 students and find a standard deviation of 180. This number reflects, on average, how much each score differs from the sample mean score of 550.

The standard error of the mean is calculated using the standard deviation and the sample size.

From the formula, you'll see that the sample size is inversely proportional to the standard error. This means that the larger the sample, the smaller the standard error, because the sample statistic will be closer to the population parameter.
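The inverse-square-root relationship is easy to verify empirically. The short simulation below (assumed toy parameters: population mean 550 and SD 180, as in the running example) draws many samples of each size and shows that the spread of the sample means closely tracks σ/√n:

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # toy simulation, assumed values
sigma, mu = 180, 550                  # population SD and mean

for n in (25, 100, 400):
    # Draw 2,000 samples of size n; the standard deviation of their
    # means is an empirical estimate of the standard error.
    sample_means = rng.normal(mu, sigma, size=(2000, n)).mean(axis=1)
    print(n, round(sample_means.std(), 1), round(sigma / np.sqrt(n), 1))
```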

Different formulas are used depending on whether the population standard deviation is known. These formulas work for samples with more than 20 elements ( n > 20).

When population parameters are known

When the population standard deviation is known, you can use it in the below formula to calculate standard error precisely.

SE = σ / √n

where SE is the standard error, σ is the population standard deviation, and n is the number of elements in the sample.

When population parameters are unknown

When the population standard deviation is unknown, you can use the below formula to only estimate standard error. This formula takes the sample standard deviation as a point estimate for the population standard deviation.

SE = s / √n

where SE is the standard error, s is the sample standard deviation, and n is the number of elements in the sample.

First, find the square root of your sample size (n). For the sample of 200 test scores: √n = √200 ≈ 14.1.

Next, divide the sample standard deviation by the number you found in step one: SE = 180 / 14.1 ≈ 12.8.
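The same two-step calculation, expressed as a minimal Python sketch; the sample size of 200 is the value assumed in the worked example above (dividing by the unrounded √200 gives 12.7, which the rounded two-step calculation reports as 12.8):

```python
import math

def standard_error(s, n):
    """Estimated standard error of the mean from sample SD s and size n."""
    return s / math.sqrt(n)

print(round(standard_error(180, 200), 1))  # 12.7 (12.8 with the rounded sqrt)
```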

You can report the standard error alongside the mean or in a confidence interval to communicate the uncertainty around the mean.

The best way to report the standard error is in a confidence interval because readers won’t have to do any additional math to come up with a meaningful interval.

A confidence interval is a range of values where an unknown population parameter is expected to lie most of the time, if you were to repeat your study with new random samples.

With a 95% confidence level, 95% of all sample means will be expected to lie within a confidence interval of ± 1.96 standard errors of the sample mean.

Based on random sampling, the true population parameter is also estimated to lie within this range with 95% confidence.

For a normally distributed characteristic, like SAT scores, 95% of all sample means fall within a range of roughly four standard errors, that is, about two standard errors on either side of the sample mean.

Confidence interval formula

CI = x̄ ± (1.96 × SE)

where x̄ = sample mean = 550 and SE = standard error = 12.8.

Lower limit: 550 − (1.96 × 12.8) ≈ 524.9

Upper limit: 550 + (1.96 × 12.8) ≈ 575.1
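A minimal sketch of the same interval calculation; the mean of 550 and the standard error of 12.8 come from the example above, and z = 1.96 corresponds to the 95% confidence level:

```python
def confidence_interval(mean, se, z=1.96):
    """95% CI under approximate normality: mean ± z × standard error."""
    return mean - z * se, mean + z * se

low, high = confidence_interval(550, 12.8)
print(round(low, 1), round(high, 1))  # 524.9 575.1
```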


Aside from the standard error of the mean (and other statistics), there are two other standard errors you might come across: the standard error of the estimate and the standard error of measurement.

The standard error of the estimate is related to regression analysis. It reflects the variability around the estimated regression line and the accuracy of the regression model. Using the standard error of the estimate, you can construct a confidence interval for the true regression coefficient.
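As a sketch of the idea (with assumed toy data, not from any study), the standard error of the estimate for a simple linear regression can be computed from the residuals, dividing by n − 2 because two parameters (slope and intercept) are estimated:

```python
import numpy as np

# Assumed toy data for a simple linear fit
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
# n - 2 degrees of freedom: slope and intercept were both estimated
see = np.sqrt(np.sum(residuals**2) / (len(x) - 2))
print(round(see, 3))
```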

The standard error of measurement is about the reliability of a measure. It indicates how variable the measurement error of a test is, and it’s often reported in standardized testing. The standard error of measurement can be used to create a confidence interval for the true score of an element or an individual.
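A common formula for the standard error of measurement is SD × √(1 − reliability), where reliability is the test's reliability coefficient. The sketch below applies it to an assumed example (test SD 15, reliability 0.91); the numbers are illustrative only:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

print(round(standard_error_of_measurement(15, 0.91), 1))  # 4.5
```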

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

Statistics

  • Confidence interval
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean


Standard error and standard deviation are both measures of variability. The standard deviation reflects variability within a sample, while the standard error estimates the variability across samples of a population.

Using descriptive and inferential statistics, you can make two types of estimates about the population: point estimates and interval estimates.

  • A point estimate is a single value estimate of a parameter. For instance, a sample mean is a point estimate of a population mean.
  • An interval estimate gives you a range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.

Both types of estimates are important for gathering a clear idea of where a parameter is likely to lie.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation below.

Bhandari, P. (2023, June 22). What Is Standard Error? | How to Calculate (Guide with Examples). Scribbr. Retrieved September 16, 2024, from https://www.scribbr.com/statistics/standard-error/



Analysis of the Spatiotemporal Differentiation and Influencing Factors of Land Use Efficiency in the Beijing–Tianjin–Hebei Urban Agglomeration


1. Introduction
2. Study Area and Datasets
   2.1. Overview of the Study Region
   2.2. Data Sources
3. Analysis Method
   3.1. ULUE Evaluation Index System
   3.2. SBM-DEA Model
   3.3. Exploratory Spatial Data Analysis
   3.4. Efficiency Evolution Analysis
   3.5. Geographical Detector
4. Results and Analysis
   4.1. Spatial-Temporal Analysis of Land Use Changes in the BTH Urban Agglomeration
      4.1.1. Temporal Evolution Characteristics
      4.1.2. Spatial Evolution Characteristics
   4.2. Spatial Agglomeration Characteristics of Urban ULUE
   4.3. Spatial Evolution Characteristics of Urban Land Use Decomposition Efficiency
   4.4. Geographical Detector
      4.4.1. Factor Detection and Result Analysis
      4.4.2. Interaction Detection Results and Analysis
5. Discussion
   5.1. Conclusions
   5.2. Theoretical Implications
   5.3. Practical Implications
   5.4. Limitations and Future Research
Author Contributions
Data Availability Statement
Conflicts of Interest






COMMENTS

  1. Standard Deviation: Interpretations and Calculations

    The standard deviation (SD) is a single number that summarizes the variability in a dataset. It represents the typical distance between each data point and the mean. Smaller values indicate that the data points cluster closer to the mean—the values in the dataset are relatively consistent. Conversely, higher values signify that the values ...
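
    To make the "typical distance from the mean" reading concrete, here is a minimal Python sketch; the two datasets are invented for illustration and share the same mean, so only their spread differs.

        import statistics

        consistent = [48, 50, 50, 51, 51]   # values hug the mean of 50
        scattered = [20, 35, 50, 65, 80]    # same mean of 50, far wider spread

        for name, data in [("consistent", consistent), ("scattered", scattered)]:
            mean = statistics.mean(data)
            sd = statistics.pstdev(data)    # population SD of the full set
            print(f"{name}: mean = {mean:.1f}, SD = {sd:.2f}")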

  2. Standard Deviation

    The standard deviation (SD) measures the extent of scattering in a set of values, typically compared to the mean value of the set.[1][2][3] The calculation of the SD depends on whether the dataset is a sample or the entire population. Ideally, studies would obtain data from the entire target population, which defines the population parameter. However, this is rarely possible in medical ...
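
    A small Python sketch of that sample-versus-population distinction, using a made-up dataset; the standard library exposes both divisors directly.

        import statistics

        data = [2, 4, 4, 4, 5, 5, 7, 9]

        # Population SD: squared deviations are averaged over all N values
        print(statistics.pstdev(data))   # 2.0

        # Sample SD: divide by N - 1 (Bessel's correction) for an
        # unbiased estimate of the population variance
        print(statistics.stdev(data))    # about 2.14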

  3. What is Standard Deviation Statistics and Data Analysis

    Standard deviation is crucial in statistics and data analysis for understanding the variability of a dataset. It helps identify trends, assess data reliability, detect outliers, compare datasets, and evaluate risk. A high standard deviation indicates a larger spread of values. In contrast, a low standard deviation shows that the values are more ...

  4. Why is Standard Deviation Important? (Explanation + Examples)

    The answer: Standard deviation is important because it tells us how spread out the values are in a given dataset. Whenever we analyze a dataset, we're interested in finding the following metrics: the center of the dataset, most commonly measured with the mean and the median, and the spread of values in the dataset.

  5. Understanding the Difference Between Standard Deviation and Standard

    As an important aside, in a normal distribution there is a specific relationship between the mean and SD: mean ± 1 SD includes 68.3% of the population, mean ± 2 SD includes 95.5% of the population, and mean ± 3 SD includes 99.7% of the population.
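
    Those percentages hold only for roughly normal data, and they are easy to check empirically. The sketch below simulates a normal variable with arbitrary, made-up parameters (mean 100, SD 15) and counts how many draws land within 1, 2, and 3 SDs of the mean.

        import random
        import statistics

        random.seed(0)
        draws = [random.gauss(100, 15) for _ in range(100_000)]
        mu, sd = statistics.mean(draws), statistics.pstdev(draws)

        for k in (1, 2, 3):
            share = sum(mu - k * sd <= x <= mu + k * sd for x in draws) / len(draws)
            print(f"within {k} SD: {share:.1%}")   # close to 68.3%, 95.5%, 99.7%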

  6. How to Interpret Standard Deviation and Standard Error in Survey Research

    In Rating "A", the individual responses did not deviate at all from the mean. In Rating "B", even though the group mean is the same (3.0) as the first distribution, the Standard Deviation is higher. The Standard Deviation of 1.15 shows that the individual responses, on average*, were a little over 1 point away from the mean.

  7. How to Calculate Standard Deviation (Guide)

    The standard deviation is usually calculated automatically by whichever software you use for your statistical analysis. But you can also calculate it by hand to better understand how the formula works. There are six main steps for finding the standard deviation by hand. We'll use a small data set of 6 scores to walk through the steps.
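
    The six hand-calculation steps map one-to-one onto a few lines of Python. The six scores below are invented, since the snippet's own data set is not shown here.

        data = [4, 7, 9, 10, 14, 16]            # a made-up set of 6 scores

        mean = sum(data) / len(data)            # step 1: find the mean
        devs = [x - mean for x in data]         # step 2: each score's deviation
        squared = [d ** 2 for d in devs]        # step 3: square each deviation
        ss = sum(squared)                       # step 4: sum of squares
        variance = ss / (len(data) - 1)         # step 5: divide by n - 1 (sample)
        sd = variance ** 0.5                    # step 6: take the square root

        print(round(sd, 2))                     # 4.43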

  8. A beginner's guide to standard deviation and standard error

    How to calculate standard deviation. Standard deviation is rarely calculated by hand. It can, however, be done using the formula below, where x represents a value in a data set, μ represents the mean of the data set and N represents the number of values in the data set. The steps in calculating the standard deviation are as follows: For each ...
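
    A direct translation of that formula into Python, assuming the population case the snippet describes (divide by N, not N - 1):

        import math

        def population_sd(values):
            # sqrt( sum((x - mu)^2) / N ), as in the formula above
            mu = sum(values) / len(values)      # the mean of the data set
            n = len(values)                     # the number of values, N
            return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

        print(population_sd([2, 4, 4, 4, 5, 5, 7, 9]))   # 2.0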

  9. Mean & Standard Deviation

    That number, 8.40, is 1 unit of standard deviation. The 68/95/99.7 Rule tells us that standard deviations can be converted to percentages, so that: 68% of scores fall within 1 SD of the mean. 95% of all scores fall within 2 SD of the mean. 99.7% of all scores fall within 3 SD of the mean. For the visual learners, you can put those percentages ...
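
    Those three percentages need not be memorised: for a normal distribution, the share of values within k standard deviations of the mean is erf(k/√2), which the standard library computes directly.

        import math

        for k in (1, 2, 3):
            share = math.erf(k / math.sqrt(2))      # P(|X - mean| <= k * SD)
            print(f"within {k} SD: {share:.1%}")    # 68.3%, 95.4%, 99.7%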

  10. Variability

    The larger the standard deviation, the more variable the data set is. There are six steps for finding the standard deviation by hand: List each score and find their mean. Subtract the mean from each score to get the deviation from the mean. Square each of these deviations. Add up all of the squared deviations.

  11. How to Interpret Standard Deviation Results

    Standard deviation is a statistic that measures the dispersion of a dataset relative to its mean. The standard deviation is calculated as the square root of variance by determining each data point's deviation relative to the mean. If the data points are further from the mean, there is a higher deviation within the data set; thus, the more ...

  12. An Overview of the Fundamentals of Data Management, Analysis, and

    Quantitative data analysis involves the use of statistics. Descriptive statistics help summarize the variables in a data set to show what is typical for a sample. Measures of central tendency (ie, mean, median, mode), measures of spread (standard deviation), and parameter estimation measures (confidence intervals) may be calculated.

  13. Describing Data using the Mean and Standard Deviation

    A large standard deviation indicates that the data points are far from the mean, and a small standard deviation indicates that they are clustered closely around the mean. ... If the histogram of a data set is approximately bell-shaped, we can approximate the percentage of data between standard deviations using the empirical rule. Empirical Rule.

  14. Descriptive Statistics for Summarising Data

    Exploratory Data Analysis procedure; ... For the variance and standard deviation, ... Often, EDA techniques are used as data screening devices, which are typically not reported in actual write-ups of research (we will discuss data screening in more detail in Procedure 10.1007/978-981-15-2537-7_8#Sec11). This is a perfectly legitimate use for ...

  15. What to use to express the variability of data: Standard deviation or

    In such cases, data can be presented using other measures of variability (e.g. mean absolute deviation and the interquartile range), or can be transformed (common transformations include the logarithmic, inverse, square root, and arc sine transformations). Some journal editors require their authors to use the SD and not the SEM.

  16. Measures of Variability: Range, Interquartile Range, Variance, and

    Conveniently, the standard deviation uses the original units of the data, which makes interpretation easier. Consequently, the standard deviation is the most widely used measure of variability. For example, in the pizza delivery example, a standard deviation of 5 indicates that the typical delivery time is plus or minus 5 minutes from the mean.

  17. How to Interpret Standard Deviation in a Statistical Data Set

    Standard deviation can be difficult to interpret as a single number on its own. Basically, a small standard deviation means that the values in a statistical data set are close to the mean (or average) of the data set, and a large standard deviation means that the values in the data set are farther away from the mean. The standard deviation measures how concentrated the data are around the ...

  18. Descriptive Statistics

    Range of visits to the library in the past year. Ordered data set: 0, 3, 3, 12, 15, 24; range: 24 - 0 = 24. Standard deviation: the standard deviation (s or SD) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.
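
    Using the snippet's own library-visit data, both numbers take a line each in Python (the SD here uses the sample divisor n - 1):

        import statistics

        visits = [0, 3, 3, 12, 15, 24]          # visits to the library, ordered

        print("range:", max(visits) - min(visits))           # 24 - 0 = 24
        print("SD:", round(statistics.stdev(visits), 2))     # about 9.18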

  19. The Relationship Between Mean & Standard Deviation (With Example)

    It is calculated as: sample standard deviation = √( Σ(xᵢ - x̄)² / (n - 1) ), where Σ is a symbol that means "sum", xᵢ is the ith value in the sample, x̄ is the mean of the sample, and n is the sample size. Notice the relationship between the mean and standard deviation: the mean is used in the formula to calculate the standard deviation.
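
    As a quick sanity check, the formula can be implemented directly and compared against the standard library's result; the four values below are invented.

        import math
        import statistics

        def sample_sd(xs):
            xbar = sum(xs) / len(xs)            # the sample mean
            return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

        data = [6, 2, 3, 1]
        assert math.isclose(sample_sd(data), statistics.stdev(data))
        print(round(sample_sd(data), 2))        # 2.16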

  20. 6 Examples of Using Standard Deviation in Real Life

    Example 1: Standard Deviation in Weather Forecasting. Standard deviation is widely used in weather forecasting to understand how much variation exists in daily and monthly temperatures in different cities. A weatherman who works in a city with a small standard deviation in temperatures year-round can confidently predict what the weather will be ...

  21. Standard Deviation Formula and Uses vs. Variance

    This means that analysts or researchers using standard deviation are comparing many data points, rather than drawing conclusions based on only analyzing single points of data, which leads to a ...

  22. Examples of Standard Deviation and How It's Used

    Looking at standard deviation examples can help ease confusion when studying statistics. Learn what the formula for standard deviation is and see examples. ... A low standard deviation means that the data is very closely related to the average, thus very reliable. ... Standard deviation is an important part of any statistical analysis. But it ...

  23. Blood metabolites, neurocognition and psychiatric disorders: a

    MR evidence indicated that genetically predicted acetylornithine was positively associated with g-Factor (0.035 standard deviation units increase in g-Factor per one standard deviation increase in ...

  24. Prediction of discharge in a tidal river using the LSTM ...

    The Seq2Seq models improved by 6%-60% and 5%-20% of the relative standard deviation compared to the harmonic analysis models and improved back propagation neural network models in discharge prediction, respectively. In addition, the relative accuracy of the Seq2Seq model is 1% to 3% higher than that of the LSTM model.

  25. What Is Standard Error?

    Using descriptive and inferential statistics, you can make two types of estimates about the population: point estimates and interval estimates. A point estimate is a single value estimate of a parameter. For instance, a sample mean is a point estimate of a population mean. An interval estimate gives you a range of values where the parameter is expected to lie.
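
    A short sketch of both estimate types for a small, invented sample: the sample mean as the point estimate, and a normal-approximation 95% interval built from the standard error (a t-multiplier would be more careful for a sample this small).

        import statistics

        sample = [12, 15, 17, 20, 30, 31, 43, 44, 54]

        n = len(sample)
        mean = statistics.mean(sample)              # point estimate of the population mean
        sem = statistics.stdev(sample) / n ** 0.5   # standard error of the mean
        low, high = mean - 1.96 * sem, mean + 1.96 * sem   # approximate 95% interval estimate

        print(f"mean = {mean:.2f}, SEM = {sem:.2f}, 95% CI = ({low:.2f}, {high:.2f})")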

  26. Analysis of the Spatiotemporal Differentiation and Influencing Factors

    The research scale mainly focuses on provincial units or single cities; for example, Kuang measured the provincial arable land use efficiency of 31 provinces in China from 2000 to 2017, and Fu analyzed and studied the urban land use efficiency of Jiangsu Province, China, from 2006 to 2017, by using the data envelopment analysis method and the ...

  27. Dwarf mongoose-tree-based analysis for estimating the frost durability

    The data utilized in this research are collected from a range of published literature sources. These data encompass crucial information pertaining to the evaluation of frost resistance in recycled aggregate concrete (Liu et al. 2021, 2016; Salem and Burdette 1998; Salem et al. 2003; Dhir et al. 1999; Ajdukiewicz and Kliszczewicz 2002; Zaharieva et al. 2004; Cui et al. 2007; Richardson et al ...

  28. Design, Spectroscopic Analysis, DFT Calculations, Catalytic Evaluation

    The main target of the current research is designing and synthesizing novel Co(II) complexes derived from the 2-hydroxy-5,3-(phenylallylidene)aminobenzoic acid ligand and enhancing comprehension of their potential as photocatalyst, antibacterial, antifungal, and antioxidant alternatives by means of density functional theory (DFT) calculations and molecular docking investigation. Thus, 2-hydroxy-5,3 ...

  29. Full article: Finite element analysis of the stress and buckling

    1. Introduction. In many industries like power generation and the petrochemical industry, cylindrical tanks are used extensively for the storage of water, oil, petrochemical products, etc (Shokrzadeh & Sohrabi, 2016). Most of the time, these tanks are constructed with a cylindrical wall sheet with stepped or uniform thickness and with an open-top or closed-roof top, having a thin base ...