
5 Statistics Case Studies That Will Blow Your Mind

You will learn about the transformative impact of statistical science in unfolding real-world narratives, from global economics to public health victories.

Introduction

The untrained eye may see only cold, lifeless digits in the intricate dance of numbers and patterns that constitute data analysis and statistics. Yet, for those who know how to listen, these numbers whisper stories about our world, our behaviors, and the delicate interplay of systems and relationships that shape our reality. Artfully unfolded through meticulous statistical analysis, these narratives can reveal startling truths and unseen correlations that challenge our understanding and broaden our horizons. Here are five case studies demonstrating the profound power of statistics to decode reality’s vast and complex tapestry.

  • 2008 Financial Crisis : Regression analysis showed Lehman Brothers’ collapse rippled globally, causing a credit crunch and recession.
  • Eradication of Guinea Worm Disease : Geospatial and logistic regression helped reduce cases from 3.5 million to 54 by 2019.
  • Amazon’s Personalized Marketing : Machine learning algorithms predict customer preferences, drive sales, and set industry benchmarks for personalized shopping.
  • American Bald Eagle Recovery : Statistical models and the DDT ban led to the recovery of the species, once on the brink of extinction.
  • Twitter and Political Polarization : MIT’s sentiment analysis of tweets revealed echo chambers, influencing political discourse and highlighting the need for algorithm transparency.

1. The Butterfly Effect in Global Markets: The 2008 Financial Crisis

The 2008 financial crisis is a prime real-world example of the Butterfly Effect in global markets. What started as a crisis in the housing market in the United States quickly escalated into a full-blown international banking crisis with the collapse of the investment bank Lehman Brothers on September 15, 2008.

Understanding the Ripples

A team of economists employed regression analysis to understand the impact of the Lehman Brothers collapse. The statistical models revealed how this event affected financial institutions worldwide, causing a credit crunch and a widespread economic downturn.

The Data Weaves a Story

Further analysis using time-series forecasting methods painted a detailed picture of the crisis’s spread. For instance, these models were used to predict how the initial shockwave would impact housing markets globally, consumer spending, and unemployment rates. These forecasts proved incredibly accurate, showcasing not only the domino effect of the crisis but also the predictive power of well-crafted statistical models.

Implications for Future Predictions

This real-life event became a case study of the importance of understanding the deep connections within the global financial system. Banks, policymakers, and investors now use the predictive models developed from the 2008 crisis to stress-test economic systems against similar shocks. It has led to a greater appreciation of risk management and the implementation of stricter financial regulations to safeguard against future crises.

By interpreting the unfolding of the 2008 crisis through the lens of statistical science, we can appreciate the profound effect that one event in a highly interconnected system can have. The lessons learned continue to resonate, influencing financial policy and the approach to global economic forecasting and stability.

2. Statistical Fortitude in Public Health: The Eradication of Dracunculiasis (Guinea Worm Disease)

In a world teeming with infectious diseases, the story of dracunculiasis, commonly known as Guinea Worm Disease, is a testament to public health tenacity and the judicious application of statistical analysis in disease eradication efforts.

Tracing the Path of the Parasite

The campaign against dracunculiasis, led by The Carter Center and supported by a consortium of international partners, utilized epidemiological data to trace and interrupt the life cycle of the Guinea worm. The statistical approach underpinning this public health victory involved meticulously collecting data on disease incidence and transmission patterns.

The Tally of Triumph

By employing geospatial statistics and logistic regression models, health workers pinpointed endemic villages and formulated strategies that targeted the disease’s transmission vectors. These statistical tools were instrumental in monitoring the progress of eradication efforts and allocating resources to areas most in need.

The Countdown to Zero

The eradication campaign’s success was measured by the continuous decline in cases, from an estimated 3.5 million in the mid-1980s to just 54 reported cases in 2019. This dramatic decrease has been documented through rigorous data collection and statistical validation, ensuring that each reported case was accounted for and dealt with accordingly.

Legacy of a Worm

The nearing eradication of Guinea Worm Disease, achieved with no vaccine or curative treatment, is a feat that underscores the power of preventive public health strategies informed by statistical analysis. It serves as a blueprint for tackling other infectious diseases and as a real-world example of how statistics can help make the invisible enemy of disease a known and conquerable foe.

The narrative of Guinea Worm eradication is not just a tale of statistical victory but also one of human resilience and commitment to public health. It is a story that will continue to inspire as the world edges closer to declaring dracunculiasis the second human disease, after smallpox, to be eradicated.

3. Unraveling the DNA of Consumer Behavior: A Case Study of Amazon’s Personalized Marketing

The advent of big data analytics has revolutionized marketing strategies by providing deep insights into consumer behavior. Amazon, a global leader in e-commerce, is at the forefront of leveraging statistical analysis to offer its customers a highly personalized shopping experience.

The Predictive Power of Purchase Patterns

Amazon collects vast amounts of user data, including browsing histories, purchase patterns, and product searches. The company analyzes this data with machine learning algorithms to predict individual customer preferences and future buying behavior. This predictive power is exemplified by Amazon’s recommendation engine, which suggests products to users with uncanny accuracy, often leading to increased sales and customer satisfaction.

Beyond the Purchase: Sentiment Analysis

Amazon extends its data analysis beyond purchases by analyzing the sentiment of customer reviews and feedback. This analysis gives the company a nuanced understanding of customer sentiment towards its products and services. By mining text for sentiment, Amazon can quickly address issues, improve product offerings, and enhance customer service.

Crafting Tomorrow’s Trends Today

Amazon’s data analytics insights are not limited to personalizing the shopping experience. They are also used to anticipate and set future trends. Amazon has mastered the art of using consumer data to meet existing demands and influence and create new consumer needs. By analyzing emerging patterns, Amazon stocks products ahead of demand spikes and develops new products that align with predicted consumer trends.

Amazon’s success in utilizing statistical analysis for marketing is a testament to the power of big data in shaping the future of consumer engagement. The company’s ability to personalize the shopping experience and anticipate consumer trends has set a benchmark in the industry, illustrating the transformative impact of statistics on marketing strategies.

4. The Revival of the American Bald Eagle: A Triumph of Environmental Policy and Statistics

In the annals of environmental success stories, the recovery of the American Bald Eagle (Haliaeetus leucocephalus) from the brink of extinction stands out as a sterling example of how rigorous science, public policy, and statistics can combine to safeguard wildlife. This case study offers a narrative that encapsulates the meticulous application of data analysis in wildlife conservation, revealing a more profound truth about the interdependence of species and the human spirit’s capacity for stewardship.

The Descent Towards Silence

By the mid-20th century, the American Bald Eagle, a symbol of freedom and strength, faced decimation. Pesticides like DDT, habitat loss, and illegal shooting had dramatically reduced their numbers. The alarming descent prompted an urgent call to action bolstered by the rigorous collection and analysis of ecological data.

The Statistical Lifeline

Biostatisticians and ecologists began a comprehensive monitoring program, recording eagle population numbers, nesting sites, and chick survival rates. Advanced statistical models, including logistic regression and population viability analysis (PVA), were employed to assess the eagles’ extinction risk under various scenarios and to evaluate the effectiveness of different conservation strategies.

The Ban on DDT – A Calculated Decision

A pivotal moment in the Bald Eagle’s story was the ban on DDT in 1972, a decision grounded in the statistical analysis of the pesticide’s impacts on eagle reproduction. Studies demonstrated a strong correlation between DDT and thinning eggshells, leading to reduced hatching rates. Based on this analysis, the ban’s implementation marked the turning point for the eagle’s fate.

A Soaring Recovery

Post-ban, rigorous monitoring continued, and the data collected painted a story of resilience and recovery. The statistical evidence was undeniable: eagle populations were rebounding. As of the early 21st century, the Bald Eagle had made a miraculous comeback, removed from the Endangered Species List in 2007.

The Legacy of a Species

The American Bald Eagle’s resurgence is more than a conservation narrative; it’s a testament to the harmony between humanity’s analytical prowess and its capacity for environmental guardianship. It shows how statistics can forecast doom and herald a new dawn for conservation. This case study epitomizes the beautiful interplay between human action, informed by truth and statistical insight, resulting in a tangible good: the return of a majestic species from the shadow of extinction.

5. The Algorithmic Mirrors of Social Media – The Case of Twitter and Political Polarization

Social media platforms, particularly Twitter, have become critical arenas for public discourse, shaping societal norms and reflecting public sentiment. This case study examines the real-world application of statistical models and algorithms to understand Twitter’s role in political polarization.

Twitter’s Data-Driven Sentiment Reflection

The aim was to analyze Twitter data to evaluate public sentiment regarding political events and understand the platform’s contribution to societal polarization.

Using natural language processing (NLP) and sentiment analysis, researchers from the Massachusetts Institute of Technology (MIT) analyzed over 10 million tweets from the period surrounding the 2020 U.S. Presidential Election. The tweets were filtered using politically relevant hashtags and keywords.

Deciphering the Digital Pulse

A sentiment index was created, categorizing tweets into positive, negative, or neutral sentiments concerning the candidates. This ‘Twitter Political Sentiment Index’ provided a temporal view of public mood swings about key campaign events and debates.

The Echo Chambers of the Internet

Network analysis revealed distinct user clusters along ideological lines, illustrating the presence of echo chambers. The study examined retweet networks and highlighted how information circulated within politically homogeneous groups, reinforcing existing beliefs.

The study showed limited user exposure to opposing political views on Twitter, increasing polarization. It also correlated significant shifts in the sentiment index with real-life events, such as policy announcements and election results.

Shaping the Future of Public Discourse

The study, published in Science, emphasizes the need for transparency in social media algorithms to mitigate echo chambers’ effects. The insights gained are being used to inform policymakers and educators about the dynamics of online discourse and to encourage the design of algorithms that promote a more balanced and open digital exchange of ideas.

The findings from MIT’s Twitter data analysis underscore the platform’s power as a real-time barometer of public sentiment and its role in shaping political discourse. The case study offers a roadmap for leveraging big data to foster a healthier democratic process in the digital age.

Drawing together these varied case studies, it becomes clear that statistics and data analysis are far from mere computation tools. They are, in fact, the instruments through which we can uncover deeper truths about our world. They can illuminate the unseen, predict the future, and help us shape it towards the common good. These narratives exemplify the pursuit of true knowledge, promoting good actions, and appreciating a beautiful world.

As we engage with the data of our daily lives, we continually decode the complexities of existence. From the markets to the microorganisms, consumer behavior to conservation efforts, and the physical to the digital world, statistics is the language in which the tales of our times are written. It is the language that reveals the integrity of systems, the harmony of nature, and the pulse of humanity. Through this science’s meticulous and ethical application, we uphold the values of truth, goodness, and beauty — ideals that remain ever-present in the quest for understanding and improving the world we share.


Frequently Asked Questions

Q1: What is the significance of the 2008 Financial Crisis in statistics?  The 2008 Financial Crisis is significant in statistics for demonstrating the Butterfly Effect in global markets, where regression analysis revealed the interconnected impact of Lehman Brothers’ collapse on the global economy.

Q2: How did statistics contribute to the eradication of Guinea Worm Disease?  Through geospatial and logistic regression, statistics played a crucial role in tracking and reducing the spread of Guinea Worm Disease, contributing to the decline from 3.5 million cases to just 54 by 2019.

Q3: What role does machine learning play in Amazon’s marketing?  Machine learning algorithms at Amazon analyze vast amounts of consumer data to predict customer preferences and personalize the shopping experience, driving sales and setting industry benchmarks.

Q4: How were statistics instrumental in the recovery of the American Bald Eagle?  Statistical models helped assess the risk of extinction and the impact of DDT on eagle reproduction, leading to conservation strategies that aided in the eagle’s significant recovery.

Q5: What is sentiment analysis, and how was it used in studying Twitter?  Sentiment analysis uses natural language processing to categorize the tone of text content. MIT used it to evaluate political sentiment on Twitter and study the platform’s role in political polarization.

Q6: How did statistical models predict the global effects of the 2008 crisis?  Statistical models, including time-series forecasting, predicted how the crisis would affect housing markets, consumer spending, and unemployment, demonstrating the predictive power of statistics.

Q7: Why is the eradication of Guinea Worm Disease significant beyond public health?  The near eradication, without a vaccine or cure, illustrates the power of preventive strategies and statistical analysis in public health, serving as a blueprint for combating other diseases.

Q8: In what way did statistics aid in the decision to ban DDT?  Statistical analysis linked DDT to thinning eagle eggshells and poor hatching rates, leading to the ban that proved crucial for the Bald Eagle’s recovery.

Q9: How does Amazon’s use of data analytics influence consumer behavior?  By analyzing consumer data, Amazon anticipates and sets trends, meets demands, and influences new consumer needs, shaping the future of consumer engagement.

Q10: What implications does the Twitter political polarization study have?  The study calls for transparency in social media algorithms to reduce echo chambers. It suggests using statistical insights to foster a balanced, open digital exchange in democratic processes.


Introduction to Statistical Thinking

Chapter 16 Case Studies

16.1 Student Learning Objective

This chapter concludes this book. We start with a short review of the topics that were discussed in the second part of the book, the part that dealt with statistical inference. The main part of the chapter involves the statistical analysis of 2 case studies. The tools that will be used for the analysis are those that were discussed in the book. We close this chapter and this book with some concluding remarks. By the end of this chapter, the student should be able to:

  • Review the concepts and methods for statistical inference that were presented in the second part of the book.
  • Apply these methods to the analysis of real data.
  • Develop a resolve to learn more statistics.

16.2 A Review

The second part of the book dealt with statistical inference: the science of making general statements about an entire population on the basis of data from a sample. These statements rest on theoretical models that produce the sampling distribution. Procedures for making the inference are evaluated based on their properties in the context of this sampling distribution. Procedures with desirable properties are applied to the data. One may attach to the output of this application summaries that describe these theoretical properties.

In particular, we dealt with two forms of making inference. One form was estimation and the other was hypothesis testing. The goal in estimation is to determine the value of a parameter in the population. Point estimates or confidence intervals may be used in order to fulfill this goal. The properties of point estimators may be assessed using the mean square error (MSE) and the properties of the confidence interval may be assessed using the confidence level.

The target in hypothesis testing is to decide between two competing hypotheses. These hypotheses are formulated in terms of population parameters. The decision rule is called a statistical test and is constructed with the aid of a test statistic and a rejection region. The default hypothesis among the two is rejected if the test statistic falls in the rejection region. The major property a test must possess is a bound on the probability of a Type I error, the probability of erroneously rejecting the null hypothesis. This restriction is called the significance level of the test. A test may also be assessed in terms of its statistical power, the probability of rightfully rejecting the null hypothesis.

Estimation and testing were applied in the context of single measurements and for the investigation of the relations between a pair of measurements. For single measurements we considered both numeric variables and factors. For numeric variables one may attempt to conduct inference on the expectation and/or the variance. For factors we considered the estimation of the probability of obtaining a level, or, more generally, the probability of the occurrence of an event.

We introduced statistical models that may be used to describe the relations between variables. One of the variables was designated as the response. The other variable, the explanatory variable, is identified as a variable which may affect the distribution of the response. Specifically, we considered numeric variables and factors that have two levels. If the explanatory variable is a factor with two levels then the analysis reduces to the comparison of two sub-populations, each one associated with a level. If the explanatory variable is numeric then a regression model may be applied, either linear or logistic regression, depending on the type of the response.

The foundations of statistical inference are the assumptions that we make in the form of statistical models. These models attempt to reflect reality. However, one is advised to apply healthy skepticism when using the models. First, one should be aware of what the assumptions are. Then one should ask oneself how reasonable these assumptions are in the context of the specific analysis. Finally, one should check, as much as one can, the validity of the assumptions in light of the information at hand. It is useful to plot the data and compare the plot to the assumptions of the model.

16.3 Case Studies

Let us apply the methods that were introduced throughout the book to two examples of data analysis. Both examples are taken from the Rice Virtual Lab in Statistics and can be found in its Case Studies section. The analysis of these case studies may involve any of the tools that were described in the second part of the book (and some from the first part). It may be useful to read Chapters 9–15 again before reading the case studies.

16.3.1 Physicians’ Reactions to the Size of a Patient

Overweight and obesity are common in many developed countries. In some cultures, obese individuals face discrimination in employment, education, and relationship contexts. The current research, conducted by Mikki Hebl and Jingping Xu 87, examines physicians’ attitudes toward overweight and obese patients in comparison to their attitudes toward patients who are not overweight.

The experiment included a total of 122 primary care physicians affiliated with one of three major hospitals in the Texas Medical Center of Houston. These physicians were sent a packet containing a medical chart similar to the one they view upon seeing a patient. This chart portrayed a patient who was displaying symptoms of a migraine headache but was otherwise healthy. Two variables (the gender and the weight of the patient) were manipulated across six different versions of the medical charts. The weight of the patient, described in terms of Body Mass Index (BMI), was average (BMI = 23), overweight (BMI = 30), or obese (BMI = 36). Physicians were randomly assigned to receive one of the six charts, and were asked to look over the chart carefully and complete two medical forms. The first form asked physicians which of 42 tests they would recommend giving to the patient. The second form asked physicians to indicate how much time they believed they would spend with the patient, and to describe the reactions that they would have toward this patient.

In this presentation, only the question on how much time the physicians believed they would spend with the patient is analyzed. Although three patient weight conditions were used in the study (average, overweight, and obese) only the average and overweight conditions will be analyzed. Therefore, there are two levels of patient weight (average and overweight) and one dependent variable (time spent).

The data for the given collection of responses from 72 primary care physicians is stored in the file "discriminate.csv" 88. We start by reading the content of the file into a data frame by the name "patient" and presenting a summary of the variables:
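A minimal sketch of the R commands this step refers to, assuming the file (available at the address given in the footnote) sits in the working directory:

  # Read the responses into a data frame and summarize the variables
  patient <- read.csv("discriminate.csv")
  summary(patient)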

Observe that of the 72 "patients", 38 are overweight and 33 have an average weight. The time spent with the patient, as predicted by the physicians, is distributed between 5 minutes and 1 hour, with an average of 27.82 minutes and a median of 30 minutes.

It is good practice to have a look at the data before doing the analysis. In this examination one should see that the numbers make sense and one should identify special features of the data. Even in this very simple example we may want to have a look at the histogram of the variable "time":
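A sketch of the command, assuming the data frame is named "patient" as above:

  # Histogram of the predicted time spent with the patient
  hist(patient$time)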

[Figure: histogram of the variable "time"]

A feature in this plot that catches attention is the fact that there is a high concentration of values in the interval between 25 and 30. Together with the fact that the median is equal to 30, one may suspect that, as a matter of fact, a large number of the values are actually equal to 30. Indeed, let us produce a table of the response:
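A sketch of the command:

  # Frequency table of the responses
  table(patient$time)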

Notice that 30 of the 72 physicians marked "30" as the time they expect to spend with the patient. This is the middle value in the range, and may just be the default value one marks if one simply needs to complete a form and does not really place much importance on the question that was asked.

The goal of the analysis is to examine the relation between the patient's weight and the physician's response. The explanatory variable is a factor with two levels. The response is numeric. A natural tool to use in order to test this hypothesis is the \(t\)-test, which is implemented with the function "t.test".

First we plot the relation between the response and the explanatory variable and then we apply the test:
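A sketch of the plotting and testing calls, using the column names "time" and "weight" that appear elsewhere in the text:

  # Box plots of the response by weight group, followed by the two-sample t-test
  boxplot(time ~ weight, data = patient)
  t.test(time ~ weight, data = patient)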

[Figure: box plots of "time" by patient weight group]

Nothing seems problematic in the box plot. The two distributions, as they are reflected in the box plots, look fairly symmetric.

When we consider the report produced by the function "t.test" we may observe that the \(p\)-value is equal to 0.005774. This \(p\)-value is computed in testing the null hypothesis that the expectations of the response for the two types of patients are equal against the two-sided alternative. Since the \(p\)-value is less than 0.05 we reject the null hypothesis.

The estimated value of the difference between the expectation of the response for a patient with BMI=23 and a patient with BMI=30 is \(31.36364 -24.73684 \approx 6.63\) minutes. The confidence interval is (approximately) equal to \([1.99, 11.27]\) . Hence, it looks as if the physicians expect to spend more time with the average weight patients.

After analyzing the effect of the explanatory variable on the expectation of the response one may want to examine the presence, or lack thereof, of such effect on the variance of the response. Towards that end, one may use the function “ var.test ”:
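A sketch of the call:

  # Test the equality of the two group variances
  var.test(time ~ weight, data = patient)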

In this test we do not reject the null hypothesis that the two variances of the response are equal, since the \(p\)-value is larger than \(0.05\). The sample variances are almost equal to each other (their ratio is \(1.044316\)), with a confidence interval for the ratio that essentially ranges between 1/2 and 2.

The production of \(p\) -values and confidence intervals is just one aspect in the analysis of data. Another aspect, which typically is much more time consuming and requires experience and healthy skepticism is the examination of the assumptions that are used in order to produce the \(p\) -values and the confidence intervals. A clear violation of the assumptions may warn the statistician that perhaps the computed nominal quantities do not represent the actual statistical properties of the tools that were applied.

In this case, we have noticed the high concentration of the response at the value “ 30 ”. What is the situation when we split the sample between the two levels of the explanatory variable? Let us apply the function “ table ” once more, this time with the explanatory variable included:
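A sketch of the command:

  # Cross-tabulate the response against the weight group
  table(patient$time, patient$weight)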

Not surprisingly, there is still a high concentration at the level "30". But one can see that only 2 of the responses in the "BMI=30" group are above that value, in comparison to a much more symmetric distribution of responses for the other group.

The simulations of the significance level of the one-sample \(t\)-test for an Exponential response that were conducted in Question \[ex:Testing.2\] may cast some doubt on how trustworthy the nominal \(p\)-values of the \(t\)-test are when the measurements are skewed. The skewness of the response for the group "BMI=30" is a reason to worry.

We may consider a different test, which is more robust, in order to validate the significance of our findings. For example, we may turn the response into a factor by setting one level for values larger than or equal to 30 and a different level for values less than 30. The relation between the new response and the explanatory variable can be examined with the function "prop.test". We first plot and then test:
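A sketch of these two steps; the object name "above30" is only illustrative:

  # Dichotomize the response at 30 minutes, plot the two-way table, then test the proportions
  above30 <- patient$time >= 30
  mosaicplot(table(patient$weight, above30))
  prop.test(table(patient$weight, above30))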

[Figure: mosaic plot of the dichotomized response against patient weight group]

The mosaic plot presents the relation between the explanatory variable and the new factor. The level “ TRUE ” is associated with a value of the predicted time spent with the patient being 30 minutes or more. The level “ FALSE ” is associated with a prediction of less than 30 minutes.

The computed \(p\) -value is equal to \(0.05409\) , that almost reaches the significance level of 5% 89 . Notice that the probabilities that are being estimated by the function are the probabilities of the level “ FALSE ”. Overall, one may see the outcome of this test as supporting evidence for the conclusion of the \(t\) -test. However, the \(p\) -value provided by the \(t\) -test may over emphasize the evidence in the data for a significant difference in the physician attitude towards overweight patients.

16.3.2 Physical Strength and Job Performance

The next case study involves an attempt to develop a measure of physical ability that is easy and quick to administer, does not risk injury, and is related to how well a person performs the actual job. The current example is based on a study by Blakley et al. 90, published in the journal Personnel Psychology.

There are a number of very important jobs that require, in addition to cognitive skills, a significant amount of strength to be able to perform at a high level. Construction workers, electricians, and auto mechanics all require strength in order to carry out critical components of their jobs. An interesting applied problem is how to select the best candidates from amongst a group of applicants for physically demanding jobs in a safe and cost-effective way.

The data presented in this case study, which may be used for the development of a method for selection among candidates, were collected from 147 individuals working in physically demanding jobs. Two measures of strength were gathered from each participant: grip strength and arm strength. A piece of equipment known as the Jackson Evaluation System (JES) was used to collect the strength data. The JES can be configured to measure the strength of a number of muscle groups; in this study, grip strength and arm strength were measured. The outcomes of these measurements were summarized in two scores of physical strength called "grip" and "arm".

Two separate measures of job performance are presented in this case study. First, the supervisors of each of the participants were asked to rate how well their employee(s) perform on the physical aspects of their jobs. This measure is summarized in the variable "ratings". Second, simulations of physically demanding work tasks were developed. The summary score of these simulations is given in the variable "sims". Higher values of either measure of performance indicate better performance.

The data for the 4 variables and 147 observations are stored in "job.csv" 91. We start by reading the content of the file into a data frame by the name "job", presenting a summary of the variables and their histograms:
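A minimal sketch of these steps, again assuming the file (available at the address given in the footnote) sits in the working directory:

  # Read the data, summarize the variables, and plot their histograms
  job <- read.csv("job.csv")
  summary(job)
  hist(job$grip)
  hist(job$arm)
  hist(job$ratings)
  hist(job$sims)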

[Figure: summaries and histograms of the variables "grip", "arm", "ratings", and "sims"]

All variables are numeric. Examination of the 4 summaries and histograms does not produce interesting findings. All variables are, more or less, symmetric, with the distribution of the variable "ratings" tending perhaps to be more uniform than the other three.

The main analyses of interest are attempts to relate the two measures of physical strength “ grip ” and “ arm ” with the two measures of job performance, “ ratings ” and “ sims ”. A natural tool to consider in this context is a linear regression analysis that relates a measure of physical strength as an explanatory variable to a measure of job performance as a response.

FIGURE 16.1: Scatter Plots and Regression Lines

Let us consider the variable “ sims ” as a response. The first step is to plot a scatter plot of the response and explanatory variable, for both explanatory variables. To the scatter plot we add the line of regression. In order to add the regression line we fit the regression model with the function “ lm ” and then apply the function “ abline ” to the fitted model. The plot for the relation between the response and the variable “ grip ” is produced by the code:
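A sketch of such code; the model name "sims.grip" is only illustrative:

  # Scatter plot of sims against grip, with the fitted regression line added
  plot(sims ~ grip, data = job)
  sims.grip <- lm(sims ~ grip, data = job)
  abline(sims.grip)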

The plot that is produced by this code is presented on the upper-left panel of Figure  16.1 .

The plot for the relation between the response and the variable “ arm ” is produced by this code:
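A sketch of the analogous code, again with an illustrative model name:

  # Scatter plot of sims against arm, with the fitted regression line added
  plot(sims ~ arm, data = job)
  sims.arm <- lm(sims ~ arm, data = job)
  abline(sims.arm)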

The plot that is produced by the last code is presented on the upper-right panel of Figure  16.1 .

Both plots show similar characteristics. There is an overall linear trend in the relation between the explanatory variable and the response. The value of the response increases with the increase in the value of the explanatory variable (a positive slope). The regression line seems to follow, more or less, the trend that is demonstrated by the scatter plot.

A more detailed analysis of the regression model is possible by the application of the function “ summary ” to the fitted model. First the case where the explanatory variable is “ grip ”:
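A sketch of the call, using the fitted model from the sketch above:

  summary(sims.grip)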

Examination of the report reveals a clear statistical significance for the effect of the explanatory variable on the distribution of the response. The value of R-squared, the ratio of the variance of the response explained by the regression, is \(0.4094\). The square root of this quantity, \(\sqrt{0.4094} \approx 0.64\), is the proportion of the standard deviation of the response that is explained by the explanatory variable. Hence, about 64% of the variability in the response can be attributed to the measure of the strength of the grip.

For the variable “ arm ” we get:
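Likewise, a sketch of the call for the second fitted model:

  summary(sims.arm)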

This variable is also statistically significant. The value of R-squared is \(0.4706\). The proportion of the standard deviation that is explained by the strength of the arm is \(\sqrt{0.4706} \approx 0.69\), which is slightly higher than the proportion explained by the grip.

Overall, the explanatory variables do a fine job in the reduction of the variability of the response "sims" and may be used as substitutes of the response in order to select among candidates. A better prediction of the response based on the values of the explanatory variables can be obtained by combining the information in both variables. The production of such a combination is not discussed in this book, though it is similar in principle to the methods of linear regression that are presented in Chapter 14. The produced score 92 takes the form:

\[\mbox{\texttt{score}} = -5.434 + 0.024\cdot \mbox{\texttt{grip}}+ 0.037\cdot \mbox{\texttt{arm}}\;.\] We use this combined score as an explanatory variable. First we form the score and plot the relation between it and the response:
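A sketch of these steps; the model name "sims.score" matches the expression quoted in the footnote on the Quantile-Quantile plot:

  # Compute the combined score, plot it against the response, and add the regression line
  score <- -5.434 + 0.024 * job$grip + 0.037 * job$arm
  plot(job$sims ~ score)
  sims.score <- lm(job$sims ~ score)
  abline(sims.score)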

The scatter plot that includes the regression line can be found in the lower-left panel of Figure 16.1. Indeed, the linear trend is more pronounced for this scatter plot and the regression line is a better description of the relation between the response and the explanatory variable. A summary of the regression model produces the report:
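A sketch of the call:

  summary(sims.score)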

Indeed, the score is highly significant. More importantly, the R-squared coefficient that is associated with the score is \(0.5422\), which corresponds to a ratio of the standard deviation that is explained by the model of \(\sqrt{0.5422} \approx 0.74\). Thus, almost 3/4 of the variability is accounted for by the score, so the score is a reasonable means of guessing what the results of the simulations will be. This guess is based only on the results of the simple tests of strength that are conducted with the JES device.

Before putting the final seal on the results let us examine the assumptions of the statistical model. First, with respect to the two explanatory variables: does each of them really measure a different property, or do they actually measure the same phenomenon? In order to examine this question let us look at the scatter plot that describes the relation between the two explanatory variables. This plot is produced using the code:
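A sketch of such code:

  # Scatter plot of one strength measure against the other
  plot(grip ~ arm, data = job)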

It is presented in the lower-right panel of Figure  16.1 . Indeed, one may see that the two measurements of strength are not independent of each other but tend to produce an increasing linear trend. Hence, it should not be surprising that the relation of each of them with the response produces essentially the same goodness of fit. The computed score gives a slightly improved fit, but still, it basically reflects either of the original explanatory variables.

In light of this observation, one may want to consider other measures of strength that represent features of strength not captured by these two variables; namely, measures that show less joint trend than the two considered.

Another element that should be examined is the set of probabilistic assumptions that underlie the regression model. We described the regression model only in terms of the functional relation between the explanatory variable and the expectation of the response. In the case of linear regression, for example, this relation was given in terms of a linear equation. However, another part of the model corresponds to the distribution of the measurements about the line of regression. The assumption that led to the computation of the reported \(p\)-values is that this distribution is Normal.

A method that can be used in order to investigate the validity of the Normal assumption is to analyze the residuals from the regression line. Recall that these residuals are computed as the difference between the observed value of the response and its estimated expectation, namely the fitted regression line. The residuals can be computed via the application of the function “ residuals ” to the fitted regression model.

Specifically, let us look at the residuals from the regression line that uses the score that is combined from the grip and arm measurements of strength. One may plot a histogram of the residuals:
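A sketch of the two plotting calls; the second expression is the one quoted in the footnote:

  # Histogram and Normal Q-Q plot of the residuals from the score model
  hist(residuals(sims.score))
  qqnorm(residuals(sims.score))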

[Figure: histogram (upper panel) and Normal Q-Q plot (lower panel) of the residuals]

The produced histogram is presented in the upper panel. The histogram portrays a symmetric distribution that may result from Normally distributed observations. A better method to compare the distribution of the residuals to the Normal distribution is to use the Quantile-Quantile plot, which can be found in the lower panel. We do not discuss here the method by which this plot is produced 93. However, we do say that any deviation of the points from a straight line is an indication of a violation of the assumption of Normality. In the current case, the points seem to lie on a single line, which is consistent with the assumptions of the regression model.

The next task should be an analysis of the relations between the explanatory variables and the other response “ ratings ”. In principle one may use the same steps that were presented for the investigation of the relations between the explanatory variables and the response “ sims ”. But of course, the conclusion may differ. We leave this part of the investigation as an exercise to the students.

16.4 Summary

16.4.1 Concluding Remarks

The book included a description of some elements of statistics, elements that we thought are simple enough to be explained as part of an introductory course in statistics and that are the minimum required for any person involved in academic activities in any field in which the analysis of data is required. Now, as you finish the book, it is as good a time as any to say a few words regarding the elements of statistics that are missing from this book.

One element is more of the same. The statistical models that were presented are as simple as a model can get. A typical application will require more complex models. Each of these models may require specific methods for estimation and testing. The characteristics of inference, e.g. significance or confidence levels, rely on assumptions that the models are assumed to possess. The user should be familiar with computational tools that can be used for the analysis of these more complex models. Familiarity with the probabilistic assumptions is required in order to be able to interpret the computer output, to diagnose possible divergence from the assumptions, and to assess the severity of the possible effect of such divergence on the validity of the findings.

Statistical tools can be used for tasks other than estimation and hypothesis testing. For example, one may use statistics for prediction. In many applications it is important to assess what the values of future observations may be and in what range of values they are likely to occur. Statistical tools such as regression are natural in this context. However, the required task is not testing or estimating the values of parameters, but the prediction of future values of the response.

A different role of statistics is in the design stage. We hinted in that direction when we talked in Chapter \[ch:Confidence\] about the selection of a sample size in order to assure a confidence interval with a given accuracy. In most applications, the selection of the sample size emerges in the context of hypothesis testing, and the criterion for selection is the minimal power of the test, a minimal probability to detect a true finding. Yet, statistical design is much more than the determination of the sample size. Statistics may have a crucial input in the decision of how to collect the data. With an eye on the requirements for the final analysis, an experienced statistician can make sure that the data that are collected are indeed appropriate for that final analysis. Too often a researcher steps into the statistician's office with data that he or she has collected and asks, when it is already too late, for help in the analysis of data that cannot provide a satisfactory answer to the research question the researcher tried to address. It may be said, with some exaggeration, that good statisticians are required for the final analysis only in the case where the initial planning was poor.

Last, but not least, is the theoretical mathematical theory of statistics. We tried to introduce as little as possible of the relevant mathematics in this course. However, if one seriously intends to learn and understand statistics then one must become familiar with the relevant mathematical theory. Clearly, deep knowledge in the mathematical theory of probability is required. But apart from that, there is a rich and rapidly growing body of research that deals with the mathematical aspects of data analysis. One cannot be a good statistician unless one becomes familiar with the important aspects of this theory.

I should have started the book with the famous quotation: “Lies, damned lies, and statistics”. Instead, I am using it to end the book. Statistics can be used and can be misused. Learning statistics can give you the tools to tell the difference between the two. My goal in writing the book is achieved if reading it will mark for you the beginning of the process of learning statistics and not the end of the process.

16.4.2 Discussion in the Forum

In the second part of the book we have learned many subjects. Most of these subjects, especially for those who had no previous exposure to statistics, were unfamiliar. In this forum we would like to ask you to share with us the difficulties that you encountered.

What was the topic that was most difficult for you to grasp? In your opinion, what was the source of the difficulty?

When forming your answer to this question we would appreciate it if you could elaborate and give details of what the problem was. Pointing to deficiencies in the learning material and to confusing explanations will help us improve the presentation in future editions of this book.

87. Hebl, M. and Xu, J. (2001). Weighing the care: Physicians' reactions to the size of a patient. International Journal of Obesity, 25, 1246-1252.

88. The file can be found on the internet at http://pluto.huji.ac.il/~msby/StatThink/Datasets/discriminate.csv .

89. One may propose splitting the response into two groups, with one group being associated with values of "time" strictly larger than 30 minutes and the other with values less than or equal to 30. The resulting \(p\)-value from the expression "prop.test(table(patient$time>30,patient$weight))" is \(0.01276\). However, the number of subjects in one of the cells of the table is equal to only 2, which is problematic in the context of the Normal approximation that is used by this test.

90. Blakley, B.A., Quiñones, M.A., Crawford, M.S., and Jago, I.A. (1994). The validity of isometric strength tests. Personnel Psychology, 47, 247-274.

91. The file can be found on the internet at http://pluto.huji.ac.il/~msby/StatThink/Datasets/job.csv .

92. The score is produced by the application of the function "lm" with both variables as explanatory variables. The code expression that can be used is "lm(sims ~ grip + arm, data=job)".

93. Generally speaking, the plot is composed of the empirical percentiles of the residuals, plotted against the theoretical percentiles of the standard Normal distribution. The current plot is produced by the expression "qqnorm(residuals(sims.score))".


The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data . It is an important research tool used by scientists, governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process . You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics . Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarize your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results
  • Other interesting articles

Step 1: Write your hypotheses and plan your research design

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population . You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design , you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design , you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design , you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design , you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design , you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design , one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
Example: Experimental research design
First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test. In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalize your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Variable Type of data
Age Quantitative (ratio)
Gender Categorical (nominal)
Race or ethnicity Categorical (nominal)
Baseline test scores Quantitative (interval)
Final test scores Quantitative (interval)
Parental income Quantitative (ratio)
GPA Quantitative (interval)


Step 2: Collect data from a sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures . You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias , like sampling bias , and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more at risk for biases like self-selection bias, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalizing your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section .

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Example: Sampling (experimental study)
Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or by using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study needs to be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units per subgroup is usually recommended.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power : the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size : a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
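To make the role of these inputs concrete, here is a minimal Python sketch (assuming the statsmodels package is available) that solves for the per-group sample size of a two-group comparison; the effect size, alpha, and power values are placeholders to replace with values appropriate for your own study.

```python
# Minimal sketch: per-group sample size for a two-sample t test.
# Assumes statsmodels is installed; the inputs below are illustrative placeholders.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # expected standardized effect size (Cohen's d)
    alpha=0.05,       # significance level
    power=0.80,       # desired statistical power
)
print(f"Approximate sample size per group: {n_per_group:.0f}")
```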

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organizing data from each variable in frequency distribution tables .
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualizing the relationship between two variables using a scatter plot .

By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.
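As an illustration, the three inspection steps above can be done in a few lines of Python (pandas and matplotlib assumed; the data frame and column names are hypothetical):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data with one categorical and two quantitative variables.
df = pd.DataFrame({
    "group":    ["control", "treatment", "control", "treatment", "control", "treatment"],
    "pretest":  [62, 70, 58, 75, 66, 71],
    "posttest": [65, 78, 61, 82, 70, 74],
})

print(df["group"].value_counts())             # frequency distribution table
df["group"].value_counts().plot(kind="bar")   # bar chart of the distribution of responses
plt.show()

df.plot(kind="scatter", x="pretest", y="posttest")  # relationship between two variables
plt.show()
```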

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

[Figure: Mean, median, mode, and standard deviation in a normal distribution]

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode : the most popular response or value in the data set.
  • Median : the value in the exact middle of the data set when ordered from low to high.
  • Mean : the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range : the highest value minus the lowest value of the data set.
  • Interquartile range : the range of the middle half of the data set.
  • Standard deviation : the average distance between each value in your data set and the mean.
  • Variance : the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
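As a small worked illustration (hypothetical scores; Python with pandas and scipy assumed), all of these summaries can be computed directly:

```python
import pandas as pd
from scipy import stats

scores = pd.Series([61, 66, 68, 70, 70, 73, 75, 78, 81])  # hypothetical test scores

print("mean:", scores.mean())
print("median:", scores.median())
print("mode:", scores.mode().tolist())      # may contain more than one value
print("range:", scores.max() - scores.min())
print("IQR:", stats.iqr(scores))            # interquartile range
print("standard deviation:", scores.std())  # sample SD (ddof=1)
print("variance:", scores.var())            # sample variance (ddof=1)
```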

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

                   | Pretest scores | Posttest scores
Mean               | 68.44          | 75.25
Standard deviation | 9.43           | 9.88
Variance           | 88.96          | 97.96
Range              | 36.25          | 45.12
N                  | 30             | 30

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)
After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

                   | Parental income (USD) | GPA
Mean               | 62,100                | 3.12
Standard deviation | 15,000                | 0.45
Variance           | 225,000,000           | 0.16
Range              | 8,000–378,000         | 2.64–4.00
N                  | 653                   | 653

A number that describes a sample is called a statistic , while a number describing a population is called a parameter . Using inferential statistics , you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate : a value that represents your best guess of the exact parameter.
  • An interval estimate : a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
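For example, using the posttest summary from the table above (mean 75.25, standard deviation 9.88, n = 30), a 95% interval built from the standard error and the z score looks like the following sketch; note that with a sample this small many analysts would substitute a t critical value, so treat this purely as an illustration of the z-based formula.

```python
import math
from scipy import stats

sample_mean = 75.25   # point estimate (posttest mean from the example table)
sample_sd = 9.88
n = 30

standard_error = sample_sd / math.sqrt(n)
z_critical = stats.norm.ppf(0.975)   # z score for a 95% confidence level (about 1.96)

lower = sample_mean - z_critical * standard_error
upper = sample_mean + z_critical * standard_error
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```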

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in the outcome variable(s).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or fewer).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test .
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test .
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test .
  • If you expect a difference between groups in a specific direction, use a one-tailed test .
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test .
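These choices map directly onto function arguments in common software. A minimal sketch using scipy (version 1.6 or later for the `alternative` argument; all data are placeholders):

```python
from scipy import stats

pre  = [62, 70, 58, 75, 66, 71, 69, 64]   # placeholder paired (within-subjects) measurements
post = [65, 78, 61, 82, 70, 74, 73, 66]
group_a = [5.1, 4.8, 5.6, 5.0, 5.3]       # placeholder unmatched (between-subjects) groups
group_b = [4.2, 4.9, 4.4, 4.6, 4.5]

# One sample compared to a known population mean of 70 (two-tailed):
print(stats.ttest_1samp(pre, popmean=70))

# Dependent (paired) samples, one-tailed, expecting post > pre:
print(stats.ttest_rel(post, pre, alternative="greater"))

# Independent (unpaired) samples, two-tailed:
print(stats.ttest_ind(group_a, group_b, alternative="two-sided"))
```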

The only parametric correlation test is Pearson’s r . The correlation coefficient ( r ) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.
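In practice, software usually reports the coefficient and its significance test together. Here is a sketch with placeholder data (scipy assumed); `pearsonr` returns r and a two-tailed p value, which is halved because a positive correlation is expected.

```python
from scipy import stats

income = [41, 55, 62, 48, 75, 90, 66, 58]        # placeholder values (thousands of USD)
gpa    = [2.8, 3.1, 3.0, 2.9, 3.4, 3.7, 3.2, 3.1]

r, p_two_tailed = stats.pearsonr(income, gpa)
# Convert to a one-tailed p value, assuming the expected direction is positive:
p_one_tailed = p_two_tailed / 2 if r > 0 else 1 - p_two_tailed / 2
print(f"r = {r:.2f}, one-tailed p = {p_one_tailed:.4f}")
```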

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001


The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

Example: Interpret your results (experimental study)
You compare your p value of 0.0028 to your significance threshold of 0.05. Since the p value is below this threshold, you can reject the null hypothesis. This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

Example: Effect size (experimental study)
With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study)
To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
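Cohen’s d can be computed by hand from the group means and a pooled standard deviation. The sketch below plugs in the summary values from the pretest/posttest table above; because paired designs sometimes use a different denominator (for example, the standard deviation of the difference scores), the result is close to, but not necessarily identical to, the d of 0.72 quoted in the example.

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean2 - mean1) / pooled_sd

# Summary values taken from the example descriptive statistics table above.
d = cohens_d(mean1=68.44, mean2=75.25, sd1=9.43, sd2=9.88, n1=30, n2=30)
print(f"Cohen's d is approximately {d:.2f}")
```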

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

A Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis, rather than forcing a conclusion about whether or not to reject the null hypothesis.


Statistical analyses of case-control studies

Introduction

A case-control study is used to determine whether an exposure is linked to a particular outcome (i.e., a disease or condition of interest). Case-control research is by definition retrospective, because it starts with an outcome and then traces back to examine exposures. The investigator already knows the outcome of each participant when that participant is enrolled in his or her respective group. It is this feature that makes case-control studies retrospective, not the fact that the investigator often uses previously gathered data. This article discusses statistical analysis in case-control studies.

Advantages and Disadvantages of Case-Control Studies


Study Design

Participants in a case-control study are chosen based on their outcome status. Some individuals have the outcome of interest (referred to as cases), while others do not (referred to as controls). The investigator then evaluates the exposure in both groups. Consequently, in case-control research the outcome must have occurred in at least some individuals; as shown in Figure 1, some participants have the outcome and others do not at the time of enrollment.


Figure 1. Example of a case-control study [1]

Selection of case

The investigator should define the cases as precisely as possible. A disease’s definition may sometimes rest on several criteria; hence, all aspects should be fully specified in the case definition.

Selection of a control

Controls that are comparable to the cases in a variety of ways should be chosen. The matching criteria are the parameters (e.g., age, sex, and hospitalization time) used to establish how controls and cases should be similar. For instance, it would be unfair to compare patients with elective intraocular surgery to a group of controls with traumatic corneal lacerations. Another key feature of a case-control study is that the exposure in both cases and controls should be measured equally.

Though controls must be similar to cases in many respects, it is possible to over-match. Over-matching might make it harder to identify enough controls. Furthermore, once a matching variable is chosen, it cannot be analyzed as a risk factor. Enrolling more than one control for each case is an effective method for increasing the power of research. However, incorporating more than two controls per case adds little statistical value.

Data collection

After precisely identifying the cases and controls, decide on the data to be gathered; the same data must be obtained from both groups in the same way. If the search for primary risk variables is not conducted objectively, the study may suffer from researcher bias, especially because the outcome is already known. It is crucial to try to blind the person collecting risk factor data or interviewing patients to the outcome status, even if this is not always practicable. Patients may be asked questions concerning historical issues (such as smoking history, diet, use of conventional eye medications, and so on). For some people, precisely recalling all of this information may be challenging.

Furthermore, patients who get the result (cases) are more likely to recall specifics of unfavourable experiences than controls. Recall bias is a term for this phenomenon. Any effort made by the researcher to reduce this form of bias would benefit the research.

The frequency of each of the measured variables in each of the two groups is computed in the analysis. Case-control studies produce an odds ratio to measure the strength of the link between exposure and the outcome. An odds ratio is the ratio of the odds of exposure in the case group to the odds of exposure in the control group. Calculating a confidence interval for each odds ratio is critical. A confidence interval that includes 1.0 indicates that the association between the exposure and the outcome could have arisen by chance alone and is not statistically significant. Without a confidence interval, an odds ratio isn’t particularly useful. Computer programmes are typically used to do these computations. Because no measurements are taken in a population-based sample, case-control studies cannot provide information about the incidence or prevalence of a disease.
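As a concrete illustration of the calculation, the odds ratio and its 95% confidence interval can be obtained from a 2×2 exposure-by-status table using the standard log-odds-ratio formula. The counts below are made up purely for the sketch:

```python
import math

# Hypothetical 2x2 table of exposure by case-control status.
a, b = 30, 70    # cases:    exposed (a), unexposed (b)
c, d = 15, 85    # controls: exposed (c), unexposed (d)

odds_ratio = (a * d) / (b * c)

# 95% confidence interval on the log scale, using the standard error of log(OR).
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# If the interval includes 1.0, the association is not statistically significant.
```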

Risk Factors and Sampling

Case-control studies can also be used to investigate risk factors for a rare disease. Cases might be obtained from hospital records. Patients who present to the hospital, on the other hand, may not be typical of the general community. The selection of an appropriate control group may provide challenges. Patients from the same hospital who do not have the result are a common source of controls. However, hospitalized patients may not always reflect the broader population; they are more likely to have health issues and access the healthcare system.

Recent research on case-control studies using statistical analyses

i) Risk factors related to multiple sclerosis in Kuwait

This matched case-control study in Kuwait examined the relationship between multiple sclerosis (MS) risk and several variables: family history, stressful life events, tobacco smoke exposure, vaccination history, and comorbidity. To accomplish the study’s goal, a matched case-control design was used. Cases were recruited from Ibn Sina Hospital’s neurology clinics and the Dasman Diabetes Institute’s MS clinic. Controls were chosen from among Kuwait University’s faculty and students. A generalized questionnaire was used to collect data on socio-demographic, possibly genetic, and environmental aspects from each patient and his/her pair-matched control. Descriptive statistics were produced, including means and standard deviations for quantitative variables and frequencies for qualitative variables. Variables associated with MS status at p ≤ 0.15 in the univariable conditional logistic regression analysis were evaluated for inclusion in the final multivariable conditional logistic regression model. Of the 112 MS patients invited to take part, 110 (98.2%) agreed. These 110 MS patients and 110 control participants were enrolled, with controls individually matched to cases (1:1) on age (5 years), gender, and nationality (Fig. 1). The findings revealed that a family history of MS was significantly associated with an increased risk of developing MS. In contrast, vaccination against influenza A and B viruses provided significant protection against MS.


Figure 1. Flow chart on the enrollment of the MS cases and controls [1]

ii) Relation between periodontitis and COVID-19 infection

COVID-19 is linked to a heightened inflammatory response, which can be deadly. Periodontitis is characterized by systemic inflammation. In Qatar, patients with COVID-19 were identified from Hamad Medical Corporation’s (HMC) national electronic health records. Patients with COVID-19 complications (death, ICU admission, or assisted ventilation) were classified as cases, while COVID-19 patients discharged without severe complications were classified as controls. There was no control matching because all eligible controls were included in the analysis. Periodontal status was evaluated using dental radiographs from the same database. The associations between periodontitis and COVID-19 complications were investigated using logistic regression models adjusted for demographic, medical, and behavioural variables. Of the 568 participants, 258 had periodontitis. Thirty-three of the 258 patients with periodontitis had COVID-19 complications, whereas only 7 of the 310 patients without periodontitis did. Table 2 shows the unadjusted and adjusted odds ratios and 95% confidence intervals for the relationship between periodontitis and COVID-19 complications. Periodontitis was shown to be substantially related to a greater risk of COVID-19 complications, such as ICU admission, the requirement for assisted ventilation, and mortality, as well as to higher blood levels of markers linked to a poor COVID-19 outcome, such as D-dimer, WBC, and CRP.

Table 2. Associations between periodontal condition and COVID-19 complications [3]

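The adjusted analysis described above can be sketched with a logistic regression in Python’s statsmodels. Everything below is hypothetical: the data are randomly generated and the column names (complication, periodontitis, age, diabetes, smoking) are stand-ins, so the output only demonstrates the workflow, not the study’s results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Randomly generated placeholder data; variable names are illustrative only.
rng = np.random.default_rng(0)
n = 568
df = pd.DataFrame({
    "complication":  rng.integers(0, 2, n),
    "periodontitis": rng.integers(0, 2, n),
    "age":           rng.normal(50, 12, n),
    "diabetes":      rng.integers(0, 2, n),
    "smoking":       rng.integers(0, 2, n),
})

# Logistic regression of complications on periodontitis, adjusted for covariates.
model = smf.logit("complication ~ periodontitis + age + diabetes + smoking", data=df).fit()
print(model.summary())
print(np.exp(model.params))  # exponentiated coefficients = adjusted odds ratios
```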

iii) Menstrual, reproductive and hormonal factors and thyroid cancer

The relationships between menstrual, reproductive, and hormonal variables and thyroid cancer incidence in a population of Chinese women were investigated in this study. A 1:1 matched hospital-based case-control study was conducted in 7 counties of Zhejiang Province to investigate the associations of diabetes mellitus and other variables with thyroid cancer. Case participants were eligible if they were diagnosed with primary thyroid cancer for the first time in a hospital between July 2015 and December 2017. The patients and controls in this research were chosen at random. At enrollment, the interviewer gathered all essential information face-to-face using a customized questionnaire. Descriptive statistics were used to characterize the baseline characteristics of female participants using frequency and percentage. To investigate the connections between the variables and thyroid cancer, univariate conditional logistic regression models were used. Four multivariable conditional logistic regression models adjusted for covariates were then used to investigate the relationships between menstrual, reproductive, and hormonal variables and thyroid cancer. In all, 2937 pairs of participants took part in the case-control research. The findings revealed that a later age at first pregnancy and a longer duration of breastfeeding were substantially linked with a lower occurrence of thyroid cancer, which might shed light on the aetiology, monitoring, and prevention of thyroid cancer in Chinese women [4].

It’s important to note that the term “case-control study” is commonly misused. A study that starts with a group of people exposed to something and a comparison (control) group who have not been exposed, and then follows them over time to see what occurs, is not a case-control study; it is a cohort study. Case-control studies are often regarded as less valuable because they are retrospective. They can, however, be a highly effective way of detecting a link between an exposure and an outcome, and they are sometimes the only ethical approach to studying a connection. Case-control studies can provide useful information if definitions, controls, and the potential for bias are carefully considered.

[1] Setia, Maninder Singh. “Methodology Series Module 2: Case-control Studies.” Indian journal of dermatology vol. 61,2 (2016): 146-51. doi:10.4103/0019-5154.177773

[2] El-Muzaini, H., Akhtar, S. & Alroughani, R. A matched case-control study of risk factors associated with multiple sclerosis in Kuwait. BMC Neurol 20, 64 (2020). https://doi.org/10.1186/s .

[3] Marouf, Nadya, Wenji Cai, Khalid N. Said, Hanin Daas, Hanan Diab, Venkateswara Rao Chinta, Ali Ait Hssain, Belinda Nicolau, Mariano Sanz, and Faleh Tamimi. “Association between periodontitis and severity of COVID‐19 infection: A case–control study.” Journal of clinical periodontology 48, no. 4 (2021): 483-491.

[4] Wang, Meng, Wei-Wei Gong, Qing-Fang He, Ru-Ying Hu, and Min Yu. “Menstrual, reproductive and hormonal factors and thyroid cancer: a hospital-based case-control study in China.” BMC Women’s Health 21, no. 1 (2021): 1-8.



Welcome to Stats 101: A Resource for Teaching Introductory Statistics


A Series of Case Studies

Resources for Statistics Teachers developed by:

Richard D. De Veaux, Williams College
Deborah Nolan and Jasjeet Sekhon, UC Berkeley
Nicholas Horton, Amherst College, and Ben Baumer, Smith College
Daniel Kaplan, Macalester College
Julie Legler, St. Olaf College, and Carrie Grimes, Google
with help from David Bock, Ithaca High School (retired)

December 14, 2015

Introduction: Many teachers of introductory statistics courses, whether at the high school, 2 year or 4 year college or university level are trained in mathematics, with little or no training or experience with statistics. At the request of the 2015 President of the American Statistical Association, David Morganstein, we have written a series of case studies, designed to show statistics in action, rather than showing it as a branch of mathematics. Each case starts with a real world problem and leads the reader through the steps taken to explore the problem, highlighting the techniques used in introductory or AP statistics classes. Sometimes the analysis goes slightly past the methods taught in such an intro course, but the analysis is meant to build on simpler techniques and to provide examples of real analyses, typical of the kind of analysis a professional statistician might perform. Our hope is that these case studies can both provide context and motivation for the instructor so that the methods in the intro course come alive, rather than seem a list of cookbook formulas. They can be used as examples in class, or just as guides for what a statistical analysis might entail.

Each case is presented in 2 versions:

  • An R version, written in R Markdown, showing all the R code used to make the plots and the analysis. This version is available in the public library on this site.
  • A version using the package JMP from SAS. This version will be housed on the JMP User Community site .

Please share your feedback. Use this link to ask questions and share your comments about the case studies. Your feedback will help us improve!

Available Case Studies

How Much is a Fireplace Worth?

Author: Dick De Veaux, Williams College

Nearly 60% of the houses in Saratoga County, New York, have fireplaces. On average, those houses sell for about $65,000 more than houses without fireplaces. Is the fireplace the reason for the difference? This case study starts with a simple comparison of the prices of houses with and without fireplaces. But there are other characteristics of houses with fireplaces that may affect the price as well. The intent is to show the danger of using simple group comparisons to answer a question that involves many variables. The study then builds a series of more sophisticated models to show how adjusting for other variables can lead to a more sensible conclusion.

The data are a random sample of 1,728 homes taken from public records from the Saratoga County ( http://www.saratogacountyny.gov/departments/real-property-tax-service-agency/ ) and collected by Candice Corvetti (Williams ’07) for her senior thesis.

How Much Does a Diamond Cost?

Everyone who has thought about buying a diamond knows about the four C’s of diamond pricing: Carat (weight), Color, Cut and Clarity. What are the tradeoffs among these factors? Can we build a model to accurately predict the price of a diamond knowing just these characteristics? The object of the study is to produce and diagnose such a model and to assess its limitations.

The data are a sample of 2,690 diamonds taken from the site http://www.adiamor.com/ in 2010 by Lou Valente of JMP .

Keeping a Web Cache Fresh

Authors: Carrie Grimes, Google and Deb Nolan, University of California, Berkeley

Internet search engines such as Google, Bing, and Ask keep copies of Web pages so that when you make a query, they can quickly search their stored pages and return their findings to you. The collection of saved pages is called a Web cache. By using caches, instead of searching hundreds of thousands of sites, the search can be performed in real time. Of course, if a page has changed since the last time it was stored, then the search engine serves stale pages and the results are either out of date or just wrong. In order to keep the cache fresh, Web pages need to be visited regularly and the cache updated with any changed pages. How often do Web pages change? How often should the sites be visited to keep the cache fresh? This case study considers these questions by creating models of page updating.

The data are a collection of the behavior of 1,000 Web pages. Each of these pages was visited every hour for 30 days. The page was compared to the previous visit, and if it had changed, the cache was updated and time of the visit was recorded.

Better Flight Experiences with Data (Airline Delays in New York City)

Authors: Nick Horton (Amherst College) and Ben Baumer (Smith College)

If you’ve ever flown on a commercial airline, you know that delays are part of the adventure. Before booking your flight, can data help you decide what time of day, what time of year, and what airline to choose in order to minimize your chance of a delay? The object of the study is to explore the data set collected by the US Bureau of Transportation Statistics (BTS) in order to minimize the chances of experiencing a long delay. The data set is extremely large, so the study focuses on delays for flights from New York City in 2013.

The data are collected daily by BTS. (For a summary see http://www.transtats.bts.gov/homedrillchart.asp ). The data in this study are a subset of that data, collected by Hadley Wickham of RStudio.

Election 2000 – What Happened to Al Gore?

Authors: Deb Nolan and Jasjeet Sekhon, University of California at Berkeley

The 2000 election was extremely close, with Al Gore receiving 50,999,897 votes to the 50,456,002 received by George W. Bush. However, the electoral college vote was 271 to 266 in favor of Bush, giving him the election. Those 271 electoral votes included 25 from the state of Florida, where the vote was so close that a mandatory recount was performed. The Supreme Court ended the recount on Dec. 12, 2000, awarding Florida’s electoral votes, and with them the election, to Bush. The study explores the effect of the infamous “butterfly ballot” in Palm Beach County, and whether the voters there who, according to county records, voted for Pat Buchanan actually intended to vote for Al Gore.

The data are the votes cast in each county in Florida for each of the candidates in the 2000 US presidential election.


7.2.1 - Case-Cohort Study Design

A case-cohort study is similar to a nested case-control study in that the cases and non-cases are within a parent cohort; cases and non-cases are identified at time \(t_1\), after baseline. In a case-cohort study, the cohort members were assessed for risk factors at any time prior to \(t_1\). Non-cases are randomly selected from the parent cohort, forming a subcohort. No matching is performed.

Advantages of Case-Cohort Study:

Similar to nested case-control study design:

  • Efficient – not all members of the parent cohort require diagnostic testing
  • Flexible – allows testing hypotheses not anticipated when the cohort was drawn \((t_0)\)
  • Reduced selection bias – cases and non-cases are sampled from the same population
  • Reduced information bias – risk factor exposure can be assessed with the investigator blind to case status

Other advantages, as compared to nested case-control study design:

  • The subcohort can be used to study multiple outcomes
  • Risk can be measured at any time up to \(t_1\) (e.g. elapsed time from a variable event, such as menopause, birth)
  • Subcohort can be used to calculate person-time risk

Disadvantages of Case-Cohort Study:

As compared to nested case-control study design:

  • The subcohort may have been established after \(t_0\)
  • Exposure information may have been collected at different times (e.g., potential for sample deterioration)

Statistical Analysis for Case-Cohort Study:

Weighted Cox proportional hazards regression model (we will look at proportional hazards regression later in this course)
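One possible way to fit such a model in Python is with the lifelines package, which accepts a column of sampling weights; the sketch below uses randomly generated placeholder data and illustrative weights, and is not part of this course’s prescribed workflow.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Randomly generated placeholder case-cohort data; 'weight' stands in for
# inverse-probability-of-selection weights for subcohort members.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "time":     rng.exponential(10, n),
    "event":    rng.integers(0, 2, n),
    "exposure": rng.integers(0, 2, n),
    "weight":   np.where(rng.random(n) < 0.5, 1.0, 3.5),
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event",
        weights_col="weight", robust=True)  # robust standard errors with non-integer weights
cph.print_summary()
```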


Title: Open Case Studies: Statistics and Data Science Education through Real-World Applications

Abstract: With unprecedented and growing interest in data science education, there are limited educator materials that provide meaningful opportunities for learners to practice statistical thinking, as defined by Wild and Pfannkuch (1999), with messy data addressing real-world challenges. As a solution, Nolan and Speed (1999) advocated for bringing applications to the forefront in undergraduate statistics curriculum with the use of in-depth case studies to encourage and develop statistical thinking in the classroom. Limitations to this approach include the significant time investment required to develop a case study -- namely, to select a motivating question and to create an illustrative data analysis -- and the domain expertise needed. As a result, case studies based on realistic challenges, not toy examples, are scarce. To address this, we developed the Open Case Studies ( this https URL ) project, which offers a new statistical and data science education case study model. This educational resource provides self-contained, multimodal, peer-reviewed, and open-source guides (or case studies) from real-world examples for active experiences of complete data analyses. We developed an educator's guide describing how to most effectively use the case studies, how to modify and adapt components of the case studies in the classroom, and how to contribute new case studies. ( this https URL ).
Comments: 16 pages in main text, 3 figures, and 2 tables; 9 pages in supplement
Subjects: Applications (stat.AP); Other Statistics (stat.OT)



The field of statistics is the science of learning from data, and it studies that process from beginning to end to understand how to produce trustworthy results. While statistical analysis provides tremendous benefits, obtaining valid results requires proper methods for collecting your sample , taking measurements, designing experiments, and using appropriate analytical techniques. Consequently, data analysts must carefully plan and understand the entire process, from data collection to statistical analysis. Alternatively, if someone else collected the data, the analyst must understand that context to interpret their results correctly.

If you’re good with numbers and enjoy working with data, statistical analysis could be the perfect career path for you. As big data, machine learning, and technology grow, the demand for skilled statistical analysts is rising. It’s a great time to build these skills and find a job that fits your interests.

Some of the fastest-growing career paths use statistical analysis, such as statisticians , data analysts, and data engineers. In this post, you’ll learn the different types of statistical analysis, their benefits, and the key steps.

Types of Statistical Analysis

Statistical analysis has a broad range of uses, and it falls into several broad categories: descriptive, inferential, experimental, and predictive statistics. While the goals of these approaches differ, they all aim to take raw data and turn them into meaningful information that helps you understand the subject area and make decisions.

Choosing the correct approach to bring the data to life is an essential part of the craft of statistical analysis. For all the following types of statistical analyses, you’ll use statistical reports, graphs, and tables to explain the results to others.

Descriptive

Descriptive statistical analysis describes a sample of data using various summary statistics such as measures of central tendency , variability , relative standing , and correlation . These results apply only to the items or people that the researchers measure and not to a broader population . Additionally, correlations do not necessarily imply causation.

For example, you can report the mean test score and the correlation between hours of studying and test scores for a specific class. These results apply only to this class and no one else. Do not assume the correlation implies causation.
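For instance, with hypothetical class data (pandas assumed), both summaries take one line each, and they describe only this sample:

```python
import pandas as pd

# Hypothetical class data.
scores = pd.DataFrame({
    "hours_studied": [2, 5, 1, 4, 6, 3, 7, 2],
    "test_score":    [65, 80, 60, 74, 85, 70, 88, 68],
})

print(scores["test_score"].mean())                          # mean test score for this class only
print(scores["hours_studied"].corr(scores["test_score"]))   # correlation in this sample only
```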

Inferential

Inferential statistical analysis goes a step further and uses a representative sample to estimate the properties of an entire population. A sample is a subset of the population. Usually, populations are so large that it’s impossible to measure everyone in a population.

For example, if you draw a simple random sample of students and administer a test, statistical analysis of the data allows you to estimate the properties of the population. Statistical analysis in the form of hypothesis testing can determine whether the effects and correlations you observe in the sample also exist in the population.

Suppose the hypothesis test results for the correlation between hours studying and test score is statistically significant . In that case, you can conclude that the correlation you see in the sample also exists in the larger population. Despite being statistically significant, the correlation still does not imply causation because the researchers did not use an experimental design.

The sample correlation estimates the population correlation. However, because you didn’t measure everyone in the population, you must account for sampling error by applying a margin of error around the sample estimate using a confidence interval .

Learn more about the Differences between Descriptive and Inferential Statistical Analysis .

Designed Experiments

Statistical analysis of experimental designs strives to identify causal relationships between variables rather than mere correlation. Observing a correlation in inferential statistics does not suggest a causal relationship exists. You must design an experiment to evaluate causality. Typically, this process involves randomly assigning subjects to treatment and control groups.

Does increasing study hours cause test scores to improve or not? Without a designed experiment, you can’t rule out the possibility that a confounding variable and not studying caused the test scores to improve.

Suppose you randomly assign students to high and low study-time experimental groups. The statistical analysis indicates the longer-duration study group has a higher mean score than the shorter-duration group. The difference is statistically significant. These results provide evidence that study time causes changes in the test scores.
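A minimal sketch of that comparison (placeholder scores for randomly assigned groups; scipy assumed):

```python
from scipy import stats

# Placeholder test scores after random assignment to study-time groups.
high_study = [82, 78, 85, 90, 76, 88, 84]
low_study  = [70, 74, 68, 72, 75, 71, 69]

t_stat, p_value = stats.ttest_ind(high_study, low_study)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# Because group membership was randomly assigned, a significant difference
# supports a causal interpretation of study time on scores.
```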

Learn more about Correlation vs. Causation and Experimental Design: Definition and Types .

Predictive

Predictive statistical analysis doesn’t necessarily strive to understand why and how one variable affects another. Instead, it predicts outcomes as precisely as possible. These analyses can use causal or non-causal correlations to predict outcomes.

For example, assume that the number of ice cream cones consumed predicts the number of shark attacks in a beach town. The correlation is not causal because ice cream cone consumption doesn’t cause shark attacks. However, the number of cones sold reflects favorable weather conditions and the number of beachgoers. Those variables do cause changes in the number of shark attacks. If the number of cones is easier to measure and predicts shark attacks better than other measures, it’s a good predictive model.
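A toy version of that predictive idea, with entirely made-up numbers and a simple linear fit (numpy assumed), shows how a model can be useful for prediction even when the predictor is not a cause:

```python
import numpy as np

# Made-up daily records: ice cream cones sold and shark attacks observed.
cones   = np.array([120, 340, 90, 410, 260, 500, 180])
attacks = np.array([0, 2, 0, 3, 1, 4, 1])

slope, intercept = np.polyfit(cones, attacks, deg=1)  # fit a simple line
predicted = slope * 450 + intercept                   # predict attacks for a 450-cone day
print(f"Predicted shark attacks on a 450-cone day: {predicted:.1f}")
# Useful for prediction even though cone sales do not cause shark attacks.
```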

Three Key Steps in Statistical Analysis

Producing trustworthy statistical analysis requires following several key steps. Each phase plays a vital role in ensuring that the data collected is accurate, the methods are sound, and the results are reliable. From careful planning to adequate sampling and insightful data analysis, these steps help researchers and businesses make informed, data-driven decisions. Below, I outline the major steps involved in conducting solid statistical analysis.

Planning

The planning step is essential for creating well-structured experiments and studies that effectively set up the statistical analysis from the start. Whether working in a lab, conducting fieldwork, or designing surveys, this stage ensures that the research design gathers data that statistical analysis can use to answer the research question effectively. Researchers can make informed decisions about variables, sample sizes, and sampling methods by analyzing data patterns and previous statistical analyses. Using the proper sampling techniques allows researchers to work with a manageable portion of the population while maintaining accuracy.

This careful preparation reduces errors and saves resources, leading to more reliable results. Optimizing research strategies during the planning phase allows scientists and businesses to focus on the most relevant aspects of their investigations, which results in more precise findings. Researchers design the best studies when they keep the statistical analysis in mind.

Learn more about Planning Sample Sizes and Sampling Methods: Different Types in Research .

Data Collection

After all the planning, the next step is to go out and execute the plan. This process involves collecting the sample and taking the measurements. It might also require implementing the treatment under controlled conditions if it’s an experiment.

For studies that use data collected by others, the researchers must acquire, prepare, and clean the data before performing the statistical analysis. In this context, a crucial part of sound statistical analysis is understanding how the data were gathered. Analysts must review the methods used in data collection to identify potential sampling biases or errors that could affect the results. Sampling techniques, data sources, and collection conditions can all introduce variability or skew the data. Without this awareness, an analyst risks drawing flawed conclusions.

Analysts can adjust their approach and account for any limitations by carefully examining how the data was collected, ensuring their statistical analysis remains accurate and trustworthy.

Statistical Analysis

After data collection, the statistical analysis step transforms raw data into meaningful insights that can inform real-world decisions. Ideally, the preceding steps have all set the stage for this analysis.

Successful research plans and their effective execution allow statistical analysis to produce clear, understandable results, making it easy to identify trends, draw conclusions, and forecast outcomes. These insights not only clarify current conditions but also help anticipate future developments. Statistical analysis drives business strategies and scientific advancements by converting data into actionable information.

Learn more about Hypothesis Testing and Regression Analysis .

Statistical Case Studies (Student Edition) : A Collaboration between Academe and Industry, Student Edition

  • Roxy Peck , 
  • Larry D. Haugh , and 
  • Arnold Goodman


Statisticians know that the clean data sets that appear in textbook problems have little to do with real-life industry data. To better prepare their students for all types of statistical careers, academic statisticians now strive to use data sets from real-life statistical problems. This book contains 20 case studies that use actual data sets that have not been simplified for classroom use. Each case study is a collaboration between statisticians from academe and from business, industry, or government.

This book is the result of a collaborative workshop of statisticians focusing on academic-industrial partnerships. The cases come from a wide variety of application areas, including biology/environment, medical and health care, pharmaceutical, marketing and survey research, and manufacturing.


Front Matter

The front matter includes the title page, series page, copyright page, dedication, TOC, preface, and introductions.

1. Are the Fish Safe to Eat? Assessing Mercury Levels in Fish in Maine Lakes

  • Jennifer A. Hoeting , 
  • Anthony R. Olsen

The information in this article has been funded in part by the United States Environmental Protection Agency. It has been subjected to Agency peer review and approved for publication. The conclusions and opinions are solely those of the authors and are not necessarily the views of the Agency .

Mercury is a toxic metal sometimes found in fish consumed by humans. The state of Maine conducted a field study of 115 lakes to characterize mercury levels in fish, measuring mercury and 10 variables on lake characteristics. From these data, we can investigate four questions of interest:
  1. Are mercury levels high enough to be of concern in Maine lakes?
  2. Do dams and other man-made flowage controls increase mercury levels?
  3. Do different types of lakes have different mercury levels?
  4. Which lake characteristics best predict mercury levels?

Introduction

In May, 1994, the state of Maine issued the following health advisory regarding mercury in Maine lakes, warning citizens of the potential health effects of consuming too much fish from Maine lakes [Bower et al., 1997]:

“Pregnant women, nursing mothers, women who may become pregnant, and children less than 8 years old, should not eat fish from lakes and ponds in the state. Other people should limit consumption (eating) fish from these waters to 6–22 meals per year. People who eat large (old) fish should use the lower limit of 6 fish meals per year. People who limit themselves to eating smaller (younger) fish may use the upper limit of 22 fish meals per year.”

2. Chemical Assay Validation

  • Russell Reeve , 
  • Francis Giesbrecht

Many manufacturing processes depend upon measurements made on the product of the process. To maintain control over the manufacturing process, these measurements must themselves come from a measuring process of satisfactory quality. Therefore, an assessment of the characteristics of the measurement process is important. This case study discusses the statistical analysis of a measuring process set in the pharmaceutical industry: assay validation. Here we discuss one facet of assay validation: the assay's accuracy and repeatability.

While the terminology of this case study comes out of the pharmaceutical/biotechnology industries, the statistical reasoning crosses the boundaries of many industries.

In the pharmaceutical industry, chemical assays must be validated before use in pharmacokinetic/pharmacodynamic studies, manufacturing, or stability analyses. A method validation is very similar in principle to gage studies (defined below) found in other industries; however, a validation is more extensive. We will discuss one component of a validation package: the analysis of a method's accuracy and precision. In general, the accuracy refers to the bias of a method, while precision refers to the variability in a method, usually measured by the coefficient of variation (CV); in some laboratories, the CV is called the relative standard deviation, or RSD for short.

Background Information

A gage study is any study of a measuring process designed to assess the measuring process' capability. The chief concerns in a gage study are the measuring process' accuracy, reproducibility, and repeatability.

3. Automating a Manual Telephone Process

  • Mary Batcher , 
  • Kevin Cecco , 

This article was written and prepared by U.S. Government employees on official time. It is in the public domain and not subject to U.S. copyright. The content of this article is the opinion of the writer and does not necessarily represent the position of the Internal Revenue Service. The mention of specific product or service in this article does not imply endorsement by any agency of the federal government to the exclusion of others which may be suitable .

This case study illustrates the use of statistics to evaluate a new technology that will be implemented nationally in over 30 locations if it proves successful in a pilot study. Specifically, the case study is of an interactive telephone application that will let certain types of calls to the IRS be handled without the intervention of a staff person.

When introducing new technology, it is important to consider the human interaction with that technology. The technology may function perfectly but the ability or willingness of people to use it may not be there. It is thus very important to pilot test new systems. The interactive telephone system was pilot tested and evaluated in terms of cost, customer satisfaction, and ease of use. This case study focuses on the assessment of cost, in terms of time to complete a transaction, and ease of use, in terms of the percent of users who successfully completed their transaction without requiring the assistance of IRS staff. The case study illustrates the use of hypothesis testing in decision making and the use of basic time series statistics and plots to examine periodic fluctuation over time.
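As a rough illustration of the kind of hypothesis test involved (not the IRS pilot's actual data or procedure), the sketch below compares two completion proportions with a two-proportion z-test in Python.

```python
# Hypothetical completion counts: automated system vs. a comparison group.
from statsmodels.stats.proportion import proportions_ztest

successes = [412, 455]     # callers who completed their transaction without staff help
totals = [500, 500]        # invented sample sizes
z_stat, p_value = proportions_ztest(successes, totals)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```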

4. Dissolution Method Equivalence

In this case study, we will explore the concept of equivalence, and in particular how it relates to a dissolution test. Many questions that are answered with hypothesis testing could be better answered using an equivalence approach. A discussion of dissolution tests will come first; it will be followed by an explanation of why we need criteria for equivalence. It is helpful if the reader has at least read the Chemical Assay Validation case study (Chapter 2) before proceeding with this case study, though it is not necessary.

There are two issues in this case study: (1) Devise statistical criteria to decide if two sites yield equivalent results, and (2) apply the methodology developed on two data sets. Note that (1) is more theoretical in nature, while (2) applies the mathematical work of (1) to data. Since many comparisons are best dealt with as equivalence problems, the methodology has wide applicability.

A dissolution test measures how fast a solid-dosage pharmaceutical product dissolves [USP XXII], [Cohen et al., 1990]. Since variation in dissolution profiles can have deleterious effects on the in vivo performance of a solid-dosage product, a test that measures dissolution is of the utmost importance.

The dissolution apparatus often consists of six vessels, each containing a dissolving solution—typically water with pH adjusted to cause the tablets or capsules to dissolve, though in some cases digestive enzymes may be used to more realistically simulate the action of the stomach. The sampled units, either individual capsules or tablets, are dropped into the vessels; see Fig. 1. The vessels themselves are in a water bath to maintain a nearly constant temperature.
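For concreteness, one widely used acceptance criterion for comparing two dissolution profiles (not necessarily the criterion developed in this chapter) is the f2 similarity factor, with f2 of at least 50 taken as evidence of equivalence. The sketch below computes it for invented profiles from two sites.

```python
# f2 similarity factor for two invented dissolution profiles (mean % dissolved at the
# same time points). This is a common criterion, shown only as an illustration.
import numpy as np

reference = np.array([35.0, 58.0, 74.0, 86.0, 93.0])   # site 1
test      = np.array([32.0, 55.0, 72.0, 85.0, 94.0])   # site 2

msd = np.mean((reference - test) ** 2)                  # mean squared difference
f2 = 50 * np.log10(100 / np.sqrt(1 + msd))
print(f"f2 = {f2:.1f} -> {'equivalent' if f2 >= 50 else 'not equivalent'}")
```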

5. Comparison of Hospital Length of Stay between Two Insurers for Patients with Pediatric Asthma

  • Robert L. Houchens , 
  • Nancy Schoeps

This case study investigates the relative importance of several factors in predicting the length of time young patients with asthma stay in the hospital. With the present atmosphere of cutting health care costs it is important to look at providing adequate care while at the same time reducing costs and not keeping children in the hospital longer than necessary. By looking at a sample of patients with pediatric asthma, concomitant factors to the main reason for being admitted to the hospital may shed light on different lengths of stay.

In today's healthcare environment, health insurance companies are increasingly pressuring hospitals to provide high quality health services at the lowest possible cost. The vast majority of all healthcare costs are for hospitalization. During the past decade, inpatient costs of patients in hospitals have been reduced in two primary ways. First, the less severe cases are now treated in the doctor's office or in hospital emergency rooms rather than being admitted to the hospital. Second, for cases admitted to the hospital, the lengths of hospital stays have been considerably shortened.

It is believed that some insurers have been more successful than others at minimizing hospital lengths of stay (LOS). To test this, a sample of hospital medical records was drawn for each of several illnesses from metropolitan hospitals operating in one state. The data for this case study consists of information abstracted from the medical records of asthma patients between the ages of 2 and 18 years old.
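A minimal sketch of one way such a comparison could be set up (not the chapter's actual model or variables): regress log length of stay on an insurer indicator while adjusting for a covariate such as age. The data and names below are simulated.

```python
# Simulated data: does insurer predict length of stay (LOS) after adjusting for age?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 300
df = pd.DataFrame({
    "age": rng.integers(2, 19, n),                 # ages 2-18, as in the study population
    "insurer": rng.choice(["A", "B"], n),
})
df["los"] = np.exp(0.8 + 0.02 * df["age"] + 0.15 * (df["insurer"] == "B")
                   + rng.normal(0, 0.4, n))

# Model log(LOS), a common choice for right-skewed stay lengths
fit = smf.ols("np.log(los) ~ age + C(insurer)", data=df).fit()
print(fit.params)
```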

6. Comparing Nonsteroidal Anti-Inflammatory Drugs with Respect to Stomach Damage

  • Tom Filloon , 

This case study may seem to be a rather trivial exercise, but we feel that it contains many of the important ideas that applied statisticians use on a day-to-day basis. It discusses the quantile-quantile (QQ) plot, normality assumptions, comparing distributions, and calculating p-values. Furthermore, it shows the great utility of the Mann-Whitney-Wilcoxon rank sum approach.

Many people take medication daily for the treatment of arthritis. Painful, swollen joints are a source of problems for arthritis sufferers. Pain relief and anti-inflammatory benefits can be achieved by drugs classified as NSAIDs (NonSteroidal Anti-Inflammatory Drugs), which include such drugs as ibuprofen (Motrin). One potential side effect of long-term use of this class of drugs is severe stomach damage (lesions, ulcers, perforation, death). In addition, if a person has developed a stomach ulcer, then this type of drug has the potential for delaying the time it takes for an ulcer to heal. The goal of a pharmaceutical company's research is to provide a better, safer drug for treating arthritis (i.e., developing an arthritis drug that does not slow the ulcer healing process). In this study, we are evaluating two drugs in an animal ulcer healing experiment in an effort to determine a new, more stomach-safe NSAID for use by arthritis sufferers. Analysis of these data will include descriptive statistics, assessing normality, permutation testing, and sample size determination.

An animal (rat) experimental model has been developed to evaluate NSAIDs with regard to their effects on ulcer healing. In this animal model, all animals are given a large dose of a known stomach-damaging compound. It has been shown that after approximately 2 weeks, the majority of the stomach damage created by the initial insult is gone (i.e., substantial ulcer healing has taken place).
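To illustrate the rank-sum comparison named above, here is a small Python sketch of a Mann-Whitney-Wilcoxon test on two groups of invented ulcer-area measurements; the real experiment's data and full analysis are, of course, richer.

```python
# Invented ulcer-area measurements for two NSAID treatment groups.
from scipy import stats

ulcer_area_drug_a = [4.1, 6.3, 2.8, 9.5, 5.0, 7.2, 3.6, 8.8]
ulcer_area_drug_b = [1.9, 3.2, 2.5, 4.8, 1.4, 3.9, 2.1, 5.3]

u_stat, p_value = stats.mannwhitneyu(ulcer_area_drug_a, ulcer_area_drug_b,
                                     alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```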

7. Validating an Assay of Viral Contamination

  • Lawrence I-Kuei Lin , 
  • W. Robert Stephenson

Viral contamination is of great concern to the makers, and users, of biological products such as blood clotting Factor Eight (given to people with hemophilia) and human blood substitute (a product still in development). How does one guarantee that such products are free of viral contamination? The first step is to have an assay that can accurately and precisely measure viral contamination. An assay is an analysis of a substance to determine the presence or absence of a specific ingredient. Most of you will be familiar with the idea of an assay of mineral ore to determine the amount of gold. In a viral assay, a solution is analyzed to determine the presence or absence of a specific virus. A viral assay can also be used to determine the amount of virus in the solution, the total viral titer. In order to ensure the accuracy and precision of an assay, it must be validated. The validation of an assay has three components: linearity (or proportionality), precision, and sensitivity. Each of these components requires the use of statistical methods. This case study looks at validating a viral assay using bovine viral diarrhea virus (BVDV). Other methods are used to validate viral assays for human immunodeficiency virus (HIV), the virus that causes AIDS.

In order to validate an assay one must start with something that has a known viral contamination. To do this, virologists spike a sterile stock solution with a known amount of a particular virus, in our case BVDV. BVDV is a virus that affects the gastrointestinal system of cattle causing severe diarrhea. The virus is particularly harmful to pregnant cattle because of its ability to infect the fetus. BVDV is closely related to the hog cholera virus and a similar virus that affects sheep. The BVDV has the property that when cultured in a petri dish the viral particles form plaques, circular regions in the culture medium. These plaques are easily visible under a microscope or to the naked eye when a stain is used. Each plaque is associated with a single viral particle.
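One simple way to probe the linearity (proportionality) component mentioned above, sketched here with invented counts rather than the chapter's BVDV data, is to regress log plaque counts on the log dilution and check that the slope is close to one.

```python
# Invented plaque counts from a 10-fold serial dilution of a spiked stock solution.
import numpy as np
import statsmodels.api as sm

log10_dilution = np.array([-1.0, -2.0, -3.0, -4.0, -5.0])
plaque_count = np.array([1450, 160, 14, 2, 0])

# Regress log10(count + 0.5) on log10 dilution; a slope near 1 supports proportionality
y = np.log10(plaque_count + 0.5)
X = sm.add_constant(log10_dilution)
fit = sm.OLS(y, X).fit()
print(f"slope = {fit.params[1]:.2f} (ideal: 1.0), R^2 = {fit.rsquared:.3f}")
```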

8. Control Charts for Quality Characteristics under Nonnormal Distributions

  • Youn-Min Chou , 
  • Galen D. Halverson , 
  • Steve T. Mandraccia

When using Shewhart control charts, the underlying distribution of the quality characteristic must be at least approximately normal. In many processes, the assumption of normality is violated or unjustifiable. If the quality characteristic is not so distributed, then the control limits may be entirely inappropriate, and we may be seriously misled by using these control charts. In this case study, we discuss several “state of the art” curve-fitting methods for improving the technical validity of control charts for the nonnormal situation. We also compare their practical application qualities using data from the semiconductor industry.

To set up a control chart for the particle counts, the control limits may be calculated according to historical data from the particle count database. Two frequently used charts for particle counts are the c (or number of particles per wafer) chart and the x (or individual measurements) chart. See [Montgomery, 1996]. The basic probability models used for these charts are, respectively, the Poisson distribution and the normal distribution.

In many practical situations, these models may not be appropriate. In such cases, the conclusions drawn from the analysis will be invalid. To solve these problems, we propose to transform data to near normality and then apply the normal-based control charts to the transformed data.

Statistical process control techniques have been employed in manufacturing industries to maintain and improve quality by applying statistical methods to the data collected from a process.
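A minimal sketch of the transform-then-chart idea described above, using a Box-Cox transformation as one possible choice (the chapter itself compares several curve-fitting methods): transform simulated skewed counts, then compute individuals-chart limits on the transformed scale.

```python
# Simulated skewed particle counts, not semiconductor data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
counts = rng.gamma(shape=2.0, scale=5.0, size=200) + 1    # skewed, strictly positive

transformed, lam = stats.boxcox(counts)                   # lambda chosen by maximum likelihood

# Individuals (X) chart limits: mean +/- 2.66 * average moving range
mr_bar = np.mean(np.abs(np.diff(transformed)))
center = transformed.mean()
ucl, lcl = center + 2.66 * mr_bar, center - 2.66 * mr_bar
print(f"lambda = {lam:.2f}, control limits on transformed scale: ({lcl:.2f}, {ucl:.2f})")
```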

9. Evaluation of Sound to Improve Customer Value

  • John R. Voit , 
  • Esteban Walker

The rumble of a motor, the tick of a clock, the hum of a generator are all product sounds that are not necessary for product performance. Some of these unnecessary noises may be pleasant to the consumer, whereas others may be annoying or even intolerable. Unfortunately, “pleasantness” is inherently a subjective characteristic and thus is difficult to measure reliably. Subjective characteristics are often evaluated through a panel of judges using a method to rank the items. There is little information on which method provides the most reliable and consistent rankings. This article compares two subjective evaluation methods commonly used by panels of judges to rate noises. Using data from a manufacturing process, the methods are compared on the basis of

• Consistency of judges within panels,

• Consistency of panels over time,

• Agreement between an expert panel and a nonexpert panel.

There is an increasing need in the engineering community to evaluate noise “quality” in order to justify adding unit cost to better satisfy the consumer. For instance, the sound of a car air conditioning (AC) system is known to be annoying to some customers. This fact has been conveyed through warranty claims for noise complaints and customer clinics, where people were asked about their AC units. The engineers have several design options that will reduce the noise generated by the AC unit; however, all will increase the cost of production.

With a reliable method to evaluate the AC unit noise, engineers and managers would be better able to determine the option that represents the best value to the consumer. The same principles can be applied to other products where costly design changes are considered to reduce objectionable noises.
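One common way to quantify the consistency of judges within a panel, offered here only as a generic illustration (the article compares two specific evaluation methods of its own), is Kendall's coefficient of concordance W, where 0 means no agreement and 1 means perfect agreement.

```python
# Kendall's W for an invented matrix of ranks (rows = judges, columns = noise samples).
import numpy as np

ranks = np.array([
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
    [1, 2, 4, 3, 5],
])
m, k = ranks.shape
rank_sums = ranks.sum(axis=0)
s = np.sum((rank_sums - rank_sums.mean()) ** 2)
w = 12 * s / (m ** 2 * (k ** 3 - k))      # no-ties formula
print(f"Kendall's W = {w:.2f}")
```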

10. Improving Integrated Circuit Manufacture Using a Designed Experiment

  • Veronica Czitrom , 
  • John Sniegowski , 
  • Larry D. Haugh

Integrated circuits (chips) are essential components in the electronics industry, where they are used in computer products, radios, airplanes, and other electronic equipment. Numerous integrated circuits are manufactured simultaneously on one silicon wafer. Chemical etching (removing an oxide layer from a wafer) is one of many manufacturing steps in the creation of an integrated circuit. A designed experiment was performed to improve control of an etching process. It was necessary to increase the CF4 gas flow beyond what development engineers had recommended, and it was hoped that two other factors, electric power and bulk gas flow, could be used to offset the effect of this increase on three important responses related to yield and throughput: etch rate, etch rate nonuniformity, and selectivity. The designed experiment allowed a systematic and efficient study of the effect of the three factors on the responses. Settings were found that allowed the CF4 gas flow to be increased.

The semiconductor industry is the foundation of the electronics industry, the largest industry in the U.S., employing 2.7 million Americans. The semiconductor industry manufactures integrated circuits, or chips, for use in airplanes, computers, cars, televisions, and other electronic equipment. Each integrated circuit consists of thousands of interconnected microscopic elements such as transistors and resistors. The smallest active features are 0.5 microns in width, or approximately 1/150th the diameter of a human hair. During manufacture, many integrated circuits are created simultaneously on a thin round silicon wafer. The wafer goes through a very complex set of manufacturing steps that can take up to two months to complete.
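To show how a two-level factorial design isolates the influence of each factor, here is a generic Python sketch with a full 2^3 design and invented responses; the factors and numbers are stand-ins, not the actual CF4, power, and bulk gas flow data.

```python
# Generic 2^3 factorial: estimate main effects from coded factor levels and responses.
import itertools
import numpy as np

design = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)  # factors A, B, C
response = np.array([52., 61., 55., 66., 58., 70., 60., 74.])  # one invented response per run

# Main effect of a factor = mean response at +1 minus mean response at -1
for name, col in zip("ABC", design.T):
    effect = response[col == 1].mean() - response[col == -1].mean()
    print(f"main effect of {name}: {effect:+.1f}")
```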

11. Evaluating the Effects of Nonresponse and the Number of Response Levels on Survey Samples

  • Robert K. Smidt , 
  • Robert Tortora

The purpose of this case study is to examine two elements in survey sampling that affect estimation: nonresponse and the number of response levels allowed for any question. The first factor, nonresponse, causes difficulties in surveys, especially when those who fail to respond are liable to be different from those who do. Estimates based on such responses will have substantial errors. The second factor is associated with surveys that employ the Likert scale. The Likert scale offers a series of “k” ordered responses indicating the degree of agreement (or satisfaction or support, etc.) for the question under consideration. The choice of k, the number of available responses to a survey question, is crucial, particularly when interest lies in estimating the percent that belongs in the top category. We would like to examine the combined effect of these two factors on estimation.

Survey samples are used to gather information from many and diverse groups. Based on these, elections are predicted before the polls close, television networks decide which programs to replace, advertising firms choose the group to target with their marketing strategies, and companies retool their factories. It is crucial for sample surveys to be designed so that representative information is obtained from the appropriate group. Failure to do so can lead to disastrous results. Introductory statistics texts enjoy describing the Literary Digest's attempt to forecast the Roosevelt/Landon election (remember President Alf?) or presenting the photograph of Harry S. Truman holding aloft the headline proclaiming Dewey's victory. Less dramatic but often more costly mistakes are made when inaccurate sample surveys lead firms to make bad decisions and take inappropriate actions.
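A small simulation makes the nonresponse problem tangible: if willingness to respond rises with satisfaction, the observed percentage in the top Likert category overstates the truth. All parameters below are invented.

```python
# Simulated nonresponse bias for a 5-point Likert item.
import numpy as np

rng = np.random.default_rng(0)
population = rng.choice([1, 2, 3, 4, 5], size=100_000, p=[0.10, 0.15, 0.25, 0.30, 0.20])

# Response probability rises with satisfaction level (0.2 for level 1 up to 0.6 for level 5)
respond = rng.random(population.size) < (0.2 + 0.1 * (population - 1))
sample = population[respond]

print(f"true % in top category:      {np.mean(population == 5):.1%}")
print(f"observed % among responders: {np.mean(sample == 5):.1%}")
```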

12. Designing an Experiment to Obtain a Target Value in the Chemical Processes Industry

  • Michael C. Morrow , 
  • Thomas Kuczek , 
  • Marcey L. Abate

The case history presents the issues encountered when designing an experiment. The emphasis is on the design, not the analysis. A proper design makes the statistical analysis straightforward. Several design issues are presented. An appropriate design option is chosen and evaluated for sensitivity given the constraints of the process. The data from the experiment are analyzed and summarized. Conclusions from the planning process and analysis are presented. It is hoped that students exposed to this case study will get a taste of what experimental design is truly about.

The problem to be solved is to identify the critical variables involved in a chemical process and then to come up with an appropriate experimental design which will help put the process on target. The key issues are setting the objectives of the experiment and then choosing the design to achieve the objectives of the experiment. The emphasis in this case history is on the planning process, although the analysis of the chosen design is also presented.

A major goal in the production of plastic pellets at Eastman Chemical Company is to keep a property of the plastic pellets, in this case Response, as close to a target value as possible. It is critical that the response of the plastic pellets produced be close to the target value, for if it is not, the pellets cannot be used efficiently in the manufacturing processes of Eastman's customers. Eastman's customers use the pellets to produce sheeting, containers, refrigerator components, display cases, and so forth. If the response deviates far enough from the target value, the result could be unsellable material or, if the pellets were used anyway, a substandard product.

13. Investigating Flight Response of Pacific Brant to Helicopters at Izembek Lagoon, Alaska by Using Logistic Regression

  • Wallace P. Erickson , 
  • Todd G. Nick , 
  • David H. Ward

Izembek Lagoon, an estuary in Alaska, is a very important staging area for Pacific brant, a small migratory goose. Each fall, nearly the entire Pacific Flyway population of 130,000 brant flies to Izembek Lagoon and feeds on eelgrass to accumulate fat reserves for nonstop transoceanic migration to wintering areas as distant as Mexico. In the past 10 years, offshore oil drilling activities in this area have increased and, as a result, the air traffic in and out of the nearby Cold Bay airport has also increased. There has been a concern that this increased air traffic could affect the brant by disturbing them from their feeding and resting activities, which in turn could result in reduced energy intake and buildup. This may increase the mortality rates during their migratory journey. Because of these concerns, a study was conducted to investigate the flight response of brant to overflights of large helicopters. Response was measured on flocks during experimental overflights of large helicopters flown at varying altitudes and lateral (perpendicular) distances from the flocks. Logistic regression models were developed for predicting probability of flight response as a function of these distance variables. Results of this study may be used in the development of new FAA guidelines for aircraft near Izembek Lagoon.
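The sketch below mirrors the kind of logistic regression described above, with simulated flights rather than the Izembek Lagoon data; the predictor scales and coefficients are invented.

```python
# Simulated flight responses (1 = flock flew) as a function of altitude and lateral distance.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
altitude = rng.uniform(100, 1000, n)          # invented overflight altitudes
lateral = rng.uniform(0, 3000, n)             # invented lateral distances from the flock
logit = 2.5 - 0.002 * altitude - 0.001 * lateral
response = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([altitude, lateral]))
fit = sm.Logit(response, X).fit(disp=0)
print(fit.params)   # negative coefficients: higher or more distant aircraft -> lower odds of flight
```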

14. Estimating the Biomass of Forage Fishes in Alaska's Prince William Sound Following the Exxon Valdez Oil Spill

  • Winson Taam , 
  • Lyman McDonald , 
  • Kenneth Coyle , 
  • Lew Halderson

The Alaska Predator Ecosystem Experiment (APEX) is a research project to determine why some species of seabirds whose populations were reduced by the Exxon Valdez oil spill in Prince William Sound, Alaska are not recovering. An acoustic survey was performed in the Sound to estimate the abundance and distribution of forage fishes and seabirds in the region. APEX involves a number of aspects, including estimation of seabird population sizes, food abundance, and state of the ocean. The sampling design consisted of designated straight-line transects in three regions of the Sound in July 1995. These three regions were chosen to represent three levels of impact by the Exxon Valdez accident. The data consist of acoustic sonar signals collected on each transect using surface sensors, observer sightings of birds, net sampling of fishes, and water and weather conditions. This case study provides analysis of a segment of this study; namely, estimating the biomass of one species of forage fish with spatially correlated data. Other components of the project will evaluate the forage fish data collected in concert with seabird reproduction data over three years, 1995–1997, in an attempt to determine if food is limiting recovery of the piscivorous (fish-eating) seabirds.

In a field study, many issues related to planning, execution, and analysis are crucial to the success of a project.

15. A Simplified Simulation of the Impact of Environmental Interference on Measurement Systems in an Electrical Components Testing Laboratory

  • David A. Fluharty , 
  • Yiqian Wang , 
  • James D. Lynch

In the evolution of the automobile, electrical signal transmission is playing a more prominent role in automotive electronic systems. Since signal transmission involves voltages and currents in circuitry that are considerably less than for power transmission circuits, corrosion buildup in the connections of such low-energy circuits (referred to as dry circuits) is a considerable problem because it increases resistance.

To study the reliability of dry circuits in a laboratory setting, automotive engineers test connections and measure the resistance in a connection as a surrogate for connection failure. Since resistance is measured indirectly via Ohm's Law using circuit voltage and current measurements, the voltage and current measurement errors propagate through Ohm's Law to affect the calculated resistance. In addition, such tests are very sensitive to external voltage sources that can be difficult, if not impossible, to control even in laboratory settings. Because these tests are performed with voltages and currents that are very small, the sensitivity of the calculated resistance to error propagation and to intermittent voltage sources is an important issue. The purpose of this project is to gain insight into these issues through a simulation study.
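As a simplified stand-in for the project's simulation (the magnitudes and error models below are invented), the sketch propagates voltage and current measurement noise, plus an occasional stray voltage, through Ohm's Law to the calculated resistance.

```python
# Monte Carlo propagation of measurement error through R = V / I for a dry circuit.
import numpy as np

rng = np.random.default_rng(11)
n = 100_000
true_r, true_i = 0.050, 0.010                      # 50 milliohm connection, 10 mA test current

current = true_i + rng.normal(0, 1e-4, n)          # current measurement noise
interference = rng.binomial(1, 0.05, n) * 5e-5     # occasional 50 microvolt stray source
voltage = true_r * true_i + rng.normal(0, 2e-5, n) + interference

calculated_r = voltage / current
print(f"mean R = {calculated_r.mean() * 1e3:.2f} mohm, "
      f"sd = {calculated_r.std() * 1e3:.3f} mohm (true value: 50.00 mohm)")
```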

16. Cerebral Blood Flow Cycling: Anesthesia and Arterial Blood Pressure

  • Michael H. Kutner , 
  • Kirk A. Easley , 
  • Stephen C. Jones , 
  • G. Rex Bryce

Cerebral blood flow (CBF) sustains mental activity and thought. Oscillations of CBF at a frequency of 6 per minute, termed CBF cycling, have been suspected of being dependent on the type of anesthesia [Clark Jr., Misrahy, and Fox, 1958; Hundley et al., 1988]. Thus, we investigated the effects of different anesthetics on CBF cycling [Jones et al., 1995].

The research question asked here was whether CBF cycling is influenced by the type of anesthesia. Because cycling is enhanced at lower arterial pressures, blood was withdrawn (exsanguination) from all experimental animals to reduce their arterial pressure. Analysis of covariance is used to explore whether the cycling characteristics, amplitude and frequency, differ by type of anesthesia while controlling for the amount of blood pressure change induced by exsanguination.

CBF is important because it sustains neuronal activity, the supposed basis of mental activity and thought [Chien, 1985]. As various regions of the brain are activated by sensory or motor demands, the level of blood flow adjusts regionally in a dynamic fashion to support the local changes in neuronal activity [Lindauer, Dirnagl, and Villringer, 1993].

Fluctuations or oscillations of CBF and their relation to the rate of oxygen use by the brain have been noted by many workers [Clark Jr., Misrahy, and Fox, 1958; Vern et al., 1988], but generally they have only been reported as secondary results and usually only in a small fraction of the subjects studied. Because these oscillations occur with a dominant frequency near 0.1 Hz, or 6/min, and often have a high amplitude that can approach 15% of the mean value of CBF, they have intrigued many who have sought to understand their physiological significance and possible relation to pathology [Jöbsis et al., 1977; Mayevsky and Ziv, 1991].
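A minimal sketch of the analysis of covariance described above, with a simulated data frame standing in for the experiment; the anesthetic labels and all numbers are illustrative only.

```python
# Simulated ANCOVA: cycling amplitude ~ anesthetic type, adjusting for blood-pressure change.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(5)
n = 30
df = pd.DataFrame({
    "anesthetic": rng.choice(["drug_1", "drug_2"], n),
    "bp_change": rng.uniform(-40, -10, n),          # mm Hg change after blood withdrawal
})
df["amplitude"] = (10 + 3 * (df["anesthetic"] == "drug_2")
                   + 0.1 * df["bp_change"] + rng.normal(0, 1.5, n))

fit = smf.ols("amplitude ~ C(anesthetic) + bp_change", data=df).fit()
print(anova_lm(fit, typ=2))
```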

17. Modeling Circuit Board Yields

  • Lorraine Denby , 
  • Karen Kafadar , 

The manufacturing of products often involves a complicated process with many steps, the quality of which depends upon the complexity of the individual tasks. More complex components can, but need not, result in lower success rates in the final product. “Success” is measured differently for different products; it may be as simple as “unit turns on and off properly” or more quantitative such as “output power falls within the range 100 ± 0.05 watts.”

The cost of poor quality is significant in a manufacturing plant: the loss of a functioning unit that could have been sold for profit, the employee time spent producing the defective unit, the effort of diagnosing the problem and correcting it if feasible, and the materials from the unit that are no longer usable (scrap). Thus, managers focus on ways of designing and building quality into the final product. If certain characteristics can be manufactured more successfully than others, it behooves the designer to incorporate such features wherever possible without sacrificing optimal performance. Statistical quality control usually involves the monitoring of product quality over time to ensure consistent performance of manufactured units. Our focus here is on quality one step before: to identify the characteristics of products which lead to higher probability of successful performance.

In this case study, we analyze the yield of printed circuit boards, i.e., the percent of boards in a production lot which function properly. Printed circuit boards are used in hundreds of electronic components, including computers, televisions, stereos, compact disk players, and control panels in automobiles and aircraft.

18. Experimental Design for Process Settings in Aircraft Manufacturing

  • Roger M. Sauter , 
  • Russell V. Lenth

This case study is about designing and analyzing experiments that are relevant to hole-drilling operations in aircraft. When a change was made to a new lubricant, it was necessary to do some experimentation to learn how much of this lubricant should be used and how it interacts with other process factors such as drill speed. Several factors are involved in our experiment, and there are physical and time constraints as well, necessitating an incomplete-block experiment where only a subset of the factor combinations are used on any one test coupon. The reader is guided through the design and analysis, considering some related practical issues along the way.

The goal of this study is to design and analyze an experiment that will help improve a manufacturing process—in this case, the assembly of aircraft.

The skin of the fuselage (i.e., the body of the plane—see Figure 1) and wings are made of pieces of metal, overlapped and fastened together with rivets. Thus, the process involves drilling a very large number of holes. These must be positioned accurately, and the quality of the holes themselves is important.

The experiment in this study was motivated by a change in lubricant. A certain amount of lubrication reduces friction, prevents excessive heat, and improves hole quality; however, too much lubricant can be problematic because the drill bit may not get enough “bite.” In the past, chlorofluorocarbons (CFCs) had been used as a lubricant. These can no longer be used due to environmental concerns, necessitating the use of a new class of lubricants. Changing the lubricant can have a far-reaching impact on the entire process; hence, experimental methods are used to study the key control variables—including the amount of lubricant—in the hole-drilling process.

19. An Evaluation of Process Capability for a Fuel Injector Process Using Monte Carlo Simulation

  • Carl Lee , 
  • Gus A. D. Matzo

Capability indices are widely used in industry for investigating how capable a manufacturing process is of producing products that conform to the engineer's specification limits for essential quality characteristics. Companies use them to demonstrate the quality of their products. Buyers use them when deciding on a business relationship with the manufacturer. One important underlying assumption for capability analysis is that the quality characteristic should follow a normal distribution. Unfortunately, many quality characteristics do not meet this assumption. For example, the leakage from a fuel injector follows a very right-skewed distribution. Most of the leakages are less than one ml, with a few cases over one ml and some rare cases over three ml. It is important to understand how well these indices perform if the underlying distribution is skewed. This case study was motivated by concerns about using these indices to report the capability of a fuel injector process in an engine manufacturing plant.

The fuel injection system of an automobile meters fuel into the incoming air stream, in accordance with engine speed and load, and distributes this mixture uniformly to the individual engine cylinders. Each cylinder is connected to an injector. When an engine is turned on, the fuel injectors inject fuel into the individual cylinders along with the incoming air to form a uniform mixture for ignition. When the engine is turned off, the injectors should stop injecting fuel immediately. However, during the first few seconds after turning off the engine, a tiny amount of fuel may leak out from an injector into the engine cylinder. This is the injector leakage. Such leakage is undesirable from an emissions standpoint.
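The concern is easy to reproduce in a few lines: compute the usual normal-based Cpk on right-skewed data and compare the implied quality with the fraction actually outside the specification limit. The distribution and limit below are invented, not the plant's data.

```python
# Normal-based Cpk on simulated right-skewed leakage data versus the observed exceedance rate.
import numpy as np

rng = np.random.default_rng(2)
leakage = rng.lognormal(mean=-1.5, sigma=0.8, size=10_000)   # ml, right-skewed
usl = 3.0                                                    # invented upper specification limit (ml)

cpk = (usl - leakage.mean()) / (3 * leakage.std(ddof=1))
exceed = np.mean(leakage > usl)
print(f"normal-based Cpk = {cpk:.2f}, observed fraction over USL = {exceed:.4%}")
# A Cpk this large would imply essentially zero nonconforming units under normality,
# far fewer than are actually observed, which is why skewness must be addressed.
```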

20. Data Fusion and Maintenance Policies for Continuous Production Processes

  • Nozer D. Singpurwalla , 
  • Joseph N. Skwish

Continuous production processes involve round-the-clock operation of several, almost identical, pieces of equipment that are required to operate concurrently. Failure of one of these pieces of equipment interrupts the flow of production and incurs losses due to waste of raw material. The incidence of these in-service failures can be reduced through preventive maintenance; however, preventive maintenance also interrupts production and creates waste. Thus, the desire to prevent in-service failures while minimizing the frequency of preventive maintenance gives rise to the problem of determining an optimal system-wide maintenance interval. The aim of this study is to propose a procedure for addressing problems of this type.

The maintenance of equipment used in continuous manufacturing processes, such as refining oil and the production of paper, steel, synthetics, and textiles, presents a generic class of problems on which little has been written. Such processes are characterized by round-the-clock operation of several pieces of almost identical equipment, called “processing stations,” to which there is a continuous flow of raw material; see Fig. 1. Examples of such equipment are the spinning wheels of textile mills and the extrusion dies of chemical and steel plants.

Each processing station converts raw material to a finished product, and all stations operate concurrently. Since a common flow of raw material feeds all stations, the flow cannot be stopped to stations that are out of service. Thus, whenever a station experiences an in-service failure, there is a loss of production and a wastage of raw material.

The incidence of these in-service failures can be reduced through periodic preventive maintenance; however, since preventive maintenance also interrupts the flow of production and creates waste, it should be performed only as often as necessary.

Back Matter

The back matter includes the index.

ASA-SIAM Series on Statistics and Applied Mathematics

Book details: Published 1998. ISBN 978-0-89871-421-0; eISBN 978-0-89871-974-1. https://doi.org/10.1137/1.9780898719741. Book code: SA04. Pages: xxxi + 175.

Basic statistical analysis in genetic case-control studies

  • Geraldine M Clarke,
  • Carl A Anderson,
  • Fredrik H Pettersson,
  • Lon R Cardon,
  • Andrew P Morris &
  • Krina T Zondervan

Nature Protocols, volume 6, pages 121–133 (2011). https://doi.org/10.1038/nprot.2010.182

  • Disease model
  • Genetic association study
  • Statistical methods

This protocol describes how to perform basic statistical analysis in a population-based genetic association case-control study. The steps described involve the (i) appropriate selection of measures of association and relevance of disease models; (ii) appropriate selection of tests of association; (iii) visualization and interpretation of results; (iv) consideration of appropriate methods to control for multiple testing; and (v) replication strategies. Assuming no previous experience with software such as PLINK, R or Haploview, we describe how to use these popular tools for handling single-nucleotide polymorphism data in order to carry out tests of association and visualize and interpret results. This protocol assumes that data quality assessment and control has been performed, as described in a previous protocol, so that samples and markers deemed to have the potential to introduce bias to the study have been identified and removed. Study design, marker selection and quality control of case-control studies have also been discussed in earlier protocols. The protocol should take ∼ 1 h to complete.
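The protocol itself works through PLINK, R and Haploview; purely as an illustration of the basic idea, the Python sketch below runs an allelic chi-square test of association for a single SNP and prints a Bonferroni-style threshold for multiple testing. The counts and the number of tested SNPs are invented.

```python
# Allelic chi-square test for one SNP (2 alleles per genotyped individual); counts invented.
import numpy as np
from scipy import stats

# Rows: cases, controls; columns: counts of allele A and allele a
allele_counts = np.array([[1200,  800],
                          [1050,  950]])
chi2, p_value, dof, _ = stats.chi2_contingency(allele_counts, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")

n_snps_tested = 500_000                      # e.g., a hypothetical genome-wide panel
print(f"Bonferroni-corrected threshold: {0.05 / n_snps_tested:.1e}")
```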

Statistical analysis of case-control studies

Affiliation

  • 1 Department of Applied Medical Sciences, School of Applied Science, University of Southern Maine, Portland 04103.
  • PMID: 7925726
  • DOI: 10.1093/oxfordjournals.epirev.a036143

Methods of analysis of results from case-control studies have evolved considerably since the 1950s. These methods have helped to improve the validity of the conclusions drawn from case-control research and have helped to ensure that the available data are utilized to their fullest extent. Logistic regression modeling, in its various forms, has become by far the most frequently applied method for multivariable analysis of case-control studies. As with any type of statistical modeling, the appropriateness of its formulation can be verified only partially through examination of the data themselves, and cautious interpretation has been urged (80-83). In this article, I have concentrated on methods that are extremely well suited to the evaluation of fairly specific etiologic issues, where one or two particular exposures are designated as being of a priori interest. In situations where a large number of associations are examined for possible case-control differences, additional complexities arise. Several authors have argued strongly against statistical adjustment for "multiple comparisons" in such situations (6, 7, 84). However, recent work suggests that, when background information is limited, certain forms of multiple-comparison procedures can be useful, specifically within a decision-analysis framework (85-87). Further methodological work relevant to the analysis of case-control studies is needed in at least two important areas. First, as discussed above, we need additional methods for conducting analyses that take appropriate account of the considerable error to which measurements in case-control studies are subject. Only with such methods available can estimates from case-control studies be confidently employed for elucidating pathogenesis, for developing policy, and for individual decision-making. Second, there has been a renewed effort in recent years to clarify the nature of causal effects and to relate these to the typically calculated epidemiologic parameters (88-92). As this work develops further, it is likely that the analysis of case-control studies will be enriched.
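To ground the central quantity in a small example (the counts are invented), the sketch below estimates the exposure odds ratio from a 2x2 case-control table and then reproduces it with a logistic regression of case status on exposure.

```python
# Odds ratio from a 2x2 case-control table, and the same estimate via logistic regression.
import numpy as np
import statsmodels.api as sm

#                 exposed  unexposed
table = np.array([[ 90,      110],      # cases
                  [ 60,      140]])     # controls
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
print(f"odds ratio from the 2x2 table: {odds_ratio:.2f}")

# Expand the table to one record per subject and fit a logistic regression
case = np.repeat([1, 1, 0, 0], table.flatten())
exposed = np.repeat([1, 0, 1, 0], table.flatten())
fit = sm.Logit(case, sm.add_constant(exposed)).fit(disp=0)
print(f"odds ratio from logistic regression: {np.exp(fit.params[1]):.2f}")
```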

Helping reviewers assess statistical analysis: A case study from analytic methods

Ron S. Kenett

1 The KPA Group and the Samuel Neaman Institute, Technion, Haifa, Israel

Bernard G. Francq

2 UCLouvain, ISBA (Institute of Statistics, Biostatistics and Actuarial Sciences), Louvain-la-Neuve, Belgium

Analytic methods development, like many other disciplines, relies on experimentation and data analysis. Determining the contribution of a paper or report on a study incorporating data analysis is typically left to the reviewer's experience and good sense, without reliance on structured guidelines. This is amplified by the growing role of machine-learning-driven analysis, where results are based on computer-intensive algorithm applications. The evaluation of a predictive model whose parameters were fit by cross-validation adds challenges beyond those of evaluating regression models, where the estimates can be easily reproduced. This lack of structure to support reviews increases uncertainty and variability in reviews. In this paper, aspects of statistical assessment are considered. We provide checklists for reviewers of applied statistics work with a focus on analytic method development. The checklist covers six aspects relevant to a review of statistical analysis, namely: (1) study design, (2) algorithmic and inferential methods in frequentist analysis, (3) Bayesian methods in Bayesian analysis (if relevant), (4) selective inference aspects, (5) severe testing properties and (6) presentation of findings. We provide a brief overview of these elements, with references for a more elaborate treatment. The robustness analysis of an analytical method is used to illustrate how an improvement can be achieved in response to questions in the checklist. The paper is aimed at both engineers and seasoned researchers.

1. BACKGROUND

In the pharmaceutical industry, as well as in other contexts, reviewers provide feedback aimed at improving work based on statistical data analysis. A good reviewer is one who contributes to the analysis and constructively enhances its level. Some journals in medicine publish guidelines for such reviews. 1, 2 We discuss here the review of papers or reports based on statistical analysis rather than mathematical modelling. We use, as a case study, a publication on the design of a high performance liquid chromatography (HPLC) analytic method. Section 2 provides a perspective on the review of statistical reports, Section 3 is a review of statistical analysis methods, and Section 4 presents two checklists. Section 5 is a case study based on the design of an HPLC analytic method. The paper concludes with a discussion.

2. SOME PERSPECTIVES ON THE STATISTICAL REVIEW OF APPLIED STATISTICS

An important aspect of reviewing applied statistics is related to the reproducibility of the research findings. Part of this has been addressed by a much-discussed American Statistical Association (ASA) statement on p-values. 3 While the conclusions of applied research papers must be supported by statistical analysis of the data, p-values (together with confidence intervals [CIs]) are usually mandatory in publications as evidence supporting alignment with the conclusions. The ASA statement formulates six principles for statistical analysis:

  • Principle 1 : p ‐values can indicate how incompatible the data are with a specified statistical model.
  • Principle 2 : p ‐values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  • Principle 3 : Scientific conclusions and business or policy decisions should not be based only on whether a p ‐value passes a specific threshold.
  • Principle 4 : Proper inference requires full reporting and transparency.
  • Principle 5 : A p ‐value, or statistical significance, does not measure the size of an effect or the importance of a result.
  • Principle 6 : By itself, a p ‐value does not provide a good measure of evidence regarding a model or hypothesis.

Other approaches, mentioned in the ASA statement without critical appraisal, include (1) confidence intervals, (2) prediction intervals, (3) estimation, (4) likelihood ratios, (5) Bayesian methods, (6) Bayes factor and (7) credibility intervals.
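Principle 5 is easy to demonstrate numerically. The short simulation below is our own illustration (not part of the ASA statement), assuming numpy and scipy are available: a tiny effect measured on a very large sample can yield a far smaller p-value than a large effect measured on a small sample.

# Illustration of ASA Principle 5: a p-value does not measure effect size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Tiny effect (0.05 SD) with very large samples
a1 = rng.normal(0.00, 1.0, 100_000)
b1 = rng.normal(0.05, 1.0, 100_000)

# Large effect (0.8 SD) with small samples
a2 = rng.normal(0.0, 1.0, 15)
b2 = rng.normal(0.8, 1.0, 15)

for label, (a, b) in {"tiny effect, n = 100000": (a1, b1),
                      "large effect, n = 15": (a2, b2)}.items():
    t, p = stats.ttest_ind(a, b)
    d = (b.mean() - a.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    print(f"{label}: p = {p:.2e}, Cohen's d = {d:.2f}")

The first comparison typically produces the smaller p-value despite a negligible effect size, which is exactly the distinction Principle 5 warns about.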

Should these principles guide the review process of applied research papers in general? The answer is highly debated. A series of papers and blogs present supporting and contrarian views on these principles and approaches (e.g., 4 , 5 , 6 , 7 , 8 ), and some journals have adopted opinionated guidelines affecting the statistical analysis in papers they publish. 9 , 10 Specifically, they adopt a policy whereby null hypothesis statistical testing is not to be used. Mayo 11 characterized these debates as 'the statistics wars'. Much discussion has focused on misuses of the null hypothesis testing process and on low-powered studies. Daniel Kahneman, the Nobel Prize-winning behavioral economist, retracted from his book the mention of several studies retrospectively found to be underpowered ( https://retractionwatch.com/2017/02/20/placed‐much‐faith‐underpowered‐studies‐nobel‐prize‐winner‐admits‐mistakes/ ). The studies themselves were, however, not retracted from the journals that originally published them. This lack of retraction is another important element in the reproducibility discussion; see, for example, the case of medical research. 12 The rate of retraction has been estimated (from a study published in 2011) at 0.02% in biomedical fields, with nearly half of retractions due to honest error or non-replicable findings. 13 Retraction is practically unknown in the industrial method application setting.

Given this background, how should a reviewer assess the statistical analysis of applied research? The next section maps out statistical characteristics of applied research. Later we focus on a case study in the development of analytic methods.

3. STATISTICAL ANALYSIS OF APPLIED RESEARCH

Efron and Hastie 14 present a comprehensive review of statistical analysis over time. At its origin, classical statistics consists of an algorithmic and an inferential part. Frequentism (or 'objectivism') is based on the probabilistic properties of a procedure of interest, as derived and applied to observed data. This provides an assessment of bias and variance. The frequentist interpretation is based on a scenario where the same situation is repeated endlessly. Within the frequentist framework, several methods can be applied: (1) the plug-in substitution principle, (2) the delta method's Taylor series approximation, (3) the application of parametric families and maximum likelihood theory, (4) the use of simulation and bootstrapping, which are computer-intensive numerical methods, and (5) pivotal statistics. 15 These distinctions are important for reviewers to make. The Neyman–Pearson lemma provides an optimal hypothesis testing algorithm in which a black-and-white decision is made: you either reject the null hypothesis in favour of an alternative hypothesis, or you do not. This offers an apparently simple and effective way to conduct statistical inference that can be scaled up. On the other hand, confidence intervals (CIs) are considered by many as more informative. However, just as with p-value hacking, Barnett and Wren 16 demonstrate the wide prevalence of CI hacking: when a p-value is lower than the significance level (usually 5%), the test is said to be significant, and when researchers strive to get significant (low) p-values, they hope to find CIs that do not overlap the null hypothesis value. Specifically, 'the set of all CIs at different levels of probability . . . (yields a) confidence distribution', 17 (p. 363).
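As a concrete illustration of item (4), the computer-intensive route, a nonparametric bootstrap percentile interval takes only a few lines. This is a generic sketch of the technique, assuming numpy; it is not code from any of the works cited here.

# Nonparametric bootstrap: a computer-intensive frequentist method.
# Percentile confidence interval for the mean of a small, skewed sample.
import numpy as np

rng = np.random.default_rng(42)
sample = rng.exponential(scale=2.0, size=30)   # toy data, deliberately skewed

n_boot = 10_000
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(n_boot)
])

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean = {sample.mean():.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")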

Alternatively, statistical analysis can be conducted within a Bayesian framework by transforming a prior distribution on the parameters of interest into a posterior, using the observed data. In this framework, one often invokes the Bayes factor, which is the ratio of the marginal likelihoods of two competing hypotheses, usually a null and an alternative. The Bayes factor is a Bayesian alternative to classical hypothesis testing. In computer-age analytics, one distinguishes between algorithms aiming at (1) estimation, (2) prediction or (3) explanation of structure in the data. Estimation is assessed by the accuracy of estimators, prediction by prediction error, and explanation is based on variable selection using bias-variance tradeoffs, penalized regression and regularization criteria.
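A minimal numeric sketch of the Bayesian route, using the conjugate Beta-Binomial model, is given below; the data and prior are toy choices of ours, and scipy is assumed to be available.

# Beta-Binomial posterior update and a Bayes factor comparing
# H0: theta = 0.5 against H1: theta ~ Beta(1, 1) for binomial data.
import numpy as np
from scipy import stats
from scipy.special import betaln

k, n = 8, 10            # toy data: 8 successes in 10 trials
a0, b0 = 1.0, 1.0       # uniform Beta(1, 1) prior

# Posterior under H1 is Beta(a0 + k, b0 + n - k)
posterior = stats.beta(a0 + k, b0 + n - k)
print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))

# Bayes factor BF01 = marginal likelihood under H0 / under H1
# (the binomial coefficient cancels in the ratio)
log_m0 = n * np.log(0.5)
log_m1 = betaln(a0 + k, b0 + n - k) - betaln(a0, b0)
bf01 = np.exp(log_m0 - log_m1)
print(f"Bayes factor BF01 = {bf01:.2f}  (values below 1 favour H1)")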

Mayo 11 presents a perspective on statistical inference based on the concept of severe testing; she labels it 'error statistics philosophy'. For error statisticians, a claim, or research finding, is severely tested if it has been subjected to, and passes, a test that probably would have found flaws, were they present, 11 (p. xii). If little or nothing has been done to rule out flaws in inferring a claim, then it has not passed a severe test. Mayo identifies three types of models: primary models, experimental models and data models. Primary models break down a research question into a set of local hypotheses that can be investigated using reliable methods. Experimental models structure the particular models at hand and serve to link primary models to data models. Data models generate and model raw data, as well as checking whether the data satisfy the assumptions of the experimental models. Error statistical assessments pick up on the effects of data dredging, multiple testing, optional stopping and other biasing selection effects. Biasing selection effects are blocked in error statistical accounts because they preclude control of error probabilities. Error statistical accounts require preregistration of the study. 18 , 19 Long-run performance requirements are necessary but not sufficient for severity: long-run behavior could be satisfied with error probabilities that do not reflect well-testedness. Tools that are typically justified because they control the probability of erroneous inferences in the long run are thereby given an inferential justification. It is only when long-run relative frequencies represent the method's capability to discern mistaken interpretations of data that the performance and severe testing goals are both reached. Mayo 11 presents a range of concepts for severe testing: 'bad evidence, no test' (BENT), probabilism, performance and probativeness. Insevere tests yield BENT. Performance is about controlling the relative frequency of erroneous inferences in the long run of applications. Probabilism views probability as a means of assigning degrees of belief, support or plausibility to hypotheses. Probativeness is scrutinizing BENT science by the severity criterion. In interpreting CIs, one needs to connect actual experiments with hypothesized concepts. In general, the reported analysis should be able to pinpoint the sources of failed predictions and indicate what is and is not learned from negative results. 20 Every reported inference should include what cannot be reliably inferred, what potential mistakes were not probed or ruled out, and what gaps would need checking in order to avoid various misinterpretations of results, Mayo, 11 (p. 437). A podcast with Mayo on severe testing is available at https://mattasher.com/2020/11/23/ep‐26‐deborah‐mayo‐on‐error‐replication‐and‐severe‐testing/ . An applet for severity assessment is available at https://richarddmorey.shinyapps.io/severity/ .
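To make the severity idea concrete, the sketch below computes post-data severity for the textbook one-sided Normal test with known sigma, in the spirit of the applet referenced above. The numbers are our own toy example, not Mayo's; numpy and scipy are assumed.

# Post-data severity for the one-sided Normal test T+ (H0: mu <= mu0 vs H1: mu > mu0),
# sigma known.  After observing xbar and rejecting H0, the severity of the claim
# "mu > mu1" is SEV = P(Xbar <= xbar_obs ; mu = mu1): the probability the test would
# have produced a less impressive result were mu only mu1.
import numpy as np
from scipy.stats import norm

def severity(xbar_obs, mu1, sigma, n):
    se = sigma / np.sqrt(n)
    return norm.cdf((xbar_obs - mu1) / se)

mu0, sigma, n = 0.0, 1.0, 25      # toy setting: sigma known, n = 25
xbar_obs = 0.4                    # observed mean (z = 2.0, so H0 is rejected at 5%)

for mu1 in [0.0, 0.2, 0.4, 0.6]:
    print(f"SEV(mu > {mu1:.1f}) = {severity(xbar_obs, mu1, sigma, n):.3f}")

The output shows severity falling from about 0.98 for the weak claim mu > 0 to about 0.16 for the strong claim mu > 0.6: stronger claims are less well tested by the same data.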

Another aspect to be considered in reviewing an applied research paper is study design. Some studies are based on observational data and some on interventions, or experiments, designed by the researchers. There are many publications on statistical methods for designing experimental interventions. The following illustration is adapted from Kenett and Zacks. 21 Interventions are determined by factor level combinations, with effects measured through responses. One particular aspect of this methodology is the use of blocking and randomization, which aims at increasing the precision of the estimates and ensuring the validity of the inference. As these aspects are ubiquitous in study design, we discuss them in some more detail.

Blocking is used to reduce error. A block is a portion of the experimental material that is expected to be more homogeneous than the whole aggregate. An example of blocking is the boys' shoes example, 22 (p. 97). Two kinds of shoe sole material are to be tested by fixing the soles on n pairs of boys' shoes and measuring the amount of wear of the soles after a period of actively wearing the shoes. Since there is high variability in the activity of boys, if m pairs receive soles of one type and the rest the other, it will not be clear whether any observed difference in wear is due to differences between the sole materials or to differences between the boys. By blocking by pair of shoes, we can remove much of this variability: each pair of shoes is assigned both types of soles, so the comparison within each block is free of the variability between boys. Furthermore, since boys use their right and left feet differently, the type of sole should be assigned to the left or right shoe at random. Thus, the treatments (two types of soles) are assigned within each block at random. In an analytic method setting, a device with two columns plays the role of the boys' feet; other examples of blocks could be equipment, laboratory personnel or days of the week.

Generally, if there are t treatments to compare and b blocks, and if all t treatments can be performed within a single block, we assign all t treatments to each block and randomize the order in which the treatments are applied within each block. Such a design is called a randomized complete block design. If not all treatments can be applied within each block, it is desirable to assign treatments to blocks in some balanced fashion; such designs are called balanced incomplete block designs. Randomization within each block validates the assumption that the error components in the statistical model are independent. This assumption may not hold if treatments are not assigned at random to the experimental units within each block. If factors are hard to change, a design based on split plots will prove more effective and accommodating to logistic constraints. Of course, one can have a good experimental plan with attention to power, use of blocks and so on, and still have a bad experiment overall, because the conditions were not chosen realistically, the wrong outcomes were measured, or the right outcomes were measured with the wrong instruments.
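To make the value of blocking concrete, the following small simulation, in the spirit of the boys' shoes example, compares an unblocked two-sample analysis with the paired, within-block analysis. It is our own toy illustration, assuming numpy and scipy are available.

# Blocking illustration: each boy (block) wears both sole materials, so the
# within-block comparison removes the large boy-to-boy variability.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_boys = 10
boy_effect = rng.normal(10.0, 3.0, n_boys)       # large variability between boys
material_effect = 0.5                            # true difference between materials

wear_A = boy_effect + rng.normal(0, 0.3, n_boys)
wear_B = boy_effect + material_effect + rng.normal(0, 0.3, n_boys)

t_unpaired, p_unpaired = stats.ttest_ind(wear_A, wear_B)   # ignores the blocking
t_paired, p_paired = stats.ttest_rel(wear_A, wear_B)       # uses boys as blocks

print(f"ignoring blocks : p = {p_unpaired:.3f}")
print(f"blocking by boy : p = {p_paired:.4f}")

With boy-to-boy variability much larger than the material effect, the unblocked comparison typically fails to detect the difference, while the blocked (paired) comparison does.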

Yet another aspect of statistical analysis, with a potentially strong impact on the results, is selective inference. Selective inference is inference on a selected subset of the parameters that turned out to be of interest after viewing the data. This selection leads to difficulties in the reproducibility of results and needs to be accounted for and controlled in the statistical analysis. We can distinguish between out-of-study and in-study selection. The former is not evident in the published work and is due to publication bias, p-hacking or other forms of significance chasing. In-study selection can be more evident in the published work; it is reflected by selective choices in the abstract, tables and figures, or by highlighting only results passing a threshold. 23 , 24 Attentive reviewers of analytic work should be looking for such selective inference.
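A standard, checkable guard against in-study selection is a multiplicity correction. The sketch below applies the Benjamini-Hochberg false discovery rate adjustment to a toy family of p-values; it is our own illustration, assuming statsmodels is available.

# Selective inference guard: Benjamini-Hochberg false discovery rate correction
# applied to a family of p-values.
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.001, 0.008, 0.012, 0.041, 0.049, 0.20, 0.34, 0.62])  # toy values
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

for p, pa, r in zip(pvals, p_adj, reject):
    print(f"raw p = {p:.3f}  BH-adjusted p = {pa:.3f}  reject: {r}")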

Finally, findings have to be presented and generalized. Generalization can be achieved by a range of methods, some intuitive, some conceptual and some more formal, invoking, for example, causal arguments. 25 , 26 Findings can be presented in different ways. One approach is based on alternative verbal representations, some with meaning equivalence and some with surface similarity. 26 Verbal expression of research findings has been proposed in Greenland 27 and Yarkoni. 28 Alternative verbal statements, with meaning equivalence, represent the same conceptual statement. Alternatives with surface similarity seem similar to the target conceptual statement but have different meaning. This approach generates a table of alternative representations with a boundary of meaning (BOM). 26 , 29 The BOM is a demarcation line between claims, presented in alternative ways, and seemingly similar representations of findings not supported by the research. An example from Efron et al. 30 is shown in Figure  1 . It describes findings from a study on the management of hypersensitivity reactions to non‐steroidal anti‐inflammatory drugs in children and adolescents and a structured gradual exposure protocol to baked and heated milk in the treatment of milk allergy. As an example, statements such as: ‘The quality of life of patients and families affected with a food allergy to staple foods (milk, egg, sesame, peanut) is impaired’ and ‘Food allergy in children impacts negatively on day to day activities of the whole family’ are considered equivalent in meaning. On the other hand, statements such as: ‘Food allergy in children impacts negatively on day to day activities of the whole family’ and ‘Educating patients on strict avoidance and carrying an epinephrine autoinjector is completely effective in avoiding accidental exposures in preschool children activities of the whole family’ carry only surface similarity. These alternatives were formulated by the researchers. The BOM is the demarcation line between the columns with meaning equivalence listings and the column with surface similarity listings. Other generalization methods are possible; the reviewer should identify what approach is used in a specific paper or report.

Figure 1. Generalization of findings with alternative representation and a boundary of meaning (BOM) (adapted from Efron et al. 30 )

With this context, we formulate questions for reviewers of statistical analysis in applied research as checklists. These are listed in Table  1 . The next section is about such questions. It is followed by a case study and a discussion.

Table 1. Questions for reviewing statistical analysis in applied research

1. Study design

1.1 Is the experimental set-up clearly presented?

1.2 Have aliasing and power considerations been taken into account?

1.3 Is there reference to blocking, split plots and randomization?

1.4 Was an IRB approval required, and if so, was it obtained? (if relevant)

1.5 Are there any data ethics issues to consider?

2. Algorithmic and inferential methods

2.1 Are the algorithmic and inferential methods used clearly stated?

2.2 Is the analysis aiming at estimation, predictive or explanatory goals?

2.3 Are data and code available to replicate the analysis?

2.4 Are outcomes of inferential analysis properly interpreted?

3. Bayesian analysis

3.1 Are prior distributions justified using prior experience or data?

3.2 What are the Bayesian methods used in the analysis?

3.3 How are Bayes factors interpreted?

4. Selective inference

4.1 Has the study been pre-registered?

4.2 Have any false discovery rate corrections been made?

4.3 Is the presentation of findings affected by selective inference?

5. Severe testing

5.1 Have the findings been tested with an option of failing the test?

5.2 Is the study a first or is it replicating previous studies?

5.3 Have probabilism, performance and probativeness criteria been considered?

5.4 What type of model is used in the analysis: primary models, experimental models or data models?

5.5 If used, how are confidence intervals (CIs) interpreted?

6. Presentation of findings

6.1 How are the research findings presented?

6.2 Have the research findings been generalized?

6.3 Are there any causality arguments presented?

6.4 In a causal study, are there issues of endogeneity (reverse causation)?

4. STATISTICAL CHECKLISTS FOR REVIEWING APPLIED RESEARCH

Our goal is to set up a checklist for reviewers covering aspects related to the statistical analysis of a research paper. These are structured in six parts:

  • Study design
  • Algorithmic and inferential methods in frequentist analysis
  • Bayesian methods in Bayesian analysis (if relevant)
  • Selective inference aspects
  • Severe testing properties
  • Presentation of findings

Specific questions addressing these sections are listed in Table  1 .

These questions provide checklists to reviewers assigned the task of assessing the statistical analysis of an applied research paper. They are not meant to be prescriptive; they are intended only as a review aid.

In this paper, we focus on evaluating studies presenting results in the development of analytic methods. 31 As background to such applications, we propose the checklist in Table  2 . A reviewer should consider the checklist questions to help characterise the study under consideration.

Table 2. Checklist for analytic methods

Analytic method element | Description and question (Q)

Precision | This requirement ensures that method variability is only a small proportion of the specification range (upper specification limit minus lower specification limit). This is also called gage reproducibility and repeatability (GR&R).

Selectivity | Determination of the impurities to monitor at each production step and specification of design methods that adequately discriminate the relative proportions of each impurity.

Sensitivity | Achievement of effective process control with the method, by accurately reflecting changes in critical quality attributes (CQAs) that are important relative to the specification limits.

Further elements of the checklist:

  • Identification and specification of the analytical method performance.

  • Approach to the selection of the method work conditions to achieve the design intent.

  • Establishment and definition of appropriate controls for the components with the largest contributions to performance variability.

  • Demonstration of acceptable method performance with robust and effective controls.

  • Robustness: testing the robustness of analytical methods involves evaluating the influence of small changes in the operating conditions.

  • Ruggedness: ruggedness testing identifies the degree of reproducibility of test results obtained by analysis of the same sample under various normal test conditions, such as different laboratories, analysts and instruments.

5. A CASE STUDY

The case study concerns the development of an HPLC method analyzed by Romero et al. 32 The specific system consists of an Agilent 1050 with a variable-wavelength ultraviolet (UV) detector and a model 3396-A integrator. Table 3 lists the factors and their levels used in the designed experiments of this case study. The original experimental array was a 2^(7−4) fractional factorial experiment with three center points (see Table 4). The levels '−1' and '+1' correspond to the lower and upper levels listed in Table 3, and '0' corresponds to the nominal level. The lower and upper levels are chosen to reflect variation that might naturally occur about the nominal setting during regular operation. The fractional factorial experiment consists of 11 runs that combine the design factor levels in a balanced set of combinations, including three center points.

Table 3. Factors and levels in the high performance liquid chromatography (HPLC) experiments

Factor | Nominal value | Lower level (−1) | Upper level (+1)
Gradient profile | 1 | 0 | 2
Column temp (°C) | 40 | 38 | 42
Buffer conc (mM) | 40 | 36 | 44
Mobile-phase buffer pH | 5 | 4.8 | 5.2
Detection wavelength (nm) | 446 | 441 | 451
Triethylamine (%) | 0.23 | 0.21 | 0.25
Dimethylformamide (%) | 10 | 9.5 | 10.5

Table 4. Original fractional factorial experimental array for the high performance liquid chromatography (HPLC) experiment (seven independent variables and one response variable, peak height)

Gradient (X1) | Column temperature (X2) | Buffer concentration (X3) | Buffer pH (X4) | Detection wavelength (X5) | Triethylamine percentage (X6) | Dimethylformamide percentage (X7) | Peak height (Y)
1 | 1 | 1 | 1 | 1 | 1 | 1 | 221.351
1 | 1 | −1 | −1 | 1 | −1 | −1 | 226.029
1 | −1 | 1 | 1 | −1 | −1 | −1 | 226.136
1 | −1 | −1 | −1 | −1 | 1 | 1 | 225.052
−1 | 1 | 1 | −1 | −1 | 1 | −1 | 221.835
−1 | 1 | −1 | 1 | −1 | −1 | 1 | 224.268
−1 | −1 | 1 | −1 | 1 | −1 | 1 | 234.957
−1 | −1 | −1 | 1 | 1 | 1 | −1 | 234.699
0 | 0 | 0 | 0 | 0 | 0 | 0 | 221.249
0 | 0 | 0 | 0 | 0 | 0 | 0 | 218.445
0 | 0 | 0 | 0 | 0 | 0 | 0 | 219.921

What do we learn from this fractional factorial experiment?

The following paragraphs illustrate the use of the checklist in Table  1 on this robustness study.

The fitted main-effects model is Ŷ = α̂ + Σᵢ β̂ᵢ Xᵢ (i = 1, ..., 7), where α̂ is the estimated intercept and the β̂ᵢ are the estimated coefficients (slopes) of each independent variable (X1 being the gradient, ..., X7 the dimethylformamide percentage; see Table 4). The intercept α̂ is the peak height predicted at the nominal levels of the input variables, since the design is coded between −1 and +1. The goal of the robustness study is then to assess the impact of changes in the input variables around their nominal levels on the peak height target.

  • A power analysis is not given. The experiments were run in a randomized order (no blocking or split‐plot design is used). The use of an institutional review board (IRB) is not required in this non‐clinical study, and there are no data ethics issues.

Table 5. Estimated coefficients from the original fractional factorial design for the high performance liquid chromatography (HPLC) experiment

Term | Estimate (95% confidence interval [CI]) | p-value
Intercept | 224.9 (219.14, 230.67) | p < 0.0001
Gradient | −2.15 (−8.91, 4.61) | p = 0.39
Col Temp | −3.42 (−10.18, 3.34) | p = 0.21
Buf Conc | −0.72 (−7.48, 6.04) | p = 0.76
Buf pH | −0.18 (−6.94, 6.59) | p = 0.94
Det Wave | 2.47 (−4.3, 9.23) | p = 0.33
Trie Perc | −1.06 (−7.82, 5.71) | p = 0.65
Dim Perc | −0.38 (−7.15, 6.38) | p = 0.87
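Checklist question 2.3 asks whether the analysis can be reproduced. For this study it can: the sketch below re-fits the main-effects model to the Table 4 data and should recover the Table 5 estimates (intercept about 224.9, gradient about −2.15, and so on). This is our own re-analysis in Python, assuming numpy and statsmodels; it is not the authors' code.

# Re-fitting the main-effects model to the Table 4 data (peak height response).
import numpy as np
import statsmodels.api as sm

# Columns: X1 gradient, X2 column temp, X3 buffer conc, X4 buffer pH,
#          X5 detection wavelength, X6 triethylamine %, X7 dimethylformamide %
X = np.array([
    [ 1,  1,  1,  1,  1,  1,  1],
    [ 1,  1, -1, -1,  1, -1, -1],
    [ 1, -1,  1,  1, -1, -1, -1],
    [ 1, -1, -1, -1, -1,  1,  1],
    [-1,  1,  1, -1, -1,  1, -1],
    [-1,  1, -1,  1, -1, -1,  1],
    [-1, -1,  1, -1,  1, -1,  1],
    [-1, -1, -1,  1,  1,  1, -1],
    [ 0,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  0,  0,  0,  0,  0],
])
y = np.array([221.351, 226.029, 226.136, 225.052, 221.835, 224.268,
              234.957, 234.699, 221.249, 218.445, 219.921])

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())    # estimates, 95% CIs and p-values as reported in Table 5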

  • How sensitive is the method to natural variation in the input settings?

  • Which inputs have the largest effect on the outputs from the method?

  • Are there different inputs that dominate the sensitivity of different responses?

  • Is the variation transmitted from factor variation large relative to natural run-to-run variation?

The input variable with the lowest p-value is the column temperature, while the highest p-value is for the buffer pH. However, none of the parameters is significant. The authors do not address the possibility of improving robustness by moving the nominal setting to one that is less sensitive to factor variation.

The data are available, and the analysis can be reproduced (even though no computer code is given in the original paper). The outcomes of the analysis are interpreted and visualized by means of graphs.

  • No Bayesian analysis is reported in this study.
  • Non-clinical studies do not need to be pre-registered. No false discovery rate corrections were made, which means that the five response variables in the original paper must be interpreted separately (only the peak height is considered here). The joint analysis of the different response variables could be further discussed. The presentation of findings was comprehensive, without undue emphasis on specific findings. However, there is no evidence that the selected model is the right one. The authors chose a main-effects-only model, but robustness has a close link to nonlinearity. Kenett 33 has shown that this (simplified) model suffers from lack of fit when analyzing the peak height and encourages the use of quadratic and/or interaction terms in robustness studies. The p-value of the lack-of-fit test is indeed p = 0.018, which indicates that the form of the model is not adequate. One can notice that the three replicates at the center point of the design are much lower than their predictions: the model predicts a peak height of 224.90 when all the parameters are set to their nominal level, while the observed mean is 219.87.

Kenett 33 shows that a definitive screening design is appropriate for evaluating the robustness of a chemical process by estimating linear and quadratic terms. Table 6 shows the 17 runs of such a design with the corresponding peak height.

Table 6. Definitive screening design for the high performance liquid chromatography (HPLC) experiment (17 runs, seven independent variables and one response variable, peak height)

Gradient | Column temperature | Buffer concentration | Buffer pH | Detection wavelength | Triethylamine percentage | Dimethylformamide percentage | Peak height
−1 | −1 | −1 | 1 | −1 | 1 | 1 | 232.873
−1 | −1 | 1 | −1 | 1 | 1 | 0 | 228.823
−1 | −1 | 1 | 1 | 0 | −1 | −1 | 231.756
−1 | 0 | −1 | −1 | 1 | −1 | 1 | 234.056
−1 | 1 | −1 | 1 | 1 | 0 | −1 | 226.949
−1 | 1 | 0 | −1 | −1 | 1 | −1 | 221.77
−1 | 1 | 1 | 0 | −1 | −1 | 1 | 223.008
0 | −1 | −1 | −1 | −1 | −1 | −1 | 220.459
0 | 0 | 0 | 0 | 0 | 0 | 0 | 214.52
0 | 1 | 1 | 1 | 1 | 1 | 1 | 216.927
1 | −1 | −1 | 0 | 1 | 1 | −1 | 225.315
1 | −1 | 0 | 1 | 1 | −1 | 1 | 234.211
1 | −1 | 1 | −1 | −1 | 0 | 1 | 226.512
1 | 0 | 1 | 1 | −1 | 1 | −1 | 221.193
1 | 1 | −1 | −1 | 0 | 1 | 1 | 220.424
1 | 1 | −1 | 1 | −1 | −1 | 0 | 222.251
1 | 1 | 1 | −1 | 1 | −1 | −1 | 226.226

The quadratic effect of 'gradient' is then significant (Table 7). The main effects and the quadratic effect of 'gradient' are statistically significant, the adjusted R² is 82%, and the run-to-run variation has an estimated standard deviation of 2.498. This reflects a curvature of the response variable around the nominal level of the 'gradient' factor, which is important when interpreting the results (and was neglected in the original analysis). This quadratic term thus gives valuable information about where to set the gradient to achieve a robust method (typically at the minimum of this curvature, where potential variation of the gradient will have minimum impact). In order to improve robustness, we need to identify nonlinear effects; here, the only nonlinear effect is for gradient. The effect of each input variable on the peak height is illustrated in Figure 3. The quadratic response curve for gradient reaches a minimum quite close to the nominal value (0 in the coded units of Figure 2). Consequently, setting the nominal level of gradient to that level is a good choice for robustness. The other factors can be kept at their nominal settings; they have only minor quadratic effects, so moving them to other settings will have no effect on method robustness. The level of variation in the response variable can then be assessed by simulating noise from normal distributions around the nominal levels (using the simulator in JMP statistical discovery software, with a normal distribution of standard deviation (SD) 0.4 in coded units for each input). Figure 3 shows the results of this simulation. The standard deviation of peak height associated with variation in the factor levels is 2.832, very similar in magnitude to the SD of run-to-run variation estimated from the experimental data. The estimate of the overall method SD is then 3.776 (the square root of 2.498² + 2.832²). Figure 4 shows the histogram and density of the peak height obtained by simulation with noise on each of the seven input variables plus run-to-run variability. From the 2.5% and 97.5% quantiles, one can be 95% confident that the peak height will lie between 212.15 and 227.12. Dividing these values by the intercept (the target peak height estimated by the model), one can claim with 95% confidence that the peak height should not deviate by more than 4.5% from its target value.

  • The findings are not tested with an option of failing the test, so no severe testing approach is applied. The study does not aim to replicate any previous studies. Probabilism is assessed by means of p-values for the significance of each parameter (no p-values are given in the original paper, but significant parameters are highlighted). CIs are not given in the original paper but are provided here in Tables 5 and 7.
  • The research findings are well described, presented in summary tables and visualized by means of (3D) graphs. The paper concludes with a recommendation on setting the ranges of the different parameters so that the HPLC results are robust. The causality issue is less important in this study, as the original fractional factorial design is orthogonal for the main effects.

Table 7. Parameter estimates for the high performance liquid chromatography (HPLC) experiment (with quadratic effect(s) from the definitive screening design)

Term | Estimate (95% confidence interval [CI]) | p-value
Intercept | 217.3 (213.98, 220.63) | p < 0.0001
Gradient | −1.65 (−3.19, −0.11) | p = 0.04
Col Temp | −3.03 (−4.57, −1.49) | p = 0.002
Buf Conc | −0.56 (−2.1, 0.98) | p = 0.42
Buf pH | 0.56 (−0.98, 2.1) | p = 0.42
Det Wave | 1.75 (0.21, 3.29) | p = 0.03
Trie Perc | −1.76 (−3.3, −0.22) | p = 0.03
Dim Perc | 1.02 (−0.52, 2.56) | p = 0.16
Gradient*Gradient | 9.51 (5.84, 13.18) | p = 0.0003
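The transmitted-variation simulation described above can be approximated directly from the Table 7 estimates. The sketch below propagates normal noise (SD 0.4 in coded units) through the fitted model and adds run-to-run variability; it is our own rough illustration in Python, assuming numpy, not the JMP simulator itself, and should give a standard deviation and 95% interval close to the values quoted above.

# Monte Carlo propagation of factor noise through the model fitted on the
# definitive screening design (Table 7), with added run-to-run variability.
import numpy as np

rng = np.random.default_rng(2022)

# Main effects (Table 7): gradient, column temp, buffer conc, buffer pH,
# detection wavelength, triethylamine %, dimethylformamide %
beta = np.array([-1.65, -3.03, -0.56, 0.56, 1.75, -1.76, 1.02])
intercept = 217.3
beta_quad_gradient = 9.51          # quadratic effect of gradient
sd_factor = 0.4                    # noise on each factor, in coded units
sd_run = 2.498                     # run-to-run standard deviation

n_sim = 100_000
X = rng.normal(0.0, sd_factor, size=(n_sim, 7))
peak = (intercept + X @ beta + beta_quad_gradient * X[:, 0] ** 2
        + rng.normal(0.0, sd_run, n_sim))

print("SD of simulated peak height:", peak.std(ddof=1).round(3))
print("95% interval:", np.percentile(peak, [2.5, 97.5]).round(2))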

Figure 2. Profiler of peak height at nominal levels (grey areas and blue curves are the 95% confidence intervals).

Figure 3. Profiler of peak height at nominal levels (grey areas and blue curves are the 95% confidence intervals), with added noise from a normal distribution (mean equal to the nominal level) on the input variables and its impact on the peak height (histogram on the right) (JMP ver. 15.2).

Figure 4. Histogram and density of peak height at nominal levels for the seven input variables with added noise and run-to-run variability. Dashed vertical lines are the 2.5% and 97.5% quantiles (212.15 and 227.12).

The checklist in Table 1 improves the quality of the review: it highlights that the study design, the goals, the statistical methodology and the data are clearly described in this HPLC robustness study. It also reveals some points to improve (e.g., the power of the study is not discussed, the authors could elaborate on the multiplicity issues arising when several response variables are analyzed, and the adequacy of the model is not assessed). CIs could help to better understand the impact and the importance of each parameter effect. In addition, the checklist in Table 2, specific to analytic methods, gives an overall summary of the important elements to consider when developing and analyzing analytical methods. The case study focuses on the robustness of the HPLC method. The precision can be estimated from the three replicates at the nominal level of each of the seven parameters (usually called critical process parameters in the pharmaceutical industry). Different response variables (usually called critical quality attributes, CQAs) are measured; this section focuses on the peak height.

6. DISCUSSION

To evaluate the checklist in Table 1, we conducted an experiment by asking several researchers to review a paper by Smith et al. 34 before and after seeing the checklist table. They were then asked to comment on the checklist and, more precisely, to address the question: 'do the guidelines provided by the checklist improve the quality of the review?'. Their comments are summarized below.

‘While many manuscripts are sent for review by editors or peers in industry, there is a lack of consistency in reviewing the innovations, due to the early development stage of the research and the lack of commonly shared views. How to evaluate papers regarding their innovation in interdisciplinary fields is usually not very clear’.

‘I felt a bit dumb without the checklist as no clear guidelines were given in the first round of review. There are some weak or missing points in the paper that would not be highlighted or even not spotted without the checklist’.

‘This checklist is very useful to be sure that some important points are adequately addressed in the paper. It might be good to send the statistical review to help the subject matter expert reviewer as well’.

In Francois, 35 the author analyzes data from an experiment designed to assess variability in the review process. The paper describes an experiment in which 10% of the manuscripts (166 items) submitted for publication in a conference proceedings went through the review process twice. Arbitrariness was measured as the conditional probability for an accepted submission to be rejected if examined by the second committee. This number was equal to 60%, for a total acceptance rate of 22.5%. The author applies a Bayesian analysis to these two numbers, introducing a hidden parameter that measures the probability that a submission meets basic quality criteria. The standard quality criteria considered in this study include novelty, clarity, reproducibility, correctness and the absence of any form of misconduct; they were met by a large proportion of submitted items. The Bayesian estimate of the hidden parameter was 56% (95% CI: [0.34, 0.83]). As a result of this analysis, the author suggests that the total acceptance rate should be increased in order to decrease arbitrariness in future review processes.

Yet another approach for reviewing applied research is based on the information quality framework introduced in Kenett and Shmueli. 36 , 43 This framework involves four components (study utility (U), the data (X), the data analysis (f) and the analysis goal (g)) and eight dimensions (data resolution, data structure, data integration, temporal relevance, generalizability, chronology of data and goal, operationalization and communication). Information quality is defined as the utility of a particular data set for achieving a given analysis goal by employing statistical analysis or machine learning algorithms. 36 , 37

Data analysis pipelines affect the outcomes of statistical analysis (Botvinik-Nezer et al. 38 ), yet they are usually not documented. Part of this is the handling of missing data and outliers. For an exception, see openml.org (Vanschoren et al. 39 ), where open access is given to the data and its analysis platform; reviewers of data analysis uploaded to this platform should be able to fully replicate the study under review. Popp and Biskup 40 have proposed a framework in Python for the analysis of spectroscopic data, focusing on reproducibility and good scientific practice. We therefore anticipate that future publications will require documentation of the data analysis pipeline, beyond current requests to make data and code publicly available.

In conclusion, several areas of science have established checklists tailored to their needs; see, for example, Feng et al. 41 and Aczel et al. 42 Our goal here is to provide such support in the context of statistical analysis in studies focused on the development of analytic methods.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

ACKNOWLEDGEMENT

The authors are grateful to Dr Sylvie Scolas and Dominique Derreumaux (GSK), and to Professors Ran Jin and Xinwei Deng (Virginia Tech) for their helpful comments on the checklist table.

Kenett RS, Francq BG. Helping reviewers assess statistical analysis: A case study from analytic methods . Anal Sci Adv . 2022; 3 :212‐222. 10.1002/ansa.202000159 [ CrossRef ] [ Google Scholar ]


AI Data Drop: 3 Key Insights from Real-World Research on AI Usage

One of the largest studies of Copilot usage—at nearly 60 companies—reveals how AI is changing the way we work.

We have spent the past nine months collaborating with 58 Microsoft 365 Copilot customers to analyze the work habits of 6,317 employees through their telemetry data—one of the largest studies of its kind to date. Our researchers randomly divided the employees into two groups: one with access to Copilot and the other without.*   

Collaborating with organizations to do real-world studies like this is critical to the broader feedback loop we’ve built for Copilot, helping us accelerate product innovation and boost value for customers. We’re just starting to learn about the new patterns of work that are emerging as more people use Copilot in more ways. Some organizations in the study experienced more impact than others, but the examples here provide valuable insights into how Copilot is already beginning to transform work. 

1. AI is starting to liberate people from email

Our 2024 Work Trend Index Annual Report found that email overload remains a persistent problem: the typical person has to read about four emails for every one they send. But signals from our new study suggest that we may soon be liberated from our inboxes.

People are spending less time reading email: overall, employees at a consumer goods company with access to Copilot spent 31% less time reading emails, a time savings of 50 minutes a week per user. At a telecommunications company, employees spent 23% less time, saving 40 minutes per week.

We also observed marked drops in the number of emails read: the telecom company's employees saw a 21% decrease in the number of emails read for one to 10 minutes, and a 19% drop in the number of emails read for more than 10 minutes. Employees at other companies experienced similar results, and an insurance company saw a 21% drop in individual emails read.

The decrease in time spent in email was the largest statistically significant effect we observed among the people using AI, showing that features like Summarize were freeing them from long email chains—a crucial step toward eliminating email overload forever.

2. Meetings are becoming more about value creation

The workday is often a balancing act between crucial meetings and focused work. And our 2023 Work Trend Index Annual Report found that all too often, people join an hour-long meeting for a one-minute insight, pushing valuable focus work to after hours. Now we're seeing AI flip that on its head—some companies are reducing time spent in meetings, and others are making the time spent in meetings more valuable.

We saw AI start to reduce the time spent exchanging information in meetings: People using AI at a consulting firm spent 16% less time in meetings. An energy company saw a 12% increase in the number of meetings being left early, suggesting that people may feel comfortable bowing out because they can use Copilot to get meeting notes, ask questions, and check on action items.   

We also saw with AI that meetings are becoming opportunities for new forms of co-creation: At the materials science company Dow, a team that produces technical white papers began using meetings to reinvent how they write. For some paragraphs they asked Copilot to craft a first draft, which they reviewed together and refined. Other times, they scheduled meetings for the express purpose of discussing the white paper topic in depth, with the aim of handing the transcript over to Copilot to weave it all into a cohesive, clear write-up. “It eliminates so much of the back-and-forth,” says Brandon Toyzan, a technical architect at Dow. “We have that quick conversation and get everyone’s point of view, and we automatically have a draft.” He says a process that once required hours of back-and-forth can now be accomplished in a 30-minute conversation. Think of the process as “writing out loud”—they come into a meeting with a blank page and wrap up the call with a solid draft.  

3. People are co-creating more with AI—and with one another

As the Dow meeting example shows, human-to-AI-to-human collaboration fosters better human-to-human collaboration, reducing the time it takes to get from good to great. That pattern is borne out by our data.

People are co-creating more content with AI: one consumer goods company saw a 41% boost in the number of Word sessions, while at a law firm and a telecom company, Word document creation soared by 58% and 45%, respectively.

And AI encourages more collaboration on that content: employees with access to Copilot at a financial services company co-edited 33% more documents than those without AI, and a consulting firm saw a similar effect. A national postal service saw the number of documents commented on increase by 82% for employees with high Copilot usage. At a multinational retailer, the number of collaborators per user went up by 19%; that figure was 30% for the consulting firm.

AI also seems to enhance some people's ability to process and consume information: at one technology company, employees who displayed high usage of Copilot saw the duration of their Word sessions drop by 52%.

With AI, we’re seeing a new work pattern emerge, one that’s more collaborative, iterative, and multiplayer—teams working in tandem with AI. 

How to maximize the impact of AI across your organization

As these findings show, Copilot is reshaping the workday to privilege valuable creation and collaboration. The best way leaders can help their teams optimize for the kinds of effects the study revealed is by encouraging usage. For many of the companies that saw no statistically significant effects, the problem was a lack of AI usage. The organizations that have done the most to drive up adoption metrics saw the greatest impact.

For instance, across the entire sample, people who had access to Copilot on average read six fewer emails a week than their non-Copilot counterparts. But when we looked at employees with high usage rates, that number tripled to 18 fewer emails a week. This shows that the more people develop AI usage as a daily habit, the more value they’ll get out of it.  

As companies continue their adoption journeys and the effects ripple across their organizations, the impact will intensify. What is clear now is that the AI aptitude gained from daily usage will become increasingly important for employees to keep up with where work is headed. 

*Methodology: Researchers worked with organizations in the Microsoft 365 Copilot Early Access Program to create a randomized control trial. Microsoft 365 Copilot combines generative AI tools in applications such as Word, Excel, PowerPoint, Outlook, Teams, and others. Each organization set aside at least 50 licenses to be randomly assigned among 100 or more Microsoft 365 users nominated by the organization. Researchers partnered closely with IT administrators and business decision-makers in each of the participating organizations to explain the need for randomization and obtain buy-in. Using metadata from Microsoft 365 in these organizations, researchers compared how email, meeting, and document behavior differed based on being assigned a Copilot license.
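The core analysis behind these findings is a randomized two-group comparison of telemetry metrics. A generic sketch of that kind of comparison, with entirely synthetic numbers (not Microsoft's data or code), might look like this:

# Generic randomized-trial comparison: weekly emails read by employees randomly
# assigned a Copilot license vs. not.  All numbers below are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.poisson(lam=120, size=400)    # hypothetical weekly emails read, no license
treated = rng.poisson(lam=114, size=400)    # hypothetical reduction for license holders

diff = treated.mean() - control.mean()
t, p = stats.ttest_ind(treated, control)
print(f"difference in means = {diff:.1f} emails/week, p = {p:.4f}")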



Facilitating GRADE judgements about the inconsistency of effects using a novel visualisation approach

  • http://orcid.org/0000-0001-5502-5975 Mohammad Hassan Murad 1 ,
  • http://orcid.org/0000-0002-9368-6149 Zhen Wang 1 , 2 ,
  • Yngve Falck-Ytter 3
  • 1 Evidence-based Practice Center , Mayo Clinic , Rochester , MN , USA
  • 2 Health Care Policy and Research , Mayo Clinic Minnesota , Rochester , Minnesota , USA
  • 3 Case Western Reserve University , Cleveland , Ohio , USA
  • Correspondence to Dr Mohammad Hassan Murad; murad.mohammad{at}mayo.edu

https://doi.org/10.1136/bmjebm-2024-113038


Keywords: Systematic Reviews as Topic; Clinical Decision-Making

Inconsistency is a key domain that determines the certainty of evidence. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach specifically defines inconsistency as the variability in results across studies, and not variability in study characteristics, eligibility criteria or design. 1 Statistical measures of heterogeneity are often used to assess inconsistency; however, major limitations of such measures have been described. For example, Cochran's Q test for homogeneity is usually underpowered to detect heterogeneity. The I² index, which is the most commonly used measure, underestimates true statistical heterogeneity when there are fewer than 10 studies in a meta-analysis, which is a common scenario, and is correlated with the sample size of the included studies. 2 The I² index is also often misunderstood as an indicator of the spread of the effect size. Borenstein demonstrates how a meta-analysis with an I² index of 25% can have more spread of the effect size than a meta-analysis with an I² index of 75%. 3 Therefore, GRADE guidance on inconsistency recommends less reliance on statistical measures and instead instructs users to make judgements about whether studies in a meta-analysis provide estimates that are clinically importantly different from each other. 1 However, there are no existing tools to facilitate this process, making it highly subjective. Users are instructed to look at a forest plot, evaluate the similarity of the point estimates of the included studies and the overlap of their CIs, and make a judgement based on values that they consider clinically important. Merely counting studies does not work, because some studies can be outliers but may have a very small weight within the pooled effect estimate. Having multiple thresholds makes this task even more difficult. Furthermore, in the case of binary outcomes, decision thresholds are based on absolute treatment effects 4 5 whereas most meta-analyses and their associated forest plots are performed on relative effect scales.
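For reference, the statistical quantities discussed here are straightforward to compute. The sketch below computes Cochran's Q and the I² index for a toy set of study effects and standard errors; it is our own illustration, assuming numpy.

# Cochran's Q and the I-squared index for a toy meta-analysis
# (log risk ratios and their standard errors; all values are illustrative).
import numpy as np

yi = np.array([-0.35, -0.10, -0.42, 0.05, -0.28])   # study effects (log RR)
sei = np.array([0.20, 0.15, 0.30, 0.25, 0.18])      # standard errors

w = 1.0 / sei**2                        # inverse-variance (fixed-effect) weights
pooled = np.sum(w * yi) / np.sum(w)
Q = np.sum(w * (yi - pooled)**2)
df = len(yi) - 1
I2 = max(0.0, (Q - df) / Q) * 100.0

print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.1f}%")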

The proposed visualisation approach

This visualisation approach can be used when meta-analysts prepare a summary of findings table and make a judgement about inconsistency. The approach starts with stakeholders providing three thresholds in the form of absolute risk differences. Consistent with recent GRADE guidance, 4 these three thresholds define seven treatment effect ranges: large, medium and small reduction; large, medium and small increase; and a trivial or no effect. As an example, we used in this paper the following thresholds (per 1000 patients): small, medium and large reduction (−10, −100 and −200), small, medium and large increase (10, 100, 200), and trivial or no effect (between −10 and +10). A random-effects meta-analysis is conducted using the restricted maximum likelihood estimator of between-study heterogeneity on a relative effect scale. The relative treatment effect of each study is converted to an absolute effect using a baseline risk that is either derived from the available studies or provided by users (in this visualisation, it was derived by dividing the number of events in the control groups of a meta-analysis by the total number of participants in the control arms). Each individual study is categorised into one of the seven ranges based on its absolute effect. The random-effects weights of the studies that fall in each inference range are summed to provide the total weight for that range. A bar graph depicts all the ranges, with the height of the bars representing the percentage of the total weight in each range. This bar graph allows visualisation of the distribution of inferences of the individual studies in relation to stakeholder-provided thresholds. The approach is summarised in box 1. It can also be used for continuous outcomes, which can be expressed on their original scale using stakeholder-provided thresholds; if such thresholds are unknown, the outcome can be expressed as a standardised mean difference, using the traditional thresholds of 0.2, 0.5 and 0.8 to define small, moderate and large effects. 4

Box 1. Steps of performing the proposed visualisation approach

  • Decisional thresholds are provided by stakeholders; if unavailable or unknown, default thresholds can be used.

  • A meta-analysis is conducted to obtain the effect size and weight for each study.

  • The effect size of each study is converted as needed to match the units of the decisional thresholds: for binary outcomes, relative effects are converted to absolute effects using an appropriate baseline risk; for continuous outcomes, effects are converted to a standardised mean difference if thresholds are unavailable.

  • The total weight for each decisional range is calculated by summing the weights of the individual studies that fall within that range.

  • A bar graph allows users to visualise the spread of inference across decisional ranges and make a judgement about inconsistency.

The approach is implemented for binary and continuous outcomes in open-source R code provided in the online supplemental appendix (R Core Team 2024. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria). The code is also implemented in an R Shiny application that does not require knowledge of statistical software coding: https://hassan-murad.shinyapps.io/inconsistency_visualization/
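The bookkeeping behind the bar graph can be sketched in a few lines. The following is our own illustration in Python of the same logic (it is not the authors' R code); the study risk ratios, weights and baseline risk are hypothetical.

# Sketch of the weight-per-range summary behind the proposed bar graph:
# convert each study's relative effect to an absolute risk difference,
# assign it to a stakeholder-defined range, and sum the meta-analysis weights.
import numpy as np

# Thresholds per 1000 patients used in the paper's example
edges = [-np.inf, -200, -100, -10, 10, 100, 200, np.inf]
labels = ["large reduction", "medium reduction", "small reduction",
          "trivial/no effect", "small increase", "medium increase", "large increase"]

rr = np.array([0.70, 0.92, 1.04, 0.85, 0.60])     # hypothetical study risk ratios
weight = np.array([0.25, 0.30, 0.15, 0.20, 0.10]) # hypothetical random-effects weights
baseline_risk = 0.20                              # hypothetical control-arm risk

abs_effect_per_1000 = (rr - 1.0) * baseline_risk * 1000.0
range_index = np.digitize(abs_effect_per_1000, edges[1:-1])

totals = {lab: 0.0 for lab in labels}
for idx, w in zip(range_index, weight):
    totals[labels[idx]] += w

for lab, w in totals.items():
    if w > 0:
        print(f"{lab:18s}: {100 * w:.0f}% of meta-analysis weight")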


The first example is a meta-analysis of eight studies 6 that evaluated the effect of home non-invasive pressure ventilation on mortality in patients with chronic obstructive pulmonary disease. The I² index of 27% and the p value for heterogeneity of 0.21 suggest no important heterogeneity. The point estimates on the relative risk scale are similar except for two smaller studies ( figure 1 , panel A). Applying the proposed visualisation approach ( figure 1 , panel B), we note that the pooled effect and 50.2% of the weights of individual studies suggest a small reduction in risk. However, the remaining 49.8% of the weights of individual studies suggest three different inferences: moderate reduction, trivial effect and small increase. Thus, inferences from individual studies are quite variable, in contrast to the impression derived from the forest plot and its associated statistical measures. In this case, rating down for inconsistency seems justified.


Figure 1. Meta-analysis of trials of home non-invasive positive pressure ventilation in chronic obstructive pulmonary disease. Panel A (top) demonstrates a meta-analysis on the risk ratio scale suggesting small heterogeneity. Panel B (bottom) demonstrates a bar graph showing the distribution of meta-analysis weights per effect size range, suggesting important inconsistency. The green bar represents the inference associated with the target of certainty (the pooled estimate). MA, meta-analysis.

The second example is a meta-analysis of nine studies 7 that evaluated the adverse events of fluoxetine in patients who are obese or overweight. The I² index of 53% and the p value for heterogeneity of 0.03 suggest substantial and statistically significant heterogeneity ( figure 2 , panel A). Applying the proposed visualisation approach ( figure 2 , panel B) suggests that the inference from almost all the studies (96% of the meta-analysis weight) is consistent with a small increase in risk. Therefore, the statistically significant heterogeneity did not lead to any important inconsistency when considering stakeholder-provided thresholds. In this case, rating down for inconsistency is unnecessary.

Figure 2. Meta-analysis of trials of fluoxetine for adults who are overweight or obese. Panel A (top) demonstrates a meta-analysis on the risk ratio scale suggesting substantial heterogeneity. Panel B (bottom) demonstrates a bar graph showing the distribution of meta-analysis weights per effect size range, suggesting minimal inconsistency. The green bar represents the inference associated with the target of certainty (the pooled estimate). MA, meta-analysis.

The third example addresses a continuous outcome ( online supplemental figure 1 ). A meta-analysis of eight trials evaluated the effect of health and wellness coaching on the severity of depression in patients with chronic illness. 8 The I² index of 95% and the p value for heterogeneity of 0.01 suggested substantial and statistically significant heterogeneity. The proposed visualisation shows that the majority of the evidence (75% of the meta-analysis weight) was consistent with a single inference, a trivial effect, which can justify not rating down for inconsistency. The forest plot and the bar graph ( online supplemental figure 1 ) demonstrate that the statistical heterogeneity is driven by a single small study (11.9% of the weight). Reviewing the inclusion criteria for this study may reveal a systematic difference from the remaining studies.

It is very challenging to look at a forest plot and judge the consistency of individual studies in terms of their inference relating to multiple inference regions, up to seven regions according to recent GRADE guidance. 4 This complexity increases to another level in the case of binary outcomes, which require translation to absolute effects. The proposed visualisation and quantification of total weight across stakeholder-provided thresholds can help in streamlining this judgement and make it more explicit. Using meta-analysis weights instead of ‘counting studies’ addresses small studies that are outliers with extreme results.

The approach can also be used when stakeholders decide not to use multiple thresholds and opt to use only the minimally important difference (MID). A positive MID and a negative MID define three ranges of effect: important reduction, trivial to no effect and important increase. 5 The same visualisation can show the total meta-analysis weight distributed across these three ranges to make a judgement about inconsistency.

Limitations of this approach include two concerns associated with transforming a relative effect of a binary outcome into an absolute one. The first issue is the assumption of portability of the relative effect across different baseline risks, which is not always true. 9 The second issue is that such a transformation is usually done without addressing uncertainty in the baseline risk. Several methods have been proposed to address uncertainty in the baseline risk when estimating the absolute effect, 10 and they can easily be incorporated in the proposed visualisation approach. Lastly, there are inherent methodological limitations to the MID and its reliability for gauging clinical relevance; other approaches for establishing clinical relevance thresholds exist. 11

Ethics statements

Patient consent for publication: Not applicable.


X @m_hassan_murad

Contributors MHM and YF-Y conceived this study. MHM wrote the first draft. All authors critically revised the manuscript and approved the final version.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.



04166 Understanding variegate porphyria in an HIV patient: a detailed case study

  • Laura Sabina Varela 1 ,
  • Viviana Alicia Melito 1 ,
  • Fabiana Alejandra Caballero 1 ,
  • Marcelo Guolo 1 ,
  • Verónica Goñi 1 ,
  • Pablo Winitzky 1 ,
  • Mariana Agosta 2 ,
  • María Pasman 3 ,
  • Ana María Buzaleh 1 ,
  • Lucía Tomassi 3 ,
  • Victoria Estela Parera 1
  • 1 Centro de Investigaciones sobre Porfirinas y Porfirias (CIPYP), UBA-CONICET, Buenos Aires, Argentina
  • 2 Hospital Rocca, Buenos Aires, Argentina
  • 3 Hospital Ramos Mejía, Buenos Aires, Argentina

Variegate Porphyria (VP) is a rare metabolic disorder characterized by deficient activity of the protoporphyrinogen oxidase (PPOX) enzyme, leading to the accumulation of neurotoxic porphyrin precursors. Managing VP becomes increasingly challenging in patients with coexisting conditions, such as HIV, due to potential drug interactions. Here, we present a comprehensive case study of a 32-year-old HIV-positive female who exhibited multiple neurological and psychiatric symptoms, initially interpreted as secondary to an infectious condition. Recurrent seizures, acute psychotic episodes, and polyneuropathy marked the patient's clinical course. Magnetic resonance imaging revealed characteristic findings suggestive of vasogenic edema, prompting suspicion of autoimmune encephalitis. Despite initial treatment attempts with immunomodulatory therapy, the patient's condition continued to deteriorate, culminating in episodes of sepsis. Further investigation revealed the underlying diagnosis of VP. Biochemical analysis consistently showed elevated levels of aminolevulinic acid (ALA), porphobilinogen (PBG), and total urinary porphyrins (TUP). Additionally, the chromatographic profile of fecal porphyrins and the plasma porphyrin index (PPI) at λ = 626 nm supported the diagnosis of VP. Genetic analysis identified a pathogenic mutation in the PPOX gene (NM_001365398.1:c.428A>T), further confirming the diagnosis. This mutation, while not previously reported in ClinVar or HGMD, has been documented in various publications and databases, emphasizing its significance in porphyria pathogenesis. Biochemical data correlated with clinical symptoms, showing a notable decrease in ALA, PBG, and TUP levels following hemin therapy. However, a few months later, she experienced two more attacks and received a second and a third successful course of hemin. Although the patient was clinically stable, her ALA and PBG levels rose once again. Consequently, tenofovir, a possible porphyrinogenic agent used in HIV management, was discontinued, coinciding with a marked improvement in her condition. This emphasizes the importance of medication review and the consideration of potential drug interactions in porphyria management. In conclusion, this case highlights the intricate interplay between genetic predisposition, environmental factors, and comorbidities in the manifestation and management of VP. Genetic testing, coupled with biochemical analysis, facilitates accurate diagnosis and customized treatment strategies, optimizing outcomes for patients facing the complex intersection of VP and HIV.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjgast-2024-ICPP.57


Can Coal-Bearing Source Rocks Generate Oil-Type Gas? A Case Study of Carboniferous-Permian Coal-Bearing Strata in the Dongpu Depression

17 Pages Posted: 19 Sep 2024

Chengfu Zhang

Affiliation not provided to SSRN

Jingyan Liu

Lishuang Lv

Nanjing University

To explore whether coal-bearing source rocks can generate oil-type gas, this study begins with the identification and analysis of organic maceral compositions. Thermal simulation experiments are then used to observe the hydrocarbon generation characteristics of different coal-forming biogenic sources and the carbon isotope composition of the generated alkanes, and infrared spectroscopy is used to reveal the hydrocarbon generation mechanisms of coal-type source rocks. The results show that the coal and mudstone in the Dongpu Depression contain high proportions of oil-generating components such as alginite, sporinite, cutinite, resinite, and barkinite. Under low thermal evolution conditions, the hydrogen-rich aliphatic groups in these oil-generating components crack, sequentially producing oil and oil-type gas. Because hydrocarbons are generated from different oil-prone precursors under different thermal evolution conditions, the carbon isotopes of the alkane gases first become lighter and then heavier as thermal maturity increases. During the high- to over-mature stages, the hydrocarbon generation mechanism shifts towards a more singular process of aromatization and polycondensation, in which the extensive detachment of benzene-ring substituents from vitrinite and inertinite generates coal-type gas. This study reveals the hydrocarbon generation mechanisms of oil-type gas in coal-bearing source rocks and the associated carbon isotope evolution.

Keywords: coal-bearing source rocks, oil-type gas, coal-type gas

Suggested Citation

Chengfu Zhang (Contact Author)


