Descriptive Research | Definition, Types, Methods & Examples

Published on May 15, 2019 by Shona McCombes. Revised on June 22, 2023.

Descriptive research aims to accurately and systematically describe a population, situation or phenomenon. It can answer what, where, when and how questions, but not why questions.

A descriptive research design can use a wide variety of research methods to investigate one or more variables. Unlike in experimental research, the researcher does not control or manipulate any of the variables, but only observes and measures them.

Table of contents

  • When to use a descriptive research design
  • Descriptive research methods
  • Other interesting articles

When to use a descriptive research design

Descriptive research is an appropriate choice when the research aim is to identify characteristics, frequencies, trends, and categories.

It is useful when not much is known yet about the topic or problem. Before you can research why something happens, you need to understand how, when and where it happens.

Descriptive research question examples

  • How has the Amsterdam housing market changed over the past 20 years?
  • Do customers of company X prefer product X or product Y?
  • What are the main genetic, behavioural and morphological differences between European wildcats and domestic cats?
  • What are the most popular online news sources among under-18s?
  • How prevalent is disease A in population B?


Descriptive research methods

Descriptive research is usually defined as a type of quantitative research, though qualitative research can also be used for descriptive purposes. The research design should be carefully developed to ensure that the results are valid and reliable.

Surveys

Survey research allows you to gather large volumes of data that can be analyzed for frequencies, averages and patterns. Common uses of surveys include:

  • Describing the demographics of a country or region
  • Gauging public opinion on political and social topics
  • Evaluating satisfaction with a company’s products or an organization’s services

Observations

Observations allow you to gather data on behaviours and phenomena without having to rely on the honesty and accuracy of respondents. This method is often used by psychological, social and market researchers to understand how people act in real-life situations.

Observation of physical entities and phenomena is also an important part of research in the natural sciences. Before you can develop testable hypotheses , models or theories, it’s necessary to observe and systematically describe the subject under investigation.

Case studies

A case study can be used to describe the characteristics of a specific subject (such as a person, group, event or organization). Instead of gathering a large volume of data to identify patterns across time or location, case studies gather detailed data to identify the characteristics of a narrowly defined subject.

Rather than aiming to describe generalizable facts, case studies often focus on unusual or interesting cases that challenge assumptions, add complexity, or reveal something new about a research problem .

Other interesting articles

If you want to know more about statistics, methodology, or research bias, make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Degrees of freedom
  • Null hypothesis
  • Discourse analysis
  • Control groups
  • Mixed methods research
  • Non-probability sampling
  • Quantitative research
  • Ecological validity

Research bias

  • Rosenthal effect
  • Implicit bias
  • Cognitive bias
  • Selection bias
  • Negativity bias
  • Status quo bias



Descriptive Statistics for Summarising Data

Ray W. Cooksey

UNE Business School, University of New England, Armidale, NSW, Australia

This chapter discusses and illustrates descriptive statistics. The purpose of the procedures and fundamental concepts reviewed in this chapter is quite straightforward: to facilitate the description and summarisation of data. By ‘describe’ we generally mean either the use of some pictorial or graphical representation of the data (e.g. a histogram, box plot, radar plot, stem-and-leaf display, icon plot or line graph) or the computation of an index or number designed to summarise a specific characteristic of a variable or measurement (e.g. frequency counts, measures of central tendency, variability, standard scores). Along the way, we explore the fundamental concepts of probability and the normal distribution.

We seldom interpret individual data points or observations primarily because it is too difficult for the human brain to extract or identify the essential nature, patterns, or trends evident in the data, particularly if the sample is large. Rather we utilise procedures and measures which provide a general depiction of how the data are behaving. These statistical procedures are designed to identify or display specific patterns or trends in the data. What remains after their application is simply for us to interpret and tell the story.

Reflect on the QCI research scenario and the associated data set discussed in Chap. 4. Consider the following questions that Maree might wish to address with respect to decision accuracy and speed scores:

  • What was the typical level of accuracy and decision speed for inspectors in the sample? [see Procedure 5.4 – Assessing central tendency.]
  • What was the most common accuracy and speed score amongst the inspectors? [see Procedure 5.4 – Assessing central tendency.]
  • What was the range of accuracy and speed scores; the lowest and the highest scores? [see Procedure 5.5 – Assessing variability.]
  • How frequently were different levels of inspection accuracy and speed observed? What was the shape of the distribution of inspection accuracy and speed scores? [see Procedure 5.1 – Frequency tabulation, distributions & crosstabulation.]
  • What percentage of inspectors would have ‘failed’ to ‘make the cut’ assuming the industry standard for acceptable inspection accuracy and speed combined was set at 95%? [see Procedure 5.7 – Standard (z) scores.]
  • How variable were the inspectors in their accuracy and speed scores? Were all the accuracy and speed levels relatively close to each other in magnitude or were the scores widely spread out over the range of possible test outcomes? [see Procedure 5.5 – Assessing variability.]
  • What patterns might be visually detected when looking at various QCI variables singly and together as a set? [see Procedure 5.2 – Graphical methods for displaying data, Procedure 5.3 – Multivariate graphs & displays, and Procedure 5.6 – Exploratory data analysis.]

This chapter includes discussions and illustrations of a number of procedures available for answering questions about data like those posed above. In addition, you will find discussions of two fundamental concepts, namely probability and the normal distribution; concepts that provide building blocks for Chaps. 6 and 7.

Procedure 5.1: Frequency Tabulation, Distributions & Crosstabulation

Frequency Tabulation and Distributions

Frequency tabulation serves to provide a convenient counting summary for a set of data that facilitates interpretation of various aspects of those data. Basically, frequency tabulation occurs in two stages:

  • First, the scores in a set of data are rank ordered from the lowest value to the highest value.
  • Second, the number of times each specific score occurs in the sample is counted. This count records the frequency of occurrence for that specific data value.

Consider the overall job satisfaction variable, jobsat, from the QCI data scenario. Performing frequency tabulation across the 112 Quality Control Inspectors on this variable using the SPSS Frequencies procedure (Allen et al. 2019, ch. 3; George and Mallery 2019, ch. 6) produces the frequency tabulation shown in Table 5.1. Note that three of the inspectors in the sample did not provide a rating for jobsat, thereby producing three missing values (= 2.7% of the sample of 112) and leaving 109 inspectors with valid data for the analysis.

Table 5.1 Frequency tabulation of overall job satisfaction scores

The display of frequency tabulation is often referred to as the frequency distribution for the sample of scores. For each value of a variable, the frequency of its occurrence in the sample of data is reported. It is possible to compute various percentages and percentile values from a frequency distribution.

Table 5.1 shows the ‘Percent’ or relative frequency of each score (the percentage of the 112 inspectors obtaining each score, including those inspectors who were missing scores, which SPSS labels as ‘System’ missing). Table 5.1 also shows the ‘Valid Percent’ which is computed only for those inspectors in the sample who gave a valid or non-missing response.

Finally, it is possible to add up the ‘Valid Percent’ values, starting at the low score end of the distribution, to form the cumulative distribution or ‘Cumulative Percent’ . A cumulative distribution is useful for finding percentiles which reflect what percentage of the sample scored at a specific value or below.

We can see in Table 5.1 that 4 of the 109 valid inspectors (a ‘Valid Percent’ of 3.7%) indicated the lowest possible level of job satisfaction, a value of 1 (Very Low), whereas 18 of the 109 valid inspectors (a ‘Valid Percent’ of 16.5%) indicated the highest possible level of job satisfaction, a value of 7 (Very High). The ‘Cumulative Percent’ figure of 18.3 in the row for the job satisfaction score of 3 can be interpreted as “roughly 18% of the sample of inspectors reported a job satisfaction score of 3 or less”; that is, nearly a fifth of the sample expressed some degree of negative satisfaction with their job as a quality control inspector in their particular company.
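For readers who prefer a scriptable tool to the menu-driven packages discussed in this chapter, the same style of tabulation can be sketched in Python with pandas. This is a minimal sketch only: the jobsat ratings below are randomly simulated stand-ins for the actual QCI data, which are not reproduced here.

```python
import numpy as np
import pandas as pd

# Stand-in for the QCI data: 112 jobsat ratings on a 1-7 scale, three missing.
rng = np.random.default_rng(1)
jobsat = pd.Series(rng.integers(1, 8, size=112).astype(float))
jobsat.iloc[[5, 40, 77]] = np.nan  # three inspectors gave no rating

counts = jobsat.value_counts().sort_index()     # frequency of each score
percent = counts / len(jobsat) * 100            # 'Percent' (base = all 112)
valid_percent = counts / jobsat.count() * 100   # 'Valid Percent' (base = 109)
cumulative = valid_percent.cumsum()             # 'Cumulative Percent'

print(pd.DataFrame({"Frequency": counts,
                    "Percent": percent.round(1),
                    "Valid Percent": valid_percent.round(1),
                    "Cumulative Percent": cumulative.round(1)}))
```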

If you have a large data set having many different scores for a particular variable, it may be more useful to tabulate frequencies on the basis of intervals of scores.

For the accuracy scores in the QCI database, you could count scores occurring in intervals such as ‘less than 75% accuracy’, ‘75% or greater but less than 85% accuracy’, ‘85% or greater but less than 95% accuracy’, and ‘95% accuracy or greater’, rather than counting the individual scores themselves. This would yield what is termed a ‘grouped’ frequency distribution since the data have been grouped into intervals or score classes. Producing such an analysis using SPSS would involve extra steps to create the new category or ‘grouping’ system for scores prior to conducting the frequency tabulation.
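As a rough illustration of how such a grouping might be scripted, the pandas sketch below uses pd.cut to form the four intervals just described; the accuracy scores are simulated stand-ins for the real QCI values.

```python
import numpy as np
import pandas as pd

# Simulated stand-ins for the 112 QCI accuracy scores (percentages).
rng = np.random.default_rng(2)
accuracy = pd.Series(rng.normal(85, 8, size=112).clip(50, 100))

# Group scores into the intervals described above, then tabulate the groups.
bins = [0, 75, 85, 95, np.inf]
labels = ["< 75%", "75% to < 85%", "85% to < 95%", ">= 95%"]
grouped = pd.cut(accuracy, bins=bins, labels=labels, right=False)
print(grouped.value_counts().reindex(labels))  # grouped frequency distribution
```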

Crosstabulation

In a frequency crosstabulation , we count frequencies on the basis of two variables simultaneously rather than one; thus we have a bivariate situation.

For example, Maree might be interested in the number of male and female inspectors in the sample of 112 who obtained each jobsat score. Here there are two variables to consider: inspector’s gender and inspector’s jobsat score. Table 5.2 shows such a crosstabulation as compiled by the SPSS Crosstabs procedure (George and Mallery 2019, ch. 8). Note that inspectors who did not report a score for jobsat and/or gender have been omitted as missing values, leaving 106 valid inspectors for the analysis.

Table 5.2 Frequency crosstabulation of jobsat scores by gender category for the QCI data

The crosstabulation shown in Table 5.2 gives a composite picture of the distribution of satisfaction levels for male inspectors and for female inspectors. If frequencies or ‘Counts’ are added across the gender categories, we obtain the numbers in the ‘Total’ column (the percentages or relative frequencies are also shown immediately below each count) for each discrete value of jobsat (note this column of statistics differs from that in Table 5.1 because the gender variable was missing for certain inspectors). By adding down each gender column, we obtain, in the bottom row labelled ‘Total’, the number of males and the number of females that comprised the sample of 106 valid inspectors.

The totals, either across the rows or down the columns of the crosstabulation, are termed the marginal distributions of the table. These marginal distributions are equivalent to frequency tabulations for each of the variables jobsat and gender. As with frequency tabulation, various percentage measures can be computed in a crosstabulation, including the percentage of the sample associated with a specific count within either a row (‘% within jobsat’) or a column (‘% within gender’). You can see in Table 5.2 that 18 inspectors indicated a job satisfaction level of 7 (Very High); of these 18 inspectors reported in the ‘Total’ column, 8 (44.4%) were male and 10 (55.6%) were female. The marginal distribution for gender in the ‘Total’ row shows that 57 inspectors (53.8% of the 106 valid inspectors) were male and 49 inspectors (46.2%) were female. Of the 57 male inspectors in the sample, 8 (14.0%) indicated a job satisfaction level of 7 (Very High). Furthermore, we could generate some additional interpretive information of value by adding the ‘% within gender’ values for job satisfaction levels of 5, 6 and 7 (i.e. differing degrees of positive job satisfaction). Here we would find that 68.4% (= 24.6% + 29.8% + 14.0%) of male inspectors indicated some degree of positive job satisfaction compared to 61.2% (= 10.2% + 30.6% + 20.4%) of female inspectors.
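A crosstabulation of this kind, including the marginal totals and the ‘% within gender’ figures, can be sketched with the pandas crosstab function. Again, the data below are simulated stand-ins, not the actual QCI values.

```python
import numpy as np
import pandas as pd

# Simulated stand-ins: jobsat (1-7) and gender for 106 valid inspectors.
rng = np.random.default_rng(3)
df = pd.DataFrame({"jobsat": rng.integers(1, 8, size=106),
                   "gender": rng.choice(["Male", "Female"], size=106)})

# Counts with the marginal ('Total') row and column, as in Table 5.2.
print(pd.crosstab(df["jobsat"], df["gender"], margins=True))

# '% within gender': each gender column sums to 100%.
pct_within = pd.crosstab(df["jobsat"], df["gender"], normalize="columns") * 100
print(pct_within.round(1))
```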

This helps to build a picture of the possible relationship between an inspector’s gender and their level of job satisfaction (a relationship that, as we will see later, can be quantified and tested using procedures described in Chaps. 6 and 7).

It should be noted that a crosstabulation table such as that shown in Table 5.2 is often referred to as a contingency table, about which more will be said later (see the relevant procedures in Chap. 7).

Advantages

Frequency tabulation is useful for providing convenient data summaries which can aid in interpreting trends in a sample, particularly where the number of discrete values for a variable is relatively small. A cumulative percent distribution provides additional interpretive information about the relative positioning of specific scores within the overall distribution for the sample.

Crosstabulation permits the simultaneous examination of the distributions of values for two variables obtained from the same sample of observations. This examination can yield some useful information about the possible relationship between the two variables. More complex crosstabulations can be also done where the values of three or more variables are tracked in a single systematic summary. The use of frequency tabulation or cross-tabulation in conjunction with various other statistical measures, such as measures of central tendency (see Procedure 5.4 ) and measures of variability (see Procedure 5.5 ), can provide a relatively complete descriptive summary of any data set.

Disadvantages

Frequency tabulations can get messy if interval or ratio-level measures are tabulated simply because of the large number of possible data values. Grouped frequency distributions really should be used in such cases. However, certain choices, such as the size of the score interval (group size), must be made, often arbitrarily, and such choices can affect the nature of the final frequency distribution.

Additionally, percentage measures have certain problems associated with them, most notably, the potential for their misinterpretation in small samples. One should be sure to know the sample size on which percentage measures are based in order to obtain an interpretive reference point for the actual percentage values.

For example

In a sample of 10 individuals, 20% represents only two individuals whereas in a sample of 300 individuals, 20% represents 60 individuals. If all that is reported is the 20%, then the mental inference drawn by readers is likely to be that a sizeable number of individuals had a score or scores of a particular value—but what is ‘sizeable’ depends upon the total number of observations on which the percentage is based.

Where Is This Procedure Useful?

Frequency tabulation and crosstabulation are very commonly applied procedures used to summarise information from questionnaires, both in terms of tabulating various demographic characteristics (e.g. gender, age, education level, occupation) and in terms of actual responses to questions (e.g. numbers responding ‘yes’ or ‘no’ to a particular question). They can be particularly useful in helping to build up the data screening and demographic stories discussed in Chap. 4. Categorical data from observational studies can also be analysed with this technique (e.g. the number of times Suzy talks to Frank, to Billy, and to John in a study of children’s social interactions).

Certain types of experimental research designs may also be amenable to analysis by crosstabulation with a view to drawing inferences about distribution differences across the sets of categories for the two variables being tracked.

You could employ crosstabulation in conjunction with the tests described in Chap. 7 to see if two different styles of advertising campaign differentially affect the product purchasing patterns of male and female consumers.

In the QCI database, Maree could employ crosstabulation to help her answer the question “do different types of electronic manufacturing firms ( company ) differ in terms of their tendency to employ male versus female quality control inspectors ( gender )?”

Software Procedures

SPSS: select the variable(s) you wish to analyse; for the Crosstabs procedure, an options button lets you choose the types of statistics and percentages to show in each cell of the table.
NCSS: select the variable(s) you wish to analyse.
SYSTAT: select the variable(s) you wish to analyse and choose the optional statistics you wish to see.
STATGRAPHICS: select the variable(s) you wish to analyse; when the ‘Tables and Graphs’ window opens, choose the tables and graphs you wish to see.
R Commander: select the variable(s) you wish to analyse and choose the optional statistics you wish to see.

Procedure 5.2: Graphical Methods for Displaying Data

Graphical methods for displaying data include bar and pie charts, histograms and frequency polygons, line graphs and scatterplots. It is important to note that what is presented here is a small but representative sampling of the types of simple graphs one can produce to summarise and display trends in data. Generally speaking, SPSS offers the easiest facility for producing and editing graphs, but with a rather limited range of styles and types. SYSTAT, STATGRAPHICS and NCSS offer a much wider range of graphs (including graphs unique to each package), but with the drawback that it takes somewhat more effort to get the graphs in exactly the form you want.

Bar and Pie Charts

These two types of graphs are useful for summarising the frequency of occurrence of various values (or ranges of values) where the data are categorical (nominal or ordinal level of measurement).

  • A bar chart uses vertical and horizontal axes to summarise the data. The vertical axis is used to represent frequency (number) of occurrence or the relative frequency (percentage) of occurrence; the horizontal axis is used to indicate the data categories of interest.
  • A pie chart gives a simpler visual representation of category frequencies by cutting a circular plot into wedges or slices whose sizes are proportional to the relative frequency (percentage) of occurrence of specific data categories. A pie chart can have one or more slices emphasised by ‘exploding’ them out from the rest of the pie.

Consider the company variable from the QCI database. This variable depicts the types of manufacturing firms that the quality control inspectors worked for. Figure 5.1 illustrates a bar chart summarising the percentage of female inspectors in the sample coming from each type of firm. Figure 5.2 shows a pie chart representation of the same data, with an ‘exploded slice’ highlighting the percentage of female inspectors in the sample who worked for large business computer manufacturers – the lowest percentage of the five types of companies. Both graphs were produced using SPSS.

Fig. 5.1 Bar chart: Percentage of female inspectors

Fig. 5.2 Pie chart: Percentage of female inspectors

The pie chart was modified with an option to show the actual percentage along with the label for each category. The bar chart shows that computer manufacturing firms have relatively fewer female inspectors compared to the automotive and electrical appliance (large and small) firms. This trend is less clear from the pie chart, which suggests that pie charts may be less visually interpretable when the data categories occur with rather similar frequencies. However, the ‘exploded slice’ option can help interpretation in some circumstances.
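For comparison with the SPSS output, here is a minimal matplotlib sketch that draws both chart styles side by side, with the smallest pie slice ‘exploded’; the company labels and percentages are hypothetical stand-ins rather than the actual QCI figures.

```python
import matplotlib.pyplot as plt

# Hypothetical percentages of female inspectors by company type (stand-ins).
companies = ["PC", "Large computer", "Small appliance",
             "Large appliance", "Automobile"]
pct_female = [17.0, 12.0, 24.0, 23.0, 24.0]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart (Fig. 5.1 style): categories on the x-axis, percentages on the y.
ax1.bar(companies, pct_female)
ax1.set_ylabel("% of female inspectors")
ax1.tick_params(axis="x", rotation=45)

# Pie chart (Fig. 5.2 style) with the smallest slice 'exploded'.
explode = [0.1 if p == min(pct_female) else 0.0 for p in pct_female]
ax2.pie(pct_female, labels=companies, explode=explode, autopct="%.1f%%")

plt.tight_layout()
plt.show()
```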

Certain software programs, such as SPSS, STATGRAPHICS, NCSS and Microsoft Excel, offer the option of generating 3-dimensional bar charts and pie charts and incorporating other ‘bells and whistles’ that can potentially add visual richness to the graphic representation of the data. However, you should generally be careful with these fancier options as they can produce distortions and create ambiguities in interpretation (e.g. see discussions in Jacoby 1997; Smithson 2000; Wilkinson 2009). Such distortions and ambiguities could ultimately end up providing misinformation to researchers as well as to those who read their research.

Histograms and Frequency Polygons

These two types of graphs are useful for summarising the frequency of occurrence of various values (or ranges of values) where the data are essentially continuous (interval or ratio level of measurement) in nature. Both histograms and frequency polygons use vertical and horizontal axes to summarise the data. The vertical axis is used to represent the frequency (number) of occurrence or the relative frequency (percentage) of occurrences; the horizontal axis is used for the data values or ranges of values of interest. The histogram uses bars of varying heights to depict frequency; the frequency polygon uses lines and points.

There is a visual difference between a histogram and a bar chart: the bar chart uses bars that do not physically touch, signifying the discrete and categorical nature of the data, whereas the bars in a histogram physically touch to signal the potentially continuous nature of the data.

Suppose Maree wanted to graphically summarise the distribution of speed scores for the 112 inspectors in the QCI database. Figure 5.3 (produced using NCSS) illustrates a histogram representation of this variable. Figure 5.3 also illustrates another representational device called the ‘density plot’ (the solid tracing line overlaying the histogram) which gives a smoothed impression of the overall shape of the distribution of speed scores. Figure 5.4 (produced using STATGRAPHICS) illustrates the frequency polygon representation for the same data.

Fig. 5.3 Histogram of the speed variable (with density plot overlaid)

Fig. 5.4 Frequency polygon plot of the speed variable

These graphs employ a grouped format where speed scores which fall within specific intervals are counted as being essentially the same score. The shape of the data distribution is reflected in these plots. Each graph tells us that the inspection speed scores are positively skewed with only a few inspectors taking very long times to make their inspection judgments and the majority of inspectors taking rather shorter amounts of time to make their decisions.

Both representations tell a similar story; the choice between them is largely a matter of personal preference. However, if the number of bars to be plotted in a histogram is potentially very large (and this is usually directly controllable in most statistical software packages), then a frequency polygon would be the preferred representation simply because the amount of visual clutter in the graph will be much reduced.

It is somewhat of an art to choose an appropriate definition for the width of the score grouping intervals (or ‘bins’ as they are often termed) to be used in the plot: choose too many and the plot may look too lumpy and the overall distributional trend may not be obvious; choose too few and the plot will be too coarse to give a useful depiction. Programs like SPSS, SYSTAT, STATGRAPHICS and NCSS are designed to choose an ‘appropriate’ number of bins to be used, but the analyst’s eye is often a better judge than any statistical rule that a software package would use.
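The following matplotlib sketch illustrates the histogram-plus-density idea with a manually chosen number of bins; the positively skewed speed scores are simulated stand-ins, and the smoothed trace is a kernel density estimate rather than whatever smoother NCSS applies.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

# Simulated positively skewed 'speed' scores (stand-ins for the QCI data).
rng = np.random.default_rng(4)
speed = rng.gamma(shape=2.0, scale=2.0, size=112)

fig, ax = plt.subplots()
ax.hist(speed, bins=15, density=True, edgecolor="black")  # histogram
xs = np.linspace(speed.min(), speed.max(), 200)
ax.plot(xs, gaussian_kde(speed)(xs))                      # overlaid density trace
ax.set_xlabel("Inspection speed")
ax.set_ylabel("Density")
plt.show()
```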

There are several interesting variations of the histogram which can highlight key data features or facilitate interpretation of certain trends in the data. One such variation is a graph called the dual histogram (available in SYSTAT; a variation called a ‘comparative histogram’ can be created in NCSS) – a graph that facilitates visual comparison of the frequency distributions for a specific variable for participants from two distinct groups.

Suppose Maree wanted to graphically compare the distributions of speed scores for inspectors in the two categories of education level ( educlev ) in the QCI database. Figure 5.5 shows a dual histogram (produced using SYSTAT) that accomplishes this goal. This graph still employs the grouped format where speed scores falling within particular intervals are counted as being essentially the same score. The shape of the data distribution within each group is also clearly reflected in this plot. However, the story conveyed by the dual histogram is that, while the inspection speed scores are positively skewed for inspectors in both categories of educlev, the comparison suggests that inspectors with a high school level of education (= 1) tend to take slightly longer to make their inspection decisions than do their colleagues who have a tertiary qualification (= 2).

Fig. 5.5 Dual histogram of speed for the two categories of educlev
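A dual histogram can be approximated in matplotlib by stacking two histograms on a shared x-axis and mirroring the lower panel; the group sizes and scores below are simulated stand-ins for the two educlev groups.

```python
import matplotlib.pyplot as plt
import numpy as np

# Simulated speed scores for the two educlev groups (stand-in values).
rng = np.random.default_rng(5)
speed_hs = rng.gamma(2.4, 2.0, size=60)    # high school educated (= 1)
speed_tert = rng.gamma(2.0, 2.0, size=52)  # tertiary qualified (= 2)

fig, (top, bottom) = plt.subplots(2, 1, sharex=True)
top.hist(speed_hs, bins=15)
top.set_ylabel("High school")
bottom.hist(speed_tert, bins=15)
bottom.invert_yaxis()                      # mirror the lower panel
bottom.set_ylabel("Tertiary")
bottom.set_xlabel("Inspection speed")
plt.show()
```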

Line Graphs

The line graph is similar in style to the frequency polygon but is much more general in its potential for summarising data. In a line graph, we seldom deal with percentage or frequency data. Instead we can summarise other types of information about data such as averages or means (see Procedure 5.4 for a discussion of this measure), often for different groups of participants. Thus, one important use of the line graph is to break down scores on a specific variable according to membership in the categories of a second variable.

In the context of the QCI database, Maree might wish to summarise the average inspection accuracy scores for the inspectors from different types of manufacturing companies. Figure 5.6 was produced using SPSS and shows such a line graph.

Fig. 5.6 Line graph comparison of companies in terms of average inspection accuracy

Note how the trend in performance across the different companies becomes clearer with such a visual representation. It appears that the inspectors from the Large Business Computer and PC manufacturing companies have better average inspection accuracy compared to the inspectors from the remaining three industries.

With many software packages, it is possible to further elaborate a line graph by including error bars or confidence interval bars (see the relevant procedure in Chap. 8). These give some indication of the precision with which the average level for each category in the population has been estimated (narrow bars signal a more precise estimate; wide bars signal a less precise estimate).

Figure 5.7 shows such an elaborated line graph, using 95% confidence interval bars, which can be used to help make more defensible judgments (compared to Fig. 5.6 ) about whether the companies are substantively different from each other in average inspection performance. Companies whose confidence interval bars do not overlap each other can be inferred to be substantively different in performance characteristics.

Fig. 5.7 Line graph using confidence interval bars to compare accuracy across companies

The accuracy confidence interval bars for participants from the Large Business Computer manufacturing firms do not overlap those from the Large or Small Electrical Appliance manufacturers or the Automobile manufacturers.

We might conclude that quality control inspection accuracy is substantially better in the Large Business Computer manufacturing companies than in these other industries but is not substantially better than the PC manufacturing companies. We might also conclude that inspection accuracy in PC manufacturing companies is not substantially different from Small Electrical Appliance manufacturers.
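A line graph of group means with 95% confidence interval bars can be sketched as follows; the accuracy scores are simulated stand-ins, and the intervals use a simple normal approximation (1.96 standard errors) rather than whatever method a given package applies.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Simulated accuracy scores by company type (stand-ins for the QCI data).
rng = np.random.default_rng(6)
df = pd.DataFrame({
    "company": rng.choice(["PC", "Large computer", "Small appliance",
                           "Large appliance", "Automobile"], size=112),
    "accuracy": rng.normal(85, 8, size=112),
})

stats = df.groupby("company")["accuracy"].agg(["mean", "sem"])
ci95 = 1.96 * stats["sem"]  # approximate 95% confidence half-widths

fig, ax = plt.subplots()
ax.errorbar(stats.index, stats["mean"], yerr=ci95, fmt="o-", capsize=4)
ax.set_ylabel("Mean inspection accuracy (%)")
ax.tick_params(axis="x", rotation=45)
plt.tight_layout()
plt.show()
```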

Scatterplots

Scatterplots are useful in displaying the relationship between two interval- or ratio-scaled variables or measures of interest obtained on the same individuals, particularly in correlational research (see the fundamental concept and procedure discussions in Chap. 6).

In a scatterplot, one variable is chosen to be represented on the horizontal axis; the second variable is represented on the vertical axis. In this type of plot, all data point pairs in the sample are graphed. The shape and tilt of the cloud of points in a scatterplot provide visual information about the strength and direction of the relationship between the two variables. A very compact elliptical cloud of points signals a strong relationship; a very loose or nearly circular cloud signals a weak or non-existent relationship. A cloud of points generally tilted upward toward the right side of the graph signals a positive relationship (higher scores on one variable associated with higher scores on the other and vice-versa). A cloud of points generally tilted downward toward the right side of the graph signals a negative relationship (higher scores on one variable associated with lower scores on the other and vice-versa).

Maree might be interested in displaying the relationship between inspection accuracy and inspection speed in the QCI database. Figure 5.8, produced using SPSS, shows what such a scatterplot might look like. Several characteristics of the data for these two variables can be noted in Fig. 5.8. The shape of the distribution of data points is evident. The plot has a fan-shaped characteristic to it which indicates that accuracy scores are highly variable (exhibit a very wide range of possible scores) at very fast inspection speeds but get much less variable and tend to be somewhat higher as inspection speed increases (where inspectors take longer to make their quality control decisions). Thus, there does appear to be some relationship between inspection accuracy and inspection speed (a weak positive relationship, since the cloud of points tends to be very loose but tilted generally upward toward the right side of the graph – slower speeds tend to be slightly associated with higher accuracy).

Fig. 5.8 Scatterplot relating inspection accuracy to inspection speed

However, it is not the case that the inspection decisions which take longest to make are necessarily the most accurate (see the labelled points for inspectors 7 and 62 in Fig. 5.8 ). Thus, Fig. 5.8 does not show a simple relationship that can be unambiguously summarised by a statement like “the longer an inspector takes to make a quality control decision, the more accurate that decision is likely to be”. The story is more complicated.

Some software packages, such as SPSS, STATGRAPHICS and SYSTAT, offer the option of using different plotting symbols or markers to represent the members of different groups so that the relationship between the two focal variables (the ones anchoring the X and Y axes) can be clarified with reference to a third categorical measure.

Maree might want to see if the relationship depicted in Fig. 5.8 changes depending upon whether the inspector was tertiary-qualified or not (this information is represented in the educlev variable of the QCI database).

Figure 5.9 shows what such a modified scatterplot might look like; the legend in the upper corner of the figure defines the marker symbols for each category of the educlev variable. Note that for both High School only-educated inspectors and Tertiary-qualified inspectors, the general fan-shaped relationship between accuracy and speed is the same. However, it appears that the distribution of points for the High School only-educated inspectors is shifted somewhat upward and toward the right of the plot suggesting that these inspectors tend to be somewhat more accurate as well as slower in their decision processes.

Fig. 5.9 Scatterplot displaying accuracy vs speed conditional on educlev group
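A scatterplot of this kind, with a different marker for each educlev group, can be sketched as below; the data are simulated to mimic, very roughly, the fan shape described in the text. Dropping the grouping loop and calling ax.scatter once gives the plain Fig. 5.8 style plot.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Simulated accuracy/speed pairs plus an educlev grouping (stand-ins).
rng = np.random.default_rng(7)
n = 112
speed = rng.gamma(2.0, 2.0, size=n)
noise = rng.normal(0, 12 / np.sqrt(1 + speed), size=n)  # less spread as speed rises
df = pd.DataFrame({"speed": speed,
                   "accuracy": (78 + 1.2 * speed + noise).clip(50, 100),
                   "educlev": rng.choice(["High school", "Tertiary"], size=n)})

fig, ax = plt.subplots()
for (level, sub), marker in zip(df.groupby("educlev"), ["o", "^"]):
    ax.scatter(sub["speed"], sub["accuracy"], marker=marker, label=level)
ax.set_xlabel("Inspection speed")
ax.set_ylabel("Inspection accuracy (%)")
ax.legend(title="educlev")
plt.show()
```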

There are many other styles of graphs available, often dependent upon the specific statistical package you are using. Interestingly, NCSS and, particularly, SYSTAT and STATGRAPHICS, appear to offer the most variety in terms of types of graphs available for visually representing data. A reading of the user’s manuals for these programs (see the Useful additional readings) would expose you to the great diversity of plotting techniques available to researchers. Many of these techniques go by rather interesting names such as: Chernoff’s faces, radar plots, sunflower plots, violin plots, star plots, Fourier blobs, and dot plots.

These graphical methods provide summary techniques for visually presenting certain characteristics of a set of data. Visual representations are generally easier to understand than a tabular representation and, when these plots are combined with available numerical statistics, they can give a very complete picture of a sample of data. Newer methods have become available which permit more complex representations to be depicted, opening up possibilities for creative visual representation of more aspects and features of the data (leading to a style of visual data storytelling called infographics; see, for example, McCandless 2014; Toseland and Toseland 2012). Many of these newer methods can display data patterns from multiple variables in the same graph (several of these newer graphical methods are illustrated and discussed in Procedure 5.3).

Graphs tend to be cumbersome and space consuming if a great many variables need to be summarised. In such cases, using numerical summary statistics (such as means or correlations) in tabular form alone will provide a more economical and efficient summary. Also, it can be very easy to give a misleading picture of data trends using graphical methods by simply choosing the ‘correct’ scaling for maximum effect or choosing a display option (such as a 3-D effect) that ‘looks’ presentable but which actually obscures a clear interpretation (see Smithson 2000; Wilkinson 2009).

Thus, you must be careful in creating and interpreting visual representations so that aesthetic choices made for the sake of appearance do not become more important than obtaining a faithful and valid representation of the data—a very real danger with many of today’s statistical packages where ‘default’ drawing options have been pre-programmed in. No single plot can completely summarise all possible characteristics of a sample of data. Thus, choosing a specific method of graphical display may, of necessity, force a behavioural researcher to represent certain data characteristics (such as frequency) at the expense of others (such as averages).

Virtually any research design which produces quantitative data and statistics (even to the extent of just counting the number of occurrences of several events) provides opportunities for graphical data display which may help to clarify or illustrate important data characteristics or relationships. Remember, graphical displays are communication tools just like numbers—which tool to choose depends upon the message to be conveyed. Visual representations of data are generally more useful in communicating to lay persons who are unfamiliar with statistics. Care must be taken though as these same lay people are precisely the people most likely to misinterpret a graph if it has been incorrectly drawn or scaled.

Software Procedures

SPSS: choose from a range of gallery chart types, drag the chart type into the working area and customise the chart with the desired variables, labels, etc.; many elements of a chart, including error bars, can be controlled.
NCSS: whichever type of chart you choose, you can control many features of the chart from the dialog box that pops open upon selection.
STATGRAPHICS: whichever type of chart you choose, you can control a number of features of the chart from the series of dialog boxes that pop open upon selection.
SYSTAT: offers a range of chart types, including more novel graphical displays such as the dual histogram; for each choice, a dialog box opens which allows you to control almost every characteristic of the graph you want.
R Commander: for most graphs, minimal control is offered over the appearance of the graph (you need to use full R commands to control more aspects; e.g. see Chang).

Procedure 5.3: Multivariate Graphs & Displays

Graphical methods for displaying multivariate data (i.e. many variables at once) include scatterplot matrices, radar (or spider) plots, multiplots, parallel coordinate displays, and icon plots. Multivariate graphs are useful for visualising broad trends and patterns across many variables (Cleveland 1995 ; Jacoby 1998 ). Such graphs typically sacrifice precision in representation in favour of a snapshot pictorial summary that can help you form general impressions of data patterns.

It is important to note that what is presented here is a small but reasonably representative sampling of the types of graphs one can produce to summarise and display trends in multivariate data. Generally speaking, SYSTAT offers the best facilities for producing multivariate graphs, followed by STATGRAPHICS, but with the drawback that it is somewhat tricky to get the graphs in exactly the form you want. SYSTAT also has excellent facilities for creating new forms and combinations of graphs – essentially allowing graphs to be tailor-made for a specific communication purpose. Both SPSS and NCSS offer a more limited range of multivariate graphs, generally restricted to scatterplot matrices and variations of multiplots. Microsoft Excel or STATGRAPHICS are the packages to use if radar or spider plots are desired.

Scatterplot Matrices

A scatterplot matrix is a useful multivariate graph designed to show relationships between pairs of many variables in the same display.

Figure 5.10 illustrates a scatterplot matrix, produced using SYSTAT, for the mentabil, accuracy, speed, jobsat and workcond variables in the QCI database. It is easy to see that all the scatterplot matrix does is stack all pairs of scatterplots into a format where it is easy to pick out the graph for any ‘row’ variable that intersects a ‘column’ variable.

Fig. 5.10 Scatterplot matrix relating mentabil, accuracy, speed, jobsat & workcond

In those plots where a ‘row’ variable intersects itself in a column of the matrix (along the so-called ‘diagonal’), SYSTAT permits a range of univariate displays to be shown. Figure 5.10 shows univariate histograms for each variable (recall Procedure 5.2). One obvious drawback of the scatterplot matrix is that, if many variables are to be displayed (say ten or more), the graph gets very crowded and becomes very hard to visually appreciate.

Looking at the first column of graphs in Fig. 5.10 , we can see the scatterplot relationships between mentabil and each of the other variables. We can get a visual impression that mentabil seems to be slightly negatively related to accuracy (the cloud of scatter points tends to angle downward to the right, suggesting, very slightly, that higher mentabil scores are associated with lower levels of accuracy ).

Conversely, the visual impression of the relationship between mentabil and speed is that the relationship is slightly positive (higher mentabil scores tend to be associated with higher speed scores = longer inspection times). Similar types of visual impressions can be formed for other parts of Fig. 5.10 . Notice that the histogram plots along the diagonal give a clear impression of the shape of the distribution for each variable.
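pandas offers a one-line version of this display. The sketch below draws a scatterplot matrix with histograms on the diagonal, as in Fig. 5.10; the five variables are random stand-ins for the real QCI scores, so no substantive relationships will be visible.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Five simulated QCI-style variables (random stand-ins for the real scores).
rng = np.random.default_rng(8)
df = pd.DataFrame(rng.normal(size=(112, 5)),
                  columns=["mentabil", "accuracy", "speed",
                           "jobsat", "workcond"])

# Scatterplot matrix with univariate histograms along the diagonal.
pd.plotting.scatter_matrix(df, diagonal="hist", figsize=(8, 8))
plt.show()
```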

Radar Plots

The radar plot (also known as a spider graph, for obvious reasons) is a simple and effective device for displaying scores on many variables. Microsoft Excel offers a range of options and capabilities for producing radar plots, such as the plot shown in Fig. 5.11. Radar plots are generally easy to interpret and provide a good visual basis for comparing plots from different individuals or groups, even if a fairly large number of variables (say, up to about 25) are being displayed. Like a clock face, variables are evenly spaced around the centre of the plot in clockwise order starting at the 12 o’clock position. Visual interpretation of a radar plot primarily relies on shape comparisons, i.e. the rise and fall of peaks and valleys along the spokes around the plot. Valleys near the centre display low scores on specific variables; peaks near the outside of the plot display high scores on specific variables.

Note that, technically, radar plots employ polar coordinates. SYSTAT can draw graphs using polar coordinates, but not as easily as Excel can, from the user’s perspective. Radar plots work best if all the variables represented are measured on the same scale (e.g. a 1 to 7 Likert-type scale or 0% to 100% scale). Individuals who are missing any scores on the variables being plotted are typically omitted.

Fig. 5.11 Radar plot comparing attitude ratings for inspectors 66 and 104

The radar plot in Fig. 5.11, produced using Excel, compares two specific inspectors, 66 and 104, on the nine attitude rating scales. Inspector 66 gave the highest rating (= 7) on the cultqual variable and inspector 104 gave the lowest rating (= 1). The plot shows that inspector 104 tended to provide very low ratings on all nine attitude variables, whereas inspector 66 tended to give very high ratings on all variables except acctrain and trainapp, where the scores were similar to those for inspector 104. Thus, in general, inspector 66 tended to show much more positive attitudes toward their workplace compared to inspector 104.
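A radar plot along these lines can be drawn with matplotlib's polar projection. In the sketch below, the first five scale names are taken from the text, att6 to att9 are placeholder names, and the two inspectors' ratings are hypothetical values chosen to echo the pattern just described.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 1-7 ratings for two inspectors on nine attitude scales.
labels = ["cultqual", "mgmtcomm", "acctrain", "trainapp", "polsatis",
          "att6", "att7", "att8", "att9"]  # att6-att9 are placeholder names
insp66 = [7, 7, 2, 2, 6, 7, 6, 7, 6]
insp104 = [1, 1, 2, 2, 1, 1, 2, 1, 1]

# Spokes evenly spaced clockwise from the 12 o'clock position.
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)
fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
for scores, name in [(insp66, "Inspector 66"), (insp104, "Inspector 104")]:
    closed = np.append(scores, scores[0])  # close the polygon
    ax.plot(np.append(angles, angles[0]), closed, label=name)
ax.set_xticks(angles)
ax.set_xticklabels(labels)
ax.set_ylim(0, 7)
ax.legend(loc="lower right")
plt.show()
```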

While Fig. 5.11 was generated to compare the scores for two individuals in the QCI database, it would be just as easy to produce a radar plot that compared the five types of companies in terms of their average ratings on the nine variables, as shown in Fig. 5.12 .

Fig. 5.12 Radar plot comparing average attitude ratings for five types of company

Here we can form the visual impression that the five types of companies differ most in their average ratings of mgmtcomm and least in the average ratings of polsatis . Overall, the average ratings from inspectors from PC manufacturers (black diamonds with solid lines) seem to be generally the most positive as their scores lie on or near the outer ring of scores and those from Automobile manufacturers tend to be least positive on many variables (except the training-related variables).

Extrapolating from Fig. 5.12 , you may rightly conclude that including too many groups and/or too many variables in a radar plot comparison can lead to so much clutter that any visual comparison would be severely degraded. You may have to experiment with using colour-coded lines to represent different groups versus line and marker shape variations (as used in Fig. 5.12 ), because choice of coding method for groups can influence the interpretability of a radar plot.

Multiplots

A multiplot is simply a hybrid style of graph that can display group comparisons across a number of variables. There are a wide variety of possible multiplots one could potentially design (SYSTAT offers great capabilities with respect to multiplots). Figure 5.13 shows a multiplot comprising a side-by-side series of profile-based line graphs – one graph for each type of company in the QCI database.

Fig. 5.13 Multiplot comparing profiles of average attitude ratings for five company types

The multiplot in Fig. 5.13 , produced using SYSTAT, graphs the profile of average attitude ratings for all inspectors within a specific type of company. This multiplot shows the same story as the radar plot in Fig. 5.12 , but in a different graphical format. It is still fairly clear that the average ratings from inspectors from PC manufacturers tend to be higher than for the other types of companies and the profile for inspectors from automobile manufacturers tends to be lower than for the other types of companies.

The profile for inspectors from large electrical appliance manufacturers is the flattest, meaning that their average attitude ratings were less variable than for other types of companies. Comparing the ease with which you can glean the visual impressions from Figs. 5.12 and 5.13 may lead you to prefer one style of graph over another. If you have such preferences, chances are others will also, which may mean you need to carefully consider your options when deciding how best to display data for effect.

Frequently, choice of graph is less a matter of which style is right or wrong, but more a matter of which style will suit specific purposes or convey a specific story, i.e. the choice is often strategic.
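One simple way to approximate a multiplot outside SYSTAT is a row of small profile line graphs sharing a common y scale, as sketched below; the company names follow the QCI scenario but the average ratings are random stand-ins.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical mean ratings (1-7) on nine attitude items for five companies.
rng = np.random.default_rng(9)
companies = ["PC", "Large computer", "Small appliance",
             "Large appliance", "Automobile"]
profiles = {c: rng.uniform(2, 6, size=9) for c in companies}

# One small profile line graph per company, sharing a common y scale.
fig, axes = plt.subplots(1, len(companies), figsize=(14, 3), sharey=True)
for ax, c in zip(axes, companies):
    ax.plot(range(1, 10), profiles[c], marker="o")
    ax.set_title(c)
    ax.set_xlabel("Attitude item")
axes[0].set_ylabel("Mean rating (1-7)")
plt.tight_layout()
plt.show()
```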

Parallel Coordinate Displays

A parallel coordinate display is useful for displaying individual scores on a range of variables, all measured using the same scale. Furthermore, such graphs can be combined side-by-side to facilitate very broad visual comparisons among groups, while retaining individual profile variability in scores. Each line in a parallel coordinate display represents one individual, e.g. an inspector.

The interpretation of a parallel coordinate display, such as the two shown in Fig. 5.14 , depends on visual impressions of the peaks and valleys (highs and lows) in the profiles as well as on the density of similar profile lines. The graph is called ‘parallel coordinate’ simply because it assumes that all variables are measured on the same scale and that scores for each variable can therefore be located along vertical axes that are parallel to each other (imagine vertical lines on Fig. 5.14 running from bottom to top for each variable on the X-axis). The main drawback of this method of data display is that only those individuals in the sample who provided legitimate scores on all of the variables being plotted (i.e. who have no missing scores) can be displayed.

Fig. 5.14 Parallel coordinate displays comparing profiles of average attitude ratings for five company types

The parallel coordinate display in Fig. 5.14, produced using SYSTAT, graphs the attitude rating profile for each inspector within two specific types of company: the left graph for inspectors from PC manufacturers and the right graph for automobile manufacturers.

There are fewer lines in each display than the number of inspectors from each type of company simply because several inspectors from each type of company were missing a rating on at least one of the nine attitude variables. The graphs show great variability in scores amongst inspectors within a company type, but there are some overall patterns evident.

For example, inspectors from automobile companies clearly and fairly uniformly rated mgmtcomm toward the low end of the scale, whereas the reverse was generally true for that variable for inspectors from PC manufacturers. Conversely, inspectors from automobile companies tend to rate acctrain and trainapp more toward the middle to high end of the scale, whereas the reverse is generally true for those variables for inspectors from PC manufacturers.
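pandas provides a basic parallel coordinate display via pandas.plotting.parallel_coordinates. The sketch below colour-codes the two groups within a single panel rather than drawing the side-by-side panels of Fig. 5.14; the five scale names are taken from the text and the ratings are simulated stand-ins.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical 1-7 ratings on five attitude scales for two company types.
rng = np.random.default_rng(10)
cols = ["cultqual", "mgmtcomm", "acctrain", "trainapp", "polsatis"]
pc = pd.DataFrame(rng.integers(4, 8, size=(20, 5)), columns=cols)
pc["company"] = "PC"
auto = pd.DataFrame(rng.integers(1, 5, size=(20, 5)), columns=cols)
auto["company"] = "Automobile"

# Each line is one inspector; the parallel vertical axes are the variables.
parallel_coordinates(pd.concat([pc, auto]), class_column="company")
plt.ylabel("Rating (1-7)")
plt.show()
```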

Icon Plots

Perhaps the most creative types of multivariate displays are the so-called icon plots. SYSTAT and STATGRAPHICS offer an impressive array of different types of icon plots, including, amongst others, Chernoff’s faces, profile plots, histogram plots, star glyphs and sunray plots (Jacoby 1998 provides a detailed discussion of icon plots).

Icon plots generally use a specific visual construction to represent the variable scores obtained by each individual within a sample or group. All icon plots are thus methods for displaying the response patterns for individual members of a sample, as long as those individuals are not missing any scores on the variables to be displayed (note that this is the same limitation as for radar plots and parallel coordinate displays). To illustrate icon plots, without generating too many icons to focus on, Figs. 5.15, 5.16, 5.17 and 5.18 present four different icon plots for QCI inspectors classified, using a new variable called BEST_WORST, as either the worst performers (= 1, where their accuracy scores were less than 70%) or the best performers (= 2, where their accuracy scores were 90% or greater).

Fig. 5.15 Chernoff’s faces icon plot comparing individual attitude ratings for best and worst performing inspectors

Fig. 5.16 Profile plot comparing individual attitude ratings for best and worst performing inspectors

Fig. 5.17 Histogram plot comparing individual attitude ratings for best and worst performing inspectors

Fig. 5.18 Sunray plot comparing individual attitude ratings for best and worst performing inspectors

The Chernoff’s faces plot gets its name from the visual icon used to represent variable scores – a cartoon-type face. This icon tries to capitalise on our natural human ability to recognise and differentiate faces. Each feature of the face is controlled by the scores on a single variable. In SYSTAT, up to 20 facial features are controllable; the first five being curvature of mouth, angle of brow, width of nose, length of nose and length of mouth (SYSTAT Software Inc., 2009 , p. 259). The theory behind Chernoff’s faces is that similar patterns of variable scores will produce similar looking faces, thereby making similarities and differences between individuals more apparent.

The profile plot and histogram plot are actually two variants of the same type of icon plot. A profile plot represents individuals’ scores for a set of variables using simplified line graphs, one per individual. The profile is scaled so that the vertical heights of the peaks and valleys correspond to the actual values of the variables, which anchor the X-axis in a fashion similar to the parallel coordinate display. So, as you examine a profile from left to right across the X-axis of each graph, you are looking across the set of variables. A histogram plot represents the same information in the same way as the profile plot but using histogram bars instead.

Figure 5.15 , produced using SYSTAT, shows a Chernoff’s faces plot for the best and worst performing inspectors using their ratings of job satisfaction, working conditions and the nine general attitude statements.

Each face is labelled with the inspector number it represents. The gaps indicate where an inspector had missing data on at least one of the variables, meaning a face could not be generated for them. The worst performers are drawn using red lines; the best using blue lines. The first variable is jobsat and this variable controls mouth curvature; the second variable is workcond and this controls angle of brow, and so on. It seems clear that there are differences in the faces between the best and worst performers with, for example, best performers tending to be more satisfied (smiling) and with higher ratings for working conditions (brow angle).

Beyond a broad visual impression, there is little in terms of precise inferences you can draw from a Chernoff’s faces plot. It really provides a visual sketch, nothing more. The fact that there is no obvious link between facial features, variables and score levels means that the Chernoff’s faces icon plot is difficult to interpret at the level of individual variables – a holistic impression of similarity and difference is what this type of plot facilitates.

Figure 5.16, produced using SYSTAT, shows a profile plot for the best and worst performing inspectors using their ratings of job satisfaction, working conditions and the nine attitude variables.

As in the Chernoff’s faces plot (Fig. 5.15), each graph, reading across the rows of the plot from left to right, corresponds to an inspector in the sample who was either in the worst performer (red) or best performer (blue) category. The first attitude variable is jobsat and anchors the left end of each line graph; the last variable is polsatis and anchors the right end of the line graph. The remaining variables are represented in order from left to right across the X-axis of each graph. Figure 5.16 shows that these inspectors are rather different in their attitude profiles, with best performers tending to show taller profiles on the first two variables, for example.

Figure 5.17, produced using SYSTAT, shows a histogram plot for the best and worst performing inspectors based on their ratings of job satisfaction, working conditions and the nine attitude variables. This plot tells the same story as the profile plot, only using histogram bars. Some people would prefer the histogram icon plot to the profile plot because each histogram bar corresponds to one variable, making the visual linking of a specific bar to a specific variable much easier than visually linking a specific position along the profile line to a specific variable.

The sunray plot is actually a simplified adaptation of the radar plot (called a “star glyph”) used to represent scores on a set of variables for each individual within a sample or group. Remember that a radar plot basically arranges the variables around a central point like a clock face; the first variable is represented at the 12 o’clock position and the remaining variables follow around the plot in a clockwise direction.

Unlike in a radar plot, while the spokes (the actual ‘star’ of the glyph’s name) are visible, no interpretive scale is shown. A variable’s score is visually represented by its distance from the central point. Thus, the star glyphs in a sunray plot are designed, like Chernoff’s faces, to provide a general visual impression based on icon shape. A wide-diameter, well-rounded glyph indicates an individual with high scores on all variables; a small-diameter, well-rounded glyph indicates uniformly low scores. Jagged glyphs represent individuals with highly variable scores across the variables. ‘Stars’ of similar size, shape and orientation represent similar individuals.
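A rough matplotlib sketch of a single star glyph under these conventions; the ratings are invented and the labels v1 to v11 are placeholders, not the QCI variable names:

```python
import numpy as np
import matplotlib.pyplot as plt

scores = np.array([6, 5, 7, 4, 6, 5, 6, 7, 5, 6, 4])   # invented ratings on 11 variables
angles = np.linspace(0, 2 * np.pi, len(scores), endpoint=False)

ax = plt.subplot(projection="polar")
ax.set_theta_zero_location("N")   # first variable sits at the 12 o'clock position
ax.set_theta_direction(-1)        # remaining variables proceed clockwise
ax.plot(np.append(angles, angles[0]), np.append(scores, scores[0]))  # close the glyph outline
ax.set_xticks(angles)
ax.set_xticklabels([f"v{i + 1}" for i in range(len(scores))])
ax.set_yticklabels([])            # suppress the radial scale, as in a sunray plot
plt.show()
```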

Figure 5.18 , produced using STATGRAPHICS, shows a sunray plot for the best and worst performing inspectors. An interpretation glyph is also shown in the lower right corner of Fig. 5.18 , where variables are aligned with the spokes of a star (e.g. jobsat is at the 12 o’clock position). This sunray plot could lead you to form the visual impression that the worst performing inspectors (group 1) have rather less rounded rating profiles than do the best performing inspectors (group 2) and that the jobsat and workcond spokes are generally lower for the worst performing inspectors.

Comparatively speaking, the sunray plot makes identifying similar individuals a bit easier (perhaps even easier than Chernoff’s faces) and, when ordered as STATGRAPHICS showed in Fig. 5.18 , permits easier visual comparisons between groups of individuals, but at the expense of precise knowledge about variable scores. Remember, a holistic impression is the goal pursued using a sunray plot.

Multivariate graphical methods provide summary techniques for visually presenting certain characteristics of a complex array of data on variables. Such visual representations are generally better at helping us to form holistic impressions of multivariate data than any sort of tabular representation or numerical index. They also allow us to compress many numerical measures into a compact representation that is generally easy to understand. Multivariate graphical displays can add interest to an otherwise dry statistical reporting of numerical data. They are designed to appeal to our pattern recognition skills, focusing our attention on features of the data such as shape, level, variability and orientation. Some multivariate graphs (e.g. radar plots, sunray plots and multiplots) are useful not only for representing score patterns for individuals but also for providing summaries of score patterns across groups of individuals.

Multivariate graphs tend to get very busy-looking and are hard to interpret if a great many variables or a large number of individuals need to be displayed (imagine any of the icon plots, for a sample of 200 questionnaire participants, displayed on an A4 page – each icon would be so small that its features could not be easily distinguished, thereby defeating the purpose of the display). In such cases, using numerical summary statistics (such as averages or correlations) in tabular form alone will provide a more economical and efficient summary. Also, some multivariate displays will work better for conveying certain types of information than others.

Information about variable relationships may be better displayed using a scatterplot matrix. Information about individual similarities and differences on a set of variables may be better conveyed using a histogram or sunray plot. Multiplots may be better suited to displaying information about group differences across a set of variables. Information about the overall similarity of individual entities in a sample might best be displayed using Chernoff’s faces.

Because people differ greatly in their visual capacities and preferences, certain types of multivariate displays will work for some people and not others. Sometimes, people will not see what you see in the plots. Some plots, such as Chernoff’s faces, may not strike a reader as a serious statistical procedure and this could adversely influence how convinced they will be by the story the plot conveys. None of the multivariate displays described here provide sufficiently precise information for solid inferences or interpretations; all are designed to simply facilitate the formation of holistic visual impressions. In fact, you may have noticed that some displays (scatterplot matrices and the icon plots, for example) provide no numerical scaling information that would help make precise interpretations. If precision in summary information is desired, the types of multivariate displays discussed here would not be the best strategic choices.

Virtually any research design which produces quantitative data/statistics for multiple variables provides opportunities for multivariate graphical data display which may help to clarify or illustrate important data characteristics or relationships. Thus, for survey research involving many identically-scaled attitudinal questions, a multivariate display may be just the device needed to communicate something about patterns in the data. Multivariate graphical displays are simply specialised communication tools designed to compress a lot of information into a meaningful and efficient format for interpretation—which tool to choose depends upon the message to be conveyed.

Generally speaking, visual representations of multivariate data could prove more useful in communicating to lay persons who are unfamiliar with statistics or who prefer visual as opposed to numerical information. However, these displays would probably require some interpretive discussion so that the reader clearly understands their intent.

Application Procedures

SPSS: … and choose from the gallery; drag the chart type into the working area and customise the chart with desired variables, labels, etc. Only a few elements of each chart can be configured and altered.

NCSS: … Only a few elements of this plot are customisable in NCSS.

SYSTAT: … (and you can select what type of plot you want to appear in the diagonal boxes) or … (… can be selected by choosing a variable) or … (for icon plots, you can choose from a range of icons including Chernoff’s faces, histogram, star, sun or profile amongst others). A large number of elements of each type of plot are easily customisable, although it may take some trial and error to get exactly the look you want.

STATGRAPHICS: … Several elements of each type of plot are easily customisable, although it may take some trial and error to get exactly the look you want.

R Commander: … You can select what type of plot you want to appear in the diagonal boxes, and you can control some other features of the plot. Other multivariate data displays are available via various packages (e.g. the … or … package), but not through R Commander.

Procedure 5.4: Assessing Central Tendency

The three most commonly reported measures of central tendency are the mean, median and mode. Each measure reflects a specific way of defining central tendency in a distribution of scores on a variable and each has its own advantages and disadvantages.

The mean is the most widely used measure of central tendency (also called the arithmetic average). Very simply, a mean is the sum of all the scores for a specific variable in a sample divided by the number of scores used in obtaining the sum. The resulting number reflects the average score for the sample of individuals on which the scores were obtained. If one were asked to predict the score that any single individual in the sample would obtain, the best prediction, in the absence of any other relevant information, would be the sample mean. Many parametric statistical methods (such as several of the procedures described in Chap. 7) deal with sample means in one way or another. For any sample of data, there is one and only one possible value for the mean in a specific distribution. For most purposes, the mean is the preferred measure of central tendency because it utilises all the available information in a sample.

In the context of the QCI database, Maree could quite reasonably ask what inspectors scored on the average in terms of mental ability ( mentabil ), inspection accuracy ( accuracy ), inspection speed ( speed ), overall job satisfaction ( jobsat ), and perceived quality of their working conditions ( workcond ). Table 5.3 shows the mean scores for the sample of 112 quality control inspectors on each of these variables. The statistics shown in Table 5.3 were computed using the SPSS Frequencies ... procedure. Notice that the table indicates how many of the 112 inspectors had a valid score for each variable and how many were missing a score (e.g. 109 inspectors provided a valid rating for jobsat; 3 inspectors did not).

Table 5.3 Measures of central tendency for specific QCI variables

Each mean needs to be interpreted in terms of the original units of measurement for each variable. Thus, the inspectors in the sample showed an average mental ability score of 109.84 (higher than the general population mean of 100 for the test), an average inspection accuracy of 82.14%, and an average speed for making quality control decisions of 4.48 s. Furthermore, in terms of their work context, inspectors reported an average overall job satisfaction of 4.96 (on the 7-point scale, nearly one full scale point above the Neutral point of 4, indicating a generally positive but not strong level of job satisfaction) and an average perceived quality of work conditions of 4.21 (on the 7-point scale, just about at the level of Stressful but Tolerable).

The mean is sensitive to the presence of extreme values, which can distort its value, giving a biased indication of central tendency. As we will see below, the median is an alternative statistic to use in such circumstances. However, it is also possible to compute what is called a trimmed mean where the mean is calculated after a certain percentage (say, 5% or 10%) of the lowest and highest scores in a distribution have been ignored (a process called ‘trimming’; see, for example, the discussion in Field 2018 , pp. 262–264). This yields a statistic less influenced by extreme scores. The drawbacks are that the decision as to what percentage to trim can be somewhat subjective and trimming necessarily sacrifices information (i.e. the extreme scores) in order to achieve a less biased measure. Some software packages, such as SPSS, SYSTAT or NCSS, can report a specific percentage trimmed mean, if that option is selected for descriptive statistics or exploratory data analysis (see Procedure 5.6 ) procedures. Comparing the original mean with a trimmed mean can provide an indication of the degree to which the original mean has been biased by extreme values.
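As a quick sketch of the idea (simulated, positively skewed scores are used here rather than the QCI data), SciPy’s trim_mean function implements exactly this kind of percentage trimming:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Simulated positively skewed scores, loosely in the spirit of decision speeds in seconds
speed = rng.exponential(scale=3.0, size=112) + 1.0

print(f"mean:            {speed.mean():.2f}")
print(f"5% trimmed mean: {stats.trim_mean(speed, 0.05):.2f}")  # ignores the lowest and highest 5%
print(f"median:          {np.median(speed):.2f}")
# The trimmed mean (and the median) sit below the ordinary mean here because the
# long right tail of extreme scores pulls the untrimmed mean upward.
```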

Very simply, the median is the centre or middle score of a set of scores. By ‘centre’ or ‘middle’ is meant that 50% of the data values are smaller than or equal to the median and 50% of the data values are larger when the entire distribution of scores is rank ordered from the lowest to highest value. Thus, we can say that the median is that score in the sample which occurs at the 50th percentile. [Note that a ‘percentile’ is the score at or below which a specified percentage of the sample falls. Thus, a score at the 25th percentile means that 25% of the sample achieved this score or a lower score.] Table 5.3 shows the 25th, 50th and 75th percentile scores for each variable – note how the 50th percentile score is exactly equal to the median in each case.

The median is reported somewhat less frequently than the mean but does have some advantages over the mean in certain circumstances. One such circumstance is when the sample of data has a few extreme values in one direction (either very large or very small relative to all other scores). In this case, the mean would be influenced (biased) to a much greater degree than would the median since all of the data are used to calculate the mean (including the extreme scores) whereas only the single centre score is needed for the median. For this reason, many nonparametric statistical procedures (such as several of those described in Chap. 7) focus on the median as the comparison statistic rather than on the mean.

A discrepancy between the values for the mean and median of a variable provides some insight to the degree to which the mean is being influenced by the presence of extreme data values. In a distribution where there are no extreme values on either side of the distribution (or where extreme values balance each other out on either side of the distribution, as happens in a normal distribution – see Fundamental Concept II ), the mean and the median will coincide at the same value and the mean will not be biased.

For highly skewed distributions, however, the value of the mean will be pulled toward the long tail of the distribution because that is where the extreme values lie. However, in such skewed distributions, the median will be insensitive (statisticians call this property ‘robustness’) to extreme values in the long tail. For this reason, the direction of the discrepancy between the mean and median can give a very rough indication of the direction of skew in a distribution (‘mean larger than median’ signals possible positive skewness; ‘mean smaller than median’ signals possible negative skewness). Like the mean, there is one and only one possible value for the median in a specific distribution.

In Fig. 5.19 , the left graph shows the distribution of speed scores and the right-hand graph shows the distribution of accuracy scores. The speed distribution clearly shows the mean being pulled toward the right tail of the distribution whereas the accuracy distribution shows the mean being just slightly pulled toward the left tail. The effect on the mean is stronger in the speed distribution indicating a greater biasing effect due to some very long inspection decision times.

Fig. 5.19 Effects of skewness in a distribution on the values for the mean and median

If we refer to Table 5.3 , we can see that the median score for each of the five variables has also been computed. Like the mean, the median must be interpreted in the original units of measurement for the variable. We can see that for mentabil , accuracy , and workcond , the value of the median is very close to the value of the mean, suggesting that these distributions are not strongly influenced by extreme data values in either the high or low direction. However, note that the median speed was 3.89 s compared to the mean of 4.48 s, suggesting that the distribution of speed scores is positively skewed (the mean is larger than the median—refer to Fig. 5.19 ). Conversely, the median jobsat score was 5.00 whereas the mean score was 4.96 suggesting very little substantive skewness in the distribution (mean and median are nearly equal).

The mode is the simplest measure of central tendency. It is defined as the most frequently occurring score in a distribution. Put another way, it is the score that more individuals in the sample obtain than any other score. An interesting problem associated with the mode is that there may be more than one in a specific distribution. In the case where multiple modes exist, the issue becomes which value do you report? The answer is that you must report all of them. In a ‘normal’ bell-shaped distribution, there is only one mode and it is indeed at the centre of the distribution, coinciding with both the mean and the median.

Table 5.3 also shows the mode for each of the five variables. For example, inspectors achieved a mentabil score of 111 more often than any other score and reported a jobsat rating of 6 more often than any other rating. SPSS only ever reports one mode even if several are present, so one must be careful and look at a histogram plot for each variable to make a final determination of the mode(s) for that variable.
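If you want a quick cross-check outside SPSS, Python’s standard library (from version 3.8) reports every mode; a small sketch with made-up ratings:

```python
from statistics import multimode

# Made-up 7-point ratings in which two values are tied for most frequent
jobsat = [4, 5, 6, 6, 6, 7, 5, 5, 5, 6, 3, 2]

print(multimode(jobsat))  # -> [5, 6]: both occur four times, so both must be reported as modes
```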

All three measures of central tendency yield information about what is going on in the centre of a distribution of scores. The mean and median provide a single number which can summarise the central tendency in the entire distribution. The mode can yield one or multiple indices. With many measurements on individuals in a sample, it is advantageous to have single number indices which can describe the distributions in summary fashion. In a normal or near-normal distribution of sample data, the mean, the median, and the mode will all generally coincide at the one point. In this instance, all three statistics will provide approximately the same indication of central tendency. Note however that it is seldom the case that all three statistics would yield exactly the same number for any particular distribution. The mean is the most useful statistic, unless the data distribution is skewed by extreme scores, in which case the median should be reported.

While measures of central tendency are useful descriptors of distributions, summarising data using a single numerical index necessarily reduces the amount of information available about the sample. Not only do we need to know what is going on in the centre of a distribution, we also need to know what is going on around the centre of the distribution. For this reason, most social and behavioural researchers report not only measures of central tendency, but also measures of variability (see Procedure 5.5 ). The mode is the least informative of the three statistics because of its potential for producing multiple values.

Measures of central tendency are useful in almost any type of experimental design, survey or interview study, and in any observational studies where quantitative data are available and must be summarised. The decision as to whether the mean or median should be reported depends upon the nature of the data which should ideally be ascertained by visual inspection of the data distribution. Some researchers opt to report both measures routinely. Computation of means is a prelude to many parametric statistical methods (see, for example, the relevant procedures in Chap. 7); comparison of medians is associated with many nonparametric statistical methods (see, for example, other procedures in Chap. 7).

Application Procedures

SPSS: … then press the ‘…’ button and choose mean, median and mode. To see trimmed means, you must use the Exploratory Data Analysis procedure; see Procedure 5.6.

NCSS: … then select the reports and plots that you want to see; make sure you indicate that you want to see the ‘Means Section’ of the Report. If you want to see trimmed means, tick the ‘Trimmed Section’ of the Report.

SYSTAT: … then select the mean, median and mode (as well as any other statistics you might wish to see). If you want to see trimmed means, tick the ‘Trimmed mean’ section of the dialog box and set the percentage to trim in the box labelled ‘Two-sided’.

STATGRAPHICS: … then choose the variable(s) you want to describe and select Summary Statistics (you don’t get any options for statistics to report – measures of central tendency and variability are automatically produced). STATGRAPHICS will not report modes, and you will need to request ‘Percentiles’ in order to see the 50th percentile score, which will be the median; however, it won’t be labelled as the median.

R Commander: … then select the central tendency statistics you want to see. R Commander will not produce modes; to see the median, make sure that the ‘Quantiles’ box is ticked – the .5 quantile (= 50th percentile) score is the median, although it won’t be labelled as the median.

Procedure 5.5: Assessing Variability

There are a variety of measures of variability to choose from including the range, interquartile range, variance and standard deviation. Each measure reflects a specific way of defining variability in a distribution of scores on a variable and each has its own advantages and disadvantages. Most measures of variability are associated with a specific measure of central tendency so that researchers are now commonly expected to report both a measure of central tendency and its associated measure of variability whenever they display numerical descriptive statistics on continuous or rank-ordered variables.

Range

This is the simplest measure of variability for a sample of data scores. The range is merely the largest score in the sample minus the smallest score in the sample. The range is the one measure of variability not explicitly associated with any measure of central tendency. It gives a very rough indication as to the extent of spread in the scores. However, since the range uses only two of the total available scores in the sample, the rest of the scores are ignored, which means that a lot of potentially useful information is being sacrificed. There are also problems if either the highest or lowest (or both) scores are atypical or too extreme in their value (as in highly skewed distributions). When this happens, the range gives a very inflated picture of the typical variability in the scores. Thus, the range tends not to be a frequently reported measure of variability.

Table 5.4 shows a set of descriptive statistics, produced by the SPSS Frequencies procedure, for the mentabil, accuracy, speed, jobsat and workcond measures in the QCI database. In the table, you will find three rows labelled ‘Range’, ‘Minimum’ and ‘Maximum’.

Table 5.4 Measures of central tendency and variability for specific QCI variables

Using the data from these three rows, we can draw the following descriptive picture. Mentabil scores spanned a range of 50 (from a minimum score of 85 to a maximum score of 135). Speed scores had a range of 16.05 s (from 1.05 s, the fastest quality decision, to 17.10 s, the slowest quality decision). Accuracy scores had a range of 43 (from 57% for the least accurate inspector to 100% for the most accurate inspector). Both work context measures (jobsat and workcond) exhibited a range of 6 – the largest possible range given the 1 to 7 scale of measurement for these two variables.

Interquartile Range

The Interquartile Range ( IQR ) is a measure of variability that is specifically designed to be used in conjunction with the median. The IQR also takes care of the extreme data problem which typically plagues the range measure. The IQR is defined as the range that is covered by the middle 50% of scores in a distribution once the scores have been ranked in order from lowest value to highest value. It is found by locating the value in the distribution at or below which 25% of the sample scored and subtracting this number from the value in the distribution at or below which 75% of the sample scored. The IQR can also be thought of as the range one would compute after the bottom 25% of scores and the top 25% of scores in the distribution have been ‘chopped off’ (or ‘trimmed’ as statisticians call it).

The IQR gives a much more stable picture of the variability of scores and, like the median, is relatively insensitive to the biasing effects of extreme data values. Some behavioural researchers prefer to divide the IQR in half which gives a measure called the Semi-Interquartile Range ( S-IQR ) . The S-IQR can be interpreted as the distance one must travel away from the median, in either direction, to reach the value which separates the top (or bottom) 25% of scores in the distribution from the remaining 75%.

The IQR or S-IQR is typically not produced by descriptive statistics procedures by default in many computer software packages; however, it can usually be requested as an optional statistic to report or it can easily be computed by hand using percentile scores. Both the median and the IQR figure prominently in Exploratory Data Analysis, particularly in the production of boxplots (see Procedure 5.6 ).
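A minimal sketch of that hand computation, using the 25th and 75th percentile speed scores reported in Table 5.4 (2.19 s and 5.71 s); the result matches the worked example further below:

```python
q25, q75 = 2.19, 5.71   # 25th and 75th percentile speed scores from Table 5.4
iqr = q75 - q25         # range spanned by the middle 50% of scores
s_iqr = iqr / 2         # semi-interquartile range

print(f"IQR = {iqr:.2f} s, S-IQR = {s_iqr:.2f} s")   # -> IQR = 3.52 s, S-IQR = 1.76 s

# With raw scores you would compute the percentiles first, e.g.:
#   import numpy as np
#   q25, q75 = np.percentile(speed_scores, [25, 75])
```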

Figure 5.20 illustrates the conceptual nature of the IQR and S-IQR compared to that of the range. Assume that 100% of data values are covered by the distribution curve in the figure. It is clear that these three measures would provide very different values for a measure of variability. Your choice would depend on your purpose. If you simply want to signal the overall span of scores between the minimum and maximum, the range is the measure of choice. But if you want to signal the variability around the median, the IQR or S-IQR would be the measure of choice.

Fig. 5.20 How the range, IQR and S-IQR measures of variability conceptually differ

Note: Some behavioural researchers refer to the IQR as the hinge-spread (or H-spread ) because of its use in the production of boxplots:

  • the 25th percentile data value is referred to as the ‘lower hinge’;
  • the 75th percentile data value is referred to as the ‘upper hinge’; and
  • their difference gives the H-spread.

Midspread is another term you may see used as a synonym for interquartile range.

Referring back to Table 5.4 , we can find statistics reported for the median and for the ‘quartiles’ (25th, 50th and 75th percentile scores) for each of the five variables of interest. The ‘quartile’ values are useful for finding the IQR or S-IQR because SPSS does not report these measures directly. The median clearly equals the 50th percentile data value in the table.

If we focus, for example, on the speed variable, we could find its IQR by subtracting the 25th percentile score of 2.19 s from the 75th percentile score of 5.71 s to give a value for the IQR of 3.52 s (the S-IQR would simply be 3.52 divided by 2 or 1.76 s). Thus, we could report that the median decision speed for inspectors was 3.89 s and that the middle 50% of inspectors showed scores spanning a range of 3.52 s. Alternatively, we could report that the median decision speed for inspectors was 3.89 s and that the middle 50% of inspectors showed scores which ranged 1.76 s either side of the median value.

Note: We could compare the ‘Minimum’ or ‘Maximum’ scores to the 25th percentile score and 75th percentile score respectively to get a feeling for whether the minimum or maximum might be considered extreme or uncharacteristic data values.

Variance

The variance uses information from every individual in the sample to assess the variability of scores relative to the sample mean. Variance assesses the average squared deviation of each score from the mean of the sample. Deviation refers to the difference between an observed score value and the mean of the sample—deviations are squared simply because adding them up in their naturally occurring unsquared form (where some differences are positive and others are negative) always gives a total of zero, which is useless for an index purporting to measure something.

If many scores are quite different from the mean, we would expect the variance to be large. If all the scores lie fairly close to the sample mean, we would expect a small variance. If all scores exactly equal the mean (i.e. all the scores in the sample have the same value), then we would expect the variance to be zero.
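A toy sketch of two points: raw deviations always cancel, and squaring fixes this. Most packages divide by n − 1 to obtain the sample variance, which NumPy exposes via ddof=1:

```python
import numpy as np

scores = np.array([4.0, 6.0, 7.0, 9.0, 14.0])  # toy data with mean 8.0
deviations = scores - scores.mean()

print(deviations.sum())                             # 0.0: raw deviations cancel out
print((deviations ** 2).sum() / (len(scores) - 1))  # 14.5: sample variance computed by hand
print(np.var(scores, ddof=1))                       # 14.5: the same value via NumPy
```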

Figure 5.21 illustrates some possibilities regarding the variance of a distribution of scores having a mean of 100. The very tall curve illustrates a distribution with small variance. The distribution of medium height illustrates a distribution with medium variance and the flattest distribution illustrates a distribution with large variance.

Fig. 5.21 The concept of variance

If we had a distribution with no variance, the curve would simply be a vertical line at a score of 100 (meaning that all scores were equal to the mean). You can see that as variance increases, the tails of the distribution extend further outward and the concentration of scores around the mean decreases. You may have noticed that variance and range (as well as the IQR) will be related, since the range focuses on the difference between the ends of the two tails in the distribution and larger variances extend the tails. So, a larger variance will generally be associated with a larger range and IQR compared to a smaller variance.

It is generally difficult to descriptively interpret the variance measure in a meaningful fashion since it involves squared deviations around the sample mean. [Note: If you look back at Table 5.4, you will see the variance listed for each of the variables (e.g. the variance of accuracy scores is 84.118), but the numbers themselves make little sense and do not relate to the original measurement scale for the variables (which, for the accuracy variable, went from 0% to 100% accuracy).] Instead, we use the variance as a steppingstone for obtaining a measure of variability that we can clearly interpret, namely the standard deviation. However, you should know that variance is an important concept in its own right simply because it provides the statistical foundation for many of the correlational procedures and statistical inference procedures described in Chaps. 6, 7 and 8.

When considering either correlations or tests of statistical hypotheses, we frequently speak of one variable explaining or sharing variance with another (see the relevant procedures in Chaps. 6 and 7). In doing so, we are invoking the concept of variance as set out here—what we are saying is that variability in the behaviour of scores on one particular variable may be associated with or predictive of variability in scores on another variable of interest (e.g. it could explain why those scores have a non-zero variance).

Standard Deviation

The standard deviation (often abbreviated as SD, sd or Std. Dev.) is the most commonly reported measure of variability because it has a meaningful interpretation and is used in conjunction with reports of sample means. Variance and standard deviation are closely related measures in that the standard deviation is found by taking the square root of the variance. The standard deviation, very simply, is a summary number that reflects the ‘average distance of each score from the mean of the sample’. In many parametric statistical methods, both the sample mean and sample standard deviation are employed in some form. Thus, the standard deviation is a very important measure, not only for data description, but also for hypothesis testing and the establishment of relationships as well.

Referring again to Table 5.4, we’ll focus on the results for the speed variable for discussion purposes. Table 5.4 shows that the mean inspection speed for the QCI sample was 4.48 s. We can also see that the standard deviation (in the row labelled ‘Std Deviation’) for speed was 2.89 s.

This standard deviation has a straightforward interpretation: we would say that ‘on the average, an inspector’s quality inspection decision speed differed from the mean of the sample by about 2.89 s in either direction’. In a normal distribution of scores (see Fundamental Concept II ), we would expect to see about 68% of all inspectors having decision speeds between 1.59 s (the mean minus one amount of the standard deviation) and 7.37 s (the mean plus one amount of the standard deviation).
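Those two boundary values are simple to reproduce; a minimal check using the mean and standard deviation from Table 5.4:

```python
mean_speed = 4.48  # mean decision speed (s) from Table 5.4
sd_speed = 2.89    # standard deviation (s) from Table 5.4

low, high = mean_speed - sd_speed, mean_speed + sd_speed
print(f"~68% of speeds expected between {low:.2f} s and {high:.2f} s")
# -> between 1.59 s and 7.37 s, assuming an approximately normal distribution
```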

We noted earlier that the range of the speed scores was 16.05 s. However, the fact that the maximum speed score was 17.1 s compared to the 75th percentile score of just 5.71 s seems to suggest that this maximum speed might be rather atypically large compared to the bulk of speed scores. This means that the range is likely to be giving us a false impression of the overall variability of the inspectors’ decision speeds.

Furthermore, given that the mean speed score was higher than the median speed score, suggesting that speed scores were positively skewed (this was confirmed by the histogram for speed shown in Fig. 5.19 in Procedure 5.4 ), we might consider emphasising the median and its associated IQR or S-IQR rather than the mean and standard deviation. Of course, similar diagnostic and interpretive work could be done for each of the other four variables in Table 5.4 .

Measures of variability (particularly the standard deviation) provide a summary measure that gives an indication of how variable (spread out) a particular sample of scores is. When used in conjunction with a relevant measure of central tendency (particularly the mean), a reasonable yet economical description of a set of data emerges. When there are extreme data values or severe skewness is present in the data, the IQR (or S-IQR) becomes the preferred measure of variability to be reported in conjunction with the sample median (or 50th percentile value). These latter measures are much more resistant (‘robust’) to influence by data anomalies than are the mean and standard deviation.

As mentioned above, the range is a very cursory index of variability, thus, it is not as useful as variance or standard deviation. Variance has little meaningful interpretation as a descriptive index; hence, standard deviation is most often reported. However, the standard deviation (or IQR) has little meaning if the sample mean (or median) is not reported along with it.

Knowing that the standard deviation for accuracy is 9.17 tells you little unless you know the mean accuracy (82.14) that it is the standard deviation from.

Like the sample mean, the standard deviation can be strongly biased by the presence of extreme data values or severe skewness in a distribution in which case the median and IQR (or S-IQR) become the preferred measures. The biasing effect will be most noticeable in samples which are small in size (say, less than 30 individuals) and far less noticeable in large samples (say, in excess of 200 or 300 individuals). [Note that, in a manner similar to a trimmed mean, it is possible to compute a trimmed standard deviation to reduce the biasing effect of extreme data values, see Field 2018 , p. 263.]

It is important to realise that the resistance of the median and IQR (or S-IQR) to extreme values is only gained by deliberately sacrificing a good deal of the information available in the sample (nothing is obtained without a cost in statistics). What is sacrificed is information from all other members of the sample other than those members who scored at the median and 25th and 75th percentile points on a variable of interest; information from all members of the sample would automatically be incorporated in mean and standard deviation for that variable.

Any investigation where you might report on or read about measures of central tendency on certain variables should also report measures of variability. This is particularly true for data from experiments, quasi-experiments, observational studies and questionnaires. It is important to consider measures of central tendency and measures of variability to be inextricably linked—one should never report one without the other if an adequate descriptive summary of a variable is to be communicated.

Other descriptive measures, such as those for skewness and kurtosis, may also be of interest if a more complete description of any variable is desired. Most good statistical packages can be instructed to report these additional descriptive measures as well.

Of all the statistics you are likely to encounter in the business, behavioural and social science research literature, means and standard deviations will dominate as measures for describing data. Additionally, these statistics will usually be reported when any parametric tests of statistical hypotheses are presented as the mean and standard deviation provide an appropriate basis for summarising and evaluating group differences.

Application Procedures

SPSS: … then press the ‘…’ button and choose Std. Deviation, Variance, Range, Minimum and/or Maximum as appropriate. SPSS does not have an option to produce either the IQR or S-IQR; however, if you request ‘Quantiles’ you will see the 25th and 75th percentile scores, which can then be used to quickly compute either variability measure. Remember to select appropriate central tendency measures as well.

NCSS: … then select the reports and plots that you want to see; make sure you indicate that you want to see the Variance Section of the Report. Remember to select appropriate central tendency measures as well (by opting to see the Means Section of the Report).

SYSTAT: … then select SD, Variance, Range, Interquartile range, Minimum and/or Maximum as appropriate. Remember to select appropriate central tendency measures as well.

STATGRAPHICS: … then choose the variable(s) you want to describe and select Summary Statistics (you don’t get any options for statistics to report – measures of central tendency and variability are automatically produced). STATGRAPHICS does not produce either the IQR or S-IQR; however, ‘Percentiles’ can be requested in order to see the 25th and 75th percentile scores, which can then be used to quickly compute either variability measure.

R Commander: … then select either the Standard Deviation or Interquartile Range as appropriate. R Commander will not produce the range statistic or report minimum or maximum scores. Remember to select appropriate central tendency measures as well.

Fundamental Concept I: Basic Concepts in Probability

The Concept of Simple Probability

In Procedures 5.1 and 5.2 , you encountered the idea of the frequency of occurrence of specific events such as particular scores within a sample distribution. Furthermore, it is a simple operation to convert the frequency of occurrence of a specific event into a number representing the relative frequency of that event. The relative frequency of an observed event is merely the number of times the event is observed divided by the total number of times one makes an observation. The resulting number ranges between 0 and 1 but we typically re-express this number as a percentage by multiplying it by 100%.

In the QCI database, Maree Lakota observed data from 112 quality control inspectors of which 58 were male and 51 were female (gender indications were missing for three inspectors). The statistics 58 and 51 are thus the frequencies of occurrence for two specific types of research participant, a male inspector or a female inspector.

If she divided each frequency by the total number of observations (i.e. 112), she would obtain .52 for males and .46 for females (leaving .02 of observations with unknown gender). These statistics are relative frequencies which indicate the proportion of times that Maree obtained data from a male or female inspector. Multiplying each relative frequency by 100% would yield 52% and 46% which she could interpret as indicating that 52% of her sample was male and 46% was female (leaving 2% of the sample with unknown gender).
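In code, this is just counting and dividing; a small pandas sketch (the string coding of the gender variable is an assumption for illustration):

```python
import pandas as pd

# The QCI gender counts: 58 males, 51 females, 3 missing (112 inspectors in total)
gender = pd.Series(["male"] * 58 + ["female"] * 51 + [None] * 3)

rel_freq = gender.value_counts(normalize=True, dropna=False)
print((rel_freq * 100).round(1))
# -> male 51.8%, female 45.5%, unknown 2.7% of the 112 observations
```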

It does not take much of a leap in logic to move from the concept of ‘relative frequency’ to the concept of ‘probability’. In our discussion above, we focused on relative frequency as indicating the proportion or percentage of times a specific category of participant was obtained in a sample. The emphasis here is on data from a sample.

Imagine now that Maree had infinite resources and research time and was able to obtain ever larger samples of quality control inspectors for her study. She could still compute the relative frequencies for obtaining data from males and females in her sample but as her sample size grew larger and larger, she would notice these relative frequencies converging toward some fixed values.

If, by some miracle, Maree could observe all of the quality control inspectors on the planet today, she would have measured the entire population and her computations of relative frequency for males and females would yield two precise numbers, each indicating the proportion of the population of inspectors that was male and the proportion that was female.

If Maree were then to list all of these inspectors and randomly choose one from the list, the chances that she would choose a male inspector would be equal to the proportion of the population of inspectors that was male and this logic extends to choosing a female inspector. The number used to quantify this notion of ‘chances’ is called a probability. Maree would therefore have established the probability of randomly observing a male or a female inspector in the population on any specific occasion.

Probability is expressed on a 0.0 (the observation or event will certainly not be seen) to 1.0 (the observation or event will certainly be seen) scale where values close to 0.0 indicate observations that are less certain to be seen and values close to 1.0 indicate observations that are more certain to be seen (a value of .5 indicates an even chance that an observation or event will or will not be seen – a state of maximum uncertainty). Statisticians often interpret a probability as the likelihood of observing an event or type of individual in the population.

In the QCI database, we noted that the relative frequency of observing males was .52 and for females was .46. If we take these relative frequencies as estimates of the proportions of each gender in the population of inspectors, then .52 and .46 represent the probability of observing a male or female inspector, respectively.

Statisticians would state this as “the probability of observing a male quality control inspector is .52” or, in a more commonly used shorthand code, the likelihood of observing a male quality control inspector is p = .52 (p for probability). For some, probabilities make more sense if they are converted to percentages (by multiplying by 100%). Thus, p = .52 can also be understood as a 52% chance of observing a male quality control inspector.

We have seen that relative frequency is a sample statistic that can be used to estimate the population probability. Our estimate will get more precise as we use larger and larger samples (technically, as the size of our samples more closely approximates the size of our population). In most behavioural research, we never have access to entire populations so we must always estimate our probabilities.

In some very special populations, having a known number of fixed possible outcomes, such as results of coin tosses or rolls of a die, we can analytically establish event probabilities without doing an infinite number of observations; all we must do is assume that we have a fair coin or die. Thus, with a fair coin, the probability of observing a H or a T on any single coin toss is ½ or .5 or 50%; the probability of observing a 6 on any single throw of a die is 1/6 or .16667 or 16.667%. With behavioural data, though, we can never measure all possible behavioural outcomes, which thereby forces researchers to depend on samples of observations in order to make estimates of population values.

The concept of probability is central to much of what is done in the statistical analysis of behavioural data. Whenever a behavioural scientist wishes to establish whether a particular relationship exists between variables or whether two groups, treated differently, actually show different behaviours, he/she is playing a probability game. Given a sample of observations, the behavioural scientist must decide whether what he/she has observed is providing sufficient information to conclude something about the population from which the sample was drawn.

This decision always has a non-zero probability of being in error simply because in samples that are much smaller than the population, there is always the chance or probability that we are observing something rare and atypical instead of something which is indicative of a consistent population trend. Thus, the concept of probability forms the cornerstone for statistical inference, about which we will have more to say later (see the relevant Fundamental Concept in Chap. 7). Probability also plays an important role in helping us to understand theoretical statistical distributions (e.g. the normal distribution) and what they can tell us about our observations. We will explore this idea further in Fundamental Concept II.

The Concept of Conditional Probability

It is important to understand that the concept of probability as described above focuses upon the likelihood or chances of observing a specific event or type of observation for a specific variable relative to a population or sample of observations. However, many important behavioural research issues may focus on the question of the probability of observing a specific event given that the researcher has knowledge that some other event has occurred or been observed (this latter event is usually measured by a second variable). Here, the focus is on the potential relationship or link between two variables or two events.

With respect to the QCI database, Maree could ask the quite reasonable question “what is the probability (estimated in the QCI sample by a relative frequency) of observing a female inspector given that she knows that an inspector works for a Large Business Computer manufacturer?”

To address this question, all she needs to know is:

  • how many inspectors from Large Business Computer manufacturers are in the sample ( 22 ); and
  • how many of those inspectors were female ( 7 ) (inspectors who were missing a score for either company or gender have been ignored here).

If she divides 7 by 22, she would obtain the probability that an inspector is female given that they work for a Large Business Computer manufacturer – that is, p = .32 .

This type of question points to the important concept of conditional probability (‘conditional’ because we are asking “what is the probability of observing one event conditional upon our knowledge of some other event”).

Continuing with the previous example, Maree would say that the conditional probability of observing a female inspector working for a Large Business Computer manufacturer is .32 or, equivalently, a 32% chance. Compare this conditional probability of p  = .32 to the overall probability of observing a female inspector in the entire sample ( p  = .46 as shown above).

This means that there is evidence for a connection or relationship between gender and the type of company an inspector works for. That is, the chances are lower for observing a female inspector from a Large Business Computer manufacturer than they are for simply observing a female inspector at all.

Maree therefore has evidence suggesting that females may be relatively under-represented in Large Business Computer manufacturing companies compared to the overall population. Knowing something about the company an inspector works for therefore can help us make a better prediction about their likely gender.

Suppose, however, that Maree’s conditional probability had been exactly equal to p = .46. This would mean that there was exactly the same chance of observing a female inspector working for a Large Business Computer manufacturer as there was of observing a female inspector in the general population. Here, knowing something about the company an inspector works for doesn’t help Maree make any better prediction about their likely gender. This would mean that the two variables are statistically independent of each other.

A classic case of events that are statistically independent is two successive throws of a fair die: rolling a six on the first throw gives us no information for predicting how likely it will be that we would roll a six on the second throw. The conditional probability of observing a six on the second throw given that I have observed a six on the first throw is .16667 (= 1 divided by 6), which is the same as the simple probability of observing a six on any specific throw. This statistical independence also means that if we wanted to know the probability of throwing two sixes on two successive throws of a fair die, we would just multiply the probabilities for each independent event (i.e. throw) together; that is, .16667 × .16667 = .02778 (this is known as the multiplication rule of probability; see, for example, Smithson 2000, p. 114).

Finally, you should know that conditional probabilities are often asymmetric. This means that for many types of behavioural variables, reversing the conditional arrangement will change the story about the relationship. Bayesian statistics (see the relevant Fundamental Concept in Chap. 7) relies heavily upon this asymmetric relationship between conditional probabilities.

Maree has already learned that the conditional probability that an inspector is female given that they worked for a Large Business Computer manufacturer is p = .32. She could easily turn the conditional relationship around and ask what is the conditional probability that an inspector works for a Large Business Computer manufacturer given that the inspector is female?

From the QCI database, she can find that 51 inspectors in her total sample were female and of those 51, 7 worked for a Large Business Computer manufacturer. If she divided 7 by 51, she would get p = .14 (did you notice that all that changed was the number she divided by?). Thus, there is only a 14% chance of observing an inspector working for a Large Business Computer manufacturer given that the inspector is female – a rather different probability from p = .32, which tells a different story.
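Both directions of the conditional relationship can be sketched directly from the counts quoted above (22 Large Business Computer inspectors, 51 female inspectors, 7 inspectors in both categories):

```python
n_largebus = 22   # inspectors working for Large Business Computer manufacturers
n_female = 51     # female inspectors in the sample
n_both = 7        # female inspectors at Large Business Computer manufacturers

p_female_given_largebus = n_both / n_largebus  # = .32
p_largebus_given_female = n_both / n_female    # = .14

print(f"P(female | Large Business Computer) = {p_female_given_largebus:.2f}")
print(f"P(Large Business Computer | female) = {p_largebus_given_female:.2f}")
# The same joint count (7) sits in both numerators; only the denominator changes,
# which is why conditional probabilities are generally asymmetric.
```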

As you will see in the relevant procedures in Chaps. 6 and 7, conditional relationships between categorical variables are precisely what crosstabulation contingency tables are designed to reveal.

Procedure 5.6: Exploratory Data Analysis

There are a variety of visual display methods for EDA, including stem & leaf displays, boxplots and violin plots. Each method reflects a specific way of displaying features of a distribution of scores or measurements and, of course, each has its own advantages and disadvantages. In addition, EDA displays are surprisingly flexible and can combine features in various ways to enhance the story conveyed by the plot.

Stem & Leaf Displays

The stem & leaf display is a simple data summary technique which not only rank orders the data points in a sample but presents them visually so that the shape of the data distribution is reflected. Stem & leaf displays are formed from data scores by splitting each score into two parts: the first part of each score serving as the ‘stem’, the second part as the ‘leaf’ (e.g. for 2-digit data values, the ‘stem’ is the number in the tens position; the ‘leaf’ is the number in the ones position). Each stem is then listed vertically, in ascending order, followed horizontally by all the leaves in ascending order associated with it. The resulting display thus shows all of the scores in the sample, but reorganised so that a rough idea of the shape of the distribution emerges. As well, extreme scores can be easily identified in a stem & leaf display.
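A stem & leaf display is simple enough to construct yourself; here is a minimal sketch for 2-digit integer scores, splitting on the tens digit and without the half-stem refinement discussed below:

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Print a basic stem & leaf display for 2-digit integer scores."""
    stems = defaultdict(list)
    for score in sorted(scores):                # rank order the data first
        stems[score // 10].append(score % 10)   # tens digit = stem, ones digit = leaf
    for stem in sorted(stems):
        print(f"{stem} | {''.join(str(leaf) for leaf in stems[stem])}")

stem_and_leaf([57, 62, 68, 71, 73, 74, 78, 81, 83, 85, 85, 88, 92, 94])
# 5 | 7
# 6 | 28
# 7 | 1348
# 8 | 13558
# 9 | 24
```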

Consider the accuracy and speed scores for the 112 quality control inspectors in the QCI sample. Figure 5.22 (produced by the R Commander Stem-and-leaf display … procedure) shows the stem & leaf displays for inspection accuracy (left display) and speed (right display) data.

Fig. 5.22 Stem & leaf displays produced by R Commander

[The first six lines reflect information from R Commander about each display: lines 1 and 2 show the actual R command used to produce the plot (the variable name has been highlighted in bold); line 3 gives a warning indicating that inspectors with missing values (= NA in R) on the variable have been omitted from the display; line 4 shows how the stems and leaves have been defined; line 5 indicates what a leaf unit represents in value; and line 6 indicates the total number (n) of inspectors included in the display.] In Fig. 5.22, for the accuracy display on the left-hand side, the ‘stems’ have been split into ‘half-stems’—one (which is starred) associated with the ‘leaves’ 0 through 4 and the other associated with the ‘leaves’ 5 through 9—a strategy that gives the display better balance and visual appeal.

Notice how the left stem & leaf display conveys a fairly clear (yet sideways) picture of the shape of the distribution of accuracy scores. It has a rather symmetrical bell-shape to it with only a slight suggestion of negative skewness (toward the extreme score at the top). The right stem & leaf display clearly depicts the highly positively skewed nature of the distribution of speed scores. Importantly, we could reconstruct the entire sample of scores for each variable using its display, which means that unlike most other graphical procedures, we didn’t have to sacrifice any information to produce the visual summary.

Some programs, such as SYSTAT, embellish their stem & leaf displays by indicating in which stem or half-stem the ‘median’ (50th percentile), the ‘upper hinge score’ (75th percentile), and ‘lower hinge score’ (25th percentile) occur in the distribution (recall the discussion of interquartile range in Procedure 5.5 ). This is shown in Fig. 5.23 , produced by SYSTAT, where M and H indicate the stem locations for the median and hinge points, respectively. This stem & leaf display labels a single extreme accuracy score as an ‘outside value’ and clearly shows that this actual score was 57.

Fig. 5.23 Stem & leaf display, produced by SYSTAT, of the accuracy QCI variable

Boxplots

Another important EDA technique is the boxplot or, as it is sometimes known, the box-and-whisker plot. This plot provides a symbolic representation that preserves less of the original nature of the data (compared to a stem & leaf display) but typically gives a better picture of the distributional characteristics. The basic boxplot, shown in Fig. 5.24, utilises information about the median (50th percentile score) and the upper (75th percentile score) and lower (25th percentile score) hinge points in the construction of the ‘box’ portion of the graph (the ‘median’ defines the centre line in the box; the ‘upper’ and ‘lower hinge values’ define the end boundaries of the box—thus the box encompasses the middle 50% of data values).

Fig. 5.24 Boxplots for the accuracy and speed QCI variables

Additionally, the boxplot utilises the IQR (recall Procedure 5.5 ) as a way of defining what are called ‘fences’ which are used to indicate score boundaries beyond which we would consider a score in a distribution to be an ‘outlier’ (or an extreme or unusual value). In SPSS, the inner fence is typically defined as 1.5 times the IQR in each direction and a ‘far’ outlier or extreme case is typically defined as 3 times the IQR in either direction (Field 2018 , p. 193). The ‘whiskers’ in a boxplot extend out to the data values which are closest to the upper and lower inner fences (in most cases, the vast majority of data values will be contained within the fences). Outliers beyond these ‘whiskers’ are then individually listed. ‘Near’ outliers are those lying just beyond the inner fences and ‘far’ outliers lie well beyond the inner fences.
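
The fence logic is easy to reproduce outside a statistics package. A minimal base R sketch with made-up scores (the 1.5 × IQR and 3 × IQR multipliers follow the SPSS conventions noted above; quantile() only approximates the hinge values):

```r
x <- c(2.1, 2.8, 3.0, 3.4, 3.9, 4.2, 4.5, 5.1, 6.0, 7.2, 11.9)  # made-up scores
q <- quantile(x, c(0.25, 0.75))                 # approximate lower and upper hinges
iqr <- unname(q[2] - q[1])                      # interquartile range
inner <- c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr)  # inner fences ('near' outlier bounds)
outer <- c(q[1] - 3.0 * iqr, q[2] + 3.0 * iqr)  # outer fences ('far' outlier bounds)
x[x < inner[1] | x > inner[2]]                  # scores flagged as outliers (11.9 here)
```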

Figure 5.24 shows two simple boxplots (produced using SPSS), one for the accuracy QCI variable and one for the speed QCI variable. The accuracy plot shows a median value of about 83, roughly 50% of the data fall between about 77 and 89 and there is one outlier, inspector 83, in the lower ‘tail’ of the distribution. The accuracy boxplot illustrates data that are relatively symmetrically distributed without substantial skewness. Such data will tend to have their median in the middle of the box, whiskers of roughly equal length extending out from the box and few or no outliers.

The speed plot shows a median value of about 4 s, roughly 50% of the data fall between 2 s and 6 s and there are four outliers, inspectors 7, 62, 65 and 75 (although inspectors 65 and 75 fall at the same place and are rather difficult to read), all falling in the slow speed ‘tail’ of the distribution. Inspectors 65, 75 and 7 are shown as ‘near’ outliers (open circles) whereas inspector 62 is shown as a ‘far’ outlier (asterisk). The speed boxplot illustrates data which are asymmetrically distributed because of skewness in one direction. Such data may have their median offset from the middle of the box and/or whiskers of unequal length extending out from the box and outliers in the direction of the longer whisker. In the speed boxplot, the data are clearly positively skewed (the longer whisker and extreme values are in the slow speed ‘tail’).

Boxplots are very versatile representations in that side-by-side displays for sub-groups of data within a sample can permit easy visual comparisons of groups with respect to central tendency and variability. Boxplots can also be modified to incorporate information about error bands associated with the median producing what is called a ‘notched boxplot’. This helps in the visual detection of meaningful subgroup differences, where boxplot ‘notches’ don’t overlap.
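
As a sketch of how such a side-by-side comparison might be produced in base R (the data frame and column names here are hypothetical, simulated to mimic positively skewed speed scores):

```r
set.seed(1)
qci_sim <- data.frame(                    # hypothetical stand-in for the QCI data
  company = rep(c("PC", "Large Electrical", "Automotive"), each = 30),
  speed   = rlnorm(90, meanlog = 1.2, sdlog = 0.5)   # positively skewed speeds
)
# Side-by-side notched boxplots; non-overlapping notches suggest median differences
boxplot(speed ~ company, data = qci_sim, notch = TRUE, ylab = "speed (s)")
```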

Figure 5.25 (produced using NCSS) compares the distributions of accuracy and speed scores for QCI inspectors from the five types of companies, plotted side-by-side.

Fig. 5.25 Comparisons of the accuracy (regular boxplots) and speed (notched boxplots) QCI variables for different types of companies

Focus first on the left graph in Fig. 5.25 which plots the distribution of accuracy scores broken down by company using regular boxplots. This plot clearly shows the differing degree of skewness in each type of company (indicated by one or more outliers in one ‘tail’, whiskers which are not the same length and/or the median line being offset from the centre of a box), the differing variability of scores within each type of company (indicated by the overall length of each plot—box and whiskers), and the differing central tendency in each type of company (the median lines do not all fall at the same level of accuracy score). From the left graph in Fig. 5.25 , we could conclude that: inspection accuracy scores are most variable in PC and Large Electrical Appliance manufacturing companies and least variable in the Large Business Computer manufacturing companies; Large Business Computer and PC manufacturing companies have the highest median level of inspection accuracy; and inspection accuracy scores tend to be negatively skewed (many inspectors toward higher levels, relatively fewer who are poorer in inspection performance) in the Automotive manufacturing companies. One inspector, working for an Automotive manufacturing company, shows extremely poor inspection accuracy performance.

The right display compares types of companies in terms of their inspection speed scores, using ‘notched’ boxplots. The notches define upper and lower error limits around each median. Aside from the very obvious positive skewness for speed scores (with a number of slow speed outliers) in every type of company (least so for Large Electrical Appliance manufacturing companies), the story conveyed by this comparison is that inspectors from Large Electrical Appliance and Automotive manufacturing companies have substantially faster median decision speeds compared to inspectors from Large Business Computer and PC manufacturing companies (i.e. their ‘notches’ do not overlap, in terms of speed scores, on the display).

Boxplots can also add interpretive value to other graphical display methods through the creation of hybrid displays. Such displays might combine a standard histogram with a boxplot along the X-axis to provide an enhanced picture of the data distribution as illustrated for the mentabil variable in Fig. 5.26 (produced using NCSS). This hybrid plot also employs a data ‘smoothing’ method called a density trace to outline an approximate overall shape for the data distribution. Any one graphical method would tell some of the story, but combined in the hybrid display, the story of a relatively symmetrical set of mentabil scores becomes quite visually compelling.
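
Base R has no single hybrid-plot command, but a rough analogue can be assembled by stacking the pieces; a sketch with simulated, roughly symmetric scores standing in for mentabil:

```r
set.seed(2)
mentabil_sim <- rnorm(112, mean = 100, sd = 10)  # simulated, roughly symmetric scores
op <- par(mfrow = c(2, 1), mar = c(2, 4, 1, 1))  # stack two panels
hist(mentabil_sim, freq = FALSE, main = "", ylab = "Density")
lines(density(mentabil_sim))                     # density trace over the histogram
boxplot(mentabil_sim, horizontal = TRUE)         # boxplot along the same axis
par(op)
```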

Fig. 5.26 A hybrid histogram-density-boxplot of the mentabil QCI variable

Violin Plots

Violin plots are a more recent and interesting EDA innovation, implemented in the NCSS software package (Hintze 2012 ). The violin plot gets its name from the rough shape that the plots tend to take on. Violin plots are another type of hybrid plot, this time combining density traces (mirror-imaged right and left so that the plots have a sense of symmetry and visual balance) with boxplot-type information (median, IQR and upper and lower inner ‘fences’, but not outliers). The goal of the violin plot is to provide a quick visual impression of the shape, central tendency and variability of a distribution (the length of the violin conveys a sense of the overall variability whereas the width of the violin conveys a sense of the frequency of scores occurring in a specific region).
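
Violin plots are not confined to NCSS; for instance, recent versions of the ggplot2 package offer a comparable display through geom_violin(). A sketch with simulated data (the trimming and bandwidth defaults will differ from NCSS):

```r
library(ggplot2)
set.seed(3)
d <- data.frame(
  company = rep(c("A", "B", "C"), each = 40),
  speed   = rlnorm(120, meanlog = 1.2, sdlog = 0.6)  # positively skewed speeds
)
ggplot(d, aes(company, speed)) +
  geom_violin(trim = FALSE) +                          # mirrored density traces
  stat_summary(fun = median, geom = "point") +         # mark each median
  stat_summary(fun = mean, geom = "point", shape = 4)  # mark each mean with an 'x'
```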

Figure 5.27 (produced using NCSS) compares the distributions of speed scores for QCI inspectors across the five types of companies, plotted side-by-side. The violin plot conveys a similar story to the boxplot comparison for speed in the right graph of Fig. 5.25. However, notice that with the violin plot, unlike with a boxplot, you also get a sense of distributions that have ‘clumps’ of scores in specific areas. Some violin plots, like that for Automobile manufacturing companies in Fig. 5.27, have a shape suggesting a multi-modal distribution (recall Procedure 5.4 and the discussion of the fact that a distribution may have multiple modes). The violin plot in Fig. 5.27 has also been produced to show where the median (solid line) and mean (dashed line) would fall within each violin. This facilitates two interpretations: (1) a relative comparison of central tendency across the five companies and (2) the relative degree of skewness in the distribution for each company (indicated by the separation of the two lines within a violin; skewness is particularly bad for the Large Business Computer manufacturing companies).

Fig. 5.27 Violin plot comparisons of the speed QCI variable for different types of companies

EDA methods (of which we have illustrated only a small subset; we have not reviewed dot density diagrams, for example) provide summary techniques for visually displaying certain characteristics of a set of data. The advantage of the EDA methods over more traditional graphing techniques such as those described in Procedure 5.2 is that as much of the original integrity of the data is maintained as possible while maximising the amount of summary information available about distributional characteristics.

Stem & leaf displays maintain the data in as close to their original form as possible whereas boxplots and violin plots provide more symbolic and flexible representations. EDA methods are best thought of as communication devices designed to facilitate quick visual impressions and they can add interest to any statistical story being conveyed about a sample of data. NCSS, SYSTAT, STATGRAPHICS and R Commander generally offer more options and flexibility in the generation of EDA displays than SPSS.

EDA methods tend to get cumbersome if a great many variables or groups need to be summarised. In such cases, using numerical summary statistics (such as means and standard deviations) will provide a more economical and efficient summary. Boxplots or violin plots are generally more space efficient summary techniques than stem & leaf displays.

Often, EDA techniques are used as data screening devices, which are typically not reported in actual write-ups of research (we will discuss data screening in more detail in Chap. 8). This is a perfectly legitimate use for the methods although there is an argument for researchers to put these techniques to greater use in published literature.

Software packages may use different rules for constructing EDA plots which means that you might get rather different looking plots and different information from different programs (you saw some evidence of this in Figs. 5.22 and 5.23 ). It is important to understand what the programs are using as decision rules for locating fences and outliers so that you are clear on how best to interpret the resulting plot—such information is generally contained in the user’s guides or manuals for NCSS (Hintze 2012 ), SYSTAT (SYSTAT Inc. 2009a , b ), STATGRAPHICS (StatPoint Technologies Inc. 2010 ) and SPSS (Norušis 2012 ).

Virtually any research design which produces numerical measures (even to the extent of just counting the number of occurrences of several events) provides opportunities for employing EDA displays which may help to clarify data characteristics or relationships. One extremely important use of EDA methods is as data screening devices for detecting outliers and other data anomalies, such as non-normality and skewness, before proceeding to parametric statistical analyses. In some cases, EDA methods can help the researcher to decide whether parametric or nonparametric statistical tests would be best to apply to his or her data because critical data characteristics such as distributional shape and spread are directly reflected.

Application Procedures
SPSS

The Explore… procedure produces stem-and-leaf displays and boxplots by default; variables may be explored on a whole-of-sample basis or broken down by the categories of a specific variable (called a ‘factor’ in the procedure). Cases can also be labelled with a variable (as in the QCI database), so that outlier points in the boxplot are identifiable.

SPSS graphing procedures can also be used to custom-build different types of boxplots.

NCSS

Produces a stem-and-leaf display by default.

Can be used to produce boxplots with different features (such as ‘notches’ and connecting lines).

Can be configured to produce violin plots (by selecting the plot shape as ‘density with reflection’).

SYSTAT

Can be used to produce stem-and-leaf displays for variables; however, you cannot really control any features of these displays.

Can be used to produce boxplots of many types, with a number of features being controllable.

STATGRAPHICS

Allows you to do a complete exploration of a single variable, including a stem-and-leaf display (you need to select this option) and a boxplot (produced by default). Some features of the boxplot can be controlled, but not features of the stem-and-leaf diagram.

Related menu procedures can produce not only descriptive statistics but also boxplots with some controllable features.

R Commander

The dialog box for each procedure offers some control over the features of the display or plot; whole-of-sample boxplots or boxplots by groups are possible.

Procedure 5.7: Standard ( z ) Scores

In certain practical situations in behavioural research, it may be desirable to know where a specific individual’s score lies relative to all other scores in a distribution. A convenient measure is to observe how many standard deviations (see Procedure 5.5 ) above or below the sample mean a specific score lies. This measure is called a standard score or z -score . Very simply, any raw score can be converted to a z -score by subtracting the sample mean from the raw score and dividing that result by the sample’s standard deviation. z -scores can be positive or negative and their sign simply indicates whether the score lies above (+) or below (−) the mean in value. A z -score has a very simple interpretation: it measures the number of standard deviations above or below the sample mean a specific raw score lies.
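
Written compactly, for a raw score $x$ in a sample with mean $\bar{x}$ and standard deviation $s$:

$$z = \frac{x - \bar{x}}{s}$$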

In the QCI database, we have a sample mean for speed scores of 4.48 s, a standard deviation for speed scores of 2.89 s (recall Table 5.4 in Procedure 5.5 ). If we are interested in the z -score for Inspector 65’s raw speed score of 11.94 s, we would obtain a z -score of +2.58 using the method described above (subtract 4.48 from 11.94 and divide the result by 2.89). The interpretation of this number is that a raw decision speed score of 11.94 s lies about 2.6 standard deviations above the mean decision speed for the sample.

z -scores have some interesting properties. First, if one converts (statisticians would say ‘transforms’) every available raw score in a sample to z -scores, the mean of these z -scores will always be zero and the standard deviation of these z -scores will always be 1.0. These two facts about z -scores (mean = 0; standard deviation = 1) will be true no matter what sample you are dealing with and no matter what the original units of measurement are (e.g. seconds, percentages, number of widgets assembled, amount of preference for a product, attitude rating, amount of money spent). This is because transforming raw scores to z -scores automatically changes the measurement units from whatever they originally were to a new system of measurements expressed in standard deviation units.
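
These two properties are easy to verify for yourself; a minimal base R sketch with made-up scores:

```r
speed <- c(1.5, 3.3, 3.8, 7.1, 3.1, 11.9, 2.3, 4.7)  # made-up raw scores
z <- (speed - mean(speed)) / sd(speed)               # same as as.numeric(scale(speed))
round(mean(z), 12)  # 0: transformed scores always have mean zero
sd(z)               # 1: and always have standard deviation one
```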

Suppose Maree was interested in the performance statistics for the top 25% most accurate quality control inspectors in the sample. Given a sample size of 112, this would mean finding the top 28 inspectors in terms of their accuracy scores. Since Maree is interested in performance statistics, speed scores would also be of interest. Table 5.5 (generated using the SPSS Descriptives … procedure, listed using the Case Summaries … procedure and formatted for presentation using Excel) shows accuracy and speed scores for the top 28 inspectors in descending order of accuracy scores. The z -score transformation for each of these scores is also shown (last two columns) as are the type of company, education level and gender for each inspector.

Table 5.5 Listing of the 28 (top 25%) most accurate QCI inspectors’ accuracy and speed scores as well as standard (z) score transformations for each score

| Case | Inspector | company | educlev | gender | accuracy | speed | Zaccuracy | Zspeed |
|------|-----------|---------|---------|--------|----------|-------|-----------|--------|
| 1 | 8 | PC Manufacturer | High School Only | Male | 100 | 1.52 | 1.95 | −1.03 |
| 2 | 9 | PC Manufacturer | High School Only | Female | 100 | 3.32 | 1.95 | −0.40 |
| 3 | 14 | PC Manufacturer | High School Only | Male | 100 | 3.83 | 1.95 | −0.23 |
| 4 | 17 | PC Manufacturer | High School Only | Female | 99 | 7.07 | 1.84 | 0.90 |
| 5 | 101 | PC Manufacturer | High School Only | | 98 | 3.11 | 1.73 | −0.47 |
| 6 | 19 | PC Manufacturer | Tertiary Qualified | Female | 94 | 3.84 | 1.29 | −0.22 |
| 7 | 34 | Large Electrical Appliance Manufacturer | Tertiary Qualified | Male | 94 | 1.90 | 1.29 | −0.89 |
| 8 | 65 | Large Business Computer Manufacturer | High School Only | Male | 94 | 11.94 | 1.29 | 2.58 |
| 9 | 67 | Large Business Computer Manufacturer | High School Only | Male | 94 | 2.34 | 1.29 | −0.74 |
| 10 | 80 | Large Business Computer Manufacturer | High School Only | Female | 94 | 4.68 | 1.29 | 0.07 |
| 11 | 5 | PC Manufacturer | Tertiary Qualified | Male | 93 | 4.18 | 1.18 | −0.10 |
| 12 | 18 | PC Manufacturer | Tertiary Qualified | Male | 93 | 7.32 | 1.18 | 0.98 |
| 13 | 46 | Small Electrical Appliance Manufacturer | Tertiary Qualified | Female | 93 | 2.01 | 1.18 | −0.86 |
| 14 | 64 | Large Business Computer Manufacturer | High School Only | Female | 92 | 5.18 | 1.08 | 0.24 |
| 15 | 77 | Large Business Computer Manufacturer | Tertiary Qualified | Female | 92 | 6.11 | 1.08 | 0.56 |
| 16 | 79 | Large Business Computer Manufacturer | High School Only | Male | 92 | 4.38 | 1.08 | −0.03 |
| 17 | 106 | Large Electrical Appliance Manufacturer | Tertiary Qualified | Male | 92 | 1.70 | 1.08 | −0.96 |
| 18 | 58 | Small Electrical Appliance Manufacturer | High School Only | Male | 91 | 4.12 | 0.97 | −0.12 |
| 19 | 63 | Large Business Computer Manufacturer | High School Only | Male | 91 | 4.73 | 0.97 | 0.09 |
| 20 | 72 | Large Business Computer Manufacturer | Tertiary Qualified | Male | 91 | 4.72 | 0.97 | 0.08 |
| 21 | 20 | PC Manufacturer | High School Only | Male | 90 | 4.53 | 0.86 | 0.02 |
| 22 | 69 | Large Business Computer Manufacturer | High School Only | Male | 90 | 4.94 | 0.86 | 0.16 |
| 23 | 71 | Large Business Computer Manufacturer | High School Only | Female | 90 | 10.46 | 0.86 | 2.07 |
| 24 | 85 | Automobile Manufacturer | Tertiary Qualified | Female | 90 | 3.14 | 0.86 | −0.46 |
| 25 | 111 | Large Business Computer Manufacturer | High School Only | Male | 90 | 4.11 | 0.86 | −0.13 |
| 26 | 6 | PC Manufacturer | High School Only | Male | 89 | 5.46 | 0.75 | 0.34 |
| 27 | 61 | Large Business Computer Manufacturer | Tertiary Qualified | Male | 89 | 5.71 | 0.75 | 0.43 |
| 28 | 75 | Large Business Computer Manufacturer | High School Only | Male | 89 | 12.05 | 0.75 | 2.62 |

There are three inspectors (8, 9 and 14) who scored maximum accuracy of 100%. Such accuracy converts to a z -score of +1.95. Thus 100% accuracy is 1.95 standard deviations above the sample’s mean accuracy level. Interestingly, all three inspectors worked for PC manufacturers and all three had only high school-level education. The least accurate inspector in the top 25% had a z -score for accuracy that was .75 standard deviations above the sample mean.

Interestingly, the top three inspectors in terms of accuracy had decision speeds that fell below the sample’s mean speed; inspector 8 was the fastest inspector of the three with a speed just over 1 standard deviation ( z  = −1.03) below the sample mean. The slowest inspector in the top 25% was inspector 75 (case #28 in the list) with a speed z -score of +2.62; i.e., he was over two and a half standard deviations slower in making inspection decisions relative to the sample’s mean speed.

The fact that z -scores always have a common measurement scale having a mean of 0 and a standard deviation of 1.0 leads to an interesting application of standard scores. Suppose we focus on inspector number 65 (case #8 in the list) in Table 5.5 . It might be of interest to compare this inspector’s quality control performance in terms of both his decision accuracy and decision speed. Such a comparison is impossible using raw scores since the inspector’s accuracy score and speed scores are different measures which have differing means and standard deviations expressed in fundamentally different units of measurement (percentages and seconds). However, if we are willing to assume that the score distributions for both variables are approximately the same shape and that both accuracy and speed are measured with about the same level of reliability or consistency (see Chap. 8), we can compare the inspector’s two scores by first converting them to z -scores within their own respective distributions as shown in Table 5.5 .

Inspector 65 looks rather anomalous in that he demonstrated a relatively high level of accuracy (raw score = 94%; z  = +1.29) but took a very long time to make those accurate decisions (raw score = 11.94 s; z  = +2.58). Contrast this with inspector 106 (case #17 in the list) who demonstrated a similar level of accuracy (raw score = 92%; z  = +1.08) but took a much shorter time to make those accurate decisions (raw score = 1.70 s; z  = −.96). In terms of evaluating performance, from a company perspective, we might conclude that inspector 106 is performing at an overall higher level than inspector 65 because he can achieve a very high level of accuracy but much more quickly; accurate and fast is more cost effective and efficient than accurate and slow.

Note: We should be cautious here since we know from our previous explorations in Procedure 5.6 that accuracy scores look fairly symmetrical and speed scores are positively skewed, so the assumption that the two variables share the same distribution shape (which z-score comparisons require) would be problematic.

You might have noticed that as you scanned down the two columns of z -scores in Table 5.5 , there was a suggestion of a pattern between the signs attached to the respective z -scores for each person. There seems to be a very slight preponderance of pairs of z -scores where the signs are reversed (12 out of 22 pairs). This observation provides some very preliminary evidence to suggest that there may be a relationship between inspection accuracy and decision speed, namely that a more accurate decision tends to be associated with a faster decision speed. Of course, this pattern would be better verified using the entire sample rather than the top 25% of inspectors. However, you may find it interesting to learn that it is precisely this sort of suggestive evidence (about agreement or disagreement between z -score signs for pairs of variable scores throughout a sample) that is captured and summarised by a single statistical indicator called a ‘correlation coefficient’ (see the relevant Fundamental Concept and Procedure in Chap. 6).

z -scores are not the only type of standard score that is commonly used. Three other types of standard scores are: stanines (standard nines), IQ scores and T-scores (not to be confused with the t -test described in Chap. 7). These other types of scores have the advantage of producing only positive integer scores rather than positive and negative decimal scores. This makes interpretation somewhat easier for certain applications. However, you should know that almost all other types of standard scores come from a specific transformation of z -scores. This is because once you have converted raw scores into z -scores, they can then be quite readily transformed into any other system of measurement by simply multiplying a person’s z -score by the new desired standard deviation for the measure and adding to that product the new desired mean for the measure.

T-scores are simply z-scores transformed to have a mean of 50.0 and a standard deviation of 10.0; IQ scores are simply z-scores transformed to have a mean of 100 and a standard deviation of 15 (or 16 in some systems). For more information, see Fundamental Concept II .
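
In code, these rescalings are one line each; a short sketch (the z values are arbitrary):

```r
z <- c(-1.03, 0.86, 1.95)  # arbitrary z-scores
T_score <- 50 + 10 * z     # T-scores: mean 50, standard deviation 10
IQ_like <- 100 + 15 * z    # IQ-style scores: mean 100, standard deviation 15
```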

Standard scores are useful for representing the position of each raw score within a sample distribution relative to the mean of that distribution. The unit of measurement becomes the number of standard deviations a specific score is away from the sample mean. As such, z -scores can permit cautious comparisons across samples or across different variables having vastly differing means and standard deviations within the constraints of the comparison samples having similarly shaped distributions and roughly equivalent levels of measurement reliability. z -scores also form the basis for establishing the degree of correlation between two variables. Transforming raw scores into z -scores does not change the shape of a distribution or rank ordering of individuals within that distribution. For this reason, a z -score is referred to as a linear transformation of a raw score. Interestingly, z -scores provide an important foundational element for more complex analytical procedures such as factor analysis, cluster analysis and multiple regression analysis (see the relevant procedures in Chaps. 6 and 7).

While standard scores are useful indices, they are subject to restrictions if used to compare scores across samples or across different variables. The samples must have similar distribution shapes for the comparisons to be meaningful and the measures must have similar levels of reliability in each sample. The groups used to generate the z -scores should also be similar in composition (with respect to age, gender distribution, and so on). Because z -scores are not an intuitively meaningful way of presenting scores to lay-persons, many other types of standard score schemes have been devised to improve interpretability. However, most of these schemes produce scores that run a greater risk of facilitating lay-person misinterpretations simply because their connection with z -scores is hidden or because the resulting numbers ‘look’ like a more familiar type of score which people do intuitively understand.

It is extremely rare for a T-score to exceed 100 or go below 0 because this would mean that the raw score was in excess of 5 standard deviations away from the sample mean. This unfortunately means that T-scores are often misinterpreted as percentages because they typically range between 0 and 100 and therefore ‘look’ like percentages. However, T-scores are definitely not percentages.

Finally, a common misunderstanding of z -scores is that transforming raw scores into z -scores makes them follow a normal distribution (see Fundamental Concept II ). This is not the case. The distribution of z -scores will have exactly the same shape as that for the raw scores; if the raw scores are positively skewed, then the corresponding z -scores will also be positively skewed.

z -scores are particularly useful in evaluative studies where relative performance indices are of interest. Whenever you compute a correlation coefficient (see Chap. 6), you are implicitly transforming the two variables involved into z -scores (which equates the variables in terms of mean and standard deviation), so that only the patterning in the relationship between the variables is represented. z -scores are also useful as a preliminary step to more advanced parametric statistical methods when variables differing in scale, range and/or measurement units must be equated for means and standard deviations prior to analysis.

Application Procedures

SPSS: In the Descriptives… procedure, tick the box labelled ‘Save standardized values as variables’. z-scores are saved as new variables (labelled as Z followed by the original variable name, as shown in Table 5.5) which can then be listed or analysed further.

NCSS: Select a new variable to hold the z-scores, then select the ‘STANDARDIZE’ transformation from the list of available functions; z-scores are saved as new variables which can then be listed or analysed further.

SYSTAT: z-scores are saved as new variables which can then be listed or analysed further.

STATGRAPHICS: Open the data window, select an empty column in the database, choose the ‘STANDARDIZE’ transformation, choose the variable you want to transform and give the new variable a name.

R Commander: Select the variables you want to standardize; R Commander automatically saves the transformed variables to the database, appending Z. to the front of each variable’s name.

Fundamental Concept II: The Normal Distribution

Arguably the most fundamental distribution used in the statistical analysis of quantitative data in the behavioural and social sciences is the normal distribution (also known as the Gaussian or bell-shaped distribution ). Many behavioural phenomena, if measured on a large enough sample of people, tend to produce ‘normally distributed’ variable scores. This includes most measures of ability, performance and productivity, personality characteristics and attitudes. The normal distribution is important because it is the one form of distribution that you must assume describes the scores of a variable in the population when parametric tests of statistical inference are undertaken. The standard normal distribution is defined as having a population mean of 0.0 and a population standard deviation of 1.0. The normal distribution is also important as a means of interpreting various types of scoring systems.

Figure 5.28 displays the standard normal distribution (mean = 0; standard deviation = 1.0) and shows that there is a clear link between z -scores and the normal distribution. Statisticians have analytically calculated the probability (also expressed as percentages or percentiles) that observations will fall above or below any specific z -score in the theoretical standard normal distribution. Thus, a z -score of +1.0 in the standard normal distribution will have 84.13% (equals a probability of .8413) of observations in the population falling at or below one standard deviation above the mean and 15.87% falling above that point. A z -score of −2.0 will have 2.28% of observations falling at that point or below and 97.72% of observations falling above that point. It is clear then that, in a standard normal distribution, z -scores have a direct relationship with percentiles .

Fig. 5.28 The normal (bell-shaped or Gaussian) distribution

Figure 5.28 also shows how T-scores relate to the standard normal distribution and to z -scores. The mean T-score falls at 50 and each increment or decrement of 10 T-score units means a movement of another standard deviation away from this mean of 50. Thus, a T-score of 80 corresponds to a z -score of +3.0—a score 3 standard deviations higher than the mean of 50.

Of special interest to behavioural researchers are the values for z -scores in a standard normal distribution that encompass 90% of observations ( z  = ±1.645—isolating 5% of the distribution in each tail), 95% of observations ( z  = ±1.96—isolating 2.5% of the distribution in each tail), and 99% of observations ( z  = ±2.58—isolating 0.5% of the distribution in each tail).
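
These tabled values can be recovered from any statistics package; in base R, for example, pnorm() and qnorm() reproduce the cumulative probabilities and z-cutoffs quoted above:

```r
pnorm(1)                      # 0.8413: proportion at or below z = +1.0
pnorm(-2)                     # 0.0228: proportion at or below z = -2.0
qnorm(c(0.95, 0.975, 0.995))  # 1.645, 1.960, 2.576: cutoffs for 90%, 95%, 99% bands
```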

Depending upon the degree of certainty required by the researcher, these bands describe regions outside of which one might define an observation as being atypical or as perhaps not belonging to a distribution being centred at a mean of 0.0. Most often, what is taken as atypical or rare in the standard normal distribution is a score at least two standard deviations away from the mean, in either direction. Why choose two standard deviations? Since in the standard normal distribution, only about 5% of observations will fall outside a band defined by z -scores of ±1.96 (rounded to 2 for simplicity), this equates to data values that are 2 standard deviations away from their mean. This can give us a defensible way to identify outliers or extreme values in a distribution.

Thinking ahead to what you will encounter in Chap. 7, this ‘banding’ logic can be extended into the world of statistics (like means and percentages) as opposed to just the world of observations. You will frequently hear researchers speak of some statistic estimating a specific value (a parameter ) in a population, plus or minus some other value.

A survey organisation might report political polling results in terms of a percentage and an error band, e.g. 59% of Australians indicated that they would vote Labor at the next federal election, plus or minus 2%.

Most commonly, this error band (±2%) is defined by possible values for the population parameter that are about two standard deviations (or two standard errors—a concept discussed further in Chap. 7) away from the reported or estimated statistical value. In effect, the researcher is saying that on 95% of the occasions he/she would theoretically conduct his/her study, the population value estimated by the statistic being reported would fall between the limits imposed by the endpoints of the error band (the official name for this error band is a confidence interval; see Chap. 8). The well-understood mathematical properties of the standard normal distribution are what make such precise statements about levels of error in statistical estimates possible.

Checking for Normality

It is important to understand that transforming the raw scores for a variable to z -scores (recall Procedure 5.7 ) does not produce z -scores which follow a normal distribution; rather they will have the same distributional shape as the original scores. However, if you are willing to assume that the normal distribution is the correct reference distribution in the population, then you are justified in interpreting z -scores in light of the known characteristics of the normal distribution.

In order to justify this assumption, not only to enhance the interpretability of z -scores but more generally to enhance the integrity of parametric statistical analyses, it is helpful to actually look at the sample frequency distributions for variables (using a histogram (illustrated in Procedure 5.2 ) or a boxplot (illustrated in Procedure 5.6 ), for example), since non-normality can often be visually detected. It is important to note that in the social and behavioural sciences as well as in economics and finance, certain variables tend to be non-normal by their very nature. This includes variables that measure time taken to complete a task, achieve a goal or make decisions and variables that measure, for example, income, occurrence of rare or extreme events or organisational size. Such variables tend to be positively skewed in the population, a pattern that can often be confirmed by graphing the distribution.

If you cannot justify an assumption of ‘normality’, you may be able to force the data to be normally distributed by using what is called a ‘normalising transformation’. Such transformations will usually involve a nonlinear mathematical conversion (such as computing the logarithm, square root or reciprocal) of the raw scores. Such transformations will force the data to take on a more normal appearance so that the assumption of ‘normality’ can be reasonably justified, but at the cost of creating a new variable whose units of measurement and interpretation are more complicated. [For some non-normal variables, such as the occurrence of rare, extreme or catastrophic events (e.g. a 100-year flood or forest fire, coronavirus pandemic, the Global Financial Crisis or other type of financial crisis, man-made or natural disaster), the distributions cannot be ‘normalised’. In such cases, the researcher needs to model the distribution as it stands. For such events, extreme value theory (e.g. see Diebold et al. 2000 ) has proven very useful in recent years. This theory uses a variation of the Pareto or Weibull distribution as a reference, rather than the normal distribution, when making predictions.]
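
A sketch of the three transformations mentioned above, applied in base R to made-up positively skewed scores:

```r
speed <- c(1.5, 2.3, 3.1, 3.8, 4.7, 7.1, 11.9)  # made-up, positively skewed scores
log_speed   <- log10(speed)  # logarithm: the remedy applied to speed below
sqrt_speed  <- sqrt(speed)   # square root: a milder correction
recip_speed <- 1 / speed     # reciprocal: a stronger correction (reverses score order)
```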

Figure 5.29 displays before and after pictures of the effects of a logarithmic transformation on the positively skewed speed variable from the QCI database. Each graph, produced using NCSS, is of the hybrid histogram-density trace-boxplot type first illustrated in Procedure 5.6 . The left graph clearly shows the strong positive skew in the speed scores and the right graph shows the result of taking the log10 of each raw score.

Fig. 5.29 Combined histogram-density trace-boxplot graphs displaying the before and after effects of a ‘normalising’ log10 transformation of the speed variable

Notice how the long tail toward slow speed scores is pulled in toward the mean and the very short tail toward fast speed scores is extended away from the mean. The result is a more ‘normal’ appearing distribution. The assumption would then be that we could assume normality of speed scores, but only in a log10 format (i.e. it is the log of speed scores that we assume is normally distributed in the population). In general, taking the logarithm of raw scores provides a satisfactory remedy for positively skewed distributions (but not for negatively skewed ones). Furthermore, anything we do with the transformed speed scores now has to be interpreted in units of log10 (seconds) which is a more complex interpretation to make.

Another visual method for detecting non-normality is to graph what is called a normal Q-Q plot (the Q-Q stands for Quantile-Quantile). This plots the percentiles for the observed data against the percentiles for the standard normal distribution (see Cleveland 1995 for more detailed discussion; also see Lane 2007, http://onlinestatbook.com/2/advanced_graphs/q-q_plots.html). If the pattern for the observed data follows a normal distribution, then all the points on the graph will fall approximately along a diagonal line.
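
A normal Q-Q plot takes two lines in base R; a sketch with a simulated positively skewed sample:

```r
set.seed(4)
x <- rlnorm(112)  # simulated positively skewed scores
qqnorm(x)         # observed quantiles plotted against standard normal quantiles
qqline(x)         # reference line; skewed data bow away from this diagonal
```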

Figure 5.30 shows the normal Q-Q plots for the original speed variable and the transformed log-speed variable, produced using the SPSS Explore... procedure. The diagnostic diagonal line is shown on each graph. In the left-hand plot, for speed , the plot points clearly deviate from the diagonal in a way that signals positive skewness. The right-hand plot, for log_speed, shows the plot points generally falling along the diagonal line thereby conforming much more closely to what is expected in a normal distribution.

Fig. 5.30 Normal Q-Q plots for the original speed variable and the new log_speed variable

In addition to visual ways of detecting non-normality, there are also numerical ways. As highlighted in Chap. 1, there are two additional characteristics of any distribution, namely skewness (asymmetric distribution tails) and kurtosis (peakedness of the distribution). Both have an associated statistic that provides a measure of that characteristic, similar to the mean and standard deviation statistics. In a normal distribution, the values for the skewness and kurtosis statistics are both zero (skewness = 0 means a symmetric distribution; kurtosis = 0 means a mesokurtic distribution). The further away each statistic is from zero, the more the distribution deviates from a normal shape. Both the skewness statistic and the kurtosis statistic have standard errors (see Chap. 7) associated with them (which work very much like the standard deviation, only for a statistic rather than for observations); these can be routinely computed by almost any statistical package when you request a descriptive analysis. Without going into the logic right now (this will come in Chap. 7), a rough rule of thumb you can use to check for normality using the skewness and kurtosis statistics is to do the following:

  • Prepare : Take the standard error for the statistic and multiply it by 2 (or 3 if you want to be more conservative).
  • Interval : Add the result from the Prepare step to the value of the statistic and subtract the result from the value of the statistic. You will end up with two numbers, one low and one high, that define the ends of an interval (what you have just created approximates what is called a ‘confidence interval’; see Chap. 8).
  • Check : If zero falls inside this interval (i.e. between the low and high endpoints from the Interval step), then there is likely to be no significant issue with that characteristic of the distribution. If zero falls outside the interval (i.e. lower than the low endpoint or higher than the high endpoint), then you likely have an issue with non-normality with respect to that characteristic. (A minimal R sketch of this check follows below.)
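
A minimal sketch of this rule of thumb in base R, using the speed and log_speed skewness statistics reported in Table 5.6 (the function name here is ours, not a library function):

```r
# Rule-of-thumb normality check: does zero fall inside stat +/- k * SE?
check_stat <- function(stat, se, k = 2) {
  interval <- c(low = stat - k * se, high = stat + k * se)
  c(interval, problem = as.numeric(interval["low"] > 0 || interval["high"] < 0))
}
check_stat(1.487, 0.229)   # speed skewness: zero excluded -> problem = 1
check_stat(-0.050, 0.229)  # log_speed skewness: zero included -> problem = 0
```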

Visually, we saw in the left graph in Fig. 5.29 that the speed variable was highly positively skewed. What if Maree wanted to check some numbers to support this judgment? She could ask SPSS to produce the skewness and kurtosis statistics for both the original speed variable and the new log_speed variable using the Frequencies... or the Explore... procedure. Table 5.6 shows what SPSS would produce if the Frequencies ... procedure were used.

Table 5.6 Skewness and kurtosis statistics and their standard errors for both the original speed variable and the new log_speed variable

Using the 3-step check rule described above, Maree could roughly evaluate the normality of the two variables as follows:

  • skewness (speed) : [Prepare] 2 × .229 = .458 ➔ [Interval] 1.487 − .458 = 1.029 and 1.487 + .458 = 1.945 ➔ [Check] zero does not fall inside the interval bounded by 1.029 and 1.945, so there appears to be a significant problem with skewness. Since the value for the skewness statistic (1.487) is positive, this means the problem is positive skewness, confirming what the left graph in Fig. 5.29 showed.
  • kurtosis (speed) : [Prepare] 2 × .455 = .91 ➔ [Interval] 3.071 − .91 = 2.161 and 3.071 + .91 = 3.981 ➔ [Check] zero does not fall inside the interval bounded by 2.161 and 3.981, so there appears to be a significant problem with kurtosis. Since the value for the kurtosis statistic (3.071) is positive, this means the problem is leptokurtosis: the peakedness of the distribution is too tall relative to what is expected in a normal distribution.
  • skewness (log_speed) : [Prepare] 2 × .229 = .458 ➔ [Interval] −.050 − .458 = −.508 and −.050 + .458 = .408 ➔ [Check] zero falls within the interval bounded by −.508 and .408, so there appears to be no problem with skewness. The log transform appears to have corrected the problem, confirming what the right graph in Fig. 5.29 showed.
  • kurtosis (log_speed) : [Prepare] 2 × .455 = .91 ➔ [Interval] −.672 − .91 = −1.582 and −.672 + .91 = .238 ➔ [Check] zero falls within the interval bounded by −1.582 and .238, so there appears to be no problem with kurtosis. The log transform appears to have corrected this problem as well, rendering the distribution more approximately mesokurtic (i.e. normal) in shape.

There are also more formal tests of significance (see Chap. 7) that one can use to numerically evaluate normality, such as the Kolmogorov-Smirnov test and the Shapiro-Wilk test. Each of these tests, for example, can be produced by SPSS on request, via the Explore... procedure.

1 For more information, see Chap. 1 – The language of statistics.


Useful Additional Readings for Procedure 5.2

  • Field A. Discovering statistics using SPSS for Windows. 5th ed. Los Angeles: Sage; 2018.
  • George D, Mallery P. IBM SPSS statistics 25 step by step: A simple guide and reference. 15th ed. Boston, MA: Pearson Education; 2019.
  • Hintze JL. NCSS 8 help system: Graphics. Kaysville, UT: Number Cruncher Statistical Systems; 2012.
  • StatPoint Technologies, Inc. STATGRAPHICS Centurion XVI user manual. Warrenton, VA: StatPoint Technologies Inc.; 2010.
  • SYSTAT Software Inc. SYSTAT 13: Graphics. Chicago, IL: SYSTAT Software Inc; 2009.

References for Procedure 5.3

  • Cleveland WR. Visualizing data. Summit, NJ: Hobart Press; 1995.
  • Jacoby WJ. Statistical graphics for visualizing multivariate data. Thousand Oaks, CA: Sage; 1998.


References for Procedure 5.6

  • Norušis MJ. IBM SPSS statistics 19 guide to data analysis. Upper Saddle River, NJ: Prentice Hall; 2012.
  • Field A. Discovering statistics using SPSS for Windows. 5th ed. Los Angeles: Sage; 2018.
  • Hintze JL. NCSS 8 help system: Introduction. Kaysville, UT: Number Cruncher Statistical System; 2012.
  • SYSTAT Software Inc. SYSTAT 13: Statistics - I. Chicago, IL: SYSTAT Software Inc; 2009.


References for Fundamental Concept II

  • Diebold FX, Schuermann T, Stroughair D. Pitfalls and opportunities in the use of extreme value theory in risk management. The Journal of Risk Finance. 2000;1(2):30–35. doi: 10.1108/eb043443.
  • Lane D. Online statistics education: A multimedia course of study. Houston, TX: Rice University; 2007.


Descriptive Statistics – Types, Methods and Examples

Descriptive Statistics

Descriptive statistics is a branch of statistics that deals with the summarization and description of collected data. This type of statistics is used to simplify and present data in a manner that is easy to understand, often through visual or numerical methods. Descriptive statistics is primarily concerned with measures of central tendency, variability, and distribution, as well as graphical representations of data.

Here are the main components of descriptive statistics:

  • Measures of Central Tendency : These provide a summary statistic that represents the center point or typical value of a dataset. The most common measures of central tendency are the mean (average), median (middle value), and mode (most frequent value).
  • Measures of Dispersion or Variability : These provide a summary statistic that represents the spread of values in a dataset. Common measures of dispersion include the range (difference between the highest and lowest values), variance (average of the squared differences from the mean), standard deviation (square root of the variance), and interquartile range (difference between the upper and lower quartiles).
  • Measures of Position : These are used to understand the distribution of values within a dataset. They include percentiles and quartiles.
  • Graphical Representations : Data can be visually represented using various methods like bar graphs, histograms, pie charts, box plots, and scatter plots. These visuals provide a clear, intuitive way to understand the data.
  • Measures of Association : These measures provide insight into the relationships between variables in the dataset, such as correlation and covariance.

Descriptive Statistics Types

Descriptive statistics can be classified into two types:

Measures of Central Tendency

These measures help describe the center point or average of a data set. There are three main types:

  • Mean : The average value of the dataset, obtained by adding all the data points and dividing by the number of data points.
  • Median : The middle value of the dataset, obtained by ordering all data points and picking out the one in the middle (or the average of the two middle numbers if the dataset has an even number of observations).
  • Mode : The most frequently occurring value in the dataset.

Measures of Variability (or Dispersion)

These measures describe the spread or variability of the data points in the dataset. There are four main types:

  • Range : The difference between the largest and smallest values in the dataset.
  • Variance : The average of the squared differences from the mean.
  • Standard Deviation : The square root of the variance, giving a measure of dispersion that is in the same units as the original dataset.
  • Interquartile Range (IQR) : The range between the first quartile (25th percentile) and the third quartile (75th percentile), which provides a measure of variability that is resistant to outliers.

Descriptive Statistics Formulas

Here are some of the most commonly used formulas in descriptive statistics:

Mean (μ or x̄) :

The average of all the numbers in the dataset. It is computed by summing all the observations and dividing by the number of observations.

Formula : μ = Σx/N (population) or x̄ = Σx/n (sample), where Σx is the sum of all observations, N is the population size and n is the sample size.

Median :

The middle value in the dataset when the observations are arranged in ascending or descending order. If there is an even number of observations, the median is the average of the two middle numbers.

Mode :

The most frequently occurring number in the dataset. There’s no formula for this as it’s determined by observation.

Range :

The difference between the highest (max) and lowest (min) values in the dataset.

Formula : Range = max – min

Variance (σ² or s²) :

The average of the squared differences from the mean. Variance is a measure of how spread out the numbers in the dataset are.

Population Variance formula : σ² = Σ(x – μ)² / N

Sample Variance formula : s² = Σ(x – x̄)² / (n – 1)

(where x is each individual observation, μ is the population mean, x̄ is the sample mean, N is the size of the population, and n is the size of the sample)

Standard Deviation (σ or s) :

The square root of the variance. It measures the amount of variability or dispersion for a set of data.

Population Standard Deviation formula : σ = √σ²

Sample Standard Deviation formula : s = √s²

Interquartile Range (IQR) :

The range between the first quartile (Q1, 25th percentile) and the third quartile (Q3, 75th percentile). It measures statistical dispersion, or how far apart the data points are.

Formula : IQR = Q3 – Q1
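
For illustration, all of these measures are built into base R (using the sample formulas with the n − 1 denominator; the data vector here is made up):

```r
x <- c(4, 8, 6, 5, 3, 7, 9, 5)  # made-up data
mean(x)          # arithmetic mean
median(x)        # middle value
max(x) - min(x)  # range
var(x)           # sample variance, s-squared
sd(x)            # sample standard deviation, s
IQR(x)           # interquartile range, Q3 - Q1
```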

Descriptive Statistics Methods

Here are some of the key methods used in descriptive statistics:

Tabulation

This method involves arranging data into a table format, making it easier to understand and interpret. Tables often show the frequency distribution of variables.

Graphical Representation

This method involves presenting data visually to help reveal patterns, trends, outliers, or relationships between variables. There are many types of graphs used, such as bar graphs, histograms, pie charts, line graphs, box plots, and scatter plots.

Calculation of Central Tendency Measures

This involves determining the mean, median, and mode of a dataset. These measures indicate where the center of the dataset lies.

Calculation of Dispersion Measures

This involves calculating the range, variance, standard deviation, and interquartile range. These measures indicate how spread out the data is.

Calculation of Position Measures

This involves determining percentiles and quartiles, which tell us about the position of particular data points within the overall data distribution.

Calculation of Association Measures

This involves calculating statistics like correlation and covariance to understand relationships between variables.

Summary Statistics

Often, a collection of several descriptive statistics is presented together in what’s known as a “summary statistics” table. This provides a comprehensive snapshot of the data at a glance.

Descriptive Statistics Examples

Descriptive Statistics Examples are as follows:

Example 1: Student Grades

Let’s say a teacher has the following set of grades for 7 students: 85, 90, 88, 92, 78, 88, and 94. The teacher could use descriptive statistics to summarize this data:

  • Mean (average) : (85 + 90 + 88 + 92 + 78 + 88 + 94)/7 = 615/7 ≈ 87.9, conventionally rounded to 88 (verified in the R sketch below)
  • Median (middle value) : First, rearrange the grades in ascending order (78, 85, 88, 88, 90, 92, 94). The median grade is 88.
  • Mode (most frequent value) : The grade 88 appears twice, more frequently than any other grade, so it’s the mode.
  • Range (difference between highest and lowest) : 94 (highest) – 78 (lowest) = 16
  • Variance and Standard Deviation : These would be calculated using the appropriate formulas, providing a measure of the dispersion of the grades.
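
As a quick check of these summaries, a minimal base R sketch (note that R has no built-in mode function, so a frequency table stands in for it):

```r
grades <- c(85, 90, 88, 92, 78, 88, 94)
mean(grades)                     # 87.857..., reported as roughly 88
median(grades)                   # 88: the 4th of the 7 ordered grades
names(which.max(table(grades)))  # "88": the most frequent grade (makeshift mode)
max(grades) - min(grades)        # 16: the range
sd(grades)                       # sample standard deviation
```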

Example 2: Survey Data

A researcher conducts a survey on the number of hours of TV watched per day by people in a particular city. They collect data from 1,000 respondents and can use descriptive statistics to summarize this data:

  • Mean : Calculate the average hours of TV watched by adding all the responses and dividing by the total number of respondents.
  • Median : Sort the data and find the middle value.
  • Mode : Identify the most frequently reported number of hours watched.
  • Histogram : Create a histogram to visually display the frequency of responses. This could show, for example, that the majority of people watch 2-3 hours of TV per day.
  • Standard Deviation : Calculate this to find out how much variation there is from the average.

Importance of Descriptive Statistics

Descriptive statistics are fundamental in the field of data analysis and interpretation, as they provide the first step in understanding a dataset. Here are a few reasons why descriptive statistics are important:

  • Data Summarization : Descriptive statistics provide simple summaries about the measures and samples you have collected. With a large dataset, it’s often difficult to identify patterns or tendencies just by looking at the raw data. Descriptive statistics provide numerical and graphical summaries that can highlight important aspects of the data.
  • Data Simplification : They simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary, making it easier to understand and interpret the dataset.
  • Identification of Patterns and Trends : Descriptive statistics can help identify patterns and trends in the data, providing valuable insights. Measures like the mean and median can tell you about the central tendency of your data, while measures like the range and standard deviation tell you about the dispersion.
  • Data Comparison : By summarizing data into measures such as the mean and standard deviation, it’s easier to compare different datasets or different groups within a dataset.
  • Data Quality Assessment : Descriptive statistics can help identify errors or outliers in the data, which might indicate issues with data collection or entry.
  • Foundation for Further Analysis : Descriptive statistics are typically the first step in data analysis. They help create a foundation for further statistical or inferential analysis. In fact, advanced statistical techniques often assume that one has first examined their data using descriptive methods.

When to use Descriptive Statistics

They can be used in a wide range of situations, including:

  • Understanding a New Dataset : When you first encounter a new dataset, using descriptive statistics is a useful first step to understand the main characteristics of the data, such as the central tendency, dispersion, and distribution.
  • Data Exploration in Research : In the initial stages of a research project, descriptive statistics can help to explore the data, identify trends and patterns, and generate hypotheses for further testing.
  • Presenting Research Findings : Descriptive statistics can be used to present research findings in a clear and understandable way, often using visual aids like graphs or charts.
  • Monitoring and Quality Control : In fields like business or manufacturing, descriptive statistics are often used to monitor processes, track performance over time, and identify any deviations from expected standards.
  • Comparing Groups : Descriptive statistics can be used to compare different groups or categories within your data. For example, you might want to compare the average scores of two groups of students, or the variance in sales between different regions.
  • Reporting Survey Results : If you conduct a survey, you would use descriptive statistics to summarize the responses, such as calculating the percentage of respondents who agree with a certain statement.

Applications of Descriptive Statistics

Descriptive statistics are widely used in a variety of fields to summarize, represent, and analyze data. Here are some applications:

  • Business : Businesses use descriptive statistics to summarize and interpret data such as sales figures, customer feedback, or employee performance. For instance, they might calculate the mean sales for each month to understand trends, or use graphical representations like bar charts to present sales data.
  • Healthcare : In healthcare, descriptive statistics are used to summarize patient data, such as age, weight, blood pressure, or cholesterol levels. They are also used to describe the incidence and prevalence of diseases in a population.
  • Education : Educators use descriptive statistics to summarize student performance, like average test scores or grade distribution. This information can help identify areas where students are struggling and inform instructional decisions.
  • Social Sciences : Social scientists use descriptive statistics to summarize data collected from surveys, experiments, and observational studies. This can involve describing demographic characteristics of participants, response frequencies to survey items, and more.
  • Psychology : Psychologists use descriptive statistics to describe the characteristics of their study participants and the main findings of their research, such as the average score on a psychological test.
  • Sports : Sports analysts use descriptive statistics to summarize athlete and team performance, such as batting averages in baseball or points per game in basketball.
  • Government : Government agencies use descriptive statistics to summarize data about the population, such as census data on population size and demographics.
  • Finance and Economics : In finance, descriptive statistics can be used to summarize past investment performance or economic data, such as changes in stock prices or GDP growth rates.
  • Quality Control : In manufacturing, descriptive statistics can be used to summarize measures of product quality, such as the average dimensions of a product or the frequency of defects.

Limitations of Descriptive Statistics

While descriptive statistics are a crucial part of data analysis and provide valuable insights about a dataset, they do have certain limitations:

  • Lack of Depth : Descriptive statistics provide a summary of your data, but they can oversimplify the data, resulting in a loss of detail and potentially significant nuances.
  • Vulnerability to Outliers : Some descriptive measures, like the mean, are sensitive to outliers. A single extreme value can significantly skew your mean, making it less representative of your data.
  • Inability to Make Predictions : Descriptive statistics describe what has been observed in a dataset. They don’t allow you to make predictions or generalizations about unobserved data or larger populations.
  • No Insight into Correlations : While some descriptive statistics can hint at potential relationships between variables, they don’t provide detailed insights into the nature or strength of these relationships.
  • No Causality or Hypothesis Testing : Descriptive statistics cannot be used to determine cause and effect relationships or to test hypotheses. For these purposes, inferential statistics are needed.
  • Can Mislead : When used improperly, descriptive statistics can be used to present a misleading picture of the data. For instance, choosing to only report the mean without also reporting the standard deviation or range can hide a large amount of variability in the data.


  • What is descriptive research?


Descriptive research is a common investigatory model used by researchers in various fields, including social sciences, linguistics, and academia.

Read on to understand the characteristics of descriptive research and explore its underlying techniques, processes, and procedures.


Descriptive research is an exploratory research method. It enables researchers to precisely and methodically describe a population, circumstance, or phenomenon.

As the name suggests, descriptive research describes the characteristics of the group, situation, or phenomenon being studied without manipulating variables or testing hypotheses . This can be reported using surveys , observational studies, and case studies. You can use both quantitative and qualitative methods to compile the data.

Besides making observations and then comparing and analyzing them, descriptive studies often develop knowledge concepts and provide solutions to critical issues. They aim to answer how an event occurred, when it occurred, where it occurred, and what the problem or phenomenon is.

  • Characteristics of descriptive research

The following are some of the characteristics of descriptive research:

Quantitativeness

Descriptive research can be quantitative as it gathers quantifiable data to statistically analyze a population sample. These numbers can show patterns, connections, and trends over time and can be discovered using surveys, polls, and experiments.

Qualitativeness

Descriptive research can also be qualitative. It gives meaning and context to the numbers supplied by quantitative descriptive research .

Researchers can use tools like interviews, focus groups, and ethnographic studies to illustrate why things are what they are and help characterize the research problem. This works because descriptive research is more explanatory than exploratory or experimental research.

Uncontrolled variables

Descriptive research differs from experimental research in that researchers cannot manipulate the variables. They are recognized, scrutinized, and quantified instead. This is one of its most prominent features.

Cross-sectional studies

Descriptive research often takes the form of a cross-sectional study, examining several characteristics of the same group. It involves obtaining data on multiple variables at the individual level during a certain period. It’s helpful when trying to understand a larger community’s habits or preferences.

Carried out in a natural environment

Descriptive studies are usually carried out in the participants’ everyday environment, which allows researchers to avoid influencing respondents by collecting data in a natural setting. You can use online surveys or survey questions to collect data, or simply observe.

Basis for further research

You can further dissect descriptive research’s outcomes and use them for different types of investigation. The outcomes also serve as a foundation for subsequent investigations and can guide future studies. For example, you can use the data obtained in descriptive research to help determine future research designs.

  • Descriptive research methods

There are three basic approaches for gathering data in descriptive research: observational, case study, and survey.

Surveys

You can use surveys to gather data in descriptive research. This involves gathering information from many people using a questionnaire or interview.

Surveys remain the dominant research tool for descriptive research design. Researchers can conduct various investigations and collect multiple types of data (quantitative and qualitative) using surveys with diverse designs.

You can conduct surveys over the phone, online, or in person. Your survey might be a brief interview or conversation with a set of prepared questions intended to obtain quick information from the primary source.

Observation

This descriptive research method involves observing and gathering data on a population or phenomena without manipulating variables. It is employed in psychology, market research , and other social science studies to track and understand human behavior.

Observation is an essential component of descriptive research. It entails gathering data and analyzing it to see whether the variables in the study are related. This strategy usually allows for both qualitative and quantitative data analysis.

Case studies

A case study can outline a specific topic’s traits. The topic might be a person, group, event, or organization.

It involves using a subset of a larger group as a sample to characterize the features of that larger group.

Knowledge gained from studying a case can sometimes be generalized to benefit a broader audience, though this should be done with caution.

This approach entails carefully examining a particular group, person, or event over time. You can learn something new about the study topic by using a small group to better understand the dynamics of the entire group.

  • Types of descriptive research

There are several types of descriptive study. The most well-known include cross-sectional studies, census surveys, sample surveys, case reports, and comparison studies.

Case reports and case series

In the healthcare and medical fields, a case report is used to explain a patient’s circumstances when suffering from an uncommon illness or displaying certain symptoms. A case series is a collection of related case reports. Both have aided the advancement of medical knowledge on countless occasions.

Descriptive survey

This descriptive type of research employs surveys to collect information on various topics. The data aims to determine the degree to which certain conditions are attained.

You can extrapolate or generalize the information you obtain from sample surveys to the larger group being researched.

Descriptive-normative survey

The normative component is an addition to the descriptive survey. In the descriptive-normative survey, you compare the study’s results to a norm.

Correlative survey

Correlative surveys help establish if there is a positive, negative, or neutral connection between two variables.

Census survey

Performing census surveys involves gathering relevant data on several aspects of a given population. These units include individuals, families, organizations, objects, characteristics, and properties.

Cross-sectional studies

In a cross-sectional study, you gather data on variables of interest from a specific population at a single point in time. Cross-sectional studies provide a glimpse of a phenomenon’s prevalence and features in a population. They typically raise few ethical challenges and are quite simple and inexpensive to carry out.

Comparative studies

These surveys compare the two subjects’ conditions or characteristics. The subjects may include research variables, organizations, plans, and people.

Comparison points, assumption of similarities, and criteria of comparison are three important variables that affect how well and accurately comparative studies are conducted.

For instance, descriptive research can help determine how many CEOs hold a bachelor’s degree and what proportion of low-income households receive government help.

  • Pros and cons

The primary advantage of descriptive research designs is that researchers can create a reliable and beneficial database for additional study. To conduct any inquiry, you need access to reliable information sources that can give you a firm understanding of a situation.

Quantitative studies are time- and resource-intensive, so knowing the hypotheses viable for testing is crucial. The basic overview of descriptive research provides helpful hints as to which variables are worth quantitatively examining. This is why it’s employed as a precursor to quantitative research designs.

Some experts view this research as untrustworthy and unscientific, because no variables are manipulated and the findings cannot be assessed statistically.

Cause-and-effect relationships also can’t be established through descriptive investigations. Additionally, observational findings are difficult to replicate, which prevents independent verification of the results.

The absence of statistical and in-depth analysis and the rather superficial character of the investigative procedure are drawbacks of this research approach.

  • Descriptive research examples and applications

Several descriptive research examples are emphasized based on their types, purposes, and applications. Research questions often begin with “What is …” These studies help find solutions to practical issues in social science, physical science, and education.

Here are some examples and applications of descriptive research:

Determining consumer perception and behavior

Organizations use descriptive research designs to determine how various demographic groups react to a certain product or service.

For example, a business looking to sell to its target market should research the market’s behavior first. When researching human behavior in response to a cause or event, the researcher pays attention to the traits, actions, and responses before drawing a conclusion.

Scientific classification

Scientific descriptive research enables the classification of organisms and their traits and constituents.

Measuring data trends

A descriptive study design’s statistical capabilities allow researchers to track data trends over time. It’s frequently used to determine the study target’s current circumstances and underlying patterns.

Conduct comparison

Organizations can use a descriptive research approach to learn how various demographics react to a certain product or service. For example, you can study how the target market responds to a competitor’s product and use that information to infer their behavior.

  • Bottom line

A descriptive research design is suitable for exploring certain topics and serving as a prelude to larger quantitative investigations. It provides a comprehensive understanding of the “what” of the group or thing you’re investigating.

This research type acts as the cornerstone of other research methodologies . It is distinctive because it can use quantitative and qualitative research approaches at the same time.

What is descriptive research design?

Descriptive research design aims to systematically obtain information to describe a phenomenon, situation, or population. More specifically, it helps answer the what, when, where, and how questions regarding the research problem rather than the why.

How does descriptive research compare to qualitative research?

Despite certain parallels, descriptive research concentrates on describing phenomena, while qualitative research aims to understand people better.

How do you analyze descriptive research data?

Descriptive research data is typically analyzed using summary statistics (frequencies, central tendency, and dispersion) and visual methods, with the analysis designed so that the results are valid and reliable.


Chapter 14 Quantitative Analysis Descriptive Statistics

Numeric data collected in a research project can be analyzed quantitatively using statistical tools in two different ways. Descriptive analysis refers to statistically describing, aggregating, and presenting the constructs of interest or associations between these constructs. Inferential analysis refers to the statistical testing of hypotheses (theory testing). In this chapter, we will examine statistical techniques used for descriptive analysis, and the next chapter will examine statistical techniques for inferential analysis. Much of today’s quantitative data analysis is conducted using software programs such as SPSS or SAS. Readers are advised to familiarize themselves with one of these programs for understanding the concepts described in this chapter.

Data Preparation

In research projects, data may be collected from a variety of sources: mail-in surveys, interviews, pretest or posttest experimental data, observational data, and so forth. This data must be converted into a machine-readable, numeric format, such as a spreadsheet or a text file, so that it can be analyzed by computer programs like SPSS or SAS. Data preparation usually follows these steps.

Data coding. Coding is the process of converting data into numeric format. A codebook should be created to guide the coding process. A codebook is a comprehensive document containing a detailed description of each variable in a research study, the items or measures for that variable, the format of each item (numeric, text, etc.), the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale; whether such scale is a five-point, seven-point, or some other type of scale), and how to code each value into a numeric format. For instance, if we have a measurement item on a seven-point Likert scale with anchors ranging from “strongly disagree” to “strongly agree”, we may code that item as 1 for strongly disagree, 4 for neutral, and 7 for strongly agree, with the intermediate anchors in between. Nominal data such as industry type can be coded in numeric form using a coding scheme such as: 1 for manufacturing, 2 for retailing, 3 for financial, 4 for healthcare, and so forth (note that the numeric codes assigned to nominal data are arbitrary labels and cannot be meaningfully averaged or ordered). Ratio scale data such as age, income, or test scores can be coded as entered by the respondent. Sometimes, data may need to be aggregated into a different form than the format used for data collection. For instance, for measuring a construct such as “benefits of computers,” if a survey provided respondents with a checklist of benefits that they could select from (i.e., they could choose as many of those benefits as they wanted), then the total number of checked items can be used as an aggregate measure of benefits. Note that many other forms of data, such as interview transcripts, cannot be converted into a numeric format for statistical analysis. Coding is especially important for large complex studies involving many variables and measurement items, where the coding process is conducted by different people, to help the coding team code data in a consistent manner, and also to help others understand and interpret the coded data.
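As a minimal illustration of such a coding scheme (not part of the original text), the industry example above can be expressed as a simple lookup table:

```python
# Coding nominal data into numeric form using the scheme described above.
industry_codes = {"manufacturing": 1, "retailing": 2, "financial": 3, "healthcare": 4}

responses = ["retailing", "manufacturing", "healthcare"]
coded = [industry_codes[r] for r in responses]
print(coded)  # [2, 1, 4]
```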

Data entry. Coded data can be entered into a spreadsheet, database, text file, or directly into a statistical program like SPSS. Most statistical programs provide a data editor for entering data. However, these programs store data in their own native format (e.g., SPSS stores data as .sav files), which makes it difficult to share that data with other statistical programs. Hence, it is often better to enter data into a spreadsheet or database, where it can be reorganized as needed, shared across programs, and subsets of data can be extracted for analysis. Smaller datasets (roughly those within a spreadsheet’s row and column limits, such as about 65,000 observations and 256 items in older versions of Microsoft Excel) can be stored in a spreadsheet, while larger datasets with millions of observations will require a database. Each observation can be entered as one row in the spreadsheet and each measurement item can be represented as one column. The entered data should be frequently checked for accuracy, via occasional spot checks on a set of items or observations, during and after entry. Furthermore, while entering data, the coder should watch out for obvious evidence of bad data, such as the respondent selecting the “strongly agree” response to all items irrespective of content, including reverse-coded items. Such data can be entered but should be excluded from subsequent analysis.

Missing values. Missing data is an inevitable part of any empirical data set. Respondents may not answer certain questions if they are ambiguously worded or too sensitive. Such problems should be detected during pretests and corrected before the main data collection process begins. During data entry, some statistical programs automatically treat blank entries as missing values, while others require a specific numeric value such as -1 or 999 to be entered to denote a missing value. During data analysis, the default mode of handling missing values in most software programs is to simply drop the entire observation containing even a single missing value, in a technique called listwise deletion. Such deletion can significantly shrink the sample size and make it extremely difficult to detect small effects. Hence, some software programs allow the option of replacing missing values with an estimated value via a process called imputation. For instance, if the missing value is one item in a multi-item scale, the imputed value may be the average of the respondent’s responses to the remaining items on that scale. If the missing value belongs to a single-item scale, many researchers use the average of other respondents’ responses to that item as the imputed value. Such imputation may be biased if the missing value is of a systematic nature rather than a random nature. Two methods that can produce relatively unbiased estimates for imputation are maximum likelihood procedures and multiple imputation methods, both of which are supported in popular software programs such as SPSS and SAS.
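As a sketch of the two approaches described above, both listwise deletion and simple mean imputation are one-liners in pandas; the column names and the -999 missing-value sentinel are assumptions for the example:

```python
# Listwise deletion vs. mean imputation of missing values.
import numpy as np
import pandas as pd

df = pd.DataFrame({"q1": [5, 3, -999, 4], "q2": [2, -999, 4, 5]})
df = df.replace(-999, np.nan)   # treat the sentinel code as a missing value

listwise = df.dropna()          # drop any row containing a missing value
imputed = df.fillna(df.mean())  # replace each missing value with its column mean
print(listwise)
print(imputed)
```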

Data transformation. Sometimes, it is necessary to transform data values before they can be meaningfully interpreted. For instance, reverse-coded items, which convey the opposite meaning of their underlying construct, should be reversed (e.g., in a 1-7 interval scale, 8 minus the observed value will reverse the value) before they can be compared or combined with items that are not reverse-coded. Other kinds of transformations may include creating scale measures by adding individual scale items, creating a weighted index from a set of observed measures, and collapsing multiple values into fewer categories (e.g., collapsing incomes into income ranges).
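For instance, reverse-coding a 1-7 scale item is a one-line transformation:

```python
# Reverse-code 1-7 Likert responses: 8 minus the observed value.
scores = [1, 4, 7, 2]
print([8 - s for s in scores])  # [7, 4, 1, 6]
```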

Univariate Analysis

Univariate analysis, or analysis of a single variable, refers to a set of statistical techniques that can describe the general properties of one variable. Univariate statistics include: (1) frequency distribution, (2) central tendency, and (3) dispersion. The frequency distribution of a variable is a summary of the frequency (or percentages) of individual values or ranges of values for that variable. For instance, we can measure how many times a sample of respondents attend religious services (as a measure of their “religiosity”) using a categorical scale: never, once per year, several times per year, about once a month, several times per month, several times per week, and an optional category for “did not answer.” If we count the number (or percentage) of observations within each category (except “did not answer” which is really a missing value rather than a category), and display it in the form of a table as shown in Figure 14.1, what we have is a frequency distribution. This distribution can also be depicted in the form of a bar chart, as shown on the right panel of Figure 14.1, with the horizontal axis representing each category of that variable and the vertical axis representing the frequency or percentage of observations within each category.

[Figure 14.1. Frequency distribution of religiosity.]
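A frequency distribution like this one can be sketched in a few lines of Python; the category labels follow the text, while the individual responses below are invented:

```python
# Frequency distribution (counts and percentages) of religiosity responses.
from collections import Counter

responses = ["never", "once per year", "never", "about once a month",
             "several times per year", "never", "several times per week",
             "several times per month", "once per year", "never"]

n = len(responses)
for category, count in Counter(responses).most_common():
    print(f"{category}: {count} ({100 * count / n:.0f}%)")
```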

With very large samples of independent and random observations, the frequency distribution tends to follow a bell-shaped curve (a smoothed version of the bar chart), similar to that shown in Figure 14.2, where most observations are clustered toward the center of the range of values, with fewer and fewer observations toward the extreme ends of the range. Such a curve is called a normal distribution.

Central tendency is an estimate of the center of a distribution of values. There are three major estimates of central tendency: mean, median, and mode. The arithmetic mean (often simply called the “mean”) is the simple average of all values in a given distribution. Consider a set of eight test scores: 15, 22, 21, 18, 36, 15, 25, 15. The arithmetic mean of these values is (15 + 22 + 21 + 18 + 36 + 15 + 25 + 15)/8 = 20.875. Other types of means include the geometric mean (the n-th root of the product of n numbers in a distribution) and the harmonic mean (the reciprocal of the arithmetic mean of the reciprocals of each value in a distribution), but these means are not very popular for statistical analysis of social research data.

The second measure of central tendency, the median, is the middle value within a range of values in a distribution. This is computed by sorting all values in a distribution in increasing order and selecting the middle value. In case there are two middle values (if there is an even number of values in a distribution), the average of the two middle values represents the median. In the above example, the sorted values are: 15, 15, 15, 18, 21, 22, 25, 36. The two middle values are 18 and 21, and hence the median is (18 + 21)/2 = 19.5.

Lastly, the mode is the most frequently occurring value in a distribution of values. In the previous example, the most frequently occurring value is 15, which is the mode of the above set of test scores. Note that any value estimated from a sample, such as the mean, median, or mode, is called a statistic.
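These estimates are easy to verify with Python’s statistics module:

```python
# Central tendency estimates for the eight test scores discussed above.
import statistics

scores = [15, 22, 21, 18, 36, 15, 25, 15]
print(statistics.mean(scores))    # 20.875
print(statistics.median(scores))  # 19.5 (average of the middle values 18 and 21)
print(statistics.mode(scores))    # 15
```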

Dispersion refers to the way values are spread around the central tendency, for example, how tightly or how widely the values are clustered around the mean. Two common measures of dispersion are the range and the standard deviation. The range is the difference between the highest and lowest values in a distribution. The range in our previous example is 36 − 15 = 21.

The range is particularly sensitive to the presence of outliers. For instance, if the highest value in the above distribution were 85 and the other values remained the same, the range would be 85 − 15 = 70. The standard deviation, the second measure of dispersion, corrects for such outliers by using a formula that takes into account how close or how far each value lies from the distribution mean:

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
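In code, the same computation is a one-liner; a quick sketch for the test scores above:

```python
# Range and sample standard deviation for the eight test scores above.
import statistics

scores = [15, 22, 21, 18, 36, 15, 25, 15]
print(max(scores) - min(scores))  # range: 21
print(statistics.stdev(scores))   # sample standard deviation, about 7.16
```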

[Figure 14.2. Normal distribution.]

[Table 14.1. Hypothetical data on age and self-esteem.]

Bivariate Analysis

Bivariate analysis examines how two variables are related to each other; the most common bivariate statistic is the bivariate correlation, a number between −1 and +1 denoting the strength and direction of the relationship between two variables. Consider the hypothetical data in Table 14.1. The two variables in this dataset are age (x) and self-esteem (y). Age is a ratio-scale variable, while self-esteem is an average score computed from a multi-item self-esteem scale measured using a 7-point Likert scale, ranging from “strongly disagree” to “strongly agree.” The histogram of each variable is shown on the left side of Figure 14.3. The formula for calculating the bivariate correlation is:

$r_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$

where $s_x$ and $s_y$ are the sample standard deviations of x and y.

[Figure 14.3. Histogram and correlation plot of age and self-esteem.]

After computing bivariate correlation, researchers are often interested in knowing whether the correlation is significant (i.e., a real one) or caused by mere chance. Answering such a question would require testing the following hypothesis:

H₀: r = 0

H₁: r ≠ 0

H₀ is called the null hypothesis, and H₁ is called the alternative hypothesis (sometimes also represented as Hₐ). Although they may seem like two hypotheses, H₀ and H₁ together represent a single hypothesis, since they are direct opposites of each other. We are interested in testing H₁ rather than H₀. Also note that H₁ is a non-directional hypothesis, since it does not specify whether r is greater than or less than zero. A directional hypothesis would be specified as H₀: r ≤ 0; H₁: r > 0 (if we are testing for a positive correlation). Significance testing of a directional hypothesis is done using a one-tailed t-test, while that of a non-directional hypothesis uses a two-tailed t-test.

In statistical testing, the alternative hypothesis cannot be tested directly. Rather, it is tested indirectly by rejecting the null hypothesis with a certain level of probability. Statistical testing is always probabilistic, because we are never sure whether our inferences, based on sample data, apply to the population, since our sample never equals the population. The probability that a statistical inference is due to pure chance is called the p-value. The p-value is compared with the significance level (α), which represents the maximum level of risk that we are willing to take that our inference is incorrect. For most statistical analyses, α is set to 0.05. A p-value less than α = 0.05 indicates that we have enough statistical evidence to reject the null hypothesis and thereby indirectly accept the alternative hypothesis. If p > 0.05, we do not have adequate statistical evidence to reject the null hypothesis or accept the alternative hypothesis.

The easiest way to test the above hypothesis is to look up the critical value of r in statistical tables available in any standard statistics textbook or on the Internet (most software programs also perform significance testing). The critical value of r depends on our desired significance level (α = 0.05), the degrees of freedom (df), and whether the test is one-tailed or two-tailed. The degrees of freedom is the number of values that can vary freely in the calculation of a statistic. In the case of correlation, df simply equals n − 2, or for the data in Table 14.1, df = 20 − 2 = 18. There are two different statistical tables for one-tailed and two-tailed tests. In the two-tailed table, the critical value of r for α = 0.05 and df = 18 is 0.44. For our computed correlation of 0.79 to be significant, it must be larger than 0.44 or smaller than −0.44. Since our computed value of 0.79 is greater than 0.44, we conclude that there is a significant correlation between age and self-esteem in our dataset; in other words, the odds are less than 5% that this correlation is a chance occurrence. Therefore, we can reject the null hypothesis that r = 0, which is an indirect way of saying that the alternative hypothesis r ≠ 0 is probably correct.
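Statistical software performs the same test directly. As an illustrative sketch, scipy’s pearsonr returns both the correlation and its two-tailed p-value; the age and self-esteem values below are made-up stand-ins, since Table 14.1 is not reproduced here:

```python
# Pearson correlation with a two-tailed significance test (H0: r = 0).
from scipy.stats import pearsonr

age         = [15, 18, 21, 25, 30, 34, 38, 42, 47, 55]
self_esteem = [3.2, 3.5, 4.1, 4.4, 4.9, 5.1, 5.6, 5.8, 6.2, 6.5]

r, p = pearsonr(age, self_esteem)
print(r, p)  # reject H0 when p < 0.05
```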

Most research studies involve more than two variables. If there are n variables, then we will have a total of n(n−1)/2 possible correlations between these n variables. Such correlations are easily computed using a software program like SPSS, rather than manually using the formula for correlation (as we did in Table 14.1), and are represented using a correlation matrix, as shown in Table 14.2. A correlation matrix is a matrix that lists the variable names along the first row and the first column, and depicts the bivariate correlation between each pair of variables in the appropriate cell in the matrix. The values along the principal diagonal (from the top left to the bottom right corner) of this matrix are always 1, because any variable is always perfectly correlated with itself. Further, since correlations are non-directional, the correlation between variables V1 and V2 is the same as that between V2 and V1. Hence, the lower triangular matrix (values below the principal diagonal) is a mirror reflection of the upper triangular matrix (values above the principal diagonal), and therefore we often list only the lower triangular matrix for simplicity. If the correlations involve variables measured using interval scales, this specific type of correlation is called the Pearson product-moment correlation.
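As a sketch, pandas builds a correlation matrix in one call; the variable names and values below are hypothetical:

```python
# Pearson correlation matrix for a small hypothetical dataset.
import pandas as pd

df = pd.DataFrame({
    "V1": [1, 2, 3, 4, 5],
    "V2": [2, 1, 4, 3, 5],
    "V3": [5, 4, 3, 2, 1],
})
print(df.corr())  # 1s on the principal diagonal; the matrix is symmetric
```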

Another useful way of presenting bivariate data is cross-tabulation (often abbreviated to cross-tab, and more formally called a contingency table). A cross-tab is a table that describes the frequency (or percentage) of all combinations of two or more nominal or categorical variables. As an example, let us assume that we have observations of gender and grade for a sample of 20 students. Gender is a nominal variable (male/female or M/F), and grade is a categorical variable with three levels (A, B, and C). A simple cross-tabulation of the data may display the joint distribution of gender and grades (i.e., how many students of each gender are in each grade category, as a raw frequency count or as a percentage) in a 2 x 3 matrix. This matrix helps us see whether A, B, and C grades are equally distributed across male and female students. The cross-tab data in Table 14.3 show that the distribution of A grades is biased heavily toward female students: in a sample of 10 male and 10 female students, five female students received an A grade compared to only one male student. In contrast, the distribution of C grades is biased toward male students: three male students received a C grade, compared to only one female student. The distribution of B grades, however, was somewhat uniform, with six male students and five female students. The last row and the last column of this table are called marginal totals because they indicate the totals across each category and are displayed along the margins of the table.

[Table 14.2. A hypothetical correlation matrix for eight variables.]

[Table 14.3. Example of cross-tab analysis.]

Although we can see a distinct pattern of grade distribution between male and female students in Table 14.3, is this pattern real or “statistically significant”? In other words, do the above frequency counts differ from what may be expected from pure chance? To answer this question, we should compute the expected count of observations in each cell of the 2 x 3 cross-tab matrix. This is done by multiplying the marginal column total and the marginal row total for each cell and dividing it by the total number of observations. For example, for the male/A grade cell, expected count = 5 * 10 / 20 = 2.5. In other words, we expected 2.5 male students to receive an A grade, but in reality, only one did. Whether this difference between expected and actual counts is significant can be tested using a chi-square test. The chi-square statistic is computed by summing, across all cells, the squared difference between the observed and expected count divided by the expected count. We can then compare this number to the critical value associated with a desired probability level (p < 0.05) and the degrees of freedom, which is simply (m−1)*(n−1), where m and n are the number of rows and columns respectively. In this example, df = (2 − 1) * (3 − 1) = 2. From standard chi-square tables in any statistics book, the critical chi-square value for p = 0.05 and df = 2 is 5.99. The computed chi-square value, based on our observed data, is 1.00, which is less than the critical value. Hence, we must conclude that the observed grade pattern is not statistically different from the pattern that can be expected by pure chance.
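As an illustrative sketch, scipy’s chi2_contingency runs this test end to end. The 2 x 3 counts below are hypothetical stand-ins of the same shape as Table 14.3, not the table’s actual values:

```python
# Chi-square test of independence for a 2 x 3 gender-by-grade table.
from scipy.stats import chi2_contingency

observed = [[1, 6, 3],   # male:   A, B, C
            [5, 4, 1]]   # female: A, B, C

chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p, dof)  # dof = (2-1)*(3-1) = 2; compare p to 0.05
print(expected)      # expected counts under independence
```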

  • Social Science Research: Principles, Methods, and Practices. Authored by: Anol Bhattacherjee. Provided by: University of South Florida. Located at: http://scholarcommons.usf.edu/oa_textbooks/3/. License: CC BY-NC-SA: Attribution-NonCommercial-ShareAlike

Descriptive Analysis: How-To, Types, Examples

From diagnostic to predictive, there are many different types of data analysis . Perhaps the most straightforward of them is descriptive analysis, which seeks to describe or summarize past and present data, helping to create accessible data insights. In this short guide, we'll review the basics of descriptive analysis, including what exactly it is, what benefits it has, how to do it, as well as some types and examples.

What Is Descriptive Analysis?

Descriptive analysis, also known as descriptive analytics or descriptive statistics, is the process of using statistical techniques to describe or summarize a set of data. As one of the major types of data analysis, descriptive analysis is popular for its ability to generate accessible insights from otherwise uninterpreted data.

Unlike other types of data analysis, descriptive analysis does not attempt to make predictions about the future. Instead, it draws insights solely from past data, by manipulating that data in ways that make it more meaningful.

Benefits of Descriptive Analysis

Descriptive analysis is all about trying to describe or summarize data. Although it doesn't make predictions about the future, it can still be extremely valuable in business environments . This is chiefly because descriptive analysis makes it easier to consume data, which can make it easier for analysts to act on.

Another benefit of descriptive analysis is that it can help to filter out less meaningful data. This is because the statistical techniques used within this type of analysis usually focus on the patterns in data, and not the outliers.

Types of Descriptive Analysis

According to CampusLabs.com, descriptive analysis can be categorized as one of four types. They are measures of frequency, central tendency, dispersion or variation, and position.

Measures of Frequency

In descriptive analysis, it's essential to know how frequently a certain event or response occurs. This is the purpose of measures of frequency, like a count or percent. For example, consider a survey where 1,000 participants are asked about their favourite ice cream flavor. A list of 1,000 responses would be difficult to consume, but the data can be made much more accessible by measuring how many times a certain flavor was selected.
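A minimal sketch of such frequency measures, using a handful of invented responses:

```python
# Counts and proportions of survey responses.
import pandas as pd

flavors = pd.Series(["vanilla", "chocolate", "vanilla",
                     "strawberry", "chocolate", "vanilla"])
print(flavors.value_counts())                # counts per flavor
print(flavors.value_counts(normalize=True))  # proportions per flavor
```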

Measures of Central Tendency

In descriptive analysis, it's also worth knowing the central (or average) event or response. Common measures of central tendency include the three averages — mean, median, and mode. As an example, consider a survey in which the height of 1,000 people is measured. In this case, the mean average would be a very helpful descriptive metric.

Measures of Dispersion

Sometimes, it may be worth knowing how data is distributed across a range. To illustrate this, consider the average height in a sample of two people. If both individuals are six feet tall, the average height is six feet. However, if one individual is five feet tall and the other is seven feet tall, the average height is still six feet. In order to measure this kind of distribution, measures of dispersion like range or standard deviation can be employed.
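The height example translates directly into code:

```python
# Same mean, different dispersion.
import statistics

same   = [6.0, 6.0]
spread = [5.0, 7.0]
print(statistics.mean(same), statistics.mean(spread))      # 6.0 6.0
print(max(same) - min(same), max(spread) - min(spread))    # ranges: 0.0 and 2.0
print(statistics.pstdev(same), statistics.pstdev(spread))  # 0.0 and 1.0
```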

Measures of Position

Last of all, descriptive analysis can involve identifying the position of one event or response in relation to others. This is where measures like percentiles and quartiles can be used.
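A short sketch with numpy, using invented scores:

```python
# Percentiles and quartiles as measures of position.
import numpy as np

scores = np.array([55, 60, 64, 70, 72, 75, 80, 85, 90, 98])
print(np.percentile(scores, [25, 50, 75]))  # quartiles Q1, Q2 (the median), Q3
print(np.percentile(scores, 90))            # 90th percentile
```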


How to Do Descriptive Analysis

Like many types of data analysis, descriptive analysis can be quite open-ended. In other words, it's up to you what you want to look for in your analysis. With that said, the process of descriptive analysis usually consists of the same few steps.

  • Collect data

The first step in any type of data analysis is to collect the data. This can be done in a variety of ways, but surveys and good old fashioned measurements are often used.

  • Clean data

Another important step in descriptive and other types of data analysis is to clean the data. This is because data may be formatted in inaccessible ways, which will make it difficult to manipulate with statistics. Cleaning data may involve changing its textual format, categorizing it, and/or removing outliers.

  • Apply methods

Finally, descriptive analysis involves applying the chosen statistical methods so as to draw the desired conclusions. What methods you choose will depend on the data you are dealing with and what you are looking to determine. If in doubt, review the four types of descriptive analysis methods explained above.

When to Do Descriptive Analysis

Descriptive analysis is often used when reviewing any past or present data. This is because raw data is difficult to consume and interpret, while the metrics offered by descriptive analysis are much more focused.

Descriptive analysis can also be conducted as the precursor to diagnostic or predictive analysis , providing insights into what has happened in the past before attempting to explain why it happened or predicting what will happen in the future.

Descriptive Analysis Example

As an example of descriptive analysis, consider an insurance company analyzing its customer base.

The insurance company may know certain traits about its customers, such as their gender, age, and nationality. To gain a better profile of their customers, the insurance company can apply descriptive analysis.

Measures of frequency can be used to identify how many customers are under a certain age; measures of central tendency can be used to identify who most of their customers are; measures of dispersion can be used to identify the variation in, for example, the age of their customers; finally, measures of position can be used to compare segments of customers based on specific traits.
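A minimal sketch of this customer profile in pandas, with invented data:

```python
# Frequency, central tendency, dispersion, and position on a customer table.
import pandas as pd

customers = pd.DataFrame({
    "gender": ["F", "M", "F", "F", "M", "M", "F", "M"],
    "age":    [23, 45, 31, 52, 38, 29, 61, 47],
})
print((customers["age"] < 40).sum())            # frequency: customers under 40
print(customers["age"].mean())                  # central tendency: mean age
print(customers["age"].std())                   # dispersion: standard deviation
print(customers["age"].quantile([0.25, 0.75]))  # position: quartiles
```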

Final Thoughts

Descriptive analysis is a popular type of data analysis. It's often conducted before diagnostic or predictive analysis, as it simply aims to describe and summarize past data.

To do so, descriptive analysis uses a variety of statistical techniques, including measures of frequency, central tendency, dispersion, and position. How exactly you conduct descriptive analysis will depend on what you are looking to find out, but the steps usually involve collecting, cleaning, and finally analyzing data.

In any case, this business analysis process is invaluable when working with data.


  • Descriptive Research Designs: Types, Examples & Methods


One of the components of research is getting enough information about the research problem (the what, how, when, and where answers), which is why descriptive research is an important type of research. It is very useful when the aim of a study is to identify characteristics, frequencies, trends, correlations, and categories.

This research method takes a problem with little to no relevant information and gives it a fitting description using qualitative and quantitative research methods. Descriptive research aims to accurately describe a research problem.

In the subsequent sections, we will be explaining what descriptive research means, its types, examples, and data collection methods.

What is Descriptive Research?

Descriptive research is a type of research that describes a population, situation, or phenomenon that is being studied. It focuses on answering the how, what, when, and where questions of a research problem, rather than the why.

This is mainly because it is important to have a proper understanding of what a research problem is about before investigating why it exists in the first place. 

For example, an investor considering an investment in the ever-changing Amsterdam housing market needs to understand what the current state of the market is, how it changes (increasing or decreasing), and when it changes (time of the year) before asking for the why. This is where descriptive research comes in.

What Are The Types of Descriptive Research?

Descriptive research is classified into different types according to the kind of approach that is used in conducting descriptive research. The different types of descriptive research are highlighted below:

  • Descriptive-survey

Descriptive survey research uses surveys to gather data about varying subjects. The aim of this data is to determine the extent to which different conditions obtain among these subjects.

For example, a researcher wants to determine the qualifications of employed professionals in Maryland. He uses a survey as his research instrument, and each item on the survey related to qualifications takes a Yes/No answer.

This way, the researcher can describe the qualifications possessed by the employed demographics of this community. 

  • Descriptive-normative survey

This is an extension of the descriptive survey, with the addition being the normative element. In the descriptive-normative survey, the results of the study should be compared with the norm.

For example, an organization that wishes to assess the skills of a team of employees may have them take a skills test. The skills test is the evaluation tool in this case, and the result of the test is compared with the norm for each role.

If the score of the team is one standard deviation above the mean, it is very satisfactory; if within the mean, satisfactory; and if one standard deviation below the mean, unsatisfactory (see the sketch below).
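A minimal sketch of this rating rule, with made-up norm values:

```python
# Classify a score by its distance from the norm in standard deviations.
def rate(score, norm_mean, norm_sd):
    z = (score - norm_mean) / norm_sd
    if z >= 1:
        return "very satisfactory"
    if z > -1:
        return "satisfactory"
    return "unsatisfactory"

print(rate(85, norm_mean=70, norm_sd=10))  # very satisfactory (z = 1.5)
print(rate(72, norm_mean=70, norm_sd=10))  # satisfactory (z = 0.2)
print(rate(55, norm_mean=70, norm_sd=10))  # unsatisfactory (z = -1.5)
```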

  • Descriptive-status

This is a quantitative description technique that seeks to answer questions about real-life situations. For example, a researcher may investigate the income of the employees in a company and its relationship with their performance.

A survey will be carried out to gather enough data about the income of the employees, then their performance will be evaluated and compared to their income. This will help determine whether a higher income means better performance and low income means lower performance or vice versa.

  • Descriptive-analysis

The descriptive-analysis method of research describes a subject by further analyzing it, which in this case involves dividing it into 2 parts. For example, the HR personnel of a company that wishes to analyze the job role of each employee may divide the employees into those that work at the headquarters in the US and those that work from the Oslo, Norway office.

A questionnaire is devised to analyze the job role of employees with similar salaries and who work in similar positions.

  • Descriptive classification

This method is employed in the biological sciences for the classification of plants and animals. A researcher who wishes to classify sea animals into different species will collect samples from various research stations, then classify them accordingly.

  • Descriptive-comparative

In descriptive-comparative research, the researcher considers 2 variables that are not manipulated, and establishes a formal procedure to conclude that one is better than the other. For example, an examination body wants to determine the better method of conducting tests between paper-based and computer-based tests.

A random sample of potential participants of the test may be asked to use the 2 different methods, and factors like failure rates, time factors, and others will be evaluated to arrive at the best method.

  • Correlative Survey

Correlative surveys are used to determine whether the relationship between 2 variables is positive, negative, or neutral. That is, whether two variables, say X and Y, are directly proportional, inversely proportional, or not related to each other.

Examples of Descriptive Research

There are different examples of descriptive research that may be highlighted from its types, uses, and applications. However, we will restrict ourselves to only 3 distinct examples in this article.

  • Comparing Student Performance:

An academic institution may wish to compare the performance of its junior high school students in English language and Mathematics. This may be used to classify students into 2 major groups, with one group going on to study science courses while the other studies courses in the Arts & Humanities field.

Students who are more proficient in mathematics will be encouraged to go into STEM and vice versa. Institutions may also use this data to identify students’ weak points and work on ways to assist them.

  • Scientific Classification

During the major scientific classification of plants, animals, and periodic table elements, the characteristics and components of each subject are evaluated and used to determine how they are classified.

For example, living things may be classified into kingdom Plantae or kingdom Animalia, depending on their nature. Further classification may group animals into mammals, fish, vertebrates, invertebrates, etc.

All these classifications are made as a result of descriptive research, which describes what they are.

  • Human Behavior

When studying human behaviour based on a factor or event, the researcher observes the characteristics, behaviour, and reactions, then uses them to draw conclusions. A company willing to sell to its target market needs to first study the behaviour of that market.

This may be done by observing how its target audience reacts to a competitor’s product, then using that to infer their behaviour.

What are the Characteristics of Descriptive Research?  

The characteristics of descriptive research can be highlighted from its definition, applications, data collection methods, and examples. Some characteristics of descriptive research are:

  • Quantitativeness

Descriptive research uses a quantitative research method by collecting quantifiable information to be used for statistical analysis of the population sample. This is very common when dealing with research in the physical sciences.

  • Qualitativeness

It can also be carried out using the qualitative research method, to properly describe the research problem. This is because descriptive research is more explanatory than exploratory or experimental.

  • Uncontrolled variables

In descriptive research, researchers cannot control the variables like they do in experimental research.

  • The basis for further research

The results of descriptive research can be further analyzed and used in other research methods. It can also inform the next line of research, including the research method that should be used.

This is because it provides basic information about the research problem, which may give birth to other questions like why a particular thing is the way it is.

Why Use Descriptive Research Design?  

Descriptive research can be used to investigate the background of a research problem and get the information needed to carry out further research. It is used in multiple ways by different organizations, especially for getting the required information about their target audience.

  • Define subject characteristics :

It is used to determine the characteristics of the subjects, including their traits, behaviour, opinion, etc. This information may be gathered with the use of surveys, which are shared with the respondents who in this case, are the research subjects.

For example, a survey evaluating the number of hours millennials in a community spend on the internet weekly will help a service provider make informed business decisions regarding the market potential of the community.

  • Measure Data Trends

It helps to measure the changes in data over time through statistical methods. Consider the case of individuals who want to invest in stock markets: they evaluate the changes in the prices of the available stocks to make an informed investment decision.

However, brokerage companies are the ones who carry out the descriptive research process; individuals simply view the data trends and make decisions.

  • Compare Different Demographics

Descriptive research is also used to compare how different demographics respond to certain variables. For example, an organization may study how people with different income levels react to the launch of a new Apple phone.

This kind of research may take a survey that will help determine which group of individuals are purchasing the new Apple phone. Do the low-income earners also purchase the phone, or only the high-income earners do?

Further research using another technique will explain why low-income earners are purchasing the phone even though they can barely afford it. This will help inform strategies that will lure other low-income earners and increase company sales.

  • Validate existing conditions

When you are not sure about the validity of an existing condition, you can use descriptive research to ascertain the underlying patterns of the research object. This is because descriptive research methods make an in-depth analysis of each variable before drawing conclusions.

  • Conducted Over Time

Descriptive research is conducted over a period of time to ascertain the changes observed at each point in time. The more often it is conducted, the more reliable the conclusions will be.

What are the Disadvantages of Descriptive Research?  

  • Response and Non-response Bias

Respondents may decide not to respond to questions, or may give incorrect responses if they feel the questions are too personal. When researchers use observational methods, respondents may also behave in a particular manner because they feel they are being watched.

  • Researcher bias: The researcher may influence the results of the research due to personal opinion or bias towards a particular subject. For example, a stockbroker who also has a business of his own may try to lure investors into investing in his own company by manipulating results.
  • A case study or sample taken from a large population may not be representative of the whole population.
  • Limited scope: The scope of descriptive research is limited to the “what” of research; it provides no information on “why”, which limits the depth of the findings.

What are the Data Collection Methods in Descriptive Research?  

There are 3 main data collection methods in descriptive research, namely: the observational method, the case study method, and survey research.

1. Observational Method

The observational method allows researchers to collect data based on their view of the behaviour and characteristics of the respondents, without the respondents directly providing input. It is often used in market research, psychology, and other social science research to understand human behaviour.

It is also an important aspect of research in the physical sciences, and one of the most effective methods of conducting descriptive research. The observational process can be either quantitative or qualitative.

Quantitative observation involves the objective collection of numerical data, whose results can be analyzed using numerical and statistical methods.

Qualitative observation, on the other hand, involves the monitoring of characteristics rather than the measurement of numbers. The researcher observes from a distance, records the observations, and uses them to inform conclusions.

2. Case Study Method

A case study is a sample group (an individual, a group of people, organizations, events, etc.) whose characteristics are used to describe the characteristics of a larger group in which the case study is a subgroup. The information gathered from investigating a case study may be generalized to serve the larger group.

This generalization may, however, be risky, because case studies are not sufficient to make accurate predictions about larger groups; they are a poor basis for generalization.

3. Survey Research

This is a very popular data collection method in research designs. In survey research, researchers create a survey or questionnaire and distribute it to respondents who give answers.

Generally, it is used to obtain quick information directly from the primary source and for conducting rigorous quantitative and qualitative research. In some cases, survey research uses a blend of both qualitative and quantitative strategies.

Survey research can be carried out both online and offline using the following methods:

  • Online Surveys: This is a cheap method of carrying out surveys and getting enough responses. It can be carried out using Formplus, an online survey builder. Formplus has amazing tools and features that will help increase response rates.
  • Offline Surveys: This includes paper forms, mobile offline forms , and SMS-based forms.

What Are The Differences Between Descriptive and Correlational Research?  

Before going into the differences between descriptive and correlational research, we need a proper understanding of what correlational research is about. A summary of correlational research is given below.

Correlational research is a type of descriptive research, which is used to measure the relationship between two variables, with the researcher having no control over them. It aims to find whether there is a positive correlation (both variables change in the same direction), a negative correlation (the variables change in opposite directions), or zero correlation (there is no relationship between the variables); a short code sketch following the list below illustrates the three cases.

Correlational research may be used in two situations:

(i) when trying to find out if there is a relationship between two variables, and

(ii) when a causal relationship is suspected between two variables, but it is impractical or unethical to conduct experimental research that manipulates one of the variables. 
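To make the three directions concrete, here is a minimal Python sketch (it assumes Python 3.10+ for statistics.correlation, and the paired values are invented for illustration, not taken from any real study):

```python
# A minimal sketch of checking the direction of a correlation between
# two variables; requires Python 3.10+ for statistics.correlation.
from statistics import correlation

hours_studied = [1, 2, 3, 4, 5, 6]       # invented illustrative data
exam_scores = [52, 55, 61, 70, 74, 80]   # tends to rise with hours studied

r = correlation(hours_studied, exam_scores)  # Pearson's r
if r > 0:
    print(f"r = {r:.3f}: positive correlation (same direction)")
elif r < 0:
    print(f"r = {r:.3f}: negative correlation (opposite directions)")
else:
    print(f"r = {r:.3f}: zero correlation (no relationship)")
```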

Below are some of the differences between correlational and descriptive research:

  • Definitions :

Descriptive research is a type of research that provides an in-depth understanding of the study population, while correlational research is the type of research that measures the relationship between two variables.

  • Characteristics :

Descriptive research provides descriptive data explaining what the research subject is about, while correlational research explores the relationship between variables rather than describing them.

  • Predictions :

Predictions cannot be made in descriptive research, while correlational research accommodates the possibility of making predictions.

Descriptive Research vs. Causal Research

Descriptive research and causal research are both research methodologies; however, the former focuses on a subject’s behaviors while the latter focuses on cause-and-effect relationships. To buttress the above point, descriptive research aims to describe and document the characteristics, behaviors, or phenomena of a specific population or situation.

It focuses on providing an accurate and detailed account of an already existing state of affairs between variables. Descriptive research answers the questions of “what,” “where,” “when,” and “how” without attempting to establish any causal relationships or explain any underlying factors that might have caused the behavior.

Causal research, on the other hand, seeks to determine cause-and-effect relationships between variables. It aims to point out the factors that influence or cause a particular result or behavior. Causal research involves manipulating variables, controlling conditions or a subgroup, and observing the resulting effects. The primary objective of causal research is to establish a cause-effect relationship and provide insights into why certain phenomena happen the way they do.

Descriptive Research vs. Analytical Research

Descriptive research provides a detailed and comprehensive account of a specific situation or phenomenon. It focuses on describing and summarizing data without making inferences or attempting to explain underlying factors or causes.

It is primarily concerned with providing an accurate and objective representation of the subject of research. Analytical research, by contrast, goes beyond the description of the phenomena and seeks to analyze and interpret data to discover patterns, relationships, or underlying factors.

It examines the data critically, applies statistical techniques or other analytical methods, and draws conclusions based on the discovery. Analytical research also aims to explore the relationships between variables and understand the underlying mechanisms or processes involved.

Descriptive Research vs. Exploratory Research

Descriptive research is a research method that focuses on providing a detailed and accurate account of a specific situation, group, or phenomenon. This type of research describes the characteristics, behaviors, or relationships within the given context without looking for an underlying cause. 

Descriptive research typically involves collecting and analyzing quantitative or qualitative data to generate descriptive statistics or narratives. Exploratory research differs from descriptive research because it aims to explore and gain firsthand insights or knowledge into a relatively unexplored or poorly understood topic. 

It focuses on generating ideas, hypotheses, or theories rather than providing definitive answers. Exploratory research is often conducted at the early stages of a research project to gather preliminary information and identify key variables or factors for further investigation. It involves open-ended interviews, observations, or small-scale surveys to gather qualitative data.

Read More – Exploratory Research: What are its Method & Examples?

Descriptive Research vs. Experimental Research

Descriptive research aims to describe and document the characteristics, behaviors, or phenomena of a particular population or situation. It focuses on providing an accurate and detailed account of the existing state of affairs. 

Descriptive research typically involves collecting data through surveys, observations, or existing records and analyzing the data to generate descriptive statistics or narratives. It does not involve manipulating variables or establishing cause-and-effect relationships.

Experimental research, on the other hand, involves manipulating variables and controlling conditions to investigate cause-and-effect relationships. It aims to establish causal relationships by introducing an intervention or treatment and observing the resulting effects. 

Experimental research typically involves randomly assigning participants to different groups, such as control and experimental groups, and measuring the outcomes. It allows researchers to control for confounding variables and draw causal conclusions.

Related – Experimental vs Non-Experimental Research: 15 Key Differences

Descriptive Research vs. Explanatory Research

Descriptive research focuses on providing a detailed and accurate account of a specific situation, group, or phenomenon. It aims to describe the characteristics, behaviors, or relationships within the given context. 

Descriptive research is primarily concerned with providing an objective representation of the subject of study without explaining underlying causes or mechanisms. Explanatory research seeks to explain the relationships between variables and uncover the underlying causes or mechanisms. 

It goes beyond description and aims to understand the reasons or factors that influence a particular outcome or behavior. Explanatory research involves analyzing data, conducting statistical analyses, and developing theories or models to explain the observed relationships.

Descriptive Research vs. Inferential Research

Descriptive research focuses on describing and summarizing data without making inferences or generalizations beyond the specific sample or population being studied. It aims to provide an accurate and objective representation of the subject of study. 

Descriptive research typically involves analyzing data to generate descriptive statistics, such as means, frequencies, or percentages, to describe the characteristics or behaviors observed.

Inferential research, however, involves making inferences or generalizations about a larger population based on a smaller sample. 

It aims to draw conclusions about the population characteristics or relationships by analyzing the sample data. Inferential research uses statistical techniques to estimate population parameters, test hypotheses, and determine the level of confidence or significance in the findings.

Related – Inferential Statistics: Definition, Types + Examples

Conclusion  

The uniqueness of descriptive research partly lies in its ability to explore both quantitative and qualitative research methods. Therefore, when conducting descriptive research, researchers have the opportunity to use a wide variety of techniques that aid the research process.

Descriptive research explores research problems in depth, beyond the surface level, thereby giving a detailed description of the research subject. In that way, it can aid further research in the field, including other research methods .

It is also very useful in solving real-life problems in various fields of social science, physical science, and education.


Descriptive Statistics

Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data.

Descriptive statistics are typically distinguished from inferential statistics . With descriptive statistics you are simply describing what is or what the data shows. With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what’s going on in our data.

Descriptive Statistics are used to present quantitative descriptions in a manageable form. In a research study we may have lots of measures. Or we may measure a large number of people on any measure. Descriptive statistics help us to simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary. For instance, consider a simple number used to summarize how well a batter is performing in baseball, the batting average. This single number is simply the number of hits divided by the number of times at bat (reported to three significant digits). A batter who is hitting .333 is getting a hit one time in every three at bats. One batting .250 is hitting one time in four. The single number describes a large number of discrete events. Or, consider the scourge of many students, the Grade Point Average (GPA). This single number describes the general performance of a student across a potentially wide range of course experiences.

Every time you try to describe a large set of observations with a single indicator you run the risk of distorting the original data or losing important detail. The batting average doesn’t tell you whether the batter is hitting home runs or singles. It doesn’t tell whether she’s been in a slump or on a streak. The GPA doesn’t tell you whether the student was in difficult courses or easy ones, or whether they were courses in their major field or in other disciplines. Even given these limitations, descriptive statistics provide a powerful summary that may enable comparisons across people or other units.

Univariate Analysis

Univariate analysis involves the examination across cases of one variable at a time. There are three major characteristics of a single variable that we tend to look at:

  • the distribution
  • the central tendency
  • the dispersion

In most situations, we would describe all three of these characteristics for each of the variables in our study.

The Distribution

The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of persons who had each value. For instance, a typical way to describe the distribution of college students is by year in college, listing the number or percent of students at each of the four years. Or, we describe gender by listing the number or percent of males and females. In these cases, the variable has few enough values that we can list each one and summarize how many sample cases had the value. But what do we do for a variable like income or GPA? With these variables there can be a large number of possible values, with relatively few people having each one. In this case, we group the raw scores into categories according to ranges of values. For instance, we might look at GPA according to the letter grade ranges. Or, we might group income into four or five ranges of income values.

Category             Percent
Under 35 years old   9%
36–45                21%
46–55                45%
56–65                19%
66+                  6%

One of the most common ways to describe a single variable is with a frequency distribution. Depending on the particular variable, all of the data values may be represented, or you may group the values into categories first (e.g. with age, price, or temperature variables, it would usually not be sensible to determine the frequencies for each value; rather, the values are grouped into ranges and the frequencies determined). Frequency distributions can be depicted in two ways, as a table or as a graph. The table above shows an age frequency distribution with five categories of age ranges defined. The same frequency distribution can be depicted in a graph as shown in Figure 1. This type of graph is often referred to as a histogram or bar chart.

Distributions may also be displayed using percentages. For example, you could use percentages to describe the:

  • percentage of people in different income levels
  • percentage of people in different age ranges
  • percentage of people in different ranges of standardized test scores
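As a rough illustration, here is a minimal Python sketch that builds a grouped frequency distribution like the age table above, reporting both counts and percentages (the ages and bins are invented for the example):

```python
# A minimal sketch of a grouped frequency distribution: bin a continuous
# variable (age) into ranges and count the cases in each range.
from collections import Counter

ages = [23, 38, 41, 47, 52, 49, 58, 63, 51, 44, 67, 39, 55, 48, 61, 34]

def age_band(age):
    if age <= 35:
        return "Under 35 years old"
    if age <= 45:
        return "36-45"
    if age <= 55:
        return "46-55"
    if age <= 65:
        return "56-65"
    return "66+"

counts = Counter(age_band(a) for a in ages)
total = sum(counts.values())
for band in ["Under 35 years old", "36-45", "46-55", "56-65", "66+"]:
    n = counts.get(band, 0)
    print(f"{band:>20}: {n:2d}  ({n / total:.0%})")
```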

Central Tendency

The central tendency of a distribution is an estimate of the “center” of a distribution of values. There are three major types of estimates of central tendency:

The Mean or average is probably the most commonly used method of describing central tendency. To compute the mean, all you do is add up all the values and divide by the number of values. For example, the mean or average quiz score is determined by summing all the scores and dividing by the number of students taking the exam. Consider, for example, the test score values:

15, 20, 21, 20, 36, 15, 25, 15

The sum of these 8 values is 167, so the mean is 167/8 = 20.875.

The Median is the score found at the exact middle of the set of values. One way to compute the median is to list all scores in numerical order, and then locate the score in the center of the sample. For example, if there are 500 scores in the list, score #250 would be the median. If we order the 8 scores shown above, we would get:

15, 15, 15, 20, 20, 21, 25, 36

There are 8 scores, and scores #4 and #5 represent the halfway point. Since both of these scores are 20, the median is 20. If the two middle scores had different values, you would have to interpolate to determine the median.

The Mode is the most frequently occurring value in the set of scores. To determine the mode, you might again order the scores as shown above, and then count each one. The most frequently occurring value is the mode. In our example, the value 15 occurs three times and is the mode. In some distributions there is more than one modal value. For instance, in a bimodal distribution there are two values that occur most frequently.

Notice that for the same set of 8 scores we got three different values (20.875, 20, and 15) for the mean, median, and mode respectively. If the distribution is truly normal (i.e. bell-shaped), the mean, median, and mode are all equal to each other.

Dispersion refers to the spread of the values around the central tendency. There are two common measures of dispersion, the range and the standard deviation. The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 - 15 = 21.

The Standard Deviation is a more accurate and detailed estimate of dispersion, because an outlier can greatly exaggerate the range (as is true in this example, where the single outlier value of 36 stands apart from the rest of the values). The Standard Deviation shows the relation that the set of scores has to the mean of the sample. Again, let's take the set of scores:

15, 20, 21, 20, 36, 15, 25, 15

To compute the standard deviation, we first find the distance between each value and the mean. We know from above that the mean is 20.875. So, the differences from the mean are:

-5.875, -0.875, 0.125, -0.875, 15.125, -5.875, 4.125, -5.875

Notice that values that are below the mean have negative discrepancies and values above it have positive ones. Next, we square each discrepancy:

34.515625, 0.765625, 0.015625, 0.765625, 228.765625, 34.515625, 17.015625, 34.515625

Now, we take these “squares” and sum them to get the Sum of Squares (SS) value. Here, the sum is 350.875. Next, we divide this sum by the number of scores minus 1. Here, the result is 350.875 / 7 = 50.125. This value is known as the variance. To get the standard deviation, we take the square root of the variance (remember that we squared the deviations earlier). This would be SQRT(50.125) = 7.079901129253.

Although this computation may seem convoluted, it’s actually quite simple. To see this, consider the formula for the standard deviation:

s = √[ Σ(X − X̄)² / (n − 1) ]

where:

  • X is each score,
  • X̄ is the mean (or average),
  • n is the number of values,
  • Σ means we sum across the values.

In the top part of the ratio, the numerator, we see that each score has the mean subtracted from it, the difference is squared, and the squares are summed. In the bottom part, we take the number of scores minus 1. The ratio is the variance and the square root is the standard deviation. In English, we can describe the standard deviation as:

the square root of the sum of the squared deviations from the mean divided by the number of scores minus one.

Although we can calculate these univariate statistics by hand, it gets quite tedious when you have more than a few values and variables. Every statistics program is capable of calculating them easily for you. For instance, I put the eight scores into SPSS and got the following table as a result:

Metric               Value
N                    8
Mean                 20.8750
Median               20.0000
Mode                 15.00
Standard Deviation   7.0799
Variance             50.1250
Range                21.00

which confirms the calculations I did by hand above.
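The same numbers can also be reproduced with a few lines of Python's standard library; here is a minimal sketch using the eight scores from this example:

```python
# Reproducing the hand calculations above with Python's statistics module.
from statistics import mean, median, mode, stdev, variance

scores = [15, 20, 21, 20, 36, 15, 25, 15]

print("N:", len(scores))                    # 8
print("Mean:", mean(scores))                # 20.875
print("Median:", median(scores))            # 20.0
print("Mode:", mode(scores))                # 15
print("Std. deviation:", stdev(scores))     # ~7.0799 (sample SD, divides by n - 1)
print("Variance:", variance(scores))        # 50.125
print("Range:", max(scores) - min(scores))  # 21
```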

The standard deviation allows us to reach some conclusions about specific scores in our distribution. Assuming that the distribution of scores is normal or bell-shaped (or close to it!), the following conclusions can be reached:

  • approximately 68% of the scores in the sample fall within one standard deviation of the mean
  • approximately 95% of the scores in the sample fall within two standard deviations of the mean
  • approximately 99% of the scores in the sample fall within three standard deviations of the mean

For instance, since the mean in our example is 20.875 and the standard deviation is 7.0799 , we can from the above statement estimate that approximately 95% of the scores will fall in the range of 20.875-(2*7.0799) to 20.875+(2*7.0799) or between 6.7152 and 35.0348 . This kind of information is a critical stepping stone to enabling us to compare the performance of an individual on one variable with their performance on another, even when the variables are measured on entirely different scales.
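That two-standard-deviation range is easy to verify in code; a quick sketch using the mean and standard deviation from this example:

```python
# Verifying the ~95% interval quoted above.
m, sd = 20.875, 7.0799
low, high = m - 2 * sd, m + 2 * sd
print(f"~95% of scores expected between {low:.4f} and {high:.4f}")
# ~95% of scores expected between 6.7152 and 35.0348
```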


What is Descriptive Research? Definition, Methods, Types and Examples

Descriptive research is a methodological approach that seeks to depict the characteristics of a phenomenon or subject under investigation. In scientific inquiry, it serves as a foundational tool for researchers aiming to observe, record, and analyze the intricate details of a particular topic. This method provides a rich and detailed account that aids in understanding, categorizing, and interpreting the subject matter.

Descriptive research design is widely employed across diverse fields, and its primary objective is to systematically observe and document all variables and conditions influencing the phenomenon.

After this descriptive research definition, let’s look at this example. Consider a researcher working on climate change adaptation, who wants to understand water management trends in an arid village in a specific study area. She must conduct a demographic survey of the region, gather population data, and then conduct descriptive research on this demographic segment. The study will then uncover details on “what are the water management practices and trends in village X.” Note, however, that it will not cover any investigative information about “why” the patterns exist.


What is descriptive research?

If you’ve been wondering “What is descriptive research,” we’ve got you covered in this post! In a nutshell, descriptive research is a research method that helps a researcher describe a population, circumstance, or phenomenon. It can help answer what , where , when and how questions, but not why questions. In other words, it does not involve changing the study variables and does not seek to establish cause-and-effect relationships.


Importance of descriptive research

Now, let’s delve into the importance of descriptive research. This research method acts as the cornerstone for various academic and applied disciplines. Its primary significance lies in its ability to provide a comprehensive overview of a phenomenon, enabling researchers to gain a nuanced understanding of the variables at play. This method aids in forming hypotheses, generating insights, and laying the groundwork for further in-depth investigations. The following points further illustrate its importance:

Provides insights into a population or phenomenon: Descriptive research furnishes a comprehensive overview of the characteristics and behaviors of a specific population or phenomenon, thereby guiding and shaping the research project.

Offers baseline data: The data acquired through this type of research acts as a reference for subsequent investigations, laying the groundwork for further studies.

Allows validation of sampling methods: Descriptive research validates sampling methods, aiding in the selection of the most effective approach for the study.

Helps reduce time and costs: It is cost-effective and time-efficient, making this an economical means of gathering information about a specific population or phenomenon.

Ensures replicability: Descriptive research is easily replicable, ensuring a reliable way to collect and compare information from various sources.

When to use descriptive research design?

Determining when to use descriptive research depends on the nature of the research question. Before diving into the reasons behind an occurrence, understanding the how, when, and where aspects is essential. Descriptive research design is a suitable option when the research objective is to discern characteristics, frequencies, trends, and categories without manipulating variables. It is therefore often employed in the initial stages of a study before progressing to more complex research designs. To put it another way, descriptive research precedes the hypotheses of explanatory research. It is particularly valuable when there is limited existing knowledge about the subject.

Some examples are as follows, highlighting that these questions would arise before a clear outline of the research plan is established:

  • In the last two decades, what changes have occurred in patterns of urban gardening in Mumbai?
  • What are the differences in climate change perceptions of farmers in coastal versus inland villages in the Philippines?

Characteristics of descriptive research

Coming to the characteristics of descriptive research, this approach is characterized by its focus on observing and documenting the features of a subject. Specific characteristics are as below.

  • Quantitative nature: Some descriptive research types involve quantitative research methods to gather quantifiable information for statistical analysis of the population sample.
  • Qualitative nature: Some descriptive research examples include those using the qualitative research method to describe or explain the research problem.
  • Observational nature: This approach is non-invasive and observational because the study variables remain untouched. Researchers merely observe and report, without introducing interventions that could impact the subject(s).
  • Cross-sectional nature: In descriptive research, different sections belonging to the same group are studied, providing a “snapshot” of sorts.
  • Springboard for further research: The data collected are further studied and analyzed using different research techniques. This approach helps guide the suitable research methods to be employed.

Types of descriptive research

There are various descriptive research types, each suited to different research objectives. Take a look at the different types below.

  • Surveys: This involves collecting data through questionnaires or interviews to gather qualitative and quantitative data.
  • Observational studies: This involves observing and collecting data on a particular population or phenomenon without influencing the study variables or manipulating the conditions. These may be further divided into cohort studies, case studies, and cross-sectional studies:
  • Cohort studies: Also known as longitudinal studies, these studies involve the collection of data over an extended period, allowing researchers to track changes and trends.
  • Case studies: These deal with a single individual, group, or event, which might be rare or unusual.
  • Cross-sectional studies : A researcher collects data at a single point in time, in order to obtain a snapshot of a specific moment.
  • Focus groups: In this approach, a small group of people are brought together to discuss a topic. The researcher moderates and records the group discussion. This can also be considered a “participatory” observational method.
  • Descriptive classification: Relevant to the biological sciences, this type of approach may be used to classify living organisms.

Descriptive research methods

Several descriptive research methods can be employed, and these are more or less similar to the types of approaches mentioned above.

  • Surveys: This method involves the collection of data through questionnaires or interviews. Surveys may be done online or offline, and the target subjects might be hyper-local, regional, or global.
  • Observational studies: These entail the direct observation of subjects in their natural environment. These include case studies, dealing with a single case or individual, as well as cross-sectional and longitudinal studies, for a glimpse into a population or changes in trends over time, respectively. Participatory observational studies such as focus group discussions may also fall under this method.

Researchers must carefully consider descriptive research methods, types, and examples to harness their full potential in contributing to scientific knowledge.

Examples of descriptive research

Now, let’s consider some descriptive research examples.

  • In social sciences, an example could be a study analyzing the demographics of a specific community to understand its socio-economic characteristics.
  • In business, a market research survey aiming to describe consumer preferences would be a descriptive study.
  • In ecology, a researcher might undertake a survey of all the types of monocots naturally occurring in a region and classify them up to species level.

These examples showcase the versatility of descriptive research across diverse fields.

Advantages of descriptive research

There are several advantages to this approach, which every researcher must be aware of. These are as follows:

  • Owing to the numerous descriptive research methods and types, primary data can be obtained in diverse ways and be used for developing a research hypothesis .
  • It is a versatile research method and allows flexibility.
  • Detailed and comprehensive information can be obtained because the data collected can be qualitative or quantitative.
  • It is carried out in the natural environment, which greatly minimizes certain types of bias and ethical concerns.
  • It is an inexpensive and efficient approach, even with large sample sizes.

Disadvantages of descriptive research

On the other hand, this design has some drawbacks as well:

  • It is limited in its scope as it does not determine cause-and-effect relationships.
  • The approach does not generate new information and simply depends on existing data.
  • Study variables are not manipulated or controlled, and this limits the conclusions to be drawn.
  • Descriptive research findings may not be generalizable to other populations.
  • Finally, it offers a preliminary understanding rather than an in-depth understanding.

To reiterate, the advantages of descriptive research lie in its ability to provide a comprehensive overview, aid hypothesis generation, and serve as a preliminary step in the research process. However, its limitations include a potential lack of depth, inability to establish cause-and-effect relationships, and susceptibility to bias.

Frequently asked questions

When should researchers conduct descriptive research?

Descriptive research is most appropriate when researchers aim to portray and understand the characteristics of a phenomenon without manipulating variables. It is particularly valuable in the early stages of a study.

What is the difference between descriptive and exploratory research?

Descriptive research focuses on providing a detailed depiction of a phenomenon, while exploratory research aims to explore and generate insights into an issue where little is known.

What is the difference between descriptive and experimental research?

Descriptive research observes and documents without manipulating variables, whereas experimental research involves intentional interventions to establish cause-and-effect relationships.

Is descriptive research only for social sciences?

No, various descriptive research types may be applicable to all fields of study, including social science, humanities, physical science, and biological science.

How important is descriptive research?

The importance of descriptive research lies in its ability to provide a glimpse of the current state of a phenomenon, offering valuable insights and establishing a basic understanding. Further, the advantages of descriptive research include its capacity to offer a straightforward depiction of a situation or phenomenon, facilitate the identification of patterns or trends, and serve as a useful starting point for more in-depth investigations. Additionally, descriptive research can contribute to the development of hypotheses and guide the formulation of research questions for subsequent studies.


An Overview of Descriptive Analysis


Nowadays, Big Data and Data Science have become high-volume keywords. They are extensively researched, and the resulting data must be processed and studied with scrutiny. One of the techniques used to analyse this data is descriptive analysis.

This data needs to be analysed to provide insights and reveal influential trends, which allow the next batch of content to be made in accordance with the general population’s likes and dislikes.

Introduction

Descriptive analysis begins with the conversion of raw data into a form that makes it easy to understand and interpret, i.e., rearranging, ordering, and manipulating the data to provide insightful information about it.

Descriptive analysis is the type of analysis of data that helps describe, show, or summarize data points in a constructive way, such that patterns in the data can emerge.

It is one of the most important steps in conducting statistical data analysis . It gives you a picture of the distribution of your data, helps you detect typos and outliers, and enables you to identify similarities among variables, thus making you ready for conducting further statistical analyses.

Techniques for Descriptive Analysis

Data aggregation and data mining are two techniques used in descriptive analysis to process historical data. In data aggregation, data is first collected and then sorted in order to make the datasets more manageable.

Descriptive techniques often include constructing tables of quantiles and means, methods of dispersion such as variance or standard deviation, and cross-tabulations or "crosstabs" that can be used to examine many disparate hypotheses. These hypotheses often highlight differences among subgroups.

Measures like segregation, discrimination, and inequality are studied using specialised descriptive techniques. Discrimination is measured with the help of audit studies or decomposition methods. Segregation or inequality of outcomes need not be wholly good or bad in itself, but it is often considered a marker of unjust social processes; accurate measurement of these patterns across space and time is a prerequisite to understanding the processes.

A table of means by subgroup is used to show important differences across subgroups, which mostly results in inferences and conclusions being drawn. When we notice a gap in earnings, for example, we naturally tend to infer reasons for those patterns.

But this also enters the province of measuring impacts, which requires the use of different techniques. Often, random variation causes differences in means, and statistical inference is required to determine whether observed differences could happen merely due to chance.

A crosstab or two-way tabulation shows the proportions of cases with each combination of values of two variables, or cell proportions. For example, we might tabulate the proportion of the population that has a high school degree and also receives food or cash assistance, meaning a crosstab of education versus receipt of assistance is made.

Then we might also want to examine row proportions, or the fractions in each education group who receive food or cash assistance, perhaps seeing assistance levels dip sharply at higher education levels.

Column proportions can also be examined, showing the education mix within each assistance status, but these say nothing about causal effects. We might come across a surprisingly high proportion of recipients with a college education, but this might simply result from there being more college graduates in the population than people with less than a high school degree.
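As a rough sketch of cell, row, and column proportions, assuming the pandas library is available (the eight records below are invented for illustration):

```python
# A minimal sketch of a two-way tabulation (crosstab) with cell, row,
# and column proportions.
import pandas as pd

df = pd.DataFrame({
    "education": ["high school", "high school", "college", "less than HS",
                  "college", "less than HS", "high school", "college"],
    "assistance": ["yes", "no", "no", "yes", "no", "yes", "no", "no"],
})

# Cell proportions: share of the whole sample in each combination.
print(pd.crosstab(df["education"], df["assistance"], normalize="all"))

# Row proportions: within each education group, the fraction receiving assistance.
print(pd.crosstab(df["education"], df["assistance"], normalize="index"))

# Column proportions: the education mix within each assistance status.
print(pd.crosstab(df["education"], df["assistance"], normalize="columns"))
```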

(Must check: 4 Types of Data in Statistics )

Types of Descriptive Analysis

Descriptive analysis can be categorized into four types which are measures of frequency, central tendency, dispersion or variation, and position. These methods are optimal for a single variable at a time.

[Figure: the different types of descriptive analysis, namely measures of frequency, central tendency, dispersion, and position, plus contingency tables and scatter plots.]

Measures of Frequency

In descriptive analysis, it’s essential to know how frequently a certain event or response occurs. This is the prime purpose of measures of frequency, such as a count or percentage.

For example, consider a survey where 500 participants are asked about their favourite IPL team. A list of 500 responses would be difficult to consume and accommodate, but the data can be made much more accessible by measuring how many times a certain IPL team was selected.
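A minimal Python sketch of this kind of frequency count (the team names and tallies are invented):

```python
# Counting and percentaging categorical survey responses.
from collections import Counter

responses = ["CSK"] * 180 + ["MI"] * 150 + ["RCB"] * 110 + ["KKR"] * 60
counts = Counter(responses)
total = sum(counts.values())

for team, n in counts.most_common():
    print(f"{team}: {n} responses ({n / total:.0%})")
```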

Measures of Central Tendency

In descriptive analysis, it’s also important to find out the central (or average) tendency of a response. Central tendency is measured with the use of three averages: mean, median, and mode. As an example, consider a survey in which the weight of 1,000 people is measured. In this case, the mean would be an excellent descriptive metric for the middle of the distribution.

Measures of Dispersion

Sometimes, it is important to know how data is distributed across a range. To illustrate, consider the average weight in a sample of two people. If both individuals weigh 60 kg, the average weight is 60 kg. However, if one individual weighs 50 kg and the other 70 kg, the average weight is still 60 kg. Measures of dispersion like range or standard deviation can be employed to measure this kind of distribution.
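A small sketch of that point in Python: two samples with the same mean but very different dispersion (using the invented weights from the text):

```python
# Same mean, different spread: range and sample standard deviation.
from statistics import mean, stdev

sample_a = [60, 60]  # both individuals weigh 60 kg
sample_b = [50, 70]  # same mean, wider spread

for name, sample in [("A", sample_a), ("B", sample_b)]:
    print(name, "mean:", mean(sample),
          "range:", max(sample) - min(sample),
          "stdev:", round(stdev(sample), 2))
# A mean: 60 range: 0 stdev: 0.0
# B mean: 60 range: 20 stdev: 14.14
```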

Measures of Position

Descriptive analysis also involves identifying the position of a single value or response in relation to others. Measures like percentiles and quartiles are very useful here.
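A minimal sketch of quartiles with the standard library's statistics.quantiles (Python 3.8+; the scores are invented):

```python
# Measures of position: quartile cut points for a set of scores.
from statistics import quantiles

scores = [12, 15, 15, 18, 20, 21, 24, 25, 28, 30, 33, 36]
q1, q2, q3 = quantiles(scores, n=4)  # three cut points define the quartiles
print("Q1:", q1, "Median (Q2):", q2, "Q3:", q3)
```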

Apart from this, if you’ve collected data on multiple variables, you can use bivariate or multivariate descriptive statistics to study whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two different variables to see if they seem to have a pattern and vary together. You can also test and compare the central tendency of the two variables before carrying out further types of statistical analysis .

Multivariate analysis is the same as bivariate analysis but is carried out for more than two variables. The following two methods are used for bivariate analysis.

Contingency table

In a contingency table, each cell represents the combination of the two variables. Typically, an independent variable (e.g., gender) is listed along the vertical axis and a dependent one (e.g., activities) is tallied along the horizontal axis. You read “across” the table to see how the independent and dependent variables relate to each other.

[Table: tally of number of activities by gender.]

Scatter plots

A scatter plot is a chart that enables you to see the relationship between two or three different variables. It’s a visual rendition of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another along the y-axis. Each data point is denoted by a point on the chart.

[Figure: scatter plot showing the hours of sleep needed per day by age.]
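A minimal sketch of such a scatter plot, assuming matplotlib is installed (the age/sleep pairs are invented to mimic the relationship the original figure showed):

```python
# Scatter plot of one variable (age) against another (hours of sleep).
import matplotlib.pyplot as plt

age = [1, 3, 6, 10, 14, 18, 25, 35, 50, 65, 80]
sleep = [14, 12, 11, 10, 9.5, 9, 8, 8, 7.5, 7, 7]  # hours per day

plt.scatter(age, sleep)
plt.xlabel("Age (years)")
plt.ylabel("Hours of sleep needed per day")
plt.title("Hours of sleep needed per day by age")
plt.show()
```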

(Recommend Blog: Introduction to Bayesian Statistics )

Advantages of Descriptive Analysis

A high degree of objectivity and neutrality on the part of the researchers is one of the main advantages of descriptive analysis. Researchers need to be extra vigilant because descriptive analysis shows the characteristics of the extracted data, and if the data does not match the trends, much of it may have to be discarded.

Descriptive analysis is considered broader than other quantitative methods and provides a wider picture of an event or phenomenon. It can use any number of variables, or even a single variable, to conduct descriptive research.

This type of analysis is considered a better method for collecting information that describes relationships as they naturally occur and exhibits the world as it exists. This makes the analysis very true to life, as all the trends are derived from research on the real-life behaviour of the data.

It is considered useful for identifying variables and new hypotheses, which can be further analyzed through experimental and inferential studies. It is also considered useful because the margin for error is small, since the trends are taken straight from the data’s properties.

This type of study gives the researcher the flexibility to use both quantitative and qualitative data in order to discover the properties of the population.

For example, researchers can use both a case study, which is a qualitative analysis, and correlation analysis to describe a phenomenon in their own ways. Using case studies to describe people, events, and institutions enables the researcher to understand the behavior and patterns of the concerned set to the maximum potential.

In the case of surveys, which are one of the main types of descriptive analysis, the researcher tends to gather data points from a relatively large number of samples, unlike experimental studies, which generally need smaller samples.

This is a clear advantage of the survey method over other descriptive methods: it enables researchers to study larger groups of individuals with ease. If the surveys are properly administered, they give a broader and neater description of the unit under research.

(Also check: Importance of Statistics for Data Science )


  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer
  • QuestionPro

survey software icon

  • Solutions Industries Gaming Automotive Sports and events Education Government Travel & Hospitality Financial Services Healthcare Cannabis Technology Use Case NPS+ Communities Audience Contactless surveys Mobile LivePolls Member Experience GDPR Positive People Science 360 Feedback Surveys
  • Resources Blog eBooks Survey Templates Case Studies Training Help center

descriptive research data analysis methods

Home Market Research

Descriptive Research: Definition, Characteristics, Methods + Examples


Suppose an apparel brand wants to understand fashion purchasing trends among New York's buyers. It must conduct a demographic survey of the region, gather population data, and then conduct descriptive research on this demographic segment.

The study will then uncover details on "what is the purchasing pattern of New York buyers," but will not cover any investigative information about "why" the patterns exist. For an apparel brand trying to break into this market, understanding the nature of the market is the study's main goal. Let's talk about it.

What is descriptive research?

Descriptive research is a research method that describes the characteristics of the population or phenomenon being studied. The methodology focuses on the "what" of the research subject rather than the "why": it describes the nature of a demographic segment without examining why a particular phenomenon occurs.

Characteristics of descriptive research

The term descriptive research thus covers the research questions, the design of the study, and the data analysis conducted on the topic. It is called an observational research method because none of the study variables is influenced in any capacity.

Some distinctive characteristics of descriptive research are:

  • Quantitative research: It is a quantitative research method that attempts to collect quantifiable information for statistical analysis of the population sample. It is a popular market research tool that allows us to collect and describe the demographic segment’s nature.
  • Uncontrolled variables: In descriptive research, none of the variables is influenced in any way; the study relies on observational methods, so the nature and behavior of the variables are not in the researcher's hands.
  • Cross-sectional studies: It is generally a cross-sectional study where different sections belonging to the same group are studied.
  • The basis for further research: The data collected and analyzed in descriptive research can be examined further with other research techniques, and can point toward the methods best suited to subsequent research.

Applications of descriptive research with examples

A descriptive research method can be used in multiple ways and for various reasons. Before getting into any survey, though, the survey goals and survey design are crucial; without them, there is no way to know whether the study will meet the research outcome. Below are some ways organizations currently use descriptive research:

  • Define respondent characteristics: The aim of using close-ended questions is to draw concrete conclusions about the respondents. This could be the need to derive patterns, traits, and behaviors of the respondents, or to understand a respondent's attitude or opinion about the phenomenon, for example, how many hours per week millennials spend browsing the internet. All this information helps the organization make informed business decisions.
  • Measure data trends: Researchers measure data trends over time with a descriptive research design's statistical capabilities. Consider an apparel company that surveys age groups 24-35 and 36-45 about a new launch of autumn wear. If one of those groups doesn't take well to the new launch, it provides insight into which clothes are liked and which are not, and the brand drops the apparel that customers don't like.
  • Conduct comparisons: Organizations also use a descriptive research design to understand how different groups respond to a specific product or service. For example, an apparel brand creates a survey asking general questions that measure the brand’s image. The same study also asks demographic questions like age, income, gender, geographical location, geographic segmentation , etc. This consumer research helps the organization understand what aspects of the brand appeal to the population and what aspects do not. It also helps make product or marketing fixes or even create a new product line to cater to high-growth potential groups.
  • Validate existing conditions: Researchers widely use descriptive research to help ascertain the research object’s prevailing conditions and underlying patterns. Due to the non-invasive research method and the use of quantitative observation and some aspects of qualitative observation , researchers observe each variable and conduct an in-depth analysis . Researchers also use it to validate any existing conditions that may be prevalent in a population.
  • Conduct research at different times: The analysis can be conducted at different periods to ascertain any similarities or differences. This also allows any number of variables to be evaluated. For verification, studies on prevailing conditions can also be repeated to draw trends.

Advantages of descriptive research

Some of the significant advantages of descriptive research are:

Advantages of descriptive research

  • Data collection: A researcher can conduct descriptive research using specific methods like the observational method, case study method, and survey method. Between them, these three cover all primary data collection methods, which provides a lot of information. This can be used for future research or even for developing a hypothesis about your research object.
  • Varied: Since the data collected is qualitative and quantitative, it gives a holistic understanding of a research topic. The information is varied, diverse, and thorough.
  • Natural environment: Descriptive research allows for the research to be conducted in the respondent’s natural environment, which ensures that high-quality and honest data is collected.
  • Quick to perform and cheap: As the sample size is generally large in descriptive research, the data collection is quick to conduct and is inexpensive.

Descriptive research methods

There are three distinctive methods to conduct descriptive research. They are:

Observational method

The observational method is the most effective method to conduct this research, and researchers make use of both quantitative and qualitative observations.

A quantitative observation is the objective collection of data primarily focused on numbers and values. It suggests “associated with, of or depicted in terms of a quantity.” Results of quantitative observation are derived using statistical and numerical analysis methods. It implies observation of any entity associated with a numeric value such as age, shape, weight, volume, scale, etc. For example, the researcher can track if current customers will refer the brand using a simple Net Promoter Score question .
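As a rough sketch of how such a question can be tallied in code (the ratings below are invented; the cutoffs follow the common NPS convention of promoters scoring 9-10 and detractors 0-6):

```python
# Minimal NPS tally on a 0-10 scale (hypothetical ratings).
ratings = [10, 9, 8, 7, 10, 6, 9, 3, 8, 10]

promoters = sum(1 for r in ratings if r >= 9)   # scores 9-10
detractors = sum(1 for r in ratings if r <= 6)  # scores 0-6
nps = 100 * (promoters - detractors) / len(ratings)
print(f"NPS: {nps:.0f}")  # % promoters minus % detractors
```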

Qualitative observation doesn’t involve measurements or numbers but instead just monitoring characteristics. In this case, the researcher observes the respondents from a distance. Since the respondents are in a comfortable environment, the characteristics observed are natural and effective. In a descriptive research design, the researcher can choose to be either a complete observer, an observer as a participant, a participant as an observer, or a full participant. For example, in a supermarket, a researcher can from afar monitor and track the customers’ selection and purchasing trends. This offers a more in-depth insight into the purchasing experience of the customer.

Case study method

Case studies involve in-depth research on individuals or groups. They can generate hypotheses and widen the scope for studying a phenomenon. However, case studies should not be used to determine cause and effect, as they cannot make accurate predictions and are open to researcher bias. Another reason case studies are not a reliable way of conducting descriptive research is that an atypical respondent may be included; describing such respondents leads to weak generalizations and a loss of external validity.

Survey research

In survey research, respondents answer through surveys, questionnaires, or polls. These are a popular market research tool for collecting feedback from respondents. To gather useful data, a study should have the right survey questions, with a balanced mix of open-ended and close-ended questions. The survey method can be conducted online or offline, making it the go-to option for descriptive research where the sample size is enormous.

Examples of descriptive research

Some examples of descriptive research are:

  • A specialty food group launching a new range of barbecue rubs wants to understand which rub flavors different people favor. To understand the preferred flavor palette, it conducts this type of research study using methods such as observation in supermarkets. Surveying while collecting in-depth demographic information offers insights into the preferences of different markets, and can help tailor the rubs and spreads to the meats preferred in each demographic. Conducting this type of research helps the organization tweak its business model and amplify marketing in core markets.
  • Another example of where this research can be used is a school district that wishes to evaluate teachers' attitudes about using technology in the classroom. By conducting surveys and observing teachers' comfort with technology through observational methods, the researcher can gauge whether a full-fledged implementation would run into problems. This also helps in understanding whether students are affected in any way by the change.

Some other research problems and research questions that can lead to descriptive research are:

  • Market researchers want to observe the habits of consumers.
  • A company wants to evaluate the morale of its staff.
  • A school district wants to understand if students will access online lessons rather than textbooks.
  • A company wants to understand whether its wellness questionnaire programs enhance the overall health of its employees.



Descriptive Data Analysis: Definition, method with examples and importance


If you are a researcher or a data analyst, you should know what descriptive data analysis is and how it is carried out.

The purpose of this article is to provide a brief description of what descriptive data analysis is, how it works, and why it is important in the research process. We hope that after reading it, you will be able to use these methods in your own research.

If you run into any difficulty during the research and publication process, please do not hesitate to contact us by email.

Data analysis is one of the most important parts of any research process: even after successful data collection, data that is not analyzed properly can be overwhelming, confusing, and even misleading. There are many types of analysis used in research, and descriptive data analysis is one of them.

Read: Primary Data Collection Method and Importance

What is descriptive data analysis?

Descriptive data analysis is a method of analyzing data that summarizes and describes the key characteristics of a dataset.

Using these analysis methods, researchers can get an overview of the data and identify the key trends, patterns, and relationships within it, helping them make sense of the dataset.

There are many basic questions that can be answered by descriptive data analysis, such as:

  • What is the range of values in the dataset?
  • What is the average or median value?
  • What is the most common value or category?
  • What is the variability or spread of the data?
  • Are there any outliers or unusual values?

Researchers often use descriptive data analysis as a preliminary step when analyzing data, as it allows them to identify any potential problems with the data and to formulate hypotheses for further research.

Methods of descriptive data analysis

In order to analyze descriptive data, there are several methods that can be used, such as:

#1. Measures of central tendency

In statistics, these are measures that describe the average or central value of a collection of data, condensing a dataset into a single representative figure. Examples include the mean, median, and mode.
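A minimal sketch using Python's built-in statistics module (the numbers are made up for illustration):

```python
# Mean, median, and mode with the standard library (hypothetical data).
import statistics

data = [4, 7, 7, 8, 10, 12, 7, 9]

print("mean:", statistics.mean(data))      # arithmetic average (8.0)
print("median:", statistics.median(data))  # middle value of the sorted data (7.5)
print("mode:", statistics.mode(data))      # most frequent value (7)
```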

#2. Measures of variability

These are statistical measures that describe the dispersion or spread of a set of data, showing how diverse the values are. Examples include the range, variance, and standard deviation.
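Continuing with the same made-up data, each spread measure is one call:

```python
# Range, variance, and standard deviation (same hypothetical data).
import statistics

data = [4, 7, 7, 8, 10, 12, 7, 9]

print("range:", max(data) - min(data))                # 12 - 4 = 8
print("sample variance:", statistics.variance(data))  # uses the n-1 denominator
print("sample std dev:", statistics.stdev(data))      # square root of the variance
```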

#3. Frequency distribution

A frequency distribution shows the frequency or proportion of each value or category in a dataset, and a number of graphs help present it visually. Examples include histograms, bar charts, and pie charts.
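A small sketch, assuming matplotlib is available for the chart (the responses are invented):

```python
# Frequency table plus a simple bar chart (hypothetical categorical data).
from collections import Counter
import matplotlib.pyplot as plt

responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes"]
freq = Counter(responses)  # counts per category, e.g. {'yes': 4, 'no': 2, ...}
print(freq)

plt.bar(list(freq.keys()), list(freq.values()))
plt.ylabel("Frequency")
plt.title("Frequency distribution of responses")
plt.show()
```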

#4. Correlation analysis

Correlation analysis is used to determine the strength and direction of the relationship between two variables. Correlation coefficients, such as Pearson's r, quantify that relationship.
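For instance, a short sketch assuming SciPy is installed (both variables are made up):

```python
# Pearson's r between two hypothetical variables.
from scipy import stats

hours_studied = [1, 2, 3, 4, 5, 6]
exam_scores   = [52, 55, 61, 64, 70, 75]

r, p_value = stats.pearsonr(hours_studied, exam_scores)
print(f"Pearson's r = {r:.2f}, p = {p_value:.3f}")  # r near +1: strong positive association
```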

Here are some examples of how descriptive data analysis can be used in order to get a better understanding of how it works.

Example 1: Survey data

Consider a scenario in which you conducted a survey of 100 people to understand their opinions of a new product. You collected data on a number of variables, including age, gender, income, and a rating of the product on a scale of 1 to 10. You could use descriptive data analysis to summarize this data in the following ways:

Measures of central tendency

To determine the typical rating respondents gave the product, you could compute the mean, median, and mode of the collected ratings and compare them to detect skew or errors in the data.

Measures of variability

To determine how much the ratings vary among respondents, you can calculate the range and standard deviation of the rating variable.

Frequency distributions

It would be helpful to create a histogram or bar chart that shows the distribution of product ratings among respondents, presenting the data in a visual form.

Correlation analysis

To determine whether there is a relationship between age and product rating, you could examine the correlation between these two variables, and likewise break product ratings down by age and gender.
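Putting the four methods together, here is a minimal pandas sketch of this survey example; the dataframe, column names, and values are hypothetical and generated at random:

```python
# Hypothetical survey of 100 respondents: age, gender, and a 1-10 product rating.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
survey = pd.DataFrame({
    "age": rng.integers(18, 70, size=100),
    "gender": rng.choice(["F", "M"], size=100),
    "rating": rng.integers(1, 11, size=100),
})

# Central tendency of the product rating
print(survey["rating"].mean(), survey["rating"].median(), survey["rating"].mode()[0])

# Variability of the product rating
print("range:", survey["rating"].max() - survey["rating"].min())
print("std dev:", survey["rating"].std())

# Frequency distribution of ratings
print(survey["rating"].value_counts().sort_index())

# Correlation between age and rating
print("corr(age, rating):", survey["age"].corr(survey["rating"]))
```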

Read: Explain Survey Research

Example 2: Sales data

Suppose you own a store and want to understand the sales patterns of your top-selling products. Your company has compiled data over the past year on how many units were sold each week. The following are a few ways you could summarize this data using descriptive data analysis:

Measures of central tendency

To figure out how many units are typically sold per week, you could calculate the mean, median, and mode of the number of units sold per week.

Measures of variability

To understand how much sales volume varies across weeks, you can calculate the range and standard deviation of the number of units sold per week.

Frequency distributions

A line chart can show the trend in sales volume over time, and makes it easy to compare one year's sales with the next.

Correlation analysis

You can examine the relationship between sales volume and external factors, such as advertising and promotions, to see whether these variables are correlated and which factors are most strongly associated with sales.
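A comparable pandas sketch for this sales example (52 weeks of invented units-sold and ad-spend figures):

```python
# Hypothetical weekly sales data: units sold and advertising spend over 52 weeks.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
ad_spend = rng.uniform(100, 1000, size=52)
units = 50 + 0.1 * ad_spend + rng.normal(0, 10, size=52)
sales = pd.DataFrame({"week": range(1, 53), "units": units, "ad_spend": ad_spend})

# Central tendency and variability of weekly sales in one call
print(sales["units"].agg(["mean", "median", "std", "min", "max"]))

# Trend over time as a line chart
sales.plot(x="week", y="units", kind="line", title="Units sold per week")
plt.show()

# Correlation between advertising spend and units sold
print("corr(ad_spend, units):", sales["ad_spend"].corr(sales["units"]))
```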

Why is descriptive data analysis important?

The importance of descriptive data analysis can be attributed to several factors:

  • It provides a basic understanding of the data, allowing researchers to identify any potential problems or patterns that may exist in it.
  • It summarizes data in a manner that is easy to understand and accessible to others.
  • It provides a basis for further analysis, such as inferential statistics or hypothesis testing.
  • It can be used to communicate results to others, such as stakeholders or policymakers, to inform them of the findings.

Read: Steps in Process of Data Analysis

There can be no doubt that descriptive data analysis is an important part of any research process. This method can be used to identify any potential problems or patterns in a dataset as well as to help researchers understand the basic features of the dataset.

A descriptive data analysis is one of the most important tools for researchers and analysts who want to understand and summarize data in a meaningful way.

To gain valuable insights into the key features of a dataset, researchers can use measures of central tendency, variability, frequency distributions, and correlation analysis, all of which are parts of descriptive data analysis.

Whether you are working with survey data, sales data, or any other type of data, descriptive data analysis is an essential step in the analysis process.

As a result of descriptive data analysis, researchers are able to summarize data in a way that is easily understandable and accessible to other researchers, so that this data can be further analyzed and communicated to others.

Wrapping Up

That wraps up this article. We hope it provides you with a brief explanation of what descriptive data analysis is, how it works, and why it is so important, and that you will be able to use these methods to analyze your own research data.

KressUp is an online research support platform designed for academics and researchers.

As a member of this program, you will get an array of content that is regularly updated to support the development of your academic and career skills.

If you are visiting our website for the first time, we would be delighted if you would share it with your friends and subscribe.

For additional assistance with e-content and research, please visit our website or feel free to send us an e-mail; any member of our team will be happy to assist you with your research.

Related articles:

  • 42 Most Frequently Used Data Analysis Tools And How They Are Used To Examine Data
  • Data Analysis in Research and its Importance | Best Statistical Data Analysis Method
  • Secondary Data Collection, Its Sources And best Method Of Collection

General FAQ related to descriptive data analysis

Q 1. What do you mean by descriptive data analysis?

Data analysis that summarizes and describes the key characteristics of a dataset is referred to as descriptive data analysis.

Q 2. What is the best method of data collection in descriptive research?

Observation is the best and most often used method of data collection in descriptive research.

Q 3. Can we use descriptive analysis for any dataset?

Descriptive analysis is mainly used to describe quantitative data.

Q 4. What is the main purpose of descriptive analysis?

Descriptive statistics provide summary information about a dataset.


Quantitative Data

Quantitative data refers to information that can be measured and expressed numerically. This type of data is crucial for performing quantitative analysis , a method used to evaluate numerical data to uncover patterns, correlations, and trends. In fields like finance, economics, and the natural sciences, quantitative risk analysis is utilized to assess potential risks by quantifying their probability and impact. The precision and objectivity of quantitative data make it essential for making data-driven decisions and forming the basis for statistical analysis.

What is Quantitative Data?

Quantitative data is numerical information that can be measured and analyzed statistically. It represents quantities and allows for objective comparison and analysis.

Examples of Quantitative Data

  • Age in years
  • Height in centimeters
  • Weight in kilograms
  • Temperature in degrees Celsius
  • Number of siblings
  • Annual income in dollars
  • Distance in miles
  • Test scores in percentages
  • Number of books read in a year
  • Time in minutes
  • Number of employees in a company
  • Population of a city
  • Speed in miles per hour
  • Number of students in a class
  • Price of a product in dollars
  • Volume of water in liters
  • Number of steps taken in a day
  • Daily calorie intake
  • Frequency of visits to the gym per month
  • Number of social media followers
  • Hours of sleep per night
  • Number of pages in a book
  • Length of a movie in minutes
  • Number of items sold per day
  • Score in a game
  • Number of cars in a parking lot
  • Monthly utility bills in dollars
  • Number of courses completed
  • Quantity of rainfall in millimeters
  • Number of products in inventory
  • Blood pressure readings
  • Number of phone calls made per day
  • Distance run in a week in kilometers
  • Number of website visits per month
  • Number of pets owned
  • Number of countries visited
  • Monthly rent in dollars
  • Number of clients served
  • Weight of luggage in pounds
  • Number of trees in a park
  • Annual sales revenue in dollars
  • Number of hours worked per week
  • Quantity of milk produced by a cow in liters
  • Number of concerts attended per year
  • Number of patients treated in a hospital
  • Number of goals scored in a season
  • Monthly savings in dollars
  • Number of chapters in a novel
  • Frequency of meetings per week
  • Number of assignments submitted

What is the Difference Between Quantitative and Qualitative Data?

| Quantitative Data | Qualitative Data |
| --- | --- |
| Numerical information that can be measured. | Descriptive information that cannot be measured. |
| Objective and measurable. | Subjective and interpretive. |
| Height, weight, age, income. | Opinions, feelings, experiences, colors. |
| Numbers and statistics. | Words, images, symbols. |
| Uses tools like scales, rulers, and thermometers. | Uses interviews, observations, and surveys. |
| Statistical and mathematical analysis. | Thematic and content analysis. |
| To quantify variables and analyze relationships. | To understand concepts, thoughts, and experiences. |
| Specific and can be generalized. | Detailed and rich in context, not easily generalized. |
| Surveys, experiments, market analysis. | Case studies, ethnography, narrative research. |
| Graphs, charts, tables. | Narratives, quotes, descriptions. |

What are the Different Types of Quantitative Data?

1. Discrete Data

Discrete data represents countable items. It is often whole numbers and does not include fractions or decimals. This type of data is used in scenarios where items can only be counted in whole units.

  • Number of students in a classroom
  • Number of books in a library

2. Continuous Data

Continuous data can take any value within a range. This type of data includes fractions and decimals, making it suitable for measurements that require precision.

Application in Research:

  • Data Analysis: Both discrete and continuous data are fundamental in data analysis , allowing researchers to perform statistical tests, create models, and derive insights from numerical information.
  • Historical Research: Quantitative data in historical research helps in understanding trends over time, such as population growth, economic changes, and social developments.
  • Quantitative Research: This research method relies heavily on quantitative data to test hypotheses, establish patterns, and predict outcomes, making it vital for scientific, economic, and social research.

How is Quantitative Data Collected?

1. Surveys and Questionnaires

These tools gather numerical information by asking people questions. Respondents choose from set options, making it easy to count and compare answers.

2. Experiments

Researchers conduct experiments by changing variables to see how they affect other variables. This helps in understanding cause and effect.

3. Observations

In this method, data is collected by watching and recording events or behaviors as they happen. For example, counting how many people enter a store.

4. Existing Records and Databases

Quantitative data can also be found in existing sources like government reports, academic studies, or company records. Researchers use this data for analysis.

5. Sensors and Instruments

Devices like thermometers, scales, and GPS units measure physical quantities and provide precise numerical data.

6. Structured Interviews

Interviewers ask a set list of questions to gather numerical responses from participants. This method ensures consistency in the data collected.

Interval vs. ratio data

| Interval Data | Ratio Data |
| --- | --- |
| Data with equal intervals between values but no true zero point. | Data with equal intervals between values and a true zero point. |
| Temperature in Celsius or Fahrenheit, IQ scores | Height, weight, age, income, temperature in Kelvin |
| Arbitrary zero (e.g., 0°C does not mean "no temperature") | True zero (e.g., 0 kg means "no weight") |
| Addition and subtraction are meaningful (e.g., difference in temperature). | All arithmetic operations are meaningful (e.g., you can multiply and divide). |
| Measures differences between values | Measures differences and ratios between values |
| Measuring temperature changes between cities | Comparing heights of different individuals |

How is quantitative data analyzed?

1. Data Collection

Before analysis, ensure that data is accurately and reliably collected through methods such as surveys, experiments, or existing records.

2. Data Cleaning

Clean the data by removing any errors, duplicates, or inconsistencies. This step ensures that the data set is accurate and ready for analysis.

3. Descriptive Statistics

Use descriptive statistics to summarize and describe the main features of the data. This includes measures such as:

  • Mean : The average value.
  • Median : The middle value when data is ordered.
  • Mode : The most frequently occurring value.
  • Standard Deviation : A measure of the amount of variation or dispersion in the data.
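As a quick sketch of this step, pandas bundles most of the measures above into a single call (the income figures below are invented):

```python
# Summary statistics in one call (hypothetical income data).
import pandas as pd

income = pd.Series([32000, 45000, 38000, 52000, 41000, 39000])
print(income.describe())                # count, mean, std, min, quartiles, max
print("mode:", income.mode().tolist())  # describe() does not report the mode
```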

4. Data Visualization

Visualize the data to identify patterns, trends, and outliers. Common visualization techniques include:

  • Histograms : Show the distribution of data.
  • Bar Charts : Compare different groups.
  • Line Graphs : Show trends over time.
  • Scatter Plots : Identify relationships between variables.
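A matplotlib sketch of the four chart types listed above, using randomly generated placeholder data:

```python
# One panel per chart type (hypothetical data throughout).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
values = rng.normal(50, 10, size=200)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].hist(values, bins=20)                     # histogram: distribution
axes[0, 1].bar(["A", "B", "C"], [12, 7, 15])         # bar chart: compare groups
axes[1, 0].plot(range(12), rng.integers(5, 20, 12))  # line graph: trend over time
axes[1, 1].scatter(values[:50], values[50:100])      # scatter plot: relationship
plt.tight_layout()
plt.show()
```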

5. Inferential Statistics

Apply inferential statistics to make predictions or inferences about a population based on a sample of data. This involves:

  • Hypothesis Testing : Determining if there is enough evidence to support a specific hypothesis.
  • Confidence Intervals : Estimating the range within which a population parameter lies.
  • Regression Analysis : Examining the relationship between variables.
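A compact sketch of all three, assuming SciPy is available (the samples are simulated, not real study data):

```python
# t-test, confidence interval, and simple linear regression on simulated samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(100, 15, size=40)
group_b = rng.normal(108, 15, size=40)

# Hypothesis testing: two-sample t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# 95% confidence interval for the mean of group A
ci = stats.t.interval(0.95, df=len(group_a) - 1,
                      loc=group_a.mean(), scale=stats.sem(group_a))
print("95% CI for mean of A:", ci)

# Regression analysis: y as a linear function of x
x = rng.uniform(0, 10, size=40)
y = 2 * x + rng.normal(0, 1, size=40)
res = stats.linregress(x, y)
print(f"slope = {res.slope:.2f}, intercept = {res.intercept:.2f}")
```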

6. Data Interpretation

Interpret the results to draw conclusions and make informed decisions. This step involves understanding the implications of the statistical findings and how they relate to the research question or problem.

7. Reporting Results

Present the findings in a clear and concise manner. This may involve writing reports, creating presentations, or publishing research papers. Ensure that the results are communicated effectively to the target audience.

What’s the Difference Between Descriptive and Inferential Analysis of Quantitative Data?

| Descriptive Analysis | Inferential Analysis |
| --- | --- |
| Summarizes and describes the main features of a data set. | Makes predictions or inferences about a population based on a sample of data. |
| Provides an overview and understanding of the current data. | Extends findings from a sample to a larger population, estimating population parameters. |
| Mean, median, mode; standard deviation; range; frequency distribution; percentiles; data visualization (e.g., charts, graphs) | Hypothesis testing; confidence intervals; regression analysis; ANOVA (Analysis of Variance); chi-square tests |
| Uses all data points in the data set. | Uses a sample of data to make generalizations about a larger population. |
| Calculating the average age of students in a class. | Using a sample to estimate the average age of all students in a school district. |
| Provides summaries such as central tendency and variability. | Provides insights about population parameters, including margins of error. |
| Suitable for initial data exploration and presentation. | Suitable for testing hypotheses and making predictions. |

What are the Advantages and Disadvantages of Quantitative Data?

Advantages

Objectivity and Reliability : Quantitative data is based on measurable values, which makes it more objective and less prone to bias. The results are replicable, allowing for consistent verification of findings.

Precision and Consistency : Quantitative data allows for precise measurement and quantification. This precision helps in making accurate comparisons and analyzing trends over time.

Statistical Analysis : The numerical nature of quantitative data enables the use of statistical analysis to identify patterns, relationships, and causal effects. Advanced statistical methods can be applied to test hypotheses and make predictions.

Generalizability : Large sample sizes and standardized data collection methods enable findings to be generalized to larger populations, enhancing the external validity of the research.

Efficient Data Collection : Quantitative data collection methods, such as surveys and experiments, can be more efficient and quicker to administer to large groups compared to qualitative methods.

Clear Data Presentation : Quantitative data can be easily presented using graphs, charts, and tables, making it easier to communicate findings clearly and effectively.

Disadvantages

Limited Flexibility : Standardized data collection methods can be rigid, not allowing for flexibility in exploring unexpected results or new avenues of inquiry.

Potential for Misinterpretation : Without proper understanding of statistical methods and the context of the data, there is a risk of misinterpreting the results. Misleading conclusions can be drawn from incorrect or incomplete analysis.

Resource Intensive : Collecting large amounts of quantitative data can be resource-intensive, requiring significant time, effort, and financial investment for surveys, experiments, and data analysis.

Measurement Errors : Errors in measurement tools or data entry can affect the accuracy and reliability of quantitative data. Small errors can lead to significant deviations in the results.

Limited Depth : Quantitative data typically does not provide in-depth insights into complex issues or human experiences, which may require qualitative data to fully understand.

Should I Use Quantitative Data in My Research?

You Need to Measure and Quantify : If your research aims to quantify variables, measure frequencies, or make numerical comparisons, quantitative data is suitable.

Example : Measuring the average income level of a population.

You Aim for Objectivity : When you require objective data that can be statistically analyzed to test hypotheses and identify patterns, trends, or correlations.

Example : Analyzing the correlation between hours of study and exam scores.

You Want Generalizable Results : If you aim to generalize findings from a sample to a larger population, quantitative methods allow for this, provided you have a sufficiently large and representative sample.

Example : Conducting a survey to estimate the percentage of people who prefer online shopping over in-store shopping.

You Have Large Populations: When dealing with large populations where collecting and analyzing numerical data is more feasible and efficient.

Example: National health surveys to track prevalence of diseases.

You Need Statistical Analysis: When your research requires the application of statistical tests, quantitative data is essential.

Example: Using regression analysis to predict future sales based on past trends.

What are Some Common Quantitative Analysis Tools?

1. Microsoft Excel

  • Description : Spreadsheet software for organizing and analyzing data.
  • Features : Formulas, charts, pivot tables.
  • Use Case : Basic to intermediate data analysis.

2. SPSS

  • Description : Software for statistical analysis.
  • Features : Descriptive statistics, regression analysis, ANOVA.
  • Use Case : Social sciences and health research.

3. R

  • Description : Programming language for statistics and graphics.
  • Features : Statistical techniques, data manipulation, extensive packages.
  • Use Case : Advanced statistical analysis and data science.

4. SAS

  • Description : Software suite for advanced analytics.
  • Features : Statistical procedures, predictive modeling, data mining.
  • Use Case : Business, healthcare, government.

5. Stata

  • Description : Software for data analysis and visualization.
  • Features : Data management, statistical analysis, graphics.
  • Use Case : Economics, sociology, epidemiology.

6. MATLAB

  • Description : Language and environment for numerical computation.
  • Features : Mathematical functions, algorithm development, data visualization.
  • Use Case : Engineering, finance, scientific research.

7. Tableau

  • Description : Data visualization software.
  • Features : Interactive dashboards, real-time analysis, visual analytics.
  • Use Case : Business intelligence and reporting.

8. Minitab

  • Description : Statistical software for data analysis.
  • Features : Descriptive statistics, hypothesis testing, control charts.
  • Use Case : Manufacturing, quality improvement, Six Sigma projects.

9. Google Data Studio

  • Description : Tool for creating interactive dashboards and reports.
  • Features : Data integration, customizable reports, real-time updates.
  • Use Case : Marketing, sales, performance tracking.

10. Python (with libraries like Pandas, NumPy, Matplotlib)

  • Description : Programming language with data analysis libraries.
  • Features : Data manipulation (Pandas), numerical computations (NumPy), plotting (Matplotlib).
  • Use Case : Data science and machine learning (a short sketch of this stack follows the list).

11. IBM Watson Analytics

  • Description : Cloud-based analytics service.
  • Features : Automated data visualization, predictive modeling.
  • Use Case : Business intelligence and data-driven decision-making.
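As noted in item 10, here is a tiny illustration of the Pandas + NumPy + Matplotlib stack (the dataframe is invented):

```python
# The Pandas + NumPy + Matplotlib stack in a few lines (hypothetical data).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) ** 2})
print(df.describe())       # Pandas: summary statistics
print(np.sqrt(df["y"]))    # NumPy: elementwise numerical computation
df.plot(x="x", y="y")      # quick Matplotlib plot via the pandas interface
plt.show()
```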

Quantitative Data Examples for Students

Academic Performance

  • Test scores (e.g., 85%, 90%)
  • GPA (e.g., 3.5, 4.0)
  • Number of assignments completed
  • Attendance records (e.g., number of days present)
  • Hours spent studying per week

Classroom Activities

  • Number of books read in a semester
  • Number of extracurricular activities participated in
  • Number of homework problems solved
  • Participation points earned in class
  • Time taken to complete a test (in minutes)

Personal Life

  • Age (in years)
  • Height (in centimeters or inches)
  • Weight (in kilograms or pounds)
  • Daily screen time (in hours)
  • Number of steps taken per day

Technology Usage

  • Number of text messages sent per day
  • Number of emails received per day
  • Hours spent on social media per week
  • Number of apps downloaded on a phone
  • Battery life percentage at the end of the day

Health and Fitness

  • Number of push-ups completed in one session
  • Distance run in kilometers or miles
  • Heart rate (beats per minute)

What is quantitative data?

Quantitative data is numerical information that can be measured and analyzed statistically.

How is quantitative data collected?

It is collected through surveys, experiments, observations, existing records, and sensors.

Why use quantitative data?

It provides objective, measurable, and comparable results for statistical analysis and decision-making.

What are examples of quantitative data?

Examples include test scores, height, weight, income, and temperature.

What tools analyze quantitative data?

Common tools are Microsoft Excel, SPSS, R, SAS, and Tableau.

How is quantitative data visualized?

It is visualized using charts, graphs, histograms, and scatter plots.

What is descriptive statistics?

Descriptive statistics summarize and describe data features, such as mean and standard deviation.

What is inferential statistics?

Inferential statistics make predictions or inferences about a population based on a sample.

What is the difference between interval and ratio data?

Interval data has no true zero point, while ratio data has a true zero.

What are the advantages of quantitative data?

Advantages include objectivity, reliability, precision, and the ability to perform statistical analysis.



Association between Combined Metals and PFAS Exposure with Dietary Patterns: A Preliminary Study

Article outline:

1. Introduction
2. Materials and Methods
 2.1. Study Cohort and Design
 2.2. Calculation of DII Scores
 2.3. Statistical Analysis
  2.3.1. Descriptive Statistics
  2.3.2. Bayesian Kernel Machine Regression
 3.1. Characteristics of the Sample Population
 3.2. Correlation between Environmental Contaminant Variables and the DII
 3.3. BKMR Analysis
  3.3.1. Posterior Inclusion Probability of Environmental Contaminants with DII
  3.3.2. Univariate Association of the DII and Combined PFAS and Heavy Metals
  3.3.3. Bivariate Exposure–Response Function
  3.3.4. Overall Exposure Effect of the DII in Relation to PFAS and Heavy Metal Exposure Percentiles
  3.3.5. Single-Variable Effects of PFAS and Metals with the DII
  3.3.6. Single-Variable Interaction Terms of PFAS and Metals on the DII
4. Discussion
 Limitations
5. Conclusions
Author Contributions; Institutional Review Board Statement; Informed Consent Statement; Data Availability Statement; Conflicts of Interest

| | Proportion | Std. Error | 95% Confidence Interval |
| --- | --- | --- | --- |
| Gender @ DII | | | |
| Male—0 | 0.624 | 0.031 | 0.556, 0.689 |
| Male—1 | 0.453 | 0.010 | 0.432, 0.475 |
| Female—0 | 0.375 | 0.031 | 0.311, 0.444 |
| Female—1 | 0.547 | 0.010 | 0.525, 0.568 |
| Ethnicity @ DII | | | |
| 1—0 | 0.101 | 0.020 | 0.0661, 0.153 |
| 1—1 | 0.0875 | 0.016 | 0.0589, 0.128 |
| 2—0 | 0.0748 | 0.012 | 0.0533, 0.104 |
| 2—1 | 0.0664 | 0.009 | 0.0492, 0.0889 |
| 3—0 | 0.628 | 0.034 | 0.553, 0.697 |
| 3—1 | 0.630 | 0.025 | 0.575, 0.682 |
| 4—0 | 0.0781 | 0.012 | 0.0555, 0.109 |
| 4—1 | 0.120 | 0.017 | 0.0875, 0.162 |
| 5—0 | 0.0655 | 0.011 | 0.0458, 0.0927 |
| 5—1 | 0.0504 | 0.009 | 0.0319, 0.0721 |
| 6—0 | 0.0522 | 0.016 | 0.0266, 0.0997 |
| 6—1 | 0.0464 | 0.005 | 0.0367, 0.0585 |
| Alcohol @ DII | | | |
| Yes—0 | 0.941 | 0.010 | 0.915, 0.960 |
| Yes—1 | 0.914 | 0.006 | 0.898, 0.928 |
| No—0 | 0.0588 | 0.010 | 0.0403, 0.0853 |
| No—1 | 0.0858 | 0.007 | 0.0724, 0.102 |
| Smoking @ DII | | | |
| Yes—0 | 0.400 | 0.227 | 0.352, 0.449 |
| Yes—1 | 0.419 | 0.0167 | 0.384, 0.455 |
| No—0 | 0.600 | 0.023 | 0.551, 0.648 |
| No—1 | 0.581 | 0.017 | 0.545, 0.616 |

| Variable * | Participants (n) | Mean | Standard Error (SE) | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- |
| Age (Years) | 9254 | 34.3 | 25.5 | 0.00 | 80.0 |
| BMI (kg/m²) | 8005 | 26.6 | 8.26 | 12.3 | 86.2 |
| Lead (µg/dL) | 6884 | 1.08 | 1.29 | 0.050 | 42.5 |
| Cadmium (µg/L) | 7513 | 0.374 | 0.503 | 0.070 | 13.0 |
| Mercury (µg/L) | 7513 | 1.14 | 2.27 | 0.200 | 63.6 |
| PFOA (ng/mL) | 1929 | 1.71 | 1.82 | 0.140 | 52.9 |
| PFOS (ng/mL) | 1929 | 6.51 | 7.74 | 0.140 | 105 |
| DII | 7495 | 1.79 | 1.59 | −4.34 | 5.15 |
| | Mean | Std. Error | 95% Confidence Interval | p-Value |
| --- | --- | --- | --- | --- |
| PFOA | | | | |
| 0 | 1.92 | 0.154 | 1.59, 2.25 | 0.137 |
| 1 | 1.70 | 0.066 | 1.56, 1.84 | |
| PFOS | | | | |
| 0 | 6.56 | 0.504 | 5.48, 7.63 | 0.067 |
| 1 | 5.64 | 0.219 | 5.18, 6.11 | |
| Lead | | | | |
| 0 | 1.06 | 0.073 | 0.904, 1.22 | 0.263 |
| 1 | 1.00 | 0.054 | 0.957, 1.16 | |
| Cadmium | | | | |
| 0 | 0.357 | 0.024 | 0.307, 0.409 | 0.360 |
| 1 | 0.382 | 0.014 | 0.352, 0.411 | |
| Mercury | | | | |
| 0 | 1.51 | 0.079 | 1.19, 1.66 | <0.0001 |
| 1 | 1.07 | 0.060 | 0.929, 1.17 | |
| Age in Years | | | | |
| 0 | 46.0 | 0.875 | 44.1, 47.8 | <0.0001 |
| 1 | 37.0 | 0.509 | 35.9, 38.1 | |
| BMI | | | | |
| 0 | 28.0 | 0.351 | 27.2, 28.7 | 0.390 |
| 1 | 28.0 | 0.237 | 27.1, 28.1 | |
| DII | Coefficient * | Std. Error | p-Value | 95% Confidence Interval |
| --- | --- | --- | --- | --- |
| PFOA | −0.035 | 0.053 | 0.520 | −0.149, 0.079 |
| PFOS | −0.008 | 0.011 | 0.471 | −0.030, 0.015 |
| Lead | 0.016 | 0.058 | 0.787 | −0.108, 0.140 |
| Cadmium | 0.369 | 0.013 | 0.012 | 0.092, 0.647 |
| Mercury | −0.123 | 0.031 | 0.001 | −0.189, −0.057 |
| Variable | PIP |
| --- | --- |
| Lead | 0.560 |
| Cadmium | 1.000 |
| Mercury | 1.000 |
| PFOA | 0.592 |
| PFOS | 0.852 |

Share and Cite

Odediran, A.; Obeng-Gyasi, E. Association between Combined Metals and PFAS Exposure with Dietary Patterns: A Preliminary Study. Environments 2024, 11, 127. https://doi.org/10.3390/environments11060127




Descriptive statistics in research: a critical component of data analysis

With any data, the objective is to describe the population at large. But what does that mean, and what processes, methods and measures are used to uncover insights from that data? In this short guide, we explore descriptive statistics and how they're applied to research.

What do we mean by descriptive statistics?

With any kind of data, the main objective is to describe a population at large — and using descriptive statistics, researchers can quantify and describe the basic characteristics of a given data set.

For example, researchers can condense large data sets, which may contain thousands of individual data points or observations, into a series of statistics that provide useful information on the population of interest. We call this process “describing data”.

To summarise a sample, we use measures such as the mean, median, variance, frequencies and percentages, alongside visual tools such as graphs, charts, histograms and box-and-whisker plots. For datasets with just one variable, we use univariate descriptive statistics. For datasets with multiple variables, we use bivariate correlation and multivariate descriptive statistics.

Want the definitions?

  • Univariate descriptive statistics: describing data with only one characteristic or attribute
  • Bivariate correlation: analysing (comparing) two variables simultaneously to see if there is a relationship between them
  • Multivariate descriptive statistics: the simultaneous observation and analysis of more than one outcome variable

Then, after describing and summarising the data, as well as using simple graphical analyses, we can start to draw meaningful insights from it to help guide specific strategies. It's also important to note that descriptive statistics can draw on both quantitative and qualitative research.
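To make the univariate and bivariate cases concrete, here's a minimal sketch using only Python's standard library: a univariate summary of one variable, then a bivariate correlation between two variables. The data and variable names are invented for illustration.

```python
import statistics

# Univariate: summarise a single variable (invented ages)
ages = [23, 27, 31, 31, 36, 42, 45, 51]
print("mean:  ", statistics.mean(ages))
print("median:", statistics.median(ages))
print("stdev: ", statistics.stdev(ages))

# Bivariate: is there a relationship between two variables?
incomes = [28, 33, 35, 41, 48, 52, 60, 66]  # invented, in $1,000s
# Pearson's r (requires Python 3.10+)
print("correlation:", statistics.correlation(ages, incomes))
```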

Describing data is a critical first step in research: it enables the subsequent organisation, simplification and summarisation of information, and every survey question and population has summary statistics. Let's take a look at a few examples.

Examples of descriptive statistics

Consider for a moment a number used to summarise how well a striker is finishing in football: the shot conversion rate. This is simply the number of goals scored divided by the number of shots taken, reported to three decimal places. A striker converting at 0.333 scores one goal for every three shots; one in four is 0.250.

A classic example is a student's grade point average (GPA). This single number summarises a student's general performance across a range of courses and classes. It doesn't tell us anything about the difficulty or content of those courses, but it does provide a summary that enables comparison between students.

Ultimately, descriptive statistics make complex, data-intensive quantitative or qualitative insights from large data sets far easier to understand.
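As a quick worked version of the two examples above, a pair of hypothetical helper functions might look like this; the numbers are invented:

```python
def conversion_rate(goals: int, shots: int) -> float:
    """Goals scored per shot taken, to three decimal places."""
    return round(goals / shots, 3)

def gpa(grade_points: list[float]) -> float:
    """Unweighted mean of per-course grade points."""
    return round(sum(grade_points) / len(grade_points), 2)

print(conversion_rate(1, 3))            # 0.333 -- one goal every three shots
print(conversion_rate(1, 4))            # 0.25
print(gpa([4.0, 3.3, 3.7, 2.7, 3.0]))   # 3.34
```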


Types of descriptive statistics

To quantitatively summarise the characteristics of raw, ungrouped data, we use the following types of descriptive statistics:

  • Measures of Central Tendency ,
  • Measures of Dispersion  and
  • Measures of Frequency Distribution.

Following the application of any of these approaches, the raw data then becomes ‘grouped’ data that’s logically organised and easy to understand. To visually represent the data, we then use graphs, charts, tables etc.

Let’s look at the different types of measurement and the statistical methods that belong to each:

Measures of Central Tendency  are used to describe data by identifying a single representative central value, for example the mean, median or mode.

Measures of Dispersion  are used to determine how spread out a data distribution is with respect to the central value, e.g. the mean, median or mode. Central tendency alone gives the average or central value, but it doesn't describe how the data is distributed within the set.

Measures of Frequency Distribution  are used to describe the occurrence of data within the data set (count).

The methods of each measure are summarised in the table below:

Measures of Central Tendency | Measures of Dispersion | Measures of Frequency Distribution
Mean | Range | Count
Median | Standard deviation |
Mode | Quartile deviation |
 | Variance |
 | Absolute deviation |

Mean:  The most popular and well-known measure of central tendency. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.

Median:  The median is the middle score of a dataset arranged in order of magnitude. If there is an even number of data points, e.g. 10, take the two middle scores and average them.

Mode:  The mode is the most frequently occurring observation in the data set.  

Range:  The difference between the highest and lowest value.

Standard deviation:  Standard deviation measures the dispersion of a data set relative to its mean and is calculated as the square root of the variance.

Quartile deviation:  Half the difference between the third and first quartiles, i.e. (Q3 − Q1) / 2; it measures spread in the middle 50% of the data.

Variance:  Variance measures variability around the mean; it is the average of the squared deviations from the mean.

Absolute deviation:  The absolute deviation of a dataset is the average distance between each data point and the mean.

Count:  How often each value occurs.
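Here is a minimal sketch computing all of these measures on one small invented sample, using only Python's standard library; quartile deviation and absolute deviation are implemented directly from the definitions above.

```python
import statistics
from collections import Counter

data = [2, 4, 4, 4, 5, 5, 7, 9]  # invented sample

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)               # most frequent value
value_range = max(data) - min(data)
stdev = statistics.stdev(data)             # sample standard deviation
variance = statistics.variance(data)       # sample variance

# Quartile deviation: half the interquartile range, (Q3 - Q1) / 2
q1, _, q3 = statistics.quantiles(data, n=4)
quartile_deviation = (q3 - q1) / 2

# Absolute deviation: average distance of each point from the mean
abs_deviation = sum(abs(x - mean) for x in data) / len(data)

# Count: how often each value occurs
counts = Counter(data)

print(mean, median, mode, value_range, stdev, variance,
      quartile_deviation, abs_deviation, counts)
```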

Scope of descriptive statistics in research

Descriptive statistics (or analysis) is broader in scope than many other quantitative and qualitative methods, as it provides a much wider picture of an event, phenomenon or population.

But that’s not all: it can use any number of variables, and as it collects data and describes it as it is, it’s also far more representative of the world as it exists.

However, it’s also important to consider that descriptive analyses lay the foundation for further methods of study. By summarising and condensing the data into easily understandable segments, researchers can further analyse the data to uncover new variables or hypotheses.

Much of this practice comes down to ease of data visualisation: with data presented in a meaningful way, researchers have a simplified interpretation of the data set in question. That said, while descriptive statistics help to summarise information, they only provide a general view of the variables in question.

It is, therefore, up to the researchers to probe further and use other methods of analysis to discover deeper insights.

Things you can do with descriptive statistics:

  • Define subject characteristics:  If a marketing team wanted to build out accurate buyer personas for specific products and industry verticals, they could use descriptive analyses on customer datasets (procured via a survey) to identify consistent traits and behaviours.

They could then ‘describe’ the data to build a clear picture and understanding of who their buyers are, including things like preferences, business challenges, income and so on.

  • Measure data trends

Let’s say you wanted to assess propensity to buy over several months or years for a specific target market and product. With descriptive statistics, you could quickly summarise the data and extract the precise data points you need to understand the trends in product purchase behaviour.

  • Compare events, populations or phenomena

How do different demographics respond to certain variables? For example, you might want to run a customer study to see how buyers in different job functions respond to new product features or price changes. Are all groups as enthusiastic about the new features and likely to buy? Or do they have reservations? This kind of data will help inform your overall product strategy and potentially how you tier solutions.

  • Validate existing conditions

When you have a belief or hypothesis but need to prove it, you can use descriptive techniques to ascertain underlying patterns or assumptions.

  • Form new hypotheses

With the data presented and surmised in a way that everyone can understand (and infer connections from), you can delve deeper into specific data points to uncover deeper and more meaningful insights — or run more comprehensive research.

Guiding your survey design to improve the data collected

To use your surveys as an effective tool for customer engagement and understanding, every survey goal and item should answer one simple, yet highly important question:

“What am I really asking?”

It might seem trivial, but by having this question frame survey research, it becomes significantly easier for researchers to develop the  right questions  that uncover useful, meaningful and actionable insights.

Planning becomes easier, questions clearer and perspective far wider and yet nuanced.

Hypothesise — what’s the problem that you’re trying to solve? Far too often, organisations collect data without understanding what they’re asking, and why they’re asking it.

Finally, focus on the end result. What kind of data do you need to answer your question? Also, are you asking a quantitative or qualitative question? Here are a few things to consider:

  • Clear questions are clear for everyone. It takes time to make a concept clear
  • Ask about measurable, evident and noticeable activities or behaviours.
  • Make rating scales easy. Avoid long lists, confusing scales or “don’t know” or “not applicable” options.
  • Ensure your survey makes sense and flows well. Reduce the cognitive load on respondents by making it easy for them to complete the survey.
  • Read your questions aloud to see how they sound.
  • Pretest by asking a few uninvolved individuals to answer.

Furthermore…

As well as understanding what you’re really asking, there are several other considerations for your data:

  • Keep it random

How you select your sample is what makes your research replicable and meaningful. Having a truly random sample helps prevent bias, increasing the quality of the evidence you find.

  • Plan for and avoid sample error

Before starting your research project, have a clear plan for avoiding sample error. Use larger sample sizes, and apply random sampling to minimise the potential for bias.

  • Don’t over sample

Remember, a simple random sample of around 500 respondents will estimate a population proportion to within roughly ±4.4 percentage points at a 95% confidence level, almost regardless of how large the population is (see the sketch after this list).

  • Think about the mode

Match your survey methods to the sample you select. For example, how do your current customers prefer communicating? Do they have any shared characteristics or preferences? A mixed-method approach is critical if you want to drive action across different customer segments.
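To sanity-check that sample-size figure, the standard margin-of-error formula for a proportion is MoE = z·√(p(1 − p)/n); the sketch below evaluates it at the most conservative case, p = 0.5, for a few sample sizes.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for a proportion at ~95% confidence (z = 1.96)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 500, 1000, 2000):
    print(f"n={n}: ±{margin_of_error(n):.1%}")
# n=100: ±9.8%   n=500: ±4.4%   n=1000: ±3.1%   n=2000: ±2.2%
```

Note the diminishing returns: quadrupling the sample from 500 to 2,000 only halves the margin of error, which is why oversampling rarely pays for itself.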

Use a survey tool that supports you with the whole process

Surveys created using survey research software can support researchers in a number of ways, and many tools ship with ready-made templates:

  • Employee satisfaction  survey template
  • Employee exit  survey template
  • Customer satisfaction (CSAT)  survey template
  • Ad testing  survey template
  • Brand awareness  survey template
  • Product pricing  survey template
  • Product research  survey template
  • Employee engagement  survey template
  • Customer service  survey template
  • NPS  survey template
  • Product package testing  survey template
  • Product features prioritisation  survey template

These considerations have been included in  Qualtrics’ survey software , which summarises and creates visualisations of data, making it easy to access insights, measure trends, and examine results without complexity or jumping between systems.

Uncover your next breakthrough idea with Stats iQ™

What makes Qualtrics so different from other survey providers is that it is built in consultation with trained research professionals and includes  high-tech statistical software like Qualtrics Stats iQ .

With just a click, the software can run specific analyses or automate statistical testing and data visualisation. Testing parameters are automatically chosen based on how your data is structured (e.g. categorical data will run a statistical test like Chi-squared), and the results are translated into plain language that anyone can understand and put into action.
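As a generic illustration of that kind of test selection (a sketch, not Stats iQ's internals), here's a chi-squared test of independence on an invented contingency table, using the scipy library:

```python
from scipy.stats import chi2_contingency

# Invented contingency table: rows = customer segment, columns = response
#                  Liked  Disliked
table = [[45, 15],   # Segment A
         [30, 30]]   # Segment B

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, dof={dof}")
# A small p-value suggests segment and response are not independent.
```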

  • Get more meaningful insights from your data

Stats iQ includes a variety of statistical analyses, including: describe, relate, regression, cluster, factor, TURF, and pivot tables — all in one place!

  • Confidently analyse complex data

Built-in artificial intelligence and advanced algorithms automatically choose and apply the right statistical analyses and return the insights in plain English so everyone can take action.

  • Integrate existing statistical workflows

For more experienced stats users, built-in R code templates allow you to run even more sophisticated analyses by adding R code snippets directly in your survey analysis.

Advanced statistical analysis methods available in Stats iQ

Regression analysis – Measures the degree of influence of independent variables on a dependent variable (the relationship between two or more variables).

Analysis of Variance (ANOVA) test  – Commonly used alongside a regression study to find out what effect independent variables have on the dependent variable. It compares multiple groups simultaneously to test whether their means differ.

Conjoint analysis  – Asks people to make trade-offs when making decisions, then analyses the results to give the most popular outcome. Helps you understand why people make the complex choices they do.

T-Test  – Helps you compare whether two data groups have different mean values and allows the user to interpret whether differences are meaningful or merely coincidental.

Crosstab analysis  – Used in quantitative market research to analyse categorical data – variables whose values fall into distinct, mutually exclusive categories – and lets you examine the relationship between two variables in a contingency table.
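For a flavour of two of these methods in code, the sketch below runs an independent-samples t-test and a simple linear regression on invented data (scipy for the t-test; the standard library's linear_regression requires Python 3.10+):

```python
from scipy.stats import ttest_ind
from statistics import linear_regression  # Python 3.10+

# T-test: do two invented groups have different mean scores?
group_a = [72, 75, 78, 80, 74, 77]
group_b = [68, 70, 73, 69, 71, 72]
t_stat, p_value = ttest_ind(group_a, group_b)
print(f"t={t_stat:.2f}, p={p_value:.4f}")

# Regression: how does spend (y) move with ad exposure (x)?
exposures = [1, 2, 3, 4, 5, 6]
spend = [10, 14, 15, 19, 22, 24]
slope, intercept = linear_regression(exposures, spend)
print(f"spend = {slope:.2f} * exposures + {intercept:.2f}")
```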

Go from insights to action

Now that you have a better understanding of descriptive statistics in research and how to apply statistical analysis methods correctly, it's time to use a tool that can take your research and subsequent analysis to the next level.

Try out a Qualtrics survey software demo so you can see how it can take you through  descriptive research  and further research projects from start to finish.


