Type: | Article |
---|---|
Title: | Too many digits? The presentation of numerical data |
Open access status: | An open access version is available from UCL Discovery |
Additional information: | This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/ |
Enhancing the QUAlity and Transparency Of health Research
Reporting guideline provided for? (i.e. exactly what the authors state in the paper) | Recommendations for rounding summary statistics. |
Full bibliographic reference | Cole TJ. Too many digits: the presentation of numerical data. Arch Dis Child. 2015;100(7):608-609. |
Language | English |
Relevant URLs (full-text if available) | The full-text of this reporting guideline is freely available from: | Statistical methods and analyses | December 17, 2021 |
Wan Nor Arifin
1 Statistical Editors, Malaysian Journal of Medical Sciences, Penerbit Universiti Sains Malaysia, 11800 USM, Pulau Pinang, Malaysia
2 Unit of Biostatistics and Research Methodology, School of Medical Sciences, Universiti Sains Malaysia, 16150 Kubang Kerian, Kelantan, Malaysia
Bachok Norsa’adah, Yaacob Najib Majdi, Ab Hamid Siti-Azrin, Musa Kamarul Imran
3 Department of Community Medicine, School of Medical Sciences, Universiti Sains Malaysia,16150 Kubang Kerian, Kelantan, Malaysia
4 Unit of Community Medicine, Faculty of Medicine, Universiti Sultan Zainal Abidin, 20400 Kuala Terengganu, Terengganu, Malaysia
5 PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam, Gadong BE 1410, Brunei
Statistical editors of the Malaysian Journal of Medical Sciences (MJMS) must go through many submitted manuscripts, focusing on the statistical aspect of the manuscripts. However, the editors notice myriad styles of reporting the statistical results, which are not standardised among the authors. This could be due to the lack of clear written instructions on reporting statistics in the guidelines for authors. The aim of this editorial is to briefly outline reporting methods for several important and common statistical results. It will also address a number of common mistakes made by the authors. The editorial will serve as a guideline for authors aiming to publish in the MJMS as well as in other medical journals.
Year after year, statistical editors of the Malaysian Journal of Medical Sciences (MJMS) must go through many submitted manuscripts, scrutinising their statistical and methodological soundness. In 2015 alone, the MJMS received 272 manuscripts from many different countries, 52% of which were original articles ( 1 ). However, the editors have noted many different styles of reporting statistical results, and these styles are not standardised among authors. This has caused unnecessary difficulties for the editors, as they have to comment not only on the methods and statistics used, but also on the technical and formatting aspects of the manuscripts. This lack of standardised reporting also delays the review and acceptance of submitted articles. Admittedly, this could be due to a lack of clear written instructions on reporting statistics in the guidelines for authors. Although a number of guidelines on reporting statistical results are available, for example, those of Lang and Altman ( 2 ) and Cummings and Rivara ( 3 ), the editors of the MJMS found them incomplete as guidelines for authors.
The aim of this editorial is to outline reporting methods for several important and common statistical results. It will also address a number of common mistakes made by authors.
Statistical results can be presented in text, table, or figure form. The decision depends very much on the amount of information the authors want to present to the readers.
The text form is suitable for brief results, for example, the description of a sample (“A total of 100 patients were recruited,” “Most of the respondents were female...”). Text form is also used to highlight important results in tables that might be missed by readers given the amount of information commonly summarised in tables, for example, “Among all the studied factors, only gender and salary were found to be significantly associated with...”.
The table form is suitable for presentation of detailed statistical results. Common examples are detailed demographic profiles of study participants, results of a multiple logistic regression analysis, and cross-tabulation of factors with outcomes. It is very important to note that the table description is placed at the top of the table, while the list of abbreviations and additional relevant descriptions (especially related to statistical analysis) are placed below the table as footnotes. The footnotes should be indicated by superscript Roman letters (a, b, c, ...) instead of symbols or numbers. All abbreviations used in the table must be described again in the footnotes, although the abbreviations were already described in text or earlier tables.
The figure form includes charts, graphs, and other images. It should be reserved for results that are more presentable in this form, for example, trends or geographical distribution of disease, histopathological or radiological images, and comparison of means over time. Figure descriptions are placed below the figure.
The descriptive statistics summarise data from a sample, for example, demographic profiles. Whenever there are a number of groups, it is useful to provide the descriptive statistics by group and for the overall sample. This gives a visual impression of the comparability of the groups in terms of their baseline characteristics. It is not necessary to report statistical tests and P-values in such a summary because the main concern is the comparability of the participants (which reflects the sampling), not the populations.
Depending on the types of variables, authors should present the appropriate descriptive statistics. For numerical variables, if the variable is normally distributed, the mean and standard deviation (SD) are presented. In the text, this is reported as mean (SD = value), for example, “the mean age was 46.5 (SD = 3.0).” In a table, the “mean (SD)” statement is included in the header. Whenever the variable is not normally distributed, the median and inter-quartile range (IQR) are reported instead. The use of the “±” symbol between a mean and an SD must be avoided because the mathematical symbol has its own specific meaning. For categorical variables, the count (n) and percentage (%) are presented. In addition, authors must report the group size and total sample size, written as n = size in the table headers and the table description, respectively. The use of a capital N in place of n must be avoided as it refers to population size instead of sample size. A typical demographic table is presented in Table 1.
Table 1. Patient demographics (n = 95)

| Variables | | Drug X (n = 45), n (%) | Placebo (n = 50), n (%) | Total, n (%) |
|---|---|---|---|---|
| Age (years), mean (SD) | | 45.3 (2.6) | 47.8 (3.2) | 46.5 (3.0) |
| Gender | Male | 25 (55.6) | 25 (50.0) | 50 (52.6) |
| | Female | 20 (44.4) | 25 (50.0) | 45 (47.4) |
| BMI groups | Underweight (BMI < 18.5) | 10 (22.2) | 11 (24.0) | 21 (22.1) |
| | Normal (BMI 18.5 to 24.9) | 12 (26.7) | 13 (28.0) | 25 (26.3) |
| | Overweight (BMI ≥ 25) | 23 (51.1) | 26 (48.0) | 49 (51.6) |
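The text forms described above for numerical variables can be produced mechanically. The following is a minimal sketch, not part of the editorial: the function name `describe`, the `normal` flag and the sample values are illustrative assumptions, and the IQR is reported here as a single number (third quartile minus first quartile) using the default quartile method of Python's `statistics` module.

```python
import statistics

def describe(values, normal=True, decimals=1):
    """Return 'mean (SD = x)' or 'median (IQR = x)' as a text string.

    `normal` is a hypothetical flag: the analyst decides beforehand
    whether the variable is normally distributed.
    """
    if normal:
        centre = statistics.mean(values)
        spread = statistics.stdev(values)  # sample SD (n - 1 denominator)
        label = "SD"
    else:
        centre = statistics.median(values)
        # Quartiles from the module's default ("exclusive") method.
        q1, _, q3 = statistics.quantiles(values, n=4)
        spread = q3 - q1
        label = "IQR"
    return f"{centre:.{decimals}f} ({label} = {spread:.{decimals}f})"

ages = [44, 47, 49, 43, 50]  # made-up ages, for illustration only
print("The mean age was", describe(ages))
print("The median age was", describe(ages, normal=False))
```

Note that the string never contains the “±” symbol, in line with the advice above.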
The precision of an estimate, for example, a single mean or proportion, is presented in the form “estimate (95% CI: lower limit, upper limit)”. In writing, this gives, for a single mean, “the mean body mass index (BMI) was 22.5 (95% CI: 21.5, 23.5)” and, for a single proportion or percentage, “the prevalence of obesity was 34.5% (95% CI: 30.5%, 38.5%)”. Other common examples are the reporting of the mean difference (independent t-test) and the odds ratio (logistic regression), which are presented under the specific statistical tests section below.
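As an illustration of the “estimate (95% CI: lower limit, upper limit)” form, the short sketch below builds the string from numbers that have already been calculated. The helper name `format_ci` and its parameters are assumptions made for this example; it only formats, and does not compute, the interval.

```python
def format_ci(estimate, lower, upper, decimals=1, unit="", level=95):
    """Format an estimate and its confidence limits as
    'estimate (95% CI: lower limit, upper limit)'."""
    def num(x):
        # Same number of decimal places (and unit) for all three values.
        return f"{x:.{decimals}f}{unit}"
    return f"{num(estimate)} ({level}% CI: {num(lower)}, {num(upper)})"

print("The mean body mass index (BMI) was", format_ci(22.5, 21.5, 23.5))
print("The prevalence of obesity was", format_ci(34.5, 30.5, 38.5, unit="%"))
```

Separating the limits with a comma rather than a dash avoids any confusion with a minus sign.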
In order to standardise the reporting and presenting of statistical results in the MJMS, the editors offer the suggested forms of presentation summarised in Table 2 as general guidelines.
Table 2. Presenting statistical results

Independent t-test. Table form: Comparison of systolic blood pressure between intervention and control groups. Figure form: -

| Variable | Intervention (n = 40), mean (SD) | Control (n = 40), mean (SD) | Mean difference (95% CI) | t-statistic (df) | P-value |
|---|---|---|---|---|---|
| SBP (mmHg) | 119.4 (5.06) | 109.4 (6.34) | 10.0 (7.5, 12.6) | 7.83 (78) | < 0.001 |

SBP = systolic blood pressure. Independent t-test.

Paired t-test. Table form: Comparison of systolic blood pressure pre- and post-treatment. Figure form: -

| Variable | Pre, mean (SD) | Post, mean (SD) | Mean difference (95% CI) | t-statistic (df) | P-value |
|---|---|---|---|---|---|
| SBP (mmHg) | 136.5 (9.72) | 125.0 (7.64) | −11.5 (−13.1, −9.9) | −14.92 (29) | < 0.001 |

SBP = systolic blood pressure. Paired t-test.

One-way ANOVA. Table form: Comparison of mean weight between the four diet plans.

| Diet plan | n | Mean (SD) | F-statistic (df) | P-value |
|---|---|---|---|---|
| Okinawa Diet | 10 | 65.5 (9.98) | 13.41 (3, 36) | < 0.001 |
| Eastern Diet | 10 | 75.4 (4.17) | | |
| Western Diet | 10 | 77.9 (5.70) | | |
| Fast food Diet | 10 | 83.9 (5.07) | | |

One-way ANOVA. Post-hoc analysis with Bonferroni correction shows significant differences between the Okinawa diet and the other diet plans (P < 0.001) and between the Eastern diet and the Western diet (P = 0.041).

Pearson’s correlation. Table form: Correlation between the study variables (n = 155).

| Variable | Age (years) | BMI (kg/m²) | SBP (mmHg) |
|---|---|---|---|
| Age (years) | | < 0.001 | 0.031 |
| BMI (kg/m²) | 0.55 | | 0.008 |
| SBP (mmHg) | 0.71 | 0.33 | |

BMI = body mass index, SBP = systolic blood pressure, SD = standard deviation. Upper triangle: P-values; lower triangle: correlation coefficients (r).

Linear regression. Table form: Factors associated with systolic blood pressure (mmHg) (n = 150). Figure form: -

| Variable | Adjusted regression coefficient (95% CI) | P-value |
|---|---|---|
| BMI (kg/m²) | 9.4 (8.6, 10.2) | < 0.001 |
| Age (years) | 2.5 (1.5, 3.5) | 0.004 |

BMI = body mass index. Adjusted regression coefficients from multiple linear regression (R² = 0.65).

Chi-square test. Table form: Association between gender and disease status. Figure form: -

| Variable | | n (%) | n (%) | Total | χ²-statistic (df) | P-value |
|---|---|---|---|---|---|---|
| Gender | Male | 20 (80.0) | 5 (20.0) | 25 | 8.33 (1) | 0.004 |
| | Female | 10 (40.0) | 15 (60.0) | 25 | | |

Chi-square test for independence.

McNemar. Table form: Status of skin lesion pre- and post-treatment. Figure form: -

| | | Post: Lesion, n (%) | Post: No lesion, n (%) | Total | χ²-statistic (df) | P-value |
|---|---|---|---|---|---|---|
| Pre | Lesion | 30 (37.5) | 18 (22.5) | 80 | 11.34 (1) | < 0.001 |
| | No lesion | 2 (2.5) | 30 (37.5) | | | |

McNemar’s Chi-squared test with continuity correction.

Logistic regression. Table form: Associated factors of coronary artery disease (n = 250). Figure form: -

| Variable | | b | OR (95% CI) | P-value |
|---|---|---|---|---|
| Diastolic Blood Pressure (mmHg) | | 0.05 | 1.05 (1.02, 1.08) | < 0.001 |
| Gender | Male vs Female | 0.81 | 2.24 (1.04, 4.82) | 0.045 |

OR = odds ratio. P-values from the likelihood ratio test. Female is the reference category.

Diagnostic test. Table form: Sensitivity and specificity values of selected tumor markers (n = 435).

| Tumor marker (cut-off) | Sensitivity (%) | Specificity (%) | AUC (95% CI) | P-value |
|---|---|---|---|---|
| Tumor marker A (at 20 ng/mL) | 78.5 | 90.2 | 0.75 (0.71, 0.79) | < 0.001 |
| Tumor marker B (at 35 ng/mL) | 60.1 | 50.3 | 0.54 (0.51, 0.57) | 0.004 |
| Tumor marker C (at 12 ng/mL) | 55.5 | 81.1 | 0.45 (0.38, 0.52) | 0.919 |

AUC = area under the curve. Null hypothesis: true area = 0.5.
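The chi-square example in Table 2 (20 and 5 among males, 10 and 15 among females) can be reproduced with a few lines of standard-library Python. This is a sketch under stated assumptions: it uses Pearson's statistic without continuity correction, which is an assumption that happens to match the reported value, and the function name `chi_square_2x2` is illustrative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns the statistic and its P-value (1 df)."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # For 1 degree of freedom the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

stat, p = chi_square_2x2(20, 5, 10, 15)
# Two decimal places for the test statistic, three for the P-value.
print(f"{stat:.2f} (1), P = {p:.3f}")
```

The printed result follows the reporting form used in the table: the statistic to two decimal places, the degrees of freedom in parentheses, and the P-value to three decimal places.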
In text, the P-value is written as an italic capital P followed by the value, while as a table header, it should be written as P-value. Authors should write the value instead of reporting the result as “not significant” or “NS” ( 3 ). For example, “the comparison was significant, with P = 0.003”. Three decimal places are preferred in the MJMS for all ranges of P-values. The editors are aware of different guidelines on the number of decimal places of P-values, for example, as given in Cummings and Rivara ( 3 ) and Cole ( 4 ).
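A small helper can apply the three-decimal-place preference consistently. The sketch below is illustrative only: the name `format_p_mjms` is hypothetical, and the “< 0.001” floor is an assumption inferred from the way very small P-values are shown in Table 2 rather than a rule stated in the text.

```python
def format_p_mjms(p):
    """Format a P-value to three decimal places for use in text."""
    if not 0 <= p <= 1:
        raise ValueError("a P-value must lie between 0 and 1")
    if p < 0.001:
        # Assumed floor: values that would print as 0.000 are shown
        # as an inequality instead.
        return "P < 0.001"
    return f"P = {p:.3f}"

for value in (0.003, 0.0451, 0.00004, 0.62):
    print(format_p_mjms(value))
```

Plain text cannot carry the italic P; the typesetting still has to be applied in the manuscript.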
Statistical tests that are named after the statistical distributions on which they are based, for example, t-test, F-test, and χ²-test, are italicised. In addition, coefficients, for example, r (Pearson’s correlation coefficient), R² (R-squared), and α (Cronbach’s alpha) are also italicised.
Computer programs used for statistical analysis should be described, specifically, the name of the program and the version should be given as well as the specific add-on packages if applicable. For example, “IBM SPSS for Windows version 22.0 was...” and “ psych version 1.5.8 and lavaan version 0.5–20 packages were used in the R software environment.” The statistical analysis used should be described in sufficient detail to reproduce the analysis ( 2 ), particularly the name of the analysis, its relation to the aims of the study, and the dependent and independent variables. In addition, Lang and Altman ( 2 ) outlined in greater detail general principles of reporting statistical methods.
In general, one decimal place is used for percentage values. Use two or more decimal places for percentage values less than 1.0%. For descriptive statistics of numerical data, add one additional decimal place to the original data. For example, if cholesterol level is reported with one decimal place (e.g. 4.8 mmol/L), the mean and SD should be reported with two decimal places (e.g. mean = 4.82, SD = 2.11 mmol/L). Use two decimal places for test statistics values, for example, values of the t-statistic, F-statistic, and χ²-statistic.
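The “one additional decimal place” rule for descriptive statistics can be sketched as follows. The raw values are kept as text so that the number of recorded decimal places is not lost; the function names and the cholesterol values are made up for illustration.

```python
import statistics

def decimals_in(text):
    """Count the digits after the decimal point in a number written as text."""
    return len(text.split(".")[1]) if "." in text else 0

def summarise(raw):
    """Report mean and SD with one more decimal place than the raw data."""
    places = max(decimals_in(v) for v in raw) + 1
    values = [float(v) for v in raw]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return f"mean = {mean:.{places}f}, SD = {sd:.{places}f}"

cholesterol = ["4.8", "5.1", "6.3", "3.9", "5.6"]  # made-up values, mmol/L
print(summarise(cholesterol), "mmol/L")
```

Because the raw data carry one decimal place, the summary is printed with two.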
Using a dash “−” between any two numbers must be avoided as it could be mistaken for a minus or negative sign. For example, authors should write “the age ranges between 20 to 29 years old” instead of “the age ranges between 20 – 29 years old”. In relation to formatting of numbers in tables, the last digits of numbers must be right-aligned. The formatting is demonstrated in Tables 1 and 2.
This editorial outlines the basics of reporting statistical results in medical journals. This editorial will serve as a guide to authors aiming to publish in the MJMS. Given the availability of the guidelines on reporting statistical results, it is hoped that the authors follow the guidelines to ensure standardisation of the submitted manuscripts. This will shorten the process of reviewing and accepting manuscripts submitted to the MJMS.
The purpose of a scientific paper is to communicate, and within the paper this applies especially to the presentation of data.
Continuous data, such as serum cholesterol concentration or triceps skinfold thickness, can be summarised numerically either in the text or in tables or plotted in a graph. When numbers are given there is the problem of how precisely to specify them. As far as possible the numerical precision used should be consistent throughout a paper and especially within a table. In general, summary statistics such as means should not be given to more than one extra decimal place over the raw data. The same usually applies to measures of variability or uncertainty such as the standard deviation or standard error, though greater precision may be warranted for these quantities as they are often used in further calculations. Similar comments apply to the results of regression analyses, where spurious precision should be avoided. For example, the regression equation 1
birth weight = −3.0983527 + 0.142088 × chest circumference + 0.158039 × midarm circumference, purports to predict birth weight to 1/1000000 g.
Categorical data, such as disease group or presence or absence of symptoms, can be summarised as frequencies and percentages. It can be confusing to give percentages alone, as the denominator may be unclear. Also, giving frequencies allows percentages to be given as integers, such as 22%, rather than more precisely. Percentages to one decimal place may sometimes be reasonable, but not in small samples; greater precision is unwarranted. Such data rarely need to be shown graphically.
Test statistics, such as values of t or χ², and correlation coefficients should be given to no more than two decimal places. Confidence intervals are better presented as, say, “12.4 to 52.9” because the format “12.4-52.9” is confusing when one or both numbers are negative. P values should be given to one or two significant figures. P values are always greater than zero. Because computer output is often to a fixed number of decimal places, P=0.0000 really means P<0.00005; such values should be converted to P<0.0001. P values always used to be quoted as P<0.05, P<0.01, and so on because results were compared with tabulated values of statistical distributions. Now that most P values are produced by computer they should be given more exactly, even for non-significant results, for example, P=0.2. Values such as P=0.0027 can be rounded up to P=0.003, but not in general to P<0.01 or P<0.05. In particular, the use of P<0.05 (or, even worse, P=NS) may conceal important information: there is minimal difference between P=0.06 and P=0.04. In tables, however, it may be necessary to use symbols to denote degrees of significance; a common system is to use *, **, and *** to mean P<0.05, 0.01, and 0.001 respectively. Mosteller gives a more extensive discussion of numerical presentation. 2
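The P value advice in this paragraph (one or two significant figures, never zero, very small values shown as P<0.0001) might be coded as below. This is only a sketch: the function name `format_p` and the `sig` parameter are illustrative, and Python's general (`g`) number format is used to obtain significant figures.

```python
def format_p(p, sig=2):
    """Format a P value to `sig` significant figures (1 or 2),
    with values below 0.0001 shown as an inequality."""
    if not 0 <= p <= 1:
        raise ValueError("a P value must lie between 0 and 1")
    if p < 0.0001:
        # Covers computer output such as 0.0000 as well.
        return "P<0.0001"
    # The 'g' format rounds to significant figures and drops trailing zeros.
    return f"P={p:.{sig}g}"

print(format_p(0.0027, sig=1))
print(format_p(0.2))
print(format_p(0.0))
```

With `sig=1` the value 0.0027 is shown as 0.003, as in the example in the text.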
The choice between using a table or figure is not easy, nor is it easy to offer much general guidance. Tables are suitable for displaying information about a large number of variables at once, and graphs are good for showing multiple observations on individuals or groups, but between these cases lies a wide range of situations where the best format is not obvious. One point to consider when contemplating using a figure is the amount of numerical information contained. A figure that displays only two means with their standard errors or confidence intervals is a waste of space as a figure; either more information should be added, such as the raw data (a really useful feature of a figure), or the summary values should be put in the text.
In tables information about different variables or quantities is easier to assimilate if the columns (rather than the rows) contain like information, such as means or standard deviations. Interpretation of tables showing data for individuals (or perhaps for many groups) is aided by having the data ordered by one of the variables—for example, by the baseline value of the measurement of interest or by some important prognostic characteristic.
Jonathan A Cook, Dongquan Bi, Jonas Ranstam, The art of reporting numerical data, British Journal of Surgery , Volume 109, Issue 6, 16 May 2022, Pages 548–549, https://doi.org/10.1093/bjs/znac028
It is unfortunate that in English and a number of other languages, we use the same term ‘statistics’ to refer to both numerical data (e.g. national statistics) and also the science of collecting and analysing data (e.g. a degree in statistics). The former can be very mundane and routine, whereas the latter, which utilizes the former to understand variability and carry out inference, is enlightening and often sobering. The reporting of numerical data should be informed by statistical principles (the science of statistics). One area where this can be counterintuitive is the level of numerical precision to report data (both of individual values and summaries like means and standard deviations). A review of three recent BJS articles identified over 1000 statistics across the articles, which either summarized data from the respective study or reported an output from an analysis of study data. Often in manuscripts submitted to BJS , values are reported to an excessive level of numerical precision.
Simplicity, neatness of presentation, a desire to report numerical calculations fully, or a false understanding of the value of the data collected can all lead to reporting data to an excessive level of precision. For example, it is tempting but somewhat misleading to report 13 out of 39 as (33.33 per cent). The use of two decimal places (four significant figures) gives the false impression of greater precision than really occurred. In this example, each observation accounted for over 2.5 percentage points of the final percentage. There is a quirk of numerical data that percentages will often not add up to exactly 100 per cent where there are three or more categories. However, as long as the number of observations within each category is accurately reported with the percentage in the group, this apparent discrepancy is but a small price to pay for greater clarity and a more honest representation of the data. A simple rule of thumb is that if there are fewer than 100 observations in a sample, reporting percentages to fractions of a per cent is not helpful. Arguably even for larger samples, it is rarely necessary except perhaps for reporting the extremes (e.g. 0.1 versus 0 per cent). Many people will find tables filled with redundant decimal places more difficult to read and understand, and find themselves lost in a sea of unnecessary figures. The number of people who are able to differentiate between tenths of a per cent, let alone hundredths of a per cent, is, in our view, rather minuscule. Equivalent arguments can be made for reporting proportions, where two decimal places should be the standard level of precision.
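The rule of thumb and the rounding quirk described above can be seen in a few lines. This is a minimal sketch with made-up counts; the function name `whole_percentages` is an assumption for the example.

```python
def whole_percentages(counts):
    """Return each count's share of the total as a whole-number percentage."""
    total = sum(counts)
    # Python's round() rounds exact halves to the nearest even integer,
    # which only matters for shares such as 12.5 per cent.
    return [round(100 * c / total) for c in counts]

groups = [13, 13, 13]  # made-up counts: three categories, 39 observations
shares = whole_percentages(groups)
for count, share in zip(groups, shares):
    print(f"{count} ({share} per cent)")
print("sum of percentages:", sum(shares))
```

Reporting each count next to its percentage, as the loop does, keeps the data honest even though the rounded percentages sum to 99 rather than 100.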
Continuous data are perhaps the hardest to fairly report; to do so requires an understanding of the quantity measured, how well it can be measured, how it is or might be used, and how well the original data were recorded. While an operation time reported as 134.682341 min can occur and could be recorded as such, it is of no greater value (and certainly of lesser clarity) to do so than recording it as simply 135 min. You would be a brave person to believe an operation time is accurately recorded to this level of precision (microseconds) in practice. For the poor reader of papers, this is even more so the case when it is recognized that operating practices and the recording of the operation time tend to vary greatly between surgeons, surgical teams, and institutions. My 135 min could easily be your 140 min. Reporting mean operation times beyond minutes is of little clinical value irrespective of how large the study sample size is.
Another area where excessive decimal places are commonly used is in reporting P-values, as if more zeros create stronger evidence. This probably reflects some misunderstanding of what a P-value is: an indirect measure of evidence against a null hypothesis, and not a measure of the strength or the magnitude 1 of statistical disagreement (only of the unlikeliness of compatibility assuming the null hypothesis is true). If the conventional statistical approach is being used with a prior statistical significance level specified (say, the typical two-sided 5 per cent level), then what is important is reporting the P-value to a level sufficient to see how it relates to that marker of statistical significance. As a consequence, reporting a P-value as ‘P < 0.05’ or ‘n.s.’ does not provide sufficient detail, as it is unclear how close the P-value was to the cut-off point. However, once the value is some distance from the cut-off marker, further precision is of little consequence. P-values above 0.1 can happily be reported to one decimal place (e.g. 0.2), and if 0.05 is the significance level of interest, then the differences between P-values < 0.001 are of little interest. Where large numbers of hypotheses are tested (e.g. analyses of genetic data), the significance level of interest should be reduced 2 , but the same principle can be applied to the corresponding lower significance level. Here, as with reporting other outputs of statistical tests and models such as a mean difference or an odds ratio with confidence intervals, the level of precision that is appropriate should be determined by the question we are seeking to answer and how the findings might be applied, as well as the level of precision the data can bear.
Helpful guidance on reporting numbers is available elsewhere 3 for a range of statistical metrics beyond those covered here. It is difficult to be too prescriptive as surgery and science are wonderfully complex. Nevertheless, it would benefit our research and the readers of it if we were more circumspect in reporting our data and a bit more humble in our data presentation.
Disclosure . The authors declare no conflict of interest.
1. Cook JA, Ranstam J. Statistical methods that provide an effect size are to be preferred. Br J Surg 2016;103:1365
2. Cook JA, Ranstam J. Spurious findings. Br J Surg 2017;104:97
3. Cole TJ. Too many digits: the presentation of numerical data. Arch Dis Child 2015;100:608–609
When presenting data using a percentage, is it a good thing to have decimal places, say 2 decimal places instead of rounding off to whole numbers?
For example, instead of 23.43%, you round off to 23%.
I am looking at this from the perspective of whether the 2 decimal places accuracy will make much difference since we are dealing with percentage and not raw data value.
It depends on the size of the differences between classes. In most applications, saying that 73% prefer option A and 27% prefer option B is perfectly acceptable. But if you're dealing with an election where candidate X has 50.15% of votes and candidate Y has 49.86%, the decimal places are very much necessary.
Of course, you need to take care to make sure that all classes add up to 100%. In my electoral example above, they add up to 100.01%. In that case you might even consider adding a third decimal place.
Different organisations often have conflicting rules for the precision in reporting of results. Ultimately there is a trade-off between when seeing the extra digits is useful, versus cases where unnecessary and excessive precision "can swamp the reader, overcomplicate the story and obscure the message" — a subject explored by Tim Cole (2015) in a piece that I found gave a useful guide to "sensible" precision in reporting, and a comparison of leading style manuals. His advice on percentages was as follows:
Integers, or one decimal place for values under 10%. Values over 90% may need one decimal place if their complement is informative. Use two or more decimal places only if the range of values is less than 0.1%. Examples: 0.1%, 5.3%, 27%, 89%, 99.6%
By "complement" he is referring to cases where one might be interested in the "other lot", e.g. if I tell you 98% of patients in a trial got better, you may well be interested in the 2% who did not, and in that case another decimal place to distinguish whether that "2%" really means "2.4%" or "1.6%" would actually be useful.
Cole, T. J. (2015). Too many digits: the presentation of numerical data. Archives of disease in childhood , 100 (7), 608-609. http://dx.doi.org/10.1136/archdischild-2014-307149
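For what it's worth, the first two parts of the rule quoted above are easy to turn into a helper. This is just a rough sketch: `format_percent` and the `complement_matters` flag are names I made up, it always takes the one-decimal option for values under 10%, and it doesn't attempt the "two or more decimal places" case for very narrow ranges.

```python
def format_percent(value, complement_matters=False):
    """Format a percentage as an integer, or with one decimal place for
    values under 10% (and over 90% when the complement is of interest)."""
    if value < 10 or (value > 90 and complement_matters):
        return f"{value:.1f}%"
    return f"{value:.0f}%"

print(format_percent(5.28))
print(format_percent(27.2))
print(format_percent(99.62, complement_matters=True))
```

Whether the complement is informative is a judgement about the data, so it is left to the caller rather than guessed by the function.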
This is a significant figures issue, and is dependent upon the precision of the numbers underlying the percentages. The technically correct number of significant figures is not dependent upon downstream use or the differences between percentage values.
If you're trying to express a percentage describing 5 items out of 7, it would be absurd to claim that it's 71.4285714285% - you simply don't have the precision to back up all those decimal places. When doing division, your answer should have as many significant figures as the fewest number of sig figs in your starting numbers. Here, you only have 1 significant figure, so the percentage should really just be 70%, not even 71%. If you had another example where you want to express 71428 items out of 100000, then you are justified in using more significant figures, all the way out to 71.428%.
Even if you have great precision, it's often preferable to truncate for human readability. Depending on your domain, adding those two extra decimal places may or may not make a difference. You should never over-report significant figures, but you may be justified in under-reporting them if your statistical precision is greater than what's needed for your application.
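If you want to do the significant-figure rounding from this answer in code, something like the following works. It's a quick sketch, and `round_sig` is just a name picked for the example.

```python
import math

def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    # Position of the leading digit decides how many decimals to keep.
    return round(x, sig - 1 - math.floor(math.log10(abs(x))))

print(round_sig(100 * 5 / 7, 1))           # 5 out of 7, one sig fig
print(round_sig(100 * 71428 / 100000, 5))  # 71428 out of 100000, five sig figs
```

The first call gives 70.0 and the second 71.428, matching the two cases above.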
The goal is to make it easy for the reader to understand the important differences. Too many digits obscure the meaningful difference between values in a table. Too few leave out important information. Here's a great discussion: https://newmr.org/blog/how-many-significant-digits-should-you-display-in-your-presentation/ and here's a much more detailed analysis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4483789/
https://doi.org/10.1136/archdischild-2015-309113
We have all been frustrated reading numbers to too many decimal places, the simplest being digital scales in the outpatient clinic where measurements are probably not accurate to more than 10g although the implication of the weight recorded is that the accuracy is much greater. In an excellent leading article this month Tim Cole takes us back to first principles to discuss this and provide sensible, pragmatic guidelines for the presentation of numerical data. It is interesting and helpful to work through. Remember the difference between decimal places and significant figures. The number of significant figures (digits) is the number of all digits ignoring the decimal point, and ignoring all leading and some trailing zeros. Data should be rounded appropriately—not too much, not too little. Clearly, for example 22.68 (95% confidence interval 7.51–73.67) is more effectively and meaningfully written as 23 (95% confidence interval 7.5–74). The various reporting tools are discussed. Significant figures should be considered rather than just decimal places. The general principle is to use two or three significant digits for effect sizes, and one or two significant digits for measures of variability. There is a helpful summary table included with recommendations given for different scenarios. See page 608 .
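The confidence interval example in this paragraph can be checked with a minimal Python sketch. The function names `format_sig` and `format_ci` are hypothetical. The sketch relies on Python's `g` format type, which keeps a given number of significant digits but also strips trailing zeros and switches to exponent notation for large values, so it only suits numbers of roughly this size.

```python
def format_sig(value: float, sig: int = 2) -> str:
    """Format value with `sig` significant digits using the 'g' format type."""
    return format(value, f".{sig}g")


def format_ci(estimate: float, lower: float, upper: float, sig: int = 2) -> str:
    """Render an estimate with its 95% confidence interval."""
    return (
        f"{format_sig(estimate, sig)} "
        f"(95% confidence interval {format_sig(lower, sig)}–{format_sig(upper, sig)})"
    )


print(format_ci(22.68, 7.51, 73.67))  # 23 (95% confidence interval 7.5–74)
```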
Bed sharing and sudden infant death.
Bed sharing increases the risk of sudden infant death in infants less than 3 months. The effect is most profound in infants less than 1 month (5 fold increase in risk of SIDS in infants less than 1 month). The mechanism is not clearly defined. Heyman and colleagues review the accidental deaths during sleep (as a cause of sudden infant death in infancy); New Zealand 48 cases, 2002–2009, 0.1 per 1000 live births. Deaths were due to overlay (n=30), or wedging (n=18), with 34 (71%) in a bed sharing situation. Of the overlay group 8 were by a mother while breast feeding, 4 by a sibling and 17 by a parent. In the wedging group 10 were between a sleeping surface and wall or broken cot, 6 between a cushion and couch and 2 between a sleep surface and bedding. The authors conclude these are potentially preventable deaths, particularly if bed sharing is avoided, if faulty or inadequately constructed cots are avoided, and if extra attention is paid to the safety of sleep arrangements, particularly ad hoc or temporary ones. In an accompanying editorial Volpe and colleagues discuss "Infant sleep related deaths: why do parents take risks?" The editorial is provocative, discussing these issues in the context of other factors, recent guidance from NICE and the need to inform parents about the risks and benefits in order to help them make the best decision for them and their child. See pages 610 and 603 .
Duration and quality of sleep affect child development and health with early childhood being a time in which sleep consolidates into the night and napping ceases. Many factors influence sleep patterns and childhood sleep patterns have the potential to disrupt family functioning and child well being. Thorpe and colleagues report a systematic review of the evidence regarding the effects of napping on child development and health. 26 articles were included—heterogeneous quality, observational study designs. Most of the findings were inconsistent—cognition, behaviour, health impact—probably because of variability in ages and habitual napping status. The most consistent finding was an association between napping and later onset, shorter duration and poorer quality night sleep with evidence strongest in children greater than 2 years. The authors highlight the absolute need for more data before specific advice is given. Lucy Wiggs discusses the findings and their wider implications in an accompanying editorial. It is interesting to reflect on what is normal—how should a nap be identified (quantity, quality, timing), heterogeneity of the individual, influence of the family and environment, and multiple potential outcome measures of impact and therefore difficulty in studying. Certainly napping in young children is universal and the question posed in the title of the editorial—Daytime napping in preschool aged children; is it to be encouraged—is appropriate. Ensuring children receive sufficient amounts of good quality sleep, according to their individual needs, remains the priority. See pages 615 and 604 .
This is a significant, emotive and difficult issue, particularly when the clinician is faced with a patient who needs a blood transfusion but refuses it for religious or other reasons. In a thought provoking leading article Robert Wheeler explores these issues, using case law to illustrate them and highlighting how the issues differ in children compared with adults, which makes the article very relevant to us as paediatricians. The decision of a competent adult to refuse blood is legally binding on doctors. This is not the case in a child or young person under age 18 years, when the law will no longer defer to a parent's wishes or religious beliefs if such deference will mean that the child is not treated in accordance with his best interests. This clearly needs to be managed carefully, with consideration of alternative options and after social care and legal advice. The issues and some of the practicalities are complex, even more so during adolescence, and the article is relevant to how we manage these difficult situations when blood transfusion or other life saving treatment is needed and, for complex reasons, consent is not forthcoming. See page 606 .
Seeing the forest for the trees.
Once you have your data in front of you, you’ve seen how we can form visual summaries with ggplot2. But how can we calculate numerical summaries? Furthermore, what if we are concerned about summarizing a portion of our data, like just one species of penguin at a time? We will answer these questions below, and introduce some new functions from the dplyr package (within the tidyverse library) along the way. We’ll also look at how factor() can come in handy while plotting.
If you are playing along in RStudio while reading these notes (which we strongly recommend!), be sure to start off by loading the two packages that are necessary for the tutorial by running the following code.
One example of a numerical variable we could examine is the body mass of a particular penguin (measured in grams). Let’s calculate both a measure of center and a measure of spread for this variable.
To get an idea of what summaries we should pick, let’s revisit the density plot from earlier.
What we can glean from this figure is that the distribution of body masses across all species of penguin is skewed right. This means that, for instance, a more typical observation lies closer to 4000 grams than 5000 grams.
If we take an average, it is likely to be pulled to the right by the larger, but less typical, observations. The median observation, however, would be more resistant to this pull. Therefore, the median might be a nice choice for a measure of center. Similarly, since the IQR is initially constructed from the median, it will serve well here as a measure of spread.
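The resistance of the median can be seen in a small worked example. The tutorial itself works in R with dplyr; the sketch below is a Python stand-in using only the standard library. The seven body masses are hypothetical values chosen to be right-skewed, not the penguin data.

```python
from statistics import mean, median, quantiles

# Hypothetical right-skewed body masses in grams (not the real penguin data).
body_mass_g = [3400, 3600, 3800, 4000, 4200, 5600, 6300]

center_mean = mean(body_mass_g)
center_median = median(body_mass_g)

# quantiles(..., n=4) returns the three quartile cut points.
# method="inclusive" interpolates between the observed values.
q1, _, q3 = quantiles(body_mass_g, n=4, method="inclusive")
iqr = q3 - q1

print(round(center_mean))  # 4414, pulled right by the two heavy birds
print(center_median)       # 4000
print(iqr)                 # 1200.0
```

The two largest values move the mean by several hundred grams but leave the median untouched, which is the behaviour the paragraph above describes.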
Now, let’s calculate these values. We should first isolate our variable of interest. We can do this in code by using the dplyr function select().
As is custom with dplyr functions, the first argument goes to the data frame you are working with. The following arguments are more function specific. In select()’s case, we tell the computer which column/variable we are interested in.
Now, we can calculate our summaries. When working with a vector, we could use functions like mean() and median() directly, e.g. median(body_mass_g). However, body_mass_g is not a standalone vector but is now a column in a data frame called body_mass! Therefore, we need to access it through a dplyr function called summarise().
Note that while the first argument goes to the name of the data frame, the following arguments are given to the names of the new columns that summarise() puts in another new data frame (one row by two columns). You can name the columns whatever you would like.
Based on what we’ve found, the median here supports the claim we made above: that a typical penguin has a body mass closer to 4000 grams than to 5000 grams. The middle 50 percent of the penguins have body masses within 1225/2 grams, or roughly 600 grams, of 4050.
Let’s return to examining the bill length of a particular penguin, measured in millimeters. Here is the density plot for all of the data; for simplicity, earlier we showed you the plot for only the first 16 observations.
This plot is interesting. It appears we have a bimodal shape! While it’s tempting to state that the data is roughly symmetric and calculate an overall mean, we should first see if there are any other variables at play. It stands to reason that different species of penguin might have different anatomical features. Let’s add species to the mix by using the color aesthetic (see if you can code along)!
Aha! We now see that each penguin species has its own shape of distribution when it comes to bill length.
The example above demonstrates a very common scenario: you want to perform some calculations on one particular group of observations in your data set. But what if you want to do that same calculation for every group? For example, what if we’d like to find the average and standard deviation of bill length among each species of penguin separately?
This task - performing an operation on all groups of a data set one-by-one - is such a common data science task that nearly every software tool has a good solution. In the dplyr package, the solution is the group_by() function. Let’s see it in action.
Like most tidyverse functions, the first argument to group_by() is a data frame. The second argument is the name of the variable that you want to use to delineate groups. In this case, we want to group by species to calculate three separate mean/standard deviation pairs.
Now, assuming we roll with our new grouped_penguins data frame, we can use summarise() like we did before!
From both the visuals and the numbers, we can see that Adelie penguins have much smaller bill lengths on average when compared to Chinstrap and Gentoo penguins. We also see that the Adelie distribution of bill lengths is less variable than the distributions of the other two species.
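The split-then-summarise idea behind group_by() and summarise() is not specific to R. As a rough illustration, here is a Python stand-in using only the standard library. The bill lengths are hypothetical values made up for the sketch, and `summarise_by_group` is a hypothetical helper name, not a dplyr or pandas function.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (species, bill length in mm) rows, not the real penguin data.
rows = [
    ("Adelie", 38.0), ("Adelie", 39.0), ("Adelie", 40.0),
    ("Chinstrap", 46.0), ("Chinstrap", 49.0), ("Chinstrap", 52.0),
    ("Gentoo", 46.0), ("Gentoo", 47.5), ("Gentoo", 49.0),
]


def summarise_by_group(rows):
    """Return {group: (mean, standard deviation)} for (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)  # split the rows by group label
    # Apply the same two summaries to every group.
    return {g: (mean(v), stdev(v)) for g, v in groups.items()}


for species, (avg, sd) in summarise_by_group(rows).items():
    print(species, avg, sd)
```

Each group yields one row of output with one column per summary, which mirrors the shape of the data frame that summarise() returns for grouped data.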
Finally, let’s return to the violin plot of bill lengths grouped by species of penguin.
What if I wanted the Adelie violin to show up on the top of the graph? By default, the violin plot puts the level first in the alphabetical order on the bottom of the plot. Therefore, I need to reorder the levels of species to put Adelie at the top. This is where factor() will do the job!
As before, species is not a standalone vector but a column in a data frame! We cannot access it directly, e.g. by factor(species, levels = c("Gentoo", "Chinstrap", "Adelie")).
Therefore, we use the dplyr function mutate(). A mutation involves changing the properties of an existing column, or adding a new one altogether (which we will explore next week).
The first argument of mutate() is dedicated to our data frame, penguins. The second argument can be the name of an existing column or the name of a new column (next week). We want to change species to be an altered version of itself, hence we name the second argument species. Make sure you understand where each set of parentheses opens and closes.
Now, assuming we roll with our new reordered_penguins data frame, we can use ggplot() like we did before!
A summary of summaries…this better be brief! Summaries of numerical data - graphical and numerical - often involve choices of what information to include and what information to omit. These choices involve a degree of judgement and knowledge of the criteria that were used to construct the commonly used statistics and graphics.
Use the same rule as for the corresponding effect size (be it mean, percentage, mean difference, regression coefficient, correlation coefficient or risk ratio), perhaps with one less significant digit. Test statistics: t, F, χ², etc. Up to one decimal place and up to two significant digits. t=−1.3. F=11.
One or two decimal places, or more when very close to ±1. 0.03. 0.7. −0.89. Risk ratio. Round to two significant digits if the leading non-zero digit is four or more, otherwise round to three (the rule of four; reference 11). Alternatively use one/two significant digits rather than two/three.
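The rule of four quoted above is mechanical enough to sketch in a few lines of Python. The function names `leading_digit` and `rule_of_four` are hypothetical. The sketch reads the leading digit from Python's scientific-notation formatting, so a value that sits a hair below a digit boundary (for example 3.9999999) may be classified by its rounded form.

```python
def leading_digit(value: float) -> int:
    """Leading non-zero digit, read from scientific notation such as '2.268000e+01'."""
    return int(f"{abs(value):e}"[0])


def rule_of_four(value: float) -> float:
    """Two significant digits if the leading digit is 4 or more, otherwise three."""
    sig = 2 if leading_digit(value) >= 4 else 3
    # Format to `sig` significant digits in scientific notation, then parse back.
    return float(f"{value:.{sig - 1}e}")


print(rule_of_four(22.68))   # 22.7  (leading digit 2, three digits kept)
print(rule_of_four(7.51))    # 7.5   (leading digit 7, two digits kept)
print(rule_of_four(0.4567))  # 0.46
print(rule_of_four(1.234))   # 1.23
```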
Too many digits: the presentation of numerical data. Arch Dis Child. 2015 Jul;100 (7):608-9. doi: 10.1136/archdischild-2014-307149. Epub 2015 Apr 15.
As a statistical reviewer for Archives and BMJ I am interested in the presentation of numerical data. 1 It concerns me that numbers are often reported to excessive precision, because too many digits can swamp the reader, overcomplicate the story and obscure the message.
Emperor Joseph II: Well, there it is. Quotation from the film Amadeus (1984)
Cole TJ. Too many digits: the presentation of numerical data. Arch Dis Child. 2015;100(7):608-609. Language: English. PubMed ID: 25877157.
A number's precision relates to its decimal places or significant figures ...
1) The presentation of numerical data in publications is often overly precise, reporting numbers with too many decimal places or significant digits, which can obscure the key message. 2) There is no single rule for rounding that works in all cases - guidelines variously specify rounding to a certain number of ...
For descriptive statistics of numerical data, add one additional decimal place to the original data. For example, if cholesterol level is reported with one decimal place (e.g. 4.8 mmol/L), the mean and SD should be reported with two decimal places (e.g. mean = 4.82, SD = 2.11 mmol/L).
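A minimal Python sketch of this "one extra decimal place" guideline follows. The raw readings are hypothetical, and `decimal_places` and `describe` are hypothetical helper names. The readings are kept as strings so that the precision they were recorded to is not lost when they become floats.

```python
from decimal import Decimal
from statistics import mean, stdev


def decimal_places(text: str) -> int:
    """Number of decimal places in a recorded value such as '4.8'."""
    # Decimal('4.8') has exponent -1, meaning one decimal place.
    return max(0, -Decimal(text).as_tuple().exponent)


def describe(raw: list[str]) -> tuple[str, str]:
    """Mean and SD formatted with one more decimal place than the raw data."""
    places = max(decimal_places(v) for v in raw) + 1
    values = [float(v) for v in raw]
    return f"{mean(values):.{places}f}", f"{stdev(values):.{places}f}"


# Hypothetical cholesterol readings in mmol/L, recorded to one decimal place.
print(describe(["4.8", "5.1", "3.9", "6.2"]))  # ('5.00', '0.95')
```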
Precision and rounding: decimal places and significant digits. Reporting of numerical data is an important element in medical research. Summary statistics are often reported to too many decimal places, leading to spurious precision and over-complicated presentation 1; less often, too few decimal places are used, resulting in a lack of precision. Surprisingly, few guidelines on the subject
For example, the regression equation 1 birth weight = −3.0983527 + 0.142088 × chest circumf + 0.158039 × midarm circumf purports to predict birth weight to 1/1000000 g. Categorical data, such as disease group or presence or absence of symptoms, can be summarised as frequencies and percentages. It can be confusing to give percentages alone, as the ...
The reporting of numerical data should be informed by statistical principles (the science of statistics). One area where this can be counterintuitive is the level of numerical precision to report data (both of individual values and summaries like means and standard deviations). A review of three recent BJS articles identified over 1000 ...
Altman DG, Bland JM. Presentation of numerical data. BMJ. 1996 Mar 2;312(7030):572. doi: 10.1136/bmj.312.7030.572. PMID: 8595293.
When presenting data using a percentage, is it a good thing to have decimal places, say 2 decimal places instead of rounding off to whole numbers?
Clearly, the second statistic is too precise to be realistic. Altman and Bland (1996) make. Intrinsic measures of precision: Rounding descriptive statistics from the precision of raw measurements. The following requires the notion of significant digits. In the world of mathematics, numbers are composed of digits, each one having a definite value.
This review highlights the clinical presentation, complications, evaluation, and numerical significance, when applicable, for the following skin findings: infantile hemangiomas, capillary malformations, café-au-lait macules, hypopigmented macules, juvenile xanthogranulomas, pilomatricomas, and angiofibromas.