


Open Access

Peer-reviewed

Research Article

Subjective data, objective data and the role of bias in predictive modelling: Lessons from a dispositional learning analytics application

Dirk Tempelaar, Bart Rienties, Quan Nguyen

Dirk Tempelaar. Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing. Affiliation: School of Business and Economics, Maastricht University, Maastricht, The Netherlands. * E-mail: [email protected]

Contributed equally to this work with: Bart Rienties, Quan Nguyen

Bart Rienties. Roles: Conceptualization, Investigation, Methodology, Writing – original draft, Writing – review & editing. Affiliation: Institute of Educational Technology, Open University UK, Milton Keynes, United Kingdom.

Quan Nguyen. Roles: Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing. Affiliation: School of Information, University of Michigan, Ann Arbor, MI, United States of America.

  • Published: June 12, 2020
  • https://doi.org/10.1371/journal.pone.0233977

For decades, self-report measures based on questionnaires have been widely used in educational research to study implicit and complex constructs such as motivation, emotion, and cognitive and metacognitive learning strategies. However, the existence of potential biases in such self-report instruments might cast doubt on the validity of the measured constructs. The emergence of trace data from digital learning environments has sparked a controversial debate on how we measure learning. On the one hand, trace data might be perceived as “objective” measures that are independent of any biases. On the other hand, there is mixed evidence on how well trace data are compatible with existing learning constructs, which have traditionally been measured with self-reports. This study investigates the strengths and weaknesses of different types of data when designing predictive models of academic performance based on computer-generated trace data and survey data. We investigate two types of bias in self-report surveys: response styles (i.e., a tendency to use the rating scale in a certain systematic way that is unrelated to the content of the items) and overconfidence (i.e., the difference between performance predicted from survey responses and performance predicted from a prior knowledge test). We found that response style bias accounts for a modest to substantial amount of variation in the outcomes of the several self-report instruments, as well as in the course performance data. Only the trace data, notably those of process type, stand out in being independent of these response style patterns. The effect of overconfidence bias is limited. Given that empirical models in education typically aim to explain the outcomes of learning processes or the relationships between antecedents of these learning outcomes, our analyses suggest that the bias present in surveys adds predictive power in the explanation of performance data and other questionnaire data.

Citation: Tempelaar D, Rienties B, Nguyen Q (2020) Subjective data, objective data and the role of bias in predictive modelling: Lessons from a dispositional learning analytics application. PLoS ONE 15(6): e0233977. https://doi.org/10.1371/journal.pone.0233977

Editor: Vitomir Kovanovic, University of South Australia, AUSTRALIA

Received: February 16, 2020; Accepted: May 16, 2020; Published: June 12, 2020

Copyright: © 2020 Tempelaar et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data, the Mplus and SPSS codes, and the main components of the output are archived in DANS, the Data Archiving and Networked Services of NWO, the Dutch organisation for scientific research. DANS is an open access resource. The final version of this archive, labelled Tempelaar, D, 2020, "Replication Data for PlosOne 2020 manuscript Tempelaar ea", has received the unique handle: https://hdl.handle.net/10411/YAF7CJ , DataverseNL

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

SBBG or the ‘Snapshot, Bookend, Between-Groups’ paradigm is the unflattering description by Winne and Nesbit [ 1 ] of the current state of affairs of building and estimating educational models.

The data reference in that description is provided by the snapshot 'S', an S that could equally well stand for survey or self-report. In the alternative paradigm of a 'more productive psychology of academic achievement' [1, p. 671] that the authors offer, the use of trace data collected over time to describe learning episodes, supplemented with some snapshot data, represents one of the suggested paradigmatic changes. Other researchers go even further in restricting the role of snapshot-type data in educational research. For example, in line with the tradition in the area of metacognition of terming the two data paradigms off-line and online, Veenman [ 2 , 3 ] limits the description of the properties of off-line data to a list of 'fundamental validity problems', such as the problem of the individual reference point (variability in the perspective chosen by learners), memory problems (failing to correctly retrieve past experiences), and the prompting effect problem (items steering responses in a direction different from what a spontaneous self-report would bring). If these observations are representative of where empirical educational research is heading, research papers based on questionnaire data do not have a bright future.

Reading through this methodological litany, the asymmetry in the treatment of data sources and data types in educational research stands out. Whereas the fundamental validity problems that accompany questionnaire data are typically spelt out in considerable detail, with the issues raised by Veenman [ 2 , 3 ] being no more than the tip of the iceberg, a critical evaluation of the characteristics of online data, or trace data, is often missing. It is as if online data represent, by definition, "true", unbiased data that are both valid and reliable [ 2 – 4 ]. Much to our surprise, in the research area of learning analytics, an opposite development can be observed. Several learning analytics researchers explicitly recognize the limitations of models designed on online data only [ 5 , 6 ]. Research that seeks to integrate different types of data is emerging, as is visible in the new area of dispositional learning analytics, which seeks to supplement trace data, the classical subject of learning analytics research, with questionnaire-measured disposition data, and in educational research using multi-modal data [ 5 , 6 ].

The aim of this study is to showcase the benefits of critically assessing the characteristics of trace and questionnaire data. This showcase is developed in the context of a dispositional learning analytics application that combines a wide variety of data and data types: trace data from technology-enhanced learning systems, computer log data of static nature, questionnaire data, and course performance data. In the survey research literature, it is widely acknowledged that although questionnaires and psychometric instruments measuring constructs like anxiety, motivation, or self-regulation have strong internal and external validity, many respondents have a typical response style [ 7 , 8 ]. For example, some learners are more inclined to an acquiescent style of response (i.e., the tendency toward yea-saying), while others tend to extreme responses (i.e., using the extremes of the Likert response scale). Similarly, in terms of confidence biases, some learners might underestimate their abilities, skills, and knowledge, while others might overestimate them [ 9 ]. One view is to consider these response styles and confidence biases as unwelcome; another is that these "biases" could potentially be used as interesting proxies of underlying features of a respondent. Therefore, as an instrument to characterise this rich set of data, we develop two alternative approaches: one building on the framework of response styles, and one based on the difference between subjective and objective notions of confidence in one's learning.

Both response styles and confidence differences serve as potential sources of bias in the data. One would expect this to apply to self-report questionnaire data only, but we will also investigate other data types (e.g., trace data, performance data) for the presence of these anomalies. Based on that analysis, we intend to answer two generalised research questions:

  • In a data-rich context consisting of data of questionnaire type, trace data of both process and product type and performance data, how can we decompose each data element into a component that represents the contribution of biases, such as response style bias or overconfidence, and a component independent of biases?
  • If our modelling endeavour aims to design models that help predict course performance or explain the relationship between student’s characteristics that act as antecedents of performance, what lessons can be learned from these decompositions into bias and non-bias components?

First, we will introduce the reader to the three building blocks of our study: dispositional learning analytics, response styles, and subjective and objective confidence measures. Second, we will investigate the presence of response style and confidence difference components in subjective questionnaire data, objective trace data, and learning outcome data, and discuss their implications.

Three building blocks: Dispositional learning analytics, response styles and confidence measures

Dispositional learning analytics.

Dispositional learning analytics proposes a learning analytics infrastructure [ 10 – 12 ] that combines learning data, generated in learning activities through the traces of technology-enhanced learning systems, with learner data, such as student dispositions, values, and attitudes measured through self-report questionnaires [ 5 ]. The unique feature of dispositional learning analytics is this combination of learning data with learner data: digital footprints of learning activities, as in all learning analytics applications, together with self-reported learner data. In [ 5 , 13 ], the source of learner data is a dedicated questionnaire instrument specifically developed to identify learning power: a mix of dispositions, experiences, social relations, values and attitudes that influence engagement with learning. In our own dispositional learning analytics research [ 14 – 18 ], we sought to operationalise dispositions with the help of instruments developed in the context of contemporary social-cognitive educational research, so as to make the connection with educational theory as strong as possible. Another motivation for selecting these instruments is that they are closely related to educational interventions. These instruments include:

  • The expectancy-value framework of learning behaviour [ 19 ], encompassing affective, behavioural, and cognitive facets;
  • The motivation and engagement framework of learning cognitions and behaviours [ 20 ] that distinguishes learning cognitions and learning behaviours of adaptive and maladaptive types;
  • Aspects of a student approach to learning (SAL) framework: cognitive processing strategies and metacognitive regulation strategies, from Vermunt’s [ 21 ] learning styles instrument, encompassing cognitions and behaviours (see also [ 22 ]);
  • The control-value theory of achievement emotions, both about learning emotions of activity and epistemic types, at the affective pole of the spectrum [ 23 – 25 ];
  • Goal setting behaviour in the approach and avoidance dimensions [ 26 ];
  • Academic motivations that distinguish intrinsically versus extrinsically motivated learning [ 27 ].

The type of dispositional learning analytics models we have developed within the above theoretical frameworks fits the current trend in educational research to apply multi-modal data analysis by combining data from a range of different sources. In our research, we invariably find that predictive modelling focusing on learning outcomes or dropout identifies formative assessment data as its dominant predictor. However, these formative assessment data are often less timely than one would wish, for example when planning educational interventions early in the course. The best timely prediction models we were able to design are typically dominated by trace data of product type (e.g. tool mastery scores) combined with questionnaire data, with secondary roles for trace data of process type (e.g. the number of attempts to solve math exercise 21, the number of assignments completed in week 4), due to their unstable nature [ 15 – 18 ].

Response styles

Response styles refer to typical patterns in responses to Likert-scale questionnaire items [ 7 , 8 , 28 , 29 ]. Although intensively investigated in marketing, cultural, and health studies, response styles have gone largely unnoticed in empirical educational research. Response styles are induced by the tendency of respondents to respond in a similar way to items, independent of the content of the item, such as yea-saying or giving extreme responses. In the literature, nine common types of response styles are distinguished:

  • Acquiescence Response Style, ARS: the tendency to respond positively
  • Dis-Acquiescence Response Style, DARS: the tendency to respond negatively
  • Net-Acquiescence Response Style, NARS: the difference ARS minus DARS
  • MidPoint Response Style, MRS: the tendency to respond neutrally
  • Non-Contingent Response, NCR: the tendency to respond at random
  • Extreme Response Style, ERS: the tendency to respond extremely
  • Positive and negative Extreme Response Styles, ERSpos and ERSneg: the tendency to respond extremely positively or extremely negatively
  • Response Range, RR: the difference between the maximum and minimum response
  • Mild Response Style, MLRS: the tendency to provide a mild response.

Longitudinal research into the stability of response styles concludes that response styles function as relatively stable, individual characteristics that can be included as control variables in the analysis of questionnaire data [ 29 ]. The largest effects were found in studies of the ERS style [ 30 ], but explained variation never exceeded 10%. Other empirical studies, such as [ 28 , 31 ], focused on the ERS only. By definition, response styles constitute a highly collinear set of observations: for example, mild responses are the complement of extreme responses. Therefore, any analysis of response styles has to be based on a selection from the above styles.

In the fifties and sixties, response style research focused on a second antecedent of response styles beyond personality: the domain of the questionnaire. The leading research question in those investigations was whether response style findings generalise across different instruments. Findings indicate that this generalisation is partial: response styles contain both a generic component and an instrument-specific component [ 30 , 32 ]. Empirical research is, however, in most cases limited to comparing response styles across two or three instruments; it is only in applications of dispositional learning analytics, as in the current study, that commonalities in response styles can be investigated over a broad range of instruments.

Confidence measures

As a second source of response bias, we sought an indicator of under- or overconfidence, or the difference between a subjective confidence measure and an objective one. Different operationalizations of this can be found, such as judgements of learning, feeling of learning, or ease of learning judgements [ 9 ]. Our operationalization of subjective confidence is best interpreted as a prospective, ease-of-learning indicator. It is based on an expectancy-value framework oriented survey [ 33 ] administered at the start of a course that generates several expectancy scores (such as perceived cognitive competence, or the expectation not to encounter difficulties in learning) and personal value scores. Subjective confidence is then defined as the predicted value of a learning outcome based on survey responses (e.g. the regression of exam performance for mathematics on the scores of the several expectancy and value constructs). A similar procedure is applied to define objective confidence, whereby we take the predicted value of the regression of the exam score on two objective predictors available at the start of the course: the level of prior education and the score on a diagnostic entry test. The difference between these two regression-based predictions is taken as the difference between subjective and objective confidence, or a measure of overconfidence. The variables used in the calculations of response styles and the confidence difference are described in the next section.
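A minimal sketch of this two-regression construction, on synthetic data and with hypothetical column names (perceived_competence, expected_ease, task_value, math_major, math_entry_test and math_exam are illustrative stand-ins, not the study's actual variable names):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for the real course data (hypothetical column names).
df = pd.DataFrame({
    "perceived_competence": rng.normal(5.0, 1.0, n),   # expectancy-type survey constructs
    "expected_ease":        rng.normal(4.5, 1.0, n),
    "task_value":           rng.normal(5.5, 1.0, n),   # value-type survey construct
    "math_major":           rng.integers(0, 2, n),     # advanced math track in prior education
    "math_entry_test":      rng.normal(60, 15, n),     # diagnostic entry test score
})
df["math_exam"] = (0.4 * df["perceived_competence"] + 0.03 * df["math_entry_test"]
                   + 1.5 * df["math_major"] + rng.normal(0, 1, n))

def fitted(data, predictors, outcome):
    """OLS fitted values of `outcome` regressed on an intercept plus `predictors`."""
    X = np.column_stack([np.ones(len(data))] + [data[p].to_numpy(float) for p in predictors])
    beta, *_ = np.linalg.lstsq(X, data[outcome].to_numpy(float), rcond=None)
    return X @ beta

# Subjective confidence: exam performance predicted from expectancy-value survey scores.
subjective = fitted(df, ["perceived_competence", "expected_ease", "task_value"], "math_exam")
# Objective confidence: exam performance predicted from prior education and entry test.
objective = fitted(df, ["math_major", "math_entry_test"], "math_exam")

# Overconfidence measure: positive values suggest subjective confidence exceeds objective confidence.
delta_confidence = subjective - objective
```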

Research methods

Ethics approval was obtained from the Ethical Review Committee Inner City Faculties of Maastricht University (ERCIC_044_14_07). All participants provided written consent.

Context of the empirical study

This study took place in a large-scale introductory mathematics and statistics course for first-year undergraduate students in a business and economics program in the Netherlands. The educational system is best described as 'blended' or 'hybrid' [ 34 ]. The main component is face-to-face: Problem-Based Learning (PBL), in small groups (14 students), coached by a content expert tutor (see [ 35 ] for further information on PBL and the course design). Participation in tutorial groups is required. The online component of the blend is optional: the use of the two e-tutorials, SOWISO and MyStatLab (MSL) [ 18 ]. This design is based on the philosophy of student-centred education, placing the responsibility for making educational choices primarily on the student. Since most of the learning takes place during self-study outside class, through the e-tutorials or other learning materials, class time is used to discuss the solving of advanced problems. Thus, the instructional format is best characterized as a flipped-classroom design [ 35 ].

The student-centred nature of the instructional design requires, first and foremost, adequate actionable feedback to students so that they can appropriately monitor their study progress and topic mastery. The provision of relevant feedback starts on the first day of the course, when students take two diagnostic entry tests for mathematics and statistics, the mathematics test being based on a validated, nation-wide instrument. Feedback from these entry tests provides a first signal of the importance of using the e-tutorials. Next, the e-tutorials take over the monitoring function: at any time, students can see their performance in the practice sessions, their progress in preparing for the next quiz, and detailed feedback on their completed quizzes, all in both an absolute and a relative (to their peers) sense. Students receive feedback about their learning dispositions through a dataset containing their personal scores on the several instruments, together with aggregate scores. These datasets are the basis of the individual student projects that students carry out in the second-to-last week of the course, in which they statistically analyse and interpret their personal data and compare it with class means. Profiting from the intensive contact between students and the tutors of the PBL tutorial groups, learning feedback is directed at both students and their tutors, who carry primary responsibility for pedagogical interventions.

The subject of this study is the full 2018/2019 cohort of students, i.e. all students who enrolled in the course and completed the learning disposition instruments: in total, 1080 students (this includes all first-year students, since the student project is a required assignment, but excludes repeat students, who did the project the previous year). The student population was highly diverse: only 21.6% were educated in the Dutch high school system. The largest group, 32.6% of the students, followed secondary education in Germany, followed by 20.8% of students with a Belgian education. In total, 57 nationalities were present. A large share of students was of European nationality, with only 4.8% of students from outside Europe. High school systems in Europe differ strongly, most particularly in the teaching of mathematics and statistics. For example, the Dutch high school system has a strong focus on the topic of statistics, whereas statistics is completely missing from the high school programs of many other European countries. In addition, all countries distinguish different tracks of mathematics education at the secondary level; 31.5% of our students were educated at the highest, advanced level preparing for the sciences, and 68.5% at the intermediate level preparing for the social sciences. It is therefore crucial that this introductory module is flexible and allows for individual learning paths, which is the reason for opting for a blended design that provides students with ample learning feedback generated by the application of dispositional learning analytics [ 18 , 35 ].

Instruments and procedure

In this study, we combine data of different types: course performance measures, Learning Management System (LMS) and e-tutorial trace variables, Student Information System (SIS) based variables, and learning disposition variables measured by self-report questionnaires. Following Winne's taxonomy of data sources [ 4 , 36 , 37 ], our study applies self-report questionnaire data and trace data obtained by logging study behaviours and the specific choices students make in the e-tutorials.

The self-report questionnaires applied in this study are described in S1 Appendix : achievement emotions (A. 1), epistemic emotions (A. 2), achievement goals (A. 3), motivation and engagement (A. 4), attitudes towards learning (A. 5), approaches to learning (A. 6) and academic motivations (A. 7). These questionnaires are all long-existing instruments, well described and validated in decades of empirical research in educational psychology. Most were administered in the first two weeks of the course, on different days, each administration taking between five and ten minutes. The first exception is the instrument quantifying emotions experienced while participating in learning activities (described in section A. 1), which was administered halfway through the course. This was done to allow students sufficient experience with the learning activities, while avoiding the danger that an approaching exam might strongly impact learning emotions. A second exception is that the motivation and engagement instrument (described in section A. 4) was administered twice: at the start and at the end of the course (T2). Since data from the self-report questionnaires are used by the students in individual statistical projects that analyse personal learning data, the responses cover all students (except for about 15 students who dropped out). To ease the administration of the questionnaires, all instruments applied the same response format of a seven-point Likert scale. Students provided consent for their personal data to be used outside the project for learning analytics-based feedback and educational research.

Course performance measures.

The final course performance measure, Grade, is a weighted average of the final exam score (87%) and the quiz scores (13%). Performance in the exam has two components with equal weight: the exam score for mathematics (MathExam) and the exam score for statistics (StatsExam). The same decomposition applies to the aggregated quiz performance for the two topics: MathQuiz and StatsQuiz.
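As a worked illustration of this weighting, assuming all component scores are expressed on the same 0-100 scale and that the quiz component is split equally over the two topics, mirroring the exam split described above (both are assumptions for illustration):

```python
def course_grade(math_exam, stats_exam, math_quiz, stats_quiz):
    """Weighted course grade: 87% final exam, 13% quizzes, each split equally over the two topics."""
    exam = 0.5 * math_exam + 0.5 * stats_exam
    quiz = 0.5 * math_quiz + 0.5 * stats_quiz
    return 0.87 * exam + 0.13 * quiz

print(course_grade(70, 80, 90, 85))  # 0.87 * 75 + 0.13 * 87.5 = 76.625
```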

Trace data from technology-enhanced learning systems.

Three digital systems were used to organise the learning of students and to facilitate the creation of individual learning paths: the LMS Blackboard and the two e-tutorials, SOWISO for mathematics and MSL for statistics. From the Blackboard trace variables, all of the process type, we chose, based on our previous research, BBClicks: the total number of clicks in Blackboard. From the thousands of trace variables available from the two e-tutorial systems, we selected one product-type variable and a few process-type variables, all at an aggregate level. The product variable represents mastery achieved in the e-tutorials, as the proportion of exercises correctly solved: MathMastery and StatsMastery. The main process-type variables are the number of attempts to solve an exercise, totalled over all exercises (MathAttempts and StatsAttempts), and total time on task (MathTime and StatsTime). In addition, the SOWISO system archives the feedback strategies students apply in solving any exercise, resulting in the additional process variables MathHints, the total number of hints asked for, and MathSolutions, the number of worked-out examples asked for.
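As an illustration of how such aggregate trace variables might be derived, the sketch below assumes a hypothetical SOWISO-style log with one row per attempt and columns for correctness, time and hint use; the log schema and the exact mastery definition (proportion of distinct exercises solved at least once) are assumptions, not the systems' actual export format:

```python
import pandas as pd

# Hypothetical raw e-tutorial log: one row per exercise attempt (illustrative schema only).
log = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2],
    "exercise":   ["m01", "m01", "m02", "m01", "m02"],
    "correct":    [0, 1, 1, 1, 0],
    "seconds":    [120, 95, 200, 60, 300],
    "hint_used":  [1, 0, 0, 0, 1],
})

per_student = log.groupby("student_id").agg(
    MathAttempts=("exercise", "size"),   # process: total number of attempts
    MathTime=("seconds", "sum"),         # process: total time on task
    MathHints=("hint_used", "sum"),      # process: number of hints requested
)

# Product variable: proportion of distinct exercises solved correctly at least once.
solved = log[log["correct"] == 1].groupby("student_id")["exercise"].nunique()
per_student["MathMastery"] = (solved / log["exercise"].nunique()).reindex(per_student.index, fill_value=0)
print(per_student)
```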

SIS system data and entry tests.

Our university SIS provided several further variables mainly used for control purposes. Standard demographic variables are Gender (with an indicator variable for female students), International (with an indicator for non-Dutch high school education), and MathMajor (with an indicator for the advanced mathematics track in high school). The MathMajor indicator is constructed based on distinguishing prior education preparing for either sciences or social sciences. Finally, students were required upon entering the course to complete two diagnostic entry tests, one for mathematics (MathEntry), and one for statistics (StatsEntry).

Data analysis

The data analysis of this study consisted of a sequence of steps. In several of these steps, different options were available for how to proceed in the analysis. We will briefly explain the choices we made, without suggesting that other choices could not work as well. In fact, this study would lend itself to an application of 'multiverse analysis' [ 38 ]: performing the analyses across a set of alternative data sets and alternative statistical methods to find out how robust the empirical outcomes are.

Our dispositional learning analytics-based dataset consisted of several types of data: self-report questionnaire data as the dispositions, trace data from learning enhancing systems, demographic data from SIS type of systems, and course performance data.

All questionnaires were administered with items of the 1…7 Likert type, to simplify responding for students. Since the different instruments applied different labels for the several Likert options, we used the three anchors as labels: the negative pole, the neutral anchor and the positive pole. The 7-point Likert scale is a relatively long scale, whereas most of the response style literature is based on 4-point or 5-point Likert scales. The use of this 7-point scale, as well as the large size of our sample, complies with the outcomes of a recent simulation study [ 39 ] that signals a loss of Type 1 error control for scales shorter than 7 points and samples smaller than 100.

Researchers investigating very long scales (9-, 10- or 11-point scales) have applied alternative operationalisations of extreme responses, including the two extreme response categories at each end of the continuum [ 32 ]. Our scale being in between those short and long scales, we opted to analyse both operationalisations: extreme responses defined as the proportion of responses in the single most extreme category, and as the proportion of responses in the two most extreme categories. That is, we defined extreme negative response as the proportion of responses equal to 1 (ERSneg1) or equal to 1 or 2 (ERSneg2), and we defined extreme positive response as the proportion of responses equal to 6 or 7 (ERSpos2) or equal to 7 only (ERSpos1). Analyses were performed for both operationalisations, but reporting is restricted to the case of defining extremity by two categories. An important reason to do so lies in the distributional properties of the data: whereas measures of extreme responses based on the single most extreme category are strongly right-skewed, measures based on categories 1 and 2, or 6 and 7, together are only moderately skewed. Thus, to prevent the need for data transformations that would make the interpretation of the outcomes of the regression models less straightforward, we opted for the current operationalisation. Next, preliminary analysis suggested that the effects of extreme responses depend on their direction: positive or negative. So, whereas most empirical studies of response styles aggregate positive and negative extreme responses into one category [ 29 , 31 ], we chose to differentiate the two directions. As the Results section will indicate, in most of the models we found that positive and negative extreme responses had opposite effects, suggesting that aggregation into a total extreme response measure is dubious.

We used response styles as one approach to operationalising bias. A set of 13 response styles was calculated for all eight questionnaire administrations: ARS, ARSW, DARS, DARSW, MRS, NARS, NARSW, RR, NCR, ERSneg1, ERSpos1, ERSneg2, and ERSpos2, where the last four styles are those described above: negative and positive extreme responses, taking one or two response categories into account. By definition, this set of response styles was strongly collinear, making a selection necessary. We followed other empirical studies in this area [ 31 ] by focusing on ERS only as a descriptor of response styles, since this style was found to be relatively stable across repeated measurements and in this way acts as a personality characteristic [ 29 ]. It is also the response style with the strongest impact on measures of central tendency of questionnaire scales, and thus the strongest bias.
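A minimal sketch of how such per-respondent response style measures can be computed from a matrix of 7-point Likert responses; the category cut-offs assumed here for ARS and DARS (5 to 7 and 1 to 3) are illustrative, and the weighted variants (ARSW, DARSW, NARSW) and NCR are omitted:

```python
import numpy as np
import pandas as pd

def response_styles(items: pd.DataFrame) -> pd.DataFrame:
    """Per-respondent response style proportions from 7-point Likert items (values 1..7).
    `items`: one row per respondent, one column per item."""
    x = items.to_numpy(float)
    styles = pd.DataFrame(index=items.index)
    styles["ARS"]     = (x >= 5).mean(axis=1)          # agreement (assumed cut-off: 5, 6 or 7)
    styles["DARS"]    = (x <= 3).mean(axis=1)          # disagreement (assumed cut-off: 1, 2 or 3)
    styles["NARS"]    = styles["ARS"] - styles["DARS"] # net acquiescence
    styles["MRS"]     = (x == 4).mean(axis=1)          # midpoint responses
    styles["ERSneg1"] = (x == 1).mean(axis=1)          # single most extreme negative category
    styles["ERSpos1"] = (x == 7).mean(axis=1)          # single most extreme positive category
    styles["ERSneg2"] = (x <= 2).mean(axis=1)          # two most extreme negative categories
    styles["ERSpos2"] = (x >= 6).mean(axis=1)          # two most extreme positive categories
    styles["RR"]      = x.max(axis=1) - x.min(axis=1)  # response range
    return styles

# Example: five respondents answering ten items.
rng = np.random.default_rng(1)
answers = pd.DataFrame(rng.integers(1, 8, size=(5, 10)))
print(response_styles(answers).round(2))
```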

After computing the response style measures for each of the eight questionnaire administrations, we investigated their stability over the different questionnaire instruments and calculated aggregated measures of response styles. In that aggregation, we excluded the second, end-of-course administration of the motivation and engagement instrument, so that the aggregated measures represent averages of response styles over seven different instruments. An advantage of keeping one instrument apart was that it allowed us to investigate both stability and endogeneity (the external validation of our extreme response measures, by investigating their role in explaining responses to an instrument not included in the calculation of those measures). Concerning endogeneity: when analysing the role of an aggregate response style measure in the outcomes of one survey, does it matter much whether that specific survey was included in or excluded from the calculation of the aggregated measure?

The seven instruments used to generate the aggregated response styles comprised 77 scales in total. A majority of these scales, 46, were of adaptive or positive valence (examples are enjoyment of learning, study management, valuing university, and intrinsic as well as extrinsic motivation). A minority of scales, 13, were of maladaptive (hampering learning activity) or negative (unpleasant) valence (such as amotivation, boredom, and disengagement). The balance between positive or adaptive and negative or maladaptive items differed from instrument to instrument, thereby impacting the response style measures, as described in the literature [ 40 ].


In exactly the same manner as for the response style decomposition, in which LAX was regressed on the extreme response measures to obtain a bias component LAXRS and a residual LAXRScor, we constructed the ΔConfidence(LAX) score as the beta weight of the regression of LAX on ΔConfidence (see Table B1 in S2 Appendix ; since this is a univariate regression, that beta weight equals the correlation) and we decomposed the variable LAX into a predicted and a residual part using the variable ΔConfidence as an instrument. That decomposition is indicated as LAXConf and LAXConfcor. In these decompositions, LAXRS and LAXConf represent the bias components, and LAXRScor and LAXConfcor the de-biased, bias-corrected components.

This procedure was applied to all variables under study, including the 'objectively' measured variables. That is, self-report constructs, trace variables of the process and product types, and course performance variables were all assigned variable-specific scores for ERSpos, ERSneg and ΔConfidence, and were all decomposed into predicted and residual components, both with response styles and with ΔConfidence as instruments. For the self-report constructs, an alternative operationalisation of extreme responses would have been the extreme response scores of the items belonging to the specific scale. However, that procedure would have limited the analysis to the scale-based self-report variables only and would not allow for constructing an overconfidence component in the data; therefore, we opted for the above approach.
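A minimal sketch of this decomposition on synthetic data: a variable is split into a bias component (the OLS fit on the bias terms) and a bias-corrected residual that is orthogonal to them. The same helper applies with ΔConfidence as the single bias variable; the variable names and effect sizes below are illustrative only:

```python
import numpy as np

def decompose(y, bias_vars):
    """Split `y` into a bias component (OLS fit on the bias variables plus intercept)
    and a bias-corrected residual that is orthogonal to those bias variables."""
    X = np.column_stack([np.ones(len(y))] + list(bias_vars))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    bias_component = X @ beta
    return bias_component, y - bias_component

# Illustrative synthetic data: an anxiety-like score partly driven by extreme response styles.
rng = np.random.default_rng(2)
ers_pos = rng.uniform(0, 0.6, 1000)
ers_neg = rng.uniform(0, 0.6, 1000)
lax = 3 + 1.5 * ers_pos - 2.0 * ers_neg + rng.normal(0, 0.8, 1000)

lax_rs, lax_rs_cor = decompose(lax, [ers_pos, ers_neg])        # response style decomposition
# lax_conf, lax_conf_cor = decompose(lax, [delta_confidence])  # analogous ΔConfidence version
print(round(np.corrcoef(lax_rs_cor, ers_neg)[0, 1], 3))        # ~0: residual is orthogonal to the bias term
```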

The last step in the analysis was to estimate models of educational processes, in three different modes:

  • using only observed, uncorrected versions of the variables, resulting in traditional models based on observed data;
  • using corrected, de-biased versions of the variables only, deriving alternative models that excluded biases resulting from response styles or confidence differences;
  • the combined model, with observed, uncorrected response variables, and as explanatory variables the combination of corrected, de-biased versions of the survey or trace variables, together with the response style variables or the confidence difference variable.

In this third mode, the bias terms are orthogonal to the bias-corrected variables, which allows us to quantify the differential impact of response styles or confidence difference variables on models of educational processes. All estimated models are of the multiple regression type, for which IBM SPSS version 26 was used. Omega reliability measures were calculated in Mplus version 8.4, using code developed by Bandalos [41, p. 396].
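The sketch below contrasts the three modelling modes for a single equation (an observed response such as LAX regressed on an observed predictor such as ASC), using synthetic data in which both variables partly load on the response style terms; the generating coefficients are arbitrary and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
ers_pos = rng.uniform(0, 0.6, n)                      # bias terms
ers_neg = rng.uniform(0, 0.6, n)
asc_true = rng.normal(0, 1, n)                        # bias-free part of the predictor
asc = asc_true + 1.2 * ers_pos - 0.5 * ers_neg        # observed predictor (academic control)
lax = -0.6 * asc_true + 0.8 * ers_pos - 1.8 * ers_neg + rng.normal(0, 1, n)  # observed response

def ols(y, *cols):
    """OLS of y on an intercept plus the given columns; returns (betas, R squared, residuals)."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, 1 - resid.var() / y.var(), resid

def debias(v):
    """Bias-corrected version of v: residual after regressing out the two response style terms."""
    return ols(v, ers_pos, ers_neg)[2]

_, r2_observed, _ = ols(lax, asc)                            # mode 1: observed on observed
_, r2_corrected, _ = ols(debias(lax), debias(asc))           # mode 2: corrected on corrected
_, r2_combined, _ = ols(lax, debias(asc), ers_pos, ers_neg)  # mode 3: observed on corrected + bias terms

print(round(r2_observed, 3), round(r2_corrected, 3), round(r2_combined, 3))
```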

Results

The following subsections follow the sequential steps of the statistical analysis described above. First, we investigate response styles and confidence differences as sources of bias in questionnaire measurements in the first two subsections. The subsequent subsections document comparisons of models estimated with observed scores and models based on corrected scores using the instrumental variables approach. The first three subsections give insight into the outcomes of the decompositions of all variables under study. In the following subsections, we investigate the impact of these decompositions on the design and estimation of educational models. Given that we collected data based on a wide range of theoretical frameworks, a large number of different models can be estimated (and were indeed estimated). Our reporting is based on a somewhat arbitrary selection from all these models. In section four, we estimate the CVTAE model for achievement emotions. Section five investigates epistemic emotions as antecedents of achievement emotions. In the last two sections, we look into models that include other types of data than survey data. In section six, we predict course performance variables from achievement emotions, and in section seven, we predict course performance variables from trace data. In all of these modelling endeavours, the main emphasis is on the role of the decomposition of all variables into bias and bias-corrected components.

Response styles of the different instruments demonstrate some variation in descriptive values, as visible in Table 1 , which can be explained by the balance between adaptive or positive items in an instrument on the one side and negative or maladaptive items on the other (in line with findings of other research [ 33 ]). The AEQ instrument has the lowest ARS and ERSpos scores. At the same time, the AEQ has the highest proportion of negatively valenced items (44 out of 54 items, or 81%) and the highest proportion of maladaptive items (33 out of 54 items, or 61%; the eleven learning anxiety items are negatively valenced but of adaptive type). Likewise, the EES has a majority of negatively valenced items. Students tend to disagree with these negatively valenced or maladaptive items, causing lower ARS and ERSpos scores and higher DARS and ERSneg scores. In contrast, the AGQ contains only positively valenced items, and only adaptive or neutral items (depending on how one classifies items with a performance valence). The AGQ has the highest ARS and ERSpos scores and the lowest DARS and ERSneg scores.

Table 1: https://doi.org/10.1371/journal.pone.0233977.t001

Scale reliabilities were estimated by two different measures: Cronbach's alpha and McDonald's omega. Omega has the advantage over alpha that it does not require the strict assumptions that come with alpha and that are violated in many situations [ 42 ]. Omega measures were calculated in Mplus based on code described in [ 41 ].
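For illustration, the sketch below computes Cronbach's alpha from its standard formula and omega total from one-factor standardised loadings and error variances; in the study itself omega was obtained in Mplus via the code in [ 41 ], so this is only a simplified, assumed equivalent:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def omega_total(loadings: np.ndarray, error_vars: np.ndarray) -> float:
    """McDonald's omega from standardised one-factor loadings and error variances."""
    return loadings.sum() ** 2 / (loadings.sum() ** 2 + error_vars.sum())

# Synthetic one-factor scale: four standardised items with known loadings.
rng = np.random.default_rng(4)
loadings = np.array([0.7, 0.6, 0.8, 0.5])
factor = rng.normal(size=(300, 1))
items = factor @ loadings[None, :] + rng.normal(scale=np.sqrt(1 - loadings**2), size=(300, 4))

print(round(cronbach_alpha(items), 2), round(omega_total(loadings, 1 - loadings**2), 2))
```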

At the same time, there is a reasonable amount of stability in the response style measures, except for the RR and NCR variables, over the instruments: Cronbach's alpha values vary from .64 to .80. Two of the response styles are strongly right-skewed: ERSneg1 and ERSpos1.

There is collinearity amongst the set of response styles, resulting from the overlap in their definitions. For example, ARS correlates .74 with ERSpos2, and DARS correlates .79 with ERSneg2 (see S3 Appendix ). Therefore, we make a selection from the full set of response styles, based on choices made in other research, reliability, and skewness scores. That selection is MLRS as a mild response, ERSneg2 as a negative extreme response, and ERSpos2 as a positive extreme response. Since MLRS is the complement of ERSpos2 and ERSneg2, we report only the latter two variables in the following sections (referred to in short as ERSpos and ERSneg). ERSpos and ERSneg are, albeit quite weakly, positively related, with r = .103 ( p = .001).

Confidence scores


ΔConfidence is defined as the difference between subjective and objective confidence. If objective confidence is regarded as the true level of confidence, it represents a measure of overconfidence. ΔConfidence is very weakly related to ERSpos (r = -.066, p = .03), but moderately positively related to ERSneg (r = .273, p < .001), as is visible from the scatterplot presented in Fig 1 , where each dot represents a student.

Fig 1: https://doi.org/10.1371/journal.pone.0233977.g001

Classification of variables based on response styles or overconfidence

The availability of response style measures allows new ways to categorise our data in educational studies. Rather than using the dichotomy of self-reported data versus objectively scored, we can position each variable of each data type in a two-dimensional plane of response styles: ERSpos and ERSneg. Fig 2 represents such a classification as a scatterplot, where each dot represents a variable in the analysis of either questionnaire, trace or learning outcome type. Variable numbers (see Table B1, S2 Appendix ) are included in the scatter.

Fig 2: https://doi.org/10.1371/journal.pone.0233977.g002

The distance of any point from the origin indicates how strong the role of response styles is in that variable. LAX (2), achievement anxiety, has the largest share of response styles in explained variation: 43%. LAX is characterised by a large negative ERSneg score and a modest positive ERSpos score. Other variables with that same characterisation are LHL (4), achievement hopelessness, Anxiety (9), Confusion (8) and Frustration (10), the three epistemic emotions, and the three maladaptive motivations UC (29), FA (28) and AN (27): uncertain control, failure avoidance, and anxiety. In other words, these are negatively valenced, but mostly activating, emotion, motivation and engagement variables.

The second group of variables, positioned at the middle top of Fig 2 , is characterised by high positive values of ERSneg and small values of ERSpos. These variables are ASC (5), academic control, AB (32), academic buoyancy, and the several course performance variables: Grade (56) and the exam and quiz scores on both topics (57–60).

The largest cluster of variables has positive ERSpos scores and about zero ERSneg scores, as positioned on the right of Fig 2 . These represent the several goal-setting behaviours (13–20), the cognitive and metacognitive scales (39–48), and academic motivation scales (49–54), all positively valenced scales. A smaller cluster is that of the trace variables (61–68), again with zero ERSneg scores, and small positive ERSpos scores positioned just to the right of the origin of the graph. The correlation between ERSpos and ERSneg, this time with variables as the subject, is nearly zero (r = .02).

A similar classification can be made for the confidence difference variable. Because the correlation between ERSneg and the confidence difference, with variables as the subject, is high at .86 (much higher than the same correlation with students as the subject), Fig 3 is designed as the scatter of ΔConfidence against ERSneg (ΔConfidence is again no more than weakly related to ERSpos, with a correlation of -.18). Observations with low confidence difference are the epistemic emotions Anxiety (9), Confusion (8) and Frustration (10), the three maladaptive motivations UC (29), FA (28) and AN (27): uncertain control, failure avoidance, and anxiety, and the two achievement emotions LAX (2) and LHL (4), learning anxiety and hopelessness. These same variables make up the negative pole of ERSneg. At the other pole, of high positive confidence difference values, we find AB (32), academic buoyancy, ASC (5), academic control, and SB (21), self-belief, which also stood out with high positive ERSneg values.

Fig 3: https://doi.org/10.1371/journal.pone.0233977.g003

The control-value theory of achievement emotions model

The CVTAE model is the simplest model with which to illustrate the suggested analytic approach within the current dataset. The model contains one predictor variable, academic control (ASC), and four response variables: learning anxiety (LAX), learning boredom (LBO), learning hopelessness (LHL) and learning enjoyment (LJO). The first step is to decompose all five constructs into a response style component and its residual, or a confidence difference component and its residual. Table 2 provides that decomposition.

Table 2: https://doi.org/10.1371/journal.pone.0233977.t002

From the left panel of Table 2 , we see that response styles account for a substantial amount of variation in the achievement emotions, up to more than 40% for anxiety and hopelessness. That is not the case in the right panel: explained variation by the confidence difference is at a lower level, at most 10%. The left and right panels coincide concerning the ranking of the variables on explained variation: anxiety and hopelessness demonstrate the largest bias components, enjoyment the lowest. Given that ΔConfidence shares variation with ERSneg and ERSneg is the dominant predictor of anxiety and hopelessness, this pattern is not surprising.

The four regression equations representing the CVTAE model estimated on observed values are contained in Table 3 .

Table 3: https://doi.org/10.1371/journal.pone.0233977.t003

The same CVTAE model, now based on bias-corrected values, is provided in Table 4 . Bias correction is based on response styles (left panel) or on the confidence difference (right panel). Bias correction is applied to both the left-hand and right-hand sides of the four regression equations; for example, in the upper left panel, anxiety corrected for response styles is regressed on academic control corrected for response styles.

Table 4: https://doi.org/10.1371/journal.pone.0233977.t004

The effects visible in the two panels are quite different. In the right panel, we see that correcting for overconfidence has only a limited impact: regression betas and explained variation decrease somewhat, but not much. The left panel shows a different picture. Most of the explanatory power is taken out by the response style correction of the anxiety and boredom values, together with that of ASC. In the case of boredom, academic control has even lost all of its explanatory power.

The last step in the analysis combines the corrected version of ASC with the bias terms, either ERSpos and ERSneg or ΔConfidence, as predictors of the four observed learning emotion variables. In Table 5 the outcomes of this last step are detailed.

Table 5: https://doi.org/10.1371/journal.pone.0233977.t005

Comparing Table 5 with Table 3 again signals a crucial difference between the corrections by response styles and by overconfidence. The right panel demonstrates that explained variation does not increase much after adding overconfidence as a predictor. The left panel is in contrast: adding the two response style variables has a substantial impact on explained variation.

The epistemic origins of achievement emotions

The epistemic emotions were measured four weeks before the achievement emotions, which were embedded within the context of the mathematics and statistics learning tasks and administered in the middle of the course; the epistemic emotions were measured within a more general context: learning for the course in general. Both the difference in timing and the difference in context suggest that epistemic emotions act as antecedents of achievement emotions. To investigate whether this antecedent-consequence relationship is invariant under correction for bias, we follow the same steps as in the previous subsection: first, decompose the predictor variables into bias component(s) and a bias-corrected component, and next investigate the relationships with and without bias correction. Table 6 provides the decomposition of the epistemic emotions measured by the EES instrument.

Table 6: https://doi.org/10.1371/journal.pone.0233977.t006

Response styles explain less variation in epistemic emotions than they do in achievement emotions; the effect of overconfidence is less clear. But as with the achievement emotions, the effect of overconfidence in explaining the epistemic emotions is much smaller than the effect of the response styles.

The four regression equations relating the achievement emotions to the epistemic emotions are contained in Table 7 .

Table 7: https://doi.org/10.1371/journal.pone.0233977.t007

Epistemic emotions explain 30% to 50% of the variation in achievement emotions. The pattern visible in the previous section, namely that anxiety and hopelessness are better explained by academic control as well as by response style or overconfidence, repeats itself with the current, very different set of predictor variables. That is remarkable, since hopelessness is the single achievement emotion without a corresponding epistemic emotion, which in each of the other three regression equations absorbs most of the predictive power. For hopelessness, it is epistemic anxiety that takes that role, with secondary roles for curiosity, surprise, confusion, frustration and boredom.

The effect of correcting all emotion variables is displayed in two different tables: Table 8 when the correction is based on response styles, Table 9 for the overconfidence case.

Table 8: https://doi.org/10.1371/journal.pone.0233977.t008

Table 9: https://doi.org/10.1371/journal.pone.0233977.t009

Correcting for response styles has a substantial impact on the relationships between epistemic and achievement emotions: all explained variation values diminish in size, primarily because the role of the main predictor variable is diminished. The story of the overconfidence-corrected regression models is different: due to the limited collinearity of overconfidence with both types of emotion measurements, the prediction equations of boredom and enjoyment do not change when correcting the measurements, whereas the prediction equations of anxiety and hopelessness change slightly.

In the last modelling step, we add the bias term (either ERSpos and ERSneg or ΔConfidence) to the set of predictor variables and run the regressions with the observed versions of the achievement emotions. Tables 10 and 11 provide these regression outcomes.

Table 10: https://doi.org/10.1371/journal.pone.0233977.t010

Table 11: https://doi.org/10.1371/journal.pone.0233977.t011

Although the overconfidence variable is a significant predictor of all four achievement emotions, see Table 10 , the decomposition of epistemic emotions into an overconfidence part and an orthogonal part does not increase predictive power. Explained variation is of the same order of magnitude as in Table 7 . The story is again different for the response-style-corrected measurements. Explained variation in Table 10 is substantially higher than that in Table 7 . In all four regressions, the predictor with the largest beta is one of the response style variables. The general pattern we can distil from these regressions is that more extreme responses tend to increase the level of positive emotions and decrease the level of negative emotions. Both types of extreme responses have effects in similar directions in the case of boredom and enjoyment, whereas, in the case of anxiety and hopelessness, the effect of the negative type of extreme response dominates the effect of the positive type.

From self-report to course performance

Does bias in self-reports also influence objectively measured constructs, such as course performance? We investigate this again using the AEQ questionnaire data, but would have achieved similar outcomes using other questionnaire data as predictors. As with the other analyses, we start by decomposing the course performance variables into a bias component and a component orthogonal to that bias. One would expect the bias component to be zero because these are not self-report data, but although the bias component tends to be smaller than in the self-report cases, it is nowhere zero, as is clear from Table 12 .

Table 12: https://doi.org/10.1371/journal.pone.0233977.t012

The right panel of Table 12 shows that all relationships between the course performance variables and the overconfidence variable are insignificant from a practical point of view, in that explained variation is always less than 1%. Especially for the first two measures of course performance, Grade and MathExam, this is remarkable, since overconfidence is defined as the difference between subjective and objective confidence, and both of these confidence constructs are defined by regressing MathExam on two predictor sets, of self-reports and of objective measures respectively. Due to this inability of the overconfidence construct to explain variation in the course outcome variables, we leave it out of consideration in the remainder of this section and in the next section.

The left panel of Table 12 shows that the story for the response styles is very different. Explained variation is still not impressive for the quiz scores, as intermediate course performance variables, but reaches up to 10% for the final and total scores. In all cases, it is the negative extreme response style that dominates the prediction of course performance scores: students high on the negative extreme response style score on average higher in the exam and the quizzes.

Regression equations explaining observed course performance variables from observed CVTAE variables indicate that explained variation is modest: 16% for the final course grade (see Table 13 ).

Table 13: https://doi.org/10.1371/journal.pone.0233977.t013

The main predictors are academic control and hopelessness. We see marked differences between the two topics of the course, mathematics and statistics. Hopelessness is a strong predictor of mathematics-related performance, but much less so of statistics-related performance. This causes a gap between the explained variation in performance on the two topics: R² measures are highest for the two mathematics-related course performance measures.

Redoing the analysis with response-style-corrected measures yields Table 14 .

Table 14: https://doi.org/10.1371/journal.pone.0233977.t014

The explanatory power of all five of these regressions has decreased considerably: explained variation is less than half of that of the equations based on observed measures. The last step in the analysis concerns the explanation of the observed performance variables by the response-style-corrected learning emotions plus the two bias terms themselves: see Table 15 .

Table 15: https://doi.org/10.1371/journal.pone.0233977.t015

The explained variation visible in Table 15 is back at the level of the equations expressed in observed measures. The role of main predictor has, however, shifted from the academic control variable to the negative extreme response style: part of the explained variation accounted for by academic control in Table 13 has shifted toward the negative extreme response style in Table 15 .

Self-report biases and trace and course performance variables

In this last step of the empirical analysis, we extend the analysis to the trace data and use these trace data to develop regression equations explaining the same five course performance variables as in the previous section. That is, these models are fully based on objective measures, with regard to both the response variables and the predictor variables.

As a preliminary analysis, we decompose the trace variables exactly the way we did in the previous section, using both response styles and overconfidence as instrumental variables. The outcome is in Table 16 .

Table 16: https://doi.org/10.1371/journal.pone.0233977.t016

We included the corrections for overconfidence (right panel) to demonstrate that, as with the course performance variables, overconfidence has no impact on the trace data collected from the learning processes. The largest R² equals 0.4%, so we disregard this type of correction in the remainder of this section. Explained variation by the response styles is modest too, with the largest R² at 4.0%. Of the three types of data incorporated in this study, the learning activity trace data clearly present the weakest relationship with the two bias factors we have composed. Another feature of interest is that the relationships of the trace variables with the two response styles seem to be opposite to those between performance and response styles: we find negative, rather than positive, betas for the negative extreme response style, and positive, rather than absent, betas for the positive extreme response style.

In the explanation of course performance variables by trace variables, we make use of the separate topic scores at hand and the topic-specific trace measures. That is why, in Table 17 , the predictors of the two mathematics course performance measures differ from the predictors of the statistics course performance measures, except for BBClicks. To address the collinearity within the set of trace variables, we had to remove the MathSolutions variable, which is highly collinear with MathAttempts.


https://doi.org/10.1371/journal.pone.0233977.t017

The main predictors in all four equations are the two product-type trace variables that represent the mastery levels achieved by the students in the two e-tutorials. The other trace variables derived from the two e-tutorials, all of the process type, have negative or zero betas, although they are highly positively correlated with performance in bivariate relations. In combination with the mastery variable, the negative betas of NoAttempts, NoHints and Time indicate that students who need more attempts, hints, or time to reach the same level of mastery score lower on average on course performance. Next, we observe that quiz performance is much better explained than exam performance, due to the close connection between the quizzes and the e-tutorials.
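This sign reversal is easy to reproduce in a toy simulation: if performance depends on mastery and on learning efficiency, and inefficient students need more attempts to reach the same mastery, then attempts correlate positively with performance on their own but receive a negative coefficient once mastery is controlled for. The sketch below is purely illustrative; the variable names and coefficients are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 5000
engagement = rng.normal(size=n)   # how much the student works in the e-tutorial
efficiency = rng.normal(size=n)   # how efficiently the student learns

attempts = engagement - 0.3 * efficiency          # inefficient students need more attempts
mastery = 0.8 * engagement + 0.8 * efficiency     # product-type trace: level reached
performance = 0.8 * mastery + 0.5 * efficiency + rng.normal(scale=0.5, size=n)

print("bivariate r(attempts, performance):",
      round(np.corrcoef(attempts, performance)[0, 1], 2))   # clearly positive

X = sm.add_constant(np.column_stack([attempts, mastery]))
betas = sm.OLS(performance, X).fit().params
print("beta attempts | mastery:", round(betas[1], 2))        # negative once mastery is controlled
print("beta mastery:", round(betas[2], 2))
```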

Redoing the analysis with response-style-corrected measures yields Table 18.


https://doi.org/10.1371/journal.pone.0233977.t018

The regression equations in Table 18 are (practically) identical to those in Table 17, because the response-style corrections have little impact on the trace variables. The last step of the analysis is the regression of the observed performance variables on the response-style-corrected trace variables and the response styles. These outcomes, in Table 19, differ considerably from those in the two previous tables.


https://doi.org/10.1371/journal.pone.0233977.t019

Because the course performance variables contain a substantial response-style component, in contrast to the learning trace variables, the explanation of course performance does improve when the response styles are added to the predictor set. In some cases, that improvement is considerable: for the most difficult-to-explain performance measure, MathExam, the increase in explained variation is 40%.

The often-cited drawback of self-report data such as surveys and psychometric instruments is its biasedness: self-perceptions are seldom an accurate account of true measures. The question is: is this drawback unique to self-reports? To investigate this question, we constructed two different bias measures: one based on the differences between subjective and objective measures of confidence for learning in university, a type of under- and overconfidence construct, and the other based on extreme response styles. The selected response styles, both positive and negative extreme responses, make up a substantial part of all of the self-reported questionnaire variables, ranging in size between 7% and 43% of the explained variation. That indeed represents a considerable bias. However, the objectively measured performance variables allow the same decomposition, and result in response-style contributions to explained variation at the lower end of that same range. Learning-system-based trace variables, of both product and process types, are the most resistant to response styles, with the highest contribution to explained variation being 4%. The role of the other type of bias we sought to operationalize, overconfidence, is more modest. It is absent from the course performance variables and the trace variables, and is contained in some of the self-report variables, but nowhere with a variance contribution exceeding 10%.

Negative and positive extreme responses occur in different items: negative extreme responses in items with a negative valence, where the scale mean is below the neutral value, and positive extreme responses in positively valenced items, with scale means above the neutral value. Typically, the two extreme responses do not go together: items with a high ERSneg tend to have an ERSpos of about zero, such as learning helplessness, LHL (4) in Fig 1, and items with a high ERSpos tend to have an ERSneg of about zero. In Fig 1, that is the large cluster of variables on the right: since most items are positively valenced, a large group of academic motivation, goal setting and learning approach variables ends up in that cluster on the right. There are a few exceptions to this pattern. Several instruments contain an anxiety-related scale, and three of these (LAX, Anxiety, AN) combine negative ERSneg weights with positive ERSpos weights. That is: if we wish to correct anxiety scores for response styles, true anxiety scores are lower than measured ones for students with high ERSneg scores, and true anxiety scores are higher than measured ones for students with high ERSpos scores. If we look at Tables 2 and 6, we see that the correction induced by ERSneg is a consistent and strong one: in all negatively valenced constructs, students with high ERSneg levels exaggerate their negative emotions, so a downward correction is required. Likewise, these students undervalue their positive emotions, so an upward correction is demanded. The role of ERSpos is more ambiguous and is not uniquely determined by the valence of the scale. In several scales, we find that an upward correction is required for students high in ERSpos scores: their anxiety levels are higher than measured, but so are their enjoyment and curiosity levels. The exception is boredom, of both the epistemic and the achievement type: students who tend to provide extreme positive responses exaggerate their boredom levels, calling for a downward correction.

The patterns induced by overconfidence mirror those of the negative extreme response style, but at a smaller scale. Overconfidence increases the level of constructs with a positive valence, such as academic control and enjoyment, and decreases the levels of negative emotions: anxiety of several types, hopelessness, frustration and confusion. Correcting for overconfidence thus implies a downward correction of these positively valenced constructs and an upward correction of the negatively valenced constructs.

Course performance variables follow most of the patterns of the positively valenced self-report scales, in that we find consistent, strong ERSneg contributions. That is: expected performance levels of students with high ERSneg scores should be corrected in an upward direction. Remarkably, no correction for ERSpos scores is needed, which results in the course performance variables clustering together along the positive part of the vertical axis in Fig 2, together with academic control, ASC (5), cognitive competence (34) and affect (35), the three variables expressing perceived self-efficacy.

In Fig 2, the cluster nearest to the origin is that of the learning activity trace variables. They are the least biased, and thus need no more than a small correction, given that students with high positive extremes tend to have slightly higher average activity levels.

In the literature on ‘fundamental validity problems’ [2, 3], the individual reference problem and the memory problem can both explain the existence of response-style-like differences in answer patterns between students. If the absence of such answer patterns is taken as the definition of the true level of measurement, then it is clear that all of the self-reports, as well as the course performance variables, represent biased constructs, and that the decomposition of these variables into a response-style component and a component orthogonal to it is one of taking the bias out. However, to make validity a meaningful concept, it has to be criterion-related. In educational research, that criterion is that it helps us understand educational theories: theories that relate multiple educational concepts measured with different instruments, or theories that relate such concepts with the outcomes of educational processes. If that is the main criterion, then our definition of validity and bias should change. A valid instrument is then an instrument that contains such typical person-specific response patterns, and bias is now defined as the incapability of the instrument to account for such patterns. In the context of our application: it is the self-report and course performance data that represent the unbiased parts of our data collection, since we aim to investigate the empirical model of the control-value theory of achievement emotions (CVTAE) and its contribution to the explanation of course outcomes, whereas our trace variables represent the biased part of our data collection, due to their inability to account for these typical personal patterns in the data that determine our criterion.


Is it the second equation that we prefer? It has eliminated the impact of response styles, at least those we distinguished, but says nothing about other potential biases. The third equation has the advantage that it gives an impression of the impact of response styles, but it is in an unattractive, non-parsimonious format. One needs all three expressions, because the first two help us understand the extent to which helplessness and academic control share the same response styles, and the third provides the decomposition into response-style and non-response-style parts. However, the problem is that response style is only one source of bias. In this example, the confidence difference brings in a second type of bias, accounting for 7% of the variation. Adding this second correction, or any further correction one can think of, would add explanatory power, but would make for an explanation that obviously lacks any parsimony, without the guarantee that all bias sources are covered. We therefore prefer the first of the three formulations, knowing that this choice sacrifices at least 10% of the explained variation, resulting from the circumstance that helplessness carries a larger response-style component than academic control can account for.
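As a minimal sketch of these three formulations, the snippet below fits them on invented data and compares their R². The column names (MathExam, ASC, LHL, ERSneg, and the _corrected variants) and all effect sizes are hypothetical stand-ins, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented data in which observed scales and the exam score share a response-style component.
rng = np.random.default_rng(1)
n = 1000
ERSneg = rng.normal(size=n)
ASC_c, LHL_c = rng.normal(size=n), rng.normal(size=n)          # "true" corrected scales
df = pd.DataFrame({
    "ERSneg": ERSneg,
    "ASC_corrected": ASC_c, "LHL_corrected": LHL_c,
    "ASC": ASC_c - 0.4 * ERSneg,                                # observed scales carry the style
    "LHL": LHL_c + 0.8 * ERSneg,
    "MathExam": 0.5 * ASC_c - 0.4 * LHL_c - 0.6 * ERSneg + rng.normal(size=n),
})

m1 = smf.ols("MathExam ~ ASC + LHL", data=df).fit()                               # observed
m2 = smf.ols("MathExam ~ ASC_corrected + LHL_corrected", data=df).fit()           # corrected
m3 = smf.ols("MathExam ~ ASC_corrected + LHL_corrected + ERSneg", data=df).fit()  # corrected + style

for name, m in [("observed", m1), ("corrected", m2), ("corrected + style", m3)]:
    print(f"{name:>18}: R2 = {m.rsquared:.3f}")
```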

The outcome of this study that connects with all of our previous research [14–18] is that we once more discovered how “dangerous” learning activity trace data of process type can be. In this study, we included NoAttempts, NoSolutions, NoHints, and TimeOnTask as examples of such process variables. All these variables demonstrate strong positive bivariate relationships with all of the learning performance variables, telling a simple message: the more active the student, the higher the expected learning outcomes. Nevertheless, that simple message is deceptive: as soon as we add a covariate of product type, such as mastery in the learning tool, the role of the process predictors changes radically: relationships become negative or vanish. This is in itself not surprising, and easily explained by a second simple mechanism. The student who needs to consult more worked-out examples (Solutions), who needs more Hints, more Attempts, or more TimeOnTask than another student to reach the same level of mastery, is learning less efficiently, and is therefore predicted to achieve lower course performance scores on average. The obvious way out of this problem is to find the causes of these efficiency differences and correct for them. That is no easy road: although we have access to a huge database of students’ personal characteristics, none of these qualified as a proper predictor of learning efficiency. Prior education, diagnostic entry test scores and other variables of this type all explain a small part of these efficiency differences, but no more than that.

Reflecting on our research questions: we do find that self-report survey data and course performance data largely reflect the conceptualisations of the constructs we intended to find in our models. Both types of constructs contain response-style components of modest to substantial size. These might be regarded as components of bias, in contrast to the trace variables that lack these bias components. However, is it reasonable to make these traces the standards of our educational theories? When designing models, we will hardly ever do so with the prime aim of explaining levels of traces of learning activity. The majority of our models seek to understand the outcomes of learning processes or to investigate the relationships between social-cognitive antecedents of these learning outcomes. Therefore, if these modelling aims define our standards, the bias is on the side of the trace variables, in that they need to be corrected to include the stable response-style patterns that characterise all other variables in our models.

These differences in response-style patterns do not necessarily constrain analytical choices. If a sufficiently rich set of self-report data is available, as in our application, we can make a reliable decomposition of all variables in the analysis. Models that build on such a decomposition have the advantage of high predictive power, against the disadvantage of being less parsimonious and more difficult to interpret. If we prefer to stick with parsimonious models that apply measured variables only, without correction, we are indeed restrained in our analytical choices. In our case, that restriction comes down to a limitation of the role that trace variables can play in the explanation of other types of variables, due to their inability to capture the response-style components.

The inclusion of response styles as separate explanatory factors does change the interpretation of models somewhat, in a manner that is quite intuitive: if we isolate the response-style components from the achievement emotions, as in Table 15, the achievement emotions lose part of their predictive power in favour of the response styles. That is exactly what happens in the comparison between Tables 13 and 15: it is still academic control, ASC, and helplessness, LHL, that predict the several course performance categories, with a positive beta for ASC and a negative beta for LHL, but the absolute sizes of these betas are diminished. That predictive power is now absorbed by ERSneg. This finding can be generalised: when we estimate models of learning processes that are formulated in terms of variables that share a common component, such as a response style or any other ‘bias’, we will find inflated estimates, caused by the circumstance that the same bias component is part of both the response and the predictors. Any predictor that is free of that bias component will also be free of such an inflated estimate.
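A small simulation makes the inflation mechanism explicit: when the outcome and a predictor share a response-style component, the estimated coefficient on that predictor exceeds its structural value, while a predictor free of the shared component is estimated without inflation. All names and coefficients below are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 20000
style = rng.normal(size=n)          # shared response-style component
x_true = rng.normal(size=n)         # substantive part of the self-report predictor
z = rng.normal(size=n)              # a predictor free of the style component (e.g., a trace variable)

x_observed = x_true + 0.7 * style   # observed self-report carries the style
y = 0.5 * x_true + 0.3 * z + 0.7 * style + rng.normal(size=n)  # outcome carries it too

X = sm.add_constant(np.column_stack([x_observed, z]))
print(sm.OLS(y, X).fit().params)    # coefficient on x_observed exceeds 0.5 (inflated); z stays near 0.3
```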

Limitations and future directions

In our context, we find that it is the trace type of data that stands out, in the sense that these data cannot easily be integrated with self-report and course performance data. That is a robust outcome: the same data-rich context used in this study has been investigated in more than ten years of learning analytics research, always with that same conclusion. Strong heterogeneity in our population may be part of the explanation of why the trace variables are so out of sync with the other measured constructs. High levels of learning activity may signal a student who likes the subject and is very good at it, or a very conscientious student, but may also be an indicator of the extra learning effort required to compensate for low proficiency at the start of the course. Although a heterogeneous population benefits most model-building endeavours in general, it clearly limits the analysis of the relationship between learning activity and learning outcomes. If this analysis could be repeated in a more homogeneous sample, we might find more stable roles for the online trace variables. However, that would not solve the other issue: by not being able to capture the response patterns characterising questionnaire and learning outcome data, the role of trace variables in empirical models based on multi-modal data remains problematic.

Heterogeneity in our sample is not the only difference with other studies. Quite a lot of studies are based on an experimental design, with limited numbers of participating students, and focus on learning activities of limited intensity. For instance, the Zhou and Winne study [4] is based on 95 students in a one-hour experimental session, and the Fincham study [43] on 230 students participating in one of three different MOOCs. These MOOCs lasted five to ten weeks, but per active week, students watched an average of less than one video and submitted between one and two problems. In contrast, our study (and previous ones) focuses on learning activities with far higher intensity. During our eight-week course, students make, on average, 760 problem-solving attempts in the math e-tutorial and 210 attempts in the stats e-tutorial, in total more than 120 attempts per week. Given that the number of problems offered per week fluctuates between 40 and 80, a substantial part of these attempts represents repeated attempts, where the need to repeat attempts differs strongly from student to student. Therefore, it is not unlikely that the difference in the role that trace variables of process type play is a consequence of investigating learning in a small-scale experimental design versus intensive activities in an authentic learning context. More research on the role of the learning context is needed to answer such questions.

A third option to extend this study is to turn it into a multiverse analysis [38] by investigating alternative data sets and alternative statistical methods to validate our findings in different contexts. The application of robust regression methods is a prime candidate for such an alternative statistical method, as is the administration of the several self-report instruments with slider rating scales.

The last topic of future research refers to the mechanism at work that might explain the relationships between response styles and learning outcome variables, or trace variables of product type. Potential antecedents of response styles, such as cultural factors or gender, have been researched [ 8 , 29 ]. But these studies are not of much help in explaining why response styles based on questionnaire data do show up in other types of data, like performance data and trace data of product type. More research is needed here.

Conclusions

The large-scale introduction of technology-enhanced learning environments has had a huge impact on education as well as on educational research. Questionnaire data, for a long time the main source of empirical studies of learning and teaching, lost its prominent position to online data collected as digital traces of learning processes. This data is called online data, to use the term from the area of metacognitive research, because it is collected during the learning process itself, by following the student through all learning activity steps. Trace data of process type collected by technology-enhanced learning systems is an excellent example of such online data (whereas trace data of product type is, in fact, part of the offline data, because it refers to reaching a state of mastery, which is not a dynamic process). In that debate, the outcome is invariably that subjective offline data is inferior to objective online data. In our learning analytics research, where the learning outcome is typically the response variable that is explained and predicted, we invariably find the opposite conclusion: it is the online trace data of process type that is inferior to the offline data and the online trace data of product type. In the generation of explanatory models, the role of process-type trace variables is quite unstable, depending strongly on the covariates in the model. Regression betas can become insignificant or switch signs after adding covariates, especially covariates of product type.

In contrast, product-type trace variables tend to play stable roles, with little disturbance from the addition of covariates. The critique of self-report data as too stable and too trait-oriented [37] is reversed in this application: it is the trace data of process type, even after aggregation over the full course period, that lacks the stability to act as a reliable predictor. That is the insight one decade of learning analytics research has brought the authors: be very careful with online data of process type, and put more trust in online data of product type, complemented with survey data.

Supporting information

S1 Appendix. Instruments of self-report surveys [44–49].

https://doi.org/10.1371/journal.pone.0233977.s001

S2 Appendix. Descriptive statistics of all variables in the study.

https://doi.org/10.1371/journal.pone.0233977.s002

The Promise and Pitfalls of Self-report: Development, research design and analysis issues, and multiple methods


As a prelude to this special issue on the promise and pitfalls of self-report, this article addresses three issues critical to its current and future use. The development of self-report is framed in Vertical (improvement) and Horizontal (diversification) terms, making clear the role of both paths for continued innovation. The ongoing centrality of research design and analysis in ensuring that self-reported data is employed effectively is reviewed. Finally, the synergistic use of multiple methods is discussed. This article concludes with an overview of the SI's contributions and a summary of the SI's answers to its three central questions: a) In what ways do self-report instruments reflect the conceptualizations of the constructs suggested in theory related to motivation or strategy use? b) How does the use of self-report constrain the analytical choices made with that self-report data? c) How do the interpretations of self-report data influence interpretations of study findings?

Article Details

FLR adopts the Attribution-NonCommercial-NoDerivs Creative Common License (BY-NC-ND). That is, Copyright for articles published in this journal is retained by the authors with, however, first publication rights granted to the journal. By virtue of their appearance in this open access journal, articles are free to use, with proper attribution, in educational and other non-commercial settings.


Open access

Published: 10 November 2022

Large studies reveal how reference bias limits policy applications of self-report measures

  • Benjamin Lira,
  • Joseph M. O’Brien,
  • Pablo A. Peña,
  • Brian M. Galla,
  • Sidney D’Mello,
  • David S. Yeager,
  • Amy Defnet,
  • Tim Kautz,
  • Kate Munkacsy &
  • Angela L. Duckworth

Scientific Reports volume 12, Article number: 19189 (2022)


There is growing policy interest in identifying contexts that cultivate self-regulation. Doing so often entails comparing groups of individuals (e.g., from different schools). We show that self-report questionnaires—the most prevalent modality for assessing self-regulation—are prone to reference bias, defined as systematic error arising from differences in the implicit standards by which individuals evaluate behavior. In three studies, adolescents (N = 229,685) whose peers performed better academically rated themselves lower in self-regulation and held higher standards for self-regulation. This effect was not observed for task measures of self-regulation and led to paradoxical predictions of college persistence 6 years later. These findings suggest that standards for self-regulation vary by social group, limiting the policy applications of self-report questionnaires.


Introduction

Self-regulation refers to a diverse set of personal qualities, distinct from cognitive ability, that enable individuals to set and pursue goals. The terminology favored for self-regulation and its facets varies across the literatures of child development (e.g., effortful control, ego strength) 1 , 2 , 3 , adult personality (e.g., Big Five conscientiousness) 4 , psychopathology (e.g., impulse control) 5 , and economics (e.g., temporal discounting) 6 , 7 . Such diverse traditions in behavioral science have directed this attention because individual differences in self-regulation predict later life outcomes, including academic performance 8 , 9 , 10 ; physical and mental health 11 , 12 , 13 ; well-being and life satisfaction 14 ; civic and social behavior 12 , 15 ; job performance 16 ; earnings 12 , 17 , 18 , 19 ; and wealth 12 , 17 . Moreover, the effects of self-regulation are independent of, and comparable in magnitude to, cognitive ability and family socioeconomic status (SES) 8 , 12 .

A half-century of basic research suggests that self-regulation develops optimally in caring environments that encourage adaptive goal-relevant knowledge (e.g., strategies for managing attention), beliefs (e.g., that emotion and motivation can be regulated), and values (e.g., that self-regulation is important) 20 . This development extends far beyond early childhood, when children are mostly in the company and care of parents. Indeed, adolescence may be particularly important for supporting self-regulation because of the rapid growth, learning, adaptation, and neurobiological development that mark this period of life 21 , 22 , 23 . Further, impulsive choices in adolescence (e.g., to start smoking, to drop out of school) can alter life trajectories in ways that are difficult to reverse 12 .

Schools are a natural target for policy because of their potential to provide equal access to environments that support the development of self-regulation 24 , 25 . Not only is school where young people spend most of their waking hours outside the home, it is also where they experience a multitude of factors that have been shown to either scaffold or stymie the development of self-regulation, including adult role models 26 , 27 and peers 28 , 29 . Recently, a growing chorus of policymakers has urged schools to extend their purview beyond traditional academic coursework and into the domain of social-emotional skills such as self-regulation—a trend that is reflected in the expanded scope of federal and state standards and accountability systems 30 , 31 , 32 .

In this investigation, we identify a pervasive measurement bias that, if not remedied, may thwart policymakers’ efforts to evaluate, measure, and improve the effectiveness of schools that foster adolescent self-regulation. The possibility of this measurement bias has led to serious questions from policymakers about “whether we can make [self-regulation skills] visible, comparable, and therefore amenable to deliberate policy action in a similar way that traditional tests do with academic knowledge and skills” 33 . As a result, education systems have been left with great interest in self-regulation and related constructs—but insufficient scientific guidance.

The empirical starting point for our research is the mixed and often counterintuitive evidence regarding school effects on self-regulation. On one hand, Jackson et al. 34 show encouraging evidence that schools can differ in how much they improve students’ scores on a self-report measure of hard work, and these school differences predicted students’ later college enrollment and persistence. On the other hand, evaluations of charter schools show that they fail to raise self-reports of self-regulation, despite raising report card grades, standardized test scores, attendance rates, and college enrollment levels while reducing incarceration and unplanned pregnancies 35 , 36 , 37 , 38 . Are high-performing schools whose cultures explicitly emphasize hard work and high expectations 39 , 40 in fact having no impact on students’ self-regulation—or is there a problem in how self-regulation is measured?

Figure 1. Peers influence the standards by which an individual judges their own behavior, resulting in a “reference bias” effect that distorts cross-context comparisons of self-reported self-regulation. Illustration by Tom McQuaid.

We suggest that reference bias, the systematic error that arises when respondents refer to different implicit standards when answering the same questions 41 , is a legitimate threat to between-school comparisons and can help explain the conflicting evidence of school effects on self-regulation. Moreover, we contend that even within a school, comparisons of students are biased when different subgroups of students rely on different standards when answering the same questions. In the present policy context, reference bias is especially pernicious because it is difficult to detect and diagnose. Unlike social desirability bias, modesty bias 42 , faking, and response style biases 43 , reference bias can emerge even when respondents answer truthfully, and it can coexist with otherwise strong validity associations at the individual level. This is because reference bias can distort inferences any time there are comparisons of self-regulation across groups who differ in their frames of references—for example, schools with very different peer cultures with respect to effort, or even subcultures within a school.

Why might self-report questionnaires be subject to reference bias? Dominant models in survey methodology identify a multi-stage cognitive response process: students first read and interpret the question; then they identify relevant information in memory, form a summary judgment, and translate this judgment into one of the response options; finally, they edit their response if motivated to do so 44 , 45 , 46 . As illustrated in Fig.  1 , a student may interpret a questionnaire item and its response options differently depending on their peers’ typical behaviors 47 . If they have high-achieving classmates who, for example, study for hours each evening and consistently arrive prepared for class, they might judge themselves against higher standards and rate themselves lower in self-regulation than an equally industrious student whose lower-achieving peers study and prepare less. While schools might be effective in increasing self-regulated behavior, they might at the same time increase the standards, leading to lower self-reported self-regulation.

A well-established research literature has demonstrated that the subjective view students hold of themselves, both in general terms (i.e., self-esteem) and in the realm of academic performance (i.e., academic self-concept) depends upon peer comparisons 42 , 48 , 49 . In particular, the Big Fish Little Pond Effect (BFLPE) refers to the lower academic self-concept of students in higher-achieving schools 50 . A related and older literature on social comparison has demonstrated that in general, people spontaneously compare themselves to other people, especially to people who are superior to them in some way, which can lower their subjective appraisal of their own ability 51 . Finally, there is evidence that academic self-concept and standardized test scores are positively correlated within countries but inversely correlated between countries—a phenomenon dubbed the attitude-achievement paradox 42 , 52 . In sum, there is ample evidence for the influence of peers on inherently subjective constructs.

In contrast, evidence that reference bias distorts comparisons of self-regulation across social groups has been indirect. A handful of cross-cultural studies have yielded paradoxical findings (e.g., Asian countries such as Japan and South Korea ranking lower in self-reported conscientiousness than other countries that are typically thought to be less conscientious 53 ), but none of these studies directly measured standards for behavior, relying instead on experts’ ratings of cultural stereotypes or indirect proxies for self-regulation (e.g., the average walking speed in downtown locations of a convenience sample of a country’s residents, as a proxy for the nation’s conscientiousness).

In the educational literature, studies that compare the test scores and average self-regulation scores for different schools have not ruled out unobserved confounds, such as the possibility that school factors (e.g., average family income) that increase test scores (e.g., due to investment in educational opportunities) also decrease self-regulation (e.g., by shielding children from responsibilities that could cultivate self-regulation). Therefore, the research literature to date has not been able to distinguish biases in self-reports from potentially true group differences in self-regulation.

In this investigation, we overcome these limitations by using three complementary methods to examine reference bias more directly than has been possible previously. Our approach is motivated by the basic finding that people judge themselves compared to salient and similar others 47 . Therefore we exploit (Studies 1 and 2) or work around (Study 3) variation in people’s reference groups.

In Study 1 (total N = 206,589 students in k = 562 Mexican high schools), we show that the reference bias effect appears even within the same school in a year-over-year comparison. When students are surrounded by higher-achieving peers relative to other students at the same school in a different year, they rate themselves lower in self-regulation. Study 2 addresses an additional confound that could remain in Study 1’s analysis, which is the possibility that year-over-year fluctuations in test scores are not random but are due to choices made by families about the academic trajectory of the school. In Study 2 (N = 21,818 students in k = 62 U.S. secondary schools), we rule this out with an analysis rooted in the purported psychological explanation for reference bias, which is that people’s self-judgments should be more influenced by peers whose behaviors they observe than by peers whose behaviors they do not observe. We show that reference bias is evident in data from a single school year only when administrative data showed that the peers shared classes and therefore had an opportunity to observe each other’s self-regulated behavior. Furthermore, Study 2 examined the theorized, but typically unmeasured, explanation for reference bias: differences in students’ implicit standards for self-regulation (i.e., how many hours of homework constitute “a lot of homework,” and how often “sometimes” forgetting what they need for class means).

Studies 1 and 2 argue against school-level alternative explanations for reference bias but nevertheless allowed for the possibility that high-achieving peers reduce a student’s real capacity for self-regulation. Study 3 ( N = 1278 seniors in k = 15 U.S. high schools) addressed this possibility with a workaround: an objective behavioral task that involves no self-reports and therefore is not subject to biases due to differences in frames of reference. By matching self-regulation data collected in high school to records of college graduation, we show that there is no evidence of reference bias when a behavioral task is used. This evidence is bolstered by Study 3’s use of a measure of school achievement that is independent of the high school peer group: graduation from college within 6 years after high school completion.

Study 1: Evidence for reference bias in a country-wide natural experiment

In 2012 and 2013, the Secretariat of Public Education administered questionnaires measuring grit (the passion and perseverance for long-term goals 54) and collected data on academic performance from high school seniors in a nationally representative sample of 10% of high schools in Mexico. We analyzed data from the 1% of all schools that, by chance, were selected in both years. This enabled us to exploit exogenous variation in the academic performance of the 2013 high school cohort when compared to the performance of the 2012 cohort. Reference bias was quantified as the effect on self-reported grit uniquely attributable to peer academic performance (i.e., the cohort-wide averages of GPA, standardized math test scores, and standardized reading test scores, respectively, excluding said student from the average), after controlling for differences between schools, cohort year, and each student’s own academic performance.

Sample and procedure

High school seniors in two representative random samples, each comprising 10% of schools in Mexico, completed standardized achievement tests of math and reading and, separately, self-report questionnaires late in the spring term of the 2011–2012 and 2012–2013 academic years, respectively. By chance, about 1% ( k = 562) of high schools were included in both years. Our final sample includes 97.8% of the students in these high schools ( N = 206,589) who completed a questionnaire measure of grit. There were slightly more girls than boys in our sample (53.49% female). On average, students in our sample were 17.61 years old (SD = 0.79).

Self-reported grit

The Technical Committee for Background Questionnaires at the National Center of Evaluation for Higher Education in Mexico (Centro Nacional de Evaluación para la Educación Superior) translated all 8 items of the Short Grit Scale 55 as well as its 5-point Likert-type response scale (1 = Not at all like me to 5 = Very much like me) into Spanish. The observed reliability was \(\alpha = 0.62\). All reported reliabilities are Cronbach’s alphas.

Grade point average (GPA)

Students reported their overall, verbal, and math GPAs using a categorical scale which ranged from less than 5.9 to 10 in half-point increments (i.e., < 5.9, 6.0–6.4, 6.5–6.9, etc.). We used the midpoint of each range in our analyses (i.e., 5.7, 6.2, 6.7, etc.). Although official GPAs were not available, the meta-analytic estimate of the correlation between self-reported and objectively recorded GPA is \(r = 0.82\) 56. To avoid any issues with multicollinearity, we ran separate models for each GPA measure.

Standardized test scores

The Mexican Secretariat of Public Education provided standardized math and reading scores.

Analytic strategy

We used ordinary least squares (OLS) regression with clustered standard errors to predict self-reported grit from student’s own and peer’s academic performance:

\(G_{ist} = \beta_1 a_{ist} + \beta_2 b_{-ist} + \theta_s + \eta_t + \epsilon_i\)

where \(G_{ist}\) is the self-reported grit for student i who was in 12th grade in school s at time t (2012 or 2013). Term \(a_{ist}\) is that student’s own academic performance, operationalized as self-reported GPA, standardized math scores, or standardized verbal scores, respectively. Term \(b_{-ist}\) represents the average academic performance of the students sharing a school with student i, excluding student i. Term \(\theta_s\) represents fixed effects for each student’s school and captures ways in which schools might differ from each other, including differences in teachers, curricula, school policies, and the regional populations from which schools draw their members. Term \(\eta_t\) (a fixed effect for year) captures how cohorts within each school systematically differ from each other. \(\epsilon_i\) represents error.
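A sketch of this kind of specification on invented data might look as follows: the leave-one-out cohort mean implements the “excluding student i” peer average, school and year enter as fixed effects, and standard errors are clustered by school. Column names and effect sizes are assumptions for illustration, not the study’s data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented student-level data: 20 schools, two cohort years.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "school": rng.integers(0, 20, size=n),
    "year": rng.choice([2012, 2013], size=n),
    "gpa": rng.normal(8.0, 1.0, size=n),
})
df["grit"] = 0.3 * df["gpa"] + rng.normal(size=n)   # invented effect, illustration only

# Leave-one-out cohort mean: average peer GPA in the same school and year, excluding the student.
g = df.groupby(["school", "year"])["gpa"]
df["peer_gpa"] = (g.transform("sum") - df["gpa"]) / (g.transform("count") - 1)

# School and year fixed effects via C(); standard errors clustered by school.
model = smf.ols("grit ~ gpa + peer_gpa + C(school) + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["school"]}
)
print(model.params[["gpa", "peer_gpa"]])
```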

Students surrounded by higher-performing classmates rate themselves lower in grit

Consistent with prior research, among students in the same school, self-reported grit correlated positively with GPA (\(\beta = 0.43\), \(p < 0.001\)), standardized math test scores (\(\beta = 0.16\), \(p < 0.001\)), and standardized reading test scores (\(\beta = 0.16\), \(p < 0.001\)). However, consistent with reference bias, self-reported grit correlated inversely with schoolmates’ GPA (\(\beta = -0.25\), \(p < 0.001\)), peer standardized math test scores (\(\beta = -0.09\), \(p < 0.001\)), and peer standardized reading test scores (\(\beta = -0.07\), \(p = 0.004\)). See Fig 2 and Supporting Information for details.

Figure 2. In Study 1, self-reported grit correlated positively with a student’s own academic performance but inversely with the performance of their schoolmates. OLS models included demographic controls and school fixed effects. Error bars represent 95% confidence intervals. Model \(R^2\)s for GPA, math score, and language score were 0.124, 0.071, and 0.071, respectively.

Evidence for reference bias was consistent across demographic subgroups

Capitalizing on the size and representativeness of our sample, we explored moderators of reference bias. Regression coefficients for peer academic performance were not significantly different across subgroups defined by gender, mother’s educational level, school type (public or private), or school size. See Tables S6 and S7 in Supporting Information for details.

Study 2: Replication and extension in a single large school district

In Study 2, we partnered with the nonprofit organization Character Lab to replicate and extend Study 1 with a sample of students in grades 8 through 12 in a large, diverse school district in the United States. This partnership enabled us to obtain official class schedules for each student, which we used to distinguish near- versus far-peers as students who did or didn’t share daily academic classes, respectively. Whereas GPA was self-reported in Study 1, in Study 2 we obtained GPA from official school records. As part of a larger survey administered by Character Lab, students completed a self-report questionnaire of conscientiousness (the tendency to be organized, responsible, and hardworking 57 ) as well as two questions we developed to directly assess self-regulation standards.

This study included data from N = 21,818 (50% female, \(M_{age}\) = 15.60, \(SD_{age}\) = 1.54) students attending k = 62 middle and high schools in a large public school district in the United States who completed surveys in either October 2019 or February 2020. This district was part of Character Lab Research Network (CLRN), a consortium of school partners committed to advancing scientific insights that help children thrive. According to school records, the race/ethnicity of our sample was: Hispanic/Latinx (41%), White (28%), Black (23%), and other (8%). About half (49%) of students were eligible for free and reduced-price meals.

Self-reported conscientiousness

Students completed 12 items from the Big Five Inventory-2 58 assessing conscientiousness (e.g., “I am someone who is persistent, works until the task is finished”) using a 5-point Likert-type scale ranging from 1 = Not like me at all to 5 = Totally like me. The observed reliability was \(\alpha = 0.83\).

Standards for hard work and preparedness

We included two questions to measure implicit standards for self-regulation. One question assessed norms for hard work: “If a student in your grade says they did ‘a lot of homework’ on a weeknight, how long would you guess they mean?” Eight response options ranged from 15 min (coded as 0.25 hours) to 3 or more hours (coded as 3 hours). The second question assessed norms for preparedness: “If a student in your grade says they ‘sometimes’ forget something they need for class, how often would you guess they mean?” Seven response options ranged from once a month to three times or more per day (coded as 66 times per month). We reverse-coded these values such that higher numbers indicated stricter standards for preparedness. These items were created for this study and used here for the first time.

From school administrative records, we calculated GPAs on a 100-point scale by averaging final grades in students’ academic courses (English language arts, math, science, social studies) for the quarter in which students took the survey during the 2019–2020 school year.

Near-peer and far-peer GPAs

For each student, we designated near-peers as those students who took at least one academic course with the target student during the quarter in which they took the survey. We designated far-peers as students in the same school who did not share any academic courses. For the average student in our sample, 38% of schoolmates were near-peers and 62% were far-peers.
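A minimal sketch of the near-peer construction follows, assuming an enrollment table with one row per (student, course) pair and a student table with school and GPA; all table and column names are hypothetical. The far-peer mean would follow analogously from the remaining schoolmates.

```python
import pandas as pd

def add_near_peer_gpa(enroll: pd.DataFrame, students: pd.DataFrame) -> pd.DataFrame:
    """enroll: columns (student, course); students: columns (student, school, gpa)."""
    # Pairs of distinct students who share at least one academic course
    pairs = (enroll.merge(enroll, on="course", suffixes=("", "_peer"))
                   .query("student != student_peer")[["student", "student_peer"]]
                   .drop_duplicates())
    # Average GPA of each student's near-peers
    near = (pairs.merge(students[["student", "gpa"]],
                        left_on="student_peer", right_on="student", suffixes=("", "_p"))
                 .groupby("student")["gpa"].mean()
                 .rename("near_peer_gpa").reset_index())
    return students.merge(near, on="student", how="left")

# Tiny worked example with invented data:
enroll = pd.DataFrame({"student": ["a", "a", "b", "c"], "course": ["m1", "s1", "m1", "s1"]})
students = pd.DataFrame({"student": ["a", "b", "c"], "school": [1, 1, 1], "gpa": [90, 80, 70]})
print(add_near_peer_gpa(enroll, students))   # a's near-peers are b and c; b's is a; c's is a
```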

To examine whether self-regulation standards and conscientiousness related to students’ own and peers’ performance, we fit OLS regression models with standard errors clustered by school to estimate the following equation:

\(S_{is} = \beta_1 a_{is} + \beta_2 b_{-is} + \beta_3 c_{-is} + \gamma' x_{is} + \theta_s + \epsilon_i\)

where \(S_{is}\) is a survey measure of conscientiousness or self-regulation standards for student i in school s, \(a_{is}\) is the student’s own GPA, \(b_{-is}\) is the average GPA of students in the same school sharing at least one academic course with student i, \(c_{-is}\) is the average GPA of students in the same school not sharing any academic courses with student i, \(x_{is}\) is a vector of student characteristics (age, gender, race/ethnicity, grade level, free or reduced-price meal status, English language learner status, special-education status, home language, and timing of the survey), \(\theta_s\) represents school fixed effects, and \(\epsilon_i\) is a random error term.

Reference bias replicates: students whose classmates perform better academically rate themselves as lower in conscientiousness. As expected, this effect is driven by near-peers rather than far-peers

If implicit standards for self-regulation are determined by social comparison, reference bias should be driven by the peers with whom students are in direct contact. As shown in Fig 3 and consistent with Study 1, self-reported conscientiousness was correlated positively with a student’s own GPA (\(\beta = 0.29\), \(p < 0.001\)), negatively with the GPA of near-peers (\(\beta = -0.06\), \(p < 0.001\)), and not at all with the GPA of far-peers (\(\beta = 0.01\), \(p = 0.395\)). See Table S10 in Supporting Information for details.

Students whose near-peers perform better academically hold higher self-regulation standards

As expected, standards for hard work were predicted by a student’s own GPA (\(\beta = 0.07\), \(p < 0.001\)) and the GPA of their near-peers (\(\beta = 0.23\), \(p < 0.001\)), but not the GPA of their far-peers (\(\beta = -0.03\), \(p = 0.198\)). The same pattern emerged for preparedness norms, which were predicted by students’ own GPA (\(\beta = 0.05\), \(p < 0.001\)) and the GPA of their near-peers (\(\beta = 0.14\), \(p < 0.001\)), but not far-peers (\(\beta = -0.02\), \(p = 0.080\)). As in Study 1, the patterns of findings were generally similar across subgroups. See Tables S11–S16 in Supporting Information for details.

Figure 3. In Study 2, self-reported conscientiousness correlated positively with a student’s own GPA and negatively with the GPA of near-peers. In contrast, standards for what constitutes hard work and preparedness correlated positively with both own and near-peer GPA. As expected, there was no effect of far-peer GPA. OLS models included demographic controls and school fixed effects. Error bars represent 95% confidence intervals. Model \(R^2\)s for conscientiousness, hard work norms, and preparedness norms were 0.095, 0.159, and 0.059, respectively.

Study 3: In a longitudinal study of college graduation, evidence of reference bias in questionnaire but not in task measures of self-regulation

In Study 3, we sought evidence of discriminant validity. Unlike questionnaires, which require participants to make subjective judgments of their behavior, task measures assay behavior directly. In a prospective, longitudinal study of N = 1278 students attending k = 15 different college-preparatory charter schools in the United States, we tested the prediction that reference bias should be evident in questionnaire but not behavioral task measures of self-regulation. In their senior year of high school, students self-reported their grit and self-control (the ability to be in command of one’s behavior and to inhibit one’s impulses 57 ). In addition, they completed the Academic Diligence Task, a behavioral task in which students voluntarily allocate attention to either good-for-me-later math problems or fun-for-me-now games and videos. The Academic Diligence Task has previously been validated as indexing self-control and grit 59 , 60 . Six years later, we used the National Student Clearinghouse database to identify students who successfully obtained their college diploma.

A few weeks before graduation, N = 1278 (55% female, \(M_{age}\) = 18.01, \(SD_{age}\) = 1.01) high school seniors responded to self-report questionnaires and task measures in school computer labs. Students attended k = 15 charter schools located in various urban centers in the United States. Between 76 and 98% of the students at each school participated in the study. Most students were socioeconomically disadvantaged (84% of students’ mothers had less than a 4-year degree, 68% qualified for free or reduced-priced meals), and were mostly Latinx (46%) and African American (40%).

Grit

Students completed a 4-item version of the Grit Scale developed specifically for adolescents 61. Students responded on a 5-point Likert-type scale ranging from 1 = Not at all true to 5 = Completely true. The observed reliability was \(\alpha = 0.78\).

Self-control

Students completed four items from the Domain-Specific Impulsivity Scale 59, 62 assessing academic self-control (e.g., “I forgot something needed for school”). Students responded on a 5-point Likert-type scale ranging from Not at all true to Completely true. The observed reliability was \(\alpha = 0.72\).

Academic Diligence Task (ADT)

A subset (n = 802) of students in our sample completed the Academic Diligence Task, a behavioral assessment of self-regulation that has been validated in separate research 59. This computer-based task begins with screens explaining that practicing simple mathematical skills like subtraction can aid in further enhancing overall math abilities. Students then completed three 3-min timed task blocks. In each, they chose between “Do math” and “Play game or watch movie.” Clicking “Do math” displayed a math task involving single-digit subtraction with multiple-choice responses. Clicking “Play game or watch movie” instead allowed students to play Tetris or watch entertaining videos. Students could freely switch between the two during each block. See Supporting Information for details. The key metric from the ADT was the mean number of problems correctly answered over the three blocks. Basic subtraction is very easy for most 12th grade students, so attentive engagement with the task resulted almost exclusively in correct answers: the median rate of correct responses was 98.3%. Due to positive skew and some clustering of data at 0 (i.e., students who did no math problems), we applied a square-root transformation to minimize bias from extremely high scores; this created an approximately normal distribution, which we used in subsequent calculations. Models using raw (i.e., non-transformed) ADT scores are shown in Table S19. Across the three blocks, the observed reliability was \(\alpha = 0.78\).
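A minimal scoring sketch under these assumptions (a table with one row per student per block and a count of correctly answered problems; the column names are hypothetical):

```python
import numpy as np
import pandas as pd

def score_adt(blocks: pd.DataFrame) -> pd.Series:
    """Mean correct problems per student across the blocks, square-root
    transformed to reduce positive skew and the pile-up of zeros."""
    mean_correct = blocks.groupby("student")["n_correct"].mean()
    return np.sqrt(mean_correct)

# Invented example: three blocks per student
blocks = pd.DataFrame({
    "student": ["a", "a", "a", "b", "b", "b"],
    "n_correct": [30, 28, 32, 0, 2, 1],
})
print(score_adt(blocks))   # a: sqrt(30) ~ 5.48, b: sqrt(1) = 1.0
```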

General cognitive ability

During the online survey, students completed a brief (12-item) version of Raven’s Progressive Matrices as an assessment of general cognitive ability 63 . The ability variable was calculated as the sum of correctly answered questions out of 12, with any missing questions marked as incorrect. The observed reliability was \(\alpha = 0.73\) .
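A minimal sketch of the sum-scoring rule described here, with missing items counted as incorrect; the item names and toy data are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
items = [f"raven_{i}" for i in range(1, 13)]   # 12 items, 1 = correct, 0 = incorrect, NaN = unanswered
df = pd.DataFrame(rng.choice([1.0, 0.0, np.nan], size=(5, 12), p=[0.5, 0.4, 0.1]), columns=items)

# Missing responses are scored as incorrect before summing to a 0-12 ability score
df["ability"] = df[items].fillna(0).sum(axis=1)
print(df["ability"])
```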

College graduation

We matched our data to the National Student Clearinghouse, a public database that includes enrollment and graduation data for over 97% of students in 2022 64 , 65 . We coded six-year college graduation as 1 = obtained degree within 6 years of enrollment and 0 = did not obtain degree within 6 years of enrollment.

Because we were interested in both individual-level and school-level differences in self-regulation, we used multilevel modeling to analyze how the Academic Diligence Task and self-reported grit and self-control predict college graduation. Specifically, we expected the ADT to positively predict college graduation at both the within- and between-school levels. We expected the relationship to be positive because prior research shows that students who obtain higher ADT scores tend to perform better academically 59 . Moreover, we expected the relationship to be positive at both levels because, as a task measure, the ADT does not involve comparative judgment and thus cannot be influenced by reference bias. In contrast, we expected self-reported grit and self-control to positively predict college graduation within a school but negatively across schools. We used missing-dummy variable coding to handle missing data and included controls for general cognitive ability in our models.
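The authors' analysis is a multilevel logistic regression. As a rough illustration of the within-/between-school decomposition and the missing-dummy coding described above, the sketch below group-mean-centers a self-report predictor and fits an ordinary logistic regression (a Mundlak-style approximation, not the paper's exact model). All variable names and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "school": rng.integers(0, 15, n),                                   # hypothetical school IDs
    "grit": rng.normal(3.5, 0.6, n),                                    # hypothetical self-report scores
    "ability": np.where(rng.random(n) < 0.1, np.nan, rng.normal(8, 2, n)),
})
df["grad"] = rng.binomial(1, 0.5, n)                                    # 1 = degree earned within 6 years

# Decompose the self-report score into a between-school mean and a within-school deviation
school_mean = df.groupby("school")["grit"].transform("mean")
df["grit_between"] = school_mean
df["grit_within"] = df["grit"] - school_mean

# Missing-dummy coding for the cognitive-ability covariate
df["ability_missing"] = df["ability"].isna().astype(int)
df["ability_filled"] = df["ability"].fillna(df["ability"].mean())

fit = smf.logit(
    "grad ~ grit_within + grit_between + ability_filled + ability_missing",
    data=df,
).fit(disp=False)
print(fit.summary())
```

In this setup, the coefficient on the within-school deviation and the coefficient on the school mean correspond, roughly, to the within- and between-school relationships discussed in the text.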

Evidence of reference bias in longitudinal predictions of college graduation from self-reported, but not objectively measured, self-regulation

As shown in Fig.   4 , among seniors in the same high school, higher scores on self-report questionnaires of self-control ( b = 0.16, OR = 1.17, p = 0.022) and grit ( b = 0.16, OR = 1.18, p = 0.020) each predicted greater odds of earning a college diploma 6 years later. However, college graduation rates were actually lower for schools with higher self-reported self-control and grit scores ( \(b = -0.44\) , OR = 0.64, p = 0.001; \(b = -0.39\) , OR = 0.68, p = 0.005, for self-control and grit, respectively).
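For readers unfamiliar with the reporting convention: the odds ratios above are simply the exponentiated logistic-regression coefficients, \(\mathrm{OR} = e^{b}\). For example, \(e^{0.16} \approx 1.17\) and \(e^{-0.44} \approx 0.64\), which reproduce the within-school and between-school values reported.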

This paradoxical pattern was not evident when self-regulation was assessed objectively using the Academic Diligence Task 59 . Among seniors in the same school, college graduation was predicted by higher scores on the Academic Diligence Task ( b = 0.15, OR = 1.17, p = 0.031). Likewise, when comparing across schools, college graduation rates were higher for schools whose students performed better on the Academic Diligence Task ( b = 0.46, OR = 1.58, p < 0.001).

Taken as a whole, these findings suggest that reference bias reversed the relationship between self-regulation and graduation across schools. See Supporting Information for summaries of the multilevel logistic regression models, robustness checks, and a replication of the own-versus-peer performance models from Studies 1 and 2.

Figure 4. In Study 3, comparing students within schools (colored lines), higher self-regulation predicted higher odds of college graduation, whether measured by self-report questionnaires of grit and self-control or by a behavioral task, the Academic Diligence Task. When comparing schools to each other (solid black lines), however, higher self-reported grit and self-control scores predicted lower graduation rates, whereas the behavioral task positively predicted college graduation. Plots show predicted probabilities of graduation from multilevel logistic regression models. AUCs for the models predicting graduation from the Academic Diligence Task, self-reported grit, and self-reported self-control were 0.694, 0.693, and 0.676, respectively.
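The AUCs reported in the caption summarize how well each model's predicted graduation probabilities discriminate graduates from non-graduates. Continuing the hypothetical regression sketch above, and assuming scikit-learn is available, one illustrative way to compute such a value:

```python
from sklearn.metrics import roc_auc_score

# Predicted graduation probabilities from the illustrative logistic model fitted above
pred = fit.predict(df)
print(round(roc_auc_score(df["grad"], pred), 3))
```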

The three studies in this investigation provide direct evidence for reference bias in self-reported self-regulation. In Study 1, high school seniors rated themselves lower in grit when their schoolmates earned higher GPAs and standardized achievement test scores. In Study 2, we replicated this effect using self-report questionnaires of conscientiousness and showed that it was driven by near-peers rather than by far-peers. Further, we showed that the GPA of near-peers (but not far-peers) correlates positively with self-regulation standards. Finally, in Study 3, we found that using self-report questionnaires of grit and self-control to predict college graduation 6 years later produced paradoxical results: Within a high school, students with higher self-reported self-regulation were more likely to graduate from college 6 years later, but across schools, average levels of self-regulation negatively predicted graduation. In contrast, an objective task measure of self-regulation—which indexed performance directly and did not ask students to judge themselves—positively predicted college graduation both within and across schools.

How big are reference bias effects? Studies 1 and 2 provide estimates in the range of r = 0.06 to 0.25. All else being equal, a student in our samples whose peers’ academic achievement is one standard deviation above the mean is predicted to rate their own self-regulation 10–20% of a standard deviation lower. Assuming that higher standards for self-regulation depress self-report ratings while at the same time, via social norms and modeling, encouraging more self-regulated behavior, these are actually lower-bound estimates. Consistent with this possibility, when we use a behavioral task to assess self-regulation, we observe results consistent with positive peer effects (Study 3), which have also been reported previously in the literature 66 , 67 , 68 . Taken together, our findings suggest that reference bias effects, even across social groups in the same country, can be at least small-to-medium in size by contemporary benchmarks 69 and comparable to the effect sizes for faking on self-regulation questionnaires in workplace settings 70 .

Several limitations of the current investigation suggest promising directions for future research.

First, we must be cautious about drawing strong causal inferences from the non-experimental data in our three field studies. In Study 1, variation in peer quality could have influenced self-reported self-regulation for reasons other than reference bias. Against this possibility, Study 2 provided direct evidence of near-peer influence on self-regulation standards. However, in Study 2 there is the possibility of reverse causality: rather than near-peers determining self-regulation standards, it is possible that self-regulation standards determined patterns of enrollment (e.g., students with higher standards self-selecting into the same difficult classes). In Study 3, we cannot rule out the possibility that some unmeasured confound gave rise to the contradictory within-school versus between-school results on self-report (but not objective task) measures of self-regulation. In sum, it is important to confirm our observational findings by experimentally manipulating peer groups and/or standards of self-regulation.

Second, there are limits to the external validity of our conclusions. In particular, we examined reference bias in adolescence, a developmental period in which sensitivity to peers is at its apogee 71 . The adolescents in our investigation lived in Mexico (Study 1) and the United States (Studies 2 and 3). Further research on children and adults, in a wider sample of countries, and in contexts outside formal schooling is needed to establish boundary conditions and moderators of reference bias. In general, effect sizes for reference bias are expected to be smaller when comparing social groups with more similar standards.

Third, we did not collect nuanced data on social networks (e.g., friendships, acquaintances). Indeed, our operationalization of peer groups was quite crude: students in the same grade attending the same school in Studies 1 and 3, and students in the same grade and school who shared at least one academic class (i.e., near-peers) in Study 2. Given the increasing prevalence of social-network studies and the continued popularity of self-report questionnaires in behavioral science, it should be possible to identify the influence of prominent social referents and close friends on reference bias.

Finally, while we collected information about students’ standards for self-regulation (in Study 2) and an objective measure of self-regulation (in Study 3), we have yet to collect both types of measures in the same sample. Doing so in a future study would enable us to test a mediation model in which peers influence standards for self-regulation which, in turn, diminish self-reported self-regulation relative to performance on a behavioral task of self-regulation. More generally, additional research is needed to establish the mediators, moderators, and boundary conditions of reference bias in the measurement of self-regulation.

Unfortunately, the problem of reference bias is not easily corrected. The most commonly suggested solution is anchoring vignettes 72 . This technique entails asking participants to rate detailed descriptions of hypothetical characters. These ratings are then used to adjust self-report questionnaire scores upward or downward depending on the stringency or leniency with which participants evaluated the hypothetical characters. Anchoring vignettes can increase the reliability and validity of self-reports 73 but do not always work as intended 74 . They also increase the time, effort, and literacy required from survey respondents, which may limit their utility at scale 73 , 75 .
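To illustrate how anchoring-vignette adjustment works in its simplest nonparametric form (in the spirit of the King et al. approach cited above 72 ), the sketch below recodes a self-rating by its position among the respondent's own ratings of hypothetical characters. The function, scale values, and example ratings are illustrative only; real applications must also handle ties and vignette orderings that respondents rate inconsistently.

```python
# Minimal sketch of nonparametric vignette rescaling: the self-rating is recoded
# by where it falls relative to the respondent's own ratings of hypothetical
# characters, ordered from least to most of the trait (illustrative only).
def vignette_rescale(self_rating, vignette_ratings):
    """Return the rank position of the self-rating among the vignette ratings."""
    position = 1
    for v in vignette_ratings:          # vignette_ratings ordered low-trait to high-trait
        if self_rating > v:
            position += 1
    return position                     # 1 = below all vignettes, len + 1 = above all

# Two respondents with the same raw self-rating but different standards
print(vignette_rescale(4, [2, 3, 5]))   # lenient judge of the vignettes -> position 3
print(vignette_rescale(4, [3, 4, 5]))   # stricter judge of the vignettes -> position 2
```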

A related possibility is to use behaviorally anchored 76 or act-frequency rating scales 77 , which ask respondents to rate themselves on more specific, contextualized behaviors than is typical in traditional questionnaires. For example, while students at over-subscribed charter schools do not rate themselves as more self-regulated, they and their parents do report more “minutes of homework completed” in an open-ended question in the same questionnaire 38 . In our view, such questions might mitigate response bias but probably do not eliminate it altogether. Why not? Because all subjective judgments rely, at least to some degree, on implicit standards that can differ (e.g., What level of effort is sufficient to consider yourself to be “doing homework”?).

As shown in Study 3, self-regulation can be assessed with behavioral tasks, which appear immune to reference bias. However, task measures have their own limitations, including a dramatically lower signal-to-noise ratio when compared to questionnaires and, relatedly, surprisingly modest associations with other measures of self-regulation 46 , 78 , 79 , 80 , 81 .

Perhaps the best means of obviating reference bias is to take a multi-method, multi-informant approach to assessment, including trained observers who can rate behavior across multiple occasions 12 . Observers who have seen hundreds, if not thousands, of cases typically have a wider reference frame than the individuals they are evaluating, which might explain why teacher ratings of behavior are more reliable and predictive of future outcomes than either parental reports or student self-reports 82 . The rarity of multi-method and multi-informant approaches suggests that, unfortunately, few researchers have the resources or expertise needed to implement them, particularly at scale.

What are the implications of reference bias for researchers and policymakers?

Reference bias could suppress, or even reverse, the measured effects of interventions if the standards by which people judge their own behavior on pre- and post-questionnaires shift as a function of the intervention 83 . In one study, participants were asked to rate their interviewing skills before training (pre). Afterward, participants rated themselves again (post) and, in addition, retrospectively estimated what their skills had been at baseline (then). Even though the questionnaire items were identical for all assessments, then ratings were lower than pre ratings, suggesting that participants adopted higher standards as a result of the intervention. Moreover, third-party judges’ ratings of performance matched then-post change better than pre-post differences 84 .

The implications of reference bias extend beyond intervention research. Consider, for example, mean-level increases in conscientiousness from adolescence through midlife 85 , 86 , 87 . If adults in their 50s hold higher standards for what it means to be courteous, rule-abiding, and self-controlled than teenagers, then age differences in conscientiousness may be even larger than we now think. In fact, to the extent that implicit standards and actual behavior are inversely correlated, reference bias should be expected to attenuate associations of self-regulation with groups of any kind.

While the importance of personal qualities like self-regulation is incontrovertible, the specter of reference bias argues against relying on self-report questionnaires when comparing students attending different schools, citizens living in different countries, or indeed members of any social groups whose standards may differ from one another. Are you a hard worker? Responding to such a question requires looking within to identify the patterns of our behavior. In addition, the evidence for reference bias presented here suggests that, knowingly or not, we also look around when we decide how to respond.

Ethics statement

All methods were carried out in accordance with relevant guidelines and regulations. Participants in Studies 2 and 3 completed written informed consent prior to participation in this study. Participants in Study 1 were completing country-mandated educational assessments, and thus did not complete written informed consent. We accessed this secondary dataset with authorization from the Mexican Secretariat of Education. Study 1 was approved by the Mexican Secretariat of Education. Study 2 was approved by Advarra IRB. Study 3 was approved by Stanford University IRB.

Data availability

The data that support the findings of Study 1 are available from the Mexican Ministry of Education but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of the Mexican Ministry of Education. Data for Study 2 and Study 3 are included in this published article’s  Supporting Information .

Rothbart, M. K. Temperament, development, and personality. Curr. Dir. Psychol. Sci. 16 , 207–212. https://doi.org/10.1111/j.1467-8721.2007.00505.x (2007).


Mischel, W., Shoda, Y. & Rodriguez, M. L. Delay of gratification in children. Science 244 , 933–938. https://doi.org/10.1126/science.2658056 (1989).


Freud, S. Beyond the Pleasure Principle 90 (The International Psycho-Analytical Press, 1922).


Roberts, B. W. & Yoon, H. J. Personality psychology. Annu. Rev. Psychol. 73 , 489–516 (2022).


Nigg, J. T. Annual research review: On the relations among self-regulation, self-control, executive functioning, effortful control, cognitive control, impulsivity, risk-taking, and inhibition for developmental psychopathology. J. Child Psychol. Psychiatry 58 , 361–383. https://doi.org/10.1111/jcpp.12675 (2017).

Berns, G. S., Laibson, D. & Loewenstein, G. Intertemporal choice—toward an integrative framework. Trends Cogn. Sci. 11 , 482–488. https://doi.org/10.1016/j.tics.2007.08.011 (2007).

Heckman, J. J. & Kautz, T. Hard evidence on soft skills. Labour Econ. 19 , 451–464. https://doi.org/10.1016/j.labeco.2012.05.014 (2012).


Duckworth, A. L., Taxer, J. L., Eskreis-Winkler, L., Galla, B. M. & Gross, J. J. Self-control and academic achievement. Annu. Rev. Psychol. 70 , 373–399. https://doi.org/10.1146/annurev-psych-010418-103230 (2019).

Bierman, K. L., Nix, R. L., Greenberg, M. T., Blair, C. & Domitrovich, C. E. Executive functions and school readiness intervention: Impact, moderation, and mediation in the Head Start REDI program. Dev. Psychopathol. 20 , 821–843. https://doi.org/10.1017/S0954579408000394 (2008).

Vedel, A. The Big Five and tertiary academic performance: A systematic review and meta-analysis. Personal. Individ. Differ. 71 , 66–76. https://doi.org/10.1016/j.paid.2014.07.011 (2014).

Daly, M., Egan, M., Quigley, J., Delaney, L. & Baumeister, R. F. Childhood self-control predicts smoking throughout life: Evidence from 21,000 cohort study participants. Health Psychol. 35 , 1254–1263. https://doi.org/10.1037/hea0000393 (2016).

Moffitt, T. E. et al. A gradient of childhood self-control predicts health, wealth, and public safety. Proc. Natl. Acad. Sci. 108 , 2693–2698. https://doi.org/10.1073/pnas.1010076108 (2011).


Bogg, T. & Roberts, B. W. Conscientiousness and health-related behaviors: A meta-analysis of the leading behavioral contributors to mortality. Psychol. Bull. 130 , 887–919. https://doi.org/10.1037/0033-2909.130.6.887 (2004).

Hofmann, W., Luhmann, M., Fisher, R. R., Vohs, K. D. & Baumeister, R. F. Yes, but are they happy? Effects of trait self-control on affective well-being and life satisfaction: Trait self-control and well-being. J. Pers. 82 , 265–277. https://doi.org/10.1111/jopy.12050 (2014).

Hirschi, T. Self-control and crime. In Handbook of Self-Regulation , 537–552.

Barrick, M. R. & Mount, M. K. The big five personality dimensions and job performance: A meta-analysis. Pers. Psychol. 44 , 1–26. https://doi.org/10.1111/j.1744-6570.1991.tb00688.x (1991).

Duckworth, A. L., Weir, D., Tsukayama, E. & Kwok, D. Who does well in life? Conscientious adults excel in both objective and subjective success. Front. Psychol. https://doi.org/10.3389/fpsyg.2012.00356 (2012).

Wiersma, U. J. & Kappe, R. Selecting for extroversion but rewarding for conscientiousness. Eur. J. Work Organ. Psychol. 26 , 314–323. https://doi.org/10.1080/1359432X.2016.1266340 (2017).

Denissen, J. J. A. et al. Uncovering the power of personality to shape income. Psychol. Sci. 29 , 3–13 (2018).

Doebel, S. Rethinking executive function and its development. Perspect. Psychol. Sci. 15 , 942–956. https://doi.org/10.1177/1745691620904771 (2020).

Casey, B. J. Beyond simple models of self-control to circuit-based accounts of adolescent behavior. Annu. Rev. Psychol. 66 , 295–319. https://doi.org/10.1146/annurev-psych-010814-015156 (2015).


Dahl, R. E., Allen, N. B., Wilbrecht, L. & Suleiman, A. B. Importance of investing in adolescence from a developmental science perspective. Nature 554 , 441–450. https://doi.org/10.1038/nature25770 (2018).

Steinberg, L. Cognitive and affective development in adolescence. Trends Cogn. Sci. 9 , 69–74. https://doi.org/10.1016/j.tics.2004.12.005 (2005).

Bailey, R., Meland, E. A., Brion-Meisels, G. & Jones, S. M. Getting developmental science back into schools: Can what we know about self-regulation help change how we think about “No Excuses’’?. Front. Psychol. 10 , 1885. https://doi.org/10.3389/fpsyg.2019.01885 (2019).

Hamilton, S. F. Chapter 6: The secondary school in the ecology of adolescent development. Rev. Res. Educ. 11 , 227–258. https://doi.org/10.3102/0091732X011001227 (1984).

Leonard, J. A., Lee, Y. & Schulz, L. E. Infants make more attempts to achieve a goal when they see adults persist. Science 357 , 1290–1294. https://doi.org/10.1126/science.aan2317 (2017).

Bandura, A. & Mischel, W. Modification of Self-Imposed delay of reward through exposure to live and symbolic models. J. Pers. Soc. Psychol. 2 , 698–705 (1965).

King, K. M., McLaughlin, K. A., Silk, J. & Monahan, K. C. Peer effects on self-regulation in adolescence depend on the nature and quality of the peer interaction. Dev. Psychopathol. 30 , 1389–1401. https://doi.org/10.1017/S0954579417001560 (2018).

Doebel, S. & Munakata, Y. Group influences on engaging self-control: Children delay gratification and value it more when their in-group delays and their out-group doesn’t. Psychol. Sci. 29 , 738–748. https://doi.org/10.1177/0956797617747367 (2018).

Bertling, J. P., Marksteiner, T. & Kyllonen, P. C. General noncognitive outcomes. In Assessing Contexts of Learning (eds Kuger, S. et al. ) 255–281 (Springer, 2016). https://doi.org/10.1007/978-3-319-45357-6_10 .


U.S. Department of Education. Every Student Succeeds Act (ESSA) (2015).

Center for Disease Control and Prevention. Whole School, Whole Community, Whole Child (WSCC) (2021).

OECD. Beyond Academic Learning: First Results from the Survey of Social and Emotional Skills (OECD, 2021).

Jackson, C. K., Porter, S. C., Easton, J. Q., Blanchard, A. & Kiguel, S. School effects on socioemotional development, school-based arrests, and educational attainment. Am. Econ. Rev. Insights 2 , 491–508 (2020).

West, M. R. et al. Promise and paradox: Measuring students’ non-cognitive skills and the impact of schooling. Educ. Eval. Policy Anal. 38 , 148–170. https://doi.org/10.3102/0162373715597298 (2016).

Dobbie, W. & Fryer, R. G. The medium-term impacts of high-achieving charter schools. J. Polit. Econ. 123 , 985–1037. https://doi.org/10.1086/682718 (2015).

Tuttle, C. C. et al. Understanding the Effect of KIPP as it Scales, Volume I, Impacts on Achievement and Other Outcomes. Tech. Rep, Mathematica Policy Research (2015).

Tuttle, C. C. et al. KIPP Middle Schools: Impacts on Achievement and Other Outcomes. Tech. Rep, Mathematica Policy Research (2013).

Angrist, J. D., Pathak, P. A. & Walters, C. R. Explaining charter school effectiveness. Am. Econ. J. Appl. Econ. 5 , 1–27. https://doi.org/10.1257/app.5.4.1 (2013).

Dobbie, W. & Fryer, R. G. Getting beneath the veil of effective schools: Evidence from New York City. Am. Econ. J. Appl. Econ. 5 , 28–60. https://doi.org/10.1257/app.5.4.28 (2013).

Heine, S. J., Lehman, D. R., Peng, K. & Greenholtz, J. What’s wrong with cross-cultural comparisons of subjective Likert scales?: The reference-group effect. J. Pers. Soc. Psychol. 82 , 903–918. https://doi.org/10.1037/0022-3514.82.6.903 (2002).

Van de Gaer, E., Grisay, A., Schulz, W. & Gebhardt, E. The reference group effect. Cult. Psychol. 43 , 24 (2012).


Van Vaerenbergh, Y. & Thomas, T. D. Response styles in survey research: A literature review of antecedents, consequences, and remedies. Int. J. Public Opin. Res. 25 , 195–217. https://doi.org/10.1093/ijpor/eds021 (2013).

Schwarz, N. & Oyserman, D. Asking questions about behavior: Cognition, communication, and questionnaire construction. Am. J. Eval. 22 , 127–160 (2001).

Tourangeau, R., Rips, L. J. & Rasinski, K. The Psychology of Survey Response (Cambridge University Press, 2000).

Duckworth, A. L. & Yeager, D. S. Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educ. Res. 44 , 237–251. https://doi.org/10.3102/0013189X15584327 (2015).

Morina, N. Comparisons Inform Me Who I Am: A general comparative-processing model of self-perception. Perspect. Psychol. Sci. 16 , 1281–1299 (2021).

Marsh, H. W. & Craven, R. G. Reciprocal effects of self-concept and performance from a multidimensional perspective: Beyond seductive pleasure and unidimensional perspectives. Perspect. Psychol. Sci. 1 , 133–163. https://doi.org/10.1111/j.1745-6916.2006.00010.x (2006).

Marsh, H. W. et al. The Big-fish-little-pond-effect stands up to critical scrutiny: Implications for theory, methodology, and future research. Educ. Psychol. Rev. 20 , 319–350. https://doi.org/10.1007/s10648-008-9075-6 (2008).

Marsh, H. W. The Big-Fish-Little-Pond effect on academic self-concept. J. Educ. Psychol. 79 , 280–295 (1987).

Gerber, J. P., Wheeler, L. & Suls, J. A social comparison theory meta-analysis 60+ years on. Psychol. Bull. 144 , 177–197. https://doi.org/10.1037/bul0000127 (2018).

Bybee, R. & McCrae, B. Scientific literacy and student attitudes: Perspectives from PISA 2006 science. Int. J. Sci. Educ. 33 , 7–26. https://doi.org/10.1080/09500693.2010.518644 (2011).

Schmitt, D. P., Allik, J., McCrae, R. R. & Benet-Martínez, V. The geographic distribution of Big Five personality traits: Patterns and profiles of human self-description across 56 nations. J. Cross-Cultural Psychol. 38 , 173–212. https://doi.org/10.1177/0022022106297299 (2007).

Duckworth, A. L., Peterson, C., Matthews, M. D. & Kelly, D. R. Grit: Perseverance and passion for long-term goals. J. Pers. Soc. Psychol. 92 , 1087–1101. https://doi.org/10.1037/0022-3514.92.6.1087 (2007).

Duckworth, A. L. & Quinn, P. D. Development and validation of the Short Grit scale (Grit-S). J. Pers. Assess. 91 , 166–174. https://doi.org/10.1080/00223890802634290 (2009).

Kuncel, N. R., Credé, M. & Thomas, L. L. The validity of self-reported grade point averages, class ranks, and test scores: A meta-analysis and review of the literature. Rev. Educ. Res. 75 , 63–82. https://doi.org/10.3102/00346543075001063 (2005).

American Psychological Association. APA Dictionary of Psychology 1st edn. (American Psychological Association, 2007).

Soto, C. J. & John, O. P. Short and extra-short forms of the Big Five Inventory-2: The BFI-2-S and BFI-2-XS. J. Res. Pers. 68 , 69–81. https://doi.org/10.1016/j.jrp.2017.02.004 (2017).

Galla, B. M. et al. The Academic Diligence Task (ADT): Assessing individual differences in effort on tedious but important schoolwork. Contemp. Educ. Psychol. 39 , 314–325. https://doi.org/10.1016/j.cedpsych.2014.08.001 (2014).

Zamarro, G., Nichols, M., Duckworth, A. & D’Mello, S. Further Validation of Survey Effort Measures of Relevant Character Skills: Results from a Sample of High School Students. EDRE Working Paper 2018-07. https://doi.org/10.2139/ssrn.3265332 (2018).

Galla, B. M. et al. Why high school grades are better predictors of on-time college graduation than are admissions test scores: The roles of self-regulation and cognitive ability. Am. Educ. Res. J. 56 , 2077–2115. https://doi.org/10.3102/0002831219843292 (2019).

Tsukayama, E., Duckworth, A. L. & Kim, B. Domain-specific impulsivity in school-age children. Dev. Sci. 16 , 879–893. https://doi.org/10.1111/desc.12067 (2013).

Raven, J. & Raven, J. Raven progressive matrices. In Handbook of Nonverbal Assessment (ed. McCallum, R. S.) 223–237 (Kluwer Academic, 2003).

Dynarski, S. M., Hemelt, S. W. & Hyman, J. M. The missing manual: Using National Student Clearinghouse data to track postsecondary outcomes. Educ. Eval. Policy Anal. 37 , 53S-79S (2015).

Schoenecker, C. & Reeves, R. The National Student Clearinghouse: The largest current student tracking database. New Directions Community Coll. 143 , 47–57. https://doi.org/10.1002/cc.335 (2008).

Cialdini, R. B. Descriptive social norms as underappreciated sources of social control. Psychometrika 72 , 263–268. https://doi.org/10.1007/s11336-006-1560-6 (2007).


Bandura, A. Social Learning Theory (General Learning Press, 1971).

Sacerdote, B. Peer effects in education: How might they work, how big are they and how much do we know thus far?. Handb. Econ. Educ. 3 , 249–277. https://doi.org/10.1016/B978-0-444-53429-3.00004-1 (2011).

Funder, D. C. & Ozer, D. J. Evaluating effect size in psychological research: Sense and nonsense. Adv. Methods Pract. Psychol. Sci. 2 , 156–168. https://doi.org/10.1177/2515245919847202 (2019).

Martínez, A. & Salgado, J. F. A meta-analysis of the faking resistance of forced-choice personality inventories. Front. Psychol. 12 , 732241. https://doi.org/10.3389/fpsyg.2021.732241 (2021).

Steinberg, L. & Morris, A. S. Adolescent development. Annu. Rev. Psychol. 52 , 83–110 (2001).

King, G., Murray, C. J. L., Salomon, J. A. & Tandon, A. Enhancing the validity and cross-cultural comparability of measurement in survey research. Am. Polit. Sci. Rev. 98 , 191–207. https://doi.org/10.1017/S000305540400108X (2004).

Primi, R., Zanon, C., Santos, D., De Fruyt, F. & John, O. P. Anchoring vignettes: Can they make adolescent self-reports of social-emotional skills more reliable, discriminant, and criterion-valid?. Eur. J. Psychol. Assess. 32 , 39–51. https://doi.org/10.1027/1015-5759/a000336 (2016).

Grol-Prokopczyk, H., Verdes-Tennant, E., McEniry, M. & Ispány, M. Promises and pitfalls of anchoring vignettes in health survey research. Demography 52 , 1703–1728. https://doi.org/10.1007/s13524-015-0422-1 (2015).

Bertling, J. P., Borgonovi, F. & Almonte, D. E. Psychosocial skills in large-scale assessments: Trends, challenges, and policy implications. In Psychosocial Skills and School Systems in the 21st Century: Theory, Research, and Practice. The Springer Series on Human Exceptionality (eds Lipnevich, A. A. et al. ) (Springer, 2016). https://doi.org/10.1007/978-3-319-28606-8 .

Schwab, D. P., Heneman, H. G. & DeCotiis, T. A. Behaviorally anchored rating scales: A review of the literature. Pers. Psychol. 28 , 549–562. https://doi.org/10.1111/j.1744-6570.1975.tb01392.x (1975).

Buss, D. M. & Craik, K. H. The act frequency approach to personality. Psychol. Rev. 90 , 105–126 (1983).

Enkavi, A. Z. et al. Large-scale analysis of test-retest reliabilities of self-regulation measures. Proc. Natl. Acad. Sci. 116 , 5472–5477. https://doi.org/10.1073/pnas.1818430116 (2019).


Duckworth, A. L. & Kern, M. L. A meta-analysis of the convergent validity of self-control measures. J. Res. Pers. 45 , 259–268. https://doi.org/10.1016/j.jrp.2011.02.004 (2011).

Sharma, L., Kohl, K., Morgan, T. A. & Clark, L. A. “Impulsivity’’: Relations between self-report and behavior. J. Pers. Soc. Psychol. 104 , 559–575. https://doi.org/10.1037/a0031181 (2013).

Friedman, N. P. & Gustavson, D. E. Do rating and task measures of control abilities assess the same thing?. Curr. Dir. Psychol. Sci. 31 , 262–271. https://doi.org/10.1177/09637214221091824 (2022).

Feng, S., Han, Y., Heckman, J. J. & Kautz, T. Comparing the reliability and predictive power of child, teacher, and guardian reports of noncognitive skills. Proc. Natl. Acad. Sci. 119 , e2113992119. https://doi.org/10.1073/pnas.2113992119 (2022).


Howard, G. S. Response-shift bias: A problem in evaluating interventions with pre/post self-reports. Eval. Rev. 4 , 93–106 (1980).

Howard, G. S. & Dailey, P. R. Response-Shift Bias: A source of contamination of self-report measures. J. Appl. Psychol. 64 , 144–150 (1979).

Roberts, B. W., Walton, K. E. & Viechtbauer, W. Patterns of mean-level change in personality traits across the life course: A meta-analysis of longitudinal studies. Psychol. Bull. 132 , 1–25. https://doi.org/10.1037/0033-2909.132.1.1 (2006).

Damian, R. I. & Spengler, M. Sixteen going on sixty-six: A longitudinal study of personality stability and change across 50 years. J. Pers. Soc. Psychol. 117 , 674–695 (2018).

Roberts, B. W. & DelVecchio, W. F. The rank-order consistency of personality traits from childhood to old age: A quantitative review of longitudinal studies. Psychol. Bull. 126 , 3–25. https://doi.org/10.1037/0033-2909.126.1.3 (2000).


Acknowledgements

This research received support from the Bill & Melinda Gates Foundation, the Raikes Foundation, the William T. Grant Foundation, and a fellowship from the Center for Advanced Study in the Behavioral Sciences (CASBS) to the sixth author and grants from the John Templeton Foundation, the Walton Family Foundation, and National Science Foundation to the last author. This research was supported by the National Institute of Child Health and Human Development (Grant No. R01HD084772-01). The authors wish to thank Donald Kamentz, Laura Keane, and the schools and students who participated in the research.

Author information

These authors contributed equally: Benjamin Lira and Joseph M. O’Brien.

Authors and Affiliations

University of Pennsylvania, Philadelphia, USA

Benjamin Lira & Angela L. Duckworth

University of Texas at Austin, Austin, USA

Joseph M. O’Brien & David S. Yeager

University of Chicago, Chicago, USA

Pablo A. Peña

University of Pittsburgh, Pittsburgh, USA

Brian M. Galla

University of Colorado-Boulder, Boulder, USA

Sidney D’Mello

Mathematica, Inc., Princeton, USA

Amy Defnet, Tim Kautz & Kate Munkacsy


Contributions

A.L.D., D.S.Y., J.M.O., and P.A.P. conceptualized the study and developed the methodology; B.M.G. developed methodology; A.L.D., D.S.Y., and J.M.O. collected data; B.L., J.M.O., P.A.P., A.D., T.K., K.M., A.L.D., and D.S.Y. analyzed and interpreted data; B.L., A.L.D., D.S.Y., J.M.O., P.A.P., A.D., T.K., and K.M. wrote the paper; all authors revised and approved the final draft .

Corresponding author

Correspondence to Benjamin Lira .

Ethics declarations

Competing interests.

The authors declare no competing interests.




About this article

Cite this article.

Lira, B., O’Brien, J.M., Peña, P.A. et al. Large studies reveal how reference bias limits policy applications of self-report measures. Sci Rep 12 , 19189 (2022). https://doi.org/10.1038/s41598-022-23373-9


Received : 17 May 2022

Accepted : 31 October 2022

Published : 10 November 2022

DOI : https://doi.org/10.1038/s41598-022-23373-9



The Science of Self-Report

BETHESDA, MARYLAND - The accuracy and reliability of reports of one’s own behavior and physical state are at the root of effective medical practice and valid research in health and psychology. To address this crucial element of research, the National Institutes of Health (NIH) held an informative conference here in November, “The Science of Self-report: Implications for Research & Practice,” at which more than 500 researchers and policymakers learned about many of the critical limits of “self-report” as a research tool as well as some of the latest techniques to enhance its effectiveness.

Sponsored by the Office of Behavioral and Social Science Research (OBSSR), the symposium drew participants from virtually every area of health and medicine policy, practice, and research. The issue of self-report as a primary tool in research and as an unavoidable component in health care is of central concern to medical and social science researchers and medical and psychology practitioners, and many other scientists.

Drawing on the expertise of disciplines ranging from anthropology to sociology, the conference’s 32 speakers and introducers featured 10 APS members, including the following NIH staff: Wendy Baldwin (NIH Office of Extramural Research deputy director), Norman Anderson (OBSSR director), Virginia Cain (OBSSR), Howard Kurtzman (National Institute of Mental Health), and Jaylan Turkkan (Program Co-Chair).

Value, Limits, and Improvements

“The issue we have to consider regarding self-report data is not that it should be replaced by external measurements but that we will always need self-report about many behaviors that are simply going to be unobservable by anyone else. We’re going to need it because the interpretation of events may be important, and only the individual can provide those interpretations,” said Baldwin in the opening session initiating the two-day conference. But assessing patient compliance with medical regimens and eliciting medical histories are just two of the particularly important areas in which self-report data are routinely, and perhaps blindly, accepted as reliable in many current medical contexts.

“Consequently, the effort should be placed on improving the self-report measures, as opposed to just looking for weaknesses or how they can be replaced by external measures,” Baldwin emphasized in her comments that set the tone for the exceptionally practical conference. In fact, all speakers at the conference emphasized the invaluable nature of self-report measurements and called for a continual effort to improve their utility. “Where we have other validation, that’s great! But we have a very important job ahead of us to make sure that we can learn why self-report either works well or doesn’t, and when it works well, and when it doesn’t,” said Baldwin. Observational and experimental studies have shown that there are barriers to accuracy at every stage of the autobiographical report process: perception of the state of the self, encoding and storage of memory, understanding the question being asked, recalling the facts, and judging how and what to answer. One intention of the conference was to systematically review the documented problems across several research and medical contexts.

Reporting Symptoms and Physiology

Psychologist James Pennebaker, Southern Methodist University, presented data from studies on the ability to perceive one’s own physical symptoms and other aspects of physiology such as heart rate.

“People are generally not good at this,” he finds, “but there are interesting sex differences.” In laboratory settings, men are better at perceiving their inner physiological states than are women, but the difference is largely erased when the studies are conducted in a more natural environment. This is because men and women emphasize different sources of information when asked to define their internal states: men rely more directly on internal bodily cues, while women rely more on situational cues. There is, of course, a lack of normal situational cues in the laboratory setting. One practical application of the skill of defining one’s internal state is that diabetics must be trained to monitor their own blood glucose levels, since resorting to chemical testing for glucose each time is often impractical.

Reporting Pain

Pain is not a simple sensory event and is not proportional to tissue damage, reported APS Member Francis Keefe of Duke University Medical Center’s Pain Management Program. In his discussion of the perception of pain, Keefe explained that pain is influenced by psychological, social, and cultural factors, all of which act via a gating mechanism in the spinal cord. Moreover, the intensity of pain is separate from the degree of unpleasant affect associated with it, and this difference is reflected in pharmacology: while the drug fentanyl reduces the intensity of pain, diazepam reduces its unpleasantness.

Affect, in turn, modifies pain tolerance: a negative mood decreases tolerance for experimental pain in the cold pressor test, and affect at the time of pain influences the later recall of the pain’s intensity. Keefe says some pain specialists have advocated training patients with chronic pain (e.g., cancer patients) to be more emphatic and expressive in describing their pain to their doctors, in order to help ensure that adequate pain relief is prescribed. However, he says, many pain control techniques are effective because they influence affect and mood more than they influence the intensity of the pain per se.

Reporting Data Through High-Tech “Diaries”

In his presentation on high-tech techniques to obtain self-report data, APS Member Saul Shiffman of the University of Pittsburgh’s Department of Psychology indicated that written daily or weekly diaries have not proven themselves very good for accurately recording simple objective events like smoking. In fact, people often fail even to accurately enter many simple events into their memory, let alone document them on paper. To avoid the problem, he described the technique of Ecological Momentary Assessment (EMA). EMA requires the subject/patient to carry a custom-designed palm-top computer, which prompts the respondent throughout the day to answer a question (e.g., “Are you smoking right now?”). The question is posed according to the desired sampling scheme, which can be purely random over time or contingent upon various other behaviors (like drinking coffee). By avoiding recall completely, this method can provide a very revealing picture of the subject’s pattern of behavior. It also generates great quantities of data, but the analysis of those data poses unique and controversial statistical problems, because they do not fit the standard definitions of repeated measures.
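As an illustration of the purely random time-sampling that EMA protocols can use, the sketch below generates one day's prompt schedule. The prompt count, waking-hour window, and question wording are hypothetical and are not drawn from Shiffman's actual protocol.

```python
import random
from datetime import date, datetime, timedelta

def daily_prompts(day, k=6, start_hour=9, end_hour=21):
    """Return k prompt times drawn uniformly at random from waking hours on `day`."""
    window_start = datetime.combine(day, datetime.min.time()) + timedelta(hours=start_hour)
    window_seconds = (end_hour - start_hour) * 3600
    offsets = sorted(random.uniform(0, window_seconds) for _ in range(k))
    return [window_start + timedelta(seconds=s) for s in offsets]

for t in daily_prompts(date.today()):
    print(t.strftime("%H:%M"), "->", "Are you smoking right now?")
```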

Reporting Temporal Frequencies of Behavior From Memory

Several presenters stressed the problems posed by the mechanisms of memory encoding and recall. Norman Bradburn of the National Opinion Research Center and the University of Chicago was the first of many speakers to note that remembering is very definitely a reconstructive task. It typically suffers from several distortions, including the bundling of events and the tendency to “telescope” events, that is, to remember them as more recent than they actually were.

Rounding errors are frequent when self-reported time intervals approach conventional discrete units of time (e.g., an hour, a week, a month, a year). Events six or eight days ago tend to be remembered as “one week” ago, and whatever the unit of time appropriate to the interval, errors are made in whole-unit chunks rather than in parts of units. “We are more likely to think in terms of three weeks than 20 days,” said Bradburn. “Many people do not enumerate events, even when we might expect the question to lead them to do so. Rather, they estimate the number of events on the basis of some rule.”

Sex Differences in Reporting Temporal Facts

And, just as many have thought, women do remember dates better than men. To help the respondent reconstruct the past, the interviewer or questionnaire should ask questions that are structured according to the way in which the events are likely to be encoded. Memories are rarely linked to calendar dates but rather to notable life events (e.g., graduation from college). Roger Tourangeau of the National Opinion Research Center further analyzed the distinction between questions designed to encourage estimation and questions designed to encourage recall of individual events.

Decompositional Approach

Geeta Menon of New York University’s Department of Marketing has analyzed the role of the decompositional question in eliciting recall of regular versus sporadic behaviors. Should we simply ask the open-ended question “How many times did you do X last week?” Or should we ask the same thing using a decompositional approach? For example, “How many times did you do X while driving? While sitting at home? While working? …”

Menon’s research indicates that the open-ended question (“How many times did you do X in the last month?”) tends to encourage the subject to answer by referring to a “rule,” or an estimate of frequency. For regularly occurring behaviors this elicits accurate answers with a minimum of mental effort. For behaviors that are more sporadic, on the other hand, it is better to ask decompositional questions (i.e., to help the respondent by breaking the problem up into chunks). For irregular behaviors, a rule is less useful, and it is desirable to encourage the subject to recall each instance, using an enumeration strategy.

False and Forgotten Memories

Demonstrations that there can be both false negatives and false positives in memories of events that occurred long ago (or did not occur at all) have particular relevance to the problem of sexual abuse of children. Speaking on the subject of false positives in memory, APS Fellow Elizabeth Loftus of the Psychology Department at the University of Washington presented findings, demonstrated in many experiments, that it is possible to create false memories. Such “memories” can be induced either by (1) simply having the subject imagine a scenario vividly and then later asking them to recount “memories” of similar events, or (2) frankly telling a subject that a specific event happened and then reinforcing the associated “memory” by attempting to convince the subject of the authenticity of the event (e.g., by coaxing the subject with the question “Can’t you try to remember the time you got lost at the shopping mall?”).

People can import true memories from other events, thereby lending their false event memories seeming credibility; they can forget the source of a memory, wrongly attributing the memory of a fantasy to a real event; and they can make up completely unfounded facts as well. The confidence one feels in the validity of one’s recall also has little correlation with its accuracy.

Linda Williams of the University of New Hampshire’s Family Research Laboratory has documented the other side of this issue, the false negative for a documented event. In these studies, children who were seen at hospitals for instances of sexual abuse were asked, many years later, to recall any such events. A substantial minority of the children, including those who had findings on physical exam that confirmed the abuse, failed to recall the instances. Interestingly, the forgetting was not correlated with the use of force or coercion by their abuser. The children were, however, more likely to forget abuse at the hands of individuals closest to them (i.e., in terms of familial relation, familiarity, or friendship).

Prolong the Pain

Psychologist Daniel Kahneman of the Woodrow Wilson School at Princeton University studies the memory of pain, as in painful medical procedures. Do we remember the quantity of pain as something like its intensity multiplied by its duration? Not at all. We remember an average of the moment of peak intensity and the pain at the end of the procedure. This has applications to colonoscopy, which is distinctly unpleasant, and for which one would like the subject to return for a repeat test every ten years. Strangely, Kahneman suggested, his research findings may mean that in order to make the long-term memory of the pain less severe, one should extend the time of the procedure by keeping the colonoscope inserted but not moving it. The pain is less for those last few minutes, even though several minutes of diminished pain have been added to the end of a painful experience.
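Kahneman's finding is often summarized as the "peak-end rule." A minimal sketch, assuming moment-by-moment pain ratings on an arbitrary scale, shows why lengthening a procedure with milder pain can reduce the pain that is remembered; the ratings below are illustrative only.

```python
# Sketch of the peak-end rule described above: remembered pain tracks the mean
# of the most intense moment and the final moment, not intensity x duration.
def remembered_pain(ratings):
    """ratings: moment-by-moment pain ratings; returns the peak-end average."""
    return (max(ratings) + ratings[-1]) / 2

short_procedure = [2, 6, 8]           # ends at its most painful moment
extended_procedure = [2, 6, 8, 4, 3]  # extra minutes of milder pain added at the end

print(remembered_pain(short_procedure))     # 8.0
print(remembered_pain(extended_procedure))  # 5.5 -> remembered as less painful
```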

Mood and Memory

APS Fellow John Kihlstrom of the Department of Psychology at Yale University took a logical and deductive approach to the problem of the influence of affect on memory. Although some experimenters have failed to find a link, he says, others have. There are some robust paradigms of mood-dependent memory. Because memory is reconstructive, not merely a readout of data, it is a cognitive task. Performance on other cognitive tasks is affected by mood, and so we should expect recall to be influenced by mood. For example, many mental patients report being abused in childhood. Is this a causal association or an example of preferential recall of mood congruent memories? What is needed to untangle this link, he says, are prospective studies.

Sensitive Topics

Nora Cate Shaeffer of the Department of Sociology at the University of Wisconsin-Madison addressed the problem of self-report on sensitive topics, such as sexual behavior or drug abuse. People tend to present themselves in a positive light, sometimes to look good and sometimes to “please” the researcher. The more serious an illegal behavior (e.g., the “harder” the drug), the less likely people are to report their recent use of it, while events in the distant past are less sensitive and consequently less likely to be concealed. Men tend to exaggerate their sexual histories, while women tend to understate them. But in any individual case, one doesn’t know how accurate a source is. Not only do people calculate the risk of revealing sensitive information (e.g., they may ask themselves “Will my spouse find out?” “Will the police find out?”), but they may even reinterpret the question so as to allow themselves to answer evasively. (For example, a respondent may reason as follows: “Well, I did have that abortion, but I’m really not ‘the kind of person’ who would do that normally, so I’ll say ‘never.’” Or, “This interviewer has a hell of a nerve; it’s none of his business, ergo I don’t feel dishonest lying about this.”)

Medical Compliance

Cynthia Rand of the Johns Hopkins University Asthma and Allergy Center discussed the problem of medical noncompliance, which creates a problem for research as well as practice. If everyone in a study takes half as many pills as they say they did, the FDA-approved and officially sanctioned dosage will be twice as high as the dosage that actually produced the observed benefit. (Yes, this suggests that to avoid an overdose of medication, it may be best to be no more compliant than the average participant in the clinical trial that determined the proper dose!) What can be done to increase the honesty of responses? For starters, a physician’s question such as “You’re taking the pills the way I prescribed, aren’t you?” is not likely to uncover any problems with compliance. It is important to discuss the patient’s experience with the regimen in more detail, to reveal possible problems or hidden issues.

Ethics in Self-report

APS Fellow Donald Bersoff of the Villanova University School of Law addressed the knotty problems arising from ethical considerations in asking sensitive questions. If a subject reports self-destructive behavior, should the researcher intervene? Does that violate confidentiality and thereby compromise the autonomy of the subject?

Bersoff implores researchers to at least address these issues before beginning research studies. For example, before undertaking a study on the attitudes of teenagers toward dangerous behaviors, researchers should consider what they will do if they find out that a teenager is contemplating suicide or is using heroin. “Have a plan, have a policy, discuss the pros and cons of breaking confidentiality before the issue comes up,” said Bersoff. “Too many researchers of sensitive topics don’t even think about what they will do until they have the information in hand, and then they must agonize over their choices.”

Ethnic and Cultural Considerations

In many cases, the accuracy of a subject’s response depends on the understanding of the question. Spero Manson of the Department of Psychiatry at the University of Colorado’s School of Medicine has rewritten surveys specifically for Native American populations and, with sensitivity to cross-cultural issues, is able to raise the consistency of the scores very significantly. He cites one particular American Indian culture in which it is considered very important never to give voice to certain negative thoughts; consequently, questions about suicidal ideation are either simply skipped by respondents at very high rates or are not answered frankly.

Efficient Screen for Depression

Ronald Kessler of Harvard Medical School’s Department of Health Care Policy has been developing a short screening test for major depression. A psychiatrist asks questions until he knows the answers he is seeking, but screening tests must be designed for administration by non-specialists with minimal preparation. Kessler’s test, intended for screening large populations under severe budget constraints, is an extreme version of this problem. The screen must not yield many false positives, and it must be understandable by people of widely varying literacy and cultural backgrounds; 75% of the general population should score zero on the test, meaning that it is sensitive only to the serious cases. Interestingly, out of scores of possible questions, he has been able to narrow the survey to six very robust questions! They will be made available on the worldwide web at http://www.umich.edu/~icpe/ or www.umich.edu/~ncsum/.

If You Can’t Beat Them, Join Them

Douglas Massey of the University of Pennsylvania’s Population Studies Center presented a novel approach to securing sensitive or personal data in his presentation titled “When surveys fail,” addressing the fact that many such research efforts simply demand that the researcher abandon the traditionally administered survey or questionnaire. For a detailed study of undocumented workers from Mexico, for example, he has combined ethnography and surveys into an approach called the “ethnosurvey,” in which anthropologists get to know the members of a Mexican town personally and then travel to a town in the United States where many of the workers go to work.

By demonstrating their involvement in the community and their knowledge of its members and the workers’ relatives, Massey and colleagues are able to establish trust over a period of years and to get answers about the laborers’ experiences, documenting answers to non-standardized questions in an extensive data recording sheet. But, of course, even ethnosurveys are plagued by the same problems of faulty recall and encoding that researchers using more standard surveys encounter.

Practical Implications for Symptoms, Illness, & Health

Linking the findings from self-report research directly to medical practice, speaker Arthur Barsky of Harvard Medical School’s Division of Psychiatry at Brigham and Women’s Hospital pointed out that there is a very poor correlation between the patient’s report of the seriousness of his symptoms, the medical findings of the presence of a pathological condition, and the patient’s utilization of health care.

Why, then, given the flawed nature of self-report of symptoms, is history-taking so important in medical practice? Several speakers reaffirmed the dogma that history-taking must come first. The implication would seem to be that the real skill of history-taking is in the ability to get useful information about the patient, despite the fact that his or her self-report is probably riddled with factual errors. As other speakers stated repeatedly during these two days, the respondent is always telling us something important. It just isn’t always the answer to the question we thought we were asking!




The Science of Self-report

Rigorous methodological techniques have been developed in the last decade to improve the reliability and accuracy of self-reports from research volunteers and patients about their pain, mood, substance abuse history, or dietary habits. This book presents cutting-edge research on optimal methods for obtaining self-reported information for use in the evaluation of scientific hypotheses, in therapeutic interventions, and in the development of prognostic indicators. Self-reports constitute critically important data for research and practice in many fields. As the chapters in this volume document, psychological and social processes influence the storage and recall of self-report information. There are conditions under which self-reports should be readily accepted by the clinician or researcher, and other conditions where healthy scepticism is required. The chapters demonstrate methods for improving the accuracy of self-reports, ranging from fine-tuning interviews and questionnaires to employing emerging technologies to collect data in ways that minimize bias and encourage accurate reporting. Representing a diverse group of disciplines including sociology, law, psychology, and medicine, the distinguished authors offer crucial food for thought to all those whose work depends on the accurate self-reports of others.

TABLE OF CONTENTS

Part I: General issues in self-report
  • Chapter 1: Information no one else knows: the value of self-report
  • Chapter 2: Ethical issues in the collection of self-report data

Part II: Cognitive processes in self-report
  • Chapter 3: Remembering what happened: memory errors and survey reports
  • Chapter 4: Temporal representation and event dating
  • Chapter 5: The use of memory and contextual cues in the formation of behavioral frequency judgments
  • Chapter 6: Emotion and memory: implications for self-report

Part III: Self-reporting sensitive events and characteristics
  • Chapter 7: Asking questions about threatening topics: a selective overview
  • Chapter 8: The association between self-reports of abortion and breast cancer risk: fact or artifact

Part IV: Special issues on self-report
  • Chapter 9: When surveys fail: an alternative for data collection
  • Chapter 10: Assessing protocols for child interviews
  • Chapter 11: Do I do what I say? A perspective on self-report methods in drug dependence epidemiology

Part V: Self-report of distant memories
  • Chapter 12: Suggestion, imagination, and the transformation of reality
  • Chapter 13: Validity of women’s self-reports of documented child sexual abuse

Part VI: Self-reporting of health behaviors and psychiatric symptoms
  • Chapter 14: Methodological issues in assessing psychiatric disorders with self-reports
  • Chapter 15: “I took the medicine like you told me, doctor”: self-report of adherence with medical regimens
  • Chapter 16: Real-time self-report of momentary states in the natural environment: computerized ecological momentary assessment

Part VII: Self-reporting of physical symptoms
  • Chapter 17: Psychological factors influencing the reporting of physical symptoms
  • Chapter 18: Self-report of pain: issues and opportunities
  • Chapter 19: The validity of bodily symptoms in medical outpatients


The Limitations of Self-Report Measures of Non-cognitive Skills

Martin R. West, Associate Professor of Education, Harvard University

December 18, 2014

Recent evidence from economics and psychology highlights the importance of traits other than general intelligence for success in school and in life. Disparities in so-called “non-cognitive skills” appear to contribute to the academic achievement gap separating rich from poor students. Non-cognitive skills may also be more malleable and thus amenable to intervention than cognitive ability, particularly beyond infancy and early childhood. Understandably, popular interest in measuring and developing students’ non-cognitive skills has surged.

As practice and policy race forward, however, research on non-cognitive skills remains in its infancy. There is little agreement on which skills are most important, their stability within the same individual in different contexts, and, perhaps most fundamentally, how they can be reliably measured. Whereas achievement tests that assess how well children can read, write, and cipher are widely available, non-cognitive skills are typically assessed using self-report and, less frequently, teacher-report questionnaires. Like achievement tests, questionnaires have the advantage of quick, cheap, and easy administration. And unlike behavioral proxies that might be used to gauge the overall strength of a student’s character, questionnaires can be crafted to capture more specific traits to be targeted for development.

One obvious limitation of questionnaires is that they are subject to faking, and therefore, to social desirability bias. When considering whether an item such as “I am a hard worker” should be marked “very much like me,” a child (or her teacher or parent) may be inclined to choose a higher rating in order to appear more attractive to herself or to others. To the extent that social desirability bias is uniform within a group under study, it will inflate individual responses but not alter their rank order. If some individuals respond more to social pressure than others, however, their placement within the overall distribution of responses could change.

Possibly more troublesome is reference bias, which occurs when survey responses are influenced by differing standards of comparison. A child deciding whether she is a hard worker must conjure up a mental image of hard work to which she can compare her own habits. A child with high standards might consider a hard worker to be someone who does all of her homework well before bedtime and, in addition, organizes and reviews all of her notes from the day’s classes. Another child might consider a hard worker to be someone who brings home her assignments and attempts to complete them, even if most of them remain unfinished the next morning.

To illustrate the potential for reference bias in self-reported measures of non-cognitive skills, I draw on cross-sectional data from a sample of Boston students discussed in detail in a recent working paper . Colleagues from Harvard, MIT, and the University of Pennsylvania and I used self-report surveys to gather information on non-cognitive skills from more than 1,300 eighth-grade students across 32 of the city’s public schools, and linked this information to administrative data on the students’ behavior and test scores. The non-cognitive skills we measured include conscientiousness, self-control, and grit – a term coined by our collaborator Angela Duckworth to capture students’ tendency to sustain interest in, and effort toward, long-term goals.

Importantly, the schools attended by students in our sample include both open-enrollment public schools operated by the local school district and five over-subscribed charter schools that have been shown to have large, positive impacts on student achievement as measured by state math and English language arts tests. These charter schools have a “no excuses” orientation and an explicit focus on cultivating non-cognitive skills as a means to promote academic achievement and post-secondary success.

Our results confirm that the surveys we administered capture differences in non-cognitive skills that are related to important behavioral and academic outcomes. Figures 1a, 1b, and 1c compare the average number of absences, the share of students who were suspended, and the average test-score gains between fourth and eighth grade of students who ranked in the bottom- and top-quartile on each skill. [1] The figures show, for example, that students who rated themselves in the bottom quartile with respect to self-control were absent 2.9 more days than students in the top quartile, and were nearly three times as likely to have been suspended as eighth graders; similar differences in absences and suspension rates are evident for conscientiousness and grit. In addition, the differences in test-score gains between bottom- and top-quartile students on each non-cognitive skill amount to almost a full year’s worth of learning in math over the middle school years.

Figure 1a. Average days absent, by non-cognitive skill quartile

Figure 1b. Percent suspended, by non-cognitive skill quartile

Figure 1c. Math test-score gains between 4th and 8th grade, by non-cognitive skill quartile

Note: * indicates that the difference between bottom- and top-quartile students is statistically significant at the 95 percent confidence level.

Paradoxically, however, the positive relationships between these self-reported measures of non-cognitive skills and growth in academic achievement dissipate when the measures are aggregated to the school level. In other words, schools in which the average student reports higher levels of conscientiousness, self-control, and grit do not exhibit higher test-score gains than do other schools. In fact, students in these schools appear to learn a bit less.

This paradox is most vivid when comparing students who attend “no excuses” charter schools and those who attend open-enrollment district schools. Despite making far larger test-score gains than students attending open-enrollment district schools, and despite the emphasis their schools place on cultivating non-cognitive skills, charter school students exhibit markedly lower average levels of self-control as measured by student self-reports (see Figure 2). This statistically significant difference of -0.23 standard deviations is in the opposite direction of that expected, based on the student-level relationships between self-control and test-score gains displayed above. The average differences between the charter and district students in conscientiousness and grit, although statistically insignificant, run in the same counter-intuitive direction. [2]

Figure 2. Average math test-score gains and “non-cognitive skills,” by school type


Note: * indicates that the difference between district and charter schools is statistically significant at the 95 percent confidence level.

Two competing hypotheses could explain this paradox. One is that the measures are accurate and the charter schools, despite their success in raising test scores, and contrary to their pedagogical goals, weaken students’ non-cognitive skills along crucial dimensions such as conscientiousness, self-control, and grit.

The alternative and, in my view, more plausible hypothesis is that the measures are misleading due to reference bias stemming from differences in school climate between district and charter schools. Figure 3 confirms that the academic and disciplinary climates of the charter schools in our sample, as perceived by their students, do in fact differ from those of the open-enrollment district schools. Charter students rate teacher strictness, the clarity of rules, and the work ethic expected of them substantially higher than do students in district schools. For example, charter students’ ratings of expectations for student behavior exceed those of their district counterparts by 0.57 on the 5-point scale used for these items. Students attending charter schools also report substantially lower levels of negative peer effects and modestly lower levels of student input in their schools. Of course, these data also come from self-report surveys and may themselves be subject to reference bias. Nonetheless, they suggest the academic and disciplinary climates of the charter schools differ in ways that could lead their students to set a higher bar when assessing their conscientiousness, self-control, and grit.

Figure 3. Academic and disciplinary climates as perceived by students, by school type


Other recent studies of “no excuses” charter schools reinforce the plausibility of the reference bias hypothesis. For example, a 2013 Mathematica evaluation of KIPP middle schools finds large positive effects on student test scores and time spent on homework, but no effects on student-reported measures of self-control and persistence in school. Similarly, Will Dobbie and Roland Fryer find that attending the Harlem Promise Academy reduced student-reported grit, despite having positive effects on test scores and college enrollment, and negative effects on teenage pregnancy (for females) and incarceration (for males). This parallel evidence from research in similar settings confirms that reference bias stemming from differences in school climate is the most likely explanation for these paradoxical findings.

If the apparent negative effects of attending a “no excuses” charter school on conscientiousness, self-control, and grit do in fact reflect reference bias, then what our data show is that these schools influence the standards to which students hold themselves when evaluating their own non-cognitive skills. The consequences of this shift in normative standards for their actual behavior both within and outside of school are of course unknown – and merit further research.

As importantly, it appears that existing survey-based measures of non-cognitive skills, although perhaps useful for making comparisons among students within the same educational environment, are inadequate to gauge the effectiveness of schools, teachers, or interventions in cultivating the development of those skills. Evaluations of the effects of teacher, school, and family influences on the development of non-cognitive skills could lead to false conclusions if the assessments used are biased by distinct frames of reference.

In the rush to embrace non-cognitive skills as the missing piece in American education, policymakers may overlook the limitations of extant measures. It is therefore essential that researchers and educators seeking to enhance students’ non-cognitive skills develop alternative measures that are valid across a broad range of school settings. In the meantime, policymakers should resist proposals to incorporate survey-based measures of non-cognitive skills into high-stakes accountability systems. 

[1] To measure math test-score gains, we regressed 8th-grade test scores on a cubic polynomial of 4th-grade scores in both math and English language arts and used the residuals from this regression as a measure of students’ performance relative to expectations based on their achievement before entering middle school.

[2] Estimates of the impact of attending a charter school based on admissions lotteries confirm that these patterns are not due to selection of students with weak non-cognitive skills into charter schools; rather each year’s attendance at a charter has a statistically significant negative impact on self-reported conscientiousness, self-control, and grit.
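For readers who want to see the residualising procedure from footnote [1] in code, a minimal sketch follows. It uses hypothetical column names (math4, ela4, math8) and simulated data rather than the study’s actual files, and statsmodels’ formula interface to fit the cubic-polynomial regression and return its residuals.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def residualized_gains(df: pd.DataFrame) -> pd.Series:
    """Regress 8th-grade math scores on cubic polynomials of 4th-grade math
    and ELA scores; the residuals measure performance relative to expectations
    (the approach described in footnote [1])."""
    model = smf.ols(
        "math8 ~ math4 + I(math4**2) + I(math4**3) + ela4 + I(ela4**2) + I(ela4**3)",
        data=df,
    ).fit()
    return model.resid

# Illustrative use with simulated scores (hypothetical data, not the Boston sample).
rng = np.random.default_rng(0)
df = pd.DataFrame({"math4": rng.normal(size=500), "ela4": rng.normal(size=500)})
df["math8"] = 0.7 * df["math4"] + 0.2 * df["ela4"] + rng.normal(scale=0.5, size=500)
df["gain"] = residualized_gains(df)
print(df["gain"].describe())
```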



Measuring bias in self-reported data

Robert Rosenman

School of Economic Sciences, Washington State University, P.O. Box 646210, Pullman, WA 99164-6210, USA

Vidhura Tennekoon

School of Economic Sciences, Washington State University, P.O. Box 646210, Pullman, WA 99164-6210, USA

Laura G. Hill

Department of Human Development, Washington State University, 523 Johnson Tower, Pullman WA 99164, USA

Response bias shows up in many fields of behavioural and healthcare research where self-reported data are used. We demonstrate how to use stochastic frontier estimation (SFE) to identify response bias and its covariates. In our application to a family intervention, we examine the effects of participant demographics on response bias before and after participation; gender and race/ethnicity are related to magnitude of bias and to changes in bias across time, and bias is lower at post-test than at pre-test. We discuss how SFE may be used to address the problem of ‘response shift bias’ – that is, a shift in metric from before to after an intervention which is caused by the intervention itself and may lead to underestimates of programme effects.

1 Introduction

In this paper, we demonstrate the potential of a common econometric tool, stochastic frontier estimation (SFE), to measure response bias and its covariates in self-reported data. We illustrate the approach using self-reported measures of parenting behaviours before and after a family intervention. We demonstrate that in addition to affecting targeted behaviours, an intervention may also affect any bias associated with self-assessment of those behaviours. We show that SFE can be used to identify and correct for bias in self-assessment both before and after treatment, resulting in more accurate estimates of treatment effects.

Response bias is a widely discussed phenomenon in behavioural and healthcare research where self-reported data are used; it occurs when individuals offer self-assessed measures of some phenomenon. There are many reasons individuals might offer biased estimates of self-assessed behaviour, ranging from a misunderstanding of what a proper measurement is to social-desirability bias, where the respondent wants to ‘look good’ in the survey, even if the survey is anonymous. Response bias itself can be problematic in programme evaluation and research, but is especially troublesome when it causes a recalibration of bias after an intervention. Recalibration of standards can cause a particular type of measurement bias known as ‘response-shift bias’ ( Howard, 1980 ). Response-shift bias occurs when a respondent's frame of reference changes across measurement points, especially if the changed frame of reference is a function of treatment or intervention, thus, confounding the treatment effect with bias recalibration. More specifically, an intervention may change respondents’ understanding or awareness of the target concept and the estimation of their level of functioning with respect to the concept ( Sprangers and Hoogstraten, 1989 ), thus changing the bias at each measurement point. In fact, some treatments or interventions are intended to change how respondents look at the target concept. Further complicating matters is that an intervention may affect not only a respondent's metric for targeted behaviours across time points (resulting in response shift bias) but may also affect other types of response bias. For example, social desirability bias may decrease over the course of an intervention as respondents come to know and trust a service provider. Thus, it is necessary to understand the degree and type of response bias at both pretest and posttest in order to determine whether response shift has occurred.

When there is a potential for confusing bias recalibration with treatment outcomes, statistical approaches may be useful ( Schwartz and Sprangers, 1999 ). In recent years, researchers have applied structural equation modelling (SEM) to the problem of decomposing error in order to identify response shift bias ( Oort, 2005 ; Oort et al., 2005 ). In this paper, we suggest a different statistical approach which reveals response bias at a single time point as well as differences in bias across time points. Perhaps more importantly, it identifies covariates of these differences. When applied before and after an intervention, it reveals differences related to changes in respondents’ frame of reference. Thus, it can be used to decompose errors so that recalibration of the bias occurring across time points can be distinguished from simple response bias within each time point. The suggested approach is based on SFE ( Aigner et al., 1977 ; Battese and Coelli, 1995 ; Meeusen and van den Broeck, 1977 ), a technique widely used in economics and operational research.

Our approach has two significant advantages over that proposed by Oort et al. (2005) . Their approach reveals only aggregate changes in the responses and requires a minimum of two temporal sets of observations on the self-rating of interest as well as multiple measures of the item to be rated. SFE, to its credit, can identify response differences across individuals (as opposed to simply aggregate response shifts) with a single temporal observation and a single measure, so is much less data intensive. Moreover, since it identifies differences at the individual level, it allows the analyst to identify not only that responses differ by individual, but what characteristics are at the root of the differences. Thus, as long as more than one temporal observation is available for respondents, SFE can be used to systematically identify different types of response recalibration by looking at the changes at the individual level, and aggregating them. SFE again has an advantage because the causes of both bias and recalibration can be identified at the individual level.

What may superficially be seen as two disadvantages to SFE when compared to SEM approaches are actually common to both methods. First, both measure response (and therefore response shift) against a common subjective metric established by the norm of the data. In fact, any systematic difference by an individual from this norm is how we measure ‘response bias’. With both SEM and SFE, if an objective metric exists, the difference between the self-rating and the objective measure is easily established. A second apparent disadvantage is that SFE requires a specific assumption of a truncated distribution of the bias (although it is possible to test this assumption statistically). While SEM can reveal response shift on individual bias without such a strong assumption, aggregate changes become manifest only if “many respondents experience the same shift in the same direction” [ Oort, (2005) , p.595]. Hence, operationally the assumptions are nearly equivalent.

In the next section, we explain how we model response bias and response recalibration within the SFE framework. In Section 3, we present our empirical application, including the results of our baseline model and a model with heteroscedastic errors as a robustness check. In Section 4, we discuss the relative merits of the method we propose, together with its limitations, and offer some conclusions.

2 Response bias and SFE

We are concerned with situations where individuals do not have an objective measure of some variable of interest, which we denote $Y^*_{it}$, and we have to use a subjective measure (denoted $Y_{it}$) as a proxy instead. An unbiased estimate of the variable of interest $Y^*_{it}$ can be defined as one for which

$$Y_{it} \mid Y^*_{it}, Z_{it} \;=\; Y_{it} \mid Y^*_{it} \;=\; Y^*_{it}, \qquad (1)$$

where $Y_{it}$ denotes the observed measurement, $Y^*_{it}$ is the true attribute being measured and $Z_{it}$ represents variables other than $Y^*_{it}$. When $Y_{it}$ is self-reported, $Z_{it}$ includes (often unobserved) variables affecting the frame of reference used by respondents for measuring $Y^*_{it}$, and (1) is not assured. Within this context, response bias is simply the case that $Y_{it} \mid Y^*_{it}, Z_{it} \neq Y_{it} \mid Y^*_{it}$. The bias is upward if $Y_{it} \mid Y^*_{it}, Z_{it} > Y_{it} \mid Y^*_{it}$ and downward if the inequality goes the other way.

Our approach for measuring response bias and bias recalibration (the change in response bias between two time periods) is based on the Battese and Coelli (1995) adaptation of the stochastic frontier model (SFE) independently proposed by Aigner et al. (1977) and Meeusen and van den Broeck (1977). Let

$$Y^*_{it} = \beta_0 T + X_{it}\beta_t + \varepsilon_{it}, \qquad (2)$$

where $Y^*_{it}$ is the true (latent) outcome, $T$ denotes some treatment or intervention,¹ $X_{it}$ are variables other than the treatment that explain the outcome, and $\varepsilon_{it}$ is a random error term. For identification, we assume that $\varepsilon_{it}$ is distributed iid $N(0, \sigma_\varepsilon^2)$. The observed self-reported outcome is a combination of the true outcome and the response bias $Y^R_{it}$:

$$Y_{it} = Y^*_{it} - Y^R_{it}. \qquad (3)$$

We consider the specific case in which the bias term $Y^R_{it}$ has a truncated-normal distribution,

$$Y^R_{it} = u_{it}, \qquad u_{it} \geq 0, \qquad (4)$$

where $u_{it}$ is a random variable which accounts for response shift away from a subjective norm response level (usually called the ‘frontier’ in SFE) and is distributed $N(\mu_{it}, \sigma_u^2)$, truncated at zero and independent of $\varepsilon_{it}$. Moreover,

$$\mu_{it} = \delta_0 T + z_{it}\delta, \qquad (5)$$

where the vector $z_{it}$ includes variables (other than the treatment) that explain the specific deviation from the response frontier. Subscript $i$ indexes the individual observation and subscript $t$ denotes time.² Substituting (2), (4) and (5) in (3) yields the estimating equation (6), in which $\varphi(\cdot)$ and $\Phi(\cdot)$ are the standard normal probability density and cumulative distribution functions, respectively. Any treatment effect is given by $\beta_0$ in equation (6). The normal relationship between the $X$s and $Y$ is given by $\beta_t$. The last three terms on the right-hand side represent the observation-specific response bias from this normal relationship. Treatment can affect both the maximum possible value of the measured outcome of a given individual (as defined by $X_{it}\beta_t$) and the response bias. If treatment changes the response bias, it will be indicated by the term $\delta_0$, and the bias recalibration is given by the implied change in the expected value of $u_{it}$ between the two time periods.

The estimated $\delta_0$ coefficient on treatment indicates how treatment has changed response bias. If $\delta_0 = 0$ there is no recalibration and the response bias, if it exists, is not affected by the treatment. Cross terms of treatment and other variables (that is, slope dummy variables) may be used if the treatment is thought to change the general way these other variables interact with functioning.

Recalibration can occur independently of the treatment effect. In fact, recalibration is sometimes a goal of the treatment or intervention in addition to the targeted outcome, which means a desired outcome is that $\delta \neq 0$ and $Y_{i1} \mid Y^*_{it} \neq Y_{i2} \mid Y^*_{it}$ for $t \in \{1,2\}$. In other words, there is a change in individual measurement scale caused (and intended) by the intervention.
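To make the composed-error structure of equations (2)–(5) concrete, the sketch below simulates data with a truncated-normal, one-sided bias term whose mean depends on treatment, and then fits the standard normal/truncated-normal frontier likelihood by maximum likelihood. This is only an illustrative sketch with made-up parameter values; the authors’ own estimation used Stata’s xtfrontier routine (see Section 3.3), not this code.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(42)

# --- Simulate data following equations (2)-(5); all parameter values are hypothetical ---
n = 2000
T = rng.integers(0, 2, size=n).astype(float)        # treatment / post-test indicator
x = rng.normal(size=n)                               # outcome-equation covariate (X)
z = rng.normal(size=n)                               # bias-equation covariate (z)
const, beta0, betax = 4.0, 0.30, 0.15                # outcome equation
mu_const, delta0, deltaz = 0.60, -0.40, 0.25         # mu (bias) equation
sigma_v, sigma_u = 0.20, 0.40

mu = mu_const + delta0 * T + deltaz * z              # mean of the one-sided bias
u = stats.truncnorm.rvs((0 - mu) / sigma_u, np.inf,  # downward bias, truncated at zero
                        loc=mu, scale=sigma_u, random_state=rng)
v = rng.normal(scale=sigma_v, size=n)                # symmetric noise (epsilon)
y_star = const + beta0 * T + betax * x + v           # latent outcome Y*
y_obs = y_star - u                                   # observed self-report, Y = Y* - u

# --- Normal / truncated-normal frontier log-likelihood (Battese-Coelli style) ---
def negloglik(theta):
    c, b0, bx, m0, d0, dz, ln_su, ln_sv = theta
    su, sv = np.exp(ln_su), np.exp(ln_sv)
    s2 = su**2 + sv**2
    s = np.sqrt(s2)
    e = y_obs - (c + b0 * T + bx * x)                # composed residual, v - u
    m = m0 + d0 * T + dz * z
    mu_star = (sv**2 * m - su**2 * e) / s2
    sig_star = su * sv / s
    ll = (-np.log(s)
          + stats.norm.logpdf((e + m) / s)
          + stats.norm.logcdf(mu_star / sig_star)
          - stats.norm.logcdf(m / su))
    return -ll.sum()

start = np.array([4.0, 0.0, 0.0, 0.5, 0.0, 0.0, np.log(0.3), np.log(0.3)])
res = optimize.minimize(negloglik, start, method="BFGS")
print(res.x[:6])   # constant, beta0, betax, then the mu-equation coefficients
```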

3 An application to evaluation of a family intervention

We applied SFE to examine response bias and recalibration in programme evaluations of a popular, evidence-based family intervention (the Strengthening Families Program for Parents and Youth 10–14, or SFP) ( Kumpfer et al., 1996 ). Families attend SFP once a week for seven weeks and engage in activities designed to improve family communication, decrease harsh parenting practices, and increase parents’ family management skills. At the beginning and end of a programme, parents report their level of agreement with various statements related to skills and behaviours targeted by the intervention (e.g., ‘I have clear and specific rules about my child's association with peers who use alcohol’). Consistent with the literature on response shift, we hypothesised that non-random bias would be greater at pretest than at posttest as parents changed their standards about intervention-targeted behaviours and became more conservative in their self-ratings. In other words, we expected that after the intervention parents would recalibrate their self-ratings downward, resulting in an underestimate of the programme's effects.

3.1 Participants

Our data consisted of 1437 parents who attended 94 SFP cycles in Washington State and Oregon from 2005 through 2009. Of the participants, 25% identified themselves as male, 72% as female, and 3% did not report gender; 27% identified themselves as Hispanic/Latino, 60% as White, 2% as Black, 4% as American Indian/Alaska Native, 3% as other or multiple race/ethnicity, and 3% did not report race/ethnicity. Almost 74% of the households included a partner or spouse of the attending parent, and 19% reported not having a spouse or partner. For almost 8% of the sample, the presence of a partner or spouse is unknown. Over 62% of our observations are from Washington State, with the remainder from Oregon.

3.2 Measures

The outcome measure consisted of 13 items assessing parenting behaviours targeted by the intervention, including communication about substance use, general communication, involvement of children in family activities and decisions, and family conflict. Items were designed by researchers of the programme’s efficacy trial, and information about the scale has been reported elsewhere (Spoth et al., 1995; Spoth et al., 1998). Cronbach’s alpha (a measure of internal consistency) in the current data was .85 at both pretest and posttest. Items were scored on a 5-point Likert-type scale ranging from 1 (‘strongly disagree’) to 5 (‘strongly agree’).
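Cronbach’s alpha, the internal-consistency statistic reported above, can be computed directly from a respondents-by-items score matrix. The sketch below uses simulated Likert-type responses (hypothetical data, not the SFP evaluation items) and the standard formula.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the scale total
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Example: 13 simulated 5-point items for 300 respondents sharing a common latent trait.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 1))
items = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(300, 13))), 1, 5)
print(round(cronbach_alpha(items), 2))
```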

Variables used in the analysis, including definitions and summary statistics, are presented in Table 1. Average family functioning, as measured by self-assessed parenting behaviours, increased from 3.98 at the pretest to 4.27 at the posttest, after participation in SFP.

Table 1. Variable names, descriptions and summary statistics

| Name | Description | M | SD |
|---|---|---|---|
| Pretest functioning | Semi-continuous (0–5) | 3.979 | 0.546 |
| Posttest functioning | Semi-continuous (0–5) | 4.273 | 0.461 |
| Male | If male = 1 | 0.250 | 0.433 |
| Gender missing | If gender not reported = 1 | 0.030 | 0.170 |
| White | If White = 1 | 0.601 | 0.490 |
| Black | If Black = 1 | 0.023 | 0.150 |
| Latino/Hispanic | If Latino/Hispanic = 1 | 0.269 | 0.443 |
| Native American | If Native American = 1 | 0.040 | 0.195 |
| Other | If other race/ethnicity = 1 | 0.034 | 0.182 |
| Race missing | If race not reported = 1 | 0.034 | 0.182 |
| Age | Integer (17–73) | 38.822 | 7.846 |
| Partner or spouse | If partner or spouse in family = 1 | 0.736 | 0.441 |
| Partner or spouse missing | If partner or spouse in family not reported = 1 | 0.077 | 0.266 |
| Partner or spouse attends | If partner or spouse attended SFP = 1 | 0.499 | 0.500 |
| Washington State | If family lives in Washington State = 1 | 0.622 | 0.485 |

3.3 Procedure

Pencil-and-paper pretests were administered as part of a standard, ongoing programme evaluation on the first night of the programme, before programme content was delivered; posttests were administered on the last night of the programme. All data are anonymous; names of programme participants are not linked to programme evaluations and are unknown to researchers. The Institutional Review Board of Washington State University issued a Certificate of Exemption for the procedures of the current study.

We used SFE to estimate (pre- and post-treatment) family functioning scores as a function primarily of demographic characteristics. Based on previous literature (Howard and Dailey, 1979), we hypothesised that the one-sided errors (response bias) would be downward, and preliminary analysis supported that hypothesis.³ Additional preliminary analysis of which variables to include among $z_i$ (including a model using all the explanatory variables) led us to conclude that three variables determined the level of bias in the family functioning assessment: age, Latino/Hispanic ethnicity, and whether the functioning measure was a pretest or a posttest assessment. We used the ‘xtfrontier’ routine in Stata to estimate the parameters of our models. Unlike applications of SFE to technical efficiency estimation, our model does not require log-transforming the dependent variable.

3.4 The baseline model

The results of the baseline SFE model are shown in Table 2. The Wald χ² statistic indicated that the regression was highly significant. Several demographic variables were found to influence the assessment of family functioning with conventional statistical significance. Males gave lower estimates of family functioning than did females and those with unreported gender. All non-White ethnic groups (and those with unreported race/ethnicity) assessed their family’s functioning more highly than did White respondents. Participation in the Strengthening Families Program increased individuals’ assessments of their family’s functioning.

Table 2. SFE - total effects model

| Variable | Coefficient | SE | Z | p |
|---|---|---|---|---|
| Functioning equation | | | | |
| Treatment | 0.156 | 0.027 | 5.87 | 0.000 |
| Male | −0.119 | 0.020 | −6.03 | 0.000 |
| Gender missing | −0.018 | 0.058 | −0.30 | 0.760 |
| Black | 0.167 | 0.054 | 3.11 | 0.002 |
| Latino/Hispanic | 0.256 | 0.029 | 8.86 | 0.000 |
| Native American | 0.090 | 0.043 | 2.08 | 0.038 |
| Other | 0.174 | 0.045 | 3.83 | 0.000 |
| Race missing | 0.113 | 0.054 | 2.08 | 0.038 |
| Age | −0.005 | 0.001 | −3.92 | 0.000 |
| Partner or spouse | −0.026 | 0.022 | −1.18 | 0.237 |
| Partner or spouse missing | −0.062 | 0.037 | −1.70 | 0.090 |
| Washington State | 0.023 | 0.018 | 1.31 | 0.189 |
| Constant | 4.605 | 0.054 | 85.63 | 0.000 |
| μ equation | | | | |
| Treatment | −1.195 | 0.407 | −2.94 | 0.003 |
| Hispanic | 1.100 | 0.383 | 2.87 | 0.004 |
| Age | −0.052 | 0.028 | −1.88 | 0.061 |
| Variance parameters | | | | |
| lnsigma2 | 0.291 | 0.201 | 1.00 | 0.317 |
| inlgtgamma | 2.559 | 0.263 | 9.72 | 0.000 |
| σ² | 1.338 | 0.389 | | |
| γ | 0.928 | 0.018 | | |
| σ_u² | 1.242 | 0.383 | | |
| σ_ε² | 0.096 | 0.010 | | |

Wald χ²(15) = 331.46; Prob > χ² = 0.000

We assessed bias, and its change, from the coefficient estimates for the δ parameters, where $\mu_i = z_i\delta$. Our first overall question was whether, in fact, there was a one-sided error. Three measures of unexplained variation are shown in Table 2: $\sigma^2 = E(\varepsilon_i - u_i)^2$ is the variance of the total error, which can be broken down into the component parts $\sigma_u^2 = E(u_i^2)$ and $\sigma_\varepsilon^2 = E(\varepsilon_i^2)$. The statistic $\gamma = \sigma_u^2/(\sigma_u^2 + \sigma_\varepsilon^2)$ gives the percentage of total unexplained variation attributable to the one-sided error. To ensure $0 \leq \gamma \leq 1$, the model estimated γ on the logit scale, reported as inlgtgamma. Similarly, the model estimated the natural log of $\sigma^2$, reported as lnsigma2, and used these estimates to derive $\sigma^2$, $\sigma_\varepsilon^2$, $\sigma_u^2$ and γ. As seen in the table, the estimate for inlgtgamma was highly significant, but the estimate for lnsigma2 had a p-value of 0.317, which means we cannot reject the hypothesis that all of the variation in the responses is due to respondent-specific bias. Hence, we found strong support for the one-sided variation that we call bias, and we saw that by far the most substantial portion of the unexplained variation in our data came from that source.
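As a quick arithmetic check, the variance components reported in Table 2 can be recovered from the estimated lnsigma2 and inlgtgamma parameters; the values below are simply copied from the table.

```python
import numpy as np

lnsigma2 = 0.291       # ln(sigma^2), from Table 2
inlgtgamma = 2.559     # gamma on the logit scale, from Table 2

sigma2 = np.exp(lnsigma2)                   # total error variance, ~1.338
gamma = 1.0 / (1.0 + np.exp(-inlgtgamma))   # share due to the one-sided error, ~0.928
sigma_u2 = gamma * sigma2                   # variance of the bias term, ~1.242
sigma_e2 = (1.0 - gamma) * sigma2           # variance of the symmetric error, ~0.096
print(sigma2, gamma, sigma_u2, sigma_e2)
```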

Three variables explained the level of bias. Latino/Hispanic respondents on average had more biased estimates of their family functioning. Looking again at equation (3) , we see that this means they, relative to other ethnic groups, underestimated their family functioning. However, we found that older participants had smaller biases, thus giving closer estimates of their family's relative functioning. Of primary interest is the estimate of the treatment effect. Participation in SFP strongly lowered the bias, on average.

3.5 Decomposing the measured change in functioning

The total change in the functioning score averaged 0.295. This total change consisted of two parts as indicated by the following:

Total change = Measured postscore − Measured prescore
= (Real postvalue − Postvalue bias) − (Real prevalue − Prevalue bias)
= Real change − (Postvalue bias − Prevalue bias)

The term in parentheses is negative (the estimation indicates that treatment lowered the bias). Thus, the total change in the family functioning score underestimated the improvement due to SFP, although the measured post-treatment family functioning was not as large as it would seem from the reported family functioning scores, on average. Table 3 shows the average estimated bias pre- and post-treatment, and the average change in bias, which was –0.133. Thus, the average improvement in family functioning was underestimated by this amount.

Table 3. Averages of bias and change

| Variable | M | SD |
|---|---|---|
| Estimated u, pre-treatment | 0.469 | 0.368 |
| Estimated u, post-treatment | 0.335 | 0.273 |
| Change in u, post minus pre | −0.133 | 0.346 |

Table 4 shows the results of a regression of bias change on demographic and other characteristics. Males and Black respondents had marginally larger bias changes, while those with race/ethnicity unreported had smaller bias changes. Since the bias change was measured as postscore bias minus prescore bias, this means that the bias changed less, on average, for male and Black respondents, but more, on average, for those whose race was unreported.

Table 4. Regression of bias change (dependent variable: change in bias)

| Variable | β | SE | t | p |
|---|---|---|---|---|
| Male | 0.050 | 0.023 | 2.19 | 0.029 |
| Gender missing | 0.100 | 0.064 | 1.55 | 0.122 |
| Black | 0.114 | 0.062 | 1.84 | 0.066 |
| Latino/Hispanic | 0.015 | 0.022 | 0.68 | 0.496 |
| Native American | 0.048 | 0.047 | 1.02 | 0.308 |
| Other | 0.078 | 0.051 | 1.54 | 0.125 |
| Race/ethnicity missing | −0.147 | 0.061 | −2.42 | 0.016 |
| Age | 0.003 | 0.001 | 2.74 | 0.006 |
| Partner or spouse | 0.032 | 0.028 | 1.13 | 0.258 |
| Partner or spouse information missing | 0.051 | 0.040 | 1.27 | 0.203 |
| Washington State | −0.002 | 0.020 | −0.11 | 0.912 |
| Partner or spouse attended | −0.009 | 0.024 | −0.36 | 0.721 |
| Constant | −0.303 | 0.054 | −5.65 | 0.000 |

| Source | Sum of squares | df |
|---|---|---|
| Model | 3.4080 | 12 |
| Residual | 168.2181 | 1,424 |
| Total | 171.6262 | 1,436 |

F(12, 1424) = 2.4; Prob > F = 0.0044; R-squared = 0.019

3.6 The SFE model with heteroscedastic error

One alternative to our baseline model (known as the total effects model in SFE terminology), which generated the results in Table 2, is an SFE model that allows for heteroscedasticity in $\varepsilon_i$, $u_i$, or both. More precisely, for this model we maintained equation (3) but let $E(\varepsilon_i^2) = \omega_\varepsilon w_i$ and $E(u_i^2) = \omega_u w_i$, where $\omega_\varepsilon$ and $\omega_u$ are parameters to be estimated and $w_i$ are variables that explain the heteroscedasticity. We note that $w_i$ need not be the same in the two expressions, but since elements of $\omega_\varepsilon$ and $\omega_u$ can be zero, we lose no generality by showing it as we do; in fact, in our application we used the same variables in both expressions, namely those used to explain μ in the first model. Table 5 reports the results of such a model. In this case, the one-sided error we ascribe to bias is evident from statistically significant parameters in the explanatory expressions for $\sigma_u^2$.

Table 5. SFE with heteroscedasticity

| Variable | Coefficient | SE | Z | p |
|---|---|---|---|---|
| Functioning equation | | | | |
| Treatment | 0.222 | 0.032 | 6.94 | 0.000 |
| Male | −0.098 | 0.019 | −5.11 | 0.000 |
| Gender missing | 0.002 | 0.057 | 0.04 | 0.970 |
| African American | 0.159 | 0.054 | 2.95 | 0.003 |
| Hispanic | 0.344 | 0.035 | 9.95 | 0.000 |
| Native American | 0.096 | 0.042 | 2.27 | 0.023 |
| Other | 0.158 | 0.044 | 3.63 | 0.000 |
| Race missing | 0.090 | 0.053 | 1.69 | 0.091 |
| Age | −0.001 | 0.002 | −0.65 | 0.516 |
| Partner or spouse | −0.027 | 0.021 | −1.29 | 0.199 |
| Partner or spouse missing | −0.044 | 0.035 | −1.25 | 0.213 |
| Washington State | 0.017 | 0.017 | 0.98 | 0.325 |
| Constant | 4.532 | 0.088 | 51.55 | 0.000 |
| ln(σ_ε²) equation | | | | |
| Treatment | −0.715 | 0.187 | −3.81 | 0.000 |
| Hispanic | −1.132 | 0.288 | −3.94 | 0.000 |
| Age | −0.007 | 0.010 | −0.66 | 0.512 |
| Constant | −1.906 | 0.434 | −4.39 | 0.000 |
| ln(σ_u²) equation | | | | |
| Treatment | −0.247 | 0.116 | −2.13 | 0.033 |
| Hispanic | 0.913 | 0.123 | 7.42 | 0.000 |
| Age | −0.005 | 0.007 | −0.67 | 0.504 |
| Constant | −0.761 | 0.319 | −2.39 | 0.017 |

Wald χ²(12) = 253.60; Prob > χ² = 0.000

We note first that the estimates in the main body of the equation were quantitatively and qualitatively very similar to those for the non-heteroscedastic SFE model. The only substantive change is that age was no longer significant at an acceptable p-value, and race unreported had a p-value of 0.1. All signs and magnitudes were similar. Once again, results indicated that participation in SFP (treatment) strongly improved functioning. Additionally, treatment lowered the variability of both sources of unexplained variation across participants. The decreased unexplained variation due to ε is likely explained by individuals having a better idea of the constructs assessed by scale items. For our purposes, the key statistic here is the coefficient of treatment explaining $\sigma_u^2$. The estimated parameter was negative and significant, with a p-value of 0.03. Since the bias was one-sided, we can clearly conclude that going through SFP lowered the variability of the bias significantly. Moreover, these estimates can be used to predict the bias of each observation, and with this model the average bias fell from 0.545 to 0.492, so while the biases were larger with this model, the decrease in the average (–0.053) was about one-half the decrease we saw in the first model.
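One way to implement a heteroscedastic frontier of this kind is to make each error variance log-linear in covariates, which mirrors the ln(σ²) blocks reported in Table 5. The sketch below is purely illustrative: it simulates data, uses hypothetical parameter values, and adopts the simpler half-normal form of the one-sided error rather than the truncated-normal form estimated above.

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical stand-ins for the SFP evaluation variables.
rng = np.random.default_rng(0)
n = 1500
T = rng.integers(0, 2, size=n).astype(float)         # treatment (post vs. pre)
x = rng.normal(size=n)                                # outcome-equation covariate
w = np.column_stack([np.ones(n), T])                  # variance covariates: constant + treatment
y = (4.0 + 0.2 * T + 0.1 * x
     + rng.normal(scale=0.3, size=n)                  # symmetric noise
     - np.abs(rng.normal(scale=0.4 * np.exp(-0.3 * T), size=n)))  # bias shrinks post-treatment

def negloglik(theta):
    c, b0, bx, om_u0, om_u1, om_v0, om_v1 = theta
    su = np.exp(0.5 * (w @ np.array([om_u0, om_u1])))   # sigma_u,i: log-linear in w
    sv = np.exp(0.5 * (w @ np.array([om_v0, om_v1])))   # sigma_v,i: log-linear in w
    s = np.sqrt(su**2 + sv**2)
    lam = su / sv
    e = y - (c + b0 * T + bx * x)
    # half-normal special case of the frontier log-likelihood
    ll = np.log(2) - np.log(s) + stats.norm.logpdf(e / s) + stats.norm.logcdf(-e * lam / s)
    return -ll.sum()

start = np.array([4.0, 0.0, 0.0, -2.0, 0.0, -2.0, 0.0])
res = optimize.minimize(negloglik, start, method="Nelder-Mead",
                        options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-8})
print(res.x)   # outcome coefficients, then the omega_u and omega_v variance-equation parameters
```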

4 Discussion and conclusions

As we noted earlier, bias in self-rating is of concern in a variety of research areas. In particular, the potential for recalibration of self-rating bias as a function of material or skills learned in an intervention has long been a concern to programme evaluators as it may result in underestimates of programme effectiveness ( Howard and Dailey, 1979 ; Norman, 2003 ; Pratt et al., 2000 ; Sprangers, 1989 ). However, in the absence of an objective performance measurement, it has not been possible to determine whether lower posttest scores truly represent response-shift bias or instead an actual decrement in targeted behaviours or knowledge (i.e., an iatrogenic effect of treatment). By allowing evaluators to test for a decrease in response bias from pretest to posttest, SFE provides a means of resolving this conundrum.

The SFE method, however, is not without problems. The main limitation is that the estimates rely on assumptions about the distributions of the two error components. Model identification requires that one of the error terms, the bias term in our application, be one-sided. This, however, is not as strong an assumption as it looks, for two reasons. First, often there is prior information or theory that indicates the most likely direction for the bias. Second, the validity of the assumption can be tested statistically.

We presented SFE as a method to identify response bias and changes in response bias, within the context of self-reported measurements at individual and aggregate levels. Even though we proposed a novel application, the technique is not new and has been widely used in economics and operational research for over three decades. The procedure is easily adoptable by researchers, since it is already supported by several statistical packages, including Stata (StataCorp., 2009) and Limdep (Econometrica Software, Inc., 2009).

Response bias has long been a key issue in psychometrics, with response shift bias a particular concern in programme evaluation. However, almost all statistical attempts to address the issue have been confined to using SEM to test for response shift bias at the aggregate level. As noted in the introduction, our approach has several significant advantages over SEM techniques that try to measure response bias. SEM requires more data (multiple time periods and multiple measures) and measures bias only in the aggregate. SFE can identify bias with a single time period (although multiple observations are needed to identify bias recalibration) and identifies response biases across individuals. Perhaps the biggest advantage over SEM approaches is that SFE not only identifies bias but also provides information about the root causes of the bias. SFE allows simultaneous analysis of treatment effectiveness, causal factors of outcomes, and covariates of the bias, improving the statistical efficiency of the analysis over traditional SEM, which often cannot identify causal factors and covariates of bias and, when it can, requires two-step procedures. And since SFE allows the researcher to identify bias and causal factors at the individual level, it expands our ability to identify, understand, explain, and potentially correct for, response shift bias. Of course, bias at the individual level can be aggregated to measures comparable to what is learned through SEM approaches.

Acknowledgements

The authors would like to thank the anonymous referees. This study was supported in part by the National Institute of Drug Abuse (grants R21 DA025139-01Al and R21 DA19758-01). We thank the programme providers and families who participated in the programme evaluation.

Robert Rosenman is a Professor of Economics in the School of Economic Sciences at Washington State University. His current research aims to develop new approaches to measuring the economic benefits of substance abuse prevention programmes. His research has appeared in journals such as the American Economic Review, Health Economics, Clinical Infectious Diseases and Health Care Management Science.

Vidhura Tennekoon is a graduate student in the School of Economic Sciences at Washington State University. His research interests are in health economics, applied econometrics and prevention science, with a current focus on dealing with misclassification in survey data.

Laura G. Hill is a Psychologist and Associate Professor of Human Development at Washington State University. Her research focuses on the translation of evidence-based prevention programmes from research to practice and measurement of programme effectiveness in uncontrolled settings.

Reference to this paper should be made as follows: Rosenman, R., Tennekoon, V. and Hill, L.G. (2011). ‘Measuring bias in self-reported data’, Int. J. Behavioural and Healthcare Research , Vol. 2, No. 4/2011, pp. 320-332.

¹ We present a single model that allows for pre- and post-intervention measurement of the outcome of interest and bias. If the self-reported data are not related to an intervention, $\beta_0$ and $\delta_0$ (below) are identically 0 and there is only one time period, $t$.

² Due to symmetry of the normal distribution, without loss of generality we can also assume that the bias distribution is right-truncated.

³ When we tried to estimate the parameters of a model with upward one-sided errors, the maximisation procedure failed to converge. A specification with upward one-sided errors but without a constant term converged, but the null hypothesis that there is a one-sided error term was rejected with near certainty, indicating that there is no sizable upward response bias. A similar analysis with the upward one-sided errors completely random (rather than dependent on treatment and other variables) was also rejected, again with near certainty. Thus, upward bias was robustly rejected.


  • Aigner D, Lovell CAK, Schmidt P. Formulation and estimation of stochastic frontier production function models. Journal of Econometrics. 1977;6(1):21–37.
  • Battese GE, Coelli TJ. A model for technical inefficiency effects in a stochastic frontier production function for panel data. Empirical Economics. 1995;20(2):325–332.
  • Econometrica Software, Inc. LIMDEP Version 9.0 [Computer Software]. Econometrica Software, Inc.; Plainview, NY: 2009.
  • Howard GS. Response-shift bias: a problem in evaluating interventions with pre/post self-reports. Evaluation Review. 1980;4(1):93–106. DOI: 10.1177/0193841x8000400105.
  • Howard GS, Dailey PR. Response-shift bias: a source of contamination of self-report measures. Journal of Applied Psychology. 1979;64(2):144–150.
  • Kumpfer KL, Molgaard V, Spoth R. The strengthening families program for the prevention of delinquency and drug use. In: Peters RD, McMahon RJ, editors. Preventing Childhood Disorders, Substance Abuse, and Delinquency. Banff International Behavioral Science Series, Vol. 3. Sage Publications, Inc.; Thousand Oaks, CA, USA: 1996. pp. 241–267.
  • Meeusen W, van den Broeck J. Efficiency estimation from Cobb-Douglas production functions with composed error. International Economic Review. 1977;18(2):435–444.
  • Norman G. Hi! How are you? Response shift, implicit theories and differing epistemologies. Quality of Life Research. 2003;12(3):239–249.
  • Oort FJ. Using structural equation modeling to detect response shifts and true change. Quality of Life Research. 2005;14(3):587–598.
  • Oort FJ, Visser MRM, Sprangers MAG. An application of structural equation modeling to detect response shifts and true change in quality of life data from cancer patients undergoing invasive surgery. Quality of Life Research. 2005;14(3):599–609.
  • Pratt CC, McGuigan WM, Katzev AR. Measuring program outcomes: using retrospective pretest methodology. American Journal of Evaluation. 2000;21(3):341–349.
  • Schwartz CE, Sprangers MAG. Methodological approaches for assessing response shift in longitudinal health-related quality-of-life research. Social Science & Medicine. 1999;48(11):1531–1548.
  • Spoth R, Redmond C, Shin C. Direct and indirect latent-variable parenting outcomes of two universal family-focused preventive interventions: extending a public health-oriented research base. Journal of Consulting and Clinical Psychology. 1998;66(2):385–399. DOI: 10.1037/0022-006x.66.2.385.
  • Spoth R, Redmond C, Haggerty K, Ward T. A controlled parenting skills outcome study examining individual difference and attendance effects. Journal of Marriage and Family. 1995;57(2):449–464. DOI: 10.2307/353698.
  • Sprangers M. Subject bias and the retrospective pretest in retrospect. Bulletin of the Psychonomic Society. 1989;27(1):11–14.
  • Sprangers M, Hoogstraten J. Pretesting effects in retrospective pretest-posttest designs. Journal of Applied Psychology. 1989;74(2):265–272. DOI: 10.1037/0021-9010.74.2.265.
  • StataCorp. Stata Statistical Software: Release 11 [Computer Software]. StataCorp LP; College Station, TX: 2009.

The Use of Self-Report Data in Psychology


In psychology, a self-report is any test, measure, or survey that relies on an individual's own report of their symptoms, behaviors, beliefs, or attitudes. Self-report data is typically gathered in paper-and-pencil or electronic format, or sometimes through an interview.

Self-reporting is commonly used in psychological studies because it can yield valuable and diagnostic information to a researcher or a clinician.

This article explores examples of how self-report data is used in psychology. It also covers the advantages and disadvantages of this approach.

Examples of Self-Reports

To understand how self-reports are used in psychology, it can be helpful to look at some examples. Many well-known assessments and inventories rely on self-reporting to collect data.

One of the most commonly used self-report tools is the  Minnesota Multiphasic Personality Inventory (MMPI) for personality testing . This inventory includes more than 500 questions focused on different areas, including behaviors, psychological health, interpersonal relationships, and attitudes. It is often used as a mental health assessment, but it is also used in legal cases, custody evaluations, and as a screening instrument for some careers.

The 16 Personality Factor (PF) Questionnaire

This personality inventory is often used as a diagnostic tool to help therapists plan treatment. It can be used to learn more about various individual characteristics, including empathy, openness, attitudes, attachment quality, and coping style.

Myers-Briggs Type Indicator (MBTI)

The MBTI is a popular personality measure that describes personality types in four categories: introversion or extraversion, sensing or intuiting, thinking or feeling, and judging or perceiving. A letter is taken from each category to describe a person's personality type, such as INTP or ESFJ.

Personality inventories and psychology assessments often utilize self-reporting for data collection. Examples include the MMPI, the 16PF Questionnaire, and the MBTI.

Advantages of Self-Report Data

One of the primary advantages of self-reporting is that it can be easy to obtain. It is also an important way that clinicians diagnose their patients—by asking questions. Those making the self-report are usually familiar with filling out questionnaires.

For research, it is inexpensive and can reach many more test subjects than could be analyzed by observation or other methods. It can be performed relatively quickly, so a researcher can obtain results in days or weeks rather than observing a population over the course of a longer time frame.

Self-reports can be made in private and can be anonymized to protect sensitive information and perhaps promote truthful responses.

Disadvantages of Self-Report Data

Collecting information through self-reporting has limitations. People are often biased when they report on their own experiences. For example, many individuals are either consciously or unconsciously influenced by "social desirability": they are more likely to report experiences that are considered socially acceptable or preferred.

Self-reports are subject to these biases and limitations:

  • Honesty: Subjects may give the more socially acceptable answer rather than the truthful one.
  • Introspective ability: Subjects may not be able to assess themselves accurately.
  • Interpretation of questions: The wording of the questions may be confusing or mean different things to different subjects.
  • Rating scales: Rating something yes or no can be too restrictive, while numerical scales can be inexact and subject to an individual's inclination to give extreme or middle responses to every question (see the sketch after this list).
  • Response bias: Answers can be colored by previous responses, by recent or especially salient experiences, and by other contextual factors.
  • Sampling bias: The people who complete the questionnaire are the sort of people who will complete a questionnaire; they may not be representative of the population you wish to study.
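Response styles of the kind described in the rating-scales point can be quantified directly from the raw answers. The following is a minimal sketch, assuming responses on a 1-to-5 Likert scale; the function names and toy data are illustrative only and are not taken from any published instrument.

```python
# Minimal sketch: quantifying two common response styles on a 1-to-5 Likert scale.
# Each row below is one made-up respondent's answers; names are illustrative only.

def extreme_response_index(answers, low=1, high=5):
    """Share of items answered at either endpoint of the scale."""
    return sum(a in (low, high) for a in answers) / len(answers)

def midpoint_response_index(answers, midpoint=3):
    """Share of items answered at the scale midpoint."""
    return sum(a == midpoint for a in answers) / len(answers)

respondents = [
    [5, 5, 1, 5, 1, 5],   # mostly endpoints: extreme responding
    [3, 3, 4, 3, 3, 2],   # mostly midpoints: midpoint responding
]

for i, answers in enumerate(respondents, start=1):
    print(f"Respondent {i}: extreme={extreme_response_index(answers):.2f}, "
          f"midpoint={midpoint_response_index(answers):.2f}")
```

Indices like these can then be carried along as covariates when analyzing the substantive scale scores, which is one way to check whether a response style, rather than the construct of interest, is driving the results.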

Self-Report Info With Other Data

Most experts in psychological research and diagnosis suggest that self-report data should not be used alone, as it tends to be biased. Research is best done when combining self-reporting with other information, such as an individual’s behavior or physiological data.

This “multi-modal” or “multi-method” assessment provides a more global, and therefore more likely accurate, picture of the subject.
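As a purely illustrative sketch of what such a multi-method model can look like, the snippet below predicts an outcome from a self-report score together with a behavioral measure. The data are simulated and the variable names are placeholders, not drawn from any particular study.

```python
# Illustrative multi-method sketch: combining a self-report score with a
# behavioral (e.g., log- or sensor-based) measure in one predictive model.
# All data are simulated; the coefficients have no substantive meaning.
import numpy as np

rng = np.random.default_rng(0)
n = 200
self_report = rng.normal(0, 1, n)    # e.g., a questionnaire scale score
behavior = rng.normal(0, 1, n)       # e.g., observed time-on-task
outcome = 0.3 * self_report + 0.5 * behavior + rng.normal(0, 1, n)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), self_report, behavior])
coefs, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(dict(zip(["intercept", "self_report", "behavior"], coefs.round(2))))
```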

The questionnaires used in research should be checked to see whether they produce consistent results over time. They should also be validated against another data source to demonstrate that responses measure what they claim to measure, and they should discriminate clearly between controls and the test group.
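One simple way to check the "consistent results over time" criterion is to correlate total scores from two administrations of the same instrument. The sketch below uses invented scores and Python's standard-library correlation helper (available from Python 3.10).

```python
# Minimal test-retest sketch: correlating total scores from the same
# respondents at two time points. The scores are invented; in practice they
# would come from two administrations of the questionnaire.
from statistics import correlation  # Python 3.10+

time1 = [18, 22, 25, 30, 27, 15, 21, 24]   # total scores, first administration
time2 = [17, 23, 24, 31, 26, 16, 20, 25]   # total scores, second administration

print(f"Test-retest correlation: {correlation(time1, time2):.2f}")
```

Higher correlations indicate more stable measurement; what counts as acceptable depends on the construct and the retest interval.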

How to Create a Self-Report Study

If you are creating a self-report tool for psychology research, there are a few key steps you should follow. First, decide what type of data you want to collect. This will determine the format of your questions and the type of scale you use.

Next, create a pool of questions that are clear and concise. The goal is to have several items that cover all the topics you wish to address. Finally, pilot your study with a small group to ensure it is valid and reliable.

When creating a self-report study, determine what information you need to collect and test the assessment with a group of individuals to determine if the instrument is reliable.
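For the piloting step, internal consistency is a common first check. Below is a minimal sketch that computes Cronbach's alpha from a small, made-up matrix of pilot responses; in practice you would load your own pilot data, and conventions such as treating values around .70 or higher as acceptable depend on the purpose of the instrument.

```python
# Minimal pilot-study sketch: estimating internal consistency (Cronbach's
# alpha) from a matrix of item responses. Rows are respondents, columns are
# questionnaire items; the data below are invented for illustration.
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

pilot = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])
print(f"Cronbach's alpha: {cronbach_alpha(pilot):.2f}")
```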

Self-reporting can be a useful tool for collecting data. The benefits of self-report data include lower costs and the ability to collect data from a large number of people. However, self-report data can also be biased and prone to errors.

Levin-Aspenson HF, Watson D. Mode of administration effects in psychopathology assessment: Analyses of gender, age, and education differences in self-rated versus interview-based depression . Psychol Assess. 2018;30(3):287-295. doi:10.1037/pas0000474

Tarescavage AM, Ben-Porath YS. Examination of the feasibility and utility of flexible and conditional administration of the Minnesota Multiphasic Personality Inventory-2-Restructured Form . Psychol Assess. 2017;29(11):1337-1348. doi:10.1037/pas0000442

Warner CH, Appenzeller GN, Grieger T, et al. Importance of anonymity to encourage honest reporting in mental health screening after combat deployment . Arch Gen Psychiatry . 2011;68(10):1065-1071. doi:10.1001/archgenpsychiatry.2011.112

Devaux M, Sassi F. Social disparities in hazardous alcohol use: Self-report bias may lead to incorrect estimates . Eur J Public Health . 2016;26(1):129-134. doi:10.1093/eurpub/ckv190

Althubaiti A. Information bias in health research: Definition, pitfalls, and adjustment methods . J Multidiscip Healthc . 2016;9:211-217. doi:10.2147/JMDH.S104807

Hopwood CJ, Good EW, Morey LC. Validity of the DSM-5 Levels of Personality Functioning Scale-Self Report . J Pers Assess. 2018;100(6):650-659. doi:10.1080/00223891.2017.1420660

By Kristalyn Salters-Pedneault, PhD  Kristalyn Salters-Pedneault, PhD, is a clinical psychologist and associate professor of psychology at Eastern Connecticut State University.

The scary truth about how far behind American kids have fallen

Students of all ages still haven’t made up the ground they lost during the pandemic.

Sometimes, panics are overblown. Sometimes, older generations are just freaking out about the youngs, as they have since time immemorial.

That’s not the case, unfortunately, with kids’ learning right now, more than four years after the pandemic shuttered classrooms and disrupted the lives of millions of children. The effects were seen almost immediately, as students’ performance in reading and math began to dip  far below pre-pandemic norms , worrying educators and families around the country.

Even now, according to a  new report  released this week by the Center on Reinventing Public Education (CRPE), a research group at Arizona State University that has studied the impact of Covid on education since 2020, the average American student is “less than halfway to a full academic recovery” from the effects of the pandemic.

The report — the group’s third annual analysis of the “state of the American student” — combines test scores and academic research with parent interviews to paint a picture of the challenges facing public schools and the families they serve. That picture is sobering: In spring 2023, just 56 percent of American fourth-graders were performing on grade level in math, down from 69 percent in 2019, according to just one example of  test score data  cited in the report.

Declines in reading were less stark but still concerning, and concentrated in earlier grades, with 65 percent of third-graders performing on grade level, compared with 72 percent in 2019. Recovery in reading has also been slower, with some researchers finding  essentially no rebound  since students returned to the classroom.

The report mirrors what many teachers say they are seeing in their classrooms, as some  sound the alarm publicly  about kids who they say can’t write a sentence or pay attention to a three-minute video.

“Focus and endurance for any sort of task, especially reading, has been really hard for a lot of teenagers” since coming back from pandemic closures, Sarah Mulhern Gross, who teaches honors English at High Technology High School in Lincroft, New Jersey, told Vox.

Meanwhile, even the youngest children, who were not yet in school when lockdowns began, are showing troubling signs of academic and behavioral delays. “We are talking 4- and 5-year-olds who are throwing chairs, biting, hitting,” Tommy Sheridan, deputy director of the National Head Start Association,  told the New York Times  earlier this year.

If schools and districts can’t reverse these trends, Covid could leave “an indelible mark” on a generation of kids, CRPE director Robin Lake said this week. The effects are greatest for low-income students, students with disabilities, and children learning English as a second language, who faced educational inequities prior to the pandemic that have only worsened today. Covid “shined a light on the resource inequities and opportunity gaps that existed in this country, and then it exacerbated them,” said Allison Socol, vice president for P-12 policy, research, and practice at EdTrust, a nonprofit devoted to educational equity.

The report is the latest effort to catalog what many educators, parents, and kids see as the deep scars — academic, but also social and emotional — left behind by the pandemic.

Earlier this year, the Northwest Evaluation Association (NWEA), a nationwide testing company,  reported that  rather than making up ground since the pandemic, students were falling further behind. In 2023-24, the gap between pre- and post-Covid test score averages widened by an average of 36 percent in reading and 18 percent in math, according to the NWEA report.

When it comes to education, the effect of the pandemic “is not over,” Lake said. “It’s not a thing of the past.”

Kids are behind in reading and math, and they’re not catching up

Nearly all public schools  in America closed by the end of March 2020, and while  some reopened that fall , others did not fully resume in-person learning until fall 2021.

The switch to remote school, along with the trauma and upheaval of living through a global health emergency in which  more than a million Americans  died, dealt a major blow to students’ learning. Scores on one set of national tests, released in September 2022, dropped to historic lows, reversing two decades of progress in reading and math,  the New York Times reported .

Still, experts were optimistic that students could make up the ground they’d lost. NWEA’s MAP tests, which measure academic growth, showed a strong rebound in the 2021-22 school year, said Karyn Lewis, vice president of research and policy partnerships at NWEA. But growth slowed the following year, and now lags behind pre-pandemic trends.

Kids “are learning throughout the year, but they are doing so at a slightly sluggish pace,” Lewis said — not enough to make up for their Covid-era losses.

A team of researchers using separate data from state tests appeared to find  more hopeful results  earlier this year, documenting significant recovery in both reading and math between 2022 and 2023. But after reanalyzing their data, they found that the improvements in reading were probably produced by changes in state tests, not actual improvements in student achievement, said Thomas Kane, faculty director of the Center for Education Policy Research at Harvard and one of the leaders of the research team. In fact, though students did gain some ground in math, they showed little recovery in reading between 2022 and 2023.

More recent data does not paint a rosier picture. About half of states have released test results for the 2023-24 school year, and “I don’t see a lot of states with substantial increases” in scores, Kane said.

Many factors probably contribute to students’ slow recovery, experts say. Some may have missed “foundational pieces” of reading and math in 2020 and 2021, Lewis said. Learning loss can be like a “compounding debt,” she explained, with skills missed in early grades causing bigger and bigger problems as kids get older. Chronic absenteeism also remains a big obstacle to learning.  Twenty-six percent of students  were considered chronically absent in 2022-23, up from 13 percent in 2019-2020.

Continue reading at vox.com .

Bridging the Gap for New Americans: Final Report


This report, prepared in response to the Bridging the Gap for New Americans Act (Pub. L. No. 117–210, enacted in October 2022) focuses on immigrants and refugees who are lawfully present in the U.S., arrived during the 5 years prior to the law, and have occupational credentials or academic degrees obtained outside the United States. The report explores the size of the relevant population, the percentage among it that experiences difficulties obtaining employment commensurate with their credentials or academic preparation, the types of difficulties that individuals in this group experience, and the services provided by various organizations and public agencies to aid this group. 

The report is based on a targeted literature review, an exploration of the available data on the relevant population, and a review of public and private programs that aid this population. While the study team found no recent studies or national datasets that cover the target population as defined in the statute, it did identify related data and information. Key findings include:

  • The number of immigrants with at least a college degree obtained outside the U.S. was estimated to be approximately 7 million, based on 2019 Census data from the American Community Survey (ACS).
  • Based on the 2019 ACS, 24 percent of immigrants who obtained college degrees outside the U.S. accepted a job that did not require a college degree or were unemployed.
  • Recredentialing or relicensing for such individuals is complex, expensive, and time-consuming, due to problems navigating licensing systems, lack of English language proficiency, and lack of sufficient funds.
  • There are nonprofit organizations, state governments, and community colleges (all identified in the report) that have implemented strategies and approaches to address those various challenges.

Fact Check Team: 41% of Americans report peak stress levels, study finds

by JANAE BOWENS | THE NATIONAL DESK


WASHINGTON (TND) — Many Americans are stressed out.

In fact, 41% say they are at peak stress right now, according to a survey conducted by Talker Research for Traditional Medicinals.

The top stressors listed were finances (35%), the economy (28%), physical health (25%), the 2024 presidential election (20%), and other world issues (19%).

The survey of 2,000 adults found that respondents experience stress headaches about three times a week, and that they recalled having brain fog just as often.

Americans noted various signs of stress, including trouble sleeping, irritability, fatigue, and worry.

Seventy-one percent of Americans say self-care routines are an important part of the stress solution, but 45% say incorporating self-care strategies into their daily life is a hurdle.

"With cold and flu season approaching, self-care and stress management are more important than ever," said Kristel Corson, chief marketing officer at Traditional Medicinals. "Half of those surveyed believe that stress is often the main cause of them getting sick, and when asked what season is most stressful, the highest percentage of respondents (26%) said winter, given seasonal changes and the holidays."

Forty-five percent of people have never taken a mental health day or sick day from work solely due to stress.

Gen Z Views on Social Media

Nearly half of Gen Zers wish TikTok, Snapchat, and X were never invented.

The survey, conducted by The Harris Poll, shows that more Gen Zers think social media has had a negative impact on their behavioral health than a positive one. The majority of Gen Z women feel social media harms their emotional health.

Eight in ten have taken steps to limit social media usage at some point including things like unfollowing or muting an account or deleting the app.

The majority of Gen Zers want social media around.

A strong majority believe social media is positive for their social health, and 76% of those who use social media use it as an entertainment source.

Three in five think social media has hurt their generation and society overall.

Sixty-nine percent support a law requiring social media companies to develop a "child-safe" account option for users younger than 18.

Instagram is rolling out new restrictions for its teen users.

"We hope these changes give parents peace of mind about how their children use our apps and provide them with a clear, manageable way to keep tabs on their child’s smartphone use," Nick Clegg, Meta's president of global affairs, said in a post on X.


Is self-compassion a protective factor for addictions? Exploring its effects on alcohol and gambling-related problems using a self-compensation model framework among a Chinese adult sample in Hong Kong, China

  • Published: 18 September 2024


  • Hong Mian Yang 1 , 2 ,
  • Lawrence Hoc Nang Fong 3 , 4 ,
  • Hui Zhou 1 , 3 ,
  • Robin Chark 3 , 4 ,
  • Davis Ka Chio Fong 3 , 4 ,
  • Bryant P. H. Hui 5 &
  • Anise M. S. Wu   ORCID: orcid.org/0000-0001-8174-6581 1 , 3  


Self-compassion has been generally recognized as a protective psychological factor against mental illnesses. However, its protective value against addictions remained unclear. To address this research gap, the current study explores the associations between self-compassion and both alcohol use disorder [AUD] and gambling disorder [GD] tendencies under the framework of self-compensation model of addiction. Data from a convenience sample of 682 adult past-year gamblers and drinkers were collected via an online survey in Hong Kong, China. Results of a series of mediation, moderation, and moderated mediation analyses showed a double-edged effect of self-compassion on addiction: the mindfulness facet of self-compassion was associated with lower levels of AUD and GD tendencies through suppressing self-compensation motivation for drinking and gambling, whereas its self-kindness facet was positively associated with both AUD and GD tendencies via increased self-compensation motivations among participants under high stress. Common humanity was not related to risk for AUD or GD. These findings indicate the conditional risk-enhancing effects of self-kindness may counteract the protective effects of mindfulness against both substance and behavioral addictions. Future addiction prevention programs may monitor participants’ stress, as well as self-kindness levels, and focus on cultivating mindfulness, especially when utilizing self-compassion-based interventions.
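The abstract above refers to regression-based mediation, moderation, and moderated mediation analyses. As a purely illustrative sketch (not the authors' code), the snippet below bootstraps the indirect effect in a simple mediation model on simulated data; the variable names only echo the constructs mentioned in the abstract.

```python
# Illustrative sketch of a bootstrapped indirect effect (simple mediation)
# on simulated data. This is not the study's analysis or data.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)                       # e.g., a self-compassion facet
m = 0.4 * x + rng.normal(size=n)             # e.g., self-compensation motivation
y = -0.3 * m + 0.1 * x + rng.normal(size=n)  # e.g., addiction tendency

def path_a(idx):
    # Regress m on x: the a path.
    X = np.column_stack([np.ones(len(idx)), x[idx]])
    return np.linalg.lstsq(X, m[idx], rcond=None)[0][1]

def path_b(idx):
    # Regress y on m, controlling for x: the b path.
    X = np.column_stack([np.ones(len(idx)), m[idx], x[idx]])
    return np.linalg.lstsq(X, y[idx], rcond=None)[0][1]

def indirect_effect(idx):
    return path_a(idx) * path_b(idx)

# Percentile bootstrap of the indirect effect.
boot = [indirect_effect(rng.integers(0, n, n)) for _ in range(2000)]
low, high = np.percentile(boot, [2.5, 97.5])
print(f"Bootstrapped indirect effect, 95% CI: [{low:.3f}, {high:.3f}]")
```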


Data availability

The dataset generated and analyzed during the current study is available at https://osf.io/hz6yq/ .


Funding

The research was supported by the research grants of the University of Macau [grant numbers: MYRG2022-00130-FSS, CRG2020-00001-ICI, and MYRG-CRG2022-00003-FSS-ICI].

Author information

Authors and Affiliations

Department of Psychology, Faculty of Social Sciences, University of Macau, Macao, China

Hong Mian Yang, Hui Zhou & Anise M. S. Wu

Faculty of Health and Wellness, City University of Macau, Macao, China

Hong Mian Yang

Center for Cognitive and Brain Sciences, Institute of Collaborative Innovation, University of Macau, Macao, China

Lawrence Hoc Nang Fong, Hui Zhou, Robin Chark, Davis Ka Chio Fong & Anise M. S. Wu

Department of Integrated Resort and Tourism Management, Faculty of Business Administration, University of Macau, Macao, China

Lawrence Hoc Nang Fong, Robin Chark & Davis Ka Chio Fong

Department of Applied Social Sciences, The Hong Kong Polytechnic University, Hong Kong, China

Bryant P. H. Hui


Contributions

HMY Conceptualization, Methodology, Data analysis, Finding interpretation, Writing-Original draft, Writing-Reviewing and Editing. LHNF Writing-Reviewing and Editing. HZ Writing-Reviewing and Editing. RC Writing-Reviewing and Editing. DKCF Writing-Reviewing and Editing. BPHH Writing-Reviewing and Editing. AMSW Funding acquisition, Supervision, Coordination, and Writing-Reviewing and Editing. All authors contributed to and approved the final manuscript.

Corresponding author

Correspondence to Anise M. S. Wu .

Ethics declarations

Ethics approval

This research was approved by the Research Ethics Committee of the corresponding author's affiliated department at University of Macau.

Competing interests

The authors report no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Yang, H.M., Fong, L.H.N., Zhou, H. et al. Is self-compassion a protective factor for addictions? Exploring its effects on alcohol and gambling-related problems using a self-compensation model framework among a Chinese adult sample in Hong Kong, China. Curr Psychol (2024). https://doi.org/10.1007/s12144-024-06659-1


Accepted: 03 September 2024

Published: 18 September 2024

DOI: https://doi.org/10.1007/s12144-024-06659-1


Keywords

  • Self-kindness
  • Mindfulness
  • Self-compensation motivation
  • Substance use
  • Perceived stress
  • Behavioral addiction


Related Articles and Resources

  1. Lies, Damned Lies, and Survey Self-Reports? Identity as a Cause of Measurement Bias

    METHODS. In February 2014, a random sample of 1,200 undergraduates at a large, public Midwestern university received an email invitation to participate in a two part study—a brief web survey followed by a five-day texting component—and contained a link to the first assessment. Respondents were promised a $50 incentive upon completion of the study in its entirety.

  2. Bias in Self-Reports: An Initial Elevation Phenomenon

    His research has focused on self-reported affect, mood, emotions, and well-being, and the biases involved in these measures. Emir Efendić is an assistant professor at the School of Business and Economics in Maastricht University in the Netherlands.

  3. (PDF) The Promise and Pitfalls of Self-report: Development, research

    Therefore, gaining insights into how students engage in the learning process is crucial. To date, empirical research has predominantly relied on self-report questionnaires for this (Catrysse et al ...

  4. Subjective data, objective data and the role of bias in predictive

    For decades, self-report measures based on questionnaires have been widely used in educational research to study implicit and complex constructs such as motivation, emotion, cognitive and metacognitive learning strategies. However, the existence of potential biases in such self-report instruments might cast doubts on the validity of the measured constructs. The emergence of trace data from ...

  5. The Problem of the Self-report in Survey Research

    The Problem of the Self-report in Survey Research: Working Paper. David A. Northrup. Institute for Social Research, York University, 1997 - Confidential communications - 66 pages. Bibliographic information. Title: The Problem of the Self-report in Survey Research: Working Paper: Author:

  6. PDF The Future of Survey Self-report: An experiment contrasting Likert, VAS

    Self-report is a fundamental research tool for the social sciences. Despite quantitative surveys being the ... self-report, surveys also receive the most criticism. These censures generally focus on two critical ... problem. Likert format surveys (Voutilainen, et al., 2016) and the survey statements themselves (Austin ...

  7. Understanding and Evaluating Survey Research

    Survey research is defined as "the collection of information from a sample of individuals through their responses to questions" (Check & Schutt, 2012, p. 160). This type of research allows for a variety of methods to recruit participants, collect data, and utilize various methods of instrumentation. Survey research can use quantitative research ...

  8. The Promise and Pitfalls of Self-report

    As a prelude to this special issue on the promise and pitfalls of self-report, this article addresses three issues critical to its current and future use. The development of self-report is framed in Vertical (improvement) and Horizontal (diversification) terms, making clear the role of both paths for continued innovation. The ongoing centrality of research design and analysis in ensuring that ...

  9. Self‐reported data in institutional research: Review and

    Higher education scholars and institutional researchers rely heavily on self-reported survey data in their work. This chapter explores problems associated with self-reports and provides questions and recommendations for their use.

  10. Large studies reveal how reference bias limits policy applications of

    As part of a larger survey administered by Character Lab, students completed a self-report questionnaire of conscientiousness (the tendency to be organized, responsible, and hardworking 57) as ...

  11. Information bias in health research: definition, pitfalls, and

    The issue of self-reporting bias represents a key problem in the assessment of most observational (such as cross-sectional or comparative, eg, case-control or cohort) research study designs, although it can still affect experimental studies. ... the questions asked may concern private or sensitive topics, such as self-report of dietary intake ...

  12. Measuring the Reliability of Self-Reported Behavior

    Measuring the Reliability of Self-Reported Behavior. By Chuck Dinerstein, MD, MBA — Oct 09, 2019. The use of self-reported behavior has been an Achilles heel of sorts, regarding the certainty of research outcomes. A new study shows not only that "self-reports" may be incorrect, but the degree of uncertainty introduced by them varies with the ...

  13. The Science of Self-Report

    The issue of self-report as a primary tool in research and as an unavoidable component in health care is of central concern to medical and social science researchers and medical and psychology practitioners, and many other scientists. Drawing on the expertise of disciplines ranging from anthropology to sociology, the conference's 32 speakers ...

  14. On the use of self-reports in marketing research: insights about

    Self-report data are regularly used in marketing research when consumer perceptions are central to understanding consumer responses to marketing efforts. Self-report data are convenient and cost-effective. A widely known response bias that is inherent to self-report data and illuminated by daily diary data is a tendency of the first report by study participants to be more extreme relative to ...

  15. The Science of Self-report

    This book presents cutting-edge research on optimal methods for obtaining self-reported information for use in the evaluation of scientific hypothesis, in therapeutic interventions, and in the development of prognostic indicators. ALTERNATE BLURB: Self-reports constitute critically important data for research and practice in many fields.

  16. Identifying and Addressing Response Errors in Self-Report Surveys

    A second screening problem in self-report offending surveys is that respondents often are asked to do a number of complex cognitive tasks rather than focusing on the single task of searching their memory for potentially eligible events. ... Footnote 11 While research in the self-report offending tradition established the superiority of ...

  17. Self-Report Bias in Estimating Cross-Sectional and Treatment Effects

    Spurious observed effect of a weight loss program that is ineffective at changing true weight but affects the self-report bias. With initial underreporting of the true weight, an increase in the bias (more underreporting) leads to a spurious decrease in observed weight (left panel).Similarly, a decrease in the bias (less underreporting) leads to a spurious increase in observed weight (right panel)

  18. The consequences of self‐reporting biases: Evidence from the crash

    Relying on firms to self-report information is an information-gathering mechanism that often results in biased measures due to the incentives of the reporting firms. What is less commonly understood is that using self-reported information for decision-making results in endogenous selection bias, which creates spurious associations between the ...

  19. The Limitations of Self-Report Measures of Non-cognitive Skills

    Colleagues from Harvard, MIT, and the University of Pennsylvania and I used self-report surveys to gather information on non-cognitive skills from more than 1,300 eighth-grade students across 32 ...

  20. Measuring bias in self-reported data

    1 Introduction. In this paper, we demonstrate the potential of a common econometric tool, stochastic frontier estimation (SFE), to measure response bias and its covariates in self-reported data. We illustrate the approach using self-reported measures of parenting behaviours before and after a family intervention.

  21. The Use of Self-Report Data in Psychology

    In psychology, a self-report is any test, measure, or survey that relies on an individual's own report of their symptoms, behaviors, beliefs, or attitudes. Self-report data is gathered typically in paper-and-pencil or electronic format or sometimes through an interview. Self-reporting is commonly used in psychological studies because it can ...

  22. Self-report study

    A self-report study is a type of survey, questionnaire, or poll in which respondents read the question and select a response by themselves without any outside interference. [1] A self-report is any method which involves asking a participant about their feelings, attitudes, beliefs and so on. Examples of self-reports are questionnaires and interviews; self-reports are often used as a way of ...

  23. Limitations of Self-report Delinquency Surveys

    report delinquency surveys. Self-report surveys are widely used by delinquency researchers, making it critical that readers of their work understand the problems associated with this technique. Our experience has revealed, however, that students do not find learning about the problems associated with self-report delinquency surveys to be a ...

  24. Constructing and Evaluating Self-Report Measures

    A self-report (a.k.a. survey) is a measure where the respondent supplies information about him or herself. Self-reports are important in medical research because some variables (e.g., attitudes, beliefs, self-judged ability) only can be assessed from information directly furnished by the patient or other subject.

  25. The scary truth about how far behind American kids have fallen

    The report — the group's third annual analysis of the "state of the American student" — combines test scores and academic research with parent interviews to paint a picture of the challenges facing public schools and the families they serve. ... with skills missed in early grades causing bigger and bigger problems as kids get older ...

  26. Bridging the Gap for New Americans: Final Report

    This report, prepared in response to the Bridging the Gap for New Americans Act (Pub. L. No. 117-210, enacted in October 2022) focuses on immigrants and refugees who are lawfully present in the U.S., arrived during the 5 years prior to the law, and have occupational credentials or academic degrees obtained outside the United States. The report explores the size of the relevant population ...

  27. Fact Check Team: 41% of Americans report peak stress levels, study finds

    WASHINGTON (TND) — Many Americans are stressed out. In fact, 41% say they are at peak stress right now, according to a Talker Research for Traditional Medicinals survey.. The top stressors ...

  28. Self-medication among general population in the European Union

    Self-medication (SM) forms an important part of public health strategy. Nonetheless, little research has been performed to understand the current state of self-medication in the European Union (EU). Utilizing data from the third wave of the European Health Interview Surveys, this study finds an estimated SM prevalence of 34.3% in the EU (95%CI = 34.1-34.5%; n = 255,758). SM prevalence, as well ...

  29. Is self-compassion a protective factor for addictions? Exploring its

    Self-compassion has been generally recognized as a protective psychological factor against mental illnesses. However, its protective value against addictions remained unclear. To address this research gap, the current study explores the associations between self-compassion and both alcohol use disorder [AUD] and gambling disorder [GD] tendencies under the framework of self-compensation model ...