How to use and assess qualitative research methods

Loraine Busetto

1 Department of Neurology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120 Heidelberg, Germany

Wolfgang Wick

2 Clinical Cooperation Unit Neuro-Oncology, German Cancer Research Center, Heidelberg, Germany

Christoph Gumbinger


This paper aims to provide an overview of the use and assessment of qualitative research methods in the health sciences. Qualitative research can be defined as the study of the nature of phenomena and is especially appropriate for answering questions of why something is (not) observed, assessing complex multi-component interventions, and focussing on intervention improvement. The most common methods of data collection are document study, (non-) participant observations, semi-structured interviews and focus groups. For data analysis, field-notes and audio-recordings are transcribed into protocols and transcripts, and coded using qualitative data management software. Criteria such as checklists, reflexivity, sampling strategies, piloting, co-coding, member-checking and stakeholder involvement can be used to enhance and assess the quality of the research conducted. Using qualitative in addition to quantitative designs will equip us with better tools to address a greater range of research problems, and to fill in blind spots in current neurological research and practice.

The aim of this paper is to provide an overview of qualitative research methods, including hands-on information on how they can be used, reported and assessed. This article is intended for beginning qualitative researchers in the health sciences as well as experienced quantitative researchers who wish to broaden their understanding of qualitative research.

What is qualitative research?

Qualitative research is defined as “the study of the nature of phenomena”, including “their quality, different manifestations, the context in which they appear or the perspectives from which they can be perceived”, but excluding “their range, frequency and place in an objectively determined chain of cause and effect” [ 1 ]. This formal definition can be complemented with a more pragmatic rule of thumb: qualitative research generally includes data in the form of words rather than numbers [ 2 ].

Why conduct qualitative research?

Because some research questions cannot be answered using (only) quantitative methods. For example, one Australian study addressed the issue of why patients from Aboriginal communities often present late or not at all to specialist services offered by tertiary care hospitals. Using qualitative interviews with patients and staff, it found one of the most significant access barriers to be transportation problems, including some towns and communities simply not having a bus service to the hospital [ 3 ]. A quantitative study could have measured the number of patients over time or even looked at possible explanatory factors – but only those previously known or suspected to be of relevance. To discover reasons for observed patterns, especially the invisible or surprising ones, qualitative designs are needed.

While qualitative research is common in other fields, it is still relatively underrepresented in health services research. The latter field is more traditionally rooted in the evidence-based-medicine paradigm, as seen in "research that involves testing the effectiveness of various strategies to achieve changes in clinical practice, preferably applying randomised controlled trial study designs (...)" [ 4 ]. This focus on quantitative research and specifically randomised controlled trials (RCT) is visible in the idea of a hierarchy of research evidence which assumes that some research designs are objectively better than others, and that choosing a "lesser" design is only acceptable when the better ones are not practically or ethically feasible [ 5 , 6 ]. Others, however, argue that an objective hierarchy does not exist, and that, instead, the research design and methods should be chosen to fit the specific research question at hand – "questions before methods" [ 2 , 7 – 9 ]. This means that even when an RCT is possible, some research problems require a different design that is better suited to addressing them. Arguing in JAMA, Berwick uses the example of rapid response teams in hospitals, which he describes as "a complex, multicomponent intervention – essentially a process of social change" susceptible to a range of different context factors including leadership or organisation history. According to him, "[in] such complex terrain, the RCT is an impoverished way to learn. Critics who use it as a truth standard in this context are incorrect" [ 8 ]. Instead of limiting oneself to RCTs, Berwick recommends embracing a wider range of methods, including qualitative ones, which for "these specific applications, (...) are not compromises in learning how to improve; they are superior" [ 8 ].

Research problems that can be approached particularly well using qualitative methods include assessing complex multi-component interventions or systems (of change), addressing questions beyond “what works”, towards “what works for whom when, how and why”, and focussing on intervention improvement rather than accreditation [ 7 , 9 – 12 ]. Using qualitative methods can also help shed light on the “softer” side of medical treatment. For example, while quantitative trials can measure the costs and benefits of neuro-oncological treatment in terms of survival rates or adverse effects, qualitative research can help provide a better understanding of patient or caregiver stress, visibility of illness or out-of-pocket expenses.

How to conduct qualitative research?

Given that qualitative research is characterised by flexibility, openness and responsivity to context, the steps of data collection and analysis are not as separate and consecutive as they tend to be in quantitative research [ 13 , 14 ]. As Fossey puts it: “sampling, data collection, analysis and interpretation are related to each other in a cyclical (iterative) manner, rather than following one after another in a stepwise approach” [ 15 ]. The researcher can make educated decisions with regard to the choice of method, how they are implemented, and to which and how many units they are applied [ 13 ]. As shown in Fig. 1, this can involve several back-and-forth steps between data collection and analysis where new insights and experiences can lead to adaptation and expansion of the original plan. Some insights may also necessitate a revision of the research question and/or the research design as a whole. The process ends when saturation is achieved, i.e. when no relevant new information can be found (see also below: sampling and saturation). For reasons of transparency, it is essential for all decisions as well as the underlying reasoning to be well-documented.

Fig. 1: Iterative research process

While it is not always explicitly addressed, qualitative methods reflect a different underlying research paradigm than quantitative research (e.g. constructivism or interpretivism as opposed to positivism). The choice of methods can be based on the respective underlying substantive theory or theoretical framework used by the researcher [ 2 ].

Data collection

The methods of qualitative data collection most commonly used in health research are document study, observations, semi-structured interviews and focus groups [ 1 , 14 , 16 , 17 ].

Document study

Document study (also called document analysis) refers to the review by the researcher of written materials [ 14 ]. These can include personal and non-personal documents such as archives, annual reports, guidelines, policy documents, diaries or letters.

Observations

Observations are particularly useful to gain insights into a certain setting and actual behaviour – as opposed to reported behaviour or opinions [ 13 ]. Qualitative observations can be either participant or non-participant in nature. In participant observations, the observer is part of the observed setting, for example a nurse working in an intensive care unit [ 18 ]. In non-participant observations, the observer is “on the outside looking in”, i.e. present in but not part of the situation, trying not to influence the setting by their presence. Observations can be planned (e.g. for 3 h during the day or night shift) or ad hoc (e.g. as soon as a stroke patient arrives at the emergency room). During the observation, the observer takes notes on everything or certain pre-determined parts of what is happening around them, for example focusing on physician-patient interactions or communication between different professional groups. Written notes can be taken during or after the observations, depending on feasibility (which is usually lower during participant observations) and acceptability (e.g. when the observer is perceived to be judging the observed). Afterwards, these field notes are transcribed into observation protocols. If more than one observer was involved, field notes are taken independently, but notes can be consolidated into one protocol after discussions. Advantages of conducting observations include minimising the distance between the researcher and the researched, the potential discovery of topics that the researcher did not realise were relevant and gaining deeper insights into the real-world dimensions of the research problem at hand [ 18 ].

Semi-structured interviews

Hijmans & Kuyper describe qualitative interviews as “an exchange with an informal character, a conversation with a goal” [ 19 ]. Interviews are used to gain insights into a person’s subjective experiences, opinions and motivations – as opposed to facts or behaviours [ 13 ]. Interviews can be distinguished by the degree to which they are structured (i.e. a questionnaire), open (e.g. free conversation or autobiographical interviews) or semi-structured [ 2 , 13 ]. Semi-structured interviews are characterized by open-ended questions and the use of an interview guide (or topic guide/list) in which the broad areas of interest, sometimes including sub-questions, are defined [ 19 ]. The pre-defined topics in the interview guide can be derived from the literature, previous research or a preliminary method of data collection, e.g. document study or observations. The topic list is usually adapted and improved at the start of the data collection process as the interviewer learns more about the field [ 20 ]. Across interviews the focus on the different (blocks of) questions may differ and some questions may be skipped altogether (e.g. if the interviewee is not able or willing to answer the questions or for concerns about the total length of the interview) [ 20 ]. Qualitative interviews are usually not conducted in written format as this impedes the interactive component of the method [ 20 ]. In comparison to written surveys, qualitative interviews have the advantage of being interactive and allowing for unexpected topics to emerge and to be taken up by the researcher. This can also help overcome a provider- or researcher-centred bias often found in written surveys, which, by nature, can only measure what is already known or expected to be of relevance to the researcher. Interviews can be audio- or video-taped, but sometimes it is only feasible or acceptable for the interviewer to take written notes [ 14 , 16 , 20 ].
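As a purely illustrative sketch (our own, not part of the original study materials), a semi-structured interview guide can be thought of as a small, flexible structure: broad topics, one open-ended main question each, and optional probes. The topics and wording below are hypothetical and would in practice be derived from the literature, preliminary data collection and piloting.

# Hypothetical topic guide, loosely oriented towards the EVT example discussed below.
interview_guide = [
    {
        "topic": "Arrival and triage of stroke patients",
        "main_question": "Can you walk me through what happens when a stroke "
                         "patient arrives at the emergency room?",
        "probes": [
            "Who is usually involved at this stage?",
            "What tends to slow this step down?",
        ],
    },
    {
        "topic": "Awareness of and views on the current SOPs",
        "main_question": "How do the written procedures fit with how you actually work?",
        "probes": ["Can you give a recent example?"],
    },
]

def print_guide(guide):
    """Print the guide so the interviewer can follow it flexibly during the conversation."""
    for block in guide:
        print(f"Topic: {block['topic']}")
        print(f"  Main question: {block['main_question']}")
        for probe in block["probes"]:
            print(f"  Probe: {probe}")

print_guide(interview_guide)

Such a guide is deliberately short and open-ended; as described above, it is usually revised as the interviewer learns more about the field.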

Focus groups

Focus groups are group interviews to explore participants’ expertise and experiences, including explorations of how and why people behave in certain ways [ 1 ]. Focus groups usually consist of 6–8 people and are led by an experienced moderator following a topic guide or “script” [ 21 ]. They can involve an observer who takes note of the non-verbal aspects of the situation, possibly using an observation guide [ 21 ]. Depending on researchers’ and participants’ preferences, the discussions can be audio- or video-taped and transcribed afterwards [ 21 ]. Focus groups are useful for bringing together homogeneous (to a lesser extent heterogeneous) groups of participants with relevant expertise and experience on a given topic on which they can share detailed information [ 21 ]. Focus groups are a relatively easy, fast and inexpensive method to gain access to information on interactions in a given group, i.e. “the sharing and comparing” among participants [ 21 ]. Disadvantages include less control over the process and a lesser extent to which each individual may participate. Moreover, focus group moderators need experience, as do those tasked with the analysis of the resulting data. Focus groups can be less appropriate for discussing sensitive topics that participants might be reluctant to disclose in a group setting [ 13 ]. Moreover, attention must be paid to the emergence of “groupthink” as well as possible power dynamics within the group, e.g. when patients are awed or intimidated by health professionals.

Choosing the “right” method

As explained above, the school of thought underlying qualitative research assumes no objective hierarchy of evidence and methods. This means that each choice of single or combined methods has to be based on the research question that needs to be answered and a critical assessment with regard to whether or to what extent the chosen method can accomplish this – i.e. the “fit” between question and method [ 14 ]. It is necessary for these decisions to be documented when they are being made, and to be critically discussed when reporting methods and results.

Let us assume that our research aim is to examine the (clinical) processes around acute endovascular treatment (EVT), from the patient’s arrival at the emergency room to recanalization, with the aim to identify possible causes for delay and/or other causes for sub-optimal treatment outcome. As a first step, we could conduct a document study of the relevant standard operating procedures (SOPs) for this phase of care – are they up-to-date and in line with current guidelines? Do they contain any mistakes, irregularities or uncertainties that could cause delays or other problems? Regardless of the answers to these questions, the results have to be interpreted based on what they are: a written outline of what care processes in this hospital should look like. If we want to know what they actually look like in practice, we can conduct observations of the processes described in the SOPs. These results can (and should) be analysed in themselves, but also in comparison to the results of the document analysis, especially as regards relevant discrepancies. Do the SOPs outline specific tests for which no equipment can be observed or tasks to be performed by specialized nurses who are not present during the observation? It might also be possible that the written SOP is outdated, but the actual care provided is in line with current best practice. In order to find out why these discrepancies exist, it can be useful to conduct interviews. Are the physicians simply not aware of the SOPs (because their existence is limited to the hospital’s intranet) or do they actively disagree with them or does the infrastructure make it impossible to provide the care as described? Another rationale for adding interviews is that some situations (or all of their possible variations for different patient groups or the day, night or weekend shift) cannot practically or ethically be observed. In this case, it is possible to ask those involved to report on their actions – being aware that this is not the same as the actual observation. A senior physician’s or hospital manager’s description of certain situations might differ from a nurse’s or junior physician’s one, maybe because they intentionally misrepresent facts or maybe because different aspects of the process are visible or important to them. In some cases, it can also be relevant to consider to whom the interviewee is disclosing this information – someone they trust, someone they are otherwise not connected to, or someone they suspect or are aware of being in a potentially “dangerous” power relationship to them. Lastly, a focus group could be conducted with representatives of the relevant professional groups to explore how and why exactly they provide care around EVT. The discussion might reveal discrepancies (between SOPs and actual care or between different physicians) and motivations to the researchers as well as to the focus group members that they might not have been aware of themselves. For the focus group to deliver relevant information, attention has to be paid to its composition and conduct, for example, to make sure that all participants feel safe to disclose sensitive or potentially problematic information or that the discussion is not dominated by (senior) physicians only. The resulting combination of data collection methods is shown in Fig.  2 .

Fig. 2: Possible combination of data collection methods

Attributions for icons: “Book” by Serhii Smirnov, “Interview” by Adrien Coquet, FR, “Magnifying Glass” by anggun, ID, “Business communication” by Vectors Market; all from the Noun Project

The combination of multiple data sources as described for this example can be referred to as “triangulation”, in which multiple measurements are carried out from different angles to achieve a more comprehensive understanding of the phenomenon under study [ 22 , 23 ].

Data analysis

To analyse the data collected through observations, interviews and focus groups, these need to be transcribed into protocols and transcripts (see Fig. 3). Interviews and focus groups can be transcribed verbatim, with or without annotations for behaviour (e.g. laughing, crying, pausing) and with or without phonetic transcription of dialects and filler words, depending on what is expected or known to be relevant for the analysis. In the next step, the protocols and transcripts are coded, that is, marked (or tagged, labelled) with one or more short descriptors of the content of a sentence or paragraph [ 2 , 15 , 23 ]. Jansen describes coding as “connecting the raw data with “theoretical” terms” [ 20 ]. In a more practical sense, coding makes raw data sortable. This makes it possible to extract and examine all segments describing, say, a tele-neurology consultation from multiple data sources (e.g. SOPs, emergency room observations, staff and patient interviews). In a process of synthesis and abstraction, the codes are then grouped, summarised and/or categorised [ 15 , 20 ]. The end product of the coding or analysis process is a descriptive theory of the behavioural pattern under investigation [ 20 ]. The coding process is performed using qualitative data management software, the most common ones being NVivo, MAXQDA and ATLAS.ti. It should be noted that these are data management tools which support the analysis performed by the researcher(s) [ 14 ].
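To illustrate what “making raw data sortable” means in practice, the following minimal sketch (our own illustration, independent of any particular software package) shows coded segments from several hypothetical sources and how all segments carrying a given code can be retrieved for comparison. Sources, texts and codes are invented for this example.

# Coded segments from hypothetical sources; each segment carries one or more codes.
coded_segments = [
    {"source": "SOP_stroke_unit.pdf",
     "text": "Tele-neurology consult is initiated by the ER physician.",
     "codes": ["tele-neurology", "responsibilities"]},
    {"source": "observation_ER_2020-03-02",
     "text": "Nurse waits 12 minutes for the tele-neurology link to be set up.",
     "codes": ["tele-neurology", "delay"]},
    {"source": "interview_nurse_03",
     "text": "The video system is often in use by another department.",
     "codes": ["tele-neurology", "infrastructure"]},
]

def segments_with_code(segments, code):
    """Return all segments tagged with the given code, regardless of source."""
    return [s for s in segments if code in s["codes"]]

for seg in segments_with_code(coded_segments, "tele-neurology"):
    print(f"[{seg['source']}] {seg['text']}")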

Fig. 3: From data collection to data analysis

Attributions for icons: see Fig. 2; also “Speech to text” by Trevor Dsouza, “Field Notes” by Mike O’Brien, US, “Voice Record” by ProSymbols, US, “Inspection” by Made, AU, and “Cloud” by Graphic Tigers; all from the Noun Project

How to report qualitative research?

Protocols of qualitative research can be published separately and in advance of the study results. However, the aim is not the same as in RCT protocols, i.e. to pre-define and set in stone the research questions and primary or secondary endpoints. Rather, it is a way to describe the research methods in detail, which might not be possible in the results paper given journals’ word limits. Qualitative research papers are usually longer than their quantitative counterparts to allow for deep understanding and so-called “thick description”. In the methods section, the focus is on transparency of the methods used, including why, how and by whom they were implemented in the specific study setting, so as to enable a discussion of whether and how this may have influenced data collection, analysis and interpretation. The results section usually starts with a paragraph outlining the main findings, followed by more detailed descriptions of, for example, the commonalities, discrepancies or exceptions per category [ 20 ]. Here it is important to support main findings by relevant quotations, which may add information, context, emphasis or real-life examples [ 20 , 23 ]. It is subject to debate in the field whether it is relevant to state the exact number or percentage of respondents supporting a certain statement (e.g. “Five interviewees expressed negative feelings towards XYZ”) [ 21 ].

How to combine qualitative with quantitative research?

Qualitative methods can be combined with other methods in multi- or mixed methods designs, which “[employ] two or more different methods [ …] within the same study or research program rather than confining the research to one single method” [ 24 ]. Reasons for combining methods can be diverse, including triangulation for corroboration of findings, complementarity for illustration and clarification of results, expansion to extend the breadth and range of the study, explanation of (unexpected) results generated with one method with the help of another, or offsetting the weakness of one method with the strength of another [ 1 , 17 , 24 – 26 ]. The resulting designs can be classified according to when, why and how the different quantitative and/or qualitative data strands are combined. The three most common types of mixed method designs are the convergent parallel design , the explanatory sequential design and the exploratory sequential design. The designs with examples are shown in Fig.  4 .

Fig. 4: Three common mixed methods designs

In the convergent parallel design, a qualitative study is conducted in parallel to and independently of a quantitative study, and the results of both studies are compared and combined at the stage of interpretation of results. Using the above example of EVT provision, this could entail setting up a quantitative EVT registry to measure process times and patient outcomes in parallel to conducting the qualitative research outlined above, and then comparing results. Amongst other things, this would make it possible to assess whether interview respondents’ subjective impressions of patients receiving good care match modified Rankin Scores at follow-up, or whether observed delays in care provision are exceptions or the rule when compared to door-to-needle times as documented in the registry. In the explanatory sequential design, a quantitative study is carried out first, followed by a qualitative study to help explain the results from the quantitative study. This would be an appropriate design if the registry alone had revealed relevant delays in door-to-needle times and the qualitative study would be used to understand where and why these occurred, and how they could be improved. In the exploratory design, the qualitative study is carried out first and its results help inform and build the quantitative study in the next step [ 26 ]. If the qualitative study around EVT provision had shown a high level of dissatisfaction among the staff members involved, a quantitative questionnaire investigating staff satisfaction could be set up in the next step, informed by the qualitative findings on which topics dissatisfaction had been expressed. Amongst other things, the questionnaire design would make it possible to widen the reach of the research to more respondents from different (types of) hospitals, regions, countries or settings, and to conduct sub-group analyses for different professional groups.
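The ordering logic of the three designs can also be summarised schematically. The sketch below is only an illustration of when each strand runs and which strand informs the other; the functions are placeholders standing in for entire studies, not real analyses.

def interpret_together(*strands):
    # Placeholder for the joint interpretation step of a mixed methods study.
    return {"interpretation": strands}

def quantitative_study(informed_by=None):
    # Placeholder for an entire quantitative strand, e.g. an EVT registry analysis.
    return {"strand": "quantitative", "informed_by": informed_by}

def qualitative_study(informed_by=None):
    # Placeholder for an entire qualitative strand, e.g. interviews and observations.
    return {"strand": "qualitative", "informed_by": informed_by}

def convergent_parallel():
    # Both strands run independently; results meet only at interpretation,
    # e.g. registry door-to-needle times compared with interview impressions.
    return interpret_together(quantitative_study(), qualitative_study())

def explanatory_sequential():
    # Quantitative strand first; a qualitative follow-up explains its results,
    # e.g. interviews on where and why delays occurred.
    quanti = quantitative_study()
    return interpret_together(quanti, qualitative_study(informed_by=quanti))

def exploratory_sequential():
    # Qualitative strand first; its findings shape the quantitative instrument,
    # e.g. a staff-satisfaction questionnaire built from interview topics.
    quali = qualitative_study()
    return interpret_together(quali, quantitative_study(informed_by=quali))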

How to assess qualitative research?

A variety of assessment criteria and lists have been developed for qualitative research, ranging in their focus and comprehensiveness [ 14 , 17 , 27 ]. However, none of these has been elevated to the “gold standard” in the field. In the following, we therefore focus on a set of commonly used assessment criteria that, from a practical standpoint, a researcher can look for when assessing a qualitative research report or paper.

Checklists

Assessors should check the authors’ use of and adherence to the relevant reporting checklists (e.g. Standards for Reporting Qualitative Research (SRQR)) to make sure all items that are relevant for this type of research are addressed [ 23 , 28 ]. Discussions of quantitative measures in addition to or instead of these qualitative measures can be a sign of lower quality of the research (paper). Providing and adhering to a checklist for qualitative research contributes to an important quality criterion for qualitative research, namely transparency [ 15 , 17 , 23 ].

Reflexivity

While methodological transparency and complete reporting are relevant for all types of research, some additional criteria must be taken into account for qualitative research. This includes what is called reflexivity, i.e. sensitivity to the relationship between the researcher and the researched, including how contact was established and maintained, or the background and experience of the researcher(s) involved in data collection and analysis. Depending on the research question and population to be researched, this can be limited to professional experience, but it may also include gender, age or ethnicity [ 17 , 27 ]. These details are relevant because in qualitative research, as opposed to quantitative research, the researcher as a person cannot be isolated from the research process [ 23 ]. It may influence the conversation when an interviewed patient speaks to an interviewer who is a physician, or when an interviewee is asked to discuss a gynaecological procedure with a male interviewer, and therefore the reader must be made aware of these details [ 19 ].

Sampling and saturation

The aim of qualitative sampling is for all variants of the objects of observation that are deemed relevant for the study to be present in the sample, “to see the issue and its meanings from as many angles as possible” [ 1 , 16 , 19 , 20 , 27 ], and to ensure “information-richness” [ 15 ]. An iterative sampling approach is advised, in which data collection (e.g. five interviews) is followed by data analysis, followed by more data collection to find variants that are lacking in the current sample. This process continues until no new (relevant) information can be found and further sampling becomes redundant – which is called saturation [ 1 , 15 ]. In other words: qualitative data collection finds its end point not a priori, but when the research team determines that saturation has been reached [ 29 , 30 ].
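The sampling logic described above can be sketched schematically as a loop that stops once an additional batch of data yields no relevant new information. This is only an illustration of the logic: in reality the judgement that saturation has been reached is made by the research team, not by an algorithm, and the functions and batch size below are hypothetical placeholders.

def sample_until_saturated(collect_batch, analyse, batch_size=5, max_batches=20):
    """Iterate between collection and analysis until a batch adds no new codes."""
    known_codes = set()
    for _ in range(max_batches):
        batch = collect_batch(batch_size)        # e.g. five more interviews
        new_codes = analyse(batch) - known_codes  # analyse() returns the codes found
        if not new_codes:                         # no relevant new information
            break                                 # saturation reached
        known_codes |= new_codes                  # keep sampling for missing variants
    return known_codes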

This is also the reason why most qualitative studies use deliberate instead of random sampling strategies. This is generally referred to as “purposive sampling”, in which researchers pre-define which types of participants or cases they need to include so as to cover all variations that are expected to be of relevance, based on the literature, previous experience or theory (i.e. theoretical sampling) [ 14 , 20 ]. Other types of purposive sampling include (but are not limited to) maximum variation sampling, critical case sampling or extreme or deviant case sampling [ 2 ]. In the above EVT example, a purposive sample could include all relevant professional groups and/or all relevant stakeholders (patients, relatives) and/or all relevant times of observation (day, night and weekend shift).
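For the EVT example, such a purposive sampling frame could be drafted as a simple matrix of the variants to be covered. The groups and shifts below are illustrative assumptions, not prescriptions from the paper.

# Hypothetical purposive sampling frame: combinations of professional group and
# shift that should be covered so that all expected variants appear in the sample.
from itertools import product

professional_groups = ["neurologist", "neuroradiologist", "ER nurse", "paramedic"]
shifts = ["day", "night", "weekend"]

sampling_frame = [
    {"group": group, "shift": shift, "covered": False}
    for group, shift in product(professional_groups, shifts)
]

def still_missing(frame):
    """List the planned variants not yet represented in the sample."""
    return [(cell["group"], cell["shift"]) for cell in frame if not cell["covered"]]

print(still_missing(sampling_frame)[:3])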

Assessors of qualitative research should check whether the considerations underlying the sampling strategy were sound and whether or how researchers tried to adapt and improve their strategies in stepwise or cyclical approaches between data collection and analysis to achieve saturation [ 14 ].

Piloting

Good qualitative research is iterative in nature, i.e. it goes back and forth between data collection and analysis, revising and improving the approach where necessary. One example of this is the use of pilot interviews, where different aspects of the interview (especially the interview guide, but also, for example, the site of the interview or whether the interview can be audio-recorded) are tested with a small number of respondents, evaluated and revised [ 19 ]. In doing so, the interviewer learns which wording or types of questions work best, or which is the best length of an interview with patients who have trouble concentrating for an extended time. Of course, the same reasoning applies to observations or focus groups, which can also be piloted.

Co-coding

Ideally, coding should be performed by at least two researchers, especially at the beginning of the coding process when a common approach must be defined, including the establishment of a useful coding list (or tree), and when a common meaning of individual codes must be established [ 23 ]. An initial sub-set or all transcripts can be coded independently by the coders and then compared and consolidated after regular discussions in the research team. This is to make sure that codes are applied consistently to the research data.
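A simple way to organise such consolidation discussions is to list the segments on which the independent coders disagree. The sketch below is a generic illustration with hypothetical codes, not a feature of any particular software.

# Two researchers code the same segments independently; segments where their
# code sets differ are flagged for discussion and consolidation.
codes_researcher_a = {
    "segment_01": {"delay", "tele-neurology"},
    "segment_02": {"responsibilities"},
    "segment_03": {"infrastructure"},
}
codes_researcher_b = {
    "segment_01": {"delay"},
    "segment_02": {"responsibilities"},
    "segment_03": {"infrastructure", "staffing"},
}

to_discuss = {
    seg: (codes_researcher_a[seg], codes_researcher_b[seg])
    for seg in codes_researcher_a
    if codes_researcher_a[seg] != codes_researcher_b[seg]
}

for seg, (a, b) in to_discuss.items():
    print(f"{seg}: researcher A coded {sorted(a)}, researcher B coded {sorted(b)}")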

Member checking

Member checking, also called respondent validation, refers to the practice of checking back with study respondents to see if the research is in line with their views [ 14 , 27 ]. This can happen after data collection or analysis or when first results are available [ 23 ]. For example, interviewees can be provided with (summaries of) their transcripts and asked whether they believe this to be a complete representation of their views or whether they would like to clarify or elaborate on their responses [ 17 ]. Respondents’ feedback on these issues then becomes part of the data collection and analysis [ 27 ].

Stakeholder involvement

In those niches where qualitative approaches have been able to evolve and grow, a new trend has seen the inclusion of patients and their representatives not only as study participants (i.e. “members”, see above) but as consultants to and active participants in the broader research process [ 31 – 33 ]. The underlying assumption is that patients and other stakeholders hold unique perspectives and experiences that add value beyond their own single story, making the research more relevant and beneficial to researchers, study participants and (future) patients alike [ 34 , 35 ]. Using the example of patients on or nearing dialysis, a recent scoping review found that 80% of clinical research did not address the top 10 research priorities identified by patients and caregivers [ 32 , 36 ]. In this sense, the involvement of the relevant stakeholders, especially patients and relatives, is increasingly being seen as a quality indicator in and of itself.

How not to assess qualitative research

The above overview does not include certain items that are routine in assessments of quantitative research. What follows is a non-exhaustive, non-representative, experience-based list of the quantitative criteria often applied to the assessment of qualitative research, as well as an explanation of the limited usefulness of these endeavours.

Protocol adherence

Given the openness and flexibility of qualitative research, it should not be assessed by how well it adheres to pre-determined and fixed strategies – in other words: its rigidity. Instead, the assessor should look for signs of adaptation and refinement based on lessons learned from earlier steps in the research process.

Sample size

For the reasons explained above, qualitative research does not require specific sample sizes, nor does it require that the sample size be determined a priori [ 1 , 14 , 27 , 37 – 39 ]. Sample size can only be a useful quality indicator when related to the research purpose, the chosen methodology and the composition of the sample, i.e. who was included and why.

Randomisation

While some authors argue that randomisation can be used in qualitative research, this is not commonly the case, as neither its feasibility nor its necessity or usefulness has been convincingly established for qualitative research [ 13 , 27 ]. Relevant disadvantages include the negative impact of an overly large sample size as well as the possibility (or probability) of selecting “quiet, uncooperative or inarticulate individuals” [ 17 ]. Qualitative studies do not use control groups, either.

Interrater reliability, variability and other “objectivity checks”

The concept of “interrater reliability” is sometimes used in qualitative research to assess the extent to which the coding approaches of two co-coders overlap. However, it is not clear what this measure tells us about the quality of the analysis [ 23 ]. This means that these scores can be included in qualitative research reports, preferably with some additional information on what the score means for the analysis, but it is not a requirement. Relatedly, separating those who recruited the study participants from those who collected and analysed the data is not in itself relevant for the quality or “objectivity” of qualitative research. Experience even shows that it might be better to have the same person or team perform all of these tasks [ 20 ]. First, when researchers introduce themselves during recruitment, this can enhance trust when the interview takes place days or weeks later with the same researcher. Second, when the audio-recording is transcribed for analysis, the researcher conducting the interviews will usually remember the interviewee and the specific interview situation during data analysis. This might be helpful in providing additional context information for interpretation of data, e.g. on whether something might have been meant as a joke [ 18 ].
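For teams that nevertheless choose to report such an overlap score together with an explanation of what it means for their analysis, percent agreement or Cohen’s kappa over jointly coded segments are common choices. The sketch below is a generic illustration with hypothetical codes, not a procedure prescribed by this paper.

def percent_agreement(codes_a, codes_b):
    """Proportion of segments on which the two coders assigned the same single code."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for the agreement expected from each coder's base rates."""
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    categories = set(codes_a) | set(codes_b)
    p_expected = sum(
        (codes_a.count(c) / n) * (codes_b.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical single-code assignments for five segments by two coders.
coder_1 = ["delay", "delay", "infrastructure", "staffing", "delay"]
coder_2 = ["delay", "staffing", "infrastructure", "staffing", "delay"]
print(percent_agreement(coder_1, coder_2), round(cohens_kappa(coder_1, coder_2), 2))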

Not being quantitative research

Being qualitative research instead of quantitative research should not be used as an assessment criterion if it is applied irrespective of the research problem at hand. Similarly, qualitative research should not be required to be combined with quantitative research per se – unless mixed methods research is judged as inherently better than single-method research. In this case, the same criterion should be applied to quantitative studies without a qualitative component.

The main take-away points of this paper are summarised in Table 1. We aimed to show that, if conducted well, qualitative research can answer specific research questions that cannot be adequately answered using (only) quantitative designs. Seeing qualitative and quantitative methods as equal will help us become more aware and critical of the “fit” between the research problem and our chosen methods: I can conduct an RCT to determine the reasons for transportation delays of acute stroke patients – but should I? It also provides us with a greater range of tools to tackle a greater range of research problems more appropriately and successfully, filling in the blind spots on one half of the methodological spectrum to better address the whole complexity of neurological research and practice.

Table 1: Take-away points

Qualitative research is particularly suited to:

• Assessing complex multi-component interventions or systems (of change)
• What works for whom when, how and why?
• Focussing on intervention improvement

Common data collection methods:

• Document study
• Observations (participant or non-participant)
• Interviews (especially semi-structured)
• Focus groups

Data analysis:

• Transcription of audio-recordings and field notes into transcripts and protocols
• Coding of protocols
• Using qualitative data management software

Mixed methods designs (combinations of quantitative and/or qualitative methods), e.g.:

• Convergent parallel design: quali and quanti in parallel
• Explanatory sequential design: quanti followed by quali
• Exploratory sequential design: quali followed by quanti

Criteria for assessing qualitative research:

• Checklists
• Reflexivity
• Sampling strategies
• Piloting
• Co-coding
• Member checking
• Stakeholder involvement

Criteria not suited to assessing qualitative research:

• Protocol adherence
• Sample size
• Randomisation
• Interrater reliability, variability and other “objectivity checks”
• Not being quantitative research

Acknowledgements

Abbreviations

EVT: Endovascular treatment
RCT: Randomised Controlled Trial
SOP: Standard Operating Procedure
SRQR: Standards for Reporting Qualitative Research

Authors’ contributions

LB drafted the manuscript; WW and CG revised the manuscript; all authors approved the final versions.

Funding

No external funding.

Availability of data and materials

Not applicable.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

  • Open access
  • Published: 27 May 2020

How to use and assess qualitative research methods

  • Loraine Busetto   ORCID: orcid.org/0000-0002-9228-7875 1 ,
  • Wolfgang Wick 1 , 2 &
  • Christoph Gumbinger 1  

Neurological Research and Practice volume  2 , Article number:  14 ( 2020 ) Cite this article

774k Accesses

355 Citations

90 Altmetric

Metrics details

This paper aims to provide an overview of the use and assessment of qualitative research methods in the health sciences. Qualitative research can be defined as the study of the nature of phenomena and is especially appropriate for answering questions of why something is (not) observed, assessing complex multi-component interventions, and focussing on intervention improvement. The most common methods of data collection are document study, (non-) participant observations, semi-structured interviews and focus groups. For data analysis, field-notes and audio-recordings are transcribed into protocols and transcripts, and coded using qualitative data management software. Criteria such as checklists, reflexivity, sampling strategies, piloting, co-coding, member-checking and stakeholder involvement can be used to enhance and assess the quality of the research conducted. Using qualitative in addition to quantitative designs will equip us with better tools to address a greater range of research problems, and to fill in blind spots in current neurological research and practice.

The aim of this paper is to provide an overview of qualitative research methods, including hands-on information on how they can be used, reported and assessed. This article is intended for beginning qualitative researchers in the health sciences as well as experienced quantitative researchers who wish to broaden their understanding of qualitative research.

What is qualitative research?

Qualitative research is defined as “the study of the nature of phenomena”, including “their quality, different manifestations, the context in which they appear or the perspectives from which they can be perceived” , but excluding “their range, frequency and place in an objectively determined chain of cause and effect” [ 1 ]. This formal definition can be complemented with a more pragmatic rule of thumb: qualitative research generally includes data in form of words rather than numbers [ 2 ].

Why conduct qualitative research?

Because some research questions cannot be answered using (only) quantitative methods. For example, one Australian study addressed the issue of why patients from Aboriginal communities often present late or not at all to specialist services offered by tertiary care hospitals. Using qualitative interviews with patients and staff, it found one of the most significant access barriers to be transportation problems, including some towns and communities simply not having a bus service to the hospital [ 3 ]. A quantitative study could have measured the number of patients over time or even looked at possible explanatory factors – but only those previously known or suspected to be of relevance. To discover reasons for observed patterns, especially the invisible or surprising ones, qualitative designs are needed.

While qualitative research is common in other fields, it is still relatively underrepresented in health services research. The latter field is more traditionally rooted in the evidence-based-medicine paradigm, as seen in " research that involves testing the effectiveness of various strategies to achieve changes in clinical practice, preferably applying randomised controlled trial study designs (...) " [ 4 ]. This focus on quantitative research and specifically randomised controlled trials (RCT) is visible in the idea of a hierarchy of research evidence which assumes that some research designs are objectively better than others, and that choosing a "lesser" design is only acceptable when the better ones are not practically or ethically feasible [ 5 , 6 ]. Others, however, argue that an objective hierarchy does not exist, and that, instead, the research design and methods should be chosen to fit the specific research question at hand – "questions before methods" [ 2 , 7 , 8 , 9 ]. This means that even when an RCT is possible, some research problems require a different design that is better suited to addressing them. Arguing in JAMA, Berwick uses the example of rapid response teams in hospitals, which he describes as " a complex, multicomponent intervention – essentially a process of social change" susceptible to a range of different context factors including leadership or organisation history. According to him, "[in] such complex terrain, the RCT is an impoverished way to learn. Critics who use it as a truth standard in this context are incorrect" [ 8 ] . Instead of limiting oneself to RCTs, Berwick recommends embracing a wider range of methods , including qualitative ones, which for "these specific applications, (...) are not compromises in learning how to improve; they are superior" [ 8 ].

Research problems that can be approached particularly well using qualitative methods include assessing complex multi-component interventions or systems (of change), addressing questions beyond “what works”, towards “what works for whom when, how and why”, and focussing on intervention improvement rather than accreditation [ 7 , 9 , 10 , 11 , 12 ]. Using qualitative methods can also help shed light on the “softer” side of medical treatment. For example, while quantitative trials can measure the costs and benefits of neuro-oncological treatment in terms of survival rates or adverse effects, qualitative research can help provide a better understanding of patient or caregiver stress, visibility of illness or out-of-pocket expenses.

How to conduct qualitative research?

Given that qualitative research is characterised by flexibility, openness and responsivity to context, the steps of data collection and analysis are not as separate and consecutive as they tend to be in quantitative research [ 13 , 14 ]. As Fossey puts it : “sampling, data collection, analysis and interpretation are related to each other in a cyclical (iterative) manner, rather than following one after another in a stepwise approach” [ 15 ]. The researcher can make educated decisions with regard to the choice of method, how they are implemented, and to which and how many units they are applied [ 13 ]. As shown in Fig.  1 , this can involve several back-and-forth steps between data collection and analysis where new insights and experiences can lead to adaption and expansion of the original plan. Some insights may also necessitate a revision of the research question and/or the research design as a whole. The process ends when saturation is achieved, i.e. when no relevant new information can be found (see also below: sampling and saturation). For reasons of transparency, it is essential for all decisions as well as the underlying reasoning to be well-documented.

figure 1

Iterative research process

While it is not always explicitly addressed, qualitative methods reflect a different underlying research paradigm than quantitative research (e.g. constructivism or interpretivism as opposed to positivism). The choice of methods can be based on the respective underlying substantive theory or theoretical framework used by the researcher [ 2 ].

Data collection

The methods of qualitative data collection most commonly used in health research are document study, observations, semi-structured interviews and focus groups [ 1 , 14 , 16 , 17 ].

Document study

Document study (also called document analysis) refers to the review by the researcher of written materials [ 14 ]. These can include personal and non-personal documents such as archives, annual reports, guidelines, policy documents, diaries or letters.

Observations

Observations are particularly useful to gain insights into a certain setting and actual behaviour – as opposed to reported behaviour or opinions [ 13 ]. Qualitative observations can be either participant or non-participant in nature. In participant observations, the observer is part of the observed setting, for example a nurse working in an intensive care unit [ 18 ]. In non-participant observations, the observer is “on the outside looking in”, i.e. present in but not part of the situation, trying not to influence the setting by their presence. Observations can be planned (e.g. for 3 h during the day or night shift) or ad hoc (e.g. as soon as a stroke patient arrives at the emergency room). During the observation, the observer takes notes on everything or certain pre-determined parts of what is happening around them, for example focusing on physician-patient interactions or communication between different professional groups. Written notes can be taken during or after the observations, depending on feasibility (which is usually lower during participant observations) and acceptability (e.g. when the observer is perceived to be judging the observed). Afterwards, these field notes are transcribed into observation protocols. If more than one observer was involved, field notes are taken independently, but notes can be consolidated into one protocol after discussions. Advantages of conducting observations include minimising the distance between the researcher and the researched, the potential discovery of topics that the researcher did not realise were relevant and gaining deeper insights into the real-world dimensions of the research problem at hand [ 18 ].

Semi-structured interviews

Hijmans & Kuyper describe qualitative interviews as “an exchange with an informal character, a conversation with a goal” [ 19 ]. Interviews are used to gain insights into a person’s subjective experiences, opinions and motivations – as opposed to facts or behaviours [ 13 ]. Interviews can be distinguished by the degree to which they are structured (i.e. a questionnaire), open (e.g. free conversation or autobiographical interviews) or semi-structured [ 2 , 13 ]. Semi-structured interviews are characterized by open-ended questions and the use of an interview guide (or topic guide/list) in which the broad areas of interest, sometimes including sub-questions, are defined [ 19 ]. The pre-defined topics in the interview guide can be derived from the literature, previous research or a preliminary method of data collection, e.g. document study or observations. The topic list is usually adapted and improved at the start of the data collection process as the interviewer learns more about the field [ 20 ]. Across interviews the focus on the different (blocks of) questions may differ and some questions may be skipped altogether (e.g. if the interviewee is not able or willing to answer the questions or for concerns about the total length of the interview) [ 20 ]. Qualitative interviews are usually not conducted in written format as it impedes on the interactive component of the method [ 20 ]. In comparison to written surveys, qualitative interviews have the advantage of being interactive and allowing for unexpected topics to emerge and to be taken up by the researcher. This can also help overcome a provider or researcher-centred bias often found in written surveys, which by nature, can only measure what is already known or expected to be of relevance to the researcher. Interviews can be audio- or video-taped; but sometimes it is only feasible or acceptable for the interviewer to take written notes [ 14 , 16 , 20 ].

Focus groups

Focus groups are group interviews to explore participants’ expertise and experiences, including explorations of how and why people behave in certain ways [ 1 ]. Focus groups usually consist of 6–8 people and are led by an experienced moderator following a topic guide or “script” [ 21 ]. They can involve an observer who takes note of the non-verbal aspects of the situation, possibly using an observation guide [ 21 ]. Depending on researchers’ and participants’ preferences, the discussions can be audio- or video-taped and transcribed afterwards [ 21 ]. Focus groups are useful for bringing together homogeneous (to a lesser extent heterogeneous) groups of participants with relevant expertise and experience on a given topic on which they can share detailed information [ 21 ]. Focus groups are a relatively easy, fast and inexpensive method to gain access to information on interactions in a given group, i.e. “the sharing and comparing” among participants [ 21 ]. Disadvantages include less control over the process and a lesser extent to which each individual may participate. Moreover, focus group moderators need experience, as do those tasked with the analysis of the resulting data. Focus groups can be less appropriate for discussing sensitive topics that participants might be reluctant to disclose in a group setting [ 13 ]. Moreover, attention must be paid to the emergence of “groupthink” as well as possible power dynamics within the group, e.g. when patients are awed or intimidated by health professionals.

Choosing the “right” method

As explained above, the school of thought underlying qualitative research assumes no objective hierarchy of evidence and methods. This means that each choice of single or combined methods has to be based on the research question that needs to be answered and a critical assessment with regard to whether or to what extent the chosen method can accomplish this – i.e. the “fit” between question and method [ 14 ]. It is necessary for these decisions to be documented when they are being made, and to be critically discussed when reporting methods and results.

Let us assume that our research aim is to examine the (clinical) processes around acute endovascular treatment (EVT), from the patient’s arrival at the emergency room to recanalization, with the aim to identify possible causes for delay and/or other causes for sub-optimal treatment outcome. As a first step, we could conduct a document study of the relevant standard operating procedures (SOPs) for this phase of care – are they up-to-date and in line with current guidelines? Do they contain any mistakes, irregularities or uncertainties that could cause delays or other problems? Regardless of the answers to these questions, the results have to be interpreted based on what they are: a written outline of what care processes in this hospital should look like. If we want to know what they actually look like in practice, we can conduct observations of the processes described in the SOPs. These results can (and should) be analysed in themselves, but also in comparison to the results of the document analysis, especially as regards relevant discrepancies. Do the SOPs outline specific tests for which no equipment can be observed or tasks to be performed by specialized nurses who are not present during the observation? It might also be possible that the written SOP is outdated, but the actual care provided is in line with current best practice. In order to find out why these discrepancies exist, it can be useful to conduct interviews. Are the physicians simply not aware of the SOPs (because their existence is limited to the hospital’s intranet) or do they actively disagree with them or does the infrastructure make it impossible to provide the care as described? Another rationale for adding interviews is that some situations (or all of their possible variations for different patient groups or the day, night or weekend shift) cannot practically or ethically be observed. In this case, it is possible to ask those involved to report on their actions – being aware that this is not the same as the actual observation. A senior physician’s or hospital manager’s description of certain situations might differ from a nurse’s or junior physician’s one, maybe because they intentionally misrepresent facts or maybe because different aspects of the process are visible or important to them. In some cases, it can also be relevant to consider to whom the interviewee is disclosing this information – someone they trust, someone they are otherwise not connected to, or someone they suspect or are aware of being in a potentially “dangerous” power relationship to them. Lastly, a focus group could be conducted with representatives of the relevant professional groups to explore how and why exactly they provide care around EVT. The discussion might reveal discrepancies (between SOPs and actual care or between different physicians) and motivations to the researchers as well as to the focus group members that they might not have been aware of themselves. For the focus group to deliver relevant information, attention has to be paid to its composition and conduct, for example, to make sure that all participants feel safe to disclose sensitive or potentially problematic information or that the discussion is not dominated by (senior) physicians only. The resulting combination of data collection methods is shown in Fig.  2 .

figure 2

Possible combination of data collection methods

Attributions for icons: “Book” by Serhii Smirnov, “Interview” by Adrien Coquet, FR, “Magnifying Glass” by anggun, ID, “Business communication” by Vectors Market; all from the Noun Project

The combination of multiple data source as described for this example can be referred to as “triangulation”, in which multiple measurements are carried out from different angles to achieve a more comprehensive understanding of the phenomenon under study [ 22 , 23 ].

Data analysis

To analyse the data collected through observations, interviews and focus groups these need to be transcribed into protocols and transcripts (see Fig.  3 ). Interviews and focus groups can be transcribed verbatim , with or without annotations for behaviour (e.g. laughing, crying, pausing) and with or without phonetic transcription of dialects and filler words, depending on what is expected or known to be relevant for the analysis. In the next step, the protocols and transcripts are coded , that is, marked (or tagged, labelled) with one or more short descriptors of the content of a sentence or paragraph [ 2 , 15 , 23 ]. Jansen describes coding as “connecting the raw data with “theoretical” terms” [ 20 ]. In a more practical sense, coding makes raw data sortable. This makes it possible to extract and examine all segments describing, say, a tele-neurology consultation from multiple data sources (e.g. SOPs, emergency room observations, staff and patient interview). In a process of synthesis and abstraction, the codes are then grouped, summarised and/or categorised [ 15 , 20 ]. The end product of the coding or analysis process is a descriptive theory of the behavioural pattern under investigation [ 20 ]. The coding process is performed using qualitative data management software, the most common ones being InVivo, MaxQDA and Atlas.ti. It should be noted that these are data management tools which support the analysis performed by the researcher(s) [ 14 ].

figure 3

From data collection to data analysis

Attributions for icons: see Fig. 2 , also “Speech to text” by Trevor Dsouza, “Field Notes” by Mike O’Brien, US, “Voice Record” by ProSymbols, US, “Inspection” by Made, AU, and “Cloud” by Graphic Tigers; all from the Noun Project

How to report qualitative research?

Protocols of qualitative research can be published separately and in advance of the study results. However, the aim is not the same as in RCT protocols, i.e. to pre-define and set in stone the research questions and primary or secondary endpoints. Rather, it is a way to describe the research methods in detail, which might not be possible in the results paper given journals’ word limits. Qualitative research papers are usually longer than their quantitative counterparts to allow for deep understanding and so-called “thick description”. In the methods section, the focus is on transparency of the methods used, including why, how and by whom they were implemented in the specific study setting, so as to enable a discussion of whether and how this may have influenced data collection, analysis and interpretation. The results section usually starts with a paragraph outlining the main findings, followed by more detailed descriptions of, for example, the commonalities, discrepancies or exceptions per category [ 20 ]. Here it is important to support main findings by relevant quotations, which may add information, context, emphasis or real-life examples [ 20 , 23 ]. It is subject to debate in the field whether it is relevant to state the exact number or percentage of respondents supporting a certain statement (e.g. “Five interviewees expressed negative feelings towards XYZ”) [ 21 ].

How to combine qualitative with quantitative research?

Qualitative methods can be combined with other methods in multi- or mixed methods designs, which “[employ] two or more different methods [ …] within the same study or research program rather than confining the research to one single method” [ 24 ]. Reasons for combining methods can be diverse, including triangulation for corroboration of findings, complementarity for illustration and clarification of results, expansion to extend the breadth and range of the study, explanation of (unexpected) results generated with one method with the help of another, or offsetting the weakness of one method with the strength of another [ 1 , 17 , 24 , 25 , 26 ]. The resulting designs can be classified according to when, why and how the different quantitative and/or qualitative data strands are combined. The three most common types of mixed method designs are the convergent parallel design , the explanatory sequential design and the exploratory sequential design. The designs with examples are shown in Fig.  4 .

figure 4

Three common mixed methods designs

In the convergent parallel design, a qualitative study is conducted in parallel to and independently of a quantitative study, and the results of both studies are compared and combined at the stage of interpretation of results. Using the above example of EVT provision, this could entail setting up a quantitative EVT registry to measure process times and patient outcomes in parallel to conducting the qualitative research outlined above, and then comparing results. Amongst other things, this would make it possible to assess whether interview respondents’ subjective impressions of patients receiving good care match modified Rankin Scores at follow-up, or whether observed delays in care provision are exceptions or the rule when compared to door-to-needle times as documented in the registry. In the explanatory sequential design, a quantitative study is carried out first, followed by a qualitative study to help explain the results from the quantitative study. This would be an appropriate design if the registry alone had revealed relevant delays in door-to-needle times and the qualitative study would be used to understand where and why these occurred, and how they could be improved. In the exploratory sequential design, the qualitative study is carried out first and its results help to inform and build the quantitative study in the next step [ 26 ]. If the qualitative study around EVT provision had shown a high level of dissatisfaction among the staff members involved, a quantitative questionnaire investigating staff satisfaction could be set up in the next step, informed by the qualitative findings on which topics dissatisfaction had been expressed. Amongst other things, the questionnaire design would make it possible to widen the reach of the research to more respondents from different (types of) hospitals, regions, countries or settings, and to conduct sub-group analyses for different professional groups.

How to assess qualitative research?

A variety of assessment criteria and lists have been developed for qualitative research, ranging in their focus and comprehensiveness [ 14 , 17 , 27 ]. However, none of these has been elevated to the “gold standard” in the field. In the following, we therefore focus on a set of commonly used assessment criteria that, from a practical standpoint, a researcher can look for when assessing a qualitative research report or paper.

Checklists

Assessors should check the authors’ use of and adherence to the relevant reporting checklists (e.g. Standards for Reporting Qualitative Research (SRQR)) to make sure all items that are relevant for this type of research are addressed [ 23 , 28 ]. Discussions of quantitative measures in addition to or instead of these qualitative measures can be a sign of lower quality of the research (paper). Providing and adhering to a checklist for qualitative research contributes to an important quality criterion for qualitative research, namely transparency [ 15 , 17 , 23 ].

Reflexivity

While methodological transparency and complete reporting are relevant for all types of research, some additional criteria must be taken into account for qualitative research. This includes what is called reflexivity, i.e. sensitivity to the relationship between the researcher and the researched, including how contact was established and maintained, or the background and experience of the researcher(s) involved in data collection and analysis. Depending on the research question and population to be researched this can be limited to professional experience, but it may also include gender, age or ethnicity [ 17 , 27 ]. These details are relevant because in qualitative research, as opposed to quantitative research, the researcher as a person cannot be isolated from the research process [ 23 ]. It may influence the conversation when an interviewed patient speaks to an interviewer who is a physician, or when an interviewee is asked to discuss a gynaecological procedure with a male interviewer, and therefore the reader must be made aware of these details [ 19 ].

Sampling and saturation

The aim of qualitative sampling is for all variants of the objects of observation that are deemed relevant for the study to be present in the sample “to see the issue and its meanings from as many angles as possible” [ 1 , 16 , 19 , 20 , 27 ], and to ensure “information-richness” [ 15 ]. An iterative sampling approach is advised, in which data collection (e.g. five interviews) is followed by data analysis, followed by more data collection to find variants that are lacking in the current sample. This process continues until no new (relevant) information can be found and further sampling becomes redundant – which is called saturation [ 1 , 15 ]. In other words: qualitative data collection finds its end point not a priori , but when the research team determines that saturation has been reached [ 29 , 30 ].

This is also the reason why most qualitative studies use deliberate instead of random sampling strategies. This is generally referred to as “ purposive sampling” , in which researchers pre-define which types of participants or cases they need to include so as to cover all variations that are expected to be of relevance, based on the literature, previous experience or theory (i.e. theoretical sampling) [ 14 , 20 ]. Other types of purposive sampling include (but are not limited to) maximum variation sampling, critical case sampling or extreme or deviant case sampling [ 2 ]. In the above EVT example, a purposive sample could include all relevant professional groups and/or all relevant stakeholders (patients, relatives) and/or all relevant times of observation (day, night and weekend shift).

Assessors of qualitative research should check whether the considerations underlying the sampling strategy were sound and whether or how researchers tried to adapt and improve their strategies in stepwise or cyclical approaches between data collection and analysis to achieve saturation [ 14 ].

Piloting

Good qualitative research is iterative in nature, i.e. it goes back and forth between data collection and analysis, revising and improving the approach where necessary. One example of this is pilot interviews, where different aspects of the interview (especially the interview guide, but also, for example, the site of the interview or whether the interview can be audio-recorded) are tested with a small number of respondents, evaluated and revised [ 19 ]. In doing so, the interviewer learns which wording or types of questions work best, or what the best length of an interview is for patients who have trouble concentrating for an extended time. Of course, the same reasoning applies to observations or focus groups, which can also be piloted.

Co-coding

Ideally, coding should be performed by at least two researchers, especially at the beginning of the coding process when a common approach must be defined, including the establishment of a useful coding list (or tree), and when a common meaning of individual codes must be established [ 23 ]. An initial sub-set or all transcripts can be coded independently by the coders and then compared and consolidated after regular discussions in the research team. This is to make sure that codes are applied consistently to the research data.

Member checking

Member checking, also called respondent validation , refers to the practice of checking back with study respondents to see if the research is in line with their views [ 14 , 27 ]. This can happen after data collection or analysis or when first results are available [ 23 ]. For example, interviewees can be provided with (summaries of) their transcripts and asked whether they believe this to be a complete representation of their views or whether they would like to clarify or elaborate on their responses [ 17 ]. Respondents’ feedback on these issues then becomes part of the data collection and analysis [ 27 ].

Stakeholder involvement

In those niches where qualitative approaches have been able to evolve and grow, a new trend has seen the inclusion of patients and their representatives not only as study participants (i.e. “members”, see above) but as consultants to and active participants in the broader research process [ 31 , 32 , 33 ]. The underlying assumption is that patients and other stakeholders hold unique perspectives and experiences that add value beyond their own single story, making the research more relevant and beneficial to researchers, study participants and (future) patients alike [ 34 , 35 ]. Using the example of patients on or nearing dialysis, a recent scoping review found that 80% of clinical research did not address the top 10 research priorities identified by patients and caregivers [ 32 , 36 ]. In this sense, the involvement of the relevant stakeholders, especially patients and relatives, is increasingly being seen as a quality indicator in and of itself.

How not to assess qualitative research

The above overview does not include certain items that are routine in assessments of quantitative research. What follows is a non-exhaustive, non-representative, experience-based list of the quantitative criteria often applied to the assessment of qualitative research, as well as an explanation of the limited usefulness of these endeavours.

Protocol adherence

Given the openness and flexibility of qualitative research, it should not be assessed by how well it adheres to pre-determined and fixed strategies – in other words: its rigidity. Instead, the assessor should look for signs of adaptation and refinement based on lessons learned from earlier steps in the research process.

Sample size

For the reasons explained above, qualitative research does not require specific sample sizes, nor does it require that the sample size be determined a priori [ 1 , 14 , 27 , 37 , 38 , 39 ]. Sample size can only be a useful quality indicator when related to the research purpose, the chosen methodology and the composition of the sample, i.e. who was included and why.

Randomisation

While some authors argue that randomisation can be used in qualitative research, this is not commonly the case, as neither its feasibility nor its necessity or usefulness has been convincingly established for qualitative research [ 13 , 27 ]. Relevant disadvantages include the negative impact of an overly large sample size as well as the possibility (or probability) of selecting “quiet, uncooperative or inarticulate individuals” [ 17 ]. Qualitative studies do not use control groups, either.

Interrater reliability, variability and other “objectivity checks”

The concept of “interrater reliability” is sometimes used in qualitative research to assess the extent to which the coding approach overlaps between two co-coders. However, it is not clear what this measure tells us about the quality of the analysis [ 23 ]. This means that these scores can be included in qualitative research reports, preferably with some additional information on what the score means for the analysis, but it is not a requirement. Relatedly, it is not relevant for the quality or “objectivity” of qualitative research to separate those who recruited the study participants from those who collected and analysed the data. Experience even shows that it might be better to have the same person or team perform all of these tasks [ 20 ]. First, when researchers introduce themselves during recruitment this can enhance trust when the interview takes place days or weeks later with the same researcher. Second, when the audio-recording is transcribed for analysis, the researcher conducting the interviews will usually remember the interviewee and the specific interview situation during data analysis. This might be helpful in providing additional context information for interpretation of the data, e.g. on whether something might have been meant as a joke [ 18 ].
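As a purely illustrative sketch of what such an overlap measure could look like (the paper does not prescribe any particular metric, and the codes below are invented), raw percentage agreement between two co-coders might be computed as follows; chance-corrected measures such as Cohen's kappa are more commonly reported in practice:

    # Illustrative only: raw percentage agreement between two co-coders who coded
    # the same five segments (codes invented). Chance-corrected measures such as
    # Cohen's kappa are more commonly reported than raw agreement.
    coder_a = ["delay", "teamwork", "delay", "communication", "teamwork"]
    coder_b = ["delay", "teamwork", "communication", "communication", "teamwork"]

    agreement = sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)
    print(f"Percentage agreement: {agreement:.0%}")  # 80%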

Not being quantitative research

Being qualitative research instead of quantitative research should not be used as an assessment criterion if it is used irrespective of the research problem at hand. Similarly, qualitative research should not be required to be combined with quantitative research per se – unless mixed methods research is judged as inherently better than single-method research. In this case, the same criterion should be applied to quantitative studies without a qualitative component.

Conclusions

The main take-away points of this paper are summarised in Table 1. We aimed to show that, if conducted well, qualitative research can answer specific research questions that cannot be adequately answered using (only) quantitative designs. Seeing qualitative and quantitative methods as equal will help us become more aware and critical of the “fit” between the research problem and our chosen methods: I can conduct an RCT to determine the reasons for transportation delays of acute stroke patients – but should I? It also provides us with a greater range of tools to tackle a greater range of research problems more appropriately and successfully, filling in the blind spots on one half of the methodological spectrum to better address the whole complexity of neurological research and practice.

Availability of data and materials

Not applicable.

Abbreviations

EVT: Endovascular treatment

RCT: Randomised Controlled Trial

SOP: Standard Operating Procedure

SRQR: Standards for Reporting Qualitative Research

References

Philipsen, H., & Vernooij-Dassen, M. (2007). Kwalitatief onderzoek: nuttig, onmisbaar en uitdagend [Qualitative research: useful, indispensable and challenging]. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk [Qualitative research: Practical methods for medical practice] (pp. 5–12). Houten: Bohn Stafleu van Loghum.


Punch, K. F. (2013). Introduction to social research: Quantitative and qualitative approaches . London: Sage.

Kelly, J., Dwyer, J., Willis, E., & Pekarsky, B. (2014). Travelling to the city for hospital care: Access factors in country aboriginal patient journeys. Australian Journal of Rural Health, 22 (3), 109–113.


Nilsen, P., Ståhl, C., Roback, K., & Cairney, P. (2013). Never the twain shall meet? - a comparison of implementation science and policy implementation research. Implementation Science, 8 (1), 1–12.

Howick J, Chalmers I, Glasziou, P., Greenhalgh, T., Heneghan, C., Liberati, A., Moschetti, I., Phillips, B., & Thornton, H. (2011). The 2011 Oxford CEBM evidence levels of evidence (introductory document) . Oxford Center for Evidence Based Medicine. https://www.cebm.net/2011/06/2011-oxford-cebm-levels-evidence-introductory-document/ .

Eakin, J. M. (2016). Educating critical qualitative health researchers in the land of the randomized controlled trial. Qualitative Inquiry, 22 (2), 107–118.

May, A., & Mathijssen, J. (2015). Alternatieven voor RCT bij de evaluatie van effectiviteit van interventies!? Eindrapportage. In Alternatives for RCTs in the evaluation of effectiveness of interventions!? Final report .


Berwick, D. M. (2008). The science of improvement. Journal of the American Medical Association, 299 (10), 1182–1184.


Christ, T. W. (2014). Scientific-based research and randomized controlled trials, the “gold” standard? Alternative paradigms and mixed methodologies. Qualitative Inquiry, 20 (1), 72–80.

Lamont, T., Barber, N., Jd, P., Fulop, N., Garfield-Birkbeck, S., Lilford, R., Mear, L., Raine, R., & Fitzpatrick, R. (2016). New approaches to evaluating complex health and care systems. BMJ, 352:i154.

Drabble, S. J., & O’Cathain, A. (2015). Moving from Randomized Controlled Trials to Mixed Methods Intervention Evaluation. In S. Hesse-Biber & R. B. Johnson (Eds.), The Oxford Handbook of Multimethod and Mixed Methods Research Inquiry (pp. 406–425). London: Oxford University Press.

Chambers, D. A., Glasgow, R. E., & Stange, K. C. (2013). The dynamic sustainability framework: Addressing the paradox of sustainment amid ongoing change. Implementation Science : IS, 8 , 117.

Hak, T. (2007). Waarnemingsmethoden in kwalitatief onderzoek. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk . [Observation methods in qualitative research] (pp. 13–25). Houten: Bohn Stafleu van Loghum.

Russell, C. K., & Gregory, D. M. (2003). Evaluation of qualitative research studies. Evidence Based Nursing, 6 (2), 36–40.

Fossey, E., Harvey, C., McDermott, F., & Davidson, L. (2002). Understanding and evaluating qualitative research. Australian and New Zealand Journal of Psychiatry, 36 , 717–732.

Yanow, D. (2000). Conducting interpretive policy analysis (Vol. 47). Thousand Oaks: Sage University Papers Series on Qualitative Research Methods.

Shenton, A. K. (2004). Strategies for ensuring trustworthiness in qualitative research projects. Education for Information, 22 , 63–75.

van der Geest, S. (2006). Participeren in ziekte en zorg: meer over kwalitatief onderzoek. Huisarts en Wetenschap, 49 (4), 283–287.

Hijmans, E., & Kuyper, M. (2007). Het halfopen interview als onderzoeksmethode [The half-open interview as research method]. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk [Qualitative research: Practical methods for medical practice] (pp. 43–51). Houten: Bohn Stafleu van Loghum.

Jansen, H. (2007). Systematiek en toepassing van de kwalitatieve survey [Systematics and implementation of the qualitative survey]. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk [Qualitative research: Practical methods for medical practice] (pp. 27–41). Houten: Bohn Stafleu van Loghum.

Pv, R., & Peremans, L. (2007). Exploreren met focusgroepgesprekken: de ‘stem’ van de groep onder de loep [Exploring with focus group conversations: the “voice” of the group under the magnifying glass]. In L. PLBJ & H. TCo (Eds.), Kwalitatief onderzoek: Praktische methoden voor de medische praktijk [Qualitative research: Practical methods for medical practice] (pp. 53–64). Houten: Bohn Stafleu van Loghum.

Carter, N., Bryant-Lukosius, D., DiCenso, A., Blythe, J., & Neville, A. J. (2014). The use of triangulation in qualitative research. Oncology Nursing Forum, 41 (5), 545–547.

Boeije, H. (2012). Analyseren in kwalitatief onderzoek: Denken en doen [Analysis in qualitative research: Thinking and doing]. Den Haag: Boom Lemma uitgevers.

Hunter, A., & Brewer, J. (2015). Designing Multimethod Research. In S. Hesse-Biber & R. B. Johnson (Eds.), The Oxford Handbook of Multimethod and Mixed Methods Research Inquiry (pp. 185–205). London: Oxford University Press.

Archibald, M. M., Radil, A. I., Zhang, X., & Hanson, W. E. (2015). Current mixed methods practices in qualitative research: A content analysis of leading journals. International Journal of Qualitative Methods, 14 (2), 5–33.

Creswell, J. W., & Plano Clark, V. L. (2011). Choosing a Mixed Methods Design. In Designing and Conducting Mixed Methods Research . Thousand Oaks: SAGE Publications.

Mays, N., & Pope, C. (2000). Assessing quality in qualitative research. BMJ, 320 (7226), 50–52.

O'Brien, B. C., Harris, I. B., Beckman, T. J., Reed, D. A., & Cook, D. A. (2014). Standards for reporting qualitative research: A synthesis of recommendations. Academic Medicine : Journal of the Association of American Medical Colleges, 89 (9), 1245–1251.

Saunders, B., Sim, J., Kingstone, T., Baker, S., Waterfield, J., Bartlam, B., Burroughs, H., & Jinks, C. (2018). Saturation in qualitative research: Exploring its conceptualization and operationalization. Quality and Quantity, 52 (4), 1893–1907.

Moser, A., & Korstjens, I. (2018). Series: Practical guidance to qualitative research. Part 3: Sampling, data collection and analysis. European Journal of General Practice, 24 (1), 9–18.

Marlett, N., Shklarov, S., Marshall, D., Santana, M. J., & Wasylak, T. (2015). Building new roles and relationships in research: A model of patient engagement research. Quality of Life Research : an international journal of quality of life aspects of treatment, care and rehabilitation, 24 (5), 1057–1067.

Demian, M. N., Lam, N. N., Mac-Way, F., Sapir-Pichhadze, R., & Fernandez, N. (2017). Opportunities for engaging patients in kidney research. Canadian Journal of Kidney Health and Disease, 4 , 2054358117703070–2054358117703070.

Noyes, J., McLaughlin, L., Morgan, K., Roberts, A., Stephens, M., Bourne, J., Houlston, M., Houlston, J., Thomas, S., Rhys, R. G., et al. (2019). Designing a co-productive study to overcome known methodological challenges in organ donation research with bereaved family members. Health Expectations . 22(4):824–35.

Piil, K., Jarden, M., & Pii, K. H. (2019). Research agenda for life-threatening cancer. European Journal Cancer Care (Engl), 28 (1), e12935.

Hofmann, D., Ibrahim, F., Rose, D., Scott, D. L., Cope, A., Wykes, T., & Lempp, H. (2015). Expectations of new treatment in rheumatoid arthritis: Developing a patient-generated questionnaire. Health Expectations : an international journal of public participation in health care and health policy, 18 (5), 995–1008.

Jun, M., Manns, B., Laupacis, A., Manns, L., Rehal, B., Crowe, S., & Hemmelgarn, B. R. (2015). Assessing the extent to which current clinical research is consistent with patient priorities: A scoping review using a case study in patients on or nearing dialysis. Canadian Journal of Kidney Health and Disease, 2 , 35.

Elsie Baker, S., & Edwards, R. (2012). How many qualitative interviews is enough? In National Centre for Research Methods Review Paper . National Centre for Research Methods. http://eprints.ncrm.ac.uk/2273/4/how_many_interviews.pdf .

Sandelowski, M. (1995). Sample size in qualitative research. Research in Nursing & Health, 18 (2), 179–183.

Sim, J., Saunders, B., Waterfield, J., & Kingstone, T. (2018). Can sample size in qualitative research be determined a priori? International Journal of Social Research Methodology, 21 (5), 619–634.


Funding

No external funding.

Author information

Authors and affiliations.

Department of Neurology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120, Heidelberg, Germany

Loraine Busetto, Wolfgang Wick & Christoph Gumbinger

Clinical Cooperation Unit Neuro-Oncology, German Cancer Research Center, Heidelberg, Germany

Wolfgang Wick


Contributions

LB drafted the manuscript; WW and CG revised the manuscript; all authors approved the final versions.

Corresponding author

Correspondence to Loraine Busetto .

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Busetto, L., Wick, W. & Gumbinger, C. How to use and assess qualitative research methods. Neurol. Res. Pract. 2 , 14 (2020). https://doi.org/10.1186/s42466-020-00059-z


Received : 30 January 2020

Accepted : 22 April 2020

Published : 27 May 2020

DOI : https://doi.org/10.1186/s42466-020-00059-z


Keywords

  • Qualitative research
  • Mixed methods
  • Quality assessment



Quantitative and Qualitative Research


What is qualitative research?

Qualitative research is a process of naturalistic inquiry that seeks an in-depth understanding of social phenomena within their natural setting. It focuses on the "why" rather than the "what" of social phenomena and relies on the direct experiences of human beings as meaning-making agents in their every day lives. Rather than by logical and statistical procedures, qualitative researchers use multiple systems of inquiry for the study of human phenomena including biography, case study, historical analysis, discourse analysis, ethnography, grounded theory, and phenomenology.

University of Utah College of Nursing, (n.d.). What is qualitative research? [Guide] Retrieved from  https://nursing.utah.edu/research/qualitative-research/what-is-qualitative-research.php#what 



Qualitative vs Quantitative Research Methods & Data Analysis

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

The main difference between quantitative and qualitative research is the type of data they collect and analyze.

Quantitative data is information about quantities, and therefore numbers; qualitative data is descriptive and regards phenomena that can be observed but not measured, such as language.
  • Quantitative research collects numerical data and analyzes it using statistical methods. The aim is to produce objective, empirical data that can be measured and expressed numerically. Quantitative research is often used to test hypotheses, identify patterns, and make predictions.
  • Qualitative research gathers non-numerical data (words, images, sounds) to explore subjective experiences and attitudes, often via observation and interviews. It aims to produce detailed descriptions and uncover new insights about the studied phenomenon.


What Is Qualitative Research?

Qualitative research is the process of collecting, analyzing, and interpreting non-numerical data, such as language. Qualitative research can be used to understand how an individual subjectively perceives and gives meaning to their social reality.

Qualitative data is non-numerical data, such as text, video, photographs, or audio recordings. This type of data can be collected using diary accounts or in-depth interviews and analyzed using grounded theory or thematic analysis.

Qualitative research is multimethod in focus, involving an interpretive, naturalistic approach to its subject matter. This means that qualitative researchers study things in their natural settings, attempting to make sense of, or interpret, phenomena in terms of the meanings people bring to them. Denzin and Lincoln (1994, p. 2)

Interest in qualitative data came about as the result of the dissatisfaction of some psychologists (e.g., Carl Rogers) with the scientific approach of psychologists such as the behaviorists (e.g., Skinner ).

Since psychologists study people, the traditional approach to science is not seen as an appropriate way of carrying out research since it fails to capture the totality of human experience and the essence of being human.  Exploring participants’ experiences is known as a phenomenological approach (re: Humanism ).

Qualitative research is primarily concerned with meaning, subjectivity, and lived experience. The goal is to understand the quality and texture of people’s experiences, how they make sense of them, and the implications for their lives.

Qualitative research aims to understand the social reality of individuals, groups, and cultures as nearly as possible as participants feel or live it. Thus, people and groups are studied in their natural setting.

Examples of qualitative research questions include what an experience feels like, how people talk about something, how they make sense of an experience, and how events unfold for them.

Research following a qualitative approach is exploratory and seeks to explain ‘how’ and ‘why’ a particular phenomenon, or behavior, operates as it does in a particular context. It can be used to generate hypotheses and theories from the data.

Qualitative Methods

There are different types of qualitative research methods, including diary accounts, in-depth interviews , documents, focus groups , case study research , and ethnography .

The results of qualitative methods provide a deep understanding of how people perceive their social realities and in consequence, how they act within the social world.

The researcher has several methods for collecting empirical materials, ranging from the interview to direct observation, to the analysis of artifacts, documents, and cultural records, to the use of visual materials or personal experience. Denzin and Lincoln (1994, p. 14)

Here are some examples of qualitative data:

Interview transcripts : Verbatim records of what participants said during an interview or focus group. They allow researchers to identify common themes and patterns, and draw conclusions based on the data. Interview transcripts can also be useful in providing direct quotes and examples to support research findings.

Observations : The researcher typically takes detailed notes on what they observe, including any contextual information, nonverbal cues, or other relevant details. The resulting observational data can be analyzed to gain insights into social phenomena, such as human behavior, social interactions, and cultural practices.

Unstructured interviews : generate qualitative data through the use of open questions.  This allows the respondent to talk in some depth, choosing their own words.  This helps the researcher develop a real sense of a person’s understanding of a situation.

Diaries or journals : Written accounts of personal experiences or reflections.

Notice that qualitative data could be much more than just words or text. Photographs, videos, sound recordings, and so on, can be considered qualitative data. Visual data can be used to understand behaviors, environments, and social interactions.

Qualitative Data Analysis

Qualitative research is endlessly creative and interpretive. The researcher does not just leave the field with mountains of empirical data and then easily write up his or her findings.

Qualitative interpretations are constructed, and various techniques can be used to make sense of the data, such as content analysis, grounded theory (Glaser & Strauss, 1967), thematic analysis (Braun & Clarke, 2006), or discourse analysis .

For example, thematic analysis is a qualitative approach that involves identifying implicit or explicit ideas within the data. Themes will often emerge once the data has been coded .


Key Features

  • Events can be understood adequately only if they are seen in context. Therefore, a qualitative researcher immerses her/himself in the field, in natural surroundings. The contexts of inquiry are not contrived; they are natural. Nothing is predefined or taken for granted.
  • Qualitative researchers want those who are studied to speak for themselves, to provide their perspectives in words and other actions. Therefore, qualitative research is an interactive process in which the persons studied teach the researcher about their lives.
  • The qualitative researcher is an integral part of the data; without the active participation of the researcher, no data exists.
  • The study’s design evolves during the research and can be adjusted or changed as it progresses. For the qualitative researcher, there is no single reality. It is subjective and exists only in reference to the observer.
  • The theory is data-driven and emerges as part of the research process, evolving from the data as they are collected.

Limitations of Qualitative Research

  • Because of the time and costs involved, qualitative designs do not generally draw samples from large-scale data sets.
  • The problem of adequate validity or reliability is a major criticism. Because of the subjective nature of qualitative data and its origin in single contexts, it is difficult to apply conventional standards of reliability and validity. For example, because of the central role played by the researcher in the generation of data, it is not possible to replicate qualitative studies.
  • Also, contexts, situations, events, conditions, and interactions cannot be replicated to any extent, nor can generalizations be made to a wider context than the one studied with confidence.
  • The time required for data collection, analysis, and interpretation is lengthy. Analysis of qualitative data is difficult, and expert knowledge of an area is necessary to interpret qualitative data. Great care must be taken when doing so, for example, looking for mental illness symptoms.

Advantages of Qualitative Research

  • Because of close researcher involvement, the researcher gains an insider’s view of the field. This allows the researcher to find issues that are often missed (such as subtleties and complexities) by the scientific, more positivistic inquiries.
  • Qualitative descriptions can be important in suggesting possible relationships, causes, effects, and dynamic processes.
  • Qualitative analysis allows for ambiguities/contradictions in the data, which reflect social reality (Denscombe, 2010).
  • Qualitative research uses a descriptive, narrative style; this research might be of particular benefit to the practitioner as she or he could turn to qualitative reports to examine forms of knowledge that might otherwise be unavailable, thereby gaining new insight.

What Is Quantitative Research?

Quantitative research involves the process of objectively collecting and analyzing numerical data to describe, predict, or control variables of interest.

The goals of quantitative research are to test causal relationships between variables , make predictions, and generalize results to wider populations.

Quantitative researchers aim to establish general laws of behavior and phenomenon across different settings/contexts. Research is used to test a theory and ultimately support or reject it.

Quantitative Methods

Experiments typically yield quantitative data, as they are concerned with measuring things. However, other research methods, such as controlled observations and questionnaires , can produce both quantitative and qualitative information.

For example, a rating scale or closed questions on a questionnaire would generate quantitative data as these produce either numerical data or data that can be put into categories (e.g., “yes,” “no” answers).

Experimental methods limit how research participants react to and express appropriate social behavior.

Findings are, therefore, likely to be context-bound and simply a reflection of the assumptions that the researcher brings to the investigation.

There are numerous examples of quantitative data in psychological research, including in mental health research. Here are a few examples:

One example is the Experience in Close Relationships Scale (ECR), a self-report questionnaire widely used to assess adult attachment styles .

The ECR provides quantitative data that can be used to assess attachment styles and predict relationship outcomes.

Neuroimaging data : Neuroimaging techniques, such as MRI and fMRI, provide quantitative data on brain structure and function.

This data can be analyzed to identify brain regions involved in specific mental processes or disorders.

Another example is the Beck Depression Inventory (BDI), a self-report questionnaire widely used to assess the severity of depressive symptoms in individuals.

The BDI consists of 21 questions, each scored on a scale of 0 to 3, with higher scores indicating more severe depressive symptoms. 

Quantitative Data Analysis

Statistics help us turn quantitative data into useful information to help with decision-making. We can use statistics to summarize our data, describing patterns, relationships, and connections. Statistics can be descriptive or inferential.

Descriptive statistics help us to summarize our data. In contrast, inferential statistics are used to identify statistically significant differences between groups of data (such as intervention and control groups in a randomized control study).
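The following sketch illustrates this distinction with invented data: descriptive statistics summarise each group, while an inferential test (here an independent-samples t-test from SciPy, a third-party Python library used only as an example) assesses whether the difference between the groups is statistically significant.

    # Illustrative sketch with invented data: descriptive statistics summarise each
    # group; an independent-samples t-test (SciPy, a third-party library) is one
    # example of an inferential test of the difference between the groups.
    from statistics import mean, stdev
    from scipy import stats  # requires: pip install scipy

    intervention = [12, 15, 14, 10, 13, 16]
    control = [9, 11, 10, 8, 12, 10]

    print("Intervention: mean", mean(intervention), "SD", round(stdev(intervention), 2))
    print("Control: mean", mean(control), "SD", round(stdev(control), 2))

    result = stats.ttest_ind(intervention, control)
    print("t =", round(result.statistic, 2), "p =", round(result.pvalue, 3))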

Key Features

  • Quantitative researchers try to control extraneous variables by conducting their studies in the lab.
  • The research aims for objectivity (i.e., without bias) and is separated from the data.
  • The design of the study is determined before it begins.
  • For the quantitative researcher, the reality is objective, exists separately from the researcher, and can be seen by anyone.
  • Research is used to test a theory and ultimately support or reject it.

Limitations of Quantitative Research

  • Context: Quantitative experiments do not take place in natural settings. In addition, they do not allow participants to explain their choices or the meaning of the questions they may have for those participants (Carr, 1994).
  • Researcher expertise: Poor knowledge of the application of statistical analysis may negatively affect analysis and subsequent interpretation (Black, 1999).
  • Variability of data quantity: Large sample sizes are needed for more accurate analysis. Small-scale quantitative studies may be less reliable because of the low quantity of data (Denscombe, 2010). This also affects the ability to generalize study findings to wider populations.
  • Confirmation bias: The researcher might miss observing phenomena because of a focus on theory or hypothesis testing rather than on theory or hypothesis generation.

Advantages of Quantitative Research

  • Scientific objectivity: Quantitative data can be interpreted with statistical analysis, and since statistics are based on the principles of mathematics, the quantitative approach is viewed as scientifically objective and rational (Carr, 1994; Denscombe, 2010).
  • Useful for testing and validating already constructed theories.
  • Rapid analysis: Sophisticated software removes much of the need for prolonged data analysis, especially with large volumes of data involved (Antonius, 2003).
  • Replication: Quantitative data is based on measured values and can be checked by others because numerical data is less open to ambiguities of interpretation.
  • Hypotheses can also be tested because of statistical analysis (Antonius, 2003).

Antonius, R. (2003). Interpreting quantitative data with SPSS . Sage.

Black, T. R. (1999). Doing quantitative research in the social sciences: An integrated approach to research design, measurement and statistics . Sage.

Braun, V. & Clarke, V. (2006). Using thematic analysis in psychology . Qualitative Research in Psychology , 3, 77–101.

Carr, L. T. (1994). The strengths and weaknesses of quantitative and qualitative research : what method for nursing? Journal of advanced nursing, 20(4) , 716-721.

Denscombe, M. (2010). The Good Research Guide: for small-scale social research. McGraw Hill.

Denzin, N., & Lincoln. Y. (1994). Handbook of Qualitative Research. Thousand Oaks, CA, US: Sage Publications Inc.

Glaser, B. G., Strauss, A. L., & Strutzel, E. (1968). The discovery of grounded theory; strategies for qualitative research. Nursing research, 17(4) , 364.

Minichiello, V. (1990). In-Depth Interviewing: Researching People. Longman Cheshire.

Punch, K. (1998). Introduction to Social Research: Quantitative and Qualitative Approaches. London: Sage

Further Information

  • Mixed methods research
  • Designing qualitative research
  • Methods of data collection and analysis
  • Introduction to quantitative and qualitative research
  • Checklists for improving rigour in qualitative research: a case of the tail wagging the dog?
  • Qualitative research in health care: Analysing qualitative data
  • Qualitative data analysis: the framework approach
  • Using the framework method for the analysis of
  • Qualitative data in multi-disciplinary health research
  • Content Analysis
  • Grounded Theory
  • Thematic Analysis



Qualitative vs. Quantitative Research | Differences, Examples & Methods

Published on April 12, 2019 by Raimo Streefkerk . Revised on June 22, 2023.

When collecting and analyzing data, quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings. Both are important for gaining different kinds of knowledge.

Common quantitative methods include experiments, observations recorded as numbers, and surveys with closed-ended questions.

Quantitative research is at risk for research biases including information bias , omitted variable bias , sampling bias , or selection bias .

Qualitative research is expressed in words . It is used to understand concepts, thoughts or experiences. This type of research enables you to gather in-depth insights on topics that are not well understood.

Common qualitative methods include interviews with open-ended questions, observations described in words, and literature reviews that explore concepts and theories.


The differences between quantitative and qualitative research

Quantitative and qualitative research use different research methods to collect and analyze data, and they allow you to answer different kinds of research questions.

Qualitative vs. quantitative research

Data collection methods

Quantitative and qualitative data can be collected using various methods. It is important to use a data collection method that will help answer your research question(s).

Many data collection methods can be either qualitative or quantitative. For example, in surveys, observational studies or case studies , your data can be represented as numbers (e.g., using rating scales or counting frequencies) or as words (e.g., with open-ended questions or descriptions of what you observe).

However, some methods are more commonly used in one type or the other.

Quantitative data collection methods

  • Surveys :  List of closed or multiple choice questions that is distributed to a sample (online, in person, or over the phone).
  • Experiments : Situation in which different types of variables are controlled and manipulated to establish cause-and-effect relationships.
  • Observations : Observing subjects in a natural environment where variables can’t be controlled.

Qualitative data collection methods

  • Interviews : Asking open-ended questions verbally to respondents.
  • Focus groups : Discussion among a group of people about a topic to gather opinions that can be used for further research.
  • Ethnography : Participating in a community or organization for an extended period of time to closely observe culture and behavior.
  • Literature review : Survey of published works by other authors.

When to use qualitative vs. quantitative research

A rule of thumb for deciding whether to use qualitative or quantitative data is:

  • Use quantitative research if you want to confirm or test something (a theory or hypothesis )
  • Use qualitative research if you want to understand something (concepts, thoughts, experiences)

For most research topics you can choose a qualitative, quantitative or mixed methods approach . Which type you choose depends on, among other things, whether you’re taking an inductive vs. deductive research approach ; your research question(s) ; whether you’re doing experimental , correlational , or descriptive research ; and practical considerations such as time, money, availability of data, and access to respondents.

Quantitative research approach

You survey 300 students at your university and ask them questions such as: “on a scale from 1-5, how satisfied are you with your professors?”

You can perform statistical analysis on the data and draw conclusions such as: “on average students rated their professors 4.4”.

Qualitative research approach

You conduct in-depth interviews with 15 students and ask them open-ended questions such as: “How satisfied are you with your studies?”, “What is the most positive aspect of your study program?” and “What can be done to improve the study program?”

Based on the answers you get you can ask follow-up questions to clarify things. You transcribe all interviews using transcription software and try to find commonalities and patterns.

Mixed methods approach

You conduct interviews to find out how satisfied students are with their studies. Through open-ended questions you learn things you never thought about before and gain new insights. Later, you use a survey to test these insights on a larger scale.

It’s also possible to start with a survey to find out the overall trends, followed by interviews to better understand the reasons behind the trends.

How to analyze qualitative and quantitative data

Qualitative or quantitative data by itself can’t prove or demonstrate anything, but has to be analyzed to show its meaning in relation to the research questions. The method of analysis differs for each type of data.

Analyzing quantitative data

Quantitative data is based on numbers. Simple math or more advanced statistical analysis is used to discover commonalities or patterns in the data. The results are often reported in graphs and tables.

Applications such as Excel, SPSS, or R can be used to calculate things like:

  • Average scores ( means )
  • The number of times a particular answer was given
  • The correlation or causation between two or more variables
  • The reliability and validity of the results
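As a minimal illustration of these calculations, the following sketch uses Python's standard library and made-up survey data (rather than the Excel, SPSS or R workflows mentioned above, which are used purely as examples in the text):

    # Minimal sketch of the calculations listed above, using Python's standard
    # library on made-up survey data (the text mentions Excel, SPSS or R; Python
    # is used here purely for illustration).
    from statistics import mean, correlation  # statistics.correlation needs Python 3.10+
    from collections import Counter

    satisfaction = [4, 5, 3, 4, 5, 4, 2, 5]        # ratings on a 1-5 scale
    hours_studied = [10, 14, 6, 9, 15, 11, 4, 16]  # a second variable per respondent

    print("Average score:", mean(satisfaction))
    print("Answer frequencies:", Counter(satisfaction))
    print("Correlation:", round(correlation(hours_studied, satisfaction), 2))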

Analyzing qualitative data

Qualitative data is more difficult to analyze than quantitative data. It consists of text, images or videos instead of numbers.

Some common approaches to analyzing qualitative data include:

  • Qualitative content analysis : Tracking the occurrence, position and meaning of words or phrases
  • Thematic analysis : Closely examining the data to identify the main themes and patterns
  • Discourse analysis : Studying how communication works in social contexts
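For the simplest, word-counting side of content analysis, a sketch along these lines (with invented interview snippets) can illustrate the idea; note that real content analysis also considers the position and meaning of words, which counting alone cannot capture:

    # Sketch of simple word counting on invented interview snippets; real content
    # analysis also considers the position and meaning of words, which counting
    # alone cannot capture.
    import re
    from collections import Counter

    transcripts = [
        "I felt supported by the staff, but the waiting time was stressful.",
        "The waiting room was crowded and the waiting time felt endless.",
    ]

    words = []
    for text in transcripts:
        words.extend(re.findall(r"[a-z']+", text.lower()))

    stopwords = {"i", "the", "by", "but", "was", "and", "a"}
    counts = Counter(word for word in words if word not in stopwords)
    print(counts.most_common(5))  # e.g. [('waiting', 3), ('felt', 2), ('time', 2), ...]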


Frequently asked questions about qualitative and quantitative research

Quantitative research deals with numbers and statistics, while qualitative research deals with words and meanings.

Quantitative methods allow you to systematically measure variables and test hypotheses . Qualitative methods allow you to explore concepts and experiences in more detail.

In mixed methods research , you use both qualitative and quantitative data collection and analysis methods to answer your research question .

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts and meanings, use qualitative methods .
  • If you want to analyze a large amount of readily-available data, use secondary data. If you want data specific to your purposes with control over how it is generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Data collection is the systematic process by which observations or measurements are gathered in research. It is used in many different contexts by academics, governments, businesses, and other organizations.

There are various approaches to qualitative data analysis , but they all share five steps in common:

  • Prepare and organize your data.
  • Review and explore your data.
  • Develop a data coding system.
  • Assign codes to the data.
  • Identify recurring themes.

The specifics of each step depend on the focus of the analysis. Some common approaches include textual analysis , thematic analysis , and discourse analysis .
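A minimal sketch of steps 3–5 above (develop a coding system, assign codes, identify recurring themes) might look as follows; the keyword-based coding rule and the example segments are invented for illustration, whereas in practice researchers assign codes by reading and interpreting each segment themselves:

    # Minimal sketch of steps 3-5 with invented segments and an invented
    # keyword-based coding rule; in practice researchers assign codes by
    # reading and interpreting each segment themselves.
    from collections import Counter

    coding_system = {                                     # step 3: develop a coding system
        "access barriers": ["transport", "bus", "distance"],
        "staff interaction": ["nurse", "doctor", "explained"],
    }

    segments = [
        "There is no bus service from our town to the hospital.",
        "The nurse explained the procedure very clearly.",
        "We could not afford the transport to the clinic.",
    ]

    def assign_codes(segment):                            # step 4: assign codes to the data
        text = segment.lower()
        return [code for code, keywords in coding_system.items()
                if any(keyword in text for keyword in keywords)]

    all_codes = [code for segment in segments for code in assign_codes(segment)]
    print(Counter(all_codes))                             # step 5: recurring themes by frequency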

A research project is an academic, scientific, or professional undertaking to answer a research question . Research projects can take many forms, such as qualitative or quantitative , descriptive , longitudinal , experimental , or correlational . What kind of research approach you choose will depend on your topic.

Cite this Scribbr article


Streefkerk, R. (2023, June 22). Qualitative vs. Quantitative Research | Differences, Examples & Methods. Scribbr. Retrieved August 29, 2024, from https://www.scribbr.com/methodology/qualitative-quantitative-research/


  • Original research
  • Open access
  • Published: 30 August 2024

Human errors in emergency medical services: a qualitative analysis of contributing factors

  • Anna Poranen   ORCID: orcid.org/0000-0001-6193-725X 1 ,
  • Anne Kouvonen 2 , 3 &
  • Hilla Nordquist 1 , 2 , 4  

Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine volume  32 , Article number:  78 ( 2024 )


Abstract

The dynamic and challenging work environment of the prehospital emergency care settings creates many challenges for paramedics. Previous studies have examined adverse events and patient safety activities, but studies focusing on paramedics’ perspectives of factors contributing to human error are lacking. In this study, we investigated paramedics’ opinions of the factors contributing to human errors.

Data was collected through semi-structured individual interviews ( n  = 15) with paramedics and emergency medical field supervisors in Finland. The data was analyzed using inductive content analysis. Consolidated criteria for reporting qualitative research were used.

Contributing factors to human errors were divided into three main categories. The first main category, Changing work environment , consisted of two generic categories: The nature of the work and Factors linked to missions . The second main category, Organization of work , was divided into three generic categories: Inadequate care guidelines , Interaction challenges and Challenges related to technological systems . The third main category, Paramedics themselves , consisted of four generic categories: Issues that complicate cognitive processing , Individual strains and needs , Attitude problems and Impact of work experience .

Various factors contributing to human errors in emergency medical services (EMS) settings were identified. Although many of them were related to individual factors or to the paramedics themselves, system-level factors were also found to affect paramedics’ work and may therefore negatively impact patient safety. The findings provide insights for organizations to use this knowledge proactively to develop their procedures and to improve patient safety.

Background

Healthcare is considered a high-risk industry, similar to the aviation industry, where human error management has been acknowledged for decades [ 1 ]. Healthcare systems worldwide have learned safety procedures from other safety-critical organizations, and a great deal of attention has been focused on eliminating human errors and improving patient safety. In complex and changing healthcare work environments, many factors contribute to errors, and not all of them can be eliminated [ 1 , 2 ]. Human action is valuable and necessary because it allows for the variability and fine adjustment that is needed in dynamic and complex systems [ 3 , 4 ]. The variability of human action can lead to both successes and failures. Therefore, the contributing factors that lead to variations in human actions, and sometimes to undesirable outcomes, should be identified, because safety is not improved by simply eliminating errors [ 3 ]. However, if patient safety protocols are deviated from, organizations should not only investigate errors related to human behavior but should also explore how interactions between the system and the individuals may have failed [ 4 ]. Human error is not in itself a cause of adverse events [ 4 ]. Human error can be defined as a situation where performance variability is needed but the outcome turns out to be undesirable, even though under normal circumstances the same action would lead to a desirable outcome [ 4 ]. Alternatively, unfamiliar work circumstances or a distraction can cause a loss of focus that leads to an error [ 5 ].

Emergency medical services (EMS) work environment is dynamic and challenging, and the risk for errors is high. Previous studies have indicated that fatigue and shift work can increase the risk of medical errors and negatively impact patient safety [ 6 , 7 ]. Critically ill patients and organizational factors, such as a deviation of standard of care or insufficient training can also create a risk for adverse events [ 8 ]. Furthermore, difficulties related to decision-making can affect patient safety [ 8 , 9 ].

In the EMS setting, human errors have been studied since the 1980s, and a proactive approach is recommended for exploring the factors that affect errors [ 10 , 11 ]. Previous studies have investigated medication errors, patient safety activities and adverse events in the prehospital emergency care setting [ 8 , 11 , 12 , 13 ], but little is known about the factors contributing to human errors from paramedics’ perspectives. Proactive exploration of the factors contributing to errors can provide new understanding of this area and improve patient safety in the EMS setting. Therefore, the research question for this study was as follows: In paramedics’ opinions, what kinds of issues contribute to human error?

Materials and methods

A qualitative study with semi-structured interviews and inductive content analysis was conducted to investigate human errors from the paramedics’ perspective, capturing their lived experiences and views of this complex issue [ 14 ]. The consolidated criteria for reporting qualitative research (COREQ) checklist [ 15 ] was used for reporting this study and is outlined in Additional file 1 .

This study was carried out in Finland in 2020. At that time, 21 hospital districts organized EMS in their areas. The hospital districts could provide EMS themselves, in cooperation with local rescue services, or by outsourcing the services to the private sector. In one hospital district, there could be more than one EMS provider organization. All EMS organizations are guided by the Ministry of Social Affairs and Health and national legislation [ 16 , 17 ]. The Finnish EMS consists of advanced-level EMS units (staffed with at least one paramedic and a practical/registered nurse or a firefighter) and basic-level EMS units (staffed with one healthcare professional, e.g., a practical nurse who has specialized in prehospital emergency care, and another practical/registered nurse or a firefighter) [ 17 ]. In Finland, advanced-level paramedics are either registered nurses with at least three and a half years of training at a University of Applied Sciences (UAS) and additional prehospital emergency care specialization, or emergency care nurses with at least four years of training at a UAS. Each hospital district had at least one EMS field supervisor who was responsible for the operational aspects of EMS. EMS field supervisors are advanced-level paramedics with sufficient work experience and operative leadership training, and they operate in their own units [ 17 ]. In addition, each hospital district had at least one on-call EMS physician whom paramedics could contact at any time for care instructions. Finland also has helicopter EMS units, and in some districts EMS physicians operate their own ground units as well [ 17 , 18 ].

Participants

This study focused on paramedics and EMS field supervisors working in the EMS setting. The inclusion criteria were: (1) being an advanced-level or basic-level paramedic or an EMS field supervisor with any length of work experience, and (2) working in EMS at the time of the study. Convenience sampling, which is common in qualitative studies such as ours, was used [ 19 ]. Participants were recruited via social media: in June 2020, a recruitment ad was posted in the Finnish Facebook group Ensihoidon uutiset (“News of Prehospital Emergency Medical Services”), which at the time had over 5,000 members working in EMS settings across Finland. Potential participants were asked to contact the first author via Facebook Messenger to receive more information about the study, after which they confirmed their participation via email.

Eighteen people initially contacted the first author. Of these, two did not confirm their participation, and one wanted a different method of data collection. In total, 15 people confirmed their participation in the study. These participants were advanced-level paramedics and EMS field supervisors (hereafter referred to collectively as paramedics): nine women and six men from seven EMS organizations representing the eastern, western, northern, and central parts of Finland.

Data collection

To enable a dialogue between the interviewer and the participant, semi-structured individual interviews were used to collect the data [ 14 ]. The interview guide, which addressed the knowledge gaps in the literature, was formulated by the first and the last authors. An external expert on system safety and human factors was asked to assess the appropriateness of the interview guide, after which a pilot interview was conducted with a potential study participant. A few changes were made based on the expert’s comments and the pilot interview.

The first author, an advanced-level paramedic with several years of experience in EMS, conducted the interviews. During the interviews, open and trusting dialogues were maintained to ensure that the interviewer did not influence the participants’ responses.

The interviews began with the question, “What does human error mean to you?” Subsequent questions encouraged the participants to describe the issues and situations they believed to be linked to, or to affect, human error in EMS settings. While the interview guide was predesigned, most of the follow-up questions were formulated based on the participants’ earlier responses and the notes the interviewer wrote during the interviews. For instance, one follow-up question was, “You said a long work shift can create a risk for errors — how will this affect working in the EMS, in your opinion?” The interview guide is displayed in Additional file 2 . Each interview was audio recorded and lasted between 40 and 75 min. Interviews were carried out in person (n = 8), online (n = 4), and by phone (n = 3) between July and October 2020. After 13 interviews, the responses began to repeat; nevertheless, interviews were conducted with all the paramedics who had volunteered for the study.
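The point at which responses begin to repeat is commonly used as an informal indicator of data saturation. Purely as an illustration (this is not the authors’ procedure, and the code sets below are hypothetical), a minimal sketch of how saturation could be tracked, by counting how many previously unseen codes each successive interview contributes, might look like this:

```python
# Illustrative sketch only: tracking newly emerging codes per interview as an
# informal indicator of data saturation. The example data below are hypothetical.
from typing import Iterable, List, Set


def new_codes_per_interview(coded_interviews: Iterable[Set[str]]) -> List[int]:
    """Return, for each interview in order, the number of codes not seen in earlier interviews."""
    seen: Set[str] = set()
    counts: List[int] = []
    for codes in coded_interviews:
        fresh = codes - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


# Hypothetical coded interviews: later interviews contribute few or no new codes,
# which is consistent with responses starting to repeat toward the end of data collection.
example = [
    {"urgency", "fatigue", "unclear SOP"},
    {"fatigue", "hunger"},
    {"urgency", "double-checking skipped"},
    {"fatigue"},
]
print(new_codes_per_interview(example))  # [3, 1, 1, 0]
```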

The audio recordings were transcribed verbatim by the first author. All the interviews were assigned numerical codes and pseudonymized, and no personal information was included. Confidentiality of identity was guaranteed throughout the study process.
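The transcription and pseudonymization described above were done by hand. Purely as an illustration of the kind of bookkeeping involved (directory names and the list of identifying strings below are hypothetical, not from the study), a sketch of how transcripts could be assigned numerical participant codes and stripped of known identifiers might look like this:

```python
# Illustrative sketch only: assigning numerical codes and masking known identifiers.
# Directory names and the identifier list are hypothetical; the study did this manually.
from pathlib import Path

# Hand-curated list of names, places, and other identifying strings to mask.
IDENTIFIERS = ["Example Name", "Example Town"]


def pseudonymize(text: str, participant_code: str) -> str:
    """Mask known identifying strings and tag the transcript with its participant code."""
    for item in IDENTIFIERS:
        text = text.replace(item, "[REDACTED]")
    return f"Participant {participant_code}\n\n{text}"


raw_dir = Path("transcripts_raw")
out_dir = Path("transcripts_pseudonymized")
out_dir.mkdir(exist_ok=True)

for i, path in enumerate(sorted(raw_dir.glob("*.txt")), start=1):
    code = f"P{i}"
    cleaned = pseudonymize(path.read_text(encoding="utf-8"), code)
    (out_dir / f"{code}.txt").write_text(cleaned, encoding="utf-8")
```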

Data analysis

The paramedics’ opinions on the factors contributing to human errors were analyzed using inductive content analysis, following the phases described by Elo and Kyngäs [ 20 ]. The first author read the transcripts carefully several times to obtain an overall understanding of the data. Then, short sentences were chosen as units of meaning, and the coding began. Content answering the research question was marked in the text, and headings describing all aspects of the content were written in the text while it was being read. This was done by the first author without the use of any analysis software. To ensure the trustworthiness of the coding process and the correctness of the interpretation of participants’ responses, the first and last authors discussed and reviewed the process together. The last author has several years of both academic research experience and supervisory experience in the EMS setting, enabling a comprehensive understanding of the research method.

The headings were collected onto a chart, and overlaps were removed. Grouping then began: similar content was grouped into subcategories, which were named using content-characteristic words. Similar and related subcategories were then grouped into broader generic categories and named. Finally, the main categories were formed from the related generic categories [ 20 ]. An example of the category grouping is shown in Additional file 3 . The first and last authors grouped the categories together. To ensure the trustworthiness of this study, the categories were reviewed and compared against the original data several times during the process, and the main categories were formulated after careful reflection. The first, second, and last authors collaborated to finalize the categories. The second author has extensive experience in academic research, methodology, and supervision, as well as a broad understanding of occupational health and related social phenomena.
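Although the grouping described above was performed manually, the resulting three-level structure (subcategories nested under generic categories, nested under main categories) can be represented as a simple data structure. The sketch below is only an illustration of that structure, populated with a subset of the category names reported in the Results; it is not a tool used in the analysis:

```python
# Illustrative sketch of the three-level category structure (main category ->
# generic categories -> subcategories), using a subset of names from the Results.
from typing import Dict, List, Optional

ANALYSIS: Dict[str, Dict[str, List[str]]] = {
    "Changing work environment": {
        "The nature of the work": ["Urgency", "Disruptive external issues"],
        "Factors related to missions": [
            "Particular patient groups",
            "Driving",
            "Challenging working conditions",
            "Deviation from standard procedure",
        ],
    },
    "Organization of work": {
        "Inadequate care guidelines": [],
        "Interaction challenges": [],
        "Challenges related to technological systems": [],
    },
    "Paramedics themselves": {
        "Issues that complicate cognitive processing": [
            "Personal thoughts",
            "Difficulties in decision-making",
        ],
        "Individual strains and needs": ["Work overload", "Low energy levels"],
        "Attitude problems": [],
        "Impact of work experience": [],
    },
}


def main_category_of(label: str) -> Optional[str]:
    """Return the main category containing the given generic category or subcategory."""
    for main, generics in ANALYSIS.items():
        for generic, subs in generics.items():
            if label == generic or label in subs:
                return main
    return None


print(main_category_of("Driving"))  # -> Changing work environment
```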

Results

Three main categories were formed: (1) Changing work environment , (2) Organization of work , and (3) Paramedics themselves . An overview of the whole analysis is displayed in Additional file 4 . Figure  1 provides an overview of the main categories.

Fig. 1 Factors contributing to human errors in EMS, according to paramedics

Changing work environment

The main category Changing work environment consisted of two generic categories: The nature of the work and Factors related to missions . These categories describe aspects that are related to the unique work environment of the EMS. An overview of the first main category can be seen in Fig.  2 .

Fig. 2 An overview of the first main category, Changing work environment

The nature of the work

Urgency. According to the participants, urgency is a factor that contributes to human errors. In their view, a sense of urgency adds pressure, which may lead to carelessness and, subsequently, to errors. Participants explained that emergency missions can lessen one’s situational awareness and vigilance, and therefore one’s control, which may allow errors to occur. Moreover, many missions come with the added pressure of needing to be managed quickly so that the ambulance will be available for the next mission. Field supervisors may also create a sense of urgency if they oversee how long paramedics stay at the scene.

There is a rushed situation, and you are like, “Okay, it is this,” and then you give the wrong dose [of medication to the patient] — that rushed situation has created pressure in a way, and that is why an error happens. (P12)

Disruptive external issues. Based on the interviews, external factors can lead to human errors because they disturb the paramedics’ work and can distract them. A paramedic may not understand or notice something relevant, resulting in an error. For example, if a patient’s family members disagree or argue with the paramedics, it can complicate the situation and make the paramedics’ work difficult, as it requires attention to be diverted from the patient. According to the participants, families and other bystanders must be taken into account, but if they feel they are not being noticed, a mutual understanding about the patient’s care can break down and a risk of error can arise. Radio communication and ear buds can also be distracting.

Of course, you must wear ear buds all the time and you have to listen to certain channels, and if you are focusing on something… it will disturb your own work. (P3)

Factors related to missions

Particular patient groups. The participants described particular patient groups that can affect work circumstances and may therefore contribute to errors.

Based on the interviews, the presence of a critical illness can make circumstances challenging, and these acute emergency situations can create a sense of urgency and time pressure for paramedics, which can lead to something being forgotten, for example when an urgently needed medication must be administered quickly. In the participants’ opinion, these kinds of circumstances may contribute to errors. Critical procedures can cause pressure that worsens paramedics’ concentration during missions, and control of the situation as a whole can be lost. In addition, acute and mentally demanding missions and unstable patients requiring immediate care can lead to errors because of the emergency. Moreover, a paramedic’s work may be affected by missions with similar critically ill patients, because they may not remember which patient needs which medication or dose.

We had a patient with an epileptic seizure the previous day, and I had given them an intranasal medication. Without thinking, I gave the same dose of medication intravenously to the next patient, like I had a day before. (P15)

The participants noted that factors related to frequent callers may also expose paramedics to errors, as several visits to the same address can increase paramedics’ disregard, and something might go unnoticed during these missions. This disregard may manifest as not paying enough attention. Patients with substance abuse problems, especially alcohol problems, were mentioned as a particular group of frequent callers. In such cases, a patient may be left at home after an incomplete assessment. Furthermore, verbally difficult patients who argue with or otherwise do not cooperate with the paramedics can make the paramedics’ work more complicated and divert attention to potentially irrelevant things. This may contribute to errors.

That disregard in terms of particular missions and patient groups—you don’t have enough strength with these same situations. We visit them 10 times for the same reason, and one day there is a real symptom, but you don’t have an interest in this patient anymore and when there is a real symptom, paramedics may ignore that because of the patient’s background. (P8)

Driving. Emergency response driving was seen as a stress factor in itself; when paramedics drive quickly with lights and sirens on, it can lead to errors. A lack of communication between work partners during emergency response driving was also mentioned in the interviews. Furthermore, participants said that falling asleep while driving and a lack of local knowledge were additional factors contributing to errors.

When you switch on the lights and sirens, they are already stress factors. (P11)

Challenging working conditions. The participants stated that a forest, for example, or a container can create challenging working conditions. These environments can influence the management of a mission because they are unexpected. Unpleasant scenes or health and safety risks can cause paramedics to hurry to leave the scene of the mission and thus may contribute to human error.

Challenging circumstances… that place that is absolutely not normal, or somewhere high up, somewhere where you must be suddenly able to work [to take care of the patient]. (P9)

Deviation from standard procedure. According to the participants, paramedics may provide incorrect care if they rely too heavily on the mission code, and the patient assessment is not done systematically and thoroughly. According to the participants, too many EMS workers involved in a situation can result in unintended carelessness where individuals are not working together, which in turn can lead to unstructured care. This can impact patient safety and was mentioned as a contributing factor to human error.

We start to give medication and there is a crowd… somebody pulls the medicine into the syringe and says, “I have done it.” Another next to them watches it happen but doesn’t look carefully, and then there are two milligrams instead of one, or something like that. That circumstance makes the situation… a lot of people, a little bit of pressure. (P12)

According to the participants, not double-checking the medications can contribute to human error. For instance, if the partners work too well together, they may decide that they do not need to double-check the medication, or it may be forgotten.

A classic mistake — you didn’t remember to double-check the medication, did not remember or did not bother to do that, and then an error happens. (P3)

Rarely given treatments were also described as contributing to errors, as something essential may not be noticed. In addition, new and rarely used medications and inexperience with them can cause mental pressure. The participants said that if a paramedic is not competent in using a medication or does not trust themselves or the medication itself, errors may occur.

Something may not be noticed when the stress is increasing so much and if you haven’t been or have rarely been on a so-called tough mission. (P9)

Organization of work

The category Organization of work consisted of three generic categories: Inadequate care guidelines , Interaction challenges and Challenges related to technological systems . These categories are formed with aspects that are mostly related to the organizational level. An overview of the second main category is presented in Fig.  3 .

Fig. 3 The second main category, Organization of work

Inadequate care guidelines

According to the participants, if a standard operating procedure (SOP) is unclear, a paramedic may choose the wrong treatment. A lack of SOPs was also mentioned as leading to unsystematic and unstructured actions. The participants felt that concrete SOPs are needed because something essential may be forgotten even if special treatments or procedures have been practiced beforehand. Paramedics may not know, or may be unsure, how to manage missions, procedures, and treatments if they do not have sufficient SOPs. Participants said that without adequate SOPs they sometimes have to make difficult decisions in unclear situations, and these decisions may be wrong for a patient. Moreover, recent updates to SOPs can also contribute to errors.

Guidelines that are poorly made—that is why a paramedic might understand the guideline wrong [and an error may happen]. (P13)

Based on the interviews, working on the border of two EMS operating areas, with different options for follow-up care in the neighbouring areas, can make it challenging for paramedics to remember the care alternatives available in both their own and the other district.

Interaction challenges

The participants mentioned challenges related to teamwork and the transmission of information as contributors to human errors. For example, working in pairs was mentioned as a risk factor. If work partners do not get along well, the quality of their cooperation may suffer. On the other hand, if work partners get along too well and are too familiar with each other, some matters or essential procedures may be forgotten.

If you have a good, familiar working partner and you know the job goes well, you just forget in a situation, or you just don’t do something because [you think], “We don’t make errors because we work so well together.” (P3)

According to the participants, if one paramedic is at an advanced level and the other at a basic level, the paramedic with more education may dominate decision-making and fail to acknowledge the basic-level paramedic. In addition, an inexperienced paramedic may feel uncertain and lack the courage to voice their thoughts about a situation to a more experienced paramedic, which creates a risk for human error. Inexperience may also cause challenges if the partners do not trust each other or if one partner doubts their less-experienced partner’s decision-making. Conversely, there is also a risk of error if an experienced paramedic is overly trusting of the competence of a student doing their practical training and gives them too much responsibility.

[There can be] a strong-willed, experienced advanced-level paramedic and then there is a basic-level, fairly inexperienced paramedic who hasn’t been listened to during this shift, whose lead hasn’t listened to them at all. Will they be heard in that moment when it would actually be reasonable if the other one is irritated, tired, and just moves on? (P6)

The importance of communication when working in pairs was emphasized in the interviews. Insufficient communication can decrease situational awareness, interrupt the flow of information, and increase misunderstandings. Unclear communication, or situations in which one of the work partners does not maintain situational awareness or participate in decision-making, may lead to information not being shared. In such cases, one paramedic must work alone and make decisions by themselves.

You are trying to say something, but your working partner doesn’t understand for one reason or another; in other words, communication is unclear or incomplete, or you don’t know where you are going [in the situation]. In our minds, we may be treating two different patients. (P7)

Challenges related to technological systems

According to the participants, paramedics can be overly trusting of technological systems, and they often make assumptions without checking the accuracy of the information. Rarely used devices may cause human error if something goes unnoticed while a treatment is being performed, especially if a long time has passed since the previous training session.

You have had training, but you don’t remember how to use this device anymore. A device that is rarely used, for instance, external pacing; if you forget something, some nuance gets overlooked. (P12)

Paramedics themselves

The main category Paramedics themselves consisted of four generic categories: Issues that complicate cognitive processing , Individual strains and needs , Attitude problems and Impact of work experience . These categories consist of aspects that are mostly linked to paramedics’ personal issues. Figure  4 provides an overview of the third main category.

Fig. 4 The third main category, Paramedics themselves

Issues that complicate cognitive processing

Personal thoughts. Participants mentioned that a paramedic with emotionally stressful personal issues may have decreased concentration at work. When their thoughts are elsewhere, a paramedic may do something unnecessary or something that should be done in another way. The first work shift after a vacation was also mentioned as a factor that contributes to human error.

There is something else stressing you out, in your personal life. (P9)

Difficulties in decision-making. According to the participants, high-pressure situations in which paramedics must make quick decisions, weighing the pros and cons thereof, can contribute to human error. Errors can happen when a decision must be made but not all the essential issues of the situation are acknowledged. It was mentioned that fast information processing and information overflow can contribute to errors. A decision to not convey a patient to the hospital after evaluating their condition was mentioned as an example of a human error that may occur in such situations.

The errors happen when the decisions have to be made—you make, for example, a wrong decision or… I don’t want to say “the wrong decision” but you don’t acknowledge everything possible in your decision-making. (P7)

Individual strains and needs

Work overload. Work overload was mentioned as a factor contributing to human error because it can cause a lapse in concentration. Even if a paramedic has sufficient competence and knowledge, the accumulation of stressful situational factors may lead to errors. Furthermore, young paramedics may find it difficult to accept that errors occur and be afraid of making them, which increases their stress during their spare time as well.

There is pressure in this role, and it feels uncomfortable to be under pressure, and even after five years of work experience, I still feel uncomfortable. Working under pressure is not nice, but you have to try to bear it. (P5)

Low energy levels. According to the participants, energy levels are lower at night, which increases the risk of human error. Specifically, the early hours were mentioned as a time period with a higher risk of errors. Fatigue may also be a contributing factor, for instance if there are many missions during one shift. According to the participants, fatigue can lead to errors because something important may not be noticed or asked about during a patient’s assessment. Tiredness can lessen one’s ability to concentrate, leading to a risk of misunderstanding, and the conceptualization of the overall situation may be distorted. In addition, when paramedics are tired during their shift, they may ask a doctor to order non-conveyance of a patient without a sufficient interview or examination. Fatigue can weaken the decision-making process as a whole.

You are tired, tired and hungry and you decide a little bit too quickly, for example, to not take a patient to the hospital by ambulance. With that rapid phone call to a doctor, you make a decision, and then perhaps there ends up being an issue that is not noticed. (P12)

Hunger was mentioned as contributing to human error; it may cause the paramedics to speed up, either unconsciously or intentionally, managing a mission quickly when they want to eat. Hunger was claimed to be a simple and unambiguous cause of errors in EMS settings.

It doesn’t need anything other than hunger, hurry, and fatigue. (P14)

Attitude problems

According to the participants, low motivation toward work is strongly related to human error; it can lessen one’s understanding of the seriousness of a situation and negatively affect professional skills. The participants explained that if a paramedic is not highly motivated, leaving for a mission can feel annoying, and a negative attitude can also affect the dynamics between colleagues. Learned working models, a negative attitude towards new information, and stubbornness in doing things one’s own way can also contribute to human error.

If the attitude is “could not care less,” and when motivation eats your own professional skills, kind of… even though you are a skilled professional, if you don’t have the right attitude in that situation. (P5)

Impact of work experience

The participants mentioned that both inexperience and very long experience can cause human error, although for different reasons. They felt that an inexperienced work partner may not have enough competence in a situation, which can cause errors to happen. A paramedic’s lack of experience with a certain mission type can increase stress levels and even manifest as incapacitation or the wrong procedure being performed. Inexperience can also cause an inability to reflect on one’s competence, or a belief that one’s skills are limited. Furthermore, the participants mentioned that an inexperienced paramedic may feel the need to show off or be afraid of admitting their ignorance, especially if they only have a fixed-term job contract. Keeping a lack of competence hidden may lead to errors.

If, for example, there are two inexperienced paramedics working together, human error can happen because they are not quite sure what they are doing. (P15)

However, the participants argued that long experience can also be a risk factor for human errors. Long experience can cause indifference and rigid adherence to one’s own ways of working. A very experienced paramedic might do things automatically without stopping to think about what they are doing, and some patient safety procedures may be forgotten.

An experienced [paramedic], they do things out of instinct, they don’t stop to think about those issues, and they don’t check everything. (P11)

Discussion

In this study, our aim was to investigate paramedics’ opinions on the factors contributing to human errors. In the analysis, the main categories identified were Changing work environment , Organization of work , and Paramedics themselves . The analysis showed the interaction between the system and the paramedics, and how paramedics must be able to adapt their performance in different circumstances. These results support the prevalent theory of human error [ 4 ].

There are many factors related to working in the EMS that can contribute to human error. The findings of this study showed that paramedics must adapt to challenging and dynamic environments and circumstances, and these situations may contribute to human error. In accordance with previous studies, a sense of urgency is a considerable stress factor in EMS settings [ 8 , 21 , 22 , 23 , 24 ]. Our results also indicated that challenging work conditions can affect paramedics’ work, which is in line with the study by Bigham et al. [ 12 ]. Other emergency service professionals, such as firefighters and first responders, may face similar challenges as they work in the same prehospital emergency care setting; however, future studies are needed on the differences in challenges between these professions.

Relying too much on mission codes indicates that cognitive biases are common in the EMS [ 25 ]. If paramedics are overly reliant on dispatch information, they may have preconceived assumptions about a patient’s condition, which can cause bias [ 26 ] and contribute to human error. In addition, emergency response driving involves many risks of error in EMS settings, a finding also reported in a previous study [ 27 ].

Particular patient groups can change work circumstances and require performance variability, and in that way those situations may contribute to errors. Treating critically ill patients in EMS settings involves many stressors that can affect paramedics’ work [ 23 , 28 ]. Another patient group mentioned was frequent callers. This study indicated that with frequent callers there may be a risk of ignoring necessary information, and with critically ill patients a risk of anchoring on easily available information; as a result, the vigilance needed to notice other factors that may affect the patient’s condition could be reduced [ 25 ]. However, this phenomenon would require more focused studies.

Inadequate care guidelines may also contribute to human error; these findings indicate that many contributing factors are system-level issues, which supports the systems approach to human error [ 3 ]. Moreover, previous studies have demonstrated that a lack of SOPs increases intuitive thinking processes that can expose individuals to cognitive biases [ 29 , 30 , 31 ]. A study by Diller et al. [ 29 ] showed that communication problems can stop the flow of information or cause misunderstandings. This study adds that many aspects of and challenges related to teamwork can negatively impact patient safety and care in EMS.

Paramedics must process large amounts of information in dynamic environments during EMS missions [ 32 ], and unique and multidimensional decision-making can create a risk for patient safety [ 33 , 34 ]. Factors related to the work environment, such as unsafe scenes and time pressure, can challenge paramedics’ decision-making [ 24 , 34 ]. Decision-making support systems might be one way for EMS organizations to support paramedics in their work and improve patient safety; for instance, electronic SOPs used as decision-support tools can improve patient safety [ 35 ]. However, further studies are needed on the factors that affect paramedics’ decision-making and on how paramedics’ decision-making can be supported under stressful circumstances.

The findings regarding work-related stress support evidence from previous studies [ 6 , 12 ]. Personal issues that are emotionally stressful can negatively impact paramedics’ concentration and patient safety. Many studies have investigated fatigue in the EMS setting, and the findings of this study are consistent with them: fatigue creates risks related to both patient and occupational safety [ 6 , 7 , 24 , 36 ]. In addition, this study provides insight into how fatigue affects EMS workers from paramedics’ perspectives.

Our findings showed that attitude problems can contribute to human error. Many factors, including occupational stress, can reduce paramedics’ motivation to work. A few studies have indicated that EMS-specific factors, such as stressful and challenging environments, as well as occupational factors, can cause job dissatisfaction and negatively impact patient safety [ 37 , 38 ]. This is one of many reasons why organizations should become aware of these system-level issues and support paramedics’ psychological well-being at work. Moreover, further studies should examine, for instance, work motivation and organizational factors.

Professional competence plays a key role in managing missions with different challenges in the EMS, which is why support from colleagues is needed [ 39 ]. The results of this study suggested that very experienced paramedics have set routines, which can create a risk for error. In addition, routine matters may not need particular attention, but external factors or interruptions can cause errors to occur [ 3 , 29 ]. The dynamic and complex work environment of the EMS can create favorable circumstances for routine-based errors if organizations do not understand and prepare for these factors.

Methodological considerations

Potential participants were recruited via social media with the aim of getting wide representation from across Finland, and paramedics working in different parts of Finland could best be reached through a specific social media group. Using a social media platform for participant recruitment has limitations, as only those who use the platform could be reached, which could affect the number and homogeneity of potential participants [ 40 ]. Still, more traditional recruitment methods have similar challenges, and recruiting participants through social media has been found to be a useful and valid method [ 41 ]. All of the volunteering paramedics who were interested were included in this study. Their interest can be assumed to stem, at least partly, from the perception that they had a lot to contribute to the research, which supported the common goal of qualitative research of achieving an in-depth understanding of the studied topic [ 42 ]. The data were saturated, meaning that no additional aspects emerged [ 43 ], so in-depth results can be considered to have been achieved. Moreover, although qualitative research does not aim for broad generalizations [ 42 ], the convenience sampling method could limit the transferability of the results [ 40 ].

The inclusion criteria were that participants should be basic- or advanced-level paramedics or EMS field supervisors with any length of work experience who worked in EMS at the time of recruitment. No characteristics of the participants, such as age or education, were collected beyond gender, occupation, and EMS area, because the recruitment letter assured potential participants that their personal information would not be recorded. This lack of background information can be considered either a strength or a limitation. Still, during the interviews, all the participants shared that they had several years of work experience in the EMS, and they represented various EMS areas in Finland. Selecting participants with different lengths of work experience could have produced more varied insights into the topic; however, the number of interested paramedics was limited, so purposive sampling could not be used.

The first author conducted the interviews by herself. Her clinical experience in the EMS setting was beneficial for asking specific follow-up questions during the interviews and for gaining a deeper understanding of the participants’ perspectives. However, the interviewer’s pre-understanding of the research topic may have caused some bias toward the subject. During the interviews, the participants were encouraged to outline factors contributing to human errors by describing situations in which they had made an error or experienced a “near miss”. However, this was voluntary, and the participants were not pressured to talk about such situations if they found the topic sensitive.

Interviews were conducted in three different ways (face-to-face, online, and by phone), which can be seen as both a strength and a limitation. With the use of multiple data collection methods, more participants could be reached, because the different methods allow access to geographically wider areas and each participant could choose the method most appropriate for them [ 44 , 45 , 46 ]. However, there may be challenges in building rapport between the participant and the interviewer, for instance in phone interviews. Moreover, the quality and depth of research data can vary when multiple data collection methods are used [ 45 ]. Considering the aim and sampling of this study, multiple data collection methods were seen as appropriate for capturing an in-depth view of the research topic.

Conclusions

The paramedics recognized various factors that can contribute to human error in the EMS setting. Although the findings revealed that many of the contributing factors related to the paramedics themselves, system-level matters were also found to affect paramedics’ work, and paramedics must adapt to different circumstances. Our findings shed new light on research in this area by investigating human error proactively from the paramedics’ point of view. However, further qualitative and quantitative research is needed to form a deeper understanding of the factors contributing to human error in the EMS setting.

Recommendations for future practice

To understand the contributors to human error proactively and at the level of practice, many individual and system-level matters must be acknowledged. Organizations and educational institutions can use the findings of this study to develop and refine procedures and supporting systems for paramedics, thereby improving patient safety.

Data availability

The datasets generated and analyzed during the current study are not publicly available for ethical reasons: the informed consent stated that only the researchers would have access to the raw data and that the findings would be presented in an anonymized way.

Abbreviations

  • EMS: Emergency medical services
  • COREQ: Consolidated criteria for reporting qualitative research
  • UAS: University of Applied Sciences
  • SOP: Standard operating procedure

References

1. Liberati EG, Peerally MF, Dixon-Woods M. Learning from high risk industries may not be straightforward: a qualitative study of the hierarchy of risk controls approach in healthcare. Int J Qual Health Care. 2018;30(1):39–43.
2. Ruth CK. Human factors contributing to nursing errors [dissertation]. Tyler (TX): University of Texas at Tyler; 2014.
3. Hollnagel E. Safety-I and Safety-II: the past and future of safety management. 1st ed. Boca Raton: CRC Press, Taylor & Francis Group; 2014.
4. Read GJM, Shorrock S, Walker GH, Salmon PM. State of science: evolving perspectives on ‘human error’. Ergonomics. 2021;64(9):1091–114.
5. Roth C, Brewer M, Wieck KL. Using a Delphi method to identify human factors contributing to nursing errors. Nurs Forum. 2017;52(3):173–9.
6. Patterson PD, Weaver MD, Frank RC, Warner CW, Martin-Gill C, Guyette FX, et al. Association between poor sleep, fatigue, and safety outcomes in emergency medical services providers. Prehosp Emerg Care. 2012;16(1):86–97.
7. Donnelly EA, Bradford P, Davis M, Hedges C, Socha D, Morassutti P. Fatigue and safety in paramedicine. CJEM. 2019;21(6):762–5.
8. Hagiwara MA, Magnusson C, Herlitz J, Seffel E, Axelsson C, Munters M, et al. Adverse events in prehospital emergency care: a trigger tool study. BMC Emerg Med. 2019;19(1):14.
9. Andersson U, Maurin Söderholm H, Wireklint Sundström B, Andersson Hagiwara M, Andersson H. Clinical reasoning in the emergency medical services: an integrative review. Scand J Trauma Resusc Emerg Med. 2019;27(1):76.
10. Wasserberger J, Ordog GJ, Donoghue G, Balasubramaniam S. Base station prehospital care: judgement errors and deviations from protocol. Ann Emerg Med. 1987;16(8):867–71.
11. Crossman M. Technical and environmental impact on medication error in paramedic practice: a review of causes, consequences and strategies for prevention. J Emerg Prim Health Care. 2009;7(3).
12. Bigham BL, Buick JE, Brooks SC, Morrison M, Shojania KG, Morrison LJ. Patient safety in emergency medical services: a systematic review of the literature. Prehosp Emerg Care. 2012;16(1):20–35.
13. Misasi P, Keebler JR. Medication safety in emergency medical services: approaching an evidence-based method of verification to reduce errors. Ther Adv Drug Saf. 2019;10:2042098618821916.
14. Dicicco-Bloom B, Crabtree BF. The qualitative research interview. Med Educ. 2006;40(4):314–21.
15. Tong A, Sainsbury P, Craig J. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. Int J Qual Health Care. 2007;19(6):349–57.
16. Health Care Act. Stat. 1326 (2010).
17. Decree of prehospital emergency care. Stat. 585 (2017).
18. Raatiniemi L, Brattebo G. The challenge of ambulance missions to patients not in need of emergency medical care. Acta Anaesthesiol Scand. 2018;62(5):584–7.
19. Stratton SJ. Population research: convenience sampling strategies. Prehosp Disaster Med. 2021;36(4):373–4.
20. Elo S, Kyngäs H. The qualitative content analysis process. J Adv Nurs. 2008;62(1):107–15.
21. Cushman JT, Fairbanks RJ, O’Gara KG, Crittenden CN, Pennington EC, Wilson MA, et al. Ambulance personnel perceptions of near misses and adverse events in pediatric patients. Prehosp Emerg Care. 2010;14(4):477–84.
22. Edland A, Svenson O. Judgment and decision making under time pressure. In: Svenson O, Maule AJ, editors. Time pressure and stress in human judgment and decision making. Boston (MA): Springer; 1993.
23. Lammers R, Byrwa M, Fales W. Root causes of errors in a simulated prehospital pediatric emergency. Acad Emerg Med. 2012;19(1):37–47.
24. Galy E, Cariou M, Mélan C. What is the relationship between mental workload factors and cognitive load types? Int J Psychophysiol. 2012;83(3):269–75.
25. Graber ML, Franklin N, Gordon R. Diagnostic error in internal medicine. Arch Intern Med. 2005;165(13):1493–9.
26. Johansson H, Lundgren K, Hagiwara MA. Reasons for bias in ambulance clinicians’ assessments of non-conveyed patients: a mixed-methods study. BMC Emerg Med. 2022;22(1).
27. Jakonen A, Manty M, Nordquist H. Safety checklists for emergency response driving and patient transport: experiences from emergency medical services. Jt Comm J Qual Patient Saf. 2021;47(9):572–80.
28. Lauria MJ, Gallo IA, Rush S, Brooks J, Spiegel R, Weingart SD. Psychological skills to improve emergency care providers’ performance under stress. Ann Emerg Med. 2017;70(6):884–90.
29. Diller T, Helmrich G, Dunning S, Cox S, Buchanan A, Shappell S. The human factors analysis classification system (HFACS) applied to health care. Am J Med Qual. 2014;29(3):181–90.
30. Saposnik G, Redelmeier D, Ruff CC, Tobler PN. Cognitive biases associated with medical decisions: a systematic review. BMC Med Inf Decis Mak. 2016;16(1):138.
31. Pedersen I, Solevåg AL, Trygg Solberg M. Simulation-based training promotes higher levels of cognitive control in acute and unforeseen situations. Clin Simul Nurs. 2019;34:6–15.
32. Sedlár M. Cognitive skills of emergency medical services crew members: a literature review. BMC Emerg Med. 2020;20(1):44.
33. Reay G, Rankin JA, Smith-MacDonald L, Lazarenko GC. Creative adapting in a fluid environment: an explanatory model of paramedic decision making in the pre-hospital setting. BMC Emerg Med. 2018;18(1):42.
34. Bijani M, Abedi S, Karimi S, Tehranineshat B. Major challenges and barriers in clinical decision-making as perceived by emergency medical services personnel: a qualitative content analysis. BMC Emerg Med. 2021;21(1):1–12.
35. Bashiri A, Alizadeh Savareh B, Ghazisaeedi M. Promotion of prehospital emergency care through clinical decision support systems: opportunities and challenges. Clin Exp Emerg Med. 2019;6(4):288–96.
36. Ramey S, MacQuarrie A, Cochrane A, McCann I, Johnston CW, Batt AM. Drowsy and dangerous? Fatigue in paramedics: an overview. Ir J Paramedicine. 2019;4(1):1–9.
37. Eiche C, Birkholz T, Konrad F, Golditz T, Keunecke JG, Prottengeier J. Job satisfaction and performance orientation of paramedics in German emergency medical services: a nationwide survey. Int J Environ Res Public Health. 2021;18(23).
38. Hammer JS, Mathews JJ, Lyons JS, Johnson NJ. Occupational stress within the paramedic profession: an initial report of stress levels compared to hospital employees. Ann Emerg Med. 1986;15(5):536–9.
39. Hörberg A, Jirwe M, Kalén S, Vicente V, Lindström V. We need support! A Delphi study about desirable support during the first year in the emergency medical service. Scand J Trauma Resusc Emerg Med. 2017;25(1):89.
40. Gill SL. Qualitative sampling methods. J Hum Lactation. 2020;36(4):579–81.
41. Thornton L, Batterham PJ, Fassnacht DB, Kay-Lambkin F, Calear AL, Hunt S. Recruiting for health, medical or psychosocial research using Facebook: systematic review. Internet Interventions. 2016;4:72–81.
42. Polit DF, Beck CT. Generalization in quantitative and qualitative research: myths and strategies. Int J Nurs Stud. 2010;47(11):1451–8.
43. Mwita K. Factors influencing data saturation in qualitative studies. Int J Res Bus Social Sci. 2022;11(4):2147–4478.
44. Heath J, Williamson H, Williams L, Harcourt D. It’s just more personal: using multiple methods of qualitative data collection to facilitate participation in research focusing on sensitive subjects. Appl Nurs Res. 2018;43:30–5.
45. Deakin H, Wakefield K. Skype interviewing: reflections of two PhD researchers. Qualitative Res. 2013;14(5):603–16.
46. Opdenakker R. Advantages and disadvantages of four interview techniques in qualitative research. Forum: Qualitative Social Res / Qualitative Sozialforschung. 2006;7(4):1.


Acknowledgements

The authors would like to thank the Finnish paramedics and EMS field supervisors who participated in this study.

Open access funded by Helsinki University Library. This research did not receive any other specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Open Access funding provided by University of Helsinki (including Helsinki University Central Hospital).

Author information

Authors and Affiliations

Faculty of Medicine, University of Helsinki, Helsinki, 00014, Finland

Anna Poranen & Hilla Nordquist

Faculty of Social Sciences, University of Helsinki, Helsinki, 00014, Finland

Anne Kouvonen & Hilla Nordquist

Centre for Public Health, Queen’s University Belfast, Belfast, BT12 6BA, Northern Ireland

Anne Kouvonen

South-Eastern Finland University of Applied Sciences, Kotka, 48220, Finland

Hilla Nordquist


Contributions

AP and HN designed the study. AP conducted the data collection and drafted the manuscript. AP and HN jointly performed the data analysis. HN and AK reviewed, edited and supervised the manuscript. All authors approved the final manuscript’s submission for publication.

Corresponding author

Correspondence to Anna Poranen .

Ethics declarations

Ethics approval and consent to participate

The research process followed the ethical principles and good scientific practices defined by the Finnish National Board on Research Integrity. The study was carried out in accordance with relevant guidelines and regulations of the Declaration of Helsinki. The study protocol was reviewed and approved by the ethics committee of the South-Eastern Finland University of Applied Sciences on May 19, 2020. All participants were informed about the study’s purpose, process and that their interview responses would be used for scientific purposes. Information about the researchers and a statement of data protection were shared with the participants before each interview, after which written informed consent was obtained from the participants.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Supplementary Material 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Poranen, A., Kouvonen, A. & Nordquist, H. Human errors in emergency medical services: a qualitative analysis of contributing factors. Scand J Trauma Resusc Emerg Med 32 , 78 (2024). https://doi.org/10.1186/s13049-024-01253-7


Received: 07 November 2022

Accepted: 21 August 2024

Published: 30 August 2024

DOI: https://doi.org/10.1186/s13049-024-01253-7


Keywords: Patient safety, Human error




Methods for Quantitative Research in Psychology


August 2023


This seven-hour course provides a comprehensive exploration of research methodologies, beginning with the foundational steps of the scientific method. Students will learn about hypotheses, experimental design, data collection, and the analysis of results. Emphasis is placed on defining variables accurately, distinguishing between independent, dependent, and controlled variables, and understanding their roles in research.

The course delves into major research designs, including experimental, correlational, and observational studies. Students will compare and contrast these designs, evaluating their strengths and weaknesses in various contexts. This comparison extends to the types of research questions scientists pose, highlighting how different designs are suited to different inquiries.

A critical component of the course is developing the ability to judge the quality of sources for literature reviews. Students will learn criteria for evaluating the credibility, relevance, and reliability of sources, ensuring that their understanding of the research literature is built on a solid foundation.

Reliability and validity are key concepts addressed in the course. Students will explore what it means for an observation to be reliable, focusing on consistency and repeatability. They will also compare and contrast different forms of validity, such as internal, external, construct, and criterion validity, and how these apply to various research designs.

The course concepts are thoroughly couched in examples drawn from the psychological research literature. By the end of the course, students will be equipped with the skills to design robust research studies, critically evaluate sources, and understand the nuances of reliability and validity in scientific research. This knowledge will be essential for conducting high-quality research and contributing to the scientific community.

Learning objectives

  • Describe the steps of the scientific method.
  • Specify how variables are defined.
  • Compare and contrast the major research designs.
  • Explain how to judge the quality of a source for a literature review.
  • Compare and contrast the kinds of research questions scientists ask.
  • Explain what it means for an observation to be reliable.
  • Compare and contrast forms of validity as they apply to the major research designs.

This program does not offer CE credit.

More in this series

Introduces the effective application of statistical methods in psychology and related fields for undergraduates, high school students, and professionals.

August 2023 On Demand Training

Introduces the importance of ethical practice in scientific research for undergraduates, high school students, and professionals.



Unraveling childhood obesity: a grounded theory approach to psychological, social, parental, and biological factors.


1. Introduction
2. Materials and Methods
   2.1. Methodology
   2.2. Inclusion-Exclusion Criteria
   2.3. Search
   2.4. Building the Grounded Theory
4. Discussion
   4.1. Social Factors
   4.2. Biological-Genetic Factors
   4.3. Psychological Factors
   4.4. “Family Condition-Related Factors”, “Parenting Style Factors”, and “Feeding and Health Related Practices”
   4.5. Consequences of Obesity
   4.6. Grounded Theory
5. Conclusions
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

Appendix A. The Categories, Subcategories, and Codes That Emerged

CATEGORY 1. SOCIAL FACTORS
RELATED WITH PARENTAL SOCIAL STATUS: Socioeconomic status, Low or medium income, Social class, Occupation of the parents, Economic situation of family, Educational level of the parents (particularly maternal education), Unemployment of the parents, Poverty, Social vulnerabilities, Prolonged maternal full-time employment, Parental unemployment (particularly paternal unemployment), Migrant status, Occupational prestige, Poor quality of life, Parental cognitions
RELATED WITH SPECIFIC TIME PERIODS: The impact of COVID-19, The impact of the measures for the management of COVID-19, Consumption of cheap and easily available high-calorie food as a lifestyle, Decreased or lack of physical activity as a lifestyle, Lifestyle changes in teenagers, Overconsumption of foods and beverages as a lifestyle, Lack of undertaking physical activity in sport clubs in boys, Change in nutritional habits, Social changes, Generation specific effects, Lifestyle behaviors during pregnancy, Snacking dietary pattern in school children
RELATED WITH SPECIFIC GEOGRAPHIC LOCATIONS AND CULTURES: Living in rural areas, Poor-quality environments, Early feeding practices supported by family culture, Socioeconomic deprivation during the prenatal period and early childhood, Epidemiologic and demographic transitions, Urbanization, Affluence, Political environment, Failing economic environment, Cultural effects, Social inequality as a result of economic insecurity
RELATED WITH SPECIFIC IDEOLOGIES: Cultural beliefs that define a larger infant as representing a healthy and active child, Bogus beliefs and taboos, The concept that “chubby children look cute and lovely”, The concept that “overweight is a minor problem”, The concept that “a large infant is an indication of successful mothering”, Low subjective perceptions of social position, Gender inequalities, Gender roles, Women having primary responsibility in food parenting practices and nutrition, Fathers’ and mother’s beliefs and concerns about nutrition and physical activity, Mistakes of the parents on children’s appropriate diet and weight
RELATED WITH SOCIAL NETWORKS AND OTHER INFLUENCING FACTORS: Lack of support of parents in interventions aimed at the prevention and management of overweight, School environment, Lack of school-based strategies for obesity prevention, Low support from formal and informal sources, Low social support, Minimal social networks, Societal neglect, Lack of guidance of recommended dietary guidelines, Bereavement, Language barrier, Culture shock and lack of acceptance by the new nation in migrant children, Psychosocial stress and feelings of insecurity, Effect of the media, Intergenerational transmission of social disadvantage and health outcomes, Lack of nutritional discipline
CATEGORY 2. GENETIC AND BIOLOGICAL FACTORS
GENETIC FACTORS: Age (greater effects on youngest), Gender (greater genetic effects on boys), Combination of the gender of both parent and child, Child’s birth weight, Familial height and weight, Height, Mother’s age at delivery, The composition of bacteria in the gut, the human microbiome, Genes influencing dopamine and serotonin function, Changes to the precursor stem cell of adipose cells and neurons related to appetite regulation, Epigenetic adaptations and changes, Intergenerational influences, Genetic makeup of individuals, Slow metabolism, Genomics
BIOLOGICAL FACTORS: Mechanism of metabolic programming, Heredity, Monogenic or endocrine causes, Metabolic pathways, Hormonal signaling, Altered glucose metabolism, Growth trajectory, Epigenetic influences that cause heritable alterations in gene expression, Intergenerational transfer of obesity, Intrauterine environment and biological programming, Developmental origins of disease, Low fat-free mass, Functional connectivity between the ventral striatum and emotion/motor preparation structures, Connectivity between the ventral striatum and amygdala and attention-related regions, Inflammatory markers, Earlier onset of puberty in females
FACTORS DURING PREGNANCY AND PRENATAL PERIOD: Mother’s diet during pregnancy, Maternal weight gain during pregnancy, Maternal obesity during the first trimester of pregnancy, Excess maternal weight prior to conception, Healthy diet and regular physical activity during pregnancy, Altered metabolism in offspring resulting from variations in the father’s diet, Hormonal signaling during pregnancy, Altered glucose metabolism during pregnancy, Exposure to leptin during the prenatal period, Changes in certain metabolic pathways during pregnancy, Alterations in maternal metabolism, Under- and overnutrition and micronutrient intake during pregnancy, Maternal biology, Gestational diabetes, Greater methylation of specific genes prenatally, Mother’s unique influence on offspring body composition, possibly through intrauterine mechanisms, Excessive gestational weight gain (GWG), Rapid infant weight gain, Excess weight at ages 6 months, 1 year, and 2 years, Maternal and parental smoking during the prenatal period, Exercise during pregnancy, The food that a mother consumes and the experiences of taste and smell that function during fetal life, Changes to the placenta
BIOLOGICAL AND OTHER INDICATORS FROM THE PARENTS: Abnormal body mass in at least one of the parents, Obesity in both parents affects boys and girls, Obese parents affect sons, Obese mothers affect daughters, Parents’ slimness in childhood, Parents’ diet, Taste and nutrition preferences of parents, Parents’ smoking habits affect children and especially girls, Paternal and maternal smoking during pregnancy, Mother’s nutritional status throughout her life, Food cue responsiveness, Maternal smoking during her life
CATEGORY 3. FAMILY CONDITION-RELATED FACTORS
PSYCHO-EMOTIONAL FACTORS RELATED WITH FAMILY AND PARENTS: Stress-coping styles presented by the mothers, Maternal stress, Lack of the ability of parents to regulate their emotions (sadness, stress, etc.), Child maltreatment, Quality of child care, Instrumental feeding, Emotional feeding, The parents’ experience of stress after the birth of the child and during toddlerhood, Insufficient capacity of mothers to decode nonverbal expressions of emotions, Fathers’ mixed levels of self-efficacy in food and activity parenting practices, Resistance from children as a major barrier to promoting healthy eating and physical activity at home
FAMILY-MEMBERS RELATIONAL FACTORS: Difficulties in family relationships, Poor family functioning, Home environment factors, Emotional climate during meals, Poor communication, Poor behavior control, High levels of family conflict, Low family hierarchy values, Discord between parents, Violence, Household dysfunction, The role of food in family gatherings, Family cohesion and flexibility, Family food rules or rituals
COGNITIVE PERCEPTIONS AND BEHAVIORAL FACTORS OF THE PARENTS: Taste and nutrition preferences of parents, Mothers’ nutritional status throughout her life, Parental healthy modeling, Low parental concerns about their child’s thinness, Parental concern about child weight, Parental difficulty in recognizing weight problems, Parental perceptions of the diet, Authoritative feeding style, Authoritarian (restrictive) feeding style, Autonomy-supportive food parenting practices
PREVAILING FAMILY CONDITIONS: Having only one son in the family, Parental separation or divorce, Living with a substance abuser, Imprisonment of a household member, Witnessing a parent being abused, Living with a mentally ill person, The effect of birth order, Being part of nontraditional families, Number of children in family, Adverse experiences in childhood, Limited time to take care of children
CATEGORY 4. PSYCHOLOGICAL FACTORS
MENTAL HEALTH ISSUES: Depression, Anxiety, Eating disorders, Coping with stress, Infant’s temperament, Autism spectrum disorders, Attention-deficit hyperactivity disorder, Alexithymia, Behavior disorders, Negative emotionality, Negative self-evaluation, Poor self-image, Body dissatisfaction, Conduct problems, Hyperkinetic disorders (hyperactivity, inattention, and impulsivity), Peer relationship problems and prosocial behavior, Coping with stressful situations, Coping with traumatic experiences
PSYCHOLOGICAL FACTORS CONNECTED WITH FOOD CONSUMPTION: Emotion regulation with food, Disturbing behavior, Neophobia (fear of new foods), Food addiction, Tantrums over food, Delay of gratification, Overeating amongst girls, Binge eating, Emotional feeding from parents, Inability to monitor food intake, Emotional eating, Eating in the absence of hunger, Higher food responsiveness (being attracted to food and eating)
COPING WITH EMOTIONS ISSUES: Psychological control, Behavioral regulation, Social-emotional competence, Emotion and self-regulation, Inhibitory control, Emotional reactivity, Increased levels of negative affect, Less emotional awareness, Difficulty in coping with negative emotions, Child emotional insecurity, Problems with experiencing, describing, and identifying one’s emotions, Internalizing or externalizing difficulties, Emotional abuse
CATEGORY 5. PARENTING STYLE
GENERAL PARENTING STYLE: Strict parenting style, Authoritative parenting style (Balanced use of open, communicative warmth and assertive discipline), Permissive parenting style (little to no discipline or control over a child), Authoritarian parenting style (Heavy use of control and discipline with little warm communication), Neglectful parenting style, Responsiveness of the parent, Demandingness of the parent (especially of the mother), Uninvolved parenting style, Negative parental practices, Uninvolved parenting style (parents who are low on both warmth and control), Inconsistent parenting, Poor parenting
RELATED TO EMOTIONAL AND PSYCHOLOGICAL SITUATIONS: Monitoring and controlling child activities and deviant behaviors, Lack of praise, Levels of parental and maternal emotional warmth, Parental psychological control, Family communication, Negative paternal and maternal communication, Parental neglect, Insecure attachment relationship, Lack of acceptance from the parents, Poor mother–child relationship followed by an insecure mother–child attachment, Parental interpersonal dysphoria, Maternal intrusiveness, Levels of parental support and encouragement, Overprotection, Coercive control, Differential parental treatment to the kids of a family, Soothing strategies for infant/toddler distress and fussiness, Parental responsiveness to their child’s needs, Absent parents, Maternal depression, self-esteem, financial strain, and maternal distress
CATEGORY 6. FEEDING AND HEALTH RELATED PRACTICES
PRACTICES AROUND FOOD CONSUMPTION: Eating habits such as not drinking enough water, or not chewing food adequately, Not offering assistance during mealtimes, Early introduction of complementary solid foods, Exposure to a certain food type after a period of restriction to it, Pressing the children to eat, Not promoting self-regulation of the children, Parental strict limitations in food, Food fussiness, Absence of frequent family meals, Formula-fed infants, Age-inappropriate feeding, Greater role for fat and added sugars in foods, Reduced intakes of complex carbohydrates and dietary fiber, Reduced fruit and vegetable intake, Eating rate, Disinhibited eating, Use of food as a reward, Large portions, Response to children’s hunger and fullness cues, Breastfeeding period
HEALTH RELATED PRACTICES: Not enhancing physical activity, Not controlling screen time, Absence of establishment of rules for sleep schedules, Absence of age-appropriate sleep patterns and duration, Enhancing sedentary behavior, Use of car seats and strollers, Exposure to television and media, Sleep deprivation, Having a television in children’s bedrooms, Quality of sleep, Medication, Having the television on during dinner, Leisure time activities, Drug, alcohol, cigarette consumption, Not doing things together with children, Spending time with children in physical activities
PRACTICES AROUND FOOD PREPARATION AND AVAILABILITY: Availability of healthy food at home, Not educating children about nutrition, No involvement of the children in preparing meals, Not offering different choices for food consumption, Not discussing food choices with children, Absence of flexible, individualized dietary plan, Absence of clear and consistent rules related to food, Not respecting infant’s or toddler’s flavor or food preferences, Not respecting appetitive characteristics and traits, Allowing children unrestricted access to inappropriate foods or displaying no supportive guidance, Asserting strict control over all feeding behaviors, Not enhancing the children to eat both new and familiar foods, Intake of unhealthy snack foods as an easy choice
CATEGORY 7. CONSEQUENCES OF OBESITY
SOCIAL: Weight-related stigma, Body image concerns, Being avoided, ignored, or the subject of negative rumors, Problems of integration with peers, Bullying, Joint problems, Dissatisfaction with one’s own body, High school drop-out, Reduced work integration, Poor quality of life
PSYCHOLOGICAL: Emotional difficulty, Mental disorders, Higher rates of sadness, loneliness, and nervousness, Decreased self-esteem, Psychological problems, Poor self-image, Depression, Anxiety, Psychiatric health problems, Suicidality, Poorer well-being
BIOLOGICAL: Increased mortality, Sixth risk factor for death, Cardiovascular disorders, Metabolic disorders, Adult obesity, Diabetes and insulin resistance, Renal and liver disorders, Musculoskeletal disorders, Respiratory disorders, Neurological disorders, Chronic diseases, Menstrual disorders, Fertility challenges, Cancers of the esophagus, pancreas, colon and rectum, breast (post-menopausal), endometrium, and kidney, Lower physical functioning performance, High blood pressure, Asthma
  • Rolland-Cachera, M.F.; Sempé, M.; Guilloud-Bataille, M.; Patois, E.; Péquignot-Guggenbuhl, F.; Fautrad, V. Adiposity Indices in Children. Am. J. Clin. Nutr. 1982 , 36 , 178–184. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Kêkê, L.M.; Samouda, H.; Jacobs, J.; di Pompeo, C.; Lemdani, M.; Hubert, H.; Zitouni, D.; Guinhouya, B.C. Body Mass Index and Childhood Obesity Classification Systems: A Comparison of the French, International Obesity Task Force (IOTF) and World Health Organization (WHO) References. Rev. Epidemiol. Sante Publique 2015 , 63 , 173–182. [ Google Scholar ] [ CrossRef ]
  • Kamal, S.A. In search of a definition of childhood obesity. Int. J. Biol. Biotechnol. 2017 , 14 , 49–67. [ Google Scholar ]
  • Faienza, M.F.; Chiarito, M.; Molina-Molina, E.; Shanmugam, H.; Lammert, F.; Krawczyk, M.; D’Amato, G.; Portincasa, P. Childhood Obesity, Cardiovascular and Liver Health: A Growing Epidemic with Age. World J. Pediatr. 2020 , 16 , 438–445. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Haththotuwa, R.N.; Wijeyaratne, C.N.; Senarath, U. Chapter 1—Worldwide Epidemic of Obesity. In Obesity and Obstetrics , 2nd ed.; Mahmood, T.A., Arulkumaran, S., Chervenak, F.A., Eds.; Elsevier: Amsterdam, The Netherlands, 2020; pp. 3–8. ISBN 978-0-12-817921-5. [ Google Scholar ]
  • Frazier, J.A.; Li, X.; Kong, X.; Hooper, S.R.; Joseph, R.M.; Cochran, D.M.; Kim, S.; Fry, R.C.; Brennan, P.A.; Msall, M.E.; et al. Perinatal Factors and Emotional, Cognitive, and Behavioral Dysregulation in Childhood and Adolescence. J. Am. Acad. Child. Adolesc. Psychiatry 2023 , 62 , 1351–1362. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Vazquez, C.E.; Cubbin, C. Socioeconomic Status and Childhood Obesity: A Review of Literature from the Past Decade to Inform Intervention Research. Curr. Obes. Rep. 2020 , 9 , 562–570. [ Google Scholar ] [ CrossRef ]
  • Jebeile, H.; Kelly, A.S.; O’Malley, G.; Baur, L.A. Obesity in Children and Adolescents: Epidemiology, Causes, Assessment, and Management. Lancet Diabetes Endocrinol. 2022 , 10 , 351–365. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Omer, T.A.M. The Causes of Obesity: An in-Depth Review. Adv. Obes. Weight Manag. Control 2020 , 10 , 90–94. [ Google Scholar ] [ CrossRef ]
  • Chatham, R.E.; Mixer, S.J. Cultural Influences on Childhood Obesity in Ethnic Minorities: A Qualitative Systematic Review. J. Transcult. Nurs. 2020 , 31 , 87–99. [ Google Scholar ] [ CrossRef ]
  • Hemmingsson, E.; Nowicka, P.; Ulijaszek, S.; Sørensen, T.I.A. The Social Origins of Obesity within and across Generations. Obes. Rev. 2023 , 24 , e13514. [ Google Scholar ] [ CrossRef ]
  • Deal, B.J.; Huffman, M.D.; Binns, H.; Stone, N.J. Perspective: Childhood Obesity Requires New Strategies for Prevention. Adv. Nutr. 2020 , 11 , 1071–1078. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Kansra, A.R.; Lakkunarajah, S.; Jay, M.S. Childhood and Adolescent Obesity: A Review. Front. Pediatr. 2020 , 8 , 581461. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Marcus, C.; Danielsson, P.; Hagman, E. Pediatric Obesity-Long-Term Consequences and Effect of Weight Loss. J. Intern. Med. 2022 , 292 , 870–891. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Handakas, E.; Lau, C.H.; Alfano, R.; Chatzi, V.L.; Plusquin, M.; Vineis, P.; Robinson, O. A Systematic Review of Metabolomic Studies of Childhood Obesity: State of the Evidence for Metabolic Determinants and Consequences. Obes. Rev. 2022 , 23 (Suppl. S1), e13384. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Caprio, S.; Santoro, N.; Weiss, R. Childhood Obesity and the Associated Rise in Cardiometabolic Complications. Nat. Metab. 2020 , 2 , 223–232. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Rojo, M.; Solano, S.; Lacruz, T.; Baile, J.I.; Blanco, M.; Graell, M.; Sepúlveda, A.R. Linking Psychosocial Stress Events, Psychological Disorders and Childhood Obesity. Children 2021 , 8 , 211. [ Google Scholar ] [ CrossRef ]
  • Thaker, V.V.; Osganian, S.K.; deFerranti, S.D.; Sonneville, K.R.; Cheng, J.K.; Feldman, H.A.; Richmond, T.K. Psychosocial, Behavioral and Clinical Correlates of Children with Overweight and Obesity. BMC Pediatr. 2020 , 20 , 291. [ Google Scholar ] [ CrossRef ]
  • Smith, J.D.; Fu, E.; Kobayashi, M.A. Prevention and Management of Childhood Obesity and Its Psychological and Health Comorbidities. Annu. Rev. Clin. Psychol. 2020 , 16 , 351–378. [ Google Scholar ] [ CrossRef ]
  • Campos, P.; Luceño, L.; Aguirre, C. Physical Spaces in Higher Education as Scenarios of Learning Innovation: Compositional and Formative Synergies among Architecture, Music, and Fashion. Eur. J. Investig. Health Psychol. Educ. 2021 , 11 , 1166–1180. [ Google Scholar ] [ CrossRef ]
  • Snelling, A.; Hawkins, M.; McClave, R.; Irvine Belson, S. The Role of Teachers in Addressing Childhood Obesity: A School-Based Approach. Nutrients 2023 , 15 , 3981. [ Google Scholar ] [ CrossRef ]
  • Spinelli, A.; Censi, L.; Mandolini, D.; Ciardullo, S.; Salvatore, M.A.; Mazzarella, G.; Nardone, P.; 2019 OKkio alla SALUTE Group. Inequalities in Childhood Nutrition, Physical Activity, Sedentary Behaviour and Obesity in Italy. Nutrients 2023 , 15 , 3893. [ Google Scholar ] [ CrossRef ]
  • Buksh, S.M.; Hay, P.; de Wit, J.B.F. Perceptions on Healthy Eating Impact the Home Food Environment: A Qualitative Exploration of Perceptions of Indigenous Food Gatekeepers in Urban Fiji. Nutrients 2023 , 15 , 3875. [ Google Scholar ] [ CrossRef ]
  • Figueira, M.; Araújo, J.; Gregório, M.J. Monitoring Food Marketing Directed to Portuguese Children Broadcasted on Television. Nutrients 2023 , 15 , 3800. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Bondyra-Wiśniewska, B.; Harton, A. Effect of the Nutritional Intervention Program on Body Weight and Selected Cardiometabolic Factors in Children and Adolescents with Excess Body Weight and Dyslipidemia: Study Protocol and Baseline Data. Nutrients 2023 , 15 , 3646. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Gioxari, A.; Amerikanou, C.; Peraki, S.; Kaliora, A.C.; Skouroliakou, M. Eating Behavior and Factors of Metabolic Health in Primary Schoolchildren: A Cross-Sectional Study in Greek Children. Nutrients 2023 , 15 , 3592. [ Google Scholar ] [ CrossRef ]
  • Pavlidou, E.; Papandreou, D.; Taha, Z.; Mantzorou, M.; Tyrovolas, S.; Kiortsis, D.N.; Psara, E.; Papadopoulou, S.K.; Yfantis, M.; Spanoudaki, M.; et al. Association of Maternal Pre-Pregnancy Overweight and Obesity with Childhood Anthropometric Factors and Perinatal and Postnatal Outcomes: A Cross-Sectional Study. Nutrients 2023 , 15 , 3384. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Silva-Uribe, M.; Máynez-López, F.; Denova-Gutiérrez, E.; Muñoz-Guerrero, B.; Omaña-Guzmán, I.; Messiah, S.E.; Ruíz-Arroyo, A.; Lozano-González, E.; Villanueva-Ortega, E.; Muñoz-Aguirre, P.; et al. Validation of the Childhood Family Mealtime Questionnaire in Mexican Adolescents with Obesity and Their Caregivers. Nutrients 2023 , 15 , 4937. [ Google Scholar ] [ CrossRef ]
  • Calcaterra, V.; Rossi, V.; Tagi, V.M.; Baldassarre, P.; Grazi, R.; Taranto, S.; Zuccotti, G. Food Intake and Sleep Disorders in Children and Adolescents with Obesity. Nutrients 2023 , 15 , 4736. [ Google Scholar ] [ CrossRef ]
  • Mannino, A.; Sarapis, K.; Mourouti, N.; Karaglani, E.; Anastasiou, C.A.; Manios, Y.; Moschonis, G. The Association of Maternal Weight Status throughout the Life-Course with the Development of Childhood Obesity: A Secondary Analysis of the Healthy Growth Study Data. Nutrients 2023 , 15 , 4602. [ Google Scholar ] [ CrossRef ]
  • Lin, S.-F.; Zive, M.M.; Schmied, E.; Helm, J.; Ayala, G.X. The Effects of a Multisector, Multilevel Intervention on Child Dietary Intake: California Childhood Obesity Research Demonstration Study. Nutrients 2023 , 15 , 4449. [ Google Scholar ] [ CrossRef ]
  • Blake, M.K.; Ma, R.; Cardenas, E.V.; Varanloo, P.; Agosto, Y.; Velasquez, C.; Espina, K.A.; Palenzuela, J.; Messiah, S.E.; Natale, R.A. Infant Nutrition and Other Early Life Risk Factors for Childhood Obesity According to Disability Status. Nutrients 2023 , 15 , 4394. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Arellano-Alvarez, P.; Muñoz-Guerrero, B.; Ruiz-Barranco, A.; Garibay-Nieto, N.; Hernandez-Lopez, A.M.; Aguilar-Cuarto, K.; Pedraza-Escudero, K.; Fuentes-Corona, Z.; Villanueva-Ortega, E. Barriers in the Management of Obesity in Mexican Children and Adolescents through the COVID-19 Lockdown—Lessons Learned and Perspectives for the Future. Nutrients 2023 , 15 , 4238. [ Google Scholar ] [ CrossRef ]
  • González-Torres, M.L.; Garza-Olivares, X.; Navarro-Contreras, G.; González-Orozco, L.A. Validation of the Scale on Parental Feeding Behaviors (ECOPAL) for Caregivers of Mexican Children. Nutrients 2023 , 15 , 3698. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Borkhoff, S.A.; Parkin, P.C.; Birken, C.S.; Maguire, J.L.; Macarthur, C.; Borkhoff, C.M. Examining the Double Burden of Underweight, Overweight/Obesity and Iron Deficiency among Young Children in a Canadian Primary Care Setting. Nutrients 2023 , 15 , 3635. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Kim, J.H.; Lee, E.; Ha, E.K.; Lee, G.C.; Shin, J.; Baek, H.-S.; Choi, S.-H.; Shin, Y.H.; Han, M.Y. Infant Feeding Pattern Clusters Are Associated with Childhood Health Outcomes. Nutrients 2023 , 15 , 3065. [ Google Scholar ] [ CrossRef ]
  • Strauss, A.; Corbin, J. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory , 2nd ed.; Sage Publications, Inc.: Thousand Oaks, CA, USA, 1998; ISBN 978-0-8039-5939-2. [ Google Scholar ]
  • Glaser, B.G.; Strauss, A.L. The Discovery of Grounded Theory: Strategies for Qualitative Research , 1st ed.; Routledge: London, UK, 2017; ISBN 978-0-203-79320-6. [ Google Scholar ]
  • Bidopia, T. The Development of Disordered Eating and Body Image Issues in Latina Adolescent Girls: A Grounded Theory Approach. Master’s Thesis, Fordham University, New York, NY, USA, 2023; pp. 1–110. [ Google Scholar ]
  • Vaezghasemi, M.; Öhman, A.; Ng, N.; Hakimi, M.; Eriksson, M. Concerned and Conscious, but Defenceless-the Intersection of Gender and Generation in Child Malnutrition in Indonesia: A Qualitative Grounded Theory Study. Glob. Health Action 2020 , 13 , 1744214. [ Google Scholar ] [ CrossRef ]
  • Nguyen, T.; Trat, T.; Tieu, N.T.; Vu, L.; Sokal-Gutierrez, K. Key Informants’ Perspectives on Childhood Obesity in Vietnam: A Qualitative Study. Matern. Child Health J. 2022 , 26 , 1811. [ Google Scholar ] [ CrossRef ]
  • Verga, S.M.P.; de Azevedo Mazza, V.; Teodoro, F.C.; Girardon-Perlini, N.M.O.; Marcon, S.S.; de Almeida Fernandes Rodrigues, É.T.; Ruthes, V.B.T.N.M. The Family System Seeking to Transform Its Eating Behavior in the Face of Childhood Obesity. Rev. Bras. Enferm. 2022 , 75 , e20210616. [ Google Scholar ] [ CrossRef ]
  • Jansen, E.; Harris, H.; Rossi, T. Fathers’ Perceptions of Their Role in Family Mealtimes: A Grounded Theory Study. J. Nutr. Educ. Behav. 2020 , 52 , 45–54. [ Google Scholar ] [ CrossRef ]
  • Ayre, S.K.; White, M.J.; Harris, H.A.; Byrne, R.A. “I’m Having Jelly Because You’ve Been Bad!”: A Grounded Theory Study of Mealtimes with Siblings in Australian Families. Matern. Child Nutr. 2023 , 19 , e13484. [ Google Scholar ] [ CrossRef ]
  • Middleton, G.; Golley, R.K.; Patterson, K.A.; Coveney, J. Barriers and Enablers to the Family Meal across Time; a Grounded Theory Study Comparing South Australian Parents’ Perspectives. Appetite 2023 , 191 , 107091. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Eaton, E. Parental Perspectives of the Barriers to Sustaining Health Behaviour Change for Their Child Living with Overweight or Obesity–A Grounded Theory Study. Ph.D. Thesis, University of Essex, Colchester, UK, 2023. [ Google Scholar ]
  • Bisogni, C.A.; Jastran, M.; Seligson, M.; Thompson, A. How People Interpret Healthy Eating: Contributions of Qualitative Research. J. Nutr. Educ. Behav. 2012 , 44 , 282–301. [ Google Scholar ] [ CrossRef ]
  • Ponterotto, J.G. Qualitative Research in Counseling Psychology: A Primer on Research Paradigms and Philosophy of Science. J. Couns. Psychol. 2005 , 52 , 126–136. [ Google Scholar ] [ CrossRef ]
  • Strauss, A.; Corbin, J. Basics of Qualitative Research Techniques , 2nd ed.; Sage Publications: Thousand Oaks, CA, USA, 1998. [ Google Scholar ]
  • Corbin, J.; Strauss, A. Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory , 3rd ed.; SAGE Publications, Inc.: Thousand Oaks, CA, USA, 2008; ISBN 978-1-4522-3015-3. [ Google Scholar ]
  • Charmaz, K. Grounded Theory: Main Characteristics. Qual. Anal. Eight Approaches Soc. Sci. 2020 , 1 , 195–222. [ Google Scholar ]
  • Mohajan, D.; Mohajan, H. Classic Grounded Theory: A Qualitative Research on Human Behavior. Stud. Soc. Sci. Humanit. 2022 , 2 , 1–7. [ Google Scholar ] [ CrossRef ]
  • Ligita, T.; Harvey, N.; Wicking, K.; Nurjannah, I.; Francis, K. A Practical Example of Using Theoretical Sampling throughout a Grounded Theory Study: A Methodological Paper. Qual. Res. J. 2020 , 20 , 116–126. [ Google Scholar ] [ CrossRef ]
  • Coleman, P. Validity and Reliability within Qualitative Research for the Caring Sciences. Int. J. Caring Sci. 2022 , 14 , 2041–2045. [ Google Scholar ]
  • Morse, J.M.; Barrett, M.; Mayan, M.; Olson, K.; Spiers, J. Verification Strategies for Establishing Reliability and Validity in Qualitative Research. Int. J. Qual. Methods 2002 , 1 , 13–22. [ Google Scholar ] [ CrossRef ]
  • Hendrawan, E.; Meisel, M.; Sari, D.N. Analysis and implementation of computer network systems using software draw.io. Asia Inf. Syst. J. 2023 , 2 , 9–15. [ Google Scholar ] [ CrossRef ]
  • Glaser, B.G. Doing Grounded Theory: Issues and Discussions ; Sociology Press: Mill Valley, CA, USA, 1998; ISBN 978-1-884156-11-3. [ Google Scholar ]
  • Thornberg, R. Informed Grounded Theory. Scand. J. Educ. Res. 2012 , 56 , 243–259. [ Google Scholar ] [ CrossRef ]
  • Thornberg, R.; Charmaz, K. The SAGE Handbook of Qualitative Data Analysis ; SAGE Publications Ltd.: Thousand Oaks, CA, USA, 2014; ISBN 978-1-4462-8224-3. [ Google Scholar ]
  • Batko, B.; Kowal, M.; Szwajca, M.; Pilecki, M. Relationship between biopsychosocial factors, body mass and body composition in preschool children. Psychiatr. I Psychol. Klin.-J. Psychiatry Clin. Psychol. 2020 , 20 , 164–173. [ Google Scholar ] [ CrossRef ]
  • Carnell, S.; Kim, Y.; Pryor, K. Fat Brains, Greedy Genes, and Parent Power: A Biobehavioural Risk Model of Child and Adult Obesity. Int. Rev. Psychiatry 2012 , 24 , 189–199. [ Google Scholar ] [ CrossRef ]
  • Chatzidaki, E.; Chioti, V.; Mourtou, L.; Papavasileiou, G.; Kitani, R.-A.; Kalafatis, E.; Mitsis, K.; Athanasiou, M.; Zarkogianni, K.; Nikita, K. Parenting Styles and Psychosocial Factors of Mother–Child Dyads Participating in the ENDORSE Digital Weight Management Program for Children and Adolescents during the COVID-19 Pandemic. Children 2024 , 11 , 107. [ Google Scholar ] [ CrossRef ]
  • Coleman, J.R.I.; Krapohl, E.; Eley, T.C.; Breen, G. Individual and Shared Effects of Social Environment and Polygenic Risk Scores on Adolescent Body Mass Index. Sci. Rep. 2018 , 8 , 6344. [ Google Scholar ] [ CrossRef ]
  • Do, L.M.; Larsson, V.; Tran, T.K.; Nguyen, H.T.; Eriksson, B.; Ascher, H. Vietnamese Mother’s Conceptions of Childhood Overweight: Findings from a Qualitative Study. Glob. Health Action 2016 , 9 , 30215. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Faith, M.S.; Berkowitz, R.I.; Stallings, V.A.; Kerns, J.; Storey, M.; Stunkard, A.J. Eating in the Absence of Hunger: A Genetic Marker for Childhood Obesity in Prepubertal Boys? Obesity 2006 , 14 , 131–138. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Haire-Joshu, D.; Tabak, R. Preventing Obesity Across Generations: Evidence for Early Life Intervention. Annu. Rev. Public Health 2016 , 37 , 253–271. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Holmen, T.L.; Bratberg, G.; Krokstad, S.; Langhammer, A.; Hveem, K.; Midthjell, K.; Heggland, J.; Holmen, J. Cohort Profile of the Young-HUNT Study, Norway: A Population-Based Study of Adolescents. Int. J. Epidemiol. 2014 , 43 , 536–544. [ Google Scholar ] [ CrossRef ]
  • Iguacel, I.; Fernández-Alvira, J.M.; Ahrens, W.; Bammann, K.; Gwozdz, W.; Lissner, L.; Michels, N.; Reisch, L.; Russo, P.; Szommer, A.; et al. Prospective Associations between Social Vulnerabilities and Children’s Weight Status. Results from the IDEFICS Study. Int. J. Obes. 2018 , 42 , 1691–1703. [ Google Scholar ] [ CrossRef ]
  • Ji, M.; An, R. Parenting styles in relation to childhood obesity, smoking and drinking: A gene–environment interaction study. J. Hum. Nutr. 2022 , 35 , 625–633. [ Google Scholar ] [ CrossRef ]
  • Regber, S.; Dahlgren, J.; Janson, S. Neglected Children with Severe Obesity Have a Right to Health: Is Foster Home an Alternative?—A Qualitative Study. Child Abus. Negl. 2018 , 83 , 106–119. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Ji, M.; An, R. Parental Effects on Obesity, Smoking, and Drinking in Children and Adolescents: A Twin Study. J. Adolesc. Health 2022 , 71 , 196–203. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Paul, I.M.; Williams, J.S.; Anzman-Frasca, S.; Beiler, J.S.; Makova, K.D.; Marini, M.E.; Hess, L.B.; Rzucidlo, S.E.; Verdiglione, N.; Mindell, J.A. The Intervention Nurses Start Infants Growing on Healthy Trajectories (INSIGHT) Study. BMC Pediatr. 2014 , 14 , 184. [ Google Scholar ] [ CrossRef ]
  • Van De Beek, C.; Hoek, A.; Painter, R.C.; Gemke, R.J.; Van Poppel, M.N.; Geelen, A.; Groen, H.; Mol, B.W.; Roseboom, T.J. Women, their Offspring and improving lifestyle for Better cardiovascular health of both (WOMB project): A protocol of the follow-up of a multicenter randomized controlled trial. BMJ Open 2018 , 8 , e016579. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Vedanthan, R.; Bansilal, S.; Soto, A.V.; Kovacic, J.C.; Latina, J.; Jaslow, R.; Santana, M.; Gorga, E.; Kasarskis, A.; Hajjar, R.; et al. Family-Based Approaches to Cardiovascular Health Promotion. J. Am. Coll. Cardiol. 2016 , 67 , 1725–1737. [ Google Scholar ] [ CrossRef ]
  • Murrin, C.M.; Kelly, G.E.; Tremblay, R.E.; Kelleher, C.C. Body Mass Index and Height over Three Generations: Evidence from the Lifeways Cross-Generational Cohort Study. BMC Public Health 2012 , 12 , 81. [ Google Scholar ] [ CrossRef ]
  • Oparaocha, E. Childhood Obesity in Nigeria: Causes and Suggestions for Control. Niger. J. Parasitol. 2018 , 39 , 1–7. [ Google Scholar ] [ CrossRef ]
  • Kiefner-Burmeister, A.; Hinman, N. The Role of General Parenting Style in Child Diet and Obesity Risk. Curr. Nutr. Rep. 2020 , 9 , 14–30. [ Google Scholar ] [ CrossRef ]
  • Poulain, T.; Baber, R.; Vogel, M.; Pietzner, D.; Kirsten, T.; Jurkutat, A.; Hiemisch, A.; Hilbert, A.; Kratzsch, J.; Thiery, J. The LIFE Child Study: A Population-Based Perinatal and Pediatric Cohort in Germany. Eur. J. Epidemiol. 2017 , 32 , 145–158. [ Google Scholar ] [ CrossRef ]
  • McDonald, G.; Faga, P.; Jackson, D.; Mannix, J.; Firtko, A. Mothers’ Perceptions of Overweight and Obesity in Their Children. Aust. J. Adv. Nurs. 2005 , 23 , 8–13. [ Google Scholar ]
  • Zhang, Y.; Hurtado, G.A.; Flores, R.; Alba-Meraz, A.; Reicks, M. Latino Fathers’ Perspectives and Parenting Practices Regarding Eating, Physical Activity, and Screen Time Behaviors of Early Adolescent Children: Focus Group Findings. J. Acad. Nutr. Diet 2018 , 118 , 2070–2080. [ Google Scholar ] [ CrossRef ]
  • Suder, A.; Chrzanowska, M. Risk factors for abdominal obesity in children and adolescents from Cracow, Poland (1983–2000). J. Biosoc. Sci. 2015 , 47 , 203–219. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Russell, C.G.; Russell, A. Biological and Psychosocial Processes in the Development of Children’s Appetitive Traits: Insights from Developmental Theory and Research. Nutrients 2018 , 10 , 692. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Mazzeo, S.; Mitchell, K.; Gerke, C.; Bulik, C. Parental Feeding Style and Eating Attitudes: Influences on Children’s Eating Behavior. Curr. Nutr. Food Sci. 2006 , 2 , 275–295. [ Google Scholar ] [ CrossRef ]
  • Levitt, H.M. Qualitative Generalization, Not to the Population but to the Phenomenon: Reconceptualizing Variation in Qualitative Research. Qual. Psychol. 2021 , 8 , 95. [ Google Scholar ] [ CrossRef ]
  • Iguacel, I.; Gasch-Gallén, Á.; Ayala-Marín, A.M.; De Miguel-Etayo, P.; Moreno, L.A. Social Vulnerabilities as Risk Factor of Childhood Obesity Development and Their Role in Prevention Programs. Int. J. Obes. 2021 , 45 , 1–11. [ Google Scholar ] [ CrossRef ]
  • Grube, M.; Bergmann, S.; Keitel, A.; Herfurth-Majstorovic, K.; Wendt, V.; von Klitzing, K.; Klein, A.M. Obese Parents–Obese Children? Psychological-Psychiatric Risk Factors of Parental Behavior and Experience for the Development of Obesity in Children Aged 0–3: Study Protocol. BMC Public Health 2013 , 13 , 1193. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Polit, D.F.; Beck, C.T. Generalization in Quantitative and Qualitative Research: Myths and Strategies. Int. J. Nurs. Stud. 2010 , 47 , 1451–1458. [ Google Scholar ] [ CrossRef ]

Author(s) | Year | Title | Types of Factors | Reference Number
Batko, B., Kowal, M., Szwajca, M., and Pilecki, M. | 2020 | Relationship between biopsychosocial factors, body mass and body composition in preschool children | Biological and psychological factors | [ ]
Carnell, S., Kim, Y., and Pryor, K. | 2012 | Fat brains, greedy genes, and parent power: A biobehavioral risk model of child and adult obesity | Parental and biological factors | [ ]
Chatzidaki, E., Chioti, V., Mourtou, L., Papavasileiou, G., Kitani, R.-A., Kalafatis, E., Mitsis, K., Athanasiou, M., Zarkogianni, K., and Nikita, K. | 2024 | Parenting styles and psychosocial factors of mother–child dyads participating in the ENDORSE digital weight management program for children and adolescents during the COVID-19 pandemic | Parental, social and psychological factors | [ ]
Coleman, J. R., Krapohl, E., Eley, T. C., and Breen, G. | 2018 | Individual and shared effects of social environment and polygenic risk scores on adolescent body mass index | Social and biological factors | [ ]
Do, L. M., Larsson, V., Tran, T. K., Nguyen, H. T., Eriksson, B., and Ascher, H. | 2016 | Vietnamese mother’s conceptions of childhood overweight: Findings from a qualitative study | Parental factors | [ ]
Faith, M. S., Berkowitz, R. I., Stallings, V. A., Kerns, J., Storey, M., and Stunkard, A. J. | 2006 | Eating in the absence of hunger: A genetic marker for childhood obesity in prepubertal boys? | Social factors | [ ]
Haire-Joshu, D., and Tabak, R. | 2016 | Preventing obesity across generations: Evidence for early life intervention | Social and biological factors | [ ]
Holmen, T. L., Bratberg, G., Krokstad, S., Langhammer, A., Hveem, K., Midthjell, K., Heggland, J., and Holmen, J. | 2014 | Cohort profile of the young-HUNT study, Norway: A population-based study of adolescents | Biological and psychological factors | [ ]
Iguacel, I., Fernández-Alvira, J. M., Ahrens, W., Bammann, K., Gwozdz, W., Lissner, L., Michels, N., Reisch, L., Russo, P., and Szommer, A. | 2018 | Prospective associations between social vulnerabilities and children’s weight status. Results from the IDEFICS study | Social factors | [ ]
Ji, M., and An, R. | 2022a | Parental effects on obesity, smoking, and drinking in children and adolescents: A twin study | Parental factors | [ ]
Ji, M., and An, R. | 2022b | Parenting styles in relation to childhood obesity, smoking, and drinking: A gene–environment interaction study | Social and biological factors | [ ]
Kiefner-Burmeister, A., and Hinman, N. | 2020 | The role of general parenting style in child diet and obesity risk | Parental factors | [ ]
Grube, M., Bergmann, S., Keitel, A., Herfurth-Majstorovic, K., Wendt, V., von Klitzing, K., and Klein, A.M. | 2013 | Obese parents—obese children? Psychological-psychiatric risk factors of parental behavior and experience for the development of obesity in children aged 0–3: Study protocol | Parental and psychological factors | [ ]
Mazzeo, S. E., Mitchell, K. S., Gerke, C. K., and Bulik, C. M. | 2006 | Parental feeding style and eating attitudes: Influences on children’s eating behavior | Parental and psychological factors | [ ]
McDonald, G., Faga, P., Jackson, D., Mannix, J., and Firtko, A. | 2005 | Mothers’ perceptions of overweight and obesity in their children | Parental factors | [ ]
Murrin, C. M., Kelly, G. E., Tremblay, R. E., and Kelleher, C. C. | 2012 | Body mass index and height over three generations: evidence from the Lifeways cross-generational cohort study | Biological factors | [ ]
Oparaocha, E. | 2018 | Childhood obesity in Nigeria: Causes and suggestions for control | Social factors | [ ]
Paul, I. M., Williams, J. S., Anzman-Frasca, S., Beiler, J. S., Makova, K. D., Marini, M. E., Hess, L. B., Rzucidlo, S. E., Verdiglione, N., and Mindell, J. A. | 2014 | The Intervention Nurses Start Infants Growing on Healthy Trajectories (INSIGHT) study | Biological factors | [ ]
Poulain, T., Baber, R., Vogel, M., Pietzner, D., Kirsten, T., Jurkutat, A., Hiemisch, A., Hilbert, A., Kratzsch, J., and Thiery, J. | 2017 | The LIFE Child study: a population-based perinatal and pediatric cohort in Germany | Biological factors | [ ]
Regber, S., Dahlgren, J., and Janson, S. | 2018 | Neglected children with severe obesity have a right to health: Is foster home an alternative?—A qualitative study | Social and parental factors | [ ]
Russell, C. G., and Russell, A. | 2018 | Biological and psychosocial processes in the development of children’s appetitive traits: Insights from developmental theory and research | Biological, social and psychological factors | [ ]
Suder, A., and Chrzanowska, M. | 2015 | Risk factors for abdominal obesity in children and adolescents from Cracow, Poland (1983–2000) | Biological, social and psychological factors | [ ]
Van De Beek, C., Hoek, A., Painter, R. C., Gemke, R. J., Van Poppel, M. N., Geelen, A., Groen, H., Mol, B. W., and Roseboom, T. J. | 2018 | Women, their offspring and improving lifestyle for better cardiovascular health of both (WOMB project): A protocol of the follow-up of a multicenter randomized controlled trial | Biological, social and parental factors | [ ]
Vedanthan, R., Bansilal, S., Soto, A. V., Kovacic, J. C., Latina, J., Jaslow, R., Santana, M., Gorga, E., Kasarskis, A., and Hajjar, R. | 2016 | Family-based approaches to cardiovascular health promotion | Biological and parental factors | [ ]
Zhang, Y., Hurtado, G. A., Flores, R., Alba-Meraz, A., and Reicks, M. | 2018 | Latino fathers’ perspectives and parenting practices regarding eating, physical activity, and screen time behaviors of early adolescent children: Focus group findings | Parental factors | [ ]

Share and Cite

Karakitsiou, G.; Plakias, S.; Christidi, F.; Tsiakiri, A. Unraveling Childhood Obesity: A Grounded Theory Approach to Psychological, Social, Parental, and Biological Factors. Children 2024 , 11 , 1048. https://doi.org/10.3390/children11091048


Framing Collective Moral Responsibility for Climate Change: A Longitudinal Frame Analysis of Energy Company Climate Reporting

  • Original Paper
  • Open access
  • Published: 26 August 2024

  • Melanie Feeney
  • Jarrod Ormiston
  • Wim Gijselaers
  • Pim Martens
  • Therese Grohnert

Responding to climate change and avoiding irreversible climate tipping points requires radical and drastic action by 2030. This urgency raises serious questions for energy companies, one of the world’s largest emitters of greenhouse gases (GHGs), in terms of how they frame, and reframe, their response to climate change. Despite the majority of energy companies releasing ambitious statements declaring net zero carbon ambitions, this ‘talk’ has not been matched with sufficient urgency or substantive climate action. To unpack the disconnect between talk and action, this paper draws on the literature on framing, organisational hypocrisy, and collective moral responsibility. We conduct a longitudinal qualitative content analysis of the framing of climate change used by the ten largest European investor-owned energy companies and the actions they have taken to shift their business practices. Our findings reveal three main categories of energy companies: (i) deflecting, (ii) stagnating, and (iii) evolving. We show key differences in the relationship between framing and action over time for each category, revealing how deflecting companies have larger and persistent gaps between green talk and concrete action and how stagnating companies are delaying action despite increased green talk, while evolving companies exhibit a closer link between talk and action that tends to be realised over time. Our analysis reveals how competing approaches to framing collective moral responsibility help understand the trajectories of talk and action across the different categories of energy companies. This research makes several contributions to the literature on organisational hypocrisy and collective moral responsibility in the context of climate change. Our analysis highlights the complex relationship between collective moral responsibility, organisational hypocrisy and climate action, revealing how different collective framings—diffuse, teleological, or agential—can both enable and offset substantive climate action. The study also enriches our understanding of the performative nature of collective moral responsibility by examining its temporal dimensions and showing how an agential, backward-looking focus is associated with more meaningful climate action.

Introduction

The energy sector remains one of the world’s largest emitters of greenhouse gases (GHGs), with electricity and heat production responsible for almost half of the world’s GHGs in 2014 (Ritchie & Roser, 2019 ; United Nations, 2019 ). In 2020, two of the world’s largest energy providers, Royal Dutch Shell and British Petroleum (BP), released statements declaring their net zero carbon ambitions by 2050 (Shell Ambrose, 2020 ; Global, 2020 ). However, recent studies suggest that responding to climate change and avoiding irreversible climate tipping points requires drastic action by 2030, not 2050 (Liu et al., 2019 ; Steffen et al., 2015a , 2015b ). Acknowledging this urgency, the European Union recently committed to fighting climate change through higher renewable energy targets by 2030, aiming to source 42.5% of its energy from renewable sources such as wind and solar (Reuters, 2023 ).

This increased urgency raises challenging questions for energy companies in terms of how they frame, and reframe, their response to climate change (Campbell et al., 2019 ; Cornelissen & Werner, 2014 ), and whether this framing can be matched by the radical transformation needed in their business models and actions. To understand this transformation, this paper explores how energy companies are framing their responses to climate change and their actions to shift their business practices. In doing so, we engage with the growing field of scholarship exploring the role of framing in justifying climate change responses and legitimising sustainability strategies (Hahn & Lulfs, 2014 ; Metze, 2018 ; Nyberg & Wright, 2006 ; Nyberg et al., 2018 ; Wright & Nyberg, 2017 ).

Despite the increase in talk about sustainable action and engagement with frames to discuss responses to climate change, we as a society are trending towards overstepping multiple environmental limits and planetary boundaries (O’Neill et al., 2018 ; Steffen et al., 2015a , 2015b ). This dissonance is matched in corporate sustainability reporting, where the expansion of sustainability talk has not been matched with sufficient sustainability action (Cho et al., 2015 ; Higgins et al., 2020 ; Milne & Gray, 2013 ). Understanding this disconnect between talk and action is crucial in ensuring the energy sector moves beyond discursive strategies to seek legitimacy, towards genuine climate action that will contribute to a just transition (Banerjee, 2012 ; Christensen et al., 2021 ). To unpack this potential disconnection between frames, decisions and action, we draw on the literature on organisational hypocrisy and collective moral responsibility.

Organisational hypocrisy aims to explain the discrepancies between the talk and actions of companies (Brunsson, 2002 ; Wagner et al., 2009 ). In recent years, there has been a growing body of literature that explores hypocrisy in corporate sustainability reporting by comparing symbolic approaches (talk) with substantive approaches (action) (Hyatt & Berente, 2017 ; Rodrigue et al., 2013 ; Schons & Steinmeier, 2016 ). Through a critical lens, the hypocritical gap between symbolic talk and substantive action can be viewed as a duplicitous attempt to conceal unsustainable practices or hide a lack of substantive action (Cho & Patton, 2007 ; Milne & Gray, 2013 ; Hyatt & Berente, 2017 ; Snelson-Powell et al., 2020 ). Alternatively, this hypocrisy may be viewed as inevitable as organisations attempt to juggle competing stakeholders' demands (Brunsson, 1986 , 1993 ; Higgins et al., 2020 ), and may be a signal for future substantive action (Clarkson, et al., 2008 ; Clune & O’Dwyer, 2020 ; Malsch, 2013 ).

To better understand the nature of organisational hypocrisy in energy company responses to climate change, we examine the role of collective moral responsibility in shaping the disconnect between talk and action. Moral responsibility refers to the blameworthiness or praiseworthiness for a particular situation (Bovens, 1998 ). We engage collectivist perspectives of moral responsibility, arguing that organisations may have a collective responsibility to respond to, or bring about, a particular state of affairs (Mellema, 1997 , 2003 ; Soares, 2003 ; Tamminga & Hindriks, 2020 ). In unpacking the role of collective moral responsibility in shaping climate action, we zoom in on the temporal nature of moral responsibility, differentiating between both backward-looking (reactive) and forward-looking (prospective) responsibility (Gilbert, 2006a , 2006b ; Sanbhu, 2012 ; Van de Poel, 2011 ). Backward-looking moral responsibility involves taking on blame for immoral past actions, while forward-looking moral responsibility refers to a sense of obligation to avoid future immoral actions (Sanbhu, 2012 ). We also draw on the work of Collins ( 2019 ), which explores a more nuanced understanding of the ‘collective’, differentiating between diffuse collectives that are loosely described groups of agents, such as ‘society’ or ‘the private sector’; teleological collectives that are responsive towards each other and act towards common goals, such as ‘the energy sector’; and agential collectives that have well-defined collective-level decision-making procedures, such as a specific company, partnership or alliance.

To shed light on the disconnects between talk and action and the role of collective moral responsibility, we conduct a qualitative content analysis of the framing used by Europe’s ten largest investor-owned energy companies over a ten-year period. We review 111 sustainability reports from these energy companies between 2010 and 2019 to understand the evolution of their framing of climate change and the actions they have taken over time. The analysis is guided by the following overarching research questions: “How have energy companies framed their responses to climate change over time?”, “How does their framing relate to climate action?”, and “What is the relationship between different framings of collective moral responsibility and the nature of climate change talk and action?”.

Our analysis of framing and action over time reveals three main categories of energy companies: (i) deflecting, (ii) stagnating, and (iii) evolving. Deflecting companies continue to engage in unsustainable business-as-usual practices despite offering some green rhetoric. Stagnating companies are making some progress but seem to be stalling and delaying more radical action despite increased sustainability talk. Evolving companies appear to be progressing towards a more sustainable future and questioning and rethinking their business models. We noticed key differences in the relationship between action and framing over time for each category, with evolving companies having a closer link between talk and action that tends to be realised over time, and deflecting companies having more significant and persistent gaps between green talk and concrete action.

The findings show how competing approaches to framing the nature of collective moral responsibility help to understand the trajectories of talk and action across the different categories of energy companies. As suggested by the data from our study of ten energy companies, deflecting firms seem to evoke a diffuse collective of society, deferring responsibility to other actors, including government and civil society, and framing their own moral responsibility in a more forward-looking, prospective way. The companies we classified as stagnating seem to shift from a diffuse collective before framing their role as part of a broader teleological collective of the energy sector, yet remain somewhat vague in terms of their own responsibility for climate action. The companies we classified as evolving seem to frame their role as an agential collective and acknowledge their own moral responsibility for causing or contributing to climate change. This backward-looking perspective on their moral responsibility appears to be shaping substantive action in the present.

By engaging with theories of collective moral responsibility, our paper contributes to the literature on business ethics and climate change in several ways. We contribute to the literature on business ethics, moral responsibility, and organisational hypocrisy by providing a nuanced understanding of the performative nature of collective moral responsibility (Soares, 2003 ; Tamminga & Hindriks, 2020 ). In doing so, we highlight the diverse ways in which conceptions of the collective as diffuse, teleological, or agential (Collins, 2019 ) are associated with different types of climate talk and action and different levels of organisational hypocrisy. Specifically, we show that agential collectives with a backward-looking sense of responsibility are more likely to engage in substantive action, while diffuse and teleological collectives tend to focus on symbolic talk. We contribute to the broader literature on framing (Cornelissen & Werner, 2014 ) and organisational hypocrisy (Brunsson, 2002 ) by unpacking the relationship between organisational framing of collective moral responsibility and the nature of organisational hypocrisy. We show how agential notions of the collective and backward-looking responsibility are associated with more substantive climate action. This insight extends prior research by highlighting the dynamic interplay between framing, moral responsibility and climate action. We also contribute to a temporal understanding of collective moral responsibility and organisational hypocrisy by adopting a temporal lens that reveals how the understanding of the collective and the direction of responsibility shift over time and how this relates to action and inaction on climate change (Brunsson, 1986 , 1993 , 2002 ; Cho et al., 2015 ). Our longitudinal analysis shows the ways in which these shifts are critical for substantive action. Finally, we contribute to practice by highlighting the shifts in collective moral responsibility associated with energy companies becoming more sustainable and authentically engaging in climate action.

Theoretical Background

Frames and Sustainability

Corporate responses to climate change, particularly within the energy sector, require drastic changes to a company’s strategy, operations, and often even its identity (Boons et al., 2013 ; Frandsen & Johansen, 2011 ). Transitioning from a company that has operated as a leader in energy production sourced from fossil fuels to a company that prioritises a carbon-neutral energy mix is a complicated process (Mori, 2021 ). This shift requires companies to rethink what technologies they invest in, the speed at which they make these changes, and how to ensure their workforce is on board and prepared for the change (Garavan & McGuire, 2010 ; Nisar et al., 2013 ). All of these require managers to make tough choices between long-term and short-term value (Slawinski & Bansal, 2015 ). This transformation requires companies to frame and reframe how they understand their role in terms of climate action. We thereby engage with literature on framing (Cornelissen & Werner, 2014 ) to explore how energy companies engage in meaning-making with regard to climate change and how this relates to their climate change responses and actions.

The construct of frame or framing was first introduced in the 1930s within the social sciences and has since gained popularity in a wide range of research traditions (Cornelissen & Werner, 2014 ), including cognitive psychology and behavioural economics (e.g., Kahneman et al., 1986 ), sociology and social movements (e.g., Fligstein & McAdam, 2011 ), political science (e.g., Barth & Bijsmans, 2018 ), and organisation and management studies (e.g., Gioia & Chittipeddi, 1991 ). According to Goffman ( 1974 ), no action or behaviour can be initiated without some form of framing, that is, making sense of what is going on. Frames are constructed based on past experiences and act as a point of reference for sense-making (Kahneman, 1984 ). Rather than viewing a frame as an isolated or static structure, framing is understood as an interactional and ongoing process of constructing meaning (Dewulf et al., 2009 ). Frames are, therefore, constantly updated and adjusted based on new experiences or information (Dewulf et al., 2009 ; Kahneman, 1984 ). In fact, Nyberg et al. ( 2016 ) argue that any theory of framing must contain time, which underpins our temporal analysis of energy company framing of climate change over a ten-year period.

Framing has been applied in research on cognition, sense-making and decision-making processes (Benner & Tripsas, 2012 ; Walsh, 1995 ; Weick, 1995 ), and in research on organised groups and organisations (Cornelissen & Werner, 2014 ). How an organisation frames its environment and where it sits within that environment is referred to as strategic framing (Gilbert, 2006a , 2006b , Kaplan, 2008 ). A strategic frame refers to “a set of cause-effect understandings about industry boundaries, competitive rules, and strategy-environment relationships available to a group of related firms in an industry” (Nadkarni & Narayanan, 2007 , p.689). Strategic frames, and subsequent decision-making processes, can therefore be influenced by a variety of actors, e.g., shareholders and other stakeholders, or external forces e.g., changing markets, industry trends or changing societal beliefs and values (Battilana et al., 2009 ; Gilbert, 2006a , 2006b ).

Traditionally the greatest forces of influence over corporate sustainability strategies have come from government legislation and regulation and changing market and industry trends (Boons et al., 2013 ; Brønn & Vidaver-Cohen, 2009 ; Christensen et al., 2021 ). Whilst these regulatory and market conditions still greatly influence corporate sustainability strategies, we are now also seeing increased pressure from social actors on companies to act ethically and responsibly (Banerjee, 2008 ; O’Brien et al., 2018 ; Porter & Kramer, 2011 ). As a result, the variety of stakeholder expectations that energy companies must consider, and the regulatory and market environments that they operate in, have become increasingly complex (Banerjee, 2008 ). With this growing complexity has come an increased interest from scholars in corporate sustainability framing and responses (Hahn et al., 2014 ).

Framing and sustainability responses in the energy sector have been a growing area of academic interest in recent years (Schlichting, 2013; Hahn et al., 2014). Studies have sought to understand the frames adopted in political conversations around specific energy technologies like fracking (Metze, 2018; Nyberg et al., 2020), or the framing of intertemporal tensions in oil companies' climate change responses (Slawinski & Bansal, 2015). These studies have found that how an organisation frames climate change has implications for the types of responses it enacts in the short and long term (Nyberg et al., 2020; Slawinski & Bansal, 2015). These studies further demonstrate the importance of unpacking energy company framing of climate change to understand current and future action and inaction.

Several theoretical and empirical articles have contributed to our understanding of energy company framing of sustainability and climate change in recent years (Wright & Nyberg, 2017; Hahn et al., 2014; Schlichting, 2013). In 2013, Schlichting published an article that looked at the ways different industry actors (including energy sector actors) had framed climate change from 1990 to 2010, their reasoning for adopting each frame, and their strategies for communicating frames. The study revealed dominant frames at three points in time across the two decades, starting with 'scientific uncertainty' from 1990 to the mid-1990s, when industry actors questioned the science around climate change. From 1997 to the early 2000s, companies used 'socioeconomic consequences' frames, where industry actors acknowledged the potential risks of climate change but drew attention to the costs to the company and consumers if they were to act in accordance with the Kyoto Protocol (passed in 1997). Finally, from the 2000s to 2010, companies adopted 'industrial leadership' frames, where industry actors acknowledged their role in climate change and saw technology as offering a win–win solution to remaining competitive while also responding to the threat of climate change. Whilst Schlichting ( 2013 ) contributes to our understanding of energy company framing of climate change, the article does not consider the specific actions or inactions that are related to each frame.

In a similar study, Wright and Nyberg ( 2017 ) looked at framing as one element of corporate responses to climate change from 2005 to 2015 and concluded that the dominant framing across all companies (including one oil and gas company) was 'business case' framing. Wright and Nyberg ( 2017 ) describe business case framing of climate change as companies conforming to short-term market conditions, and observed that over time, companies would regress toward traditional business concerns, i.e., profit maximisation. The authors offer some examples of how energy company framing aligns with actions in response to climate change, i.e., investment in renewable energy projects and greater attention given to potential regulatory changes. However, due to the diversity of companies included in the study, these examples are limited.

Finally, Hahn et al. ( 2014 ) identify the business case as a dominant frame in their review study of managers' responses to sustainability. However, the authors position business case frames on a continuum with 'paradoxical' frames at the opposing end. Paradoxical frames capture a more developed understanding of, and appreciation for, the tensions between the social, environmental, and economic aspects of sustainability, and are more aligned with more radical, albeit slow, responses to sustainability issues (Hahn et al., 2014). Given that the Hahn et al. ( 2014 ) article is a review paper, the focus is largely theoretical and does not specifically observe the relationship between frames and actions.

Our study builds on previous research by focusing specifically on energy companies and attempting to understand the relationship between frames and actions. Previous research has paid limited attention to the relationship between the climate change frames and actions adopted by energy companies (for example, Schlichting, 2013; Hahn et al., 2014). Understanding this relationship is important: while frames are often viewed as causal mechanisms for shaping decisions and action, broader research on sustainability reporting highlights pervasive disconnects between sustainability talk and action (Cho et al., 2015; Higgins et al., 2020; Hyatt & Berente, 2017; Rodrigue et al., 2013; Schons & Steinmeier, 2016). Our paper thereby builds on previous research by taking a more critical and nuanced stance, reviewing frames in sustainability reports while questioning the links between frames and action. In the following section, we introduce the literature on organisational hypocrisy (Brunsson, 2002; Wagner et al., 2009) to help unpack the potential for disconnects between symbolic talk and substantive action.

Symbolic Talk, Substantive Action, and Organisational Hypocrisy

Research on sustainability reporting suggests a persistent gap between talk and action in how corporations are responding to sustainability challenges (Cho et al., 2015 ; Higgins et al., 2020 ). Fassin and Buelens ( 2011 ) highlight this dissonance between sustainability rhetoric and actual business practices noting that the “idealism of corporate communication contrasts sharply with the reality of day-to-day business life” (pp. 586–587). To better understand the disconnect between sustainability talk and action in energy company responses to climate change, we engage with the literature on organisational hypocrisy. Organisational hypocrisy refers to the disconnect between talk and action (Brunsson, 2002 ; Wagner et al., 2009 ), as evidenced by “the distance between assertions and performance” (Fassin & Buelens, 2011 , p. 587). This literature on organisational hypocrisy is underpinned by early work in institutional theory that explores how organisations engage in myth and ceremony as they decouple talk from action in order to gain legitimacy (Bromley & Powell, 2012 ; Crilly et al., 2012 ; Meyer & Rowan, 1977 , Oliver, 1991 ). This research has tended to look at how talk is decoupled from action at particular moments in time, with insufficient exploration of the relationship between talk and action over time (Reinecke & Lawrence, 2023 ).

Significant literature on corporate sustainability and corporate social responsibility has explored organisational hypocrisy by comparing symbolic approaches (green talk) with substantive approaches (green action) (Hyatt & Berente, 2017 ; Rodrigue et al., 2013 ; Schons & Steinmeier, 2016 ). Substantive approaches involve meaningful ‘actions’ that shift practices to prioritise improved environmental performance (Hyatt & Berente, 2017 ; Sharma & Vredenburg, 1998 ). Ashforth and Gibbs ( 1990 ) define substantive approaches as those that involve “real, material changes in organizational goals, structures, and processes or socially institutionalized practices.” (p. 178). Substantive approaches thereby require tangible, observable shifts in organisational activities and resource use (Schons & Steinmeier, 2016 ).

Symbolic approaches refer to ‘talk’ that creates an appearance of commitment to sustainability without necessarily shifting organisational practices (Donia & Sirsly, 2016 ; Hyatt & Berente, 2017 ). Companies often engage in symbolic talk to enhance their reputation or to increase their legitimacy in the eyes of certain stakeholders (Ashforth & Gibbs, 1990 ; Elsbach & Sutton, 1992 ). Symbolic approaches can be viewed as ceremonial conformity to the demands of influential stakeholders without the actual changes to activities (Meyer & Rowan, 1977 ; Oliver, 1991 ). The goal of engaging in purely symbolic talk is often to deflect or conceal relatively poor environmental performance (Cho et al., 2010 ).

There are significant debates about the linkages between symbolic talk and substantive action and the implications of organisational hypocrisy for sustainability action over time (Rodrigue et al., 2013). From a critical perspective, the hypocritical gap between talk and action is viewed as an attempt to conceal continued poor environmental performance, recast unsustainable practices in a more positive light, or obscure a lack of substantive action (Cho & Patten, 2007; Milne & Gray, 2013; Hyatt & Berente, 2017). Some studies adopt a more positive lens on the disconnect between talk and action, suggesting that symbolic talk in the form of extensive environmental disclosures can be a signal of future substantive action on environmental issues (Clarkson et al., 2008; Clune & O'Dwyer, 2020; Malsch, 2013). This stream of research suggests the potential for hypocrisy to play an aspirational role, as discrepancies between talk and action may serve to stimulate improvements in sustainability performance over time, even when companies do not meet their aspirations (Christensen et al., 2013).

Research on organisational hypocrisy also reveals competing perspectives on the nature of the intentionality and duplicity associated with the disconnect between talk and action. Organisational hypocrisy is often used to describe situations in which companies have intentionally presented themselves in a way that does not reflect the underlying reality (Higgins et al., 2020; Laufer, 2003). This form of hypocrisy is viewed as duplicitous, where the intention is to deceive certain parties (Snelson-Powell et al., 2020). This duplicitous form of hypocrisy is echoed in the literature on greenwashing and other forms of unethical management practices (Delmas & Burbano, 2011; Laufer, 2003; Lyon & Montgomery, 2015). An alternative perspective views organisational hypocrisy as an inadvertent and inevitable response for organisations attempting to juggle competing demands and expectations in their broader environment (Higgins et al., 2020). Through this lens, organisations might construct conflicting ideologies and hypocritical talk and decisions in order to garner support and legitimacy in the face of incompatible demands (Brunsson, 1986). In this sense, organisations might engage in hypocrisy in an attempt to isolate competing stakeholder ideas and pressures from action, with actions that are difficult to justify being compensated for by talk in the opposite direction (Brunsson, 1993).

To unpack the evolution of talk and action over time, we draw on Dyllick and Muff’s ( 2016 ) article that introduces a typology of business sustainability actions and responses. The article provides examples of common sustainability-related strategies, the actions that support these strategies, and four levels of business sustainability, i.e., business-as-usual, sustainability 1.0, sustainability 2.0, and sustainability 3.0. We detail how the Dyllick and Muff ( 2016 ) framework was used as a starting point for analysing the frames and actions of energy company sustainability reports in the methods section.

To better understand the nature of, and reasons for, organisational hypocrisy in energy company responses to climate change, we now turn to the literature on collective moral responsibility as a lens to explore disconnects between talk and action.

Collective Moral Responsibility

To understand the relationship between climate change talk and action over time, we engage with the literature on collective moral responsibility. Moral responsibility refers to the blameworthiness or praiseworthiness for a certain state of affairs (Bovens, 1998). For moral responsibility (blame or praise) to be ascribed to an agent, the agent needs to have autonomy, intentionality, and contextual knowledge, and there needs to be a direct or indirect causal connection between the agent and the outcome (Constantinescu & Kaptein, 2015).

There are multiple philosophical and ethical debates between collectivist and individualist approaches to understanding moral responsibility (Miller & Makela, 2005; Soares, 2003). In this paper, we align with collectivist approaches to moral responsibility, which argue that a collective may have a responsibility to bring about a certain state of affairs and that, while no individual might be individually responsible, each has an obligation as a member of the collective (Mellema, 1997, 2003; Tamminga & Hindriks, 2020). This collective view has been adopted by business ethics scholars, who argue that this broader collective perspective on moral responsibility is needed to ensure corporations and organisations take into consideration the needs and interests of society (Soares, 2003). Through this lens, responsibility for a situation can be ascribed to the corporation, to the individual members, or to both (Constantinescu & Kaptein, 2015). Corporations should thereby be viewed as intentional actors capable of responding to internal and external challenges (Soares, 2003).

Moral responsibility can be both backward- and forward-looking (Gilbert, 2006a, 2006b; Sanbhu, 2012; Van de Poel, 2011). Forward-looking moral responsibility is concerned with obligations to prevent future immoral actions, whereas backward-looking moral responsibility is concerned with the blameworthiness of immoral actions in the past (Sanbhu, 2012). Gilbert (2006a, 2006b) describes the related yet distinct nature of backward-looking and forward-looking moral responsibility as follows: "Though we are not morally responsible for what happened, we are morally responsible for ameliorating its effects" (p. 94). Van de Poel (2011) outlines five normative notions of moral responsibility, three of which are backward-looking (accountability, blameworthiness and liability) and two of which are forward-looking (responsibility as virtue and as moral obligation). In this sense, backward-looking responsibility involves seeing oneself as accountable or to blame for past actions, whereas forward-looking responsibility is associated with future actions and involves seeing "to it that something is the case" rather than taking the blame for actions in the past.

Prior research is often ambiguous about the nature of the collective when exploring moral responsibility, and often uses backward-looking and forward-looking responsibility interchangeably. To provide a more nuanced perspective on the role of collective moral responsibility in shaping responses to climate change in the energy sector, we draw on the work of Collins (2019), which encourages a more nuanced understanding of the 'collective' and the temporal nature of moral responsibility. Collins (2019) suggests three forms of collective: diffuse, teleological and agential. As explained in the table below, diffuse collectives are loosely described groups of agents, such as 'society', 'humanity' or 'the private sector'; teleological collectives are groups whose members are responsive to each other and act towards common goals, such as 'the fossil fuel lobby' or 'the energy sector'; and agential collectives have well-defined collective-level decision-making procedures, such as a specific company, partnership or alliance. Collins (2019) also differentiates between two forms of moral responsibility: backward-looking or reactive, and forward-looking or prospective. These perspectives help to understand whether collectives are taking blame or praise for past actions or assuming future obligations. Table 1 provides a description and example of each of these forms of collective and moral responsibility.

Methodology

We conducted a qualitative content analysis of ten European energy companies’ sustainability reports to determine how they are framing their responses to climate change and the related actions they have taken to shift their business practices. Qualitative content analysis is “a research method for the subjective interpretation of the content of text data through the systematic classification process of coding and identifying themes or patterns” (Hsieh and Shannon, 2005 , p. 1278). Qualitative content analysis was chosen as it allows for a more contextual and circumstantial understanding of texts communicated by companies, rather than quantitative approaches that focus on the frequency of the texts or words used (Mayring, 2000 , 2010 ). Qualitative content analysis has been a widely used approach for analysing corporate sustainability reports (see for example, Boiral et al., 2019 ; Boiral, 2016 ; Hahn & Lulfs, 2014 ) and is viewed as an important method in business ethics research to understand talk and action (Cowton, 1998 ; Lock & Seele, 2015 ). In the following section, we detail the case selection, materials, and methods of content analysis that informed our findings.

Case Selection and Material

The sample consisted of sustainability reports (or corporate social responsibility (CSR) reports or environmental reports) from the ten largest investor-owned European energy companies (see Table 2). The ten energy companies were selected from the S&P Global Platts Top 250, which ranks companies by their "asset worth, revenues, profits and return on invested capital" (S&P Global, 2020, p. 3). We chose companies specifically within the European Union (at the time of reporting) to ensure the companies shared the same regulatory environment. Consequently, we excluded the Norwegian company Equinor ASA from the case selection. Additionally, we chose to focus on investor-owned energy companies as many of the largest state-owned energy companies did not publicly list their sustainability data and reports. We note that this lack of data on state-owned companies requires further study given their substantial environmental impact. Table 2 lists the ten companies included in the analysis in order of where they ranked in the S&P Global list of energy companies. The table also shows the country in which they are headquartered and the number of reports included in the analysis.

For each of the ten selected companies, we then checked whether they were listed on relevant sustainability rankings. Three of the selected companies had been listed on the Carbon Majors database, a global list of companies responsible for the largest amounts of carbon and methane emitted into the atmosphere (Climate Accountability Institute, 2017): Royal Dutch Shell, Total and BP. Three were listed on the Global Corporate Knights index, an independently organised ranking of companies based on their sustainability performance (Scott, 2020): Enel, Iberdrola and Ørsted. The remaining four companies (E.ON, Eni, OMV and Repsol) were not listed on either ranking. This range in sustainability performance across the ten companies ensured a rich and diverse case selection for exploring our research questions.

We collected PDF versions of each company's publicly accessible sustainability reports from 2010 to 2019. We chose not to include reports from 2020 due to the potential impact of the COVID-19 pandemic on our analysis. In some instances, sustainability reports were not published for the full timeframe of interest; in these cases, company annual reports were analysed for climate change framing and actions. Similarly, several companies published multiple climate-related reports in the same year; for example, Eni published a 'Decarbonization report' and a 'Sustainability report' in 2017. To ensure that an accurate interpretation of each company's framing of climate change was captured, all available climate-related reports were included in the analysis. This resulted in a total of 111 reports. As the focus of the study was on climate change, we also excluded sections of the sustainability reports not relevant to climate change, specifically some elements of the 'social' pillar of sustainability (e.g., 'working with communities' and 'diversity'), as these issues relate more closely to employment and workforce matters than to core operations.

Data Analysis

We applied a combination of Mayring's ( 2014 ) step models of deductive and inductive approaches to analysing qualitative data. Our analysis has four main stages: (i) coding climate talk and action; (ii) comparing talk and action over time (i.e. organisational hypocrisy); (iii) coding framing of collective moral responsibility; and (iv) analysing linkages between organisational hypocrisy and collective moral responsibility.

Stage 1 – Coding talk and action

The first stage of analysis involved coding climate change talk and action in each report. A mix of climate talk and action was coded using Atlas.ti qualitative analysis software. We inductively coded talk and action regarding climate change which led to the emergence of the following dominant categories of codes: climate crisis, competitor mindset, external dialogue, governance, innovation and technology, policy and compliance, positioning, reporting, research and development, shared decision-making, strategy, sustainability goals, temporality, tension between actors, values prioritised.

We then separated talk from action in each report and categorised each according to the following four levels derived from the Dyllick and Muff ( 2016 ) Business Sustainability Typology (BST), providing an overall rating for both action and framing for each report (a minimal illustrative sketch of this rating scheme follows the level descriptions below).

0.0 – Business as usual: where companies prioritise financial outcomes and value creation for shareholders with limited focus on sustainability actions.

1.0 – Sustainability as risk management and compliance: where companies take some actions toward sustainability in response to pressure from external stakeholders, viewing sustainability actions as a form of risk management or compliance.

2.0 – Sustainability as multiple value creation: where companies begin attending to multiple forms of value creation (social/cultural, environmental, economic value) and develop defined goals and actions to address sustainability issues.

3.0 – Sustainable transformation: where companies aim to utilise their capabilities and expertise for the purpose of addressing pressing societal challenges such as climate change and enact actions to intentionally generate a positive impact on the world.

In some cases, we coded action or framing as between levels and thereby used 1.5 or 2.5 as the rating. Table 3 provides a description of each of those codes and representative quotes of both action and framing for each level of sustainability.
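To make the rating scheme concrete, the sketch below shows one way the BST-derived scale and a coded report could be represented. It is a minimal illustration only: the company name, year and ratings are hypothetical, and the snippet is not the coding tooling used in the study.

```python
from dataclasses import dataclass

# BST-derived levels used to rate both climate talk (framing) and climate action
BST_LEVELS = {
    0.0: "Business as usual",
    1.0: "Sustainability as risk management and compliance",
    2.0: "Sustainability as multiple value creation",
    3.0: "Sustainable transformation",
}
# Half-step ratings (1.5, 2.5) are allowed for reports sitting between levels
VALID_RATINGS = {0.0, 1.0, 1.5, 2.0, 2.5, 3.0}

@dataclass
class ReportRating:
    company: str
    year: int
    talk: float    # overall rating of climate talk/framing in the report
    action: float  # overall rating of substantive climate action in the report

    def __post_init__(self):
        if self.talk not in VALID_RATINGS or self.action not in VALID_RATINGS:
            raise ValueError("ratings must lie on the 0.0-3.0 scale in half steps")

    @property
    def hypocrisy_gap(self) -> float:
        # Organisational hypocrisy proxied as the distance of talk ahead of action
        return self.talk - self.action

# Hypothetical example: talk rated half a step ahead of action
example = ReportRating(company="CompanyX", year=2015, talk=2.0, action=1.5)
print(example.hypocrisy_gap)  # 0.5
```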

Stage 2 – Comparing talk and action over time (i.e. organisational hypocrisy)

Following our coding of climate talk and action according to the levels derived from the Dyllick and Muff ( 2016 ) typology, we then explored the relationship between talk and action for each company over the ten-year period. Plotting the shifts in climate talk and action over time allowed us to visualise the evolution of sustainable action from each energy company, as well as visualising the gap between talk and action (i.e. organisational hypocrisy). This analysis allowed us to ascertain different categories of energy companies.

Through this temporal analysis, we identified three categories of energy companies. The first category, Shell and BP, has the largest gaps between talk and action, especially at the beginning of the decade, and was the least progressed in terms of its level of sustainability, only reaching the Sustainability 1.0 level. The second category, Total, Eni, Enel, Repsol and OMV, has made some progress towards the Sustainability 2.0 level but at a relatively slow pace, with action remaining about half a step behind talk throughout the decade. The third category, Ørsted, E.ON and Iberdrola, has made the most progress towards the Sustainability 3.0 level, with framing more closely linked to action, and with action eventually matching up to the talk. The figures presented in the findings section visualise the development of climate talk and action over the decade for each of the companies and the combined development for each category.
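As a complement to the figures, the following sketch illustrates the kind of comparison performed in this stage: computing the yearly gap between talk and action ratings and grouping trajectories into rough categories. The trajectories, company labels and threshold values are hypothetical assumptions chosen for illustration; they do not reproduce the study's data or its exact categorisation procedure.

```python
from statistics import mean

# Hypothetical talk/action rating trajectories (one (talk, action) pair per year)
trajectories = {
    "HypotheticalDeflector": [(1.0, 0.0), (1.0, 0.0), (1.5, 0.5), (1.5, 1.0)],
    "HypotheticalStagnator": [(1.5, 1.0), (2.0, 1.5), (2.0, 1.5), (2.0, 1.5)],
    "HypotheticalEvolver":   [(2.0, 1.5), (2.5, 2.0), (3.0, 2.5), (3.0, 3.0)],
}

def categorise(pairs):
    """Heuristic grouping loosely mirroring the three categories described above."""
    gaps = [talk - action for talk, action in pairs]
    final_action = pairs[-1][1]
    if final_action >= 2.5 and gaps[-1] <= 0.5:
        return "evolving"    # action eventually catches up with ambitious talk
    if final_action >= 1.5:
        return "stagnating"  # some progress, action about half a step behind talk
    return "deflecting"      # large persistent gap, little substantive action

for company, pairs in trajectories.items():
    avg_gap = mean(talk - action for talk, action in pairs)
    print(f"{company}: {categorise(pairs)} (mean talk-action gap = {avg_gap:.2f})")
```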

Stage 3 – Coding framing of collective moral responsibility

In the next phase of analysis, we sought to understand whether overarching approaches to framing climate change were shaping the nature of climate talk and action. Through a process of connecting, merging and subdividing codes, we unpacked four overarching frames for conceptualising the role of energy companies in addressing climate change: 'moral responsibility', 'business case', 'technological' and 'disclosure'. We observed that energy companies did not simply adopt one frame but often adopted multiple frames to communicate and motivate their climate change responses. We observed that the 'moral responsibility' framing was the most relevant in differentiating between the three categories. We thereby decided to recode our data to draw out a more nuanced understanding of how energy companies were framing the nature of moral responsibility.

In this phase of the analysis, we deductively coded the reports drawing on Collins’s ( 2019 ) theorisation of collective moral responsibility. We coded each report for three forms of the ‘collective’: diffuse, teleological, and agential. We also coded two forms of ‘moral responsibility’: backward-looking/reactive and forward-looking/prospective. The following table provides a description and representative quotes for each of these forms of collective and moral responsibility (Table  4 ).

Stage 4 – Linkages between organisational hypocrisy and collective moral responsibility

In the final stage of analysis, we explored the linkages between the framing of collective moral responsibility and the nature of talk and action by comparing the framing of the collective and the temporal nature of moral responsibility for each of the three categories. We observed that the first category tended to refer to a more diffuse collective and frame moral responsibility in a forward-looking manner. We named this category ‘Deflecting’ as they appear to shift blame and responsibility towards a diffuse collective of government, industry and broader societal actors which results in a larger gap between climate talk and action.

Alternatively, the third category tended to refer to a more agential collective over time as they began to take on blame for causing or contributing to climate change through a more backward-looking understanding of moral responsibility. We named this category ‘Evolving’ given that both their climate talk and action tended to improve over time as they adopted a more agential responsibility.

In analysing the remaining category, which sat between the deflecting and evolving groups, we noted that they had shifted more towards a teleological collective over time as they began to acknowledge the role of the energy sector in addressing climate change. This category was less specific about their own responsibility compared to the evolving group, and appeared to have stalled somewhat in their climate action over time. We named this category ‘Stagnating’, as a reflection of their slow movement towards more sustainable climate action.

Table 5 provides an overview of the three categories, the links between talk and action for each category, and the nature of collective moral responsibility for each category.

Findings

Our analysis of actions and framing over time revealed three main categories of energy companies: (i) Deflecting, (ii) Stagnating, and (iii) Evolving.

Deflecting companies largely maintain business as usual despite some green talk/rhetoric. These companies focus on staying compliant with regulations and ensuring awareness of changing societal standards and expectations. Deflecting companies are slow to adopt targets for emissions and energy intensity, choosing to focus more on targets for investment.

Stagnating companies seem to be stalling and delaying more radical action. While they tend to set clear emissions and energy intensity targets, progress in meeting these targets is slow, largely because they focus on the easier wins that come from saving costs and reducing waste through efficiencies. Stagnating companies also tend to invest in new renewable technologies; however, this is often framed as an opportunity to diversify their portfolio rather than radically transform their business models away from fossil fuels.

Evolving companies are progressing towards a more sustainable future and rethinking their business models. Evolving companies set bold emissions and energy intensity targets that they often meet and exceed; this is largely achieved by combining efficiency technologies with going beyond investment in renewables as mere portfolio diversification and moving their entire business strategy away from fossil fuels.

We noticed meaningful differences in the relationship between action and framing over time for each category, with evolving companies having a closer link between talk and action and deflecting companies having larger gaps between green talk and concrete action.

We also observed competing approaches to framing the nature of collective moral responsibility. Deflecting firms seem to evoke a diffuse collective and frame moral responsibility in a more forward-looking, prospective way. Stagnating companies seem to engage in teleological moral responsibility over time but are somewhat vague. Evolving companies seem to frame their role as an agential collective and acknowledge a backward-looking moral responsibility for climate change.

Deflecting Companies

Deflecting firms like Shell and BP appear to maintain a largely business-as-usual approach to their activities despite increasing green talk and rhetoric. Figure  1 below illustrates how these companies enacted limited shifts in their climate actions over the decade despite increasing rhetoric. The trajectories highlight that these companies had the largest gaps between talk and action, especially earlier in the decade.

Fig. 1: The relationship between talk and action for deflecting companies

As shown in the figures, deflecting companies exhibited the largest gaps between climate talk and action and evidenced the least progression in terms of substantive climate action. Deflecting companies tended to prioritise the growth of their existing fossil fuel-reliant business models over climate outcomes. Over the decade, they appear to justify their continued use of fossil fuels by the need to provide reliable energy to a growing global population of consumers. They use notions of 'security of supply' and 'energy for all' to position themselves as the solution to the growing demand for energy in developing and emerging economies, and argue that renewable energies are not reliable or abundant enough to achieve this:

We have an important role to play in finding much needed resources of oil and gas to meet the growing energy demand. (BP, 2013 Report)

The Arctic could be essential to meeting growing demand for energy in the future. It holds as much as 30% of the world's undiscovered natural gas and around 13% of its yet-to-find oil, according to the U.S. Geological Survey. (Shell, 2010 Report)

The world needs to produce enough energy to keep economies growing, while reducing the impact of energy use on a planet threatened by climate change. Shell works to help meet rising energy demand in a responsible way. That means operating safely, minimising our impact on the environment and building trust with the communities who are our neighbours (Shell, 2012 Report)

As illustrated in the following quotes, these energy companies often preface their climate commitments by first demonstrating the ways in which they will maximise shareholder returns in future energy scenarios, with climate change presented as an opportunity to increase profits or as a threat that must be dealt with to protect future profits. In these reports early in the decade, we see limited accountability or blame taken on by these companies.

BP's objective is to create value for shareholders by helping to meet the world's growing energy needs safely and responsibly (BP, 2011 Report)

We are taking steps to prepare for the potential physical impacts of climate change on our existing and future operations. Projects implementing our environmental and social practices are required to assess the potential impacts to the project from the changing climate and manage any significant impacts identified. (BP, 2011 Report)

The actions of these deflecting companies highlight their support for a gas-led transition, presenting gas as a cleaner fossil fuel than alternatives like coal and oil. However, energy companies also use advancements in fossil fuel technologies to position other, more polluting, fossil fuels as 'clean'.

Our approach to helping to tackle global CO2 emissions focuses on four main areas: producing more natural gas, helping to develop carbon capture and storage, producing low-carbon biofuel and working to improve energy efficiency in our operations (Shell, 2011 Report)

We believe that, to meet global climate goals, the world should prioritize: Reducing emissions rather than promoting any one fuel as the answer. The world will need all forms of energy for a long time to come, so we need to make all fuels cleaner. (BP, 2017 Report)

Overall, it is clear from the way that these deflecting companies talk throughout the decade that they are not questioning their underlying business model, and not engaging in any forms of reactive or backward-looking responsibility.

We are producing almost as much cleaner-burning natural gas as oil, producing low-carbon biofuel, helping to develop carbon capture and storage (CCS) technologies, and putting in place steps to improve our energy efficiency. (Shell, 2012 report)

The disconnect between talk and action for deflecting companies was evidenced by the dissonance between their framing around meeting the needs of society and their actions, which remained focused on business-as-usual practices. For example, both Shell and BP spoke of their desire to focus on the needs of society:

In 2017, we announced our ambition to cut the net carbon footprint of the energy products we provide by around half by 2050 in step with society's drive to align with the goals of the Paris Agreement. (Shell, 2017 report)

Today's challenge is to manage and meet growing demand for secure, affordable energy while addressing climate change and other environmental and social issues. (BP, 2012 report)

Yet both companies make clear that they will only act where it makes commercial sense, or note that they will continue with business as usual until climate inaction presents a financial risk:

Shell is a willing and able player in this transition. We will play our role where it makes commercial sense, in oil and gas, as well as in low-carbon technologies and renewable energy sources. (Shell, 2017 report)

Even under the International Energy Agency's most ambitious climate policy scenario (the 450 scenario a), oil and gas would still make up 50% of the energy mix in 2030...This is one reason why BP's portfolio includes oil sands, shale gas, deepwater oil and gas production, biofuels and wind. (BP, 2012 report)

For deflecting companies, the distance between talk and action is maintained by talking about what they want to do rather than reporting substantive action:

We want to help the world reach net zero and improve people’s lives and can only do this by being a safe, focused, responsible, well-governed and transparent organization. (BP, 2019 report)

The distance is also maintained by leading reports with 'cherry-picked' data that presents an incomplete story of their actions. For example, in the quote below, Shell draws attention to their success in improving energy intensity across their operations, despite the fact that their overall direct emissions increased in the same year. These deflections can be seen as attempts by Shell to avoid blame or accountability for past actions.

In 2014, we continued to improve our energy intensity (the amount of energy consumed for every unit of output). This is the result of work within our operations to improve the reliability of equipment and undertake energy efficiency projects. (Shell, 2014 report)

Our analysis suggests that deflecting companies seem to evoke a diffuse collective and frame moral responsibility in a more forward-looking, prospective way. The following quotes capture their framing of the diffuse collective of ‘businesses, governments and civil society’ and ‘society as a whole’ when discussing who is responsible for climate action:

Tackling climate change remains urgent and requires action by governments, industry and consumers. (Shell, 2010 report)

Climate change is a major global challenge—one that will require the efforts of governments, industry and individuals (BP, 2010 report)

Governments and civil society must work together to overcome the challenges of climate change and the energy-water-food stresses. We are encouraging this collaboration. (Shell, 2012 report)

These deflecting firms explicitly do not take responsibility as an agential collective, deferring to the broader diffuse notions of collective responsibility:

The scale of the global challenges that the world faces is too great for one company, or one sector, to resolve. (Shell, 2013 report)

No one company or sector alone can deliver a low-carbon future. Everyone, from consumers to corporations to governments, needs to take responsibility. (BP, 2017 report)

The focus on the role of the diffuse collective remains at the end of the decade, despite the acknowledgement of the increased urgency:

In 2019, demands for urgent action on climate change grew ever louder. All of society, from consumers, to businesses, to governments, recognised the need to accelerate global efforts to reduce greenhouse gas emissions. (Shell, 2019 report)

Of course, the task of tackling climate change is bigger than any single company. Everyone on the planet, from consumers, to businesses, to governments, must play their part in reducing greenhouse gas emissions. Everyone must work together. (Shell, 2019 report)

A shared challenge. To meet the Paris goals, we believe the world must take strong action on a range of fronts (BP, 2018 Report)

Overall, we observed deflecting companies’ tendency to talk about the climate action they will take in the future, with a tendency to talk about what they want to do, rather than what they have been doing. Moral responsibility is thereby considered in a forward-looking, prospective sense:

In 2017, we announced our ambition to cut the net carbon footprint of the energy products we provide by around half by 2050 in step with society's drive to align with the goals of the Paris Agreement. (Shell, 2017 report)

We have set out our strategy for the coming decades, integrating our ambition to be a safe, strong, successful business with our aspiration to be a good corporate citizen and part of the solution to climate change. (BP, 2016 report)

We want to help the world reach net zero and improve people's lives and can only do this by being a safe, focused, responsible, well-governed and transparent organization. (BP, 2019 report)

Stagnating Companies

The stagnating companies, represented in our study by Total, Eni, Enel, Repsol, and OMV, appear to be somewhat stalled in their attempts to enact more radical sustainability action. As shown in Fig. 2 below, despite early aspirations at the beginning of the decade, these stagnating companies were relatively slow in shifting their activities over the decade.

Fig. 2: The relationship between talk and action for stagnating companies

While these companies are setting clear emissions and energy intensity targets, their progress toward meeting these targets is relatively slow, largely because they focus on the easier wins that come from saving costs and reducing waste through efficiencies. For example, in the quote below, Total acknowledges the commitments made as part of the Paris Climate Agreement's goal of remaining within 2 °C of global temperature increase from pre-industrial levels, but aims to meet them by making oil and gas more efficient rather than shifting its business model away from fossil fuels.

Under the 2 °C scenario, oil and gas will still make up almost 50% of the primary energy mix at that time. So yes, of course, we will still be an oil and gas major, meeting this demand. But our ambition is to put our talent to work to become the leader in responsible oil and gas, while also ramping up renewables. (Total, 2016 report)

Similarly, Eni draws attention to the reductions in GHG emissions they have achieved in their activities since 2010, while maintaining their conventional asset portfolio. Rather than signalling a shift away from a fossil fuel-based business model, they instead focus future reductions on increasing energy efficiencies. Similar to the deflecting companies, these stagnating companies appear to take limited blame or accountability for how their actions might have shaped the current state of the climate.

Our organic growth is based on a conventional asset portfolio. Since 2010, we have reduced our GHG emissions by 28%. In the future, we aim at a further reduction of 43% in our upstream emissions index, by decreasing flaring and fugitive methane emissions and increasing energy efficiency. (Eni, 2015 report)

Stagnating companies also tend to invest in new renewable technologies; however, this is often framed as an opportunity to diversify their portfolio rather than radically transform their business models away from fossil fuels.

Our ambition is summed up by the motto “20% in 20 years.” We want to make low-carbon businesses a genuine and profitable growth driver accounting for around 20% of our portfolio in 20 years’ time. (TOTAL, 2016 report)

The gap between talk and action for stagnating companies is less pronounced than for deflecting companies, but overall action tends to remain about half a step behind talk. For example, the quote below from Repsol shows that the company has clearly defined targets around reductions in CO2 emissions, actions that could be aligned with sustainability 2.0 framing:

At Repsol, we are committed to the fight against climate change, which is reflected in the company's new Strategic Plan 2016-2020. In this sense, we have set a goal to reduce CO2 emissions by 22% over the 2011-2020 period when compared to 2010, and currently we have already reduced emissions by more than 15%. (Repsol, 2015 report)

Whilst the company states that it is committed to fighting climate change and on track to meet its defined emissions targets, having already reduced emissions by more than 15% against its 22% target, it was also found to be taking actions that contradict these claims: the acquisition of Talisman Energy, a large oil and gas company, increased its annual emissions by 50%. This provides one example of how their actions are not in line with their framing of climate change:

Direct emissions of CO2 equivalent during 2015 were 21 million tons, 50% greater than the previous year due to the inclusion of the emissions from new assets in exploration and production acquired from Talisman. All other business emissions remain at values comparable to 2014. (Repsol, 2015 report)

Similar disconnects were observed at OMV where the aspiration for reducing their carbon footprint is at odds with their actions focused on exploring new approaches to oil and gas. These future-focused targets highlight how moral responsibility is viewed in a prospective manner.

We have pledged to reduce the carbon emissions of our operations, as well as the carbon footprint of our product portfolio in order to make a significant contribution to climate protection. (OMV, 2019 report)

To realize its mission of providing energy for a better life, OMV is committed to exploring the full potential of oil and gas at its best by following a responsible approach in producing, processing, and marketing oil and gas and petrochemical products. (OMV, 2019 report)

In analysing the relationship between talk, action, and framings of collective moral responsibility for stagnating companies, we noticed that they often engage in teleological moral responsibility, where they vaguely express responsibility at an industry or sector level for increasing GHG emissions. Despite adopting a less diffuse lens on the collective compared to deflecting companies, these stagnating companies tend to frame responsibility prospectively, where they focus on their role in the future of contributing to climate change solutions.

The quotes below from Eni illustrate their framing of responsibility to a teleological collective at an industry level. We note that they still situate this responsibility within the context of other large companies rather than fully taking responsibility for the industry’s role in contributing to climate change.

This is particularly significant given that the industry is responsible for 40% of all greenhouse gas emissions by companies listed in the Global 500 Index, which groups together the top 500 companies worldwide by revenue. (Eni, 2012 report)

There is no doubt that much of the economic growth the world has seen over the past 100 years has been achieved thanks to the discovery and use of fossil fuels. For that, they deserve to be thanked. However, it is now abundantly clear that we can no longer continue to use fossil fuels. (Eni, 2017 report)

Similarly, Total assigns responsibility to a teleological collective of high-emitting industry actors, which includes power generation, and engages in prospective responsibility by claiming that they are charged with realising the energy transition. Rather than taking responsibility for contributing to climate change, Total instead focuses on the potential implications that climate change could have on their operations in the future:

The sectors most responsible for emissions in the EU (i.e., power generation, industry, transport, buildings and construction, as well as agriculture) are charged with making the transition to a low-carbon economy over the coming decades, and these issues could affect TOTAL’s operations in the future. (Total, 2014, report)

In another example, the quote below shows OMV taking some vague accountability for the impact of its operations on the environment and outlining the broad areas in which it attempts to minimise that impact. These vague comments seem to fall short of an agential view of moral responsibility.

Due to the nature of our operations, we have an impact on the environment. We strive to minimize that impact at all times, particularly in the areas of spills, energy efficiency, greenhouse gas (GHG) emissions, water and waste management. (OMV, 2016 report)

Evolving Companies

The evolving companies, represented in our study by Ørsted, E.ON, and Iberdrola, are not only investing in renewables to diversify their portfolios but are moving their entire business strategy away from fossil fuels. As illustrated in Fig. 3, their framing still tends to be ahead of action, with actions eventually catching up.

Fig. 3: The relationship between talk and action for evolving companies

Evolving companies often describe climate change as requiring radical transformation of business models and the energy sector and provide examples of how they are challenging, questioning and rethinking their business model on the path to more sustainable action. These actions include technological advancements to decarbonise the economy, reduce CO2 emissions and combat climate change, for example, battery storage, localisation of the grid and electric vehicles. The following quotes provide evidence of the substantive actions evolving companies are undertaking as they transform their business activities, which demonstrate an underlying appreciation of their role as agents and a sense of accountability for past actions.

By the end of 2019, we had realised an 86% carbon reduction since 2006, and 86% of the energy we generated came from renewable sources. In just ten years, we met the transformation target we defined for 2040… We had installed 9.9GW renewable capacity, enough to power more than 15 million people. We had reduced our coal consumption by 91%, and 96% of the wooden biomass we sourced was certified sustainable biomass. (Ørsted, 2019 Report)

Iberdrola has proposed the shut-down of all of its coal plants. The company's CO2 emissions are already 70% less than the average for the European electricity sector (Iberdrola, 2017)

For these evolving companies, framing tends to eventually align with action. Evolving companies tend to go beyond what is required of them by law and set their own ambitions for achieving climate outcomes that exceed regulatory expectations. For example, in 2009 Ørsted set themselves the goal of transforming their energy mix from 85% fossil fuels and 15% renewables to 85% renewables and 15% fossil fuels by 2040. By setting bold emissions and energy intensity targets that they often meet and exceed, these energy companies provide insights on what transformation towards authentic and substantive climate action might look like.

We want sustainable energy to empower people, businesses and societies to unleash their potential without having to worry about harming the planet or reducing the opportunities for future generations…. We have now defined a new target of phasing out coal completely from our production by 2023, because coal is the type of fossil energy causing the highest amount of CO2 emissions. (Ørsted, 2016 report)

It has also set a goal of reducing greenhouse gas (GHG) emissions of absolute scope 1, 2 and 3, which has been approved by the Science-Based Target initiative…The company has committed to maintaining its position as one of the leading European companies with the lowest CO2 emissions per kWh produced, and to achieve this by focusing its efforts on reducing the intensity of greenhouse gases, promoting renewable technology and increasing efficiency. (Iberdrola, 2019 report)

These companies provide examples of how the gap between talk and action in evolving companies can be a positive sign of what is to come in terms of future action. For these companies, the aspirational talk in earlier years appears to have provided an authentic signal of more ambitious and meaningful climate actions rather than an attempt to hide poor sustainability performance.

Over time, evolving companies appear to be more focused on an agential view of the collective and their own responsibility for contributing to climate change in the past. Evolving companies draw attention to the ecological and societal stakes that are at risk by continuing down the path of fossil fuel-dependent energy systems and present themselves as being part of the transition toward a cleaner and more just energy future. They often frame the energy sector and their own company as being largely responsible for climate change and consider it their moral responsibility or obligation to reduce CO2 emissions and respond to climate change.

Ørsted provides a great example of how evolving companies shift their framing of collective moral responsibility over time. At the beginning of the decade, Ørsted evokes a more diffuse collective with forward-looking prospective responsibility by speaking about the role of the energy sector in the future energy transition. Over time they shift towards a more agential collective and backward-looking moral responsibility where they take more ownership of both the blame and future solutions to climate change. The following quotes show how, earlier in the decade, Ørsted tended to evoke a more diffuse collective when discussing climate change:

The challenges facing the energy sector are part of a wider challenge concerning how we, as modern societies, use our resources. (Ørsted, 2011 report)

the world is facing serious resource and climate challenges…With more people on the planet and a rapidly expanding consumer middle class, global resources and ecosystems are put under strain. (Ørsted, 2011 report)

In these early reports, Ørsted would frame their moral responsibility through a prospective lens:

As an energy company, we have a major responsibility to help steer the world in a more sustainable direction. We must develop and deploy low-carbon technologies that can meet the future energy demand of our customers, enabling people to live their lives and businesses to thrive. (Ørsted, 2014 report)

Towards the end of the decade, as Ørsted became more sustainable, there was a clear shift in the framing of collective moral responsibility towards an agential collective and backward-looking responsibility, whereby Ørsted acknowledged its contribution to the current situation.

We need to transform the global energy systems from black to green energy at a higher pace than the current trajectory. (Ørsted, 2017 report)

At Ørsted, our vision directly addresses the challenge of climate change. We used to be one of the blackest energy companies in Europe. Today, we produce 64% green energy, and our target for 2023 takes us beyond 95%. (Ørsted, 2017 report)

By the end of the decade, blameworthiness shifts to praiseworthiness as Ørsted begins to take credit for its own transition and leadership position in renewable energy. In its 2019 report, Ørsted frames its role strongly around transformational leadership and how it has transformed its entire business model. It also makes frequent mention of its ambitious long-term targets that go beyond the expectations of the industry and underpin its view of its moral responsibility to combat climate change.

Over the past decade, we have been on a major decarbonisation journey to transform from one of Europe's most carbon-intensive energy companies to a global leader in renewable energy. (Ørsted, 2019 report)

In 2019, we adopted three new climate targets to guide our continued decarbonisation journey… Our biggest contribution is our actions to help fight climate change. (Ørsted, 2019 report)

Overall, these insights highlight how evolving companies combine an agential collective perspective with backward-looking responsibility to acknowledge their role in contributing to the current climate situation, thereby taking ownership of past actions and future solutions.

A Typology of Energy Company Framing and Action in Response to Climate Change

Our findings illustrate how energy companies are framing their responses to climate change and the related actions they have taken to shift their business practices. Table 6 presents a typology of energy company responses to climate change through the lens of organisational hypocrisy and collective moral responsibility. The table summarises the relationship between talk, action and framing of collective moral responsibility, highlighting the implications for climate action.

Discussion and Concluding Comments

Our paper makes multiple contributions to the literature on business ethics and climate change. First, we contribute to the literature on business ethics, moral responsibility, and organisational hypocrisy by providing a nuanced understanding of the performative nature of collective moral responsibility (Soares, 2003; Tamminga & Hindriks, 2020). The performative nature of collective moral responsibility refers to how organisations' talk and actions regarding moral obligations shape, and are shaped by, their sense of the collective and their relationship with broader stakeholders. As highlighted in our findings, this performativity suggests that collective moral responsibility is not static, but rather is actively constructed and reconstructed through organisational actions and discourses. Revealing this performativity highlights the diverse ways in which conceptions of the collective as diffuse, teleological, or agential (Collins, 2019) are associated with different types of climate talk and action and different levels of organisational hypocrisy. As highlighted in Table 6, we show how organisations that frame their role as part of a more diffuse or teleological collective engage in forward-looking moral responsibility, which tends to promote symbolic talk rather than substantive action. For example, deflecting and stagnating companies made less sustainability progress over time and had larger gaps between talk and action than evolving companies. By contrast, organisations that understand blameworthiness through a more agential collective, as was the case with Ørsted, E.ON and Iberdrola, seem to engage in substantive climate action as they view moral responsibility from a more backward-looking perspective. This more backward-looking perspective can create an obligation to authentically shift business practices. These findings highlight the importance of developing a more nuanced understanding of the 'collective' (Collins, 2019). Further, we reveal the value of differentiating between backward-looking (reactive) and forward-looking (prospective) moral responsibility (Gilbert, 2006a, 2006b; Sanbhu, 2012; Van de Poel, 2011) for understanding the connection between talk and action.

Our findings contribute to the broader literature on framing (Cornelissen & Werner, 2014 ) and organisational hypocrisy (Brunsson, 2002 ) by unpacking the relationship between organisational framing of collective moral responsibility and organisational hypocrisy. We show that organisations that view collective moral responsibility through the lens of diffuse collectives or teleological collectives (e.g., deflecting and stagnating companies) and forward-looking responsibility tend to have larger disconnects between talk and action and are less likely to engage in substantive action. Conversely, we show how companies that view moral responsibility through the lens of an agential collective (e.g., evolving companies) adopt a more backward-looking sense of responsibility that is associated with tighter linkages between talk and action and indicative of more substantive action over time. These insights extend prior research on framing and climate change by unpacking the relationship between frames and action (Campbell et al., 2019 ; Hahn & Lulfs, 2014 ; Metze, 2018 ; Nyberg & Wright, 2006 ; Nyberg et al., 2018 ; Wright & Nyberg, 2017 ) and showing how shifts in ethical frames relate to substantive shifts in action. Previous studies on framing and climate action have shown how specific political and social events can shape responses to climate change (Nyberg et al., 2018 ; Slawinski & Bansal, 2015 ). Our insights build on this work through a longitudinal analysis of how framing evolves over time and how different frames are correlated with different levels of organisational hypocrisy. In doing so we go beyond prior research by revealing the dynamic nature of these frames and their implications for action over time.

Through the use of a longitudinal study, we contribute to a temporal understanding of collective moral responsibility and organisational hypocrisy. By adopting a temporal lens, we reveal how the understanding of the collective and the direction of responsibility might shift over time and how this relates to action and inaction on climate change (Brunsson; Cho et al., 2015). Evidence from the category of evolving companies, represented in this study by Ørsted, E.ON and Iberdrola, suggests that organisations that consider their own role as an agential collective with backward-looking responsibility seem to live up to aspirational talk over time. This finding extends insights that suggest that organisational hypocrisy can be beneficial to sustainability action when framing is eventually realised in future action (Christensen et al., 2013). The findings suggest that while these companies initially engaged in symbolic framing to signal their commitments to climate action, over time they were able to align their actions with their talk and shift from the symbolic to the substantive. This evolution reveals how, in certain circumstances, organisational hypocrisy can lead to meaningful climate action.

Alternatively, the journey of the category of deflecting companies, represented in this study by BP and Shell, provides insights into the situations in which framing offsets action (Brunsson, 1986, 1993, 2002) or creates facades (Cho et al., 2015) that draw attention away from poor performance or climate inaction. These findings align with the seminal work on organisational hypocrisy by Brunsson (1986), who theorised that talk, and subsequent decisions, often substitute for or postpone action, especially when organisations do not consider action important or desirable.

These temporal insights contribute to the broader literature on sustainability that highlights the need to adopt a process lens when understanding climate action and inaction (Mazutis et al., 2021; Schultz, 2022; Slawinski & Bansal, 2015; Slawinski et al., 2017) and respond to calls for a deeper understanding of the temporal elements involved in ethical considerations (Hockerts & Searcy, 2023). We show that as organisations shift from the notion of a diffuse collective to a more agential collective, they tend to move away from a forward-looking sense of moral responsibility towards a backward-looking sense that is associated with more substantive action. The impact of this shift in collective moral responsibility over time is best illustrated by the journey of Ørsted in radically transforming its business model. Revisiting institutional theory, we might understand these temporal shifts as being shaped by broader regulatory and normative institutional pressures (Bromley & Powell, 2012; Meyer & Rowan, 1977; Oliver, 1991). For example, a shift towards a stricter regulatory environment might expedite the move from symbolic talk to substantive action, while increased pressure from investors, communities and NGOs, together with greater demands for transparency, is also pushing energy companies towards a deeper sense of moral responsibility and more genuine climate action. Overall, we echo the call from Collins (2019) to consider the temporal horizon of moral responsibility in shaping climate action.

Building on these theoretical contributions, we see multiple fruitful avenues for future research. While this study explored talk and action over a ten-year period, future studies would benefit from investigating climate action and the framing of collective moral responsibility over longer time horizons by accessing historical data. Taking time seriously in studies of climate action would assist in developing a more processual understanding of how green talk translates into action. Future research would also benefit from comparative studies that take a global lens and explore investor-owned energy companies alongside publicly owned energy organisations. Increasing the heterogeneity of energy organisations would assist in understanding the influence of cultural and regulatory differences in shaping climate talk and action. Finally, extending research on framing and collective moral responsibility beyond the context of the climate crisis to human rights issues such as modern slavery and forced displacement would assist in unpacking the nature of organisational hypocrisy in social as compared to environmental crises.

Finally, we contribute to practice by highlighting the shifts in collective moral responsibility associated with energy companies becoming more sustainable and authentically engaging in climate action. As highlighted in the final row of Table  6 , the insights on the category of evolving companies, represented in this study by Ørsted, E.ON and Iberdrola, suggest that for symbolic talk to match substantive action, organisations need to actively question and reject current unsustainable practices. Evolving companies highlight how they are challenging, questioning, and rethinking their business model as they go beyond diversifying their portfolios towards moving their entire business strategy away from fossil fuels. Our findings suggest that aspirational talk is not sufficient to generate substantive climate action. We suggest that organisations that genuinely want to engage in climate action need to engage in a more agential view of the collective and reconsider their own responsibility for contributing to climate change in the past. Rather than deflecting and deferring responsibility to diffuse notions of society, government, civil society, and corporations, organisations that hope to genuinely contribute to climate action need to take ownership of both the blame for past action and their obligation to find future solutions.

Data Availability

This research is based on publicly available sustainability reports.

We note that E.ON separated into two companies in 2016 by setting up a separate entity Uniper to manage its fossil fuel assets. In order to gain a full picture of E.ON’s climate change responses we also reviewed Uniper’s sustainability reporting from 2016 to 2019. For consistency, we still refer to ten energy companies throughout the paper.

Ambrose, J. (2020). BP sets net zero carbon target for 2050 . Retrieved from: https://www.theguardian.com/business/2020/feb/12/bp-sets-net-zero-carbon-target-for-2050

Ashforth, B. E., & Gibbs, B. W. (1990). The double-edge of organizational legitimation. Organization Science, 1 (2), 177–194.


Banerjee, S. B. (2008). Corporate social responsibility: The good, the bad and the ugly. Critical Sociology, 34 (1), 51–79.

Banerjee, S. B. (2012). A climate for change? Critical reflections on the Durban United Nations climate change conference. Organization Studies, 33 (12), 1761–1786.

Barth, C., & Bijsmans, P. (2018). The Maastricht Treaty and public debates about European integration: The emergence of a European public sphere? Journal of Contemporary European Studies, 1–17.


Battilana, J., Leca, B., & Boxenbaum, E. (2009). How actors change institutions: Towards a theory of institutional entrepreneurship. The Academy of Management Annals, 3 (1), 65–107.

Benner, M. J., & Tripsas, M. (2012). The influence of prior industry affiliation on framing in nascent industries: The evolution of digital cameras. Strategic Management Journal, 33 , 277–302.

Boiral, O. (2016). Accounting for the unaccountable: Biodiversity reporting and impression management. Journal of Business Ethics, 135 (4), 751–768.

Boiral, O., Heras-Saizarbitoria, I., & Brotherton, M. C. (2019). Assessing and improving the quality of sustainability reports: The auditors’ perspective. Journal of Business Ethics, 155 (3), 703–721.

Boons, F., Montalvo, C., Quist, J., & Wagner, M. (2013). Sustainable innovation, business models and economic performance: An overview. Journal of Cleaner Production, 45 , 1–8.

Bovens, M. (1998). The quest for responsibility: Accountability and citizenship in complex organisations. Cambridge University Press.

Bromley, P., & Powell, W. W. (2012). From smoke and mirrors to walking the talk: Decoupling in the contemporary world. Academy of Management Annals, 6 (1), 483–530.

Brønn, P. S., & Vidaver-Cohen, D. (2009). Corporate motives for social initiative: Legitimacy, sustainability, or the bottom line? Journal of Business Ethics, 87 , 91–109.

Brunsson, N. (1986). Organizing for inconsistencies: On organizational conflict, depression and hypocrisy as substitutes for action. Scandinavian Journal of Management Studies, 2 (3–4), 165–185.

Brunsson, N. (1993). Ideas and actions: Justification and hypocrisy as alternatives to control. Accounting, Organizations and Society, 18 (6), 489–506.

Brunsson, N. (2002). The organization of hypocrisy . Copenhagen Business School Press.

Campbell, N., McHugh, G., & Dylan-Ennis, P. (2019). Climate change is not a problem: Speculative realism at the end of organization. Organization Studies, 40 (5), 725–744.

Cho, C. H., Laine, M., Roberts, R. W., & Rodrigue, M. (2015). Organized hypocrisy, organizational façades, and sustainability reporting. Accounting, Organizations and Society, 40 , 78–94.

Cho, C. H., & Patten, D. M. (2007). The role of environmental disclosures as tools of legitimacy: A research note. Accounting Organizations and Society, 32 (7–8), 639–647.

Cho, C. H., Roberts, R. W., & Patten, D. M. (2010). The language of US corporate environmental disclosure. Accounting Organizations and Society, 35 (4), 431–443.

Christensen, H. B., Hail, L., & Leuz, C. (2021). Mandatory CSR and sustainability reporting: Economic analysis and literature review. Review of Accounting Studies, 26 (3), 1176–1248.

Christensen, L. T., Morsing, M., & Thyssen, O. (2013). CSR as aspirational talk. Organization, 20 (3), 372–393.

Clarkson, P. M., Li, Y., Richardson, G. D., & Vasvari, F. P. (2008). Revisiting the relation between environmental performance and environmental disclosure: An empirical analysis. Accounting, Organizations and Society, 33 (4–5), 303–327.

Climate Accountability Institute. (2017). CDP Carbon Majors Report 2017 . Retrieved from: https://cdn.cdp.net/cdp-production/cms/reports/documents/000/002/327/original/Carbon-Majors-Report-2017.pdf

Clune, C., & O’Dwyer, B. (2020). Organizing dissonance through institutional work: The embedding of social and environmental accountability in an investment field. Accounting, Organizations and Society, 85 , 101130.

Collins, S. (2019). Collective responsibility gaps. Journal of Business Ethics, 154 , 943–954.

Constantinescu, M., & Kaptein, M. (2015). Mutually enhancing responsibility: A theoretical exploration of the interaction mechanisms between individual and corporate moral responsibility. Journal of Business Ethics, 129 , 325–339.

Cornelissen, J. P., & Werner, M. D. (2014). The academy of management annals putting framing in perspective: A review of framing and frame analysis across the management and organizational literature. Academy of Management Annals, 8 (1), 181–235.

Cowton, C. J. (1998). The use of secondary data in business ethics research. Journal of Business Ethics, 17 (4), 423–434.

Crilly, D., Zollo, M., & Hansen, M. T. (2012). Faking it or muddling through? Understanding decoupling in response to stakeholder pressures. Academy of Management Journal, 55 (6), 1429–1448.

Delmas, M. A., & Burbano, V. C. (2011). The drivers of greenwashing. California Management Review, 54 (1), 64–87.

Dewulf, A., Gray, B., Putnam, L., & Lewicki, R. (2009). Disentangling approaches to framing in conflict and negotiation research: A meta-paradigmatic perspective. Human Relations, 62 (2), 155–193.

Donia, M. B., & Sirsly, C. A. T. (2016). Determinants and consequences of employee attributions of corporate social responsibility as substantive or symbolic. European Management Journal, 34 (3), 232–242.

Dyllick, T., & Muff, K. (2016). Clarifying the meaning of sustainable business: Introducing a typology from business-as-usual to true business sustainability. Organization & Environment, 29 (2), 156–174.

Elsbach, K. D., & Sutton, R. I. (1992). Acquiring organizational legitimacy through illegitimate actions: A marriage of institutional and impression management theories. Academy of Management Journal, 35 (4), 699–738.

Fassin, Y., & Buelens, M. (2011). The hypocrisy-sincerity continuum in corporate communication and decision making: A model of corporate social responsibility and business ethics practices. Management Decision, 49 (4), 586–600.

Fligstein, N., & McAdam, D. (2011). Toward a general theory of strategic action fields. Sociological Theory, 29 (1), 1–26.

Frandsen, F., & Johansen, W. (2011). Rhetoric, climate change, and corporate identity management. Management Communication Quarterly, 25 (3), 511–530.

Garavan, T. N., & McGuire, D. (2010). Human resource development and society: Human resource development’s role in embedding corporate social responsibility, sustainability, and ethics in organizations. Advances in Developing Human Resources, 12 (5), 487–507.

Gilbert, C. G. (2006a). Change in the presence of residual fit: Can competing frames coexist? Organization Science., 17 (1), 150–167.

Gilbert, M. (2006b). Who’s to blame? Collective moral responsibility and its implications for group members. Midwest Studies in Philosophy, 30 (1), 94–114.

Gioia, D. A., & Chittipeddi, K. (1991). Sensemaking and sensegiving in strategic change initiation. Strategic Management Journal, 12 (6), 433–448.

Goffman, E. (1974). Frame analysis: An essay on the organization of experience . North Eastern University Press.

Hahn, R., & Lulfs, R. (2014). Legitimizing negative aspects in GRI-oriented sustainability reporting: A qualitative analysis of corporate disclosure strategies. Journal of Business Ethics, 123 , 401–420.

Higgins, C., Tang, S., & Stubbs, W. (2020). On managing hypocrisy: The transparency of sustainability reports. Journal of Business Research, 114 , 395–407.

Hockerts, K., & Searcy, C. (2023). How to sharpen our discourse on corporate sustainability and business ethics—A view from the section editors. Journal of Business Ethics . https://doi.org/10.4324/9781315162935-11

Hsieh, H. F., & Shannon, S. E. (2005). Three approaches to qualitative content analysis. Qualitative Health Research, 15 (9), 1277–1288.

Hyatt, D. G., & Berente, N. (2017). Substantive or symbolic environmental strategies? Effects of external and internal normative stakeholder pressures. Business Strategy and the Environment, 26 (8), 1212–1234.

Kahneman, D. (1984). Choices, values, and frames. American Psychologist, 39 (4), 341–350.

Kahneman, D., Knetsch, J. L., & Thaler, R. H. (1986). Fairness and the assumptions of economics. The Journal of Business, 59 (4), S285–S300.

Kaplan, S. (2008). Framing contests: Strategy making under uncertainty. Organization Science, 19 (5), 729–752.

Laufer, W. S. (2003). Social accountability and corporate greenwashing. Journal of Business Ethics, 43 (3), 253–261.

Liu, Y., Kumar, M., Katul, G. G., & Porporato, A. (2019). Reduced resilience as an early warning signal of forest mortality. Nature Climate Change, 9 (11), 880–885.

Lock, I., & Seele, P. (2015). Quantitative content analysis as a method for business ethics research. Business Ethics: A European Review, 24 , S24–S40.

Lyon, T. P., & Montgomery, A. W. (2015). The means and end of greenwash. Organization & Environment, 28 (2), 223–249.

Malsch, B. (2013). Politicizing the expertise of the accounting industry in the realm of corporate social responsibility. Accounting, Organizations and Society, 38 (2), 149–168.

Mayring, P. (2014). Qualitative Content Analysis: A theoretical foundation, basic procedures and software solution .

Mayring, P. (2000). Qualitative content analysis. Forum: Qualitative Social Research, 1 (2), 20.

Mayring, P. (2010). Qualitative content analysis: Basics and techniques (11th ed.). Beltz.

Mazutis, D., Slawinski, N., & Palazzo, G. (2021). A time and place for sustainability: A spatiotemporal perspective on organizational sustainability frame development. Business and Society, 60 (7), 1–42.

Mellema, G. (1997). Collective responsibility (Vol. 50). Rodopi Press.


Mellema, G. (2003). Responsibility, taint, and ethical distance in business ethics. Journal of Business Ethics, 47 , 125–132.

Metze, T. (2018). Framing the future of fracking: Discursive lock-in or energy degrowth in the Netherlands? Journal of Cleaner Production, 197 , 1737–1745.

Meyer, J. W., & Rowan, B. (1977). Institutionalized organizations: Formal structure as myth and ceremony. American Journal of Sociology, 83 (2), 340–363.

Miller, S., & Makela, P. (2005). The collectivist approach to collective moral responsibility. Metaphilosophy, 36 (5), 634–651.

Milne, M. J., & Gray, R. (2013). W (h) ither ecology? The triple bottom line, the global reporting initiative, and corporate sustainability reporting. Journal of Business Ethics, 118 , 13–29.

Mori, A. (2021). How do incumbent companies’ heterogeneous responses affect sustainability transitions? Insights from China’s major incumbent power generators. Environmental Innovation and Societal Transitions, 39 , 55–72.

Nadkarni, S., & Narayanan, V. K. (2007). The evolution of collective strategy frames in. Organization Science, 18 (4), 688–710.

Nisar, A., Ruiz, F., & Palacios, M. (2013). Organisational learning, strategic rigidity and technology adoption: Implications for electric utilities and renewable energy firms. Renewable and Sustainable Energy Reviews, 22 , 438–445.

Nyberg, D., & Wright, C. (2006). Justifying business responses to climate change: Discursive strategies of similarity and difference. Environment and Planning a: Economy and Space, 44 (8), 1819–1835.

Nyberg, D., & Wright, C. (2016). Performative and political: Corporate constructions of climate change risk. Organization, 23 (5), 617–638.

Nyberg, D., Wright, C., & Kirk, J. (2018). Dash for gas: Climate change, hegemony and the scalar politics of fracking in the UK. British Journal of Management, 29 , 235–251.

Nyberg, D., Wright, C., & Kirk, J. (2020). Fracking the future: The temporal portability of frames in political contests. Organization Studies, 41 (2), 175–196.

O’Brien, K. O., Selboe, E., & Hayward, B. M. (2018). Exploring youth activism on climate change: dutiful, disruptive, and dangerous dissent. Ecology and Society . https://doi.org/10.5751/ES-10287-230342

Oliver, C. (1991). Strategic responses to institutional processes. Academy of Management Review, 16 (1), 145–179.

O’Neill, D. W., Fanning, A. L., Lamb, W. F., & Steinberger, J. K. (2018). A good life for all within planetary boundaries. Nature Sustainability, 1 (2), 88–95.

Porter, M. E., & Kramer, M. R. (2011). The big idea: Creating shared value. CFA Digest, 41 (1), 12–13.

Reinecke, J., & Lawrence, T. B. (2023). The role of temporality in institutional stabilization: A process view. Academy of Management Review, 48 (4), 639–658.

Reuters (2023, March 30) EU reaches deal on higher renewable energy share by 2030 . Retrieved from: https://www.reuters.com/business/sustainable-business/eu-reaches-deal-more-ambitious-renewable-energy-targets-2030-2023-03-30/

Ritchie, H., & Roser, M. (2019). CO2 and greenhouse gas emissions. Published online at ourworldindata.org. Retrieved from: https://ourworldindata.org/co2-and-other-greenhouse-gas-emissions

Rodrigue, M., Magnan, M., & Cho, C. H. (2013). Is environmental governance substantive or symbolic? An empirical investigation. Journal of Business Ethics, 114 , 107–129.

Sandbu, M. E. (2012). Stakeholder duties: On the moral responsibility of corporate investors. Journal of Business Ethics, 109 , 97–107.

Schlichting, I. (2013). Strategic framing of climate change by industry actors: A meta-analysis strategic framing of climate change by industry actors: A meta-analysis. Environmental Communication, 7 (4), 493–511.

Schons, L., & Steinmeier, M. (2016). Walk the talk? How symbolic and substantive CSR actions affect firm performance depending on stakeholder proximity. Corporate Social Responsibility and Environmental Management, 23 (6), 358–372.

Schultz, M. (2022). The strategy–identity nexus: The relevance of their temporal interplay to climate change. Strategic Organization, 20 (4), 821–831.

Scott, M. (2020). Top company profile: Denmark’s Ørsted is 2020’s most sustainable corporation. Published online at corporateknights.com. Retrieved from: https://www.corporateknights.com/reports/2020-global-100/top-company-profile-Ørsted-15795648/

Sharma, S., & Vredenburg, H. (1998). Proactive corporate environmental strategy and the development of competitively valuable organizational capabilities. Strategic Management Journal, 19 (8), 729–753.

Shell Global. (2020). What is Shell’s net carbon footprint? Retrieved from: https://www.shell.com/energy-and-innovation/the-energy-future/what-is-shells-net-carbon-footprint-ambition.html

Slawinski, N., & Bansal, P. (2015). Short on time: Intertemporal tensions in business sustainability. Organization Science, 26 (2), 531–549.

Slawinski, N., Pinkse, J., Busch, T., & Banerjee, S. B. (2017). The role of short-termism and uncertainty avoidance in organizational inaction on climate change: A multi-level framework. Business & Society, 56 (2), 253–282.

Snelson-Powell, A. C., Grosvold, J., & Millington, A. I. (2020). Organizational hypocrisy in business schools with sustainability commitments: The drivers of talk-action inconsistency. Journal of Business Research, 114 , 408–420.

Soares, C. (2003). Corporate versus individual moral responsibility. Journal of Business Ethics, 46 , 143–150.

Steffen, W., Richardson, K., Rockström, J., Cornell, S. E., Fetzer, I., Bennett, E. M., & Sörlin, S. (2015a). Planetary boundaries: Guiding human development on a changing planet. Science . https://doi.org/10.1126/science.1259855

Steffen, W., Richardson, K., Rockström, J., Cornell, S. E., Fetzer, I., Bennett, E. M., & Sörlin, S. (2015b). Planetary boundaries: Guiding human development on a changing planet. Science, 347 (6223), 1259855.

Tamminga, A., & Hindriks, F. (2020). The irreducibility of collective obligations. Philosophical Studies, 177 , 1085–1109.

United Nations. (2019). The Sustainable Development Goals Report 2019 .

van de Poel, I. (2011). The relation between forward-looking and backward-looking responsibility. Moral responsibility: Beyond free will and determinism (pp. 37–52). Springer Netherlands.


Wagner, T., Lutz, R. J., & Weitz, B. A. (2009). Corporate hypocrisy: Overcoming the threat of inconsistent corporate social responsibility perceptions. Journal of Marketing, 73 (6), 77–91.

Walsh, J. P. (1995). Managerial and organizational cognition: Notes from a trip down memory lane. Organization Science, 6 (3), 280–321.

Weick, K. E. (1995). Sensemaking in organizations . Sage.

Wright, C., & Nyberg, D. (2017). An inconvenient truth: How organizations translate climate change into business as usual. Academy of Management Journal, 60 (5), 1633–1661.


Acknowledgements

We thank the Social Issues in Management (SIM) Division Sandbox community for their invaluable feedback on our paper.

Author information

Authors and Affiliations

The University of Maastricht, Tongersestraat 53, 6211 LM, Maastricht, The Netherlands

Melanie Feeney, Wim Gijselaers & Therese Grohnert

University of Technology Sydney, 15 Broadway, Ultimo, NSW, 2007, Australia

Jarrod Ormiston

The University of Maastricht, Kapoenstraat 2, 6211 KR, Maastricht, The Netherlands

Pim Martens


Corresponding author

Correspondence to Melanie Feeney .

Ethics declarations

Conflict of Interest

This research involved no conflicts of interest.

Consent to Participate

No human or animal participants were involved.

Informed Consent

No informed consent was required for this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Feeney, M., Ormiston, J., Gijselaers, W. et al. Framing Collective Moral Responsibility for Climate Change: A Longitudinal Frame Analysis of Energy Company Climate Reporting. J Bus Ethics (2024). https://doi.org/10.1007/s10551-024-05801-0

Download citation

Received : 17 June 2022

Accepted : 08 August 2024

Published : 26 August 2024

DOI : https://doi.org/10.1007/s10551-024-05801-0


Keywords
  • Climate change
  • Moral responsibility
  • Energy companies
  • Sustainability reporting
  • Content analysis



Open access | Published: 28 August 2024

AI generates covertly racist decisions about people based on their dialect

  • Valentin Hofmann   ORCID: orcid.org/0000-0001-6603-3428 1 , 2 , 3 ,
  • Pratyusha Ria Kalluri 4 ,
  • Dan Jurafsky   ORCID: orcid.org/0000-0002-6459-7745 4 &
  • Sharese King 5  

Nature (2024)


Subjects: Computer science

Hundreds of millions of people now interact with language models, with uses ranging from help with writing 1 , 2 to informing hiring decisions 3 . However, these language models are known to perpetuate systematic racial prejudices, making their judgements biased in problematic ways about groups such as African Americans 4 , 5 , 6 , 7 . Although previous research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time, particularly in the United States after the civil rights movement 8 , 9 . It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice, exhibiting raciolinguistic stereotypes about speakers of African American English (AAE) that are more negative than any human stereotypes about African Americans ever experimentally recorded. By contrast, the language models’ overt stereotypes about African Americans are more positive. Dialect prejudice has the potential for harmful consequences: language models are more likely to suggest that speakers of AAE be assigned less-prestigious jobs, be convicted of crimes and be sentenced to death. Finally, we show that current practices of alleviating racial bias in language models, such as human preference alignment, exacerbate the discrepancy between covert and overt stereotypes, by superficially obscuring the racism that language models maintain on a deeper level. Our findings have far-reaching implications for the fair and safe use of language technology.


Language models are a type of artificial intelligence (AI) that has been trained to process and generate text. They are becoming increasingly widespread across various applications, ranging from assisting teachers in the creation of lesson plans 10 to answering questions about tax law 11 and predicting how likely patients are to die in hospital before discharge 12 . As the stakes of the decisions entrusted to language models rise, so does the concern that they mirror or even amplify human biases encoded in the data they were trained on, thereby perpetuating discrimination against racialized, gendered and other minoritized social groups 4 , 5 , 6 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 .

Previous AI research has revealed bias against racialized groups but focused on overt instances of racism, naming racialized groups and mapping them to their respective stereotypes, for example by asking language models to generate a description of a member of a certain group and analysing the stereotypes it contains 7 , 21 . But social scientists have argued that, unlike the racism associated with the Jim Crow era, which included overt behaviours such as name calling or more brutal acts of violence such as lynching, a ‘new racism’ happens in the present-day United States in more subtle ways that rely on a ‘colour-blind’ racist ideology 8 , 9 . That is, one can avoid mentioning race by claiming not to see colour or to ignore race but still hold negative beliefs about racialized people. Importantly, such a framework emphasizes the avoidance of racial terminology but maintains racial inequities through covert racial discourses and practices 8 .

Here, we show that language models perpetuate this covert racism to a previously unrecognized extent, with measurable effects on their decisions. We investigate covert racism through dialect prejudice against speakers of AAE, a dialect associated with the descendants of enslaved African Americans in the United States 22 . We focus on the most stigmatized canonical features of the dialect shared among Black speakers in cities including New York City, Detroit, Washington DC, Los Angeles and East Palo Alto 23 . This cross-regional definition means that dialect prejudice in language models is likely to affect many African Americans.

Dialect prejudice is fundamentally different from the racial bias studied so far in language models because the race of speakers is never made overt. In fact we observed a discrepancy between what language models overtly say about African Americans and what they covertly associate with them as revealed by their dialect prejudice. This discrepancy is particularly pronounced for language models trained with human feedback (HF), such as GPT4: our results indicate that HF training obscures the racism on the surface, but the racial stereotypes remain unaffected on a deeper level. We propose using a new method, which we call matched guise probing, that makes it possible to recover these masked stereotypes.

The possibility that language models are covertly prejudiced against speakers of AAE connects to known human prejudices: speakers of AAE are known to experience racial discrimination in a wide range of contexts, including education, employment, housing and legal outcomes. For example, researchers have previously found that landlords engage in housing discrimination based solely on the auditory profiles of speakers, with voices that sounded Black or Chicano being less likely to secure housing appointments in predominantly white locales than in mostly Black or Mexican American areas 24 , 25 . Furthermore, in an experiment examining the perception of a Black speaker when providing an alibi 26 , the speaker was interpreted as more criminal, more working class, less educated, less comprehensible and less trustworthy when they used AAE rather than Standardized American English (SAE). Other costs for AAE speakers include having their speech mistranscribed or misunderstood in criminal justice contexts 27 and making less money than their SAE-speaking peers 28 . These harms connect to themes in broader racial ideology about African Americans and stereotypes about their intelligence, competence and propensity to commit crimes 29 , 30 , 31 , 32 , 33 , 34 , 35 . The fact that humans hold these stereotypes indicates that they are encoded in the training data and picked up by language models, potentially amplifying their harmful consequences, but this has never been investigated.

To our knowledge, this paper provides the first empirical evidence for the existence of dialect prejudice in language models; that is, covert racism that is activated by the features of a dialect (AAE). Using our new method of matched guise probing, we show that language models exhibit archaic stereotypes about speakers of AAE that most closely agree with the most-negative human stereotypes about African Americans ever experimentally recorded, dating from before the civil-rights movement. Crucially, we observe a discrepancy between what the language models overtly say about African Americans and what they covertly associate with them. Furthermore, we find that dialect prejudice affects language models’ decisions about people in very harmful ways. For example, when matching jobs to individuals on the basis of their dialect, language models assign considerably less-prestigious jobs to speakers of AAE than to speakers of SAE, even though they are not overtly told that the speakers are African American. Similarly, in a hypothetical experiment in which language models were asked to pass judgement on defendants who committed first-degree murder, they opted for the death penalty significantly more often when the defendants provided a statement in AAE rather than in SAE, again without being overtly told that the defendants were African American. We also show that current practices of alleviating racial disparities (increasing the model size) and overt racial bias (including HF in training) do not mitigate covert racism; indeed, quite the opposite. We found that HF training actually exacerbates the gap between covert and overt stereotypes in language models by obscuring racist attitudes. Finally, we discuss how the relationship between the language models’ covert and overt racial prejudices is both a reflection and a result of the inconsistent racial attitudes of contemporary society in the United States.

Probing AI dialect prejudice

To explore how dialect choice impacts the predictions that language models make about speakers in the absence of other cues about their racial identity, we took inspiration from the ‘matched guise’ technique used in sociolinguistics, in which subjects listen to recordings of speakers of two languages or dialects and make judgements about various traits of those speakers 36 , 37 . Applying the matched guise technique to the AAE–SAE contrast, researchers have shown that people identify speakers of AAE as Black with above-chance accuracy 24 , 26 , 38 and attach racial stereotypes to them, even without prior knowledge of their race 39 , 40 , 41 , 42 , 43 . These associations represent raciolinguistic ideologies, demonstrating how AAE is othered through the emphasis on its perceived deviance from standardized norms 44 .

Motivated by the insights enabled through the matched guise technique, we introduce matched guise probing, a method for investigating dialect prejudice in language models. The basic functioning of matched guise probing is as follows: we present language models with texts (such as tweets) in either AAE or SAE and ask them to make predictions about the speakers who uttered the texts (Fig. 1 and Methods ). For example, we might ask the language models whether a speaker who says “I be so happy when I wake up from a bad dream cus they be feelin too real” (AAE) is intelligent, and similarly whether a speaker who says “I am so happy when I wake up from a bad dream because they feel too real” (SAE) is intelligent. Notice that race is never overtly mentioned; its presence is merely encoded in the AAE dialect. We then examine how the language models’ predictions differ between AAE and SAE. The language models are not given any extra information to ensure that any difference in the predictions is necessarily due to the AAE–SAE contrast.

Figure 1

a , We used texts in SAE (green) and AAE (blue). In the meaning-matched setting (illustrated here), the texts have the same meaning, whereas they have different meanings in the non-meaning-matched setting. b , We embedded the SAE and AAE texts in prompts that asked for properties of the speakers who uttered the texts. c , We separately fed the prompts with the SAE and AAE texts into the language models. d , We retrieved and compared the predictions for the SAE and AAE inputs, here illustrated by five adjectives from the Princeton Trilogy. See Methods for more details.
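
To make the procedure concrete, the following is a minimal sketch of matched guise probing with one of the masked language models studied here (RoBERTa), using the Hugging Face transformers library. The prompt template and the adjective list are illustrative assumptions rather than the study's exact materials; the example AAE and SAE texts are the ones quoted above.

```python
# Hedged sketch of matched guise probing with a masked language model.
# Prompt template and adjective list are illustrative, not the study's materials.
import math
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
model.eval()

TEMPLATE = 'A person who says "{text}" tends to be {mask}.'
ADJECTIVES = ["intelligent", "lazy", "ignorant", "kind"]  # illustrative subset

def adjective_probs(text):
    """Probability of each candidate adjective at the masked position."""
    prompt = TEMPLATE.format(text=text, mask=tokenizer.mask_token)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    probs = logits[0, mask_pos].softmax(dim=-1)
    out = {}
    for adj in ADJECTIVES:
        # RoBERTa encodes a leading space for mid-sentence words; for multi-token
        # adjectives we use the first sub-token as an approximation.
        token_id = tokenizer.encode(" " + adj, add_special_tokens=False)[0]
        out[adj] = probs[token_id].item()
    return out

aae = "I be so happy when I wake up from a bad dream cus they be feelin too real"
sae = "I am so happy when I wake up from a bad dream because they feel too real"
p_aae, p_sae = adjective_probs(aae), adjective_probs(sae)

for adj in ADJECTIVES:
    # Positive log-ratios mean the adjective is associated more with the AAE guise.
    print(adj, math.log(p_aae[adj] / p_sae[adj]))
```

For autoregressive or chat models (GPT2, GPT3.5, GPT4), the same contrast would be read off next-token probabilities or the returned completion rather than a masked position.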

We examined matched guise probing in two settings: one in which the meanings of the AAE and SAE texts are matched (the SAE texts are translations of the AAE texts) and one in which the meanings are not matched ( Methods  (‘Probing’) and Supplementary Information  (‘Example texts’)). Although the meaning-matched setting is more rigorous, the non-meaning-matched setting is more realistic, because it is well known that there is a strong correlation between dialect and content (for example, topics 45 ). The non-meaning-matched setting thus allows us to tap into a nuance of dialect prejudice that would be missed by examining only meaning-matched examples (see Methods for an in-depth discussion). Because the results for both settings overall are highly consistent, we present them in aggregated form here, but analyse the differences in the  Supplementary Information .

We examined GPT2 (ref. 46 ), RoBERTa 47 , T5 (ref. 48 ), GPT3.5 (ref. 49 ) and GPT4 (ref. 50 ), each in one or more model versions, amounting to a total of 12 examined models ( Methods and Supplementary Information (‘Language models’)). We first used matched guise probing to probe the general existence of dialect prejudice in language models, and then applied it to the contexts of employment and criminal justice.

Covert stereotypes in language models

We started by investigating whether the attitudes that language models exhibit about speakers of AAE reflect human stereotypes about African Americans. To do so, we replicated the experimental set-up of the Princeton Trilogy 29 , 30 , 31 , 34 , a series of studies investigating the racial stereotypes held by Americans, with the difference that instead of overtly mentioning race to the language models, we used matched guise probing based on AAE and SAE texts ( Methods ).

Qualitatively, we found that there is a substantial overlap in the adjectives associated most strongly with African Americans by humans and the adjectives associated most strongly with AAE by language models, particularly for the earlier Princeton Trilogy studies (Fig. 2a ). For example, the five adjectives associated most strongly with AAE by GPT2, RoBERTa and T5 share three adjectives (‘ignorant’, ‘lazy’ and ‘stupid’) with the five adjectives associated most strongly with African Americans in the 1933 and 1951 Princeton Trilogy studies, an overlap that is unlikely to occur by chance (permutation test with 10,000 random permutations of the adjectives; P  < 0.01). Furthermore, in lieu of the positive adjectives (such as ‘musical’, ‘religious’ and ‘loyal’), the language models exhibit additional solely negative associations (such as ‘dirty’, ‘rude’ and ‘aggressive’).
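
The significance of such an overlap can be checked with a simple permutation test. Below is a hedged sketch of that check; the adjective pool and the two top-five sets are illustrative stand-ins, not the study's actual adjective list or results.

```python
# Sketch of a permutation test for the overlap between two top-five adjective sets:
# draw random "top fives" from the pool and count how often their overlap with the
# human top five matches or exceeds the observed overlap. Inputs are illustrative.
import random

adjective_pool = [
    "ignorant", "lazy", "stupid", "dirty", "rude", "aggressive",
    "musical", "religious", "loyal", "kind", "intelligent", "quiet",
]
model_top5 = {"ignorant", "lazy", "stupid", "dirty", "rude"}          # illustrative
human_top5 = {"ignorant", "lazy", "stupid", "musical", "religious"}   # illustrative

observed_overlap = len(model_top5 & human_top5)

n_perm, at_least_as_extreme = 10_000, 0
for _ in range(n_perm):
    random_top5 = set(random.sample(adjective_pool, 5))
    if len(random_top5 & human_top5) >= observed_overlap:
        at_least_as_extreme += 1

p_value = at_least_as_extreme / n_perm
print(f"observed overlap = {observed_overlap}, permutation p = {p_value:.4f}")
```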

Figure 2

a , Strongest stereotypes about African Americans in humans in different years, strongest overt stereotypes about African Americans in language models, and strongest covert stereotypes about speakers of AAE in language models. Colour coding as positive (green) and negative (red) is based on ref. 34 . Although the overt stereotypes of language models are overall more positive than the human stereotypes, their covert stereotypes are more negative. b , Agreement of stereotypes about African Americans in humans with both overt and covert stereotypes about African Americans in language models. The black dotted line shows chance agreement using a random bootstrap. Error bars represent the standard error across different language models and prompts ( n  = 36). The language models’ overt stereotypes agree most strongly with current human stereotypes, which are the most positive experimentally recorded ones, but their covert stereotypes agree most strongly with human stereotypes from the 1930s, which are the most negative experimentally recorded ones. c , Stereotype strength for individual linguistic features of AAE. Error bars represent the standard error across different language models, model versions and prompts ( n  = 90). The linguistic features examined are: use of invariant ‘be’ for habitual aspect; use of ‘finna’ as a marker of the immediate future; use of (unstressed) ‘been’ for SAE ‘has been’ or ‘have been’ (present perfects); absence of the copula ‘is’ and ‘are’ for present-tense verbs; use of ‘ain’t’ as a general preverbal negator; orthographic realization of word-final ‘ing’ as ‘in’; use of invariant ‘stay’ for intensified habitual aspect; and absence of inflection in the third-person singular present tense. The measured stereotype strength is significantly above zero for all examined linguistic features, indicating that they all evoke raciolinguistic stereotypes in language models, although there is a lot of variation between individual features. See the Supplementary Information (‘Feature analysis’) for more details and analyses.

To investigate this more quantitatively, we devised a variant of average precision 51 that measures the agreement between the adjectives associated most strongly with African Americans by humans and the ranking of the adjectives according to their association with AAE by language models ( Methods ). We found that for all language models, the agreement with most Princeton Trilogy studies is significantly higher than expected by chance, as shown by one-sided t -tests computed against the agreement distribution resulting from 10,000 random permutations of the adjectives (mean ( m ) = 0.162, standard deviation ( s ) = 0.106; Extended Data Table 1 ); and that the agreement is particularly pronounced for the stereotypes reported in 1933 and falls for each study after that, almost reaching the level of chance agreement for 2012 (Fig. 2b ). In the Supplementary Information (‘Adjective analysis’), we explored variation across model versions, settings and prompts (Supplementary Fig. 2 and Supplementary Table 4 ).
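
One plausible implementation of such a ranking-based agreement score is sketched below; the paper defines its own variant of average precision, so this should be read as an approximation with illustrative inputs. Significance against chance could then be assessed with the same kind of permutation baseline as above.

```python
# Sketch of an average-precision-style agreement score: how early the adjectives in
# the human top set appear in a model's ranking of adjectives by AAE association.
# This approximates, but is not identical to, the variant defined in the paper.
def agreement(model_ranking, human_top):
    hits, score = 0, 0.0
    for rank, adj in enumerate(model_ranking, start=1):
        if adj in human_top:
            hits += 1
            score += hits / rank      # precision at each hit
    return score / len(human_top)

# Illustrative inputs, not the study's data.
model_ranking = ["lazy", "ignorant", "dirty", "rude", "stupid",
                 "musical", "loyal", "kind", "religious", "intelligent"]
human_top5 = {"lazy", "ignorant", "stupid", "musical", "religious"}
print(round(agreement(model_ranking, human_top5), 3))
```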

To explain the observed temporal trend, we measured the average favourability of the top five adjectives for all Princeton Trilogy studies and language models, drawing from crowd-sourced ratings for the Princeton Trilogy adjectives on a scale between −2 (very negative) and 2 (very positive; see Methods , ‘Covert-stereotype analysis’). We found that the favourability of human attitudes about African Americans as reported in the Princeton Trilogy studies has become more positive over time, and that the language models’ attitudes about AAE are even more negative than the most negative experimentally recorded human attitudes about African Americans (the ones from the 1930s; Extended Data Fig. 1 ). In the Supplementary Information , we provide further quantitative analyses supporting this difference between humans and language models (Supplementary Fig. 7 ).
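
The favourability measure itself is a simple average over adjective ratings; a toy sketch follows, with placeholder ratings rather than the crowd-sourced values collected for the Princeton Trilogy adjectives.

```python
# Sketch: mean favourability of a top-five adjective list, using ratings on a scale
# from -2 (very negative) to +2 (very positive). Ratings below are placeholders.
ratings = {"lazy": -1.5, "ignorant": -1.7, "stupid": -1.9, "dirty": -1.8,
           "rude": -1.4, "musical": 1.2, "religious": 0.3, "loyal": 1.5}

def mean_favourability(top5):
    return sum(ratings[adj] for adj in top5) / len(top5)

print(mean_favourability(["lazy", "ignorant", "stupid", "dirty", "rude"]))       # strongly negative
print(mean_favourability(["musical", "religious", "loyal", "lazy", "ignorant"]))  # mixed
```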

Furthermore, we found that the raciolinguistic stereotypes are not merely a reflection of the overt racial stereotypes in language models but constitute a fundamentally different kind of bias that is not mitigated in the current models. We show this by examining the stereotypes that the language models exhibit when they are overtly asked about African Americans ( Methods , ‘Overt-stereotype analysis’). We observed that the overt stereotypes are substantially more positive in sentiment than are the covert stereotypes, for all language models (Fig. 2a and Extended Data Fig. 1 ). Strikingly, for RoBERTa, T5, GPT3.5 and GPT4, although their covert stereotypes about speakers of AAE are more negative than the most negative experimentally recorded human stereotypes, their overt stereotypes about African Americans are more positive than the most positive experimentally recorded human stereotypes. This is particularly true for the two language models trained with HF (GPT3.5 and GPT4), in which all overt stereotypes are positive and all covert stereotypes are negative (see also ‘Resolvability of dialect prejudice’). In terms of agreement with human stereotypes about African Americans, the overt stereotypes almost never exhibit agreement significantly stronger than expected by chance, as shown by one-sided t -tests computed against the agreement distribution resulting from 10,000 random permutations of the adjectives ( m  = 0.162, s  = 0.106; Extended Data Table 2 ). Furthermore, the overt stereotypes are overall most similar to the human stereotypes from 2012, with the agreement continuously falling for earlier studies, which is the exact opposite trend to the covert stereotypes (Fig. 2b ).

In the experiments described in the  Supplementary Information (‘Feature analysis’), we found that the raciolinguistic stereotypes are directly linked to individual linguistic features of AAE (Fig. 2c and Supplementary Table 14 ), and that a higher density of such linguistic features results in stronger stereotypical associations (Supplementary Fig. 11 and Supplementary Table 13 ). Furthermore, we present experiments involving texts in other dialects (such as Appalachian English) as well as noisy texts, showing that these stereotypes cannot be adequately explained as either a general dismissive attitude towards text written in a dialect or as a general dismissive attitude towards deviations from SAE, irrespective of how the deviations look ( Supplementary Information (‘Alternative explanations’), Supplementary Figs. 12 and 13 and Supplementary Tables 15 and 16 ). Both alternative explanations are also tested on the level of individual linguistic features.

Thus, we found substantial evidence for the existence of covert raciolinguistic stereotypes in language models. Our experiments show that these stereotypes are similar to the archaic human stereotypes about African Americans that existed before the civil rights movement, are even more negative than the most negative experimentally recorded human stereotypes about African Americans, and are both qualitatively and quantitatively different from the previously reported overt racial stereotypes in language models, indicating that they are a fundamentally different kind of bias. Finally, our analyses demonstrate that the detected stereotypes are inherently linked to AAE and its linguistic features.

Impact of covert racism on AI decisions

To determine what harmful consequences the covert stereotypes have in the real world, we focused on two areas in which racial stereotypes about speakers of AAE and African Americans have been repeatedly shown to bias human decisions: employment and criminality. There is a growing impetus to use AI systems in these areas. Indeed, AI systems are already being used for personnel selection 52 , 53 , including automated analyses of applicants’ social-media posts 54 , 55 , and technologies for predicting legal outcomes are under active development 56 , 57 , 58 . Rather than advocating these use cases of AI, which are inherently problematic 59 , the sole objective of this analysis is to examine the extent to which the decisions of language models, when they are used in such contexts, are impacted by dialect.

First, we examined decisions about employability. Using matched guise probing, we asked the language models to match occupations to the speakers who uttered the AAE or SAE texts and computed scores indicating whether an occupation is associated more with speakers of AAE (positive scores) or speakers of SAE (negative scores; Methods , ‘Employability analysis’). The average score of the occupations was negative ( m  = –0.046,  s  = 0.053), the difference from zero being statistically significant (one-sample, one-sided t -test, t (83) = −7.9, P  < 0.001). This trend held for all language models individually (Extended Data Table 3 ). Thus, if a speaker exhibited features of AAE, the language models were less likely to associate them with any job. Furthermore, we observed that for all language models, the occupations that had the lowest association with AAE require a university degree (such as psychologist, professor and economist), but this is not the case for the occupations that had the highest association with AAE (for example, cook, soldier and guard; Fig. 3a ). Also, many occupations strongly associated with AAE are related to music and entertainment more generally (singer, musician and comedian), which is in line with a pervasive stereotype about African Americans 60 . To probe these observations more systematically, we tested for a correlation between the prestige of the occupations and the propensity of the language models to match them to AAE ( Methods ). Using a linear regression, we found that the association with AAE predicted the occupational prestige (Fig. 3b ; β  = −7.8, R 2 = 0.193, F (1, 63) = 15.1, P  < 0.001). This trend held for all language models individually (Extended Data Fig. 2 and Extended Data Table 4 ), albeit in a less pronounced way for GPT3.5, which had a particularly strong association of AAE with occupations in music and entertainment.

Figure 3

a , Association of different occupations with AAE or SAE. Positive values indicate a stronger association with AAE and negative values indicate a stronger association with SAE. The bottom five occupations (those associated most strongly with SAE) mostly require a university degree, but this is not the case for the top five (those associated most strongly with AAE). b , Prestige of occupations that language models associate with AAE (positive values) or SAE (negative values). The shaded area shows a 95% confidence band around the regression line. The association with AAE or SAE predicts the occupational prestige. Results for individual language models are provided in Extended Data Fig. 2 . c , Relative increase in the number of convictions and death sentences for AAE versus SAE. Error bars represent the standard error across different model versions, settings and prompts ( n  = 24 for GPT2, n  = 12 for RoBERTa, n  = 24 for T5, n  = 6 for GPT3.5 and n  = 6 for GPT4). In cases of small sample size ( n  ≤ 10 for GPT3.5 and GPT4), we plotted the individual results as overlaid dots. T5 does not contain the tokens ‘acquitted’ or ‘convicted’ in its vocabulary and is therefore excluded from the conviction analysis. Detrimental judicial decisions systematically go up for speakers of AAE compared with speakers of SAE.
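
The two statistical checks reported above for the employability analysis, a one-sided one-sample t-test on the occupation scores and a linear regression of occupational prestige on AAE association, can be sketched with SciPy as follows; the score and prestige arrays are synthetic illustrations, not the study's data.

```python
# Sketch of the employability analyses on synthetic data: a one-sided one-sample
# t-test against zero on AAE-association scores per occupation, and a regression
# of occupational prestige on those scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
aae_association = rng.normal(-0.05, 0.05, size=84)                 # score per occupation
prestige = 50 - 7.8 * aae_association + rng.normal(0, 5, size=84)  # synthetic prestige

# One-sided test: is the mean AAE-association score below zero?
t, p = stats.ttest_1samp(aae_association, 0.0, alternative="less")
print(f"t = {t:.2f}, one-sided p = {p:.4f}")

# Does AAE association predict occupational prestige?
res = stats.linregress(aae_association, prestige)
print(f"beta = {res.slope:.2f}, R^2 = {res.rvalue**2:.3f}, p = {res.pvalue:.4f}")
```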

We then examined decisions about criminality. We used matched guise probing for two experiments in which we presented the language models with hypothetical trials where the only evidence was a text uttered by the defendant in either AAE or SAE. We then measured the probability that the language models assigned to potential judicial outcomes in these trials and counted how often each of the judicial outcomes was preferred for AAE and SAE ( Methods , ‘Criminality analysis’). In the first experiment, we told the language models that a person is accused of an unspecified crime and asked whether the models will convict or acquit the person solely on the basis of the AAE or SAE text. Overall, we found that the rate of convictions was greater for AAE ( r  = 68.7%) than SAE ( r  = 62.1%; Fig. 3c , left). A chi-squared test found a strong effect ( χ 2 (1,  N  = 96) = 184.7,  P  < 0.001), which held for all language models individually (Extended Data Table 5 ). In the second experiment, we specifically told the language models that the person committed first-degree murder and asked whether the models will sentence the person to life or death on the basis of the AAE or SAE text. The overall rate of death sentences was greater for AAE ( r  = 27.7%) than for SAE ( r  = 22.8%; Fig. 3c , right). A chi-squared test found a strong effect ( χ 2 (1,  N  = 144) = 425.4,  P  < 0.001), which held for all language models individually except for T5 (Extended Data Table 6 ). In the Supplementary Information , we show that this deviation was caused by the base T5 version, and that the larger T5 versions follow the general pattern (Supplementary Table 10 ).
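
The chi-squared tests on judicial outcomes can be reproduced in spirit as follows; the contingency counts are invented purely to mirror the reported conviction rates and are not the raw experimental counts.

```python
# Sketch of the chi-squared test comparing judicial outcomes across dialect guises.
# Counts are illustrative, chosen only to mirror the reported rates
# (about 68.7% convictions for AAE vs 62.1% for SAE).
import numpy as np
from scipy.stats import chi2_contingency

#                  convicted  acquitted
counts = np.array([[6870, 3130],    # AAE guise
                   [6210, 3790]])   # SAE guise

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2({dof}) = {chi2:.1f}, p = {p:.3g}")
```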

In further experiments ( Supplementary Information , ‘Intelligence analysis’), we used matched guise probing to examine decisions about intelligence, and found that all the language models consistently judge speakers of AAE to have a lower IQ than speakers of SAE (Supplementary Figs. 14 and 15 and Supplementary Tables 17 – 19 ).

Resolvability of dialect prejudice

We wanted to know whether the dialect prejudice we observed is resolved by current practices of bias mitigation, such as increasing the size of the language model or including HF in training. It has been shown that larger language models work better with dialects 21 and can have less racial bias 61 . Therefore, the first method we examined was scaling, that is, increasing the model size ( Methods ). We found evidence of a clear trend (Extended Data Tables 7 and 8 ): larger language models are indeed better at processing AAE (Fig. 4a , left), but they are not less prejudiced against speakers of it. In fact, larger models showed more covert prejudice than smaller models (Fig. 4a , right). By contrast, larger models showed less overt prejudice against African Americans (Fig. 4a , right). Thus, increasing scale does make models better at processing AAE and at avoiding prejudice against overt mentions of African Americans, but it makes them more linguistically prejudiced.

Fig. 4:

a , Language modelling perplexity and stereotype strength on AAE text as a function of model size. Perplexity is a measure of how successful a language model is at processing a particular text; a lower result is better. For language models for which perplexity is not well-defined (RoBERTa and T5), we computed pseudo-perplexity instead (dotted line). Error bars represent the standard error across different models of a size class and AAE or SAE texts ( n  = 9,057 for small, n  = 6,038 for medium, n  = 15,095 for large and n  = 3,019 for very large). For covert stereotypes, error bars represent the standard error across different models of a size class, settings and prompts ( n  = 54 for small, n  = 36 for medium, n  = 90 for large and n  = 18 for very large). For overt stereotypes, error bars represent the standard error across different models of a size class and prompts ( n  = 27 for small, n  = 18 for medium, n  = 45 for large and n  = 9 for very large). Although larger language models are better at processing AAE (left), they are not less prejudiced against speakers of it. Indeed, larger models show more covert prejudice than smaller models (right). By contrast, larger models show less overt prejudice against African Americans (right). In other words, increasing scale does make models better at processing AAE and at avoiding prejudice against overt mentions of African Americans, but it makes them more linguistically prejudiced. b , Change in stereotype strength and favourability as a result of training with HF for covert and overt stereotypes. Error bars represent the standard error across different prompts ( n  = 9). HF weakens (left) and improves (right) overt stereotypes but not covert stereotypes. c , Top overt and covert stereotypes about African Americans in GPT3, trained without HF, and GPT3.5, trained with HF. Colour coding as positive (green) and negative (red) is based on ref. 34 . The overt stereotypes get substantially more positive as a result of HF training in GPT3.5, but there is no visible change in favourability for the covert stereotypes.

As a second potential way to resolve dialect prejudice in language models, we examined training with HF 49 , 62 . Specifically, we compared GPT3.5 (ref. 49 ) with GPT3 (ref. 63 ), its predecessor that was trained without using HF ( Methods ). Looking at the top adjectives associated overtly and covertly with African Americans by the two language models, we found that HF resulted in more-positive overt associations but had no clear qualitative effect on the covert associations (Fig. 4c ). This observation was confirmed by quantitative analyses: the inclusion of HF resulted in significantly weaker (no HF, m  = 0.135,  s  = 0.142; HF, m  = −0.119,  s  = 0.234;  t (16) = 2.6,  P  < 0.05) and more favourable (no HF, m  = 0.221,  s  = 0.399; HF, m  = 1.047,  s  = 0.387;  t (16) = −6.4,  P  < 0.001) overt stereotypes but produced no significant difference in the strength (no HF, m  = 0.153,  s  = 0.049; HF, m  = 0.187,  s  = 0.066;  t (16) = −1.2, P  = 0.3) or unfavourability (no HF, m  = −1.146, s  = 0.580; HF, m = −1.029, s  = 0.196; t (16) = −0.5, P  = 0.6) of covert stereotypes (Fig. 4b ). Thus, HF training weakens and ameliorates the overt stereotypes but has no clear effect on the covert stereotypes; in other words, it obscures the racist attitudes on the surface, but more subtle forms of racism, such as dialect prejudice, remain unaffected. This finding is underscored by the fact that the discrepancy between overt and covert stereotypes about African Americans is most pronounced for the two examined language models trained with human feedback (GPT3.5 and GPT4; see ‘Covert stereotypes in language models’). Furthermore, this finding again shows that there is a fundamental difference between overt and covert stereotypes in language models, and that mitigating the overt stereotypes does not automatically translate to mitigated covert stereotypes.

To sum up, neither scaling nor training with HF as applied today resolves the dialect prejudice. The fact that these two methods effectively mitigate racial performance disparities and overt racial stereotypes in language models indicates that this form of covert racism constitutes a different problem that is not addressed by current approaches for improving and aligning language models.

Discussion

The key finding of this article is that language models maintain a form of covert racial prejudice against African Americans that is triggered by dialect features alone. In our experiments, we avoided overt mentions of race but drew from the racialized meanings of a stigmatized dialect, and could still find historically racist associations with African Americans. The implicit nature of this prejudice, that is, the fact that it concerns something that is not explicitly expressed in the text, makes it fundamentally different from the overt racial prejudice that has been the focus of previous research. Strikingly, the language models’ covert and overt racial prejudices are often in contradiction with each other, especially for the most recent language models that have been trained with HF (GPT3.5 and GPT4). These two language models obscure the racism, overtly associating African Americans with exclusively positive attributes (such as ‘brilliant’), but our results show that they covertly associate African Americans with exclusively negative attributes (such as ‘lazy’).

We argue that this paradoxical relation between the language models’ covert and overt racial prejudices manifests the inconsistent racial attitudes present in the contemporary society of the United States 8 , 64 . In the Jim Crow era, stereotypes about African Americans were overtly racist, but the normative climate after the civil rights movement made expressing explicitly racist views distasteful. As a result, racism acquired a covert character and continued to exist on a more subtle level. Thus, most white people nowadays report positive attitudes towards African Americans in surveys but perpetuate racial inequalities through their unconscious behaviour, such as their residential choices 65 . It has been shown that negative stereotypes persist, even if they are superficially rejected 66 , 67 . This ambivalence is reflected by the language models we analysed, which are overtly non-racist but covertly exhibit archaic stereotypes about African Americans, showing that they reproduce a colour-blind racist ideology. Crucially, the civil rights movement is generally seen as the period during which racism shifted from overt to covert 68 , 69 , and this is mirrored by our results: all the language models overtly agree the most with human stereotypes from after the civil rights movement, but covertly agree the most with human stereotypes from before the civil rights movement.

Our findings raise the question of how dialect prejudice got into the language models. Language models are pretrained on web-scraped corpora such as WebText 46 , C4 (ref. 48 ) and the Pile 70 , which encode raciolinguistic stereotypes about AAE. A drastic example of this is the use of ‘mock ebonics’ to parody speakers of AAE 71 . Crucially, a growing body of evidence indicates that language models pick up prejudices present in the pretraining corpus 72 , 73 , 74 , 75 , which would explain how they become prejudiced against speakers of AAE, and why they show varying levels of dialect prejudice as a function of the pretraining corpus. However, the web also abounds with overt racism against African Americans 76 , 77 , so we wondered why the language models exhibit much less overt than covert racial prejudice. We argue that the reason for this is that the existence of overt racism is generally known to people 32 , which is not the case for covert racism 69 . Crucially, this also holds for the field of AI. The typical pipeline of training language models includes steps such as data filtering 48 and, more recently, HF training 62 that remove overt racial prejudice. As a result, much of the overt racism on the web does not end up in the language models. However, there are currently no measures in place to curtail covert racial prejudice when training language models. For example, common datasets for HF training 62 , 78 do not include examples that would train the language models to treat speakers of AAE and SAE equally. As a result, the covert racism encoded in the training data can make its way into the language models in an unhindered fashion. It is worth mentioning that the lack of awareness of covert racism also manifests during evaluation, where it is common to test language models for overt racism but not for covert racism 21 , 63 , 79 , 80 .

As well as the representational harms, by which we mean the pernicious representation of AAE speakers, we also found evidence for substantial allocational harms. This refers to the inequitable allocation of resources to AAE speakers 81 (Barocas et al., unpublished observations), and adds to known cases of language technology putting speakers of AAE at a disadvantage by performing worse on AAE 82 , 83 , 84 , 85 , 86 , 87 , 88 , misclassifying AAE as hate speech 81 , 89 , 90 , 91 or treating AAE as incorrect English 83 , 85 , 92 . All the language models are more likely to assign low-prestige jobs to speakers of AAE than to speakers of SAE, more likely to convict speakers of AAE of a crime, and more likely to sentence speakers of AAE to death. Although the details of our tasks are constructed, the findings reveal real and urgent concerns because business and jurisdiction are areas in which AI systems involving language models are currently being developed or deployed. As a consequence, the dialect prejudice we uncovered might already be affecting AI decisions today, for example when a language model is used in application-screening systems to process background information, which might include social-media text. Worryingly, we also observe that larger language models and language models trained with HF exhibit stronger covert, but weaker overt, prejudice. Against the backdrop of continually growing language models and the increasingly widespread adoption of HF training, this has two risks: first, that language models, unbeknownst to developers and users, reach ever-increasing levels of covert prejudice; and second, that developers and users mistake ever-decreasing levels of overt prejudice (the only kind of prejudice currently tested for) for a sign that racism in language models has been solved. There is therefore a realistic possibility that the allocational harms caused by dialect prejudice in language models will increase further in the future, perpetuating the racial discrimination experienced by generations of African Americans.

Methods

Matched guise probing

Matched guise probing examines how strongly a language model associates certain tokens, such as personality traits, with AAE compared with SAE. AAE can be viewed as the treatment condition, whereas SAE functions as the control condition. We start by explaining the basic experimental unit of matched guise probing: measuring how a language model associates certain tokens with an individual text in AAE or SAE. Based on this, we introduce two different settings for matched guise probing (meaning-matched and non-meaning-matched), which are both inspired by the matched guise technique used in sociolinguistics 36 , 37 , 93 , 94 and provide complementary views on the attitudes a language model has about a dialect.

The basic experimental unit of matched guise probing is as follows. Let θ be a language model, t be a text in AAE or SAE, and x be a token of interest, typically a personality trait such as ‘intelligent’. We embed the text in a prompt v , for example v ( t ) = ‘a person who says t tends to be’, and compute P ( x ∣ v ( t );  θ ), which is the probability that θ assigns to x after processing v ( t ). We calculate P ( x ∣ v ( t );  θ ) for equally sized sets T a of AAE texts and T s of SAE texts, comparing various tokens from a set X as possible continuations. It has been shown that P ( x ∣ v ( t );  θ ) can be affected by the precise wording of v , so small modifications of v can have an unpredictable effect on the predictions made by the language model 21 , 95 , 96 . To account for this fact, we consider a set V containing several prompts ( Supplementary Information ). For all experiments, we have provided detailed analyses of variation across prompts in the  Supplementary Information .
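
As an illustration of this basic experimental unit, the following minimal sketch scores a single continuation token after a prompt-embedded text with an open-weight model via the Hugging Face transformers library; the checkpoint (‘gpt2’), the prompt wording and the function name are illustrative assumptions, not the exact prompts in V.

```python
# Minimal sketch of the basic experimental unit, using the public GPT-2 checkpoint
# from Hugging Face transformers. The prompt string and function name are
# illustrative assumptions, not the exact prompts in V.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_probability(text: str, trait: str) -> float:
    """Return P(x | v(t); theta) for a (single-token) trait adjective x."""
    v_t = f"A person who says '{text}' tends to be"   # one possible prompt v
    input_ids = tokenizer(v_t, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]        # next-token logits after v(t)
    probs = torch.softmax(logits, dim=-1)
    x_id = tokenizer(" " + trait).input_ids[0]         # leading space: GPT-2 BPE convention
    return probs[x_id].item()

p_aae = token_probability("I be so happy when I wake up from a bad dream cus they be feelin too real", "intelligent")
p_sae = token_probability("I am so happy when I wake up from a bad dream because they feel too real", "intelligent")
```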

We conducted matched guise probing in two settings. In the first setting, the texts in T a and T s formed pairs expressing the same underlying meaning, that is, the i -th text in T a (for example, ‘I be so happy when I wake up from a bad dream cus they be feelin too real’) matches the i -th text in T s (for example, ‘I am so happy when I wake up from a bad dream because they feel too real’). For this setting, we used the dataset from ref. 87 , which contains 2,019 AAE tweets together with their SAE translations. In the second setting, the texts in T a and T s did not form pairs, so they were independent texts in AAE and SAE. For this setting, we sampled 2,000 AAE and SAE tweets from the dataset in ref. 83 and used tweets strongly aligned with African Americans for AAE and tweets strongly aligned with white people for SAE ( Supplementary Information (‘Analysis of non-meaning-matched texts’), Supplementary Fig. 1 and Supplementary Table 3 ). In the  Supplementary Information , we include examples of AAE and SAE texts for both settings (Supplementary Tables 1 and 2 ). Tweets are well suited for matched guise probing because they are a rich source of dialectal variation 97 , 98 , 99 , especially for AAE 100 , 101 , 102 , but matched guise probing can be applied to any type of text. Although we do not consider it here, matched guise probing can in principle also be applied to speech-based models, with the potential advantage that dialectal variation on the phonetic level could be captured more directly, which would make it possible to study dialect prejudice specific to regional variants of AAE 23 . However, note that a great deal of phonetic variation is reflected orthographically in social-media texts 101 .

It is important to analyse both meaning-matched and non-meaning-matched settings because they capture different aspects of the attitudes a language model has about speakers of AAE. Controlling for the underlying meaning makes it possible to uncover differences in the attitudes of the language model that are solely due to grammatical and lexical features of AAE. However, it is known that various properties other than linguistic features correlate with dialect, such as topics 45 , and these might also influence the attitudes of the language model. Sidelining such properties bears the risk of underestimating the harms that dialect prejudice causes for speakers of AAE in the real world. For example, in a scenario in which a language model is used in the context of automated personnel selection to screen applicants’ social-media posts, the texts of two competing applicants typically differ in content and do not come in pairs expressing the same meaning. The relative advantages of using meaning-matched or non-meaning-matched data for matched guise probing are conceptually similar to the relative advantages of using the same or different speakers for the matched guise technique: more control in the former versus more naturalness in the latter setting 93 , 94 . Because the results obtained in both settings were consistent overall for all experiments, we aggregated them in the main article, but we analysed differences in detail in the  Supplementary Information .

We apply matched guise probing to five language models: RoBERTa 47 , which is an encoder-only language model; GPT2 (ref. 46 ), GPT3.5 (ref. 49 ) and GPT4 (ref. 50 ), which are decoder-only language models; and T5 (ref. 48 ), which is an encoder–decoder language model. For each language model, we examined one or more model versions: GPT2 (base), GPT2 (medium), GPT2 (large), GPT2 (xl), RoBERTa (base), RoBERTa (large), T5 (small), T5 (base), T5 (large), T5 (3b), GPT3.5 (text-davinci-003) and GPT4 (0613). Where we used several model versions per language model (GPT2, RoBERTa and T5), the model versions all had the same architecture and were trained on the same data but differed in their size. Furthermore, we note that GPT3.5 and GPT4 are the only language models examined in this paper that were trained with HF, specifically reinforcement learning from human feedback 103 . When it is clear from the context what is meant, or when the distinction does not matter, we use the term ‘language models’, or sometimes ‘models’, in a more general way that includes individual model versions.
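
For reference, the open-weight model versions listed above correspond to standard Hugging Face Hub checkpoints and could be loaded as sketched below (a convenience snippet of ours; GPT3.5 and GPT4 are instead accessed through the OpenAI API and are not loaded here).

```python
# Convenience sketch (ours) mapping the open-weight model versions above to their
# standard Hugging Face Hub checkpoints; GPT3.5 and GPT4 are accessed through the
# OpenAI API instead and are not loaded here.
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoModelForSeq2SeqLM

checkpoints = {
    "GPT2": ["gpt2", "gpt2-medium", "gpt2-large", "gpt2-xl"],
    "RoBERTa": ["roberta-base", "roberta-large"],
    "T5": ["t5-small", "t5-base", "t5-large", "t5-3b"],
}
loaders = {"GPT2": AutoModelForCausalLM, "RoBERTa": AutoModelForMaskedLM, "T5": AutoModelForSeq2SeqLM}

models = {
    name: loaders[family].from_pretrained(name)
    for family, names in checkpoints.items()
    for name in names
}
```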

Regarding matched guise probing, the exact method for computing P ( x ∣ v ( t );  θ ) varies across language models and is detailed in the  Supplementary Information . For GPT4, for which computing P ( x ∣ v ( t );  θ ) for all tokens of interest was often not possible owing to restrictions imposed by the OpenAI application programming interface (API), we used a slightly modified method for some of the experiments, and this is also discussed in the  Supplementary Information . Similarly, some of the experiments could not be done for all language models because of model-specific constraints, which we highlight below. We note that there was at most one language model per experiment for which this was the case.

Covert-stereotype analysis

In the covert-stereotype analysis, the tokens x whose probabilities are measured for matched guise probing are trait adjectives from the Princeton Trilogy 29 , 30 , 31 , 34 , such as ‘aggressive’, ‘intelligent’ and ‘quiet’. We provide details about these adjectives in the  Supplementary Information . In the Princeton Trilogy, the adjectives are provided to participants in the form of a list, and participants are asked to select from the list the five adjectives that best characterize a given ethnic group, such as African Americans. The studies that we compare in this paper, which are the original Princeton Trilogy studies 29 , 30 , 31 and a more recent reinstallment 34 , all follow this general set-up and observe a gradual improvement of the expressed stereotypes about African Americans over time, but the exact interpretation of this finding is disputed 32 . Here, we used the adjectives from the Princeton Trilogy in the context of matched guise probing.

Specifically, we first computed P ( x ∣ v ( t ); θ ) for all adjectives, for both the AAE texts and the SAE texts. The method for aggregating the probabilities P ( x ∣ v ( t ); θ ) into association scores between an adjective x and AAE varies for the two settings of matched guise probing. Let \({t}_{{\rm{a}}}^{i}\) be the i -th AAE text in T a and \({t}_{{\rm{s}}}^{i}\) be the i -th SAE text in T s . In the meaning-matched setting, in which \({t}_{{\rm{a}}}^{i}\) and \({t}_{{\rm{s}}}^{i}\) express the same meaning, we computed the prompt-level association score for an adjective x as

$$q(x;v,\theta )=\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}\log \frac{P(x\,|\,v({t}_{{\rm{a}}}^{i});\theta )}{P(x\,|\,v({t}_{{\rm{s}}}^{i});\theta )},$$

where n = ∣ T a ∣ = ∣ T s ∣ . Thus, we measure for each pair of AAE and SAE texts the log ratio of the probability assigned to x following the AAE text and the probability assigned to x following the SAE text, and then average the log ratios of the probabilities across all pairs. In the non-meaning-matched setting, we computed the prompt-level association score for an adjective x as

$$q(x;v,\theta )=\log \frac{\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}P(x\,|\,v({t}_{{\rm{a}}}^{i});\theta )}{\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}P(x\,|\,v({t}_{{\rm{s}}}^{i});\theta )},$$

where again n = ∣ T a ∣ = ∣ T s ∣ . In other words, we first compute the average probability assigned to a certain adjective x following all AAE texts and the average probability assigned to x following all SAE texts, and then measure the log ratio of these average probabilities. The interpretation of q ( x ; v , θ ) is identical in both settings; q ( x ; v , θ ) > 0 means that for a certain prompt v , the language model θ associates the adjective x more strongly with AAE than with SAE, and q ( x ; v , θ ) < 0 means that for a certain prompt v , the language model θ associates the adjective x more strongly with SAE than with AAE. In the Supplementary Information (‘Calibration’), we show that q ( x ; v , θ ) is calibrated 104 , meaning that it does not depend on the prior probability that θ assigns to x in a neutral context.
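
The two aggregation schemes can be summarized in a few lines of code, as in the sketch below (ours); p_aae[i] and p_sae[i] are assumed to hold P ( x ∣ v ( t_a^i ); θ ) and P ( x ∣ v ( t_s^i ); θ ) for the i -th AAE and SAE texts.

```python
# Sketch (ours) of the two aggregation schemes for q(x; v, theta). p_aae[i] and
# p_sae[i] are assumed to hold P(x | v(t_a^i); theta) and P(x | v(t_s^i); theta).
import numpy as np

def q_meaning_matched(p_aae: np.ndarray, p_sae: np.ndarray) -> float:
    # average of the per-pair log probability ratios
    return float(np.mean(np.log(p_aae / p_sae)))

def q_non_meaning_matched(p_aae: np.ndarray, p_sae: np.ndarray) -> float:
    # log ratio of the average probabilities
    return float(np.log(p_aae.mean() / p_sae.mean()))
```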

The prompt-level association scores q ( x ;  v ,  θ ) are the basis for further analyses. We start by averaging q ( x ;  v ,  θ ) across model versions, prompts and settings, and this allows us to rank all adjectives according to their overall association with AAE for individual language models (Fig. 2a ). In this and the following adjective analyses, we focus on the five adjectives that exhibit the highest association with AAE, making it possible to consistently compare the language models with the results from the Princeton Trilogy studies, most of which do not report the full ranking of all adjectives. Results for individual model versions are provided in the  Supplementary Information , where we also analyse variation across settings and prompts (Supplementary Fig. 2 and Supplementary Table 4 ).

Next, we wanted to measure the agreement between language models and humans through time. To do so, we considered the five adjectives most strongly associated with African Americans for each study and evaluated how highly these adjectives are ranked by the language models. Specifically, let R l = [ x 1 , …, x ∣ X ∣ ] be the adjective ranking generated by a language model and \({R}_{h}^{5}\) = [ x 1 , …, x 5 ] be the ranking of the top five adjectives generated by the human participants in one of the Princeton Trilogy studies. A typical measure to evaluate how highly the adjectives from \({R}_{h}^{5}\) are ranked within R l is average precision, AP 51 . However, AP does not take the internal ranking of the adjectives in \({R}_{h}^{5}\) into account, which is not ideal for our purposes; for example, AP does not distinguish whether the top-ranked adjective for humans is on the first or on the fifth rank for a language model. To remedy this, we computed the mean average precision, MAP, for different subsets of \({R}_{h}^{5}\) ,

$${\rm{MAP}}=\frac{1}{5}\mathop{\sum }\limits_{i=1}^{5}{\rm{AP}}({R}_{l},{R}_{h}^{i}),$$

where \({R}_{h}^{i}\) denotes the top i adjectives from the human ranking. MAP = 1 if, and only if, the top five adjectives from \({R}_{h}^{5}\) have an exact one-to-one correspondence with the top five adjectives from R l , so, unlike AP, it takes the internal ranking of the adjectives into account. We computed an individual agreement score for each language model and prompt, so we average the q ( x ; v , θ ) association scores for all model versions of a language model (GPT2, for example) and the two settings (meaning-matched and non-meaning-matched) to generate R l . Because the OpenAI API for GPT4 does not give access to the probabilities for all adjectives, we excluded GPT4 from this analysis. Results are presented in Fig. 2b and Extended Data Table 1 . In the Supplementary Information (‘Agreement analysis’), we analyse variation across model versions, settings and prompts (Supplementary Figs. 3 – 5 ).
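A compact sketch of this agreement measure is given below; the function and variable names are ours, and human_top5 is assumed to list the human top-five adjectives in rank order.

```python
# Sketch (ours) of the agreement measure. model_ranking is the full adjective
# ranking R_l produced by a language model; human_top5 is the human top-five
# ranking R_h^5 in rank order.
def average_precision(model_ranking: list, human_top: list) -> float:
    relevant, hits, score = set(human_top), 0, 0.0
    for rank, adjective in enumerate(model_ranking, start=1):
        if adjective in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant)

def mean_average_precision(model_ranking: list, human_top5: list) -> float:
    # average AP over the nested human rankings R_h^1, ..., R_h^5
    return sum(average_precision(model_ranking, human_top5[:i]) for i in range(1, 6)) / 5
```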

To analyse the favourability of the stereotypes about African Americans, we drew from crowd-sourced favourability ratings collected previously 34 for the adjectives from the Princeton Trilogy that range between −2 (‘very unfavourable’, meaning very negative) and 2 (‘very favourable’, meaning very positive). For example, the favourability rating of ‘cruel’ is −1.81 and the favourability rating of ‘brilliant’ is 1.86. We computed the average favourability of the top five adjectives, weighting the favourability ratings of individual adjectives by their association scores with AAE and African Americans. More formally, let R 5 = [ x 1 , …, x 5 ] be the ranking of the top five adjectives generated by either a language model or humans. Furthermore, let f ( x ) be the favourability rating of adjective x as reported in ref. 34 , and let q ( x ) be the overall association score of adjective x with AAE or African Americans that is used to generate R 5 . For the Princeton Trilogy studies, q ( x ) is the percentage of participants who have assigned x to African Americans. For language models, q ( x ) is the average value of q ( x ; v , θ ). We then computed the weighted average favourability, F , of the top five adjectives as

$$F=\frac{{\sum }_{i=1}^{5}q({x}_{i})\,f({x}_{i})}{{\sum }_{i=1}^{5}q({x}_{i})}.$$

As a result of the weighting, the top-ranked adjective contributed more to the average than the second-ranked adjective, and so on. Results are presented in Extended Data Fig. 1 . To check for consistency, we also computed the average favourability of the top five adjectives without weighting, which yields similar results (Supplementary Fig. 6) .
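
The weighted average can be computed as in the short sketch below, where favourability and association are assumed to map adjectives to f ( x ) and q ( x ), respectively (dictionary and function names are ours).

```python
# Sketch (ours) of the weighted favourability F. favourability[x] holds f(x) from
# ref. 34 and association[x] holds q(x); top5 lists the top five adjectives in order.
def weighted_favourability(top5, favourability, association):
    weights = [association[x] for x in top5]
    return sum(w * favourability[x] for w, x in zip(weights, top5)) / sum(weights)
```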

Overt-stereotype analysis

The overt-stereotype analysis closely followed the methodology of the covert-stereotype analysis, with the difference being that instead of providing the language models with AAE and SAE texts, we provided them with overt descriptions of race (specifically, ‘Black’/‘black’ and ‘White’/‘white’). This methodological difference is also reflected by a different set of prompts ( Supplementary Information ). As a result, the experimental set-up is very similar to existing studies on overt racial bias in language models 4 , 7 . All other aspects of the analysis (such as computing adjective association scores) were identical to the analysis for covert stereotypes. This also holds for GPT4, for which we again could not conduct the agreement analysis.

We again present average results for the five language models in the main article. Results broken down for individual model versions are provided in the  Supplementary Information , where we also analyse variation across prompts (Supplementary Fig. 8 and Supplementary Table 5 ).

Employability analysis

The general set-up of the employability analysis was identical to the stereotype analyses: we fed text written in either AAE or SAE, embedded in prompts, into the language models and analysed the probabilities that they assigned to different continuation tokens. However, instead of trait adjectives, we considered occupations for X and also used a different set of prompts ( Supplementary Information ). We created a list of occupations, drawing from previously published lists 6 , 76 , 105 , 106 , 107 . We provided details about these occupations in the  Supplementary Information . We then computed association scores q ( x ;  v ,  θ ) between individual occupations x and AAE, following the same methodology as for computing adjective association scores, and ranked the occupations according to q ( x ;  v ,  θ ) for the language models. To probe the prestige associated with the occupations, we drew from a dataset of occupational prestige 105 that is based on the 2012 US General Social Survey and measures prestige on a scale from 1 (low prestige) to 9 (high prestige). For GPT4, we could not conduct the parts of the analysis that require scores for all occupations.
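
The prestige analysis then amounts to a simple linear regression of occupational prestige on the occupation association scores; the sketch below uses scipy, and the association and prestige values shown are placeholders rather than the study’s actual scores.

```python
# Sketch (ours) of the prestige regression with scipy; the association and prestige
# values below are placeholders, not the study's actual scores.
import numpy as np
from scipy import stats

occupation_scores = {"psychologist": -0.12, "economist": -0.10, "cook": 0.08, "guard": 0.06}  # q(x) with AAE
prestige = {"psychologist": 7.0, "economist": 6.5, "cook": 3.5, "guard": 3.0}                 # GSS-style 1-9 scale

occupations = sorted(occupation_scores)
x = np.array([occupation_scores[o] for o in occupations])
y = np.array([prestige[o] for o in occupations])
fit = stats.linregress(x, y)   # does association with AAE predict occupational prestige?
print(f"beta = {fit.slope:.2f}, R^2 = {fit.rvalue ** 2:.3f}, p = {fit.pvalue:.3f}")
```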

We again present average results for the five language models in the main article. Results for individual model versions are provided in the  Supplementary Information , where we also analyse variation across settings and prompts (Supplementary Tables 6 – 8 ).

Criminality analysis

The set-up of the criminality analysis is different from the previous experiments in that we did not compute aggregate association scores between certain tokens (such as trait adjectives) and AAE but instead asked the language models to make discrete decisions for each AAE and SAE text. More specifically, we simulated trials in which the language models were prompted to use AAE or SAE texts as evidence to make a judicial decision. We then aggregated the judicial decisions into summary statistics.

We conducted two experiments. In the first experiment, the language models were asked to determine whether a person accused of committing an unspecified crime should be acquitted or convicted. The only evidence provided to the language models was a statement made by the defendant, which was an AAE or SAE text. In the second experiment, the language models were asked to determine whether a person who committed first-degree murder should be sentenced to life or death. Similarly to the first (general conviction) experiment, the only evidence provided to the language models was a statement made by the defendant, which was an AAE or SAE text. Note that the AAE and SAE texts were the same texts as in the other experiments and did not come from a judicial context. Rather than testing how well language models could perform the tasks of predicting acquittal or conviction and life penalty or death penalty (an application of AI that we do not support), we were interested to see to what extent the decisions of the language models, made in the absence of any real evidence, were impacted by dialect. Although providing the language models with extra evidence as well as the AAE and SAE texts would have made the experiments more similar to real trials, it would have confounded the effect that dialect has on its own (the key effect of interest), so we did not consider this alternative set-up here. We focused on convictions and death penalties specifically because these are the two areas of the criminal justice system for which racial disparities have been described in the most robust and indisputable way: African Americans represent about 12% of the adult population of the United States, but they represent 33% of inmates 108 and more than 41% of people on death row 109 .

Methodologically, we used prompts that asked the language models to make a judicial decision ( Supplementary Information ). For a specific text, t , which is in AAE or SAE, we computed P ( x ∣ v ( t ); θ ) for the tokens x that correspond to the judicial outcomes of interest (‘acquitted’ or ‘convicted’, and ‘life’ or ‘death’). T5 does not contain the tokens ‘acquitted’ and ‘convicted’ in its vocabulary, so it was excluded from the conviction analysis. Because the language models might assign different prior probabilities to the outcome tokens, we calibrated them using their probabilities in a neutral context following v , meaning without text t 104 . Whichever outcome had the higher calibrated probability was counted as the decision. We aggregated the detrimental decisions (convictions and death penalties) and compared their rates (percentages) between AAE and SAE texts. An alternative approach would have been to generate the judicial decision by sampling from the language models, which would have allowed us to induce the language models to generate justifications of their decisions. However, this approach has three disadvantages: first, encoder-only language models such as RoBERTa do not lend themselves to text generation; second, it would have been necessary to apply jail-breaking for some of the language models, which can have unpredictable effects, especially in the context of socially sensitive tasks; and third, model-generated justifications are frequently not aligned with actual model behaviours 110 .
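
The decision rule for a single text can be sketched as follows; p_outcome and p_prior are assumed to hold the probabilities of the outcome tokens with and without the defendant’s text, and the numbers in the example are made up.

```python
# Sketch (ours) of the calibrated decision rule for one text. p_outcome[x] is
# P(x | v(t); theta); p_prior[x] is the probability of x in the neutral prompt
# without the text t. The numbers in the example are made up.
def judicial_decision(p_outcome: dict, p_prior: dict) -> str:
    calibrated = {x: p_outcome[x] / p_prior[x] for x in p_outcome}
    return max(calibrated, key=calibrated.get)   # outcome with the higher calibrated probability

decision = judicial_decision({"acquitted": 0.012, "convicted": 0.018},
                             {"acquitted": 0.020, "convicted": 0.015})
```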

We again present average results on the level of language models in the main article. Results for individual model versions are provided in the  Supplementary Information , where we also analyse variation across settings and prompts (Supplementary Figs. 9 and 10 and Supplementary Tables 9 – 12 ).

Scaling analysis

In the scaling analysis, we examined whether increasing the model size alleviated the dialect prejudice. Because the content of the covert stereotypes is quite consistent and does not vary substantially between models with different sizes, we instead analysed the strength with which the language models maintain these stereotypes. We split the model versions of all language models into four groups according to their size, using thresholds of 1.5 × 10^8, 3.5 × 10^8 and 1.0 × 10^10 parameters (Extended Data Table 7 ).

To evaluate the familiarity of the models with AAE, we measured their perplexity on the datasets used for the two evaluation settings 83 , 87 . Perplexity is defined as the exponentiated average negative log-likelihood of a sequence of tokens 111 , with lower values indicating higher familiarity. Perplexity requires the language models to assign probabilities to full sequences of tokens, which is only the case for GPT2 and GPT3.5. For RoBERTa and T5, we resorted to pseudo-perplexity 112 as the measure of familiarity. Results are only comparable across language models with the same familiarity measure. We excluded GPT4 from this analysis because it is not possible to compute perplexity using the OpenAI API.
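
For the autoregressive models, the familiarity measure corresponds to standard perplexity, which can be computed as in the sketch below for GPT2 (pseudo-perplexity for RoBERTa and T5 would instead require a masked scoring loop); the checkpoint and function name are illustrative.

```python
# Sketch (ours) of the familiarity measure for an autoregressive model: perplexity of
# GPT-2 on a single text, that is, the exponentiated average negative log-likelihood.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss   # mean token-level cross-entropy
    return float(torch.exp(loss))
```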

To evaluate the stereotype strength, we focused on the stereotypes about African Americans reported in ref. 29 , which the language models’ covert stereotypes agree with most strongly. We split the set of adjectives X into two subsets: the set of stereotypical adjectives in ref. 29 , X s , and the set of non-stereotypical adjectives, X n = X \ X s . For each model with a specific size, we then computed the average value of q ( x ; v , θ ) for all adjectives in X s , which we denote as q s ( θ ), and the average value of q ( x ; v , θ ) for all adjectives in X n , which we denote as q n ( θ ). The stereotype strength of a model θ , or more specifically the strength of the stereotypes about African Americans reported in ref. 29 , can then be computed as

$$\delta (\theta )={q}_{{\rm{s}}}(\theta )-{q}_{{\rm{n}}}(\theta ).$$

A positive value of δ ( θ ) means that the model associates the stereotypical adjectives in X s more strongly with AAE than the non-stereotypical adjectives in X n , whereas a negative value of δ ( θ ) indicates anti-stereotypical associations, meaning that the model associates the non-stereotypical adjectives in X n more strongly with AAE than the stereotypical adjectives in X s . For the overt stereotypes, we used the same split of adjectives into X s and X n because we wanted to directly compare the strength with which models of a certain size endorse the stereotypes overtly as opposed to covertly. All other aspects of the experimental set-up are identical to the main analyses of covert and overt stereotypes.
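
A minimal sketch of this score: given per-adjective association scores and the set of stereotypical adjectives from ref. 29, δ ( θ ) is simply the difference between the two group means (variable names below are ours).

```python
# Sketch (ours) of the stereotype-strength score. q_scores maps each adjective to its
# average association score q(x; v, theta); stereotypical is the set of adjectives
# reported as stereotypes of African Americans in ref. 29.
def stereotype_strength(q_scores: dict, stereotypical: set) -> float:
    q_s = [q for x, q in q_scores.items() if x in stereotypical]
    q_n = [q for x, q in q_scores.items() if x not in stereotypical]
    return sum(q_s) / len(q_s) - sum(q_n) / len(q_n)   # delta(theta) = q_s(theta) - q_n(theta)
```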

HF analysis

We compared GPT3.5 (ref. 49 ; text-davinci-003) with GPT3 (ref. 63 ; davinci), its predecessor language model that was trained without HF. Similarly to other studies that compare these two language models 113 , this set-up allowed us to examine the effects of HF training as done for GPT3.5 in isolation. We compared the two language models in terms of favourability and stereotype strength. For favourability, we followed the methodology we used for the overt-stereotype analysis and evaluated the average weighted favourability of the top five adjectives associated with AAE. For stereotype strength, we followed the methodology we used for the scaling analysis and evaluated the average strength of the stereotypes as reported in ref.  29 .

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

All the datasets used in this study are publicly available. The dataset released as ref. 87 can be found at https://aclanthology.org/2020.emnlp-main.473/ . The dataset released as ref. 83 can be found at http://slanglab.cs.umass.edu/TwitterAAE/ . The human stereotype scores used for evaluation can be found in the published articles of the Princeton Trilogy studies 29 , 30 , 31 , 34 . The most recent of these articles 34 also contains the human favourability scores for the trait adjectives. The dataset of occupational prestige that we used for the employability analysis can be found in the corresponding paper 105 . The Brown Corpus 114 , which we used for the  Supplementary Information (‘Feature analysis’), can be found at http://www.nltk.org/nltk_data/ . The dataset containing the parallel AAE, Appalachian English and Indian English texts 115 , which we used in the  Supplementary Information (‘Alternative explanations’), can be found at https://huggingface.co/collections/SALT-NLP/value-nlp-666b60a7f76c14551bda4f52 .

Code availability

Our code is written in Python and draws on the Python packages openai and transformers for language-model probing, as well as numpy, pandas, scipy and statsmodels for data analysis. The feature analysis described in the  Supplementary Information also uses the VALUE Python library 88 . Our code is publicly available on GitHub at https://github.com/valentinhofmann/dialect-prejudice .

References

Zhao, W. et al. WildChat: 1M ChatGPT interaction logs in the wild. In Proc. Twelfth International Conference on Learning Representations (OpenReview.net, 2024).

Zheng, L. et al. LMSYS-Chat-1M: a large-scale real-world LLM conversation dataset. In Proc. Twelfth International Conference on Learning Representations (OpenReview.net, 2024).

Gaebler, J. D., Goel, S., Huq, A. & Tambe, P. Auditing the use of language models to guide hiring decisions. Preprint at https://arxiv.org/abs/2404.03086 (2024).

Sheng, E., Chang, K.-W., Natarajan, P. & Peng, N. The woman worked as a babysitter: on biases in language generation. In Proc. 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing (eds Inui. K. et al.) 3407–3412 (Association for Computational Linguistics, 2019).

Nangia, N., Vania, C., Bhalerao, R. & Bowman, S. R. CrowS-Pairs: a challenge dataset for measuring social biases in masked language models. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (eds Webber, B. et al.) 1953–1967 (Association for Computational Linguistics, 2020).

Nadeem, M., Bethke, A. & Reddy, S. StereoSet: measuring stereotypical bias in pretrained language models. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and 11th International Joint Conference on Natural Language Processing (eds Zong, C. et al.) 5356–5371 (Association for Computational Linguistics, 2021).

Cheng, M., Durmus, E. & Jurafsky, D. Marked personas: using natural language prompts to measure stereotypes in language models. In Proc. 61st Annual Meeting of the Association for Computational Linguistics (eds Rogers, A. et al.) 1504–1532 (Association for Computational Linguistics, 2023).

Bonilla-Silva, E. Racism without Racists: Color-Blind Racism and the Persistence of Racial Inequality in America 4th edn (Rowman & Littlefield, 2014).

Golash-Boza, T. A critical and comprehensive sociological theory of race and racism. Sociol. Race Ethn. 2 , 129–141 (2016).

Kasneci, E. et al. ChatGPT for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ. 103 , 102274 (2023).

Nay, J. J. et al. Large language models as tax attorneys: a case study in legal capabilities emergence. Philos. Trans. R. Soc. A 382 , 20230159 (2024).

Jiang, L. Y. et al. Health system-scale language models are all-purpose prediction engines. Nature 619 , 357–362 (2023).

Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V. & Kalai, A. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Adv. Neural Inf. Process. Syst. 30 , 4356–4364 (2016).

Caliskan, A., Bryson, J. J. & Narayanan, A. Semantics derived automatically from language corpora contain human-like biases. Science 356 , 183–186 (2017).

Basta, C., Costa-jussà, M. R. & Casas, N. Evaluating the underlying gender bias in contextualized word embeddings. In Proc. First Workshop on Gender Bias in Natural Language Processing (eds Costa-jussà, M. R. et al.) 33–39 (Association for Computational Linguistics, 2019).

Kurita, K., Vyas, N., Pareek, A., Black, A. W. & Tsvetkov, Y. Measuring bias in contextualized word representations. In Proc. First Workshop on Gender Bias in Natural Language Processing (eds Costa-jussà, M. R. et al.) 166–172 (Association for Computational Linguistics, 2019).

Abid, A., Farooqi, M. & Zou, J. Persistent anti-muslim bias in large language models. In Proc. 2021 AAAI/ACM Conference on AI, Ethics, and Society (eds Fourcade, M. et al.) 298–306 (Association for Computing Machinery, 2021).

Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the dangers of stochastic parrots: can language models be too big? In Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency 610–623 (Association for Computing Machinery, 2021).

Li, L. & Bamman, D. Gender and representation bias in GPT-3 generated stories. In Proc. Third Workshop on Narrative Understanding (eds Akoury, N. et al.) 48–55 (Association for Computational Linguistics, 2021).

Tamkin, A. et al. Evaluating and mitigating discrimination in language model decisions. Preprint at https://arxiv.org/abs/2312.03689 (2023).

Rae, J. W. et al. Scaling language models: methods, analysis & insights from training Gopher. Preprint at https://arxiv.org/abs/2112.11446 (2021).

Green, L. J. African American English: A Linguistic Introduction (Cambridge Univ. Press, 2002).

King, S. From African American Vernacular English to African American Language: rethinking the study of race and language in African Americans’ speech. Annu. Rev. Linguist. 6 , 285–300 (2020).

Purnell, T., Idsardi, W. & Baugh, J. Perceptual and phonetic experiments on American English dialect identification. J. Lang. Soc. Psychol. 18 , 10–30 (1999).

Massey, D. S. & Lundy, G. Use of Black English and racial discrimination in urban housing markets: new methods and findings. Urban Aff. Rev. 36 , 452–469 (2001).

Dunbar, A., King, S. & Vaughn, C. Dialect on trial: an experimental examination of raciolinguistic ideologies and character judgments. Race Justice https://doi.org/10.1177/21533687241258772 (2024).

Rickford, J. R. & King, S. Language and linguistics on trial: Hearing Rachel Jeantel (and other vernacular speakers) in the courtroom and beyond. Language 92 , 948–988 (2016).

Grogger, J. Speech patterns and racial wage inequality. J. Hum. Resour. 46 , 1–25 (2011).

Katz, D. & Braly, K. Racial stereotypes of one hundred college students. J. Abnorm. Soc. Psychol. 28 , 280–290 (1933).

Gilbert, G. M. Stereotype persistence and change among college students. J. Abnorm. Soc. Psychol. 46 , 245–254 (1951).

Karlins, M., Coffman, T. L. & Walters, G. On the fading of social stereotypes: studies in three generations of college students. J. Pers. Soc. Psychol. 13 , 1–16 (1969).

Devine, P. G. & Elliot, A. J. Are racial stereotypes really fading? The Princeton Trilogy revisited. Pers. Soc. Psychol. Bull. 21 , 1139–1150 (1995).

Madon, S. et al. Ethnic and national stereotypes: the Princeton Trilogy revisited and revised. Pers. Soc. Psychol. Bull. 27 , 996–1010 (2001).

Bergsieker, H. B., Leslie, L. M., Constantine, V. S. & Fiske, S. T. Stereotyping by omission: eliminate the negative, accentuate the positive. J. Pers. Soc. Psychol. 102 , 1214–1238 (2012).

Ghavami, N. & Peplau, L. A. An intersectional analysis of gender and ethnic stereotypes: testing three hypotheses. Psychol. Women Q. 37 , 113–127 (2013).

Lambert, W. E., Hodgson, R. C., Gardner, R. C. & Fillenbaum, S. Evaluational reactions to spoken languages. J. Abnorm. Soc. Psychol. 60 , 44–51 (1960).

Ball, P. Stereotypes of Anglo-Saxon and non-Anglo-Saxon accents: some exploratory Australian studies with the matched guise technique. Lang. Sci. 5 , 163–183 (1983).

Thomas, E. R. & Reaser, J. Delimiting perceptual cues used for the ethnic labeling of African American and European American voices. J. Socioling. 8 , 54–87 (2004).

Atkins, C. P. Do employment recruiters discriminate on the basis of nonstandard dialect? J. Employ. Couns. 30 , 108–118 (1993).

Payne, K., Downing, J. & Fleming, J. C. Speaking Ebonics in a professional context: the role of ethos/source credibility and perceived sociability of the speaker. J. Tech. Writ. Commun. 30 , 367–383 (2000).

Rodriguez, J. I., Cargile, A. C. & Rich, M. D. Reactions to African-American vernacular English: do more phonological features matter? West. J. Black Stud. 28 , 407–414 (2004).

Billings, A. C. Beyond the Ebonics debate: attitudes about Black and standard American English. J. Black Stud. 36 , 68–81 (2005).

Kurinec, C. A. & Weaver, C. III “Sounding Black”: speech stereotypicality activates racial stereotypes and expectations about appearance. Front. Psychol. 12 , 785283 (2021).

Rosa, J. & Flores, N. Unsettling race and language: toward a raciolinguistic perspective. Lang. Soc. 46 , 621–647 (2017).

Salehi, B., Hovy, D., Hovy, E. & Søgaard, A. Huntsville, hospitals, and hockey teams: names can reveal your location. In Proc. 3rd Workshop on Noisy User-generated Text (eds Derczynski, L. et al.) 116–121 (Association for Computational Linguistics, 2017).

Radford, A. et al. Language models are unsupervised multitask learners. OpenAI https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf (2019).

Liu, Y. et al. RoBERTa: a robustly optimized BERT pretraining approach. Preprint at https://arxiv.org/abs/1907.11692 (2019).

Raffel, C. et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 21 , 1–67 (2020).

Ouyang, L. et al. Training language models to follow instructions with human feedback. In Proc. 36th Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 27730–27744 (NeurIPS, 2022).

OpenAI et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

Zhang, E. & Zhang, Y. Average precision. In Encyclopedia of Database Systems (eds Liu, L. & Özsu, M. T.) 192–193 (Springer, 2009).

Black, J. S. & van Esch, P. AI-enabled recruiting: what is it and how should a manager use it? Bus. Horiz. 63 , 215–226 (2020).

Hunkenschroer, A. L. & Luetge, C. Ethics of AI-enabled recruiting and selection: a review and research agenda. J. Bus. Ethics 178 , 977–1007 (2022).

Upadhyay, A. K. & Khandelwal, K. Applying artificial intelligence: implications for recruitment. Strateg. HR Rev. 17 , 255–258 (2018).

Tippins, N. T., Oswald, F. L. & McPhail, S. M. Scientific, legal, and ethical concerns about AI-based personnel selection tools: a call to action. Pers. Assess. Decis. 7 , 1 (2021).

Aletras, N., Tsarapatsanis, D., Preoţiuc-Pietro, D. & Lampos, V. Predicting judicial decisions of the European Court of Human Rights: a natural language processing perspective. PeerJ Comput. Sci. 2 , e93 (2016).

Surden, H. Artificial intelligence and law: an overview. Ga State Univ. Law Rev. 35 , 1305–1337 (2019).

Medvedeva, M., Vols, M. & Wieling, M. Using machine learning to predict decisions of the European Court of Human Rights. Artif. Intell. Law 28 , 237–266 (2020).

Weidinger, L. et al. Taxonomy of risks posed by language models. In Proc. 2022 ACM Conference on Fairness, Accountability, and Transparency 214–229 (Association for Computing Machinery, 2022).

Czopp, A. M. & Monteith, M. J. Thinking well of African Americans: measuring complimentary stereotypes and negative prejudice. Basic Appl. Soc. Psychol. 28 , 233–250 (2006).

Chowdhery, A. et al. PaLM: scaling language modeling with pathways. J. Mach. Learn. Res. 24 , 11324–11436 (2023).

Bai, Y. et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. Preprint at https://arxiv.org/abs/2204.05862 (2022).

Brown, T. B. et al. Language models are few-shot learners. In  Proc. 34th International Conference on Neural Information Processing Systems  (eds Larochelle, H. et al.) 1877–1901 (NeurIPS, 2020).

Dovidio, J. F. & Gaertner, S. L. Aversive racism. Adv. Exp. Soc. Psychol. 36 , 1–52 (2004).

Schuman, H., Steeh, C., Bobo, L. D. & Krysan, M. (eds) Racial Attitudes in America: Trends and Interpretations (Harvard Univ. Press, 1998).

Crosby, F., Bromley, S. & Saxe, L. Recent unobtrusive studies of Black and White discrimination and prejudice: a literature review. Psychol. Bull. 87 , 546–563 (1980).

Terkel, S. Race: How Blacks and Whites Think and Feel about the American Obsession (New Press, 1992).

Jackman, M. R. & Muha, M. J. Education and intergroup attitudes: moral enlightenment, superficial democratic commitment, or ideological refinement? Am. Sociol. Rev. 49 , 751–769 (1984).

Bonilla-Silva, E. The New Racism: Racial Structure in the United States, 1960s–1990s. In Race, Ethnicity, and Nationality in the United States: Toward the Twenty-First Century 1st edn (ed. Wong, P.) Ch. 4 (Westview Press, 1999).

Gao, L. et al. The Pile: an 800GB dataset of diverse text for language modeling. Preprint at https://arxiv.org/abs/2101.00027 (2021).

Ronkin, M. & Karn, H. E. Mock Ebonics: linguistic racism in parodies of Ebonics on the internet. J. Socioling. 3 , 360–380 (1999).

Dodge, J. et al. Documenting large webtext corpora: a case study on the Colossal Clean Crawled Corpus. In Proc. 2021 Conference on Empirical Methods in Natural Language Processing (eds Moens, M.-F. et al.) 1286–1305 (Association for Computational Linguistics, 2021).

Steed, R., Panda, S., Kobren, A. & Wick, M. Upstream mitigation is not all you need: testing the bias transfer hypothesis in pre-trained language models. In Proc. 60th Annual Meeting of the Association for Computational Linguistics (eds Muresan, S. et al.) 3524–3542 (Association for Computational Linguistics, 2022).

Feng, S., Park, C. Y., Liu, Y. & Tsvetkov, Y. From pretraining data to language models to downstream tasks: tracking the trails of political biases leading to unfair NLP models. In Proc. 61st Annual Meeting of the Association for Computational Linguistics (eds Rogers, A. et al.) 11737–11762 (Association for Computational Linguistics, 2023).

Köksal, A. et al. Language-agnostic bias detection in language models with bias probing. In Findings of the Association for Computational Linguistics: EMNLP 2023 (eds Bouamor, H. et al.) 12735–12747 (Association for Computational Linguistics, 2023).

Garg, N., Schiebinger, L., Jurafsky, D. & Zou, J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Natl Acad. Sci. USA 115 , E3635–E3644 (2018).

Ferrer, X., van Nuenen, T., Such, J. M. & Criado, N. Discovering and categorising language biases in Reddit. In Proc. Fifteenth International AAAI Conference on Web and Social Media (eds Budak, C. et al.) 140–151 (Association for the Advancement of Artificial Intelligence, 2021).

Ethayarajh, K., Choi, Y. & Swayamdipta, S. Understanding dataset difficulty with V-usable information. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 5988–6008 (Proceedings of Machine Learning Research, 2022).

Hoffmann, J. et al. Training compute-optimal large language models. Preprint at https://arxiv.org/abs/2203.15556 (2022).

Liang, P. et al. Holistic evaluation of language models. Transactions on Machine Learning Research https://openreview.net/forum?id=iO4LZibEqW (2023).

Blodgett, S. L., Barocas, S., Daumé III, H. & Wallach, H. Language (technology) is power: A critical survey of “bias” in NLP. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (eds Jurafsky, D. et al.) 5454–5476 (Association for Computational Linguistics, 2020).

Jørgensen, A., Hovy, D. & Søgaard, A. Challenges of studying and processing dialects in social media. In Proc. Workshop on Noisy User-generated Text (eds Xu, W. et al.) 9–18 (Association for Computational Linguistics, 2015).

Blodgett, S. L., Green, L. & O’Connor, B. Demographic dialectal variation in social media: a case study of African-American English. In Proc. 2016 Conference on Empirical Methods in Natural Language Processing (eds Su, J. et al.) 1119–1130 (Association for Computational Linguistics, 2016).


Acknowledgements

V.H. was funded by the German Academic Scholarship Foundation. P.R.K. was funded in part by the Open Phil AI Fellowship. This work was also funded by the Hoffman-Yee Research Grants programme and the Stanford Institute for Human-Centered Artificial Intelligence. We thank A. Köksal, D. Hovy, K. Gligorić, M. Harrington, M. Casillas, M. Cheng and P. Röttger for feedback on an earlier version of the article.

Author information

Authors and affiliations

Allen Institute for AI, Seattle, WA, USA: Valentin Hofmann

University of Oxford, Oxford, UK: Valentin Hofmann

LMU Munich, Munich, Germany: Valentin Hofmann

Stanford University, Stanford, CA, USA: Pratyusha Ria Kalluri & Dan Jurafsky

The University of Chicago, Chicago, IL, USA: Sharese King


Contributions

V.H., P.R.K., D.J. and S.K. designed the research. V.H. performed the research and analysed the data. V.H., P.R.K., D.J. and S.K. wrote the paper.

Corresponding authors

Correspondence to Valentin Hofmann or Sharese King.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Rodney Coates and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Weighted average favourability of top stereotypes about African Americans in humans and top overt as well as covert stereotypes about African Americans in language models (LMs).

The overt stereotypes are more favourable than the reported human stereotypes, except for GPT2. The covert stereotypes are substantially less favourable than the least favourable reported human stereotypes from 1933. Results without weighting, which are very similar, are provided in Supplementary Fig. 6.
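To make the quantity concrete, here is a minimal Python sketch of a weighted average favourability score. It is not the authors' pipeline; the adjectives, association weights and favourability ratings are hypothetical placeholders.

# Minimal sketch (not the authors' code): weighted average favourability of a
# set of stereotype adjectives. The weights might reflect how strongly a model
# associates each adjective with the group; the favourability ratings might
# come from human annotation. All values below are hypothetical.
stereotypes = {
    # adjective: (association_weight, favourability_rating)
    "musical":    (0.9,  1.2),
    "loyal":      (0.7,  1.5),
    "aggressive": (0.8, -1.8),
    "lazy":       (0.6, -1.6),
}

def weighted_average_favourability(items):
    total_weight = sum(weight for weight, _ in items.values())
    weighted_sum = sum(weight * fav for weight, fav in items.values())
    return weighted_sum / total_weight

print(f"Weighted average favourability: {weighted_average_favourability(stereotypes):+.2f}")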

Extended Data Fig. 2 Prestige of occupations associated with AAE (positive values) versus SAE (negative values), for individual language models.

The shaded areas show 95% confidence bands around the regression lines. The association with AAE versus SAE is negatively correlated with occupational prestige, for all language models. We cannot conduct this analysis with GPT4 since the OpenAI API does not give access to the probabilities for all occupations.
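As a rough illustration of the association-versus-prestige analysis summarised here, the sketch below fits a simple linear regression of a hypothetical per-occupation AAE-versus-SAE association score on a hypothetical prestige rating. All occupation scores and ratings are invented, and scipy's linregress merely stands in for the fitting procedure; a confidence band like the one shown in the figure could be derived from the fit's standard errors.

# Minimal sketch (not the authors' code): correlate a hypothetical AAE-vs-SAE
# association score per occupation with a hypothetical prestige rating.
# Positive association values: occupation more strongly linked to AAE;
# higher prestige values: more prestigious occupation.
import numpy as np
from scipy import stats

aae_association = np.array([0.8, 0.5, 0.3, -0.6, -0.7])  # hypothetical scores
prestige = np.array([28.0, 35.0, 40.0, 72.0, 69.0])      # hypothetical ratings

fit = stats.linregress(prestige, aae_association)
print(f"slope={fit.slope:.3f}, r={fit.rvalue:.3f}, p={fit.pvalue:.3f}")
# A negative slope and r would indicate that occupations more strongly
# associated with AAE tend to carry lower prestige, the pattern reported here.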

Supplementary information

  • Supplementary Information
  • Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Hofmann, V., Kalluri, P.R., Jurafsky, D. et al. AI generates covertly racist decisions about people based on their dialect. Nature (2024). https://doi.org/10.1038/s41586-024-07856-5


Received: 08 February 2024

Accepted: 19 July 2024

Published: 28 August 2024

DOI: https://doi.org/10.1038/s41586-024-07856-5
