Ragnhild Holgaard
1 Center for Human Resources, Copenhagen Academy for Medical Education and Simulation (CAMES), Capital Region of Denmark, Herlev Hospital, 25th floor, Herlev Ringvej 75, Herlev, 2370 Denmark
Frederik Zingenberg, Peter Dieckmann
2 Department of Public Health, Copenhagen University, Øster Farimagsgade 5, Copenhagen, 1353 Denmark
3 Department of Quality and Health Technology, University in Stavanger, Kjell Arholms Gate 43, Stavanger, 4021 Norway
The data used in this study are not publicly available due to their sensitivity. It will not be possible to obtain the raw data upon request, as neither the participants nor the institution have agreed to have the raw data shared. Questions regarding the data or requests to see the data can be addressed to the corresponding author.
How healthcare professionals understand and use concepts of social and cognitive capabilities will influence their behaviour and their understanding of others’ behaviour. Differing understandings of concepts might lead to healthcare professionals not acting in accordance with other healthcare professionals’ expectations. Therefore, part of the problem of errors and adverse incidents involving social and cognitive capabilities might be due to varying understandings of concepts among different healthcare professionals. This study aimed to examine the variations in how educators at the Copenhagen Academy for Medical Education and Simulation talk about social and cognitive capabilities.
The study was conducted using semi-structured interviews and directed content analysis. The codes for the analysis process were derived from existing non-technical skills models and used to show variations in how the participants talk about the same concepts.
Educators with backgrounds as nurses and as physicians talked differently about leadership and decision-making : nurses paid greater attention to group dynamics and external factors when describing both categories, whereas physicians focused on their individual efforts.
We found patterned differences in how the participants described leadership and decision-making that may be related to participants’ professional training/background. As it can create misunderstandings and unsafe situations if nurses and physicians disagree on the meaning of leadership and decision-making (without necessarily recognising this difference), it could be beneficial to educate healthcare professionals to be aware of the specificity of their own concepts, and to communicate what exactly they mean by using a particular concept, e.g. “I want you to coordinate tasks” instead of “I want better leadership”.
The online version contains supplementary material available at 10.1186/s12909-024-05682-x.
Social and cognitive capabilities are essential for safe and proficient patient care and treatment [ 1 – 3 ]. Traditionally, these capabilities have been called ‘non-technical skills’, but concern has been raised that the term is inadequate as it downgrades the value of the capabilities and defines them by what they are not instead of what they are [ 4 ]. Therefore, we have adopted the terminology social and cognitive capabilities . Social and cognitive capabilities include the ability to lead, communicate, make decisions, form an understanding of the situation, or work together in a team [ 5 – 11 ]. The social and cognitive capabilities have earlier been described in an array of models specific to certain medical fields (under the label of non-technical skills models or NTS model) [ 6 , 8 – 11 ]. Each model contains four categories, several elements under each category, and behavioural markers for each element, which together explain vital social and cognitive capabilities within that field. The models have been used for teaching and assessment purposes [ 12 , 13 ].
To the extent that social and cognitive capabilities can be analytically separated from so-called technical skills, studies have shown that issues related to social and cognitive capabilities contribute to up to 2/3 of errors and adverse incidents in hospitals [ 14 ]. Early work on this was done at the end of the 1970s and was intensified with the landmark report “To Err is Human” by the Institute of Medicine in the USA [ 15 , 16 ]. Despite efforts to improve safety, unsafe events in hospitals are still a significant problem [ 17 , 18 ].
Many factors could contribute to an ongoing issue regarding the role of social and cognitive capabilities in patient safety. Some studies seem to indicate that different groups of healthcare professionals might understand and apply certain social and cognitive capabilities – like teamwork, decision-making, and leadership – differently [ 19 , 20 ]. If different groups of healthcare professionals act differently in relation to social and cognitive capabilities, such as leadership and decision-making, it could mean that their concepts behind the capabilities differ. Souba [ 21 ] argued that how we understand and think about a certain word, for example leadership , will influence how we act, how we speak, and what attitudes we have. Following that reasoning, we argue that healthcare professionals’ internal understanding of the terms behind social and cognitive capabilities could be coupled with the way they enact particular social and cognitive capabilities. Such internal understandings of words or terms might be referred to as concepts [ 22 , 23 ], mental models [ 24 ], prototypes [ 25 ], and schemata [ 26 ]. Here, we call such internal understandings concepts . Every experience will potentially work to adjust an individual’s concepts. Some experiences can be designed specifically to form or adjust concepts. This is the case for experiences gained during education. In healthcare education, the concepts of leadership and decision-making will come up in many courses. The overt curriculum in these courses will obviously work to form and adjust concepts, but the hidden curriculum of the courses will also be influential [ 27 , 28 ].
The hidden curriculum is a term used to describe cultural and social norms taught implicitly to students through experiences [ 27 , 29 ]. This can either be their experiences in clinical practice or their experiences in teaching situations, such as how the educator uses and describes concepts. Some research suggests that the hidden curriculum is a more powerful determinant of later behaviour than the formal curriculum [ 28 , 30 ]. Any education situation that healthcare professionals meet during their lifelong education will have an overt curriculum. This could be a plan of activities to teach them something about leadership. The experience will, however, also have a hidden curriculum, such as the valence ascribed to different leadership styles based on the educator’s personal preferences [ 27 , 31 ]. Since education, including both formal and hidden curricula, works to shape concepts, educators play a key role in concept formation among healthcare professionals.
Based on Souba’s [ 21 ] observation that our understanding of a concept will influence how we act, we wish to examine variations in how educators at Copenhagen Academy for Medical Education and Simulation (CAMES) talk about social and cognitive capabilities. These potential differences are of interest as they will form part of a hidden curriculum within the courses taught by the educators [ 27 , 31 ]. As such, variations will potentially result in different learning outcomes for the learners, which in this case means different understandings of the concepts behind social and cognitive capabilities. Different understandings of the concepts will mean different actions [ 21 , 32 ], which could be part of the explanation for the differences we see in clinical practice as well as the cause of misunderstandings and unsafe situations. It is within this line of thought that we investigate differences in the ways that nurses and physicians speak of leadership and decision-making.
This qualitative interview study was carried out from February to August 2019. The aim of the study was to examine variations in how educators at CAMES talk about social and cognitive capabilities. The study was conducted using semi-structured interviews [ 33 ] and directed content analysis [ 34 ]. Codes for the analysis process were derived from existing NTS models and applied to show variations in how the participants talk about the same concepts. The Regional Ethics Committee of the Capital Region of Denmark waived the ethical review of the study (H-19,023,177). Participants received written and oral information about the purpose of the study, and all participants signed a written consent form before the interview took place. Data is reported using the COREQ checklist [ 35 ].
FZ, an organisational psychologist, conducted the interviews. At the time of the interviews, FZ was a relatively new colleague of the interviewees and new to the healthcare system, which allowed him to be unbiased by previous healthcare experiences when interviewing and asking clarifying questions. FZ worked alongside the participants but was not directly involved in their work. FZ has conducted interviews in both clinical and work and organisational settings. RH and BB coded the interviews, and RH conducted the preliminary analysis. RH is a cognitive psychologist who has only recently started working within the healthcare system. This was utilised as a strength, as it made it easier for her to notice and be curious about tacit information and patterns in the interviews during the coding and the analysis. RH has worked with tacit information on earlier projects within other fields. BB is an anthropologist (PhD) with qualitative research experience within the healthcare system. Her experience and attention to the situatedness of discourse sharpened the analysis and placed it within current debate in health professions education research. PD is a work and organisational psychologist (PhD) who has worked within the healthcare system for about 22 years. His extensive knowledge was used to place our findings in a theoretical and practical frame within the field. The mix of researchers new to the field and the experience contributed by others allowed us not only to see new perspectives, but also to situate them within current debates about social and cognitive capabilities [ 4 ].
The study is based on interviews with healthcare educators from CAMES, an influential health professional education institution in Denmark and internationally. CAMES has approximately 10,000 course participants per year, about 110 facilitators, and about 20 course directors. In addition, almost 150 simulation facilitators are trained per year in train-the-trainer courses at CAMES. Taken together, educators at CAMES have the potential to affect conceptualizations in many healthcare professionals in Denmark and thereby potentially influence their work in the Danish healthcare system.
The study participants are 11 course directors at CAMES. Course directors at CAMES organise courses and are responsible for content, programme, materials, and externally recruited educators related to the courses. The course directors involved in this study were approached based on their engagement in courses teaching social and cognitive capabilities. We decided to interview course directors, since their articulation of social and cognitive capabilities will likely influence how those capabilities are taught in their courses (e.g., through selection of educators, content, and materials). In this way, course directors are placed in a position where their knowledge and articulation potentially influence course participants’ learning and subsequent clinical practice. The participating course directors had at least one year of experience with teaching cognitive and social capabilities in simulation-based settings. All of them had a clinical background as nurses (six) or physicians (five). Ten course directors were women. All invited participants agreed to take part in the study and did so voluntarily. No one dropped out of the study.
Data was produced through semi-structured interviews [ 33 ]. We chose this interview method since we were interested in studying how course directors describe social and cognitive capabilities without their descriptions being influenced by a specific teaching situation, where many social factors might influence how a certain concept is spoken about. We asked course directors to talk about their teaching practice and the concepts they teach in their courses. With this prompting, their descriptions of the concepts would likely reflect that context. The study was presented as part of ongoing efforts to develop teaching quality at CAMES.
Each interview focused on investigating the course director’s articulation of the central categories in a model of social and cognitive capabilities (NTS model) of their own choice (see interview guide in Appendix 1 ). We interviewed the course directors based on the following models: ANTSdk, N-ANTS, and NOTSSdk [ 8 , 10 , 36 ]. The categories and elements (marked by bullets) in each model are shown in Table 1 . Each interviewee was asked open-ended questions to describe each category in the chosen model, and the categories were later used as initial codes for the directed content analysis inspired by the analysis process as described by Hsieh and Shannon [ 34 ].
The three NTS models used in the interviews [ 8 , 10 , 36 ]

ANTSdk
• Gathering information • Recognising and understanding information • Predicting and thinking ahead • Exhibit self-insight
• Identifying options • Choosing, communicating, and implementing decisions • Re-evaluate decisions
• Exchanging information • Coordinating activities • Assessing capabilities • Supporting others
• Planning and preparing • Prioritising • Identifying and utilising resources • Using authority and assertiveness • Setting and maintaining standards

N-ANTS
• Gathering information • Recognising and understanding information • Anticipating and thinking ahead
• Identifying options • Assessing and weighing up options • Reassessing decisions
• Exchanging information • Assessing roles and competencies • Coordinating activities • Displaying authority and strength • Exhibiting team behaviour and support for team members
• Planning • Setting priorities • Making use of resources • Maintaining standards

NOTSSdk
• Gathering information • Understanding information • Predicting and thinking ahead • Monitoring own performance
• Considering options • Selecting and communicating decisions • Implementing and assessing decisions
• Exchanging information • Establishing a shared understanding • Coordinating activities
• Setting and maintaining standards • Supporting others • Coping with pressure

Each line within a model lists the elements (marked by bullets) belonging to one category of that model
The interviews were carried out at the participants’ workplace, CAMES Herlev Hospital, in an interview room separate from the participants’ workstations and colleagues. In all interviews, only the participant and the interviewer were present in the room. The interview guide was formulated and validated by the authors as well as pilot tested with the first interview. Each interview, lasting between 20 and 30 min, was recorded and afterwards transcribed verbatim. The interviewer took notes during each interview. Interviews, notes, and transcriptions were produced in Danish. Only extracts presented in this article were translated into English. Once completed, the transcriptions were returned to the participants for review, so they could suggest corrections or comments. As no comments or corrections were provided on the transcriptions, and since the authors agreed that data saturation had been reached, it was decided not to carry out any repeat interviews.
Data was analysed using directed content analysis [ 34 ]. We chose this method because we wished to extend our current understanding of well-used concepts [ 34 ]. We used the NTS models as a framework (Table 1 ). Analysis was carried out using the NVivo software ( www.lumivero.com ) based on the following steps: analysis prior to and during data production, workshop with interviewees, coding of transcripts, looking at patterns, recoding of N-ANTS interviews, and consensus.
This being a qualitative study, data analysis started already with our choice of theoretical framework (content analysis), which directed our method (semi-structured interview) and the categories used in the interviews. The data production process has been further described in the section above.
After data production, the initial impressions from the interviews were discussed with the interviewees in a workshop led by FZ. The purpose of the discussion was to bring to light new or missed perspectives from the participants. The workshop was valuable for the participants, but it did not uncover new perspectives for the analysis.
Inspired by the analysis process described by Hsieh and Shannon [ 34 ], RH and BB coded the transcriptions using the elements from the NTS model on which the specific interview was based as initial codes. A coding unit could be one or more sentences or sentence fragments. A coding unit was coded if it used the specific words of an element from the relevant NTS model, or if its meaning was judged to fit the meaning of the element. Coding units could be coded to more than one element. In some cases, an interviewee talked about a category from the chosen model without it belonging to a clear element within that model. In these cases, the unit was coded as the category from the model instead of a concrete element. Furthermore, we coded each coding unit based on whether the interviewee was talking about the individual, the team, the organisation, or society. RH and BB discussed differences in their coding, and a shared understanding was reached in each case. Immediately after the first round of coding, we coded all interviews again to check the consistency of the coding. An example of a coding tree for decision-making is shown in Fig. 1 . Similar coding trees were used for the four other categories and their elements.
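As an illustration only (not the authors' actual NVivo workflow), the coding scheme described above can be sketched as a small data structure: a unit may carry several element codes, falls back to a category-level code when no element fits, and gets one focus label (individual, team, organisation, or society). The element names below are taken from the N-ANTS decision-making category in Table 1; the function name and example texts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Elements of the N-ANTS decision-making category (from Table 1)
DECISION_MAKING_ELEMENTS = {
    "identifying options",
    "assessing and weighing up options",
    "reassessing decisions",
}

@dataclass
class CodingUnit:
    text: str                                        # one or more sentences, or a fragment
    focus: str                                       # individual / team / organisation / society
    element_codes: set = field(default_factory=set)  # a unit may carry several elements
    category_code: Optional[str] = None              # fallback when no element fits

def code_unit(text: str, proposed_elements: set, category: str, focus: str) -> CodingUnit:
    """Keep proposed element codes that exist in the model; otherwise
    fall back to coding the unit at the category level."""
    unit = CodingUnit(text=text, focus=focus)
    matched = proposed_elements & DECISION_MAKING_ELEMENTS
    if matched:
        unit.element_codes = matched
    else:
        unit.category_code = category
    return unit

# A unit coded to more than one element, with a "team" focus label:
u = code_unit(
    "We listed the options and weighed them against each other.",
    {"identifying options", "assessing and weighing up options"},
    category="decision-making",
    focus="team",
)

# A unit that mentions the category without fitting a clear element:
v = code_unit(
    "Decisions were just hard that day.",
    set(),
    category="decision-making",
    focus="individual",
)
```

The category-level fallback mirrors the procedure in the text: a unit that clearly concerns decision-making but matches no concrete element is still retained under the category code.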
Coding tree for decision-making
During the analysis, we looked at different constellations in the dataset to find patterns in the variations of how the participants talked about the concepts. In this process, we became interested in leadership and decision-making , as we found a clear pattern in relation to these categories. This pattern made us look at the dataset again, as leadership is not a category within the N-ANTS model. For this reason, we looked through the N-ANTS interviews again and found all the paragraphs concerning leadership. We included these paragraphs in our “ leadership data”. We decided not to continue working with situation awareness , teamwork , and task management as these categories did not show the same pattern as found in leadership and decision-making . Even though we found the pattern in only two of the categories, we argue that the finding is sufficient to establish the possibility that differences in the understanding of words exist in the healthcare system.
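The pattern search described above can be imagined as a simple cross-tabulation of each coded unit's focus label against the speaker's profession. The coded units below are invented for illustration; the actual analysis was carried out in NVivo on the interview transcripts.

```python
from collections import Counter

# Invented coded units: (profession, category, focus)
coded_units = [
    ("nurse", "leadership", "team"),
    ("nurse", "leadership", "team"),
    ("nurse", "decision-making", "organisation"),
    ("physician", "leadership", "individual"),
    ("physician", "decision-making", "individual"),
]

def focus_by_profession(units):
    """Count (profession, focus) pairs across all coded units."""
    return Counter((profession, focus) for profession, _category, focus in units)

tally = focus_by_profession(coded_units)
# A skew such as nurses clustering on "team"/"organisation" and physicians
# on "individual" is the kind of pattern the analysis looked for.
```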
Throughout the analysis, we examined the transcripts repeatedly to find examples to substantiate or reject the observed patterns. This process involved all authors. We discussed findings and agreed that our data could substantiate the observed patterns sufficiently. We presented our preliminary findings and analysis to the group of participants in the study. Participants’ feedback was acknowledgement and recognition of the observations presented.
In this study, we carried out 11 semi-structured interviews with course directors involved in teaching and planning education related to social and cognitive capabilities. Using a directed content analysis [ 34 ], we examined patterns in variations in how participants talked about concepts in three NTS models: ANTSdk, N-ANTS, and NOTSSdk [ 8 , 10 , 36 ]. During this process, we noted how course directors with professional backgrounds as nurses and course directors with professional backgrounds as physicians described the two categories decision-making and leadership both similarly and differently. Similarities included a pronounced interweaving of course directors’ concepts and the definitions of the concepts as per the NTS models. Differences between nurses and physicians included variations in the understanding of role distribution (e.g., who could be the leader), focus on external versus internal factors, and the ascribed importance of group dynamics versus individual capabilities. Course directors with a background as nurses generally thought that nurses could be good leaders, and they generally paid greater attention to group dynamics, hierarchies, and external factors. Course directors with a background as physicians tended to think that leaders needed to be physicians, and they focused on their individual efforts when they talked about decision-making and leadership . In the following sections, we elaborate on these findings by first showing how the course directors – nurses and physicians alike – describe decision-making and leadership . Afterwards, we elaborate on patterns in how nurses and physicians describe the two categories differently.
Both nurses and physicians linked leadership and decision-making , making connections between these two categories. When asked to describe leadership , some interviewees started talking about a leader (typically the team leader) as a person, whereas other interviewees talked about leadership as an activity. Only one interviewee talked about followership when describing leadership . A team leader was described as someone who collects information about what is going on and maintains an overview of the situation:
“ I perceive the team leader position as having to keep an overview, maintaining the overview, you have to keep track of your team members. You have to keep your hands in your pockets, so that you can maintain an overview and not get caught in a heads-down procedure ” (physician 1).
Furthermore, a team leader was described as responsible for creating a calm, comfortable, and safe atmosphere within the team: “ It is important that you create a pleasant and calm atmosphere around the patient, and that you give your team members a good feeling, so that they won´t become stressed, but can function optimally ” (physician 1). Both nurses and physicians stressed that the team leader should be the person most competent to do the job, and that it can be a problem if someone is appointed leader without the necessary qualifications:
“ Yes, I think it takes professional skills to be the leader in an emergency. It can probably be discussed whether leadership only requires an overview - what gives you that overview? That is a matter of having competencies within the situation, and for that reason it is traditionally the most competent person who gets the leader role. It can become a problem, if you get the leader role based on your profession and not on your competencies ” (nurse 4).
Interviewees from both professions talked about leadership entailing authority. All participants mentioned every element of leadership from their respective NTS models, but it is interesting to note that three nurses were interviewed based on N-ANTS, which does not contain leadership as a category. Still, these nurses referred to leadership : “ So it is both about prioritising what we need to do first and what we need to do afterwards, but also about thinking of delegating: what can I do myself, and what do I want you to do ” (nurse 3).
Decision-making was described as requiring extensive experience and a strong theoretical background, held either by the decision-maker him- or herself or brought in by including knowledge from experts or the team in the decision-making: “ If you have seen 5000 patients being anesthetised in the same way, then you have seen some things and some patterns, and that makes it possible for you to make the right decision ” (nurse 5).
“ As the team leader, I have other tasks. [For example: ] I don’t know which antibiotic to give, and now I have to spend time to look it up – I don’t want to do that. I have a competent person who can do it by virtue of his/her knowledge, so I delegate that task, and then they also have to make the decision about it ” (physician 1).
For this reason, inexperience was mentioned as a potential problem for decision-making. Both nurses and physicians mentioned that inexperienced decision-makers could have problems recognising the relevant patterns needed to adequately understand the situation and make the right decisions:
“ When you are a novice and new in a profession, sometimes it is difficult to know what situation you are in, and what decision you should take, because you don’t have the necessary knowledge, or you haven’t seen enough examples of what other possibilities you have ” (nurse 5).
“ It might happen that you don’t dare to make a decision, or that you are not capable of it, because you get so perplexed by all the incoming information when you are standing there with your situational awareness, and you get all this information from your team and how… you cannot recognise patterns, for example ” (physician 3).
Another challenge that might hinder decision-making, according to the interviewees, was the social hierarchy. An example given by the interviewees was how knowledge in the team can be lost if some members of the team (e.g., nurses or junior physicians) are not heard because of social hierarchy, or if they do not feel comfortable sharing their opinions or observations. Lost knowledge potentially leads to poorer decision-making.
“ There can be some hierarchy in it, just because physicians are worth more than nurses, for example. There can also be some hierarchy in whether you are new or experienced. And then there can be a learning culture in a ward, which can result in it not being very welcome to ask all those questions ” (nurse 6).
Interviewees warned against fixation errors in relation to decision-making. They talked about the risk of being wrong when making a decision, and about how it is important to constantly keep an eye on different possibilities: “ We see some systems which fit into the puzzle, and then we think it is like that, and then we go that way and don’t see that the puzzle could be laid in another way ” (physician 4).
Decision-making was, according to the interviewees, related to teamwork , leadership and situation awareness , and all interviewees mentioned every element under the category decision-making in their respective non-technical skills models (see Table 1 ).
While working with data, we became aware of patterns in the way that nurses and physicians talked differently about the categories in the context of their work life. These differences were not based solely on the content of specific categories, but on differences in what nurses and physicians focused on and showed interest in.
Nurses talked about leadership and decision-making as something every team member takes part in, and as something every team member is responsible for. Nurses described decision-making as something the entire team contributes to: “(…) but I find decision-making isn’t necessarily placed only with the leader, it is placed with every team member in which direction you go ” (nurse 2). Similarly, nurses talked about leadership as something both nurses and physicians enact. Sometimes it was described as different kinds of leadership: “ Because the scrub nurse can really have a lot of leadership in the operating room. That is, the inventory and the accessories and what goes in and out. And who should be called to assist and when. And there is a great deal of leadership in that ” (nurse 2).
Another nurse talked about how nurses can sometimes be the best team leader, for example in a staff constellation of a senior nurse and a junior physician. By contrast, physicians described leadership and decision-making as something the physician does (alone). Their focus was on the role of the physician and the physician’s responsibilities:
“ It is also how the team leader attains a position of authority, as he/she should have, which is especially difficult in a paediatric ward, unlike perhaps… what do I know, surgical wards, it doesn’t feel natural for paediatricians to have authority and assertiveness, because our daily tone is very non-hierarchical and with very little authority and assertiveness from physicians, from where the team leader must be recruited ” (physician 5).
Interestingly, a physician referred to the same staff constellation as described by a nurse, i.e., a situation where a senior nurse and a junior physician would be working together, but the physician described how it is important to teach the junior physician to lead and the senior nurse to respect the leadership.
The tendency for nurses to be oriented towards the team and for physicians to be focussed on individual factors associated with the physician role was apparent throughout the dataset and across the different concepts. Both nurses and physicians talked about both individual factors and team factors, but the tendency was for nurses to talk more about team factors than individual factors, whereas the opposite tendency was found for physicians. An example of a statement related to the team would be: “ You need to include people from your team. Because they can have other information, they can have examined something else, they can have seen something else, heard something else ” (nurse 1).
Furthermore, only nurses talked about organisational factors and societal factors: “ There can be something organisational in task management. We don’t have the resources that we need, we don’t have the equipment that we need ” (nurse 1).
“ When we fixate on something, what we call fixation errors, (…) we actually produce it ourselves in our system in the way patients enter our hospitals. (…) So, we need to work on these concepts [social and cognitive capabilities], because our system is taking part in producing them [fixation errors], like we ourselves can take part in producing them [fixation errors] ” (nurse 6).
Contrary to the nurses’ broad focus on external factors, physicians talked more about the individual physician and internal factors, such as personal growth, how to advance from inexperienced to experienced, individual responsibility, and how to step up and be the team leader, etc.:
“ I think, as a specialist, if they master these [NTS] concepts early, then they get the space to develop the role and to set themselves in the process. This is where the problem sits, I think. It is when the individual and the role get mixed. That is also when it becomes unsafe for patients ” (physician 2).
When talking about leadership and decision-making, we observed a tendency in the data towards nurses talking more about social hierarchies in hospitals than physicians did. When social hierarchies were mentioned, it was mostly with a negative valuation. Social hierarchies were described in relation to seniority, where the senior (experienced) individual would be higher up the hierarchy than the junior (inexperienced) individual, and in relation to nurses and physicians, where physicians would be higher up the hierarchy. Both physicians and nurses talked about the problem of working with someone high in the hierarchy if that individual did not have the required competencies to fill that position:
“ Sometimes there is a formal leader who does not have the necessary qualifications. It is often the problem that there is someone who formally in the hierarchy of the hospital should be team leader, but in reality they are not ready for it at all, and other team members would be able to manage that task better – that is a problem ” (physician 1).
There were nurses who talked about a flatter hierarchical structure, where nurses or young physicians could be team leader, and even structures without any leader at all. One nurse described how the hierarchical structure in the hospital was becoming flatter as a result of a general development in society:
“ I think this hierarchy is evolving. In the past, the chief physician was someone who just stood with folded arms and the nurses ran around. That’s not the case anymore. It has increasingly become a collaboration where you get an understanding that you need each other. I think the whole development in society means that you flatten some of the hierarchy that has existed in the past ” (nurse 6).
This study used semi-structured interviews [ 33 ] and directed content analysis [ 34 ] with the aim of investigating patterns in variations in how healthcare educators talk about leadership and decision-making. The main findings from the study show how educators with backgrounds as nurses and physicians respectively talked differently about leadership and decision-making. The nurses in the current study described both leadership and decision-making as something the whole team engages in, whereas the physicians talked about them as something the physician does (alone). The nurses thought that nurses could be the team leaders, whereas physicians mentioned that the team leader must be a physician. The nurses talked more about group factors than individual factors, and they mentioned both organisational and societal factors. The physicians talked more about individual factors than group factors, and they did not mention organisational or societal factors. The nurses talked more about social hierarchies than the physicians did, and the hierarchies were almost always talked about as negative. The study contributes to the existing literature by showing that there are patterned differences in the way educators with a background as nurses and physicians respectively talk about decision-making and leadership. We argue that these differences might be passed on to the students through their teaching. The main findings resonate well with previous studies on behavioural differences between nurses and physicians. This strengthens our argument that differences in understanding of concepts might underlie differences in behaviour, which might in turn lead to safety issues. Thus, safety issues might be compounded by educators’ different understandings of concepts. We will discuss this further below.
Previous studies have found differences between nurses and physicians that are in line with our findings. Barrow and colleagues [ 19 ] found that nurses in their study thought they enacted leadership and decision-making, whereas many of the physicians in the study directly disagreed with that. Likewise, the majority of physicians in the study thought that the effectiveness of interprofessional teams relied on strong leadership from physicians [ 19 ]. Our findings suggest that such disagreement in clinical practice might be rooted in differences in how each profession understands the concepts they disagree about. On the other hand, research has also shown that while physicians exercise “direct” decision-making, nurses apply covert strategies like selecting information given to the physicians to try and steer the physician in the direction of the “right decision” [ 37 , 38 ]. By using a covert strategy for decision-making, it is probable that physicians do not even realise that the nurses are making decisions (or have a part in the decisions taken), which could also be (part of) the reason why nurses and physicians think differently about who makes decisions. Similarly, Barrow and colleagues [ 19 ] described different decision-making and leadership strategies for nurses and physicians, with nurses using external factors as their powerbase for authority and leadership. For example, nurses could say that something should not be done due to current guidelines, or they could approach another physician after shift change if they disagree with a decision (this observation is backed up by Svensson [ 39 ]). These differences in enactment of leadership and decision-making could grow from variations in understanding of the concepts. For example, nurses could be oriented toward external factors in their understanding of the concepts and physicians toward internal factors, as our study finds.
Extending the differences between external and internal factors, one of our main findings shows variations in how much nurses and physicians talk about group factors and individual factors. While we have not found any other research analysing the tendency to talk about individual versus group factors among nurses and physicians, some studies have shown that physicians focus on each individual in the team instead of the group as a whole when describing ‘team’ and ‘teamwork’ [ 40 ]. Another study has shown that physicians talked about leadership as a group process when they were asked to define leadership, but as a personality trait when they simply talked unsolicited about leadership [ 41 ]. These findings could be a result of a physician tendency to focus on individual factors when defining concepts related to leadership.
Our findings further indicate variations in understanding of and interest in hierarchy. Several earlier studies showed an effect of hierarchy on how healthcare professionals understand and exercise social and cognitive capabilities [ 20 , 42 ]. Makary and colleagues [ 20 ] found that physicians and nurses in the operating room evaluated their teamwork differently with physicians rating it higher than nurses. Some of the explanations suggested by the authors involved how social status and hierarchies might influence how healthcare professionals perceive teamwork, but also that physicians and nurses might have different ideas about what constitutes good teamwork [ 20 ]. The latter would be in line with our argument in this study. However, hierarchy might also be another explanation as to why differences in behaviour appear in clinical practice. Research has shown instances where hierarchy can influence decision-making by showing how nurses are constrained by physicians in their decision-making, but not the other way around [ 43 ].
Our study indicates that nurses and physicians understand leadership and decision-making differently, which resonates well with earlier studies. Our participants worked as healthcare educators involved with teaching and planning courses concerning leadership and decision-making among other topics. Previous research has found that teachers’ preconceptions, preferences, and biases can form a hidden curriculum within a course [ 27 , 29 , 31 ], and that a hidden curriculum can be a powerful determinant of later behaviour [ 28 , 30 ]. Seeing our findings in this light, we argue that it is likely that the differences in understanding of leadership and decision-making among the educators in our study will form a hidden curriculum in their courses. They might choose certain cases for a simulation session or focus on a specific event in the debriefing, which will advance their particular understanding of a concept. Or they might use different words or emphasis when explaining a concept. Such a hidden curriculum can influence learning and later behaviour among course participants. An example of this was shown in a study by Ju and van Schaik [ 25 ] on leadership prototype formation (the understanding of what it means to be a leader). Ju and van Schaik [ 25 ] argued that prototype formation is influenced by the teaching materials and role models that health professionals are exposed to during their education and clinical practice. Even something as simple as the sex of nurses and physicians in educational videos could have an impact on prototype formation and later behaviour [ 25 ]. 
We argue that the observed differences in understanding of leadership and decision-making would similarly influence course participants’ concept formation (or ‘prototype formation’ to use the terminology of Ju & van Schaik [ 25 ]), which would cause course participants taught by a nurse educator to form a slightly different understanding of, for example, leadership than a course participant taught by a physician educator. These differences would later cause the course participants to act differently in clinical practice [ 21 , 32 , 44 ], which could potentially lead to miscommunication and misunderstandings between nurses and physicians. Nurses and physicians work together every day in clinical practice, and even minor disagreements about who decides what, and who leads whom can lead to frustration and unsafe situations [ 19 ]. If nurses and physicians do not agree on their respective roles, leadership could for example become unclear in an emergency situation where too many or too few step up to the task [ 45 ]. Alternatively, physicians might make a decision, since they think it is their responsibility, and nurses, who are left out of the original decision-making process, might undermine it or work against it based on frustrations resulting from differences in concepts (as seen in [ 19 ]). Since much nurse and physician learning happens in clinical practice through experience and observation of others [ 46 ], differences in behaviour would reproduce differences in understanding of the concepts. This is particularly pertinent for “newcomers” learning the language and legitimate actions of a workplace [ 47 ]. Novice nurses see how other nurses talk and act in clinical practice and then adapt their language and behaviour based on these interactions to fit into the observed community [ 47 ]. In this way, certain understandings of leadership and decision-making would be reproduced.
Examples of potential miscommunications are already evident in the present study. A withdrawn leader is both described positively (a good leader position with overview) and negatively (as someone just bossing the team members around, without engaging in helping the team). Becoming aware of these different understandings is a first step towards a deeper understanding and better communication among different groups of healthcare professionals, which could potentially alleviate conflicts and improve patient safety.
It is important to mention that the differences we have observed between nurses and physicians in our study might have originated from characteristics of the participants other than their professional background. Examples could be their level of experience as clinicians or as educators, the nature of the courses they teach, the participants in their courses, gender (though unlikely, as the study participants included only one male), or personality traits.
Note that we asked participants to talk about the concepts in the context of their teachings, which might be different from how they would talk about the concepts in another setting, or how they would use the concepts in clinical practice. It would have been beneficial to supplement our interview data with observations of teaching practices or clinical work.
Furthermore, we base our considerations on a small data set, but considering the match between our findings and aspects described in the literature, we see support for our findings and interpretations.
In this study, we found that nurse and physician healthcare educators to a large extent described social and cognitive capabilities as they are described in existing tools addressing non-technical skills. We also found patterned differences in their descriptions that may be related to educators’ professional training/background. Focusing on the concepts leadership and decision-making, nurses paid greater attention to group dynamics and external factors, whereas physicians focused on their individual efforts. If nurses and physicians disagree on the meaning of leadership and decision-making, for example regarding who should decide in a given situation, it can create misunderstandings and unsafe situations. For this reason, it could be beneficial to make healthcare professionals aware of the specificity of their own concepts, so that they can communicate better about meanings and differences of concepts in teamwork situations. This could be done by educating them to describe more precisely what they mean when using a certain concept, for example “I want you to coordinate tasks” instead of “I want better leadership”. In this way, we might avoid healthcare professionals using the same word, but in fact referring to different concepts.
Below is the link to the electronic supplementary material.
Author contributions.
RH analysed the data and wrote the first draft of the manuscript. BB and PD qualified the analysis and commented on the manuscript. FZ conducted the interviews and commented on the manuscript. All authors read and approved the final manuscript.
Our salaries were funded by our institutions, and no external funding was received for conducting this study.
Open access funding provided by Copenhagen University
Declarations.
The Regional Ethics Committee of the Capital Region of Denmark waived the ethical approval of the study (H-19023177). Written informed consent was obtained from all study participants. All methods in this study were carried out in accordance with relevant guidelines and regulations in the Declaration of Helsinki.
RH, BB, and FZ have no conflicts of interest to declare. PD holds a professorship with the University of Stavanger that was established by an unconditional grant from the Laerdal Foundation to the University and that is now financed by the University. PD leads the EuSim group, a network of simulation centres and experts providing simulation faculty development courses.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Establishing thresholds of change that are actually meaningful for the patient in an outcome measurement instrument is paramount. This concept is called the minimum clinically important difference (MCID). We summarize available MCID calculation methods relevant to spine surgery, and outline key considerations, followed by a step-by-step working example of how MCID can be calculated, using publicly available data, to enable the readers to follow the calculations themselves.
Thirteen MCID calculation methods were summarized, including anchor-based methods, distribution-based methods, the Reliable Change Index, 30% Reduction from Baseline, the Social Comparison Approach and the Delphi method. All methods, except the latter two, were used to calculate the MCID for improvement of the Zurich Claudication Questionnaire (ZCQ) Symptom Severity domain in patients with lumbar spinal stenosis. The Numeric Rating Scale for Leg Pain and the Japanese Orthopaedic Association Back Pain Evaluation Questionnaire Walking Ability domain were used as anchors.
The MCID for improvement of ZCQ Symptom Severity ranged from 0.8 to 5.1. On average, distribution-based methods yielded lower MCID values, than anchor-based methods. The percentage of patients who achieved the calculated MCID threshold ranged from 9.5% to 61.9%.
MCID calculations are encouraged in spinal research to evaluate treatment success. Anchor-based methods, relying on scales assessing patient preferences, continue to be the “gold-standard” with receiver operating characteristic curve approach being optimal. In their absence, the minimum detectable change approach is acceptable. The provided explanation and step-by-step example of MCID calculations with statistical code and publicly available data can act as guidance in planning future MCID calculation studies.
The notion of minimum clinically important difference (MCID) was introduced to establish thresholds of change in an outcome measurement instrument that are actually meaningful for the patient. Jaeschke et al. originally defined it “as the smallest difference in score in the domain of interest which the patient perceives as beneficial and which would mandate, in the absence of troublesome side-effects and excessive cost, a change in the patient’s management” [ 1 ].
In many clinical trials, statistical analyses focus only on intergroup comparisons of raw outcome scores using parametric/non-parametric tests, deriving conclusions based on the p-value. Using the classical threshold of p-value < 0.05 only suggests that the observed effect is unlikely to have occurred by chance; it does not equate to a change that is clinically meaningful for the patient [ 2 ]. Calculating MCID scores, and using them as thresholds for “treatment success”, ensures that patients’ needs and preferences are considered and allows for comparison of the proportion of patients experiencing a clinically relevant improvement among different groups [ 3 ]. Through MCID, clinicians can better understand the impact of an intervention on their patients’ lives, sample size calculations can become more robust, and health policy makers may decide which treatments deserve reimbursement [ 4 , 5 , 6 ].
The MCID can be determined from the patient’s perspective, where it is the patient who decides whether a change in their health was meaningful [ 4 , 7 , 8 , 9 ]. This is the most common “gold-standard” approach and one that we will focus on. Occasionally, the clinician’s perspective can also be used to determine MCID. However, MCID for a clinician may not necessarily mean an increase in a patient’s functionality, but rather a change in disease survival or treatment planning [ 10 ]. MCID can also be defined at a societal level, as e.g. improvement in a patient’s functionality significant enough to aid their return to work [ 11 ].
MCID thresholds are intended to assess an individual’s clinical improvement and ought not to be applied to mean scores of entire groups post-intervention, as doing so may falsely over-estimate treatment effectiveness. It is also worth noting that obtained MCID values are not treatment-specific but broadly disease category-specific. They rely on a patient’s perception of clinical benefit, which is influenced by their diagnosis and subsequent symptoms, not just the treatment modality.
In this study, we summarize available MCID calculation methods and outline key considerations when designing a MCID study, followed by a step-by-step working example of how MCID can be calculated.
To illustrate the MCID methods and to enable the reader to follow the practical calculation of the different MCID values described along the way, a previously published data set of 84 patients, as described in Minetama et al., was used under the CC0 1.0 license [ 12 ]. Data can be downloaded at https://data.mendeley.com/datasets/vm8rg6rvsw/1 . The statistical R code can be found in Supplementary Content 1, including instructions on formatting the data set for MCID calculations. The titles of the different MCID methods in the paper (listed below) and their numbers correspond to the same titles and respective numbers in the R code. All analyses in this case study were carried out using R (The R Foundation for Statistical Computing, Vienna, Austria) in RStudio version 2023.12+402 [ 13 ].
The aim of Minetama et al. was to compare the effectiveness of supervised physical therapy (PT) with unsupervised at-home exercises (HE) in patients with lumbar spinal stenosis (LSS). The main inclusion criteria were: presence of neurogenic intermittent claudication and pain and/or numbness in the lower extremities, with or without back pain; age > 50 years; diagnosis of LSS confirmed on MRI; and a history of ineffective response to therapy for ≥ 3 months. Patients were then randomized into a 6-week PT or HE programme [ 12 ]. All data were pooled, as a clinically significant benefit for patients is independent of group allocation and because MCID is disease-specific. Therefore, the derived MCID will be applicable to most patients with lumbar spinal stenosis, irrespective of treatment modality. Change scores were calculated by subtracting baseline scores from follow-up scores.
There are multiple approaches to calculate MCID, mainly divided into anchor-based and distribution-based methods (Fig. 1 ) [ 4 , 10 , 14 , 15 , 16 , 17 ]. Before deciding on the method, it needs to be defined whether the calculated MCID will be for improvement or deterioration [ 18 ]. Most commonly, MCID is used to measure improvement (as per the definition of Jaeschke et al.) [ 1 , 4 , 7 , 14 , 15 , 16 , 19 , 20 ]. The value of MCID for improvement should not be directly applied in reverse to determine whether a decrease in patients' scores signifies a clinically meaningful deterioration; those are two separate concepts [ 18 ]. In addition, the actual MCID value ought to be applied to the post-intervention score of an individual patient (not the overall score for the whole group), to determine whether, at follow-up, he or she experienced a change equal to the MCID or more, compared to their baseline score. Such patients are then classified as “responders”.
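As a toy illustration of this point, the Python sketch below applies an assumed MCID threshold of 1.5 points to each patient's individual change score rather than to the group mean. All numbers are invented for illustration; the paper's own calculations live in its R supplement.

```python
# Hypothetical example: apply an assumed MCID threshold to each patient's
# individual change score, never to the group mean. All numbers are invented.
MCID = 1.5  # assumed MCID for improvement, in score points

# (baseline, follow-up) scores for five hypothetical patients,
# where a lower score means less pain/disability
patients = [(20, 14), (18, 17), (25, 22), (15, 15), (22, 19)]

# a patient is a "responder" if their improvement reaches the MCID
responders = [(baseline - followup) >= MCID for baseline, followup in patients]
print(responders)  # → [True, False, True, False, True]
```

Averaging the five change scores instead would suggest a group-level improvement while saying nothing about how many individual patients actually crossed the threshold.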
Flow diagram presenting the range of minimum clinically important difference calculation methods described in the study, stratified into anchor-based, distribution-based and “other” methods. MCID, Minimum Clinically Important Difference; MIC, Minimal Important Change
According to the Consensus-based Standards for the selection of health measurement instruments (COSMIN) guidelines, the “anchor-based” approach is regarded as the “gold-standard” [ 21 , 22 , 23 ]. In this approach, we determine the MCID of a chosen outcome measurement based on whether a pre-defined MCID (usually derived from another published study) was achieved on an external criterion, known as the anchor, usually another patient-reported outcome measure (PROM) or an objective test of functionality [ 4 , 7 , 8 , 15 , 16 , 17 , 18 , 20 ]. It is best to use scales which allow the patient to rate the specific aspect of their health related to the disease of interest post-intervention compared to baseline on a Likert-type scale. This scale may range, for example, from “much worse”, “somewhat worse”, “about the same”, “somewhat better”, to “much better”, such as the established Global Assessment Rating tool [ 7 , 8 , 24 , 25 ]. Depending on the scale, some studies determine MCID by calculating change scores for patients who only ranked themselves as “somewhat better”, and some only consider patients who ranked themselves as “much better” [ 7 , 25 , 26 , 27 , 28 , 29 ]. This discrepancy likely explains why a range of MCID values is reported for a single outcome measure, depending on the methodology. There appears to be no singular “correct” approach. One of the alternatives to the Global Assessment Rating is the use of the health transition item (HTI) from the SF-36 questionnaire, where patients are asked about their overall health compared to one year ago [ 7 , 30 , 31 ]. Although quick and easy to conduct, the patient’s response may be influenced by comorbid health issues other than those targeted by the intervention. Nevertheless, any anchor where the patient is the one to decide what change is clinically meaningful captures the true essence of the MCID.
One should, however, be mindful of recall bias with such anchors, which is not easily addressed: patients at times do not reliably remember their baseline health status [ 32 ]. Moreover, what the above anchors do not consider is whether the patient would still choose the intervention for the same condition despite experiencing side-effects or cost. That can be addressed through implementing anchors such as the Satisfaction with Results scale described in Copay et al., who found that MCID values based on the Satisfaction with Results scale were slightly higher than those derived from the HTI of the SF-36 [ 7 , 33 ].
Other commonly used outcome scales, such as the Oswestry Disability Index (ODI), Roland–Morris Disability Questionnaire (RMDQ), Visual Analogue Scale (VAS), or EQ5D-3L Health-Related Quality of Life, can also act as anchors [ 7 , 14 , 16 , 34 , 35 ]. In such instances, patients complete the “anchor” questionnaire at baseline and post-intervention, and the MCID of that anchor is derived from a previous publication [ 12 , 16 , 35 ]. Before deciding on the MCID, full understanding of how it was derived in that previous publication is crucial. Ideally, this should be done for a population similar to our study cohort, with comparable follow-up periods [ 18 , 20 ]. Correlations between the anchor instrument and the investigated outcome measurement instrument must be recorded, and ought to be at least moderate (> 0.5), as that is the best indicator of construct validity (whether both the anchor instrument and outcome instrument represent a similar construct of patient health) [ 18 , 36 ]. If such a correlation is not available, the anchor-based MCID credibility instrument is available to aid in assessing construct proximity between the two [ 36 , 37 ].
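As a hedged sketch of this validity check, the following Python computes the Pearson correlation between anchor and outcome change scores from first principles and compares it against a moderate-correlation cut-off of 0.5. The change scores are invented; the study's own analyses are in R.

```python
# Hypothetical change scores on a candidate anchor and on the outcome
# instrument for six invented patients.
anchor_change = [2.0, 1.0, 3.0, 0.5, 2.5, 1.5]
outcome_change = [4.0, 1.5, 5.5, 1.0, 5.0, 2.5]

# Pearson correlation computed from first principles (no external libraries)
n = len(anchor_change)
mean_a = sum(anchor_change) / n
mean_o = sum(outcome_change) / n
cov = sum((a - mean_a) * (o - mean_o)
          for a, o in zip(anchor_change, outcome_change))
var_a = sum((a - mean_a) ** 2 for a in anchor_change)
var_o = sum((o - mean_o) ** 2 for o in outcome_change)
r = cov / (var_a * var_o) ** 0.5

# is the candidate anchor at least moderately correlated with the outcome?
print(r > 0.5)  # → True
```

In practice both columns would come from the study data set, and a correlation below the cut-off would argue against using that scale as an anchor.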
Once the process for selecting an anchor and classifying “responders” and “non-responders” is established, the MCID can be calculated. The outcome instrument of interest will be defined as the outcome for which we want to calculate the MCID. The first anchor-based method (within-patient change) focuses on the average improvement seen among clear responders on the anchor. The between-patient change anchor-based method additionally subtracts the average improvement seen among non-responders (unchanged and/or worsened) and consequently ends up with a smaller MCID value. Finally, an anchor-based method based on Receiver Operating Characteristic (ROC) curve analysis (which can be considered the current “gold standard”) also exists; it effectively treats the MCID calculation as a sort of diagnostic instrument and aims to improve the discriminatory performance of our MCID threshold. In the following paragraphs, the three anchor-based methods are described in more detail. The R code (Supplementary Content 1 ) enables the reader to follow the text and to calculate the MCID for the Zurich Claudication Questionnaire (ZCQ) Symptom Severity domain, based on a publicly available dataset [ 12 ].
The chosen outcome measurement instrument in this case study, for which the MCID for improvement will be calculated, is the ZCQ Symptom Severity domain [ 12 ]. The ZCQ is composed of three subscales: symptom severity (7 questions, score per question ranging from 1 to 5 points); physical function (5 questions, score per question ranging from 1 to 4 points) and patient satisfaction with treatment (6 questions, score per question ranging from 1 to 4 points). Higher scores indicate greater disability/worse satisfaction [ 38 ]. To visualize different MCID values, the Numeric Rating Scale (NRS) for Leg Pain (score from 0, “no pain”, to 10, “worst possible pain”) and the Japanese Orthopaedic Association Back Pain Evaluation Questionnaire (JOABPEQ) Walking Ability domain are chosen, as they showed high responsiveness in patients with LSS post-operatively [ 39 ]. Through 25 questions, the JOABPEQ assesses five distinctive domains: pain-related symptoms, lumbar spine dysfunction, walking ability, impairment in social functioning and psychological disturbances. The score for each domain ranges from 0 to 100 points (higher score indicating better health status) [ 40 ]. The correlation of ZCQ Symptom Severity with NRS Leg Pain and the JOABPEQ Walking Ability domain is 0.56 and − 0.51, respectively [ 39 ]. For a patient to be classified as a “responder” using the NRS for Leg Pain or JOABPEQ Walking Ability, the score at 6-week follow-up must have improved by 1.6 points or 20 points, respectively [ 7 , 40 , 41 ].
This publicly available dataset does not report patient satisfaction or any kind of global assessment rating.
To enable calculation of Global Assessment Rating-based MCID methods for educational purposes, despite the very limited availability of studies providing MCID for deterioration of the JOABPEQ, we decided to stratify patients in this dataset into the three following groups, based on the JOABPEQ Walking Ability as an anchor: likely improved (change score above 20 points, according to Kasai et al.), no significant change (change score between − 20 and + 20 points), and likely deteriorated (change score lower than − 20 points) [ 41 ]. As the obtained MCID values were expected to be negative, all values, for clarity of presentation, were multiplied by − 1, except in Method (IX), where the graphical data distribution is shown.
Method (I): Calculating MCID using “within-patient” score change
The first method focuses on calculating the change between the baseline and post-intervention score of our outcome instrument for each patient classified as a “responder”. A “responder” is a patient who, at follow-up, has achieved the pre-defined MCID of the anchor (or who ranks themselves high enough on a Global Assessment Rating-type scale, based on our methodology). The MCID is then defined as the mean change in the outcome instrument of interest among those classified as “responders” [ 4 , 7 , 16 , 31 ].
The corresponding R-Code formula is described in Step 5a of Supplementary Content 1 . The calculated within-patient MCID of ZCQ Symptom Severity based on NRS Leg Pain and the JOABPEQ Walking Ability domain was 4.4 and 4.2, respectively.
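The within-patient calculation reduces to a mean over anchor-defined responders. A minimal Python sketch with invented change scores (the paper's actual implementation is the R code in Step 5a of its supplement):

```python
# Invented change scores for the outcome instrument of interest
outcome_change = [4.0, 5.0, 1.0, 6.0, 0.5, 3.5]
# responder status of the same six patients, derived from the anchor
anchor_responder = [True, True, False, True, False, True]

# within-patient MCID = mean outcome change among anchor-defined responders
responder_changes = [c for c, r in zip(outcome_change, anchor_responder) if r]
mcid_within = sum(responder_changes) / len(responder_changes)
print(mcid_within)  # → 4.625
```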
In this approach, the mean change in our outcome instrument is calculated for not only “responders” but also for “non-responders”. “Non-responders” are patients who did not achieve the pre-defined MCID of our anchor or who did not rank themselves high enough (unchanged, or sometimes: unchanged + worsened) on Global Assessment Rating type scale according to our methodology. The minimum clinically important difference of our outcome instrument is then defined as the difference between the mean change scores of “responders” and “non-responders” [ 4 , 7 , 16 , 19 ].
The corresponding R-Code formula is described in Step 5b of Supplementary Content 1 . The calculated between-patient MCID of ZCQ Symptom Severity was 3.5 based on NRS Leg Pain and 2.8 based on the JOABPEQ Walking Ability domain.
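A hedged Python sketch of the between-patient variant (hypothetical data, not the supplement's R code): the MCID is the difference between the two subgroup means.

```python
# Between-patient MCID (Method II): mean change of "responders" minus
# mean change of "non-responders". All values are hypothetical.
changes = [8, 1, 9, 1, 9, 1]
responder = [True, False, True, False, True, False]

resp = [c for c, r in zip(changes, responder) if r]
nonresp = [c for c, r in zip(changes, responder) if not r]
mcid_between = sum(resp) / len(resp) - sum(nonresp) / len(nonresp)
print(round(mcid_between, 2))  # → 7.67
```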
Method (III): Calculating MCID using ROC analysis
Here the MCID is derived through receiver operating characteristic (ROC) analysis, identifying the “threshold” score of our outcome instrument that best discriminates between “responders” and “non-responders” of the anchor [ 4 , 7 , 16 , 19 , 27 ]. To understand ROC, one must be familiar with the concepts of sensitivity and specificity. In ROC analysis, sensitivity is defined as the ability of the test to correctly detect “true positives”, which in this context refers to patients who have achieved a clinically meaningful change.
A “false negative” is a patient who was classified as a “non-responder” but is really a “responder”. Specificity is defined as the ability of a test to correctly detect a “true negative” result, i.e. a patient who did not achieve a clinically meaningful change (a “non-responder”) [ 25 ].
A “false positive” is a patient who was classified as a “responder” but was in fact a “non-responder”. Values for sensitivity and specificity range from 0 to 1. A sensitivity of 1 means that the test detects 100% of “true positives” (“responders”), while a specificity of 1 reflects the ability to detect 100% of “true negatives” (“non-responders”). It is unclear what the minimum sensitivity and specificity should be for a “gold-standard” MCID, which is why the most established approach is to opt for an MCID threshold that maximizes sensitivity and specificity at the same time, as done in ROC analysis [ 4 , 7 , 25 , 31 , 42 ]. During ROC analysis, the “closest-to-(0,1)-criterion” (the top-left-most point of the curve) and the Youden index are the two methods used to automatically determine the optimal threshold point [ 43 ].
When conducting the ROC analysis, the area under the curve (AUC) is also determined: a measure of how well the MCID threshold discriminates responders from non-responders in general. AUC values range from 0 to 1. An AUC of 0.5 signifies that the score discriminates no better than random chance, whereas a value of 1 means that the score perfectly discriminates between responders and non-responders. In the literature, an AUC of ≥ 0.7 to < 0.8 is deemed fair (acceptable), ≥ 0.8 to < 0.9 is considered good, and values ≥ 0.9 are considered excellent [ 44 ]. Calculating the AUC provides a rough estimate of how well the chosen MCID threshold performs. The corresponding R-Code formula is described in Step 5c of Supplementary Content 1 . The statistical package pROC was used. The calculated MCID of ZCQ Symptom Severity was 1.5 for both the NRS Leg Pain and the JOABPEQ Walking Ability anchors.
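A minimal sketch of the threshold search (hypothetical data, plain Python rather than the pROC package used in the supplement): every observed change score is tried as a candidate MCID, and the one maximizing the Youden index (sensitivity + specificity − 1) is kept.

```python
# ROC-based MCID (Method III): pick the change-score threshold that
# maximizes the Youden index. Synthetic change scores and anchor labels.
changes = [8, 1, 9, 0, 7, 2, 6, 1]
responder = [1, 0, 1, 0, 1, 0, 1, 0]  # anchor classification

def youden(threshold):
    tp = sum(1 for c, r in zip(changes, responder) if r and c >= threshold)
    fn = sum(1 for c, r in zip(changes, responder) if r and c < threshold)
    tn = sum(1 for c, r in zip(changes, responder) if not r and c < threshold)
    fp = sum(1 for c, r in zip(changes, responder) if not r and c >= threshold)
    return tp / (tp + fn) + tn / (tn + fp) - 1  # sensitivity + specificity - 1

mcid_roc = max(sorted(set(changes)), key=youden)
print(mcid_roc)  # → 6
```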
Calculation of MCID using the distribution-based approach focuses on the statistical properties of the dataset [ 7 , 14 , 16 , 27 , 45 ]. These methods are objective, easy to calculate, and in some cases yield values close to anchor-based MCIDs. The advantage of this approach is that it does not rely on any external criterion, nor does it require additional studies on previously established MCIDs or other validated “gold standard” questionnaires for the specific disease in each clinical setting. However, it fails to include the patient’s perspective of a clinically meaningful change, which will be discussed later in this study. In this sense, distribution-based methods focus on finding MCID thresholds that enable mathematical distinction between a changed and an unchanged score, whereas anchor-based methods focus on finding MCID thresholds that represent a patient-centered, meaningful improvement.
Method (IV): Calculating MCID using the standard error of measurement (SEM)
The standard error of measurement conceptualizes the reliability of the outcome measure by determining how repeated measurements of an outcome may differ from the “true score”. A greater SEM equates to lower reliability, suggesting meaningful inconsistencies in the values produced by the outcome instrument despite similar measuring conditions. Hence, it has been theorized that 1 SEM is equal to the MCID, because a change score ≥ 1 SEM is unlikely to be due to measurement error and is therefore also more likely to be clinically meaningful [ 46 , 47 ]. The following formula is used, where SD is the standard deviation of the baseline scores and ICC is the intraclass correlation coefficient: SEM = SD × √(1 − ICC) [ 1 , 7 , 35 , 46 , 48 ].
The ICC, also called the reliability coefficient, signifies the level of agreement or consistency between measurements taken on different occasions or by different raters [ 49 ]. There are various ways of calculating the ICC depending on the model used, with values < 0.5, 0.5–0.75, 0.75–0.9 and > 0.90 indicating poor, moderate, good and excellent reliability, respectively [ 49 ]. While a value of 1 × SEM is probably the most established way to calculate the MCID, a range of multiplication factors for SEM-based MCIDs have been used in the literature, including 1.96 SEM or even 2.77 SEM, to identify a more specific threshold for improvement [ 48 , 50 ]. The corresponding R-Code formula is described in Step 6a of Supplementary Content 1 . The chosen ZCQ Symptom Severity ICC was 0.81 [ 51 ]. The SEM-based MCID was 1.9.
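The SEM-based calculation can be sketched in a few lines of Python (the baseline SD below is hypothetical; only the ICC of 0.81 comes from the cited source):

```python
# SEM-based MCID (Method IV): SEM = SD_baseline * sqrt(1 - ICC), with
# 1 x SEM taken as the MCID. SD below is hypothetical; ICC from [51].
import math

sd_baseline = 4.4   # hypothetical SD of baseline scores
icc = 0.81          # ZCQ Symptom Severity reliability coefficient

sem = sd_baseline * math.sqrt(1 - icc)
mcid_sem = 1 * sem  # 1.96 or 2.77 x SEM are alternative multipliers
print(round(mcid_sem, 2))  # → 1.92
```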
Method (V): Calculating MCID using effect size (ES)
Effect size (ES) is a standardized measure of the strength of the relationship or difference between two variables [ 52 ]. It is described by Cohen et al . as the “degree to which the null hypothesis (there is no difference between the two groups) is false”. It allows direct comparison between studies of different instruments with different units. There are multiple ways to calculate the ES, but for the purpose of MCID calculations, the ES represents the number of SDs by which the post-intervention score has changed from the baseline score. It is calculated by dividing the average change score by the SD of the baseline score: ES = (mean change score) / SD(baseline) [ 52 ].
According to Cohen et al ., 0.2 is considered a small ES, 0.5 a moderate ES, and 0.8 or more a large ES [ 53 ]. Most commonly, a change score with an ES of 0.2 is considered equivalent to the MCID [ 7 , 16 , 31 , 54 , 55 , 56 ]. Using this method, we are essentially identifying the mean change score (in this case reflecting the MCID) that equates to an ES of 0.2: MCID = 0.2 × SD(baseline) [ 7 , 55 ].
Practically, if a patient experienced a small improvement in an outcome measure post-intervention, the ES will be smaller than for a patient who experienced a large improvement in the outcome measure. The corresponding R-Code formula is described in Step 6b of Supplementary Content 1 . The ES-based MCID was 0.9.
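As a sketch (the SD value is hypothetical, not the study's), the ES-based MCID is simply 0.2 times the baseline SD:

```python
# ES-based MCID (Method V): the change score for which
# ES = change / SD_baseline equals 0.2, i.e. MCID = 0.2 * SD_baseline.
sd_baseline = 4.5   # hypothetical SD of baseline scores
mcid_es = 0.2 * sd_baseline
print(round(mcid_es, 2))  # → 0.9
```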
Method (VI): Calculating MCID using the standardized response mean (SRM)
The Standardized Response Mean (SRM) aims to gauge the responsiveness of an outcome, similarly to the ES. Initially described by Cohen et al . as a derivative of the ES assessing differences of paired observations in a single sample and later renamed the SRM, it is also considered an “index of responsiveness” [ 38 , 53 ]. However, the denominator is the SD of the change scores rather than the SD of the baseline scores, while the numerator remains the average change score from baseline to follow-up: SRM = (mean change score) / SD(change scores) [ 10 , 45 , 57 , 58 , 59 ].
Similarly to Cohen’s rule for interpreting the ES, it has been theorized that responsiveness can be considered low if the SRM is 0.2–0.5, moderate if > 0.5–0.8 and large if > 0.8 [ 58 , 59 , 60 ]. Again, a change score equating to an SRM of 0.2 (although SRMs of 1/3 or 0.5 have also been proposed) can be considered the MCID, although studies have used the overall SRM as the MCID as well [ 45 , 54 , 56 , 61 ]. However, since the SRM is a standardized index, similarly to the ES, the aim of the SRM-based method ought to be to identify a change score that indicates a responsiveness of 0.2: MCID = 0.2 × SD(change scores) [ 61 ].
Similar to the ES-based method, the SRM-based approach for calculating the MCID is not commonly used in spine surgery studies [ 14 ]. It is a measure of responsiveness, i.e. the ability to detect change over time in the construct measured by the instrument, and ought therefore to be calculated for the study-specific change score rather than extrapolated as a “universal” MCID threshold to other studies. The corresponding R-Code formula is described in Step 6c of Supplementary Content 1 . The SRM-based MCID was 0.8.
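A hedged Python sketch with hypothetical change scores; the only difference from the ES-based method is that the SD of the change scores is used:

```python
# SRM-based MCID (Method VI): MCID = 0.2 * SD of the change scores.
# Change scores below are hypothetical.
import statistics

changes = [8, 1, 9, 1, 9, 1, 5, 2]
sd_change = statistics.stdev(changes)               # sample SD of change scores
srm_overall = statistics.mean(changes) / sd_change  # cohort-level SRM
mcid_srm = 0.2 * sd_change
print(round(mcid_srm, 2))  # → 0.74
```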
The limitations of using Methods (V) and (VI) in MCID calculations will be described later in the Discussion.
Method (VII): Calculating MCID using the standard deviation (SD)
The standard deviation represents the average spread of individual data points around the mean value of the outcome measure. Norman et al ., in their review of studies using MCIDs in health-related quality of life instruments, found that most studies had an average ES of 0.5, which equated to a clinically meaningful change score of 0.5 × SD of the baseline score [ 7 , 16 , 30 ].
The corresponding R-Code formula is described in Step 6d of Supplementary Content 1 . The SD-based MCID was 2.1.
Method (VIII): Calculating MCID using the minimum detectable change (MDC)
The MDC is defined as the minimal change below which there is a 95% chance that the observed change is due to measurement error of the outcome measurement instrument: MDC95 = z × √2 × SEM [ 7 , 61 ].
The value z corresponds to the desired level of confidence; for a 95% confidence level, z = 1.96. Although the MDC, like all distribution-based methods, does not consider whether a change is clinically meaningful, the calculated MCID should be at least equal to or greater than the MDC to enable distinguishing true mathematical change from measurement noise. The 95% MDC calculation is the most common distribution-based approach in spinal surgery, and it appears to most closely resemble anchor-derived MCID values, as demonstrated by Copay et al . [ 7 , 14 , 62 ]. The corresponding R-Code formula is described in Step 6e of Supplementary Content 1 . The 95% MDC was 5.1.
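The 95% MDC follows directly from the SEM; the sketch below uses a hypothetical SEM value:

```python
# 95% MDC (Method VIII): MDC95 = z * sqrt(2) * SEM, with z = 1.96 for a
# 95% confidence level. The SEM value is hypothetical.
import math

sem = 1.9
mdc95 = 1.96 * math.sqrt(2) * sem
print(round(mdc95, 1))  # → 5.3
```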
Method (IX): Calculating MCID using the reliable change index (RCI)
Another, less frequently applied method by which “responders” and “non-responders” can be classified without relying on an external criterion is the Reliable Change Index (RCI), also called the Jacobson–Truax index [ 63 , 64 ]. It indicates whether an individual change score is statistically significantly greater than a change in score that could have occurred due to random measurement error alone [ 63 ].
In theory, a patient can be considered to have experienced a statistically reliable improvement ( p < 0.05) if the individual RCI is > 1.96. Again, this does not reflect whether the change is clinically meaningful for the patient, but rather that the change should not be attributed to measurement error alone and likely has a component of true score change. Therefore, this method is discouraged in MCID calculations, as it relies on the statistical properties of the sample and not on patient preferences, as do all distribution-based methods [ 65 ]. In the example of Bolton et al ., who focused on the Bournemouth Questionnaire in patients with neck pain, the RCI was used to discriminate between “responders” and “non-responders”, and the ROC analysis approach was then used to determine the MCID [ 64 ]. The corresponding R-Code formula is described in Step 6f of Supplementary Content 1 . Again, the pROC package was used. The ROC-derived MCID was 2.5.
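A minimal sketch of the Jacobson–Truax index for a single patient (all values hypothetical): the individual change is divided by the standard error of the difference, √2 × SEM.

```python
# Reliable Change Index (Method IX): RCI = (follow-up - baseline) / SEdiff,
# where SEdiff = sqrt(2) * SEM. |RCI| > 1.96 implies a statistically
# reliable change (p < 0.05). Values are hypothetical.
import math

sem = 1.9
se_diff = math.sqrt(2) * sem
baseline, followup = 30, 22
rci = (followup - baseline) / se_diff
reliable = abs(rci) > 1.96
print(round(rci, 2), reliable)  # → -2.98 True
```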
Method (X): Calculating MCID through the anchor-based minimal important change (MIC) distribution model
In theory, combining anchor- and distribution-based methods could yield superior results. Suggestions include averaging the values of various methods, or requiring two different criteria to be met simultaneously (e.g. both an anchor-based criterion such as a ROC-based MCID from patient satisfaction and a 95% MDC-based MCID must be met to consider a patient as having achieved the MCID) [ 25 ]. In 2007, de Vet et al . introduced a new visual method of MCID calculation that does not merely combine but integrates both anchor- and distribution-based calculations [ 25 ]. In addition, their method allows the calculation of an MCID both for improvement and for deterioration, as these can differ.
In short, using an anchor, patients were divided into three groups: “importantly improved”, “not importantly changed” and “importantly deteriorated” (Fig. 2 ). The distributions of change scores, expressed in percentiles, of these three groups were then plotted on a graph. This is the anchor-based part of the approach, ensuring that the chosen MCID thresholds have clinical value.
Fig. 2 Distribution of the Zurich Claudication Questionnaire Symptom Severity change scores for patients categorized as experiencing “important improvement”, “no important change” or “important deterioration” in the JOABPEQ Walking Ability anchor (Method (X)). For the ZCQ Symptom Severity score to improve, the actual value must decrease, explaining the negative values in the model. ROC , Receiver Operating Characteristic; ZCQ , Zurich Claudication Questionnaire; JOABPEQ , Japanese Orthopaedic Association Back Pain Evaluation Questionnaire
The second part of the approach is then entirely focused on the group of patients determined by the anchor to be “unchanged”, and can be either distribution- or anchor-based:
In the first and more anchor-based method, the ROC-based method described in Method (III) is applied to find the threshold for improvement (by finding the ROC-based threshold point that optimizes sensitivity and specificity of identifying improved vs unchanged patients) or for deterioration (by finding the ROC-based threshold point that optimizes sensitivity and specificity of identifying deteriorated vs unchanged patients). For example, the threshold for improvement is found by combining the improved and unchanged groups, and then testing out different thresholds for discriminating those two groups from each other. The optimal point on the resulting ROC curve based on the closest-to-(0,1)-criterion is then found.
In the second, distribution-based method, the upper 95% limit (for improvement) and the lower 95% limit (for deterioration) are determined based solely on the group of patients classified as unchanged. The following formula is used: upper/lower 95% limit = mean change score of the “unchanged” group ± 1.645 × SD of that group’s change scores, adding 1.645 × SD for one limit and subtracting it for the other [ 25 ]
The corresponding R-Code formula can be found under Step 7a in Supplementary Content 1 . The model is presented in Fig. 2 . The 95% upper and lower limits were 4.1 and − 7.2, respectively. The ROC-derived MCID was − 2.5 (important improvement vs unchanged) and − 0.5 (important deterioration vs unchanged). For the purpose of the model, MCID values were not multiplied by − 1 but remained in their original form.
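The distribution-based part of the model can be sketched as follows (hypothetical change scores for the anchor-defined "unchanged" group; not the study data):

```python
# Distribution-based limits in the de Vet et al. model (Method X):
# upper/lower 95% limit = mean +/- 1.645 * SD of the change scores of the
# anchor-defined "unchanged" group. Change scores are hypothetical.
import statistics

unchanged_changes = [1, -2, 0, 3, -1, 2, -3, 1]
mean = statistics.mean(unchanged_changes)
sd = statistics.stdev(unchanged_changes)
upper95 = mean + 1.645 * sd   # upper 95% limit of the unchanged group
lower95 = mean - 1.645 * sd   # lower 95% limit of the unchanged group
print(round(upper95, 2), round(lower95, 2))  # → 3.47 -3.22
```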
In recent years, a simple 30% reduction from baseline values has been introduced as an alternative to MCID calculations [ 66 ]. It has been speculated that absolute point changes are difficult to interpret and have limited value in the context of “ceiling” and “floor” effects (i.e. values at the extreme ends of the measurement scale) [ 4 ]. To overcome this, Khan et al . found that a 30% reduction in PROMs is similarly effective to traditional anchor- or distribution-based methods in detecting patients with clinically meaningful differences after lumbar spine surgery [ 15 ]. The corresponding R-Code formula can be found under Step 7b in Supplementary Content 1 .
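The 30%-reduction rule amounts to a one-line responder criterion; a hedged sketch (hypothetical scores, assuming a scale where lower values are better):

```python
# 30%-reduction rule: a patient is a "responder" if the score improved by
# at least 30% of the baseline value (lower score = better here).
def responder_30pct(baseline, followup):
    return (baseline - followup) >= 0.3 * baseline

print(responder_30pct(30, 20), responder_30pct(30, 25))  # → True False
```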
The Delphi method is a systematic approach that uses the collective opinion of experts to establish consensus on a medical issue [ 67 ]. It has mostly been used to develop best-practice guidelines [ 68 ], but it can also be used to aid MCID determination [ 69 ]. The method involves distributing questionnaires or surveys to a panel of experts. The anonymized answers are grouped together and shared again with the panel in subsequent rounds, allowing the experts to reflect on their opinions and consider the strengths and weaknesses of the others’ responses. The process is repeated until consensus is reached. By ensuring anonymity, the method prevents any potential bias linked to a specific participant’s concern about their own opinion being viewed or influenced by other personal factors [ 67 ].
The final approach is asking patients to compare themselves to other patients, which requires time and resources [ 70 ]. In a study by Redelmeier et al ., patients with chronic obstructive pulmonary disease in a rehabilitation program were organized into small groups and observed each other on multiple occasions [ 70 ]. Additionally, each patient was paired with another participant for a one-to-one interview discussing different aspects of their health. Finally, each patient anonymously rated themselves against their partner on a scale of “much better”, “somewhat better”, “a little bit better”, “about the same”, “a little bit worse”, “somewhat worse” and “much worse”. The MCID was then calculated from the mean change score of patients who graded themselves as “a little bit better” (MCID for improvement) or “a little bit worse” (MCID for deterioration), as in the within-patient and between-patient change methods described in Methods (I) and (II) [ 70 ].
Over the years, it has been noted that MCID calculations based purely on distribution-based methods, or only on the group of patients rating themselves as “somewhat better” or “slightly better”, do not necessarily constitute a change that patients would consider beneficial enough “to mandate, in the absence of troublesome side effects and excessive cost, to undergo the treatment again” [ 3 , 24 ]. Therefore, the concept of substantial clinical benefit (SCB) has been introduced as a way of identifying a threshold of clinical success of an intervention, rather than a “floor” value for improvement (that is, the MCID) [ 24 ]. For example, in Carreon et al ., ROC-derived SCB “thresholds” were defined as a change score with equal sensitivity and specificity to distinguish “much better” from “somewhat better” patients after cervical spinal fusion [ 71 ]. Glassman et al ., on the other hand, used ROC-derived SCB thresholds to discriminate between “much better” and “about the same” patients following lumbar spinal fusion. The authors stress that SCB and MCID are indeed separate entities, and one should not be used to derive the other [ 24 ]. Thus, while the methods to derive SCB and MCID thresholds can be carried out similarly based on anchors, the ultimate goals of applying the SCB versus the MCID are different.
Using the various methods explained above, the MCID for improvement for the ZCQ Symptom Severity domain ranged overall from 0.8 to 5.1 (Table 1 ). Here, readers can check their own results for correctness. On average, distribution-based MCID values were lower than anchor-based MCID values. Within the distribution-based approach, Method (VIII), “minimum detectable change”, resulted in an MCID of 5.1, which exceeded the MCIDs derived using the “gold-standard” anchor-based approaches. The average MCID based on the NRS Leg Pain and JOABPEQ Walking Ability anchors was 3.1 and 2.8, respectively. Depending on the method used, the percentage of responders to the HE and PT interventions ranged from 9.5% for the “30% reduction from baseline” method to 61.9% using the ES- and SRM-based methods (Table 2 ). Method (X) is graphically presented in Fig. 2 .
As demonstrated above, the MCID is dependent upon the methodology and the chosen anchor, highlighting the necessity for careful preparation in MCID calculations. The lowest MCID, 0.8, was calculated with Method (VI), the SRM-based approach. Logically, if a patient on average had a baseline ZCQ Symptom Severity score of 23.2, an improvement of 0.8 is unlikely to be clinically meaningful, even if rounded up; it rather informs on the measurement error properties of our instrument, as explained by COSMIN. Additionally, the distribution-based methods rely on statistical properties of the sample, which vary from cohort to cohort, making them generalizable only to patient groups with a similar SD and not applicable to others with a different spread of data [ 52 ]. Not surprisingly, anchor-based methods considering patient preferences yielded on average higher MCID values than distribution-based methods, which again varied from anchor to anchor. The mean MCID for improvement calculated for NRS Leg Pain was 3.1, while for JOABPEQ Walking Ability it was 2.8; such similar values underscore the importance of selecting responsive anchors with at least moderate correlations. Despite assessing different aspects of LSS disease, the MCID remained comparable in this specific case.
Interestingly, Method (VIII), the MDC, yielded the highest value of 5.1, exceeding the “gold-standard” ROC-derived MCID. This suggests that, in this example, using this ROC-derived MCID in clinical practice would be illogical, as the value falls within the measurement error determined by the MDC. Here it would be appropriate to choose the MDC approach as the MCID. Interestingly, the ROC-derived MCID based on a Global Assessment Rating-like stratification of patients by their JOABPEQ Walking Ability (Method (X)) was higher than that in Method (III). This may be attributed to a more balanced distribution of “responders” and “non-responders” (only unchanged patients) in Method (X), unlike in Method (III), where patients were strictly categorized into “responders” and “non-responders” (including both deteriorated and unchanged). This further highlights the importance of using global assessment rating type scales in determining the extent of clinical benefit.
Although ES-based (Method (V)) and SRM-based (Method (VI)) MCID calculations have been described in the literature, the ES and SRM were originally created to quantify the strength of the relationship between the scores of two samples (in the case of the ES) and the change score of paired observations in one sample (in the case of the SRM) [ 53 , 58 , 59 ]. They do offer an alternative to MCID calculations; however, verification with other MCID calculation methods, ideally anchor-based, is strongly recommended. As seen in this case study and in other MCIDs derived similarly, they often result in small estimates [ 7 , 55 ]. There is also no consensus regarding the choice of the SD of the change scores vs. the SD of the baseline scores as the denominator. Additionally, whether the calculated MCID (mean change score) should correspond to an ES of 0.2, indicating a small effect, or 0.5, suggesting a moderate effect, is currently arbitrary and often relies on the researcher’s preference [ 53 , 55 , 59 ]. Both the ES and SRM can be used to assess whether the overall change score observed in a single study is suggestive of a clinically meaningful benefit in that specific cohort or, in the case of the SRM, whether the outcome measure is responsive. However, it is our perspective that extending such a value as an “MCID” from one study to another is not recommended.
One can argue whether there is even a place for distribution-based methods in MCID calculations. They ultimately fail to provide an MCID value that meets the original definition of Jaeschke et al . of the “smallest change in the outcome that the patient would identify as important”: at no point are patients asked about what constitutes a meaningful change for them, and the value is derived solely from the statistical properties of the sample [ 1 ]. Nevertheless, conducting MCID studies implementing scales such as Global Assessment Ratings is time-consuming, and performing such studies for each patient outcome and each disease is likely not feasible. Distribution-based methods still have some merit in that they, like the 95% MDC method, can help distinguish measurement noise and inaccuracy from true change. Even if anchor-based methods should probably be used to define MCID thresholds, they ought to be supported by a calculation of the MDC, so that it can be decided whether the chosen threshold makes sense mathematically (i.e., can reliably be distinguished from measurement inaccuracies), as seen in our case study.
Previously, MCID thresholds for outcome measurement instruments were calculated for generic populations, such as patients suffering from low back pain. More recently, MCID values for commonly used PROMs in spine surgery, such as the ODI, RMDQ or NRS, have been calculated for more narrowly defined diagnoses, such as lumbar disc herniation (LDH) or LSS. The question arises as to whether a separate MCID is needed for each of the different spinal conditions. In general, establishing an MCID specific to these patient groups is only recommended if these patients’ perception of meaningful change differs from that of low back pain patients in general. Importantly, again, the MCID should not be treatment-specific, but rather broadly disease-specific. Therefore, it is advisable to use an MCID based on patients with disease characteristics most similar to our cohort. For example, an MCID for NRS Back Pain based on a study group composed of different types of lumbar degenerative disease may, in some cases, be applied to a study cohort composed solely of patients with LDH. However, no such extrapolation should be performed for populations with back pain secondary to malignancy, due to a totally different pathogenesis and associated symptoms, such as fatigue or anorexia, that may influence the ability to detect a clinically meaningful change in NRS Back Pain.
Regardless of how robust the methodology, it can be expected that it is impossible to obtain the same MCID on different occasions even in the same population, due to the inherent subjectivity of what is perceived as “clinically beneficial” and day-to-day symptom fluctuation. Moreover, it has been found that patients with worse baseline scores, reflecting e.g. more advanced disease, require a greater overall change at follow-up to report it as clinically meaningful [ 72 ]. One should also be mindful of “regression to the mean”, whereby extremely high- or low-scoring patients subsequently score closer to the mean at a second measurement [ 73 ]. Therefore, adequate cohort characteristics need to be presented for readers to judge how generalizable the MCID may be to their study cohort. If a patient pre-operatively experiences an NRS Leg Pain of 1 and the MCID is 1.6, they cannot achieve the MCID at all, as the maximum possible change score is smaller than the MCID threshold (“floor effect”). A similar situation can occur with patients closer to the higher end of the scale (“ceiling effect”). The general rule is that if at least 15% of the study cohort has the highest or lowest possible score on a given outcome instrument, significant “ceiling/floor effects” can be expected [ 50 ]. One way to overcome this is to transform absolute MCID scores into percentage change scores [ 4 , 45 ]. However, percentage change scores only account for high baseline scores if high baseline scores indicate larger disability (as seen with the ODI) and allow for a larger change. If a high score on an instrument reflects better health status (as seen with the SF-36), then percentage change scores will increase the association with the baseline score [ 4 ].
In general, it is important to consider which patients to exclude from certain analyses when applying the MCID: for example, patients without relevant disease preoperatively (such as those exhibiting so-called “patient-accepted symptom states”, PASS) should probably be excluded altogether when reporting the percentage of patients achieving the MCID [ 74 ].
Establishing reliable MCID thresholds is key in clinical research and forms the basis of patient-centered treatment evaluations using patient-reported outcome measures or objective functional tests. Calculation of MCID thresholds can be achieved using a variety of methods, each yielding markedly different results, as demonstrated in this practical guide. Generally, anchor-based methods relying on scales assessing patient preferences/satisfaction or global assessment ratings continue to be the “gold-standard” approach, the most common being ROC analysis. In the absence of appropriate anchors, a distribution-based MCID based on the 95% MDC approach is acceptable, as it appears to yield the most similar results to anchor-based approaches. Moreover, we recommend using it as a supplement to any anchor-based MCID threshold, to check whether that threshold can reliably distinguish true change from measurement inaccuracies. The explanations provided in this practical guide, with step-by-step examples along with public data and statistical code, can serve as guidance for future studies calculating MCID thresholds.
Jaeschke R, Singer J, Guyatt GH (1989) Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials 10:407–415. https://doi.org/10.1016/0197-2456(89)90005-6
Concato J, Hartigan JA (2016) P values: from suggestion to superstition. J Investig Med 64:1166. https://doi.org/10.1136/jim-2016-000206
Zannikos S, Lee L, Smith HE (2014) Minimum clinically important difference and substantial clinical benefit: Does one size fit all diagnoses and patients? Semin Spine Surg 26:8–11. https://doi.org/10.1053/j.semss.2013.07.004
Copay AG, Subach BR, Glassman SD et al (2007) Understanding the minimum clinically important difference: a review of concepts and methods. Spine J 7:541–546. https://doi.org/10.1016/j.spinee.2007.01.008
Lanario J, Hyland M, Menzies-Gow A et al (2020) Is the minimally clinically important difference (MCID) fit for purpose? a planned study using the SAQ. Euro Respirat J. https://doi.org/10.1183/13993003.congress-2020.2241
Neely JG, Karni RJ, Engel SH, Fraley PL, Nussenbaum B, Paniello RC (2007) Practical guides to understanding sample size and minimal clinically important difference (MCID). Otolaryngol Head Neck Surg 136(1):14–18. https://doi.org/10.1016/j.otohns.2006.11.001
Copay AG, Glassman SD, Subach BR et al (2008) Minimum clinically important difference in lumbar spine surgery patients: a choice of methods using the Oswestry disability index, medical outcomes study questionnaire short form 36, and pain scales. Spine J 8:968–974. https://doi.org/10.1016/j.spinee.2007.11.006