  • Review article
  • Open access
  • Published: 19 January 2024

A meta systematic review of artificial intelligence in higher education: a call for increased ethics, collaboration, and rigour

  • Melissa Bond (ORCID: orcid.org/0000-0002-8267-031X) 1,2,3
  • Hassan Khosravi 4
  • Maarten De Laat 5
  • Nina Bergdahl 6,7
  • Violeta Negrea 3
  • Emily Oxley 3
  • Phuong Pham 5
  • Sin Wang Chong 3,8
  • George Siemens 5

International Journal of Educational Technology in Higher Education, volume 21, Article number: 4 (2024)

26k Accesses

35 Citations

107 Altmetric


Although the field of Artificial Intelligence in Education (AIEd) has a substantial history as a research domain, never before has the rapid evolution of AI applications in education sparked such prominent public discourse. Given the already rapidly growing AIEd literature base in higher education, now is the time to ensure that the field has a solid research and conceptual grounding. This review of reviews is the first comprehensive meta review to explore the scope and nature of AIEd in higher education (AIHEd) research, by synthesising secondary research (e.g., systematic reviews) indexed in the Web of Science, Scopus, ERIC, EBSCOHost, IEEE Xplore, ScienceDirect and ACM Digital Library, or captured through snowballing in OpenAlex, ResearchGate and Google Scholar. Reviews were included if they synthesised applications of AI solely in formal higher or continuing education, were published in English between 2018 and July 2023, were journal articles or full conference papers, and had a method section. In total, 66 publications were included for data extraction and synthesis in EPPI Reviewer; these were predominantly systematic reviews (66.7%), published by authors from North America (27.3%) and conducted in teams (89.4%), mostly in domestic-only collaborations (71.2%). Findings show that these reviews mostly focused on AIHEd generally (47.0%) or Profiling and Prediction (28.8%) as thematic foci; however, key findings indicated a predominance of the use of Adaptive Systems and Personalisation in higher education. The research gaps identified suggest a need for greater ethical, methodological, and contextual considerations within future research, alongside interdisciplinary approaches to AIHEd application. Suggestions are provided to guide future primary and secondary research.

Introduction

Artificial Intelligence (AI) has existed since the 1960s, and its adoption in education, particularly through the early introduction of intelligent tutoring systems, has become a substantive research domain (AIEd). Despite the growing realisation of the potential for AI within education, influenced by evidence-based education policy, including education departments and international organisations (e.g., OECD, 2021 ), it has arguably only now transitioned from work in labs to active practice in classrooms, and broken through the veil of public discourse. The introduction of ChatGPT Footnote 1 and DALL-E, Footnote 2 for example, has captured the imagination and shocked in equal measure (Bozkurt et al., 2023 ), requiring schools, universities, and organisations to respond to generative AI’s growing capabilities, with increasing numbers of publicly available AI chatbots on the horizon (e.g., Google’s Bard Footnote 3 and LLaMA Footnote 4 ). The uptake of these tools has given rise to a debate in education about readiness, ethics, trust, impact and the value-add of AI, as well as the need for governance, regulation, research and training to cope with the speed and scale at which AI is transforming teaching and learning. Globally, governments are putting measures in place to respond to this unfolding phenomenon: in Europe, the European Union introduced the EU AI Act, claimed to be the world’s first comprehensive AI law; Footnote 5 Australia established a taskforce to outline a framework for generative artificial intelligence in schools; Footnote 6 and in the United States, the Department of Education has called for an AI bill of rights to develop a comprehensive approach towards the adoption of AI in education. Footnote 7 Needless to say, it is important that these actions are based on a solid foundation of research and conceptual grounding. Even though there is a vibrant AIEd research community, much of this foundational work is still in development.
This tertiary review, Footnote 8 which is the first of its kind in AIEd, provides the foundation for future conceptualisation and utilisation of AI in higher education.

Contribution of this review

Whilst evidence synthesis is a welcome approach to gaining insight into effective applications of AI in education, there is a risk of ‘research waste’ in every field of research due to a duplication of efforts, by conducting reviews on the same or similar topics (Grainger et al., 2020 ; Siontis & Ioannidis, 2018 ). This can occur when researchers do not give enough consideration to work that has already been published, costing valuable time, effort, and money (Robinson et al., 2021 ). In order to help avoid research waste, and to map the state of the AIEd field in higher education (AIHEd), this review is the first to undertake a tertiary review approach (Kitchenham et al., 2009 ). A tertiary review is a type of research that synthesises evidence from secondary studies, such as systematic reviews, and is sometimes known as a review of reviews or as an overview (Sutton et al., 2019 ). This method allows researchers to gain an overarching meta view of a field through a systematic process, identifying and analysing types of evidence and key characteristics, exploring how research has been conducted, and identifying gaps in the literature to better guide future field development (Polanin et al., 2017 ). Given the current interest around the uptake of generative AI, now is the perfect time to take stock of where we have been, in order to provide suggestions for where we might go in the future.

Research questions

Against this background, the following research question and sub questions guide this review:

What is the nature and scope of AIEd evidence synthesis in higher education (AIHEd)?

What kinds of evidence syntheses are being conducted?

In which conference proceedings and academic journals are AIHEd evidence syntheses published?

What is the geographical distribution of authorship and authors’ affiliations?

How collaborative is AIHEd evidence synthesis?

What technology is being used to conduct AIHEd evidence synthesis?

What is the quality of evidence synthesis exploring AIHEd?

What main applications are explored in AIHEd secondary research?

What are the key findings of AIHEd research?

What are the benefits and challenges reported within AIHEd reviews?

What research gaps have been identified in AIHEd secondary research?

Literature review

Artificial intelligence in education (AIEd)

The evolution of AIEd can be traced back several decades, exhibiting a rich history of intertwining educational theory and emergent technology (Doroudi, 2022 ). As the field matured through the 1990s and into the 2000s, research began to diversify and deepen, exploring varied facets of AIEd such as intelligent tutoring systems (Woolf, 2010 ), adaptive learning environments (Desmarais & Baker, 2012 ) as well as supporting collaborative learning environments (Dillenbourg & Jermann, 2007 ). In the last decade, the synergies between AI technologies and educational practices have further intensified, propelled by advancements in machine learning, natural language processing, and cognitive computing. This era explored innovative applications, including chatbots for student engagement, automated grading and feedback, predictive analytics for student success, and various adaptive platforms for personalised learning. Yet, amid the technological strides, researchers also continued to grapple with persistent challenges and new dilemmas such as ensuring ethical use (Holmes et al., 2021 ), enhancing system transparency and explainability (Khosravi et al., 2022 ), and navigating the pedagogical implications of increasingly autonomous AI systems in educational settings (Han et al., 2023 ).

In order to gain further understanding of the applications of AI in higher education, and to provide guidance to the field, Zawacki-Richter et al. ( 2019 ) developed a typology (see Fig. 1), classifying research into four broad areas: Profiling and Prediction, Intelligent Tutoring Systems, Assessment and Evaluation, and Adaptive Systems and Personalisation.

Fig. 1. Zawacki-Richter et al.’s ( 2019 ) original AIEd typology

Profiling and Prediction: This domain focuses on employing data-driven approaches to make informed decisions and forecasts regarding students’ academic journeys. It includes using AI to optimise admissions decisions and course scheduling, predict and improve dropout and retention rates, and develop comprehensive student models to evaluate and enhance academic achievement by scrutinising patterns and tendencies in student data.

Intelligent Tutoring Systems (ITS): This domain leverages AI to enrich teaching and learning experiences by providing bespoke instructional interventions. These systems work by teaching course content, diagnosing students’ strengths and weaknesses, offering automated, personalised feedback, curating appropriate learning materials, facilitating meaningful collaboration among learners, and providing insights from the teacher’s perspective to improve pedagogical strategies.

Assessment and Evaluation: This domain focuses on the potential of AI to automate and enhance the evaluative aspects of the educational process. It includes leveraging algorithms for automated grading, providing immediate and tailored feedback to students, meticulously evaluating student understanding and engagement, ensuring academic integrity, and implementing robust mechanisms for the evaluation of teaching methodologies and effectiveness.

Adaptive Systems and Personalisation: This domain explores the use of AI to mould educational experiences that are tailored to individual learners. This involves tailoring course content delivery, recommending personalised content and learning pathways, supporting teachers in enhancing learning design and implementation, utilising academic data to monitor, guide, and support students effectively, and representing knowledge in intuitive and insightful concept maps to facilitate deeper understanding.

Prior AIEd syntheses in higher education

There has been a proliferation of evidence synthesis conducted in the field of EdTech, particularly within the past five years (Zawacki-Richter, 2023 ), with the rising volume of secondary research creating a need for tertiary reviews (e.g., Lai & Bower, 2020 ; Tamim et al., 2011 ). Interest in AIEd has also been increasing (e.g., Chen et al., 2022 ); for example, the first phase of a systematic review of pedagogical agents by Sikström et al. ( 2022 ) included an umbrella review of six reviews and meta-analyses, and Daoudi’s ( 2022 ) review of learning analytics and serious games included at least four literature reviews. Furthermore, according to Google Scholar, Footnote 9 the AIHEd review by Zawacki-Richter et al. ( 2019 ) has been cited 1256 times since it was published, with the article accessed over 215,000 times and appearing six times in written news stories, Footnote 10 indicating wide-ranging public interest in AIHEd.

Prior AIHEd tertiary syntheses have so far also taken place within secondary research (e.g., systematic reviews), rather than as standalone reviews of reviews such as this one. Saghiri et al. ( 2022 ), for example, included an analysis of four systematic reviews in their scoping review of AI applications in dental education, de Oliveira et al. ( 2021 ) included eight reviews in their systematic review of educational data mining for recommender systems, and Sapci and Sapci ( 2020 ) included five reviews in their systematic review of medical education. However, when synthesising both primary and secondary studies within a single review, there is a risk of study duplication, and authors need to be particularly careful to ensure that a primary study identified for inclusion is not also included in one of the secondary studies, so that the results presented are accurate and the review is conducted to a high quality.

Evidence synthesis methods

Literature reviews (or narrative reviews) are the most commonly known form of secondary research; however, a range of evidence synthesis methods have increasingly emerged, particularly from the field of health care. In fact, Sutton et al. ( 2019 ) identified 48 different review types, which they classified into seven review families (see Table  1 ). Although part of the traditional review family, literature reviews have increasingly been influenced by the move to more systematic approaches, with many now including method sections, whilst still using the ‘literature review’ moniker (e.g., Alyahyan & Düştegör, 2020 ). Bibliometric analyses have also emerged as a popular form of evidence synthesis (e.g., Linnenluecke et al., 2020 ; Zheng et al., 2022 ), which analyse bibliographic data to explore research trends and impact. Whilst not included in the Sutton et al. ( 2019 ) framework, their ability to provide insight into a field arguably necessitates their inclusion as a valuable form of evidence synthesis.

Evidence synthesis quality

It is crucial that any type of evidence synthesis reports the methods used in complete detail (aside from those categorised in the ‘traditional review family’), to enable trustworthiness and replicability (Chalmers et al., 2023 ; Gough et al., 2012 ). Guidance for synthesis methods has been available for more than a decade (e.g., Moher et al., 2009 ; Rader et al., 2014 ) and is constantly being updated as the methodology advances (e.g., Rethlefsen et al., 2021 ; Tricco et al., 2018 ). However, issues of quality when undertaking evidence synthesis persist. Chalmers et al. ( 2023 ), for example, analysed the quality of 307 reviews in the field of Applied Linguistics against the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) guidelines (Shamseer et al., 2015 ), and found that most of the information expected in any research report was present: background, rationale, objectives and a conclusion. However, only 43% included the search terms used to find studies, 78% included the inclusion/exclusion criteria, 53% explained how studies were selected, and 51% outlined the data collection process.

Another popular quality assessment tool is the Database of Abstracts and Reviews of Effects (DARE) tool (Centre for Reviews and Dissemination, 1995 ), which was used by Kitchenham et al. ( 2009 ) in a computer science tertiary review; a methodology that has since been heavily adopted by researchers across a range of disciplines, including computer science, social sciences, and education. Footnote 11 The authors used the DARE tool to assess the quality of 20 computer science systematic reviews based on four criteria:

Are the review’s inclusion and exclusion criteria described and appropriate?

Is the literature search likely to have covered all relevant studies?

Did the reviewers assess the quality/validity of the included studies?

Were the basic data/studies adequately described?

Kitchenham et al. ( 2009 ) found that, although only 35% of studies scored 2 out of 4 or lower, few assessed the quality of the primary studies that had been included in the review. The average score overall was 2.6 out of 4, increasing in quality across 2004–2007, with a Spearman correlation of 0.51 ( p  < 0.023).

In the field of EdTech, Lai and Bower (2020) conducted a tertiary review by also adopting Kitchenham et al.’s ( 2009 ) quality assessment method, critically analysing 73 reviews to uncover the technologies, themes, general findings, and quality of secondary research that has been conducted. They found that there was very little consistency in how articles were organised, with only six papers (8.2%) explicitly defining quality assessment criteria. The average total quality score was 2.7 out of 4 (SD = 0.59), with only four reviews receiving full marks. There was, however, a slight increase in review quality over time, rising from 2.5 in 2010 to 2.9 in 2018. Likewise, in a tertiary mapping review of 446 EdTech evidence syntheses (Buntins et al., 2023 ), 44% ( n  = 192) provided the full search string, 62% ( n  = 275) included the inclusion/exclusion criteria, 37% ( n  = 163) provided the data extraction coding scheme, and only 26% of systematic reviews conducted a quality assessment. Similar findings were reported in an umbrella review of 576 EdTech reviews (Zawacki-Richter, 2023 ), where 73.4% did not conduct a quality appraisal, and only 8.1% achieved a quality score above 90 (out of 100).

Method

Therefore, in order to map the state of the AIHEd field, explore the quality of evidence synthesis conducted, and with a view to suggesting future primary and secondary research (Sutton et al., 2019 ), a tertiary review was conducted (Kitchenham et al., 2009 ; Lai & Bower, 2020 ), with the reporting guided by the Preferred Reporting Items for Systematic Review and Meta-Analyses (PRISMA; Page et al., 2021 ; see OSF Footnote 12 ) for increased transparency. As with other rigorous forms of evidence synthesis such as systematic reviews (Sutton et al., 2019 ), this tertiary review was conducted using explicit, pre-defined criteria and transparent methods of searching, analysis and reporting (Gough et al., 2012 ; Zawacki-Richter et al., 2020 ). All search information can be found on the OSF. Footnote 13

Search strategy and study selection

The review was conducted using an iterative search strategy and was developed based on a previous review of research on AIHEd (Zawacki-Richter et al., 2019 ) and a tertiary mapping review of methodological approaches to conducting secondary research in the field of EdTech (Buntins et al., 2023 ). The initial search was conducted on 13 October 2022, with subsequent searches conducted until 18 July 2023 to ensure the inclusion of extant literature (see OSF for search details Footnote 14 ). The platforms and databases searched were the Web of Science, Scopus, ERIC, EBSCOHost (all databases), IEEE Xplore, Science Direct and ACM Digital Library, as these have been found particularly useful for evidence synthesis (e.g., Gusenbauer & Haddaway, 2020 ). The OpenAlex platform (Priem et al., 2022 ) was also searched, which indexes approximately 209 million publications, and was accessed through evidence synthesis software EPPI Reviewer version 6 (Thomas et al., 2023 ). This included conducting a citation search, bibliography search and bidirectional checking of citations and recommendations on identified included items. Items were also added manually (see Fig.  3 ) by finding them through ResearchGate or social media throughout the reviewing process until July 2023. Additional searches were conducted in Google Scholar for the terms “artificial intelligence” AND “systematic review” AND “education”, with the first 50 returned result pages (500 items) searched for pertinent literature.

Search string

A search string was developed (see Fig.  2 ) based on the search strings from the two previous reviews (Buntins et al., 2023 ; Zawacki-Richter et al., 2019 ), focusing on forms of AI, formal teaching and learning settings, and variations of evidence synthesis. Whilst some tertiary reviews focus on one form of secondary research (e.g., meta-analyses; Higgins et al., 2012 ), it was decided to include any form of evidence synthesis as the goal of this review was to map the field, irrespective of the secondary research approach used.

Fig. 2. Tertiary review search string
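The authors' actual search string appears in Fig. 2 and is not reproduced here. Purely as an illustration of the block structure described above (synonyms for one concept joined with OR, concept blocks joined with AND), the following Python sketch assembles a boolean query from three hypothetical term lists; the terms are examples, not the review's actual search terms.

```python
# Illustrative sketch only: the review's real search string is given in Fig. 2.
# These term lists are hypothetical examples of the three concept blocks the
# authors describe (forms of AI, formal education settings, evidence synthesis).

ai_terms = ['"artificial intelligence"', '"machine learning"', '"intelligent tutoring"']
education_terms = ['"higher education"', 'university', 'college']
synthesis_terms = ['"systematic review"', '"meta-analysis"', '"literature review"']

def or_block(terms):
    """Join synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"

# Concept blocks are combined with AND, so a record must match at least one
# term from each of the three blocks.
query = " AND ".join(or_block(t) for t in [ai_terms, education_terms, synthesis_terms])
print(query)
```

Each database typically requires minor syntax adjustments (field codes, truncation symbols), which is one reason iterative searching across platforms is needed.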

Inclusion/exclusion criteria and screening

The search strategy yielded 5609 items (see Fig.  3 ), which were exported as .ris or .txt files and imported into the evidence synthesis software EPPI Reviewer (Thomas et al., 2023 ). Following the automatic removal of 449 duplicates within the software, 5160 items remained to be screened on title and abstract, applying the inclusion and exclusion criteria (see Table  2 ). Studies were included if they were a form of secondary research on AI applications within formal education settings, with an explicit method section, and had been published after January 2018. Owing to time and the project scope, studies were only included if they had been published in the English language and were either a peer-reviewed journal article or conference paper. Although reviews have already started being published on the topic of generative AI, and ChatGPT in particular (e.g., İpek et al., 2023 ; Lo, 2023 ), the decision was made to exclude these from this sample, as these AI developments arguably represent the next stage of AI evolution in teaching and learning (Bozkurt & Sharma, 2023 ; Wu et al., 2023 ) (Table  2 ).
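The 449 duplicates were removed automatically within EPPI Reviewer. As a rough sketch of how deduplication of exported bibliographic records typically works (this is not EPPI Reviewer's actual algorithm, and the records below are hypothetical), items can be keyed on DOI where present and on a normalised title otherwise:

```python
def normalise(s):
    """Lowercase and strip non-alphanumerics so trivial formatting
    differences do not hide duplicate titles."""
    return "".join(ch for ch in s.lower() if ch.isalnum())

def deduplicate(records):
    """Keep the first record whose DOI or normalised title has not been
    seen before; later records matching either key are dropped."""
    seen, unique = set(), []
    for rec in records:
        keys = {k for k in (rec.get("doi"), normalise(rec.get("title", ""))) if k}
        if keys & seen:
            continue  # matches an already-seen DOI or title: duplicate
        seen |= keys
        unique.append(rec)
    return unique

# Hypothetical export: one original plus two duplicates.
records = [
    {"doi": "10.1000/x1", "title": "AI in Higher Education"},
    {"title": "AI in higher education"},                       # same title, no DOI
    {"doi": "10.1000/x1", "title": "AI in Higher Education"},  # same DOI
]
print(len(deduplicate(records)))  # 1
```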

Fig. 3. Meta review PRISMA diagram

To ensure inter-rater reliability between members of the research team, following lengthy discussion and agreement on the inclusion and exclusion criteria by all authors, two members of the team (MB and PP) double screened the first 100 items, resulting in almost perfect agreement (Cohen’s k = 0.89) (McHugh, 2012 ). After the two disagreements were reconciled, the remaining 5060 items were screened on title and abstract by the same authors, resulting in 4711 items excluded. To continue ensuring inter-rater reliability at the full-text screening stage (545 studies), three rounds of comparison coding were conducted (50, 30 and 30 items). The same two members of the team (MB and PP), responsible for screening the remaining items, again achieved almost perfect agreement (Cohen’s k = 0.85) (McHugh, 2012 ), with 307 evidence syntheses identified across all education levels for data extraction and synthesis. The reviews focusing solely on higher education (or also continuing education) were then identified ( n  = 66) and are the sole focus of the synthesis in this article. It should be noted that a further 32 reviews were identified that include a focus on higher education in some way (see OSF Footnote 15 ), i.e. their results are combined with other study levels such as K-12, but it was decided not to include them in this article, to ensure that all results pertain to higher education.
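Cohen's kappa, used above to quantify screener agreement, corrects raw percentage agreement for the agreement expected by chance. A minimal sketch of the computation, using short hypothetical include/exclude decision lists rather than the review's actual screening data:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters over the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: probability both raters pick the same label at random,
    # given each rater's own label frequencies (marginals).
    chance = sum((c1[l] / n) * (c2[l] / n) for l in set(rater1) | set(rater2))
    return (observed - chance) / (1 - chance)

# Hypothetical screening decisions from two screeners (not the review's data):
r1 = ["include", "exclude", "exclude", "include", "exclude"]
r2 = ["include", "exclude", "include", "include", "exclude"]
print(round(cohens_kappa(r1, r2), 2))  # 0.62
```

On the commonly cited McHugh (2012) interpretation scale, values above roughly 0.8 indicate almost perfect agreement, which the reported 0.89 and 0.85 both reach.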

Data extraction

The data extracted for this tertiary review were slightly modified from those used by Buntins et al., 2023 and Zawacki-Richter et al. ( 2019 ), and included publication and authorship information (e.g. publication type and name, number of authors, author affiliation), review type (as self-declared by the authors and informed by the typology by Sutton et al., 2019 ), review focus (e.g. AIEd in general or specific type of AI as per Zawacki-Richter et al., 2019 typology), specific educational and participant context (e.g. undergraduates, Health & Welfare), methodological characteristics (e.g. databases used and number of included studies), key findings and research gaps identified (see OSF Footnote 16 for the full coding scheme). All data were extracted manually and input into EPPI Reviewer (Thomas et al., 2023 ), including author affiliations and countries, owing to issues identified in EdTech research with missing metadata in the Web of Science (Bond, 2018 ). Where the author information was not directly provided on either the PDF or the journal website, the code ‘Not mentioned’ was assigned. An initial five studies were coded by all authors, to ensure agreement on the coding scheme, although the key findings and research gaps were coded inductively.

To answer sub-question 1f about the quality of AIHEd secondary research, the decision was made to use the DARE tool (Centre for Reviews and Dissemination, 1995 ), which has been used in previous tertiary reviews (e.g., Kitchenham et al., 2009 ; Tran et al., 2021 ). Although the authors acknowledge the AMSTAR 2 tool as an effective quality assessment tool for systematic reviews (Shea et al., 2017 ), the present review includes any kind of evidence synthesis, as long as it has a method section. Therefore, the decision was made to use a combination of four DARE criteria (indicated by D; as used by Lai & Bower, 2020 ), alongside items from the AMSTAR 2 tool, and further bespoke criteria, as developed by Buntins et al. ( 2023 ):

Are there any research questions, aims or objectives? (AMSTAR 2)

Were inclusion/exclusion criteria reported in the review and are they appropriate? (D)

Are the publication years included defined?

Was the search adequately conducted and likely to have covered all relevant studies? (D)

Was the search string provided in full? (AMSTAR 2)

Do they report inter-rater reliability? (AMSTAR 2)

Was the data extraction coding scheme provided?

Was a quality assessment undertaken? (D)

Are sufficient details provided about the individual included studies? (D)

Is there a reflection on review limitations?

The questions were scored as per the adapted method used by Kitchenham et al. ( 2009 , p. 9) and Tran et al. ( 2021 , Figure S1). The scoring procedure was Yes = 1, Partly = 0.5 and No = 0 (see Fig.  4 ). However, it should be noted that certain types of evidence synthesis do not always need to include a quality assessment (e.g., scoping, traditional literature, and mapping reviews; see Sutton et al., 2019 ), and so these were coded as ‘not applicable’ (N/A) in the coding scheme and scored 1. It should also be noted that the quality appraisal was not used to eliminate studies from the corpus in this case, but rather to answer one of the sub research questions. Instead, a quality indicator was used in the inclusion/exclusion criteria, namely that a review without an identifiable method section would be excluded, as it was reasoned that such reviews were not attempting to be systematic at all. An overall score was determined out of 10, and items were rated as critically low (0–2.5), low (3–4.5), medium (5–7), high (7.5–8.5) or excellent (9–10) quality; a similar approach has been used by other reviews (e.g., Urdaneta-Ponte et al., 2021 ).
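The scoring and banding rules described above can be sketched as follows; the criterion answers in the example are a hypothetical profile, not taken from an included review:

```python
# Scoring per the review: Yes = 1, Partly = 0.5, No = 0, and N/A scores 1
# (for review types where a criterion such as quality assessment does not apply).
SCORE = {"yes": 1.0, "partly": 0.5, "no": 0.0, "n/a": 1.0}

def quality_band(answers):
    """Sum the ten criterion answers and map the total (out of 10) to the
    quality bands used in the review."""
    total = sum(SCORE[a.lower()] for a in answers)
    if total <= 2.5:
        return total, "critically low"
    if total <= 4.5:
        return total, "low"
    if total <= 7.0:
        return total, "medium"
    if total <= 8.5:
        return total, "high"
    return total, "excellent"

# Hypothetical review: 6 yes, 2 partly, 1 no, 1 n/a -> 6 + 1 + 0 + 1 = 8.0
answers = ["yes"] * 6 + ["partly"] * 2 + ["no"] + ["n/a"]
print(quality_band(answers))  # (8.0, 'high')
```

Because answers move in 0.5 steps, every possible total falls into exactly one band despite the apparent gaps between band boundaries (e.g., 2.5 and 3).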

Fig. 4. Quality assessment criteria

In order to answer sub-questions 1g and 1h, the evidence syntheses in the corpus were coded using Zawacki-Richter et al.’s ( 2019 ) typology of Profiling and Prediction, Assessment and Evaluation, Adaptive Systems and Personalisation, and Intelligent Tutoring Systems as a starting point. Studies were coded as ‘General AIEd’ if they claimed to be searching for any applications of AI in education (e.g., Chu et al., 2022 ). It should also be noted that, whilst reviews might have said they were focused on ‘General AIEd’ and were therefore coded as such under ‘Focus of AI review’, their findings might have focused specifically on ‘Assessment and Evaluation’ and ‘Intelligent Tutoring Systems’, which were then coded as such under ‘AI Topics and Key Findings’. For example, Alkhalil et al.’s ( 2021 ) mapping review of big data analytics in higher education was coded as ‘Profiling and Prediction’ and ‘Adaptive Systems and Personalisation’ under ‘Focus of AI review’, but they also discussed the use of big data in evaluating teachers and learning material to aid quality assurance processes, which meant that their results were also coded under ‘Assessment and Evaluation’ in the ‘AI Topics and Key Findings’ section of the data extraction coding tool.

Data synthesis and interactive evidence & gap map development

A narrative synthesis of the data was undertaken (Petticrew & Roberts, 2006 ), including a tabulation of the included studies (see Additional file 1 : Appendix A), in order to provide an overview of the AIHEd field. Further tables are provided throughout the text, or included as appendices, accompanied by narrative descriptions. In order to provide further visual overviews, and to provide publicly accessible resources to the field beyond that which this article can provide, interactive evidence and gap maps were produced for each research question, using the EPPI Mapper application (Digital Solution Foundry & EPPI Centre, 2023 ). To do this, a JSON report of all included studies and associated coding were exported from EPPI Reviewer (Thomas et al., 2023 ) and imported into the EPPI Mapper application, where display options were chosen. The HTML files were then uploaded to the project page and are available to access and download Footnote 17 . An openly accessible web database of the included studies is also available, Footnote 18 which allows users to view the data in an interactive way through crosstabulation and frequency charts, with direct links to included studies, as well as to save and export the data. This was created using the EPPI Visualiser app, which is located within EPPI Reviewer. Footnote 19

Limitations

Whilst every attempt was made to conduct this meta-review as rigorously and transparently as possible, there are some limitations that should be acknowledged. Firstly, the protocol was not pre-registered within an official systematic review repository, such as PROSPERO, as this is not a medical study and is a tertiary review. However, all search information is openly accessible on the OSF, and in future the authors will make use of an organisation such as the International Database of Education Systematic Reviews, which now accepts protocols from any education discipline. Secondly, only the first 500 records in Google Scholar were considered, as opposed to the 1000 records recommended by Haddaway et al. (2015), although OpenAlex was also used to supplement this. Further individual academic journals could also have been manually searched, such as Computers & Education: Artificial Intelligence, as could literature published in languages other than English, in order to reduce language bias (Stern & Kleijnen, 2020). Furthermore, the quality assessment tool that was developed is not perfect, and it could be argued that the distances between yes, partly and no cannot be assumed to be equal. However, the two tools it draws on are widely used, and this approach has been applied in the field previously (e.g., Kitchenham et al., 2009; Tran et al., 2021).

General publication characteristics

Of the 66 evidence syntheses identified that focused solely on AIEd in higher education (AIHEd), the majority were published as journal articles (81.8%, n = 54), as opposed to conference papers (n = 12), but only 67.6% are available open access. Although there has been exponential growth of interest in AIEd (Chen et al., 2022; OECD, 2023), there was a slight reduction in the number published in 2020 before rising again (see Fig. 5), likely due to the impact of the COVID-19 pandemic. It is interesting to note that 12 had already been published in 2023 up to mid-July.

Figure 5: Number of higher education evidence syntheses published by year

Although many reviews synthesised research across multiple settings, there were a small number that focused on AIHEd in specific disciplines or with particular groups of participants, for example Health & Welfare ( n  = 14), STEM ( n  = 4), online or blended learning ( n  = 5), foreign language learning ( n  = 2), pre-service teachers (Salas-Pilco et al., 2022 ), students with disabilities (Fichten et al., 2021 ), and undergraduate students (Lee et al., 2021 ). Six evidence syntheses had a specific geographical focus, with three centred on research conducted within individual countries: India (Algabri et al., 2021 ; Bhattacharjee, 2019 ) and Saudi Arabia (Alotaibi & Alshehri, 2023 ). The other three focused on research from within the regions of Africa (Gudyanga, 2023 ; Maphosa & Maphosa, 2020 ) and Latin America (Salas-Pilco & Yang, 2022 ).

What kinds of evidence syntheses are being conducted in AIHEd?

There were eight different types of evidence syntheses conducted in AIHEd (see Additional file 2 : Appendix B), as identified by their authors. Systematic literature reviews were by far the most popular type, accounting for two thirds of the corpus (66.7%, n  = 44), followed by scoping reviews (12.1%, n  = 8). There were two reviews where authors conducted both a systematic review and a meta-analysis (Fahd et al., 2022 ; Fontaine et al., 2019 ), and two reviews where authors identified their work as a mapping review and a systematic review (del Gobbo et al., 2023 ; Zhong, 2022 ).

In which conferences and academic journals are AIHEd evidence syntheses published?

AIHEd evidence syntheses were published in 42 unique academic journals and 11 different conference proceedings (see Additional file 3 : Appendix C). The top conference was the International Conference on Human–Computer Interaction ( n  = 2), with all other conferences publishing one paper each. The top seven journals were Education and Information Technologies ( n  = 4), International Journal of Educational Technology in Higher Education ( n  = 4), Education Sciences ( n  = 3), Interactive Learning Environments ( n  = 2), Technology, Knowledge and Learning ( n  = 2), Sustainability ( n  = 2), and JMIR Medical Education ( n  = 2). All of these journals have published systematic reviews (see Additional file 4 : Appendix D), although other types have been published as well, with the exception of Technology, Knowledge and Learning and Sustainability .

What are AIHEd evidence synthesis authors’ institutional and disciplinary affiliations?

The AIHEd evidence syntheses in this corpus were written by authors from 110 unique institutions, with the top seven most productive institutions located across five different regions (see Additional file 5: Appendix E). The most productive institutions in each region were the University of Toronto (North America, n = 5), The Independent Institute of Education (Africa, n = 3), Central China Normal University and Fu Jen Catholic University (Asia, n = 2 each), Sultan Qaboos University (Middle East, n = 2), and the University of Newcastle (Oceania, n = 2). The European and the South and Central American institutions all had one publication each.

Although Crompton and Burke (2023) have reported a rise in the number of Education-affiliated authors in AIEd primary research, more than half of the evidence syntheses in this corpus were published by first authors from STEM-affiliated backgrounds (56.1%), with Computer Science & IT authors (30.3%, n = 20) the most prolific (see Additional file 6: Appendix F). Education-affiliated authors still represent 25.8%, which is encouraging, and six publications did not mention the disciplinary affiliation of their authors. Researchers from Education and Computer Science & IT have published a wider range of evidence synthesis types than the other disciplines, although still with a heavy skew towards systematic reviews (71% and 75% respectively). Another interesting finding is that Health, Medical & Physical Education researchers have published more than twice as many scoping reviews (n = 7) as systematic reviews (n = 3) in this corpus, which may be due to the longer history of evidence synthesis in that discipline (Sutton et al., 2019).

What is the geographical distribution of AIHEd evidence synthesis authorship?

The authorship of AIHEd secondary research has been quite evenly spread between authors from North America (27.3%), Europe (24.2%) and Asia (22.7%), followed by the Middle East (13.6%; see Additional file 7: Appendix G). In line with previous EdTech research (e.g., Bond et al., 2019), there was far less representation from South and Central America (4.5%). Authorship was spread across 32 different countries (see Additional file 9: Appendix I), with arguably less dominance by the United States than found in two other recent EdTech tertiary reviews (Buntins et al., 2023; Zawacki-Richter, 2023). Whilst the United States was the most productive country (see Table 3), it was closely followed by Canada and Australia. Furthermore, all continents aside from South and Central America are represented among the top nine most productive countries.

When the geographical distribution is viewed by evidence synthesis type (see Additional file 8 : Appendix H), researchers in Africa, North America, Oceania, the Middle East and Europe have used a wider range of secondary research approaches, although European and Oceanian authors have heavily favoured systematic reviews (75%).

AIHEd evidence synthesis is almost always published collaboratively (89.4%, n = 59), particularly in teams of two, three or four researchers (see Additional file 9: Appendix I), with the largest team being the 21 authors of a scoping review (Charow et al., 2021). African and Middle Eastern researchers have published more as single authors (29% and 22% of publications from those regions respectively). Co-authorship, however, tends to occur in domestic collaborations (71.2%), with only 18.2% of publications internationally co-authored. Rates of domestic co-authorship are particularly high in Oceania (75%) and Europe (69%), whilst the highest rates of international research collaboration are found in South & Central America and the Middle East (33% of cases respectively). Bibliometric reviews (50%), integrative reviews (50%) and meta-analyses (33%) have the highest rates of international co-authorship, although these are also among the least produced types of evidence synthesis. Interestingly, systematic reviews were predominantly undertaken by researchers located within the same country (70.5%), and all eight scoping reviews were published by domestic research collaborations.

Just over half of the reviews (51.5%, n = 34) did not report using any kind of digital evidence synthesis tool to conduct their review (see Additional file 10: Appendix J), and of those that did, only 12.1% (n = 8) reported using dedicated evidence synthesis software, which integrates machine learning functionality (e.g., deduplication, priority screening, snowball searching) to help make the review process more transparent and efficient. The most popular of these were EPPI Reviewer (n = 3) and Covidence (n = 3). AIHEd secondary researchers have mostly used spreadsheets (16.7%) and reference management software (16.7%) to manage their reviews, with authors of critical reviews, literature reviews and systematic reviews the least likely to report whether a tool was used at all.

AIHEd evidence synthesis quality

The AIHEd reviews in the corpus were assessed against 10 quality assessment criteria (see Table 4), based on the DARE (Centre for Reviews and Dissemination, 1995; Kitchenham et al., 2009) and AMSTAR 2 (Shea et al., 2017) tools, as well as the method used by Buntins et al. (2023). Almost all studies provided explicit information about their research questions, aims or objectives (92.4%), the inclusion/exclusion criteria (77.3%) and the publication years of the included literature (87.9%). Whilst 68.2% of reviews provided the exact search string used, 25.8% (n = 17) provided only some of the words used to find the included studies. The most concerning findings were that 31.8% of studies searched only one or two databases, 51.5% did not report anything about inter-rater reliability or how screening and coding disagreements were resolved within review teams, only 24.2% provided their exact data extraction coding scheme, 45.5% did not undertake any form of quality assessment, and 34.8% did not reflect at all upon the limitations of their review.

The reviews were given an overall quality assessment score out of 10 (see Fig. 6), averaging 6.57 across the corpus. Looking at quality over time (see Additional file 11: Appendix K), it is encouraging that the percentage of 'critically low' and 'low quality' studies appears to be declining. Meta-analyses and scoping reviews were predominantly coded as 'high quality' or 'excellent quality', with far more variability in the quality of systematic reviews. Conference papers were of lower quality than journal articles, with only 8% of conference papers receiving a 'high quality' rating and none receiving 'excellent quality'. This may, however, be partly owing to the word-count limitations that conference proceedings impose. For example, the most prolific conference in this corpus, the International Conference on Human–Computer Interaction, accepts paper submissions of up to 20 pages including references. Given the often lengthy reference list required by an evidence synthesis paper, this restricts the depth of information that can be provided.
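
The scoring approach described above can be sketched in a few lines. The yes/partly/no weights (1, 0.5, 0) follow the Kitchenham-style convention the article cites, but this exact weighting is an illustrative assumption rather than the authors' published scheme, and, as noted in the limitations, the distances between the three answers cannot be assumed to be equal.

```python
# Illustrative DARE/AMSTAR-style quality score.
# Assumed weights: yes = 1, partly = 0.5, no = 0 (not the authors' exact scheme).
WEIGHTS = {"yes": 1.0, "partly": 0.5, "no": 0.0}

def quality_score(ratings):
    """Score one review against the 10 criteria (list of 'yes'/'partly'/'no')."""
    if len(ratings) != 10:
        raise ValueError("expected ratings for exactly 10 criteria")
    return sum(WEIGHTS[r] for r in ratings)

# Example: a review meeting 6 criteria fully, 2 partially, and 2 not at all
score = quality_score(["yes"] * 6 + ["partly"] * 2 + ["no"] * 2)  # 7.0
```

Averaging such scores across the corpus would yield the overall figure (6.57) reported above.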

Figure 6: Overall quality assessment

In order to gain greater insight into methodological quality, each review was coded on whether a specific method or approach was followed (see Additional file 11: Appendix K). Although 18.2% (n = 12) of publications did not provide a reference to a specific approach, including some that said they followed the PRISMA guidelines (e.g., Page et al., 2021) but did not cite them, 29 different methodological publications were referenced. Of these, the original (Liberati et al., 2009; Moher et al., 2009) and the updated PRISMA guidelines (Moher et al., 2015; Page et al., 2021) were referenced as a primary approach by 33.3% (n = 22), with the scoping review extension PRISMA-ScR (Tricco et al., 2018) referenced in a further four. Authors from an Education disciplinary background were slightly more likely to use PRISMA than those from Computer Science, who preferred to follow the guidance of Kitchenham and colleagues (Kitchenham, 2004; Kitchenham & Charters, 2007; Kitchenham et al., 2009, 2010).

AIEd applications in higher education

The reviews were categorised using Zawacki-Richter et al.'s (2019) classification (profiling and prediction; intelligent tutoring systems; adaptive systems and personalisation; assessment and evaluation; see Fig. 1), depending upon their purported focus within the title, abstract, keywords or search terms, with any reviews not specifying a particular focus categorised as 'General AIEd' (see Table 5). Most of the reviews (47%, n = 31) fell under the 'General AIEd' category and explored a range of AI applications. This was followed by reviews focusing on profiling and prediction (e.g., Abu Saa et al., 2019) and adaptive systems and personalisation (e.g., Fontaine et al., 2019). Reviews focused specifically on assessment and evaluation (e.g., Banihashem et al., 2022) and intelligent tutoring systems (e.g., Crow et al., 2018) were rare.

Key findings in AIEd higher education evidence synthesis

The student life-cycle (Reid, 1995) was used as a framework to identify AI applications at the micro level of teaching and learning, as well as at the institutional and administrative level. Most of the reviews included research focused on academic support services at the teaching and learning level (n = 64, 97.0%), with only 39.3% (n = 26) addressing institutional and administrative services. A lower level of focus on administration was also found by Crompton and Burke (2023), where only 11% of higher education research focused on managers, despite AI being useful for personalising the university experience for students with regard to admissions, examinations and library services (Algabri et al., 2021; Zawacki-Richter et al., 2019), for exploring trends across large datasets (Zhang et al., 2023), and for quality assurance (Kirubarajan et al., 2022; Manhiça et al., 2022; Rabelo et al., 2023).

The key findings of the reviews were classified into the four main thematic AI application areas (see Fig. 1). More than half of the reviews (54.5%, n = 36) discussed applications related to adaptive systems and personalisation, closely followed by profiling and prediction (48.5%, n = 32); 39.4% (n = 26) discussed findings related to assessment and evaluation, and only 21.2% (n = 14) looked into intelligent tutoring systems. These key findings are synthesised below.

Adaptive systems and personalisation

All of the reviews on adaptive systems ( n  = 36) are situated at the teaching and learning level, with only 12 reviews (33.3%) reporting findings for the administrative and institutional level. Five subcategories were found: chatbots/virtual assistants ( n  = 20), providing personalised content ( n  = 14), facial recognition/mood detection ( n  = 9), recommender systems/course scheduling ( n  = 5), and robots ( n  = 3). Li et al.’s ( 2021 ) review also focused on the challenges faced by adaptive learning research. They found that research is still at a nascent stage, with a gap between theory and practice, and that further interdisciplinary approaches are needed, alongside the collection and sharing of massive data that adheres to privacy considerations. Andersen et al.’s (2022) scoping review of adaptive learning in nursing education suggests that further attention also needs to be paid to learning design, alongside further qualitative research.

Chatbots/virtual assistants

Chatbots appeared in various forms in the literature, including virtual assistants, virtual agents, voice assistants, conversational agents and intelligent helpers (Chaka, 2023; Crompton & Burke, 2023). Virtual patient apps have become increasingly used within nursing, dental and medical contexts (e.g., Buchanan et al., 2021; Zhang et al., 2023), with Hwang et al.'s (2022) review of 112 AI-supported nursing education articles finding that intelligent agents were the most used AI system (53% of studies). Research has measured the effectiveness of chatbots on student learning outcomes, critical thinking, empathy, communication skills and satisfaction (Chaka, 2023; Frangoudes et al., 2021), with a review of English as a foreign language literature (Klímová & Ibna Seraj, 2023) finding that chatbots have a particularly positive influence on developing speaking skills (intonation, stress, and fluency), possibly in part due to feelings of reduced anxiety (Zhai & Wibowo, 2023). Virtual assistants can be particularly useful for enhancing accessibility for visually and hearing-impaired students, through automatic speech recognition, text-to-speech and sign language interpretation (Fichten et al., 2021), as well as for helping to detect anxiety and depressive symptoms in students (Salas-Pilco & Yang, 2022). There is potential to use chatbots in a more institution-wide role, for example to collate opinions about teaching and the institution (Sourani, 2019) or to scale the mentoring of students on field placements (Salas-Pilco et al., 2022). One review found that students prefer chatbots to other communication methods (Hamam, 2021). Further development is suggested in the evaluation of chatbots, such as their effectiveness on affective and social aspects of learning (Algabri et al., 2021; Frangoudes et al., 2021).

Providing personalised content

The use of personalised learning was identified in 14 reviews, which particularly highlighted the benefits of customising learning to support students (e.g., Algabri et al., 2021 ), although Fontaine et al.’s ( 2019 ) meta-analysis of 21 Health & Welfare studies found that adaptive learning only had a statistically significant effect on learning skills, rather than on building factual knowledge. Fariani et al.’s (2022) review of 39 personalised learning studies found that personalised teaching materials were the most widely used (49%), followed by learning paths (29%), learning strategies (17%) and learning environments (5%), with 49% using machine learning algorithms and 51% measuring the impact of personalisation on learning. Zhong’s ( 2022 ) review of 41 studies found that 54% used learning traits to structure learning content, with macro the most popular sequencing approach (24%). Further studies are needed to explore how personalisation impacts affective aspects such as motivation, engagement, and interest (Alamri, 2021; Fariani et al., 2021 ), with primary research needing to provide more explicit information about the algorithms and architecture used (Fontaine et al., 2019 ).

Facial recognition/mood detection

Five studies (10%) in Kirubarajan et al.'s (2022) scoping review used motion tracking systems to assess student activity. Face tracker software has been used to manage student attendance (Salas-Pilco & Yang, 2022), to determine whether students are accurately interpreting ECGs (Zhang et al., 2023), and to analyse students' emotions during clinical simulations, helping educators tailor simulations to student needs more effectively (Buchanan et al., 2021). Li et al. (2021) concluded that research providing real insight into students' psychological emotions and cognition is currently at a nascent stage. However, Darvishi et al. (2022) suggest that neuro measurements can help fill this gap by providing further insight into learner mental states, and found that facial measurements had a higher adoption rate than EEGs, although cognitive constructs were measured in more EEG studies. Two thirds (66%, n = 6) of the reviews reporting the use of neurophysiological AI stressed the need for further ethical considerations when undertaking such research in the future, including obtaining participant consent (Salas-Pilco & Yang, 2022), more transparent development of AI, and clearer reporting of study design (Kirubarajan et al., 2022). Darvishi et al. (2022) suggested that propensity-score matching could be used to conduct quasi-experimental studies more ethically.

Recommender systems/course scheduling

Five reviews located studies on the use of recommender systems (RSs), including Rabelo et al. (2023), who argue that administrators could make more use of RSs to support retention, including by recommending subjects and courses. Banihashem et al.'s (2022) systematic review on the role of learning analytics in enhancing feedback reported a few studies where systems had guided students and recommended course material, and Zawacki-Richter et al. (2019) found three studies, including one suggesting pedagogical strategies for educators (Cobos et al., 2013). Urdaneta-Ponte et al.'s (2021) systematic review focused solely on RSs in HE and included 98 studies. The most commonly used development technique was collaborative filtering, followed by RSs that combine different techniques. Most RSs suggested learning resources (37.76%) or courses (33.67%). The majority of studies (78%) focused on students, and future research could therefore explore the perceptions of educators and other stakeholders. Urdaneta-Ponte et al. (2021) suggest that further investigation is needed of algorithms based on a semantic approach, as well as further development of hybrid systems. They also suggest that user information could be explored along with information from different sources, such as social media, to build more complete profiles.
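
Collaborative filtering, reported above as the most common RS development technique, can be illustrated with a minimal user-based sketch: recommend to a student the items rated highly by students with similar rating histories. The student and course names and ratings here are invented for illustration; production systems use far richer data and matrix-factorisation or hybrid techniques.

```python
from math import sqrt

# Invented student-course ratings (1-5); names are illustrative only.
ratings = {
    "s1": {"calc101": 5, "python_intro": 4, "stats_basics": 1},
    "s2": {"calc101": 4, "python_intro": 5, "ml_primer": 5},
    "s3": {"stats_basics": 5, "ml_primer": 2},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    num = sum(u[i] * v[i] for i in set(u) & set(v))
    du = sqrt(sum(x * x for x in u.values()))
    dv = sqrt(sum(x * x for x in v.values()))
    return num / (du * dv) if du and dv else 0.0

def recommend(target, ratings, k=1):
    """Rank items the target has not rated by similarity-weighted ratings."""
    scores, sims = {}, {}
    for other, r in ratings.items():
        if other == target:
            continue
        s = cosine(ratings[target], r)
        if s <= 0:
            continue
        for item, val in r.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + s * val
                sims[item] = sims.get(item, 0.0) + s
    ranked = sorted(scores, key=lambda i: scores[i] / sims[i], reverse=True)
    return ranked[:k]

print(recommend("s1", ratings))  # ['ml_primer']
```

Here "s1" rates courses similarly to "s2", so "s2"'s highly rated, as-yet-unseen course dominates the recommendation; the hybrid systems mentioned above would combine such scores with content or semantic features.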

Robots

Only three reviews mentioned the use of robots within HE. In Chaka's (2023) literature review, 38% (n = 10) of studies focused on how robots could be used to enhance the teaching and learning of undergraduate students, with one study exploring the use of a robot-assisted instructional package to help teach students with intellectual disabilities how to write messages (Pennington et al., 2014). Five studies (18.5%) in Buchanan et al.'s (2021) nursing scoping review pertained to robots, with one study suggesting that there would be an increased presence of humanoid robots and cyborgs in the future to complement high-fidelity simulators. Maphosa and Maphosa (2021) called for further primary research on the development and application of intelligent robots, although Chaka (2023) pointed out that barriers to further HE implementation will need to be overcome, including challenges with infrastructure and technology, educator acceptance, and curricula being "robotics-compliant" (p. 34).

Profiling and prediction

All of the reviews pertaining to profiling and prediction included a focus on teaching and learning ( n  = 32), with just over half ( n  = 17, 53.1%) detailing examples of AI support at the administrative level. The 32 reviews were further classified into six subcategories: dropout/retention ( n  = 25), academic achievement/learning outcomes ( n  = 24), admissions/timetabling ( n  = 6), career paths/placement ( n  = 4), student satisfaction ( n  = 3), and diagnostic prediction ( n  = 3).

Dropout/retention

AI’s role in predicting student dropout and aiding retention was highlighted in 25 reviews (37.9%). Liz-Domínguez et al. ( 2019 ) acknowledge the trend of using AI to identify at-risk students, while Maphosa and Maphosa ( 2021 ) note AI’s high accuracy in predicting student outcomes. However, McConvey et al. ( 2023 ) point out limited evidence of the effective use of dropout prediction models in institutions. Li et al. ( 2022 ) emphasise the impact of factors like personal characteristics and family background on student motivation. Cardona et al. ( 2023 ) add that prior knowledge is crucial in determining dropout rates. McConvey et al. ( 2023 ) observe the inclusion of social media activity and financial data in predictive models, highlighting demographic data and LMS activity as common predictors. In terms of algorithms, a number of reviews (e.g., Fahd et al., 2022 ; Hellas et al., 2018 ) report that classifiers are preferred over regression algorithms, especially for dropout and failure risks, as the outputs are categorical variables.

Academic achievement/learning outcomes

Twenty-four reviews reported findings associated with predicting academic performance, course selection, course completion, engagement, and academic success. Seven reviews focused purely on the use of AI to predict academic performance in HE (Abu Saa et al., 2019; Fahd et al., 2022; Ifenthaler & Yau, 2020; Zulkifli et al., 2019), with some specialising in specific disciplines (STEM; Hellas et al., 2018; Moonsamy et al., 2021) or study levels (undergraduates; Alyahyan & Düştegör, 2020). The features commonly used for prediction can be categorised as demographic (age, gender, etc.), personality (self-efficacy, self-regulation, etc.), academic (previous performance, high school performance, etc.), behavioural (log data, engagement), and institutional (teaching approach, high school quality) (Abu Saa et al., 2019). Alyahyan and Düştegör (2020) report that prior academic achievement, student demographics, e-learning activity and psychological attributes are the most commonly reported factors, and that the top two (prior academic achievement and student demographics) were present in 69% of the included literature. Hellas et al. (2018) identified various techniques for predicting academic outcomes, including classification (using supervised learning methods such as Naive Bayes and decision trees), clustering (involving unsupervised learning), statistical methods (such as correlation and regression), and data mining. The review noted the prevalent use of linear regression models and the comparison of different algorithms in classification methods, leading to diverse predictive results.
Future research should provide a detailed description of what is being predicted, how, and why (Hellas et al., 2018), and could be deepened by more diverse study designs, such as longitudinal and large-scale studies (Ifenthaler & Yau, 2020) with multiple data collection techniques (Abu Saa et al., 2019), in a more diverse array of contexts (e.g., Fahd et al., 2022; Sghir et al., 2022), especially developing countries (e.g., Pinto et al., 2023).

Admissions/timetabling

The use of AI to assist with admissions, course booking behaviour, timetabling, and thesis allocation has seen significant advances in HE, as reported in six reviews (9.1%), although these reported on a relatively small number of studies; for example, Zawacki-Richter et al. (2019) found seven studies (4.8%), Sghir et al. (2022) found three studies (4.1%), and Otoo-Arthur and van Zyl (2020) found two studies (3.6%). Alam and Mohanty (2022) suggest that applications can be sorted with a 95% accuracy rate using the support vector machine method. While the use of AI can potentially free administrative staff from routine tasks to handle more intricate cases (Zawacki-Richter et al., 2019), it also introduces bias, as such approaches have been shown to give prospective students from certain geographic locations an advantage in the college admissions process (Alam & Mohanty, 2022). The surge in data from learning management systems (LMS) and self-serve course registration has boosted research in these areas, and algorithms targeting course selection, program admission, and pathway advising can have significant and sometimes restrictive effects on students (McConvey et al., 2023). In particular, they might restrict or overly influence student choices and inadvertently narrow diverse learning paths and experiences.
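
The studies above used support vector machines; as a dependency-free stand-in from the same family of linear classifiers, the perceptron sketch below learns a linear admit/reject boundary. The applicant features (GPA, normalised test score), data, and labels are entirely invented, and, as the reviews warn, any such model can encode bias if the training data does (e.g., via features correlated with geographic location).

```python
def predict(w, b, x):
    """Linear decision rule: +1 = admit, -1 = reject."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1

def train_perceptron(data, epochs=50, lr=0.1):
    """Classic perceptron updates: nudge the boundary on each mistake.
    (An SVM, as used in the cited studies, instead maximises the margin.)"""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in data:
            if predict(w, b, x) != label:
                w[0] += lr * label * x[0]
                w[1] += lr * label * x[1]
                b += lr * label
    return w, b

# Invented applicants: (GPA, test score / 100), label +1 admit / -1 reject.
DATA = [((3.8, 0.90), 1), ((3.5, 0.85), 1), ((2.0, 0.40), -1), ((1.8, 0.50), -1)]
w, b = train_perceptron(DATA)
```

On this linearly separable toy data the learned boundary classifies all four applicants correctly; the reported 95% accuracy figure refers to real admissions data and a proper SVM, not to this sketch.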

Career paths/placement

Four reviews reported findings pertaining to the use of AI to assist with career paths and placements. Although McConvey et al. (2023) reported that 18% (n = 7) of the papers in their review related to pathway advising, the number of studies researching this remains quite low, with Alkhalil et al. (2021) finding that managing large volumes of data was the main challenge when using AI to support student career pathways. Pinto et al. (2023) reported that some researchers have employed ML-based approaches to predict the employability of college graduates in order to develop study plans that match the demands of the labour market. Salas-Pilco and Yang (2022) highlight that, although students anticipate being employable upon graduation, many face challenges securing jobs. AI's role in predicting employability outcomes emphasises the necessity of offering guidance to graduates, ensuring quality in higher education, and understanding graduates' behavioural patterns to better support their career trajectories.

Student satisfaction

A small number of studies have explored using AI to predict student satisfaction, which was mentioned in only three reviews. Ouyang et al. (2020) highlighted a paper in their review (Hew et al., 2020) that analysed the course features of 249 randomly sampled MOOCs and the perceptions of 6,393 students to understand which factors predicted student satisfaction; the course instructor, content, assessment, and time schedule played significant roles in explaining satisfaction levels. Pinto et al. (2023) highlighted findings from two studies: the first (Abdelkader et al., 2022) posited that feature selection increased the predictive accuracy of their ML model, allowing them to predict student satisfaction with online education with nearly perfect accuracy, and the second (Ho et al., 2021), which was also included in Rangel-de Lázaro and Duart's (2023) review, investigated the most important predictors of undergraduate student satisfaction during the COVID-19 pandemic using data from Moodle and Microsoft Teams. The results showed that random forest recursive feature elimination improved the predictive accuracy of all the ML models.
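
The cited study used random-forest recursive feature elimination; the sketch below illustrates only the elimination loop itself, using a small gradient-descent least-squares model (rather than a random forest) to rank features by weight. The data is invented, with feature 0 constructed to be the true signal and the others noise.

```python
def fit_linear(X, y, lr=0.1, steps=1000):
    """Least-squares linear model via plain gradient descent
    (no intercept; assumes roughly centred features)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += err * xi[j]
        for j in range(d):
            w[j] -= lr * grad[j] / n
    return w

def rfe(X, y, keep=1):
    """Recursive feature elimination: refit, then drop the feature with the
    smallest absolute weight, until `keep` features remain."""
    active = list(range(len(X[0])))
    while len(active) > keep:
        Xa = [[row[j] for j in active] for row in X]
        w = fit_linear(Xa, y)
        active.pop(min(range(len(active)), key=lambda j: abs(w[j])))
    return active

# Invented data: feature 0 drives the outcome; features 1 and 2 are noise.
X = [[1.0, 0.2, -0.3], [-1.0, 0.1, 0.4], [0.5, -0.3, 0.2],
     [-0.5, -0.2, -0.1], [0.8, 0.0, 0.0], [-0.8, 0.3, -0.2]]
Y = [2.0, -2.0, 1.0, -1.0, 1.6, -1.6]
```

On this toy data, `rfe(X, Y, keep=1)` retains feature 0, the true signal; the refit-after-each-drop step is what distinguishes RFE from one-shot univariate feature ranking.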

Diagnostic prediction

Three reviews on AI applications in nursing and medical education (Buchanan et al., 2021 ; Hwang et al., 2022 ; Lee et al., 2021 ) discussed the prevalence of research on AI for diagnosis/prognosis prediction. Whilst all three reviews reported increasing use, they particularly highlighted the implications that this has for HE curricula, which was also echoed by other medical reviews in the corpus (e.g., Burney & Ahmad, 2022 ). Lee et al. ( 2021 ) stressed the need for an evidence-informed AI curriculum, with an emphasis on ethical and legal implications, biomedical knowledge, critical appraisal of AI systems, and working with electronic health records. They called for an evaluation of current AI curricula, including changes in student attitudes, AI knowledge and skills. Buchanan et al. ( 2021 ) suggest that ethical implications, digital literacy, predictive modelling, and machine learning should now be part of any nursing curriculum, which Charow et al. ( 2021 ), Grunhut et al. ( 2021 ), Harmon et al. ( 2021 ) and Sapci and Sapci ( 2020 ) argue should be designed and taught by multidisciplinary teams. Further collaboration between educators and AI developers would also be a way forward (Zhang et al., 2023 ).

Assessment and evaluation

Three reviews focused specifically on assessment and evaluation, including plagiarism (Albluwi, 2019), online learning (Del Gobbo et al., 2023 ), and the role of learning analytics with feedback (Banihashem et al., 2022 ). The systematic review by Crompton and Burke ( 2023 ) found that assessment and evaluation was the most common use of AIHEd, and the algorithm most frequently applied in nursing education for assessment and evaluation in Hwang et al.’s ( 2022 ) systematic review was natural language parsing (18.75%). All the reviews containing findings about assessment and evaluation ( n  = 26) pertain to teaching and learning research, with 10 (38.5%) reporting on the use of AI to assist evaluation at the administrative level. Here, AI has been used to evaluate student outcomes to determine admission decisions (Alam & Mohanty, 2022 ), to inform faculty and institutional quality assurance measures (e.g., Alkhalil et al., 2021 ; Sghir et al., 2022 ), and to analyse the impact of university accreditation on student test performance, as well as academic research performance and scientific productivity (Salas-Pilco & Yang, 2022 ). However, there remain many concerns about how institutions are storing and using teaching and learning data (see section below, Research Gaps ), and therefore further data regulations and a greater emphasis on ethical considerations are needed (Bearman et al., 2023 ; Ullrich et al., 2022 ).

The 26 Assessment and Evaluation reviews were further classified into six subcategories: the evaluation of student understanding, engagement and academic integrity (n = 17), automated grading and online exams (n = 14), automated feedback (n = 10), evaluation of teaching (n = 5), evaluation of learning material (n = 5), and the evaluation of universities (n = 2).

Evaluation of student understanding, engagement, and academic integrity

Seventeen reviews (25.8%) included primary studies that evaluated AI's impact on learning effectiveness and behaviour (Chu et al., 2022), engagement (Rabelo et al., 2023; Sghir et al., 2022), plagiarism (Albluwi, 2019), and reflections and higher order thinking (Crompton & Burke, 2023), often through LMS data (Manhiça et al., 2022), with a view to identifying students at risk and enabling earlier interventions (Banihashem et al., 2022). However, studies that provided explicit details about the actual impact of AI on student learning were rare in many of the reviews (e.g., two studies in Rangel-de Lázaro & Duart, 2023; three studies in Zawacki-Richter et al., 2019). Hwang et al. (2022) found very few studies that explored AI's effect on cognition and affect in nursing education, and suggested further research into the acquisition of nursing knowledge and skills, such as the use of AI to evaluate handwashing techniques and nursing student emotions during patient interaction, as reported by Buchanan et al. (2021). This area appears more advanced in medical education research, as Kirubarajan et al. (2022) found 31 studies that used AI to evaluate the surgical performance of trainees, including suturing, knot tying and catheter insertion (see also Burney & Ahmad, 2022; Sapci & Sapci, 2020). Zhang et al. (2023) point out, however, that machine learning can currently only classify surgical trainees into novices and experts through operations on virtual surgical platforms, and therefore some students might be able to deceive the algorithms. Here, Albluwi (2019) stresses the need for more emphasis on integrating academic integrity and AI ethics into the curriculum.

Automated grading and online exams

Automatic assessment was found to be the most common use of AIHEd in Crompton and Burke's (2023) systematic review (18.8%, n = 26), which contrasts with the small numbers found in other reviews exploring the use of automated essay evaluation systems (AES; Ouyang et al., 2020) and remotely proctored exams (Pinto et al., 2023; Rangel-de Lázaro & Duart, 2023). AES use in the studies found by Zawacki-Richter et al. (2019) was mostly focused on undergraduate students and spanned a range of disciplines, as opposed to the heavy STEM focus reported by del Gobbo et al. (2023), who found the two most used approaches to be term frequency-inverse document frequency (TF-IDF) and word embeddings. Although automatic grading has been found to lessen teacher workload (e.g., Salas-Pilco et al., 2022), Alam and Mohanty (2022) suggest that using AES in small institutions would be challenging, owing to the large number of pre-scored exams required for calibration. Furthermore, although automatic grading has been used for a wide range of tasks, from short answer tests to essays (Burney & Ahmad, 2022), Alam and Mohanty (2022) found that AES might not be appropriate for all forms of writing.
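To make the TF-IDF approach reported by del Gobbo et al. (2023) concrete, the following is a minimal sketch of how a short student answer could be scored against a reference answer via cosine similarity over TF-IDF weights. It is not drawn from any of the reviewed systems; the answers, tokenisation, and weighting scheme are invented for illustration only.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenised documents."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

# Hypothetical reference answer and two student responses (illustrative only).
reference = "photosynthesis converts light energy into chemical energy".split()
answers = [
    "photosynthesis turns light energy into chemical energy".split(),
    "mitochondria produce energy through respiration".split(),
]
vecs = tfidf_vectors([reference] + answers)
for i, v in enumerate(vecs[1:], 1):
    print(f"answer {i}: similarity {cosine(vecs[0], v):.2f}")
```

A production AES would combine many such features with pre-scored exams for calibration, which is precisely why Alam and Mohanty (2022) see small institutions as disadvantaged.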

Automated feedback

Most of the 10 reviews (15.2%) identified only a small number of studies that evaluated the impact of automated feedback on students, including on academic writing achievement (Rangel-de Lázaro & Duart, 2023; Zawacki-Richter et al., 2019), on reflection (Salas-Pilco et al., 2022), and on self-awareness (Ouyang et al., 2020). Two studies in the scoping review by Kirubarajan et al. (2022) reported real-time feedback using AI for modelling during surgery. Manhiça et al. (2022) also found two studies exploring automated feedback, but unfortunately did not provide any further information about them, which further underscores the need for more research in this area.

Evaluation of teaching

Five reviews (7.6%) found a small number of studies where AI had been used to evaluate teaching effectiveness. This was done by using data mining algorithms to analyse student comments, course evaluations and syllabi (Kirubarajan et al., 2022; Salas-Pilco & Yang, 2022; Zawacki-Richter et al., 2019), with institutions now being able to identify low-quality feedback given by educators and to flag repeat offenders (Zhang et al., 2023). Rabelo et al. (2023) argue, however, that management should make more use of this ability to evaluate teaching quality.

Evaluation of learning material

Five reviews (7.6%) mentioned the use of AI to evaluate learning materials, such as textbooks (Crompton & Burke, 2023), particularly by measuring the amount of time students spend accessing and using them in the LMS (Alkhalil et al., 2021; Rabelo et al., 2023; Salas-Pilco et al., 2022). In Kirubarajan et al.'s (2022) scoping review on surgical education, nine studies used AI to improve surgical training materials by, for example, categorising surgical procedures.

Intelligent tutoring systems (ITS)

All of the ITS reviews included research at the teaching and learning milieu (n = 14), with only two reviews (14.3%) reporting a specific use of ITS at the administrative level. Alotaibi and Alshehri (2023) reported the use of intelligent academic advising, where students are provided with individualised guidance and educational planning, and Zawacki-Richter et al. (2019) reported examples of AI to support university career services, including an interactive intelligent tutor to assist new students (see Lodhi et al., 2018). Previous reviews have commented on the lack of reporting of ITS use in higher education (e.g., Crompton & Burke, 2023), and therefore this represents an area for future exploration. One review (Crow et al., 2018), focusing solely on the role of ITS in programming education, found that no standard combination of features has been used, suggesting that future research could evaluate individual features or compare the implementation of different systems.

The 14 ITS reviews were further classified into six subcategories: diagnosing strengths/providing automated feedback (n = 8), teaching course content (n = 8), student ITS acceptance (n = 4), curating learning materials (n = 3), facilitating collaboration between learners (n = 2), and academic advising (n = 2; mentioned above).

Diagnosing strengths/providing automated feedback

Eight reviews (12.1%) reported on findings of ITS diagnosing strengths and gaps, suggesting learning paths and providing automated feedback (Salas-Pilco & Yang, 2022), which can help reduce educator workload (Alam & Mohanty, 2022) and ensure that students receive timely information about their learning (Crompton & Burke, 2023). ITS were the second most researched AI application (20%, n = 10) in Chu et al.'s (2022) systematic review of the top 50 most cited AIHEd articles in the Web of Science, with the greatest focus being on students' learning behaviour and affect. Rangel-de Lázaro and Duart (2023) reported that this was also the focus in three studies in the fields of Business and Medicine.

Teaching course content

Eight reviews (12.1%) also mentioned the role of ITS in teaching course content. Most prevalent was the use of ITS in the medical and scientific fields, for example, as virtual patient simulators or case studies for nursing, medical or dental students and staff (Buchanan et al., 2021; Hwang et al., 2022; Saghiri et al., 2022). In scientific settings, students performed experiments using lab equipment, with support tailored to their needs (Crompton & Burke, 2023). Personalised tutoring was also frequently mentioned in addition to teaching content. Rangel-de Lázaro and Duart (2023) discussed the use of an interactive tutoring component for a Java programming course throughout the Covid-19 pandemic. Intelligent feedback and hints can be embedded into programming tasks, helping with specific semantic or syntactic issues (Crow et al., 2018), and specifically tailored hints and feedback were also provided on problem-solving tasks (Zawacki-Richter et al., 2019).

Student ITS acceptance

Student acceptance of ITS was addressed in four reviews (6.1%), including Rangel-de Lázaro and Duart (2023), who found five papers focused on Engineering Education (4.7% of studies). Chu et al. (2022) found that the most frequently discussed ITS issues were related to affect (n = 17, 41.5%), with the most common topics being student attitudes (n = 6, 33.3%) and opinions of learners or learning perceptions (n = 6, 33.3%), followed by emotion (n = 3, 18.75%). The technology acceptance model or intention of use, self-efficacy or confidence, and satisfaction or interest were less discussed. Harmon et al. (2021) found limited evidence of positive effects of AI on learning outcomes in their review on pain care in nursing education. The reactions of participants varied and were affected by many factors, including technical aspects (e.g., accessibility or internet speed), a lack of realism, poor visual quality of nonverbal cues, and the ability to ask avatars a question. Saghiri et al. (2022) examined AI and virtual teaching models within the context of dental education and evaluated students' attitudes towards VR in implant surgery training, where they also found that the capacity of current ITS affected student acceptance, suggesting that future tools need to account for differences in oral anatomy.

Curating learning materials

Three reviews (4.5%) addressed the use of material curation when using ITS. Zawacki-Richter et al. (2019) found three studies (2.1%) that discussed this function, relating to the presentation of personalised learning materials to students, and only one study was identified by Zhang et al. (2023). Crow et al. (2018) concluded that when designing systems to intelligently tutor programming, it would be valuable to consider linking supplementary resources to the intelligent and adaptive component of the system, and suggested this for future ITS development.

Facilitating collaboration between learners

Two reviews (3.0%) discussed findings related to ITS facilitating collaboration, which can help by, for example, generating questions and providing feedback on the writing process (Alam & Mohanty, 2022). Zawacki-Richter et al. (2019) only found two primary studies that explored collaborative facilitation and called for further research to be undertaken with this affordance of ITS functionality.

Benefits and challenges within AIHEd

The evidence syntheses that addressed a variety of AI applications or AI more generally (n = 31; see Additional file 5: Appendix E) were also coded inductively for benefits and challenges. Only two reviews considered AIHEd affordances (Crompton & Burke, 2023; Rangel-de Lázaro & Duart, 2023), four did not mention any benefits, and six reviews did not mention any challenges, which in four cases was due to their bibliometric nature (Gudyanga, 2023; Hinojo-Lucena et al., 2019; Maphosa & Maphosa, 2021; Ullrich et al., 2022).

Benefits of using AI in higher education

Twelve benefits were identified across the 31 reviews (see Additional file 12: Appendix L), with personalised learning the most prominent (see Table 6). Greater insight into student understanding, a positive influence on learning outcomes, and reduced planning and administration time for teachers were each identified in 32.3% of reviews. The top six benefits are discussed below.

Zawacki-Richter et al. (2019) and Sourani (2019) noted the adaptability of AI to create personalised learning environments, enabling the customisation of educational materials to fit individual learning needs (Algabri et al., 2021; Buchanan et al., 2021) and thereby supporting student autonomy by allowing learning at an individual pace (Alotaibi, 2023; Bearman et al., 2023). Diagnostic and remedial support is another focus, particularly in tailoring learning paths based on knowledge structures, which can facilitate early interventions for potentially disengaged students (Alam & Mohanty, 2022; Chu et al., 2022). Interestingly, ten reviews found or mentioned the ability of AI to positively influence learning outcomes (e.g., Alotaibi & Alshehri, 2023; Fichten et al., 2021), yet few reviews in this corpus provided real evidence of impact (as mentioned above in Assessment and Evaluation). AI was identified, however, as enhancing learning capabilities and facilitating smoother transitions into professional roles, especially in nursing and medicine (Buchanan et al., 2021; Hwang et al., 2022; Sapci & Sapci, 2020), alongside stimulating student engagement (Chaka, 2023) and honing specific skills such as writing performance through immediate feedback systems (Ouyang et al., 2020). Several reviews highlighted that AI could automate routine tasks and thereby reduce planning and administrative workload (e.g., Alam & Mohanty, 2022). For instance, AI-powered chatbots and intelligent systems facilitate lesson planning and handle student inquiries, which streamlines the administrative workflow (Algabri et al., 2021), and automated grading systems can alleviate workload by assessing student performance (e.g., Crompton & Burke, 2023).

Several reviews highlighted the role of machine learning and analytics in enhancing our understanding of student behaviours to support learning (e.g., Alotaibi & Alshehri, 2023) and, complementing this, Ouyang et al. (2020), Rangel-de Lázaro and Duart (2023), and Salas-Pilco and Yang (2022) found primary research that focused on the utility of predictive systems. These systems are designed for the early identification of learning issues among students and offer guidance for their academic success. Reviews identified studies analysing student interaction and providing adaptive feedback (e.g., Manhiça et al., 2022), which was complemented by Alam and Mohanty (2022), who highlighted the role of machine learning in classifying patterns and modelling student profiles. Predictive analytics is further supported by reviews such as Salas-Pilco et al. (2022) and Ouyang et al. (2020), which discuss its utility in enabling timely interventions.
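The kind of early-warning predictive system described above can be sketched, in greatly simplified form, as a logistic regression over engagement features drawn from an LMS. The feature names, data and threshold below are invented for illustration and do not come from any reviewed study; real systems use far richer features and careful validation.

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    """Plain-Python logistic regression trained by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted risk probability
            err = p - yi                      # gradient of log-loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    """Probability that a student is at risk, given their feature vector."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical LMS features: [weekly logins / 10, assignments submitted / 10].
X = [[0.9, 1.0], [0.8, 0.9], [0.7, 0.8], [0.2, 0.1], [0.1, 0.2], [0.3, 0.2]]
y = [0, 0, 0, 1, 1, 1]  # 1 = disengaged / at risk
w, b = train_logistic(X, y)
print(f"low-engagement student risk: {predict(w, b, [0.15, 0.1]):.2f}")
```

A prediction above some institutional threshold (say 0.5) would trigger the kind of timely intervention the reviews describe, although the ethical concerns around such profiling noted elsewhere in this corpus apply directly here.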

Seven reviews noted the potential of AI to advance equity in education, with universities' evolving role in community development contributing to this (Alotaibi & Alshehri, 2023). In the future, AI could provide cheaper, more engaging, and more accessible learning opportunities (Alam & Mohanty, 2022; Algabri et al., 2021), such as using expert systems to assist students who lack human advisors (Bearman et al., 2023), thereby alleviating social isolation in distance education (Chaka, 2023). In India, AI has also been discussed with regards to innovations such as the ‘Smart Cane’ (Bhattacharjee, 2019). AI's potential to enrich and diversify the educational experience (Manhiça et al., 2022), including alleviating academic stress for students with disabilities (Fichten et al., 2021), was also discussed.

Algabri et al. (2021) describe how AI can not only improve grading but also make it objective and error-free, providing educators with analytics tools to monitor student progress. Ouyang et al. (2020) note that automated essay evaluation systems improve student writing by providing immediate feedback. Zhang et al. (2023) found that machine learning could reveal objective skills indicators, and Kirubarajan et al. (2022) found that AI-based assessments demonstrated high levels of accuracy. Other reviews discuss the relevance of AI in healthcare, providing tools for data-driven decision making and individualised feedback (Charow et al., 2021; Saghiri et al., 2022). Collectively, these studies indicate that AI holds promise for making educational assessments more precise, timely, and tailored to individual needs.

Challenges of using AI in higher education

The 31 reviews identified 17 challenges, although these were mentioned in fewer reviews than the benefits (see Additional file 12: Appendix L). Nine reviews (see Table 7) reported a lack of ethical consideration, followed by curriculum development, infrastructure, lack of teacher technical knowledge, and shifting authority, which were each identified in 22.6% of reviews. Reviews discuss the ethical challenges that medical professionals face when interpreting AI predictions (Grunhut et al., 2021; Lee et al., 2021). AI applications in education also raise ethical considerations, ranging from professional readiness to lapses in rigour, such as not adhering to ethical procedures when collecting data (e.g., Salas-Pilco & Yang, 2022), and ethical and legal issues related to using tools prematurely (Zhang et al., 2023). Chu et al. (2022) explored the ethical challenges in balancing human and machine-assisted learning, suggesting that educators need to consciously reflect on these issues when incorporating AI into their teaching methods.

In relation to the challenges of integrating AI into education, curriculum development issues and infrastructural problems span from broad systemic concerns to specific educational contexts. Ouyang et al. (2020) point to a disconnect between AI technology and existing educational systems, and suggest the need for more unified, standardised frameworks that incorporate ethical principles, advocating for the development of multidisciplinary teams (Charow et al., 2021; Lee et al., 2021) and a stronger focus on more robust and ethically aware AI curricula (e.g., Grunhut et al., 2021). Furthermore, despite its potential, a country may lag behind in both AI research and digital infrastructure (Bhattacharjee, 2019), with technical, financial and literacy barriers (Alotaibi & Alshehri, 2023; Charow et al., 2021), such as the high costs associated with developing virtual programming and high-speed internet (Harmon et al., 2021).

Several reviews mentioned a lack of teacher technical knowledge, which has the potential to slow AI curriculum development and application efforts, reporting that many educators would need new skills in order to use AI effectively (Alotaibi & Alshehri, 2023; Bhattacharjee, 2019; Chu et al., 2022; Grunhut et al., 2021; Lee et al., 2021). While it was reported that faculty generally lack sufficient time to integrate AI effectively into the curriculum (Charow et al., 2021), this was compounded by the fear of being replaced by AI (Alotaibi & Alshehri, 2023; Bearman et al., 2023). To this end, Charow et al. (2021) emphasise the need to see AI as augmenting rather than replacing educators. At the same time, it has been recognised that a lack of AI literacy could lead to a shift in authority, moving decision-making from clinicians to AI systems (Lee et al., 2021). Overcoming resistance to change and solving various challenges, including those of an ethical and administrative nature, was identified as pivotal for successful AIHEd integration (Sourani, 2019).

What research gaps have been identified?

Each review in this corpus (n = 66) was searched for any research gaps that had been identified within the primary studies, which were then coded inductively (see Additional file 1: Appendix A). More than 30 different categories of research suggestions emerged (see Additional file 13: Appendix M), with the top ten research gap categories found in more than 10% of the corpus (see Table 8). The most prominent research issue (in 40.9% of reviews) relates to the need for further ethical consideration and attention within AIHEd research, both as a topic of research and as an issue in the conduct of empirical research, followed closely by the need for a range of further empirical research with a greater emphasis on methodological rigour, including research design and reporting (36.4%). AIHEd reviews also identified the need for future primary research with a wider range of stakeholders (21.2%), within a more diverse array of countries (15.2%) and disciplines (16.7%).

Ethical implications

Eight reviews found that primary research rarely addressed privacy problems, such as participant data protection during educational data collection (Alam & Mohanty, 2022; Fichten et al., 2021; Li et al., 2021; Manhiça et al., 2022; Otoo-Arthur & van Zyl, 2020; Salas-Pilco & Yang, 2022; Salas-Pilco et al., 2022; Zawacki-Richter et al., 2019), and that this necessitates the creation or improvement of ethical frameworks (Zhai & Wibowo, 2023), alongside a deeper understanding of the social implications of AI more broadly (Bearman et al., 2023). Educating students about their own ethical behaviour and the ethical use of AI also emerged as an important topic (Albluwi, 2019; Buchanan et al., 2021; Charow et al., 2021; Lee et al., 2021; Salas-Pilco & Yang, 2022), with the need for more evaluation and reporting of current curriculum impact, especially in the fields of Nursing and Medicine (e.g., Grunhut et al., 2021). Potential topics of future research include:

Student perceptions of the use of AI in assessment (del Gobbo et al., 2023);

How to make data more secure (Ullrich et al., 2022);

How to correct sample bias and balance issues of privacy with the affordances of AI (Saghiri et al., 2022; Zhang et al., 2023); and

How institutions are storing and using teaching and learning data (Ifenthaler & Yau, 2020; Maphosa & Maphosa, 2021; McConvey et al., 2023; Rangel-de Lázaro & Duart, 2023; Sghir et al., 2022; Ullrich et al., 2022).

Methodological approaches

Aside from recognising that further empirical research is needed (e.g., Alkhalil et al., 2021; Buchanan et al., 2021), more rigorous reporting of study design in primary research was called for, including ensuring that the number of participants and the study level are reported (Fichten et al., 2021; Harmon et al., 2021). Although there is still a recognised need for AIHEd quasi-experiments (Darvishi et al., 2022) and experiments, particularly those that allow multiple educational design variations (Fontaine et al., 2019; Hwang et al., 2022; Zhang et al., 2023; Zhong, 2022), a strong suggestion has been made for more qualitative, mixed methods and design-based approaches (e.g., Abu Saa et al., 2019), alongside longitudinal studies (e.g., Zawacki-Richter et al., 2019) and larger sample sizes (e.g., Zhang et al., 2023). Further potential approaches and topics include:

The use of surveys, course evaluation surveys, network access logs, physiological data, observations, and interviews (Abu Saa et al., 2019; Alam & Mohanty, 2022; Andersen et al., 2022; Chu et al., 2022; Hwang et al., 2022; Zawacki-Richter et al., 2019);

More evaluation of the effectiveness of tools on learning, cognition, affect, skills etc., rather than focusing on technical aspects like accuracy (Albluwi, 2019; Chaka, 2023; Crow et al., 2018; Frangoudes et al., 2021; Zhong, 2022);

Multiple case study designs (Bearman et al., 2023; Ullrich et al., 2022);

Cross-referencing data with external platforms such as social media (Rangel-de Lázaro & Duart, 2023; Urdaneta-Ponte et al., 2021); and

A focus on age and gender as demographic variables (Zhai & Wibowo, 2023).

Study contexts

In regard to stakeholders who should be included in future AIHEd research, reviews identified the need for more diverse populations in AI training data (e.g., Sghir et al., 2022), such as underrepresented groups (Pinto et al., 2023) and students with disabilities (Fichten et al., 2021), to help ensure that their needs are reflected in AI development. Further primary research with postgraduate students (Crompton & Burke, 2023), educators (Alyahyan & Düştegör, 2020; del Gobbo et al., 2023; Hamam, 2021; Sourani, 2019), and managers/administrators (e.g., Ullrich et al., 2022) has also been called for.

More research is needed within a wider range of contexts, especially developing countries (e.g., Pinto et al., 2023), such as India (Bhattacharjee, 2019) and African nations (Gudyanga, 2023; Maphosa & Maphosa, 2020), in order to better understand how AI can be used to enhance learning in under-resourced communities (Crompton & Burke, 2023). Multiple reviews also stressed the need for further research in disciplines other than STEM (e.g., Chaka, 2023), including the Social Sciences (e.g., Alyahyan & Düştegör, 2020), Visual Arts (Chu et al., 2022) and hands-on subjects such as VET (Fariani et al., 2021), although specific areas of need were still identified in, for example, nursing (Hwang et al., 2022) and dentistry (Saghiri et al., 2022). The state of AIHEd research within Education itself is an issue (Alam & Mohanty, 2022; Zawacki-Richter et al., 2019), and suggestions for more interdisciplinary approaches have been made, in order to improve pedagogical applications and outcomes (e.g., Kirubarajan et al., 2022). Potential further research approaches include:

Student perceptions of effectiveness and AI fairness (del Gobbo et al., 2023; Hamam, 2021; Otoo-Arthur & van Zyl, 2020);

Combining student and educator perspectives (Rabelo et al., 2023);

Low-level foreign language learners and chatbots (Klímová & Ibna Seraj, 2023);

Non-formal education (Urdaneta-Ponte et al., 2021); and

Investigating a similar dataset with data retrieved from different educational contexts (Fahd et al., 2022).

By using the framework of Zawacki-Richter et al. (2019), this tertiary review of 66 AIHEd evidence syntheses found that most reviews report findings on the use of adaptive systems and personalisation tools, followed by profiling and prediction tools. However, owing to the heavy predominance of primary AIHEd research in STEM and Health & Welfare courses, as in other EdTech research (e.g., Lai & Bower, 2019), AI applications and their presence within the curriculum appear to be at a more mature stage in those disciplines than in others. Furthermore, insights into how AI is being used at the postgraduate level, as well as at the institutional and administrative level, remain limited.

This review of reviews confirms that the benefits of AI in higher education are manifold. Most notably, AI facilitates personalised learning, which constitutes approximately 38.7% of the identified advantages in the reviewed studies. AI systems are adaptable and allow learning materials to be tailored to individual needs, thereby enhancing student autonomy and enabling early interventions for disengaged students (Algabri et al., 2021; Alotaibi & Alshehri, 2023; Bearman et al., 2023). Other significant benefits include the positive influence on learning outcomes, reduced administrative time for educators, and greater insight into student understanding. AI not only enhances traditional academic outcomes but also aids in professional training and specific skill development (Buchanan et al., 2021; Hwang et al., 2022; Sapci & Sapci, 2020). However, the adoption of AI in higher education is not without challenges. The most frequently cited concern is the lack of ethical consideration in AI applications, followed by issues related to curriculum development and infrastructure. Studies indicate the need for substantial financial investment and technical literacy to fully integrate AI into existing educational systems (Alotaibi & Alshehri, 2023; Charow et al., 2021). Moreover, there is a noted lack of educator technical knowledge and fears regarding job displacement due to AI, which require attention (Alotaibi & Alshehri, 2023; Bearman et al., 2023).

In contrast to previous reviews in the field of EdTech (e.g., Bodily et al., 2019) and previous EdTech tertiary reviews (Buntins et al., 2023; Zawacki-Richter, 2023), authors conducting AIHEd evidence synthesis represent a wide range of countries, with the top six most productive countries spanning six different continents. Despite this, there is still less research emerging from Oceania, Africa and, in particular, South and Central America, although in the case of the latter, it is possible that this is due to authors publishing in their own native language rather than in English (Marin et al., 2023). Related to the issue of global reach, only 67.7% of evidence syntheses in this sample were published open access, as opposed to 88.3% of higher education EdTech research published during the pandemic (Bond et al., 2021). This not only limits the ability of educators and researchers from lower-resourced institutions to read these reviews, but also decreases their visibility generally, thereby increasing the likelihood that other researchers will duplicate effort and conduct similar or identical research, leading to ‘research waste’ (Grainger et al., 2020; Siontis & Ioannidis, 2018). Therefore, in order to move the AIHEd field forward, we are calling for a focus on three particular areas, namely ethics, collaboration, and rigour.

A call for increased ethics

There is a loud and resounding call for an enhanced focus on ethics in future AIHEd research, with 40.9% of reviews in this corpus indicating that some form of ethical consideration is needed. Whilst this realisation is not lost on the AIEd field, with at least four evidence syntheses published specifically on the topic in the last two years (Guan et al., 2023; Mahmood et al., 2022; Rios-Campos et al., 2023; Yu & Yu, 2023), this meta review indicates that the issue remains pressing. Future primary research must ensure that careful consideration is given to participant consent, data collection procedures, and data storage (Otoo-Arthur & van Zyl, 2020). Further consideration must also be given to the biases that can be perpetuated through data (Zhang et al., 2023), as well as to embedding ethical AI as a topic throughout the HE curriculum (Grunhut et al., 2021).

There is also a need for more ethical consideration when conducting evidence synthesis. This review uncovered examples of evidence syntheses that stated the ‘use’ of the PRISMA guidelines (Page et al., 2021), for example, but did not cite them in the reference list or cited them incorrectly, as well as secondary research that used the exact methodology and typology of Zawacki-Richter et al. (2019), ending up with very similar findings, but did not cite the original article at all. Further to this, one review was excluded from the corpus, as it plagiarised the entire Zawacki-Richter et al. (2019) article. Whilst concerns are growing over the use and publication of generative AI produced summaries that plagiarise whole sections of text (see Kalz, 2023), ensuring that we conduct primary and secondary research as rigorously and transparently as possible is our purview as researchers, and is vitally needed if we are to expand and enhance the field.

A call for increased collaboration

The findings of this review highlight the need for collaboration in four key areas: the development of AI applications, designing and teaching the AI curriculum, researching AIHEd, and conducting evidence syntheses. In order to translate future AI tools into practice and meet community expectations, there is a need to include intended users in their development (Harmon et al., 2021; McConvey et al., 2023), which Li et al. (2021) suggest could also include the collection and sharing of massive datasets across disciplines and contexts, whilst adhering to privacy considerations. Multidisciplinary teams should then be brought together, including data scientists, educators and students, to ensure that AI curricula are robust, ethical and fit for purpose (Charow et al., 2021; Sapci & Sapci, 2020). In the case of medical education, health professionals and leaders, as well as patients, should also be involved (Grunhut et al., 2021; Zhang et al., 2023).

In order to evaluate the efficacy of AI applications in higher education, interdisciplinary research teams should include a range of stakeholders from diverse communities (Chu et al., 2022; Crompton & Burke, 2023; Hwang et al., 2021), for example linking computer scientists with researchers in the humanities and social sciences (Ullrich et al., 2022). Finally, in terms of evidence synthesis authorship, the large number of domestic research collaborations indicates that the field could benefit from further international research collaborations, especially for authors in Oceania and Europe, as this might provide more contextual knowledge, as well as help eliminate language bias when it comes to searching for literature (Rangel-de Lázaro & Duart, 2023). A large proportion of authors from Africa and the Middle East also published as single authors (29% and 22% respectively). By conducting evidence synthesis in teams, greater rigour can be achieved through shared understanding, discussion and inter-rater reliability measures (Booth et al., 2013). It should be noted here, however, that nearly half of the reviews in this corpus (43.9%, n = 29) did not report any inter-rater agreement processes; although this is better than what was found in previous umbrella reviews of EdTech research (Buntins et al., 2023; Zawacki-Richter, 2023), it marks the beginning of a much-needed discussion on research rigour.
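As a point of reference for the inter-rater agreement processes mentioned above, Cohen's kappa (McHugh, 2012) is one commonly reported measure for two screeners making include/exclude decisions. The following is a minimal, self-contained sketch for illustration only; it is not part of this review's methodology, and the example decision lists are invented.

```python
# Illustrative sketch: Cohen's kappa for two screeners' include/exclude
# decisions, one common inter-rater agreement statistic (McHugh, 2012).
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical decisions."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: proportion of records where both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1:  # degenerate case: chance agreement is already perfect
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for six records.
a = ["include", "include", "exclude", "exclude", "include", "exclude"]
b = ["include", "exclude", "exclude", "exclude", "include", "exclude"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

Values near 1 indicate strong agreement beyond chance; reporting the statistic alongside the screening procedure is one straightforward way reviews can document their rigour.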

A call for increased rigour

The prevailing landscape of AIHEd research evidences a compelling call for enhanced rigour and methodological robustness. A notable 65% of reviews were of critically low to medium quality, signalling an imperative to recalibrate acceptance criteria in order to strengthen reliability and quality. The most concerning findings were that 31.8% of studies searched only one or two databases, only 24.2% provided their exact data extraction coding scheme (compared to 51% in Chalmers et al., 2023 and 37% in Buntins et al., 2023), 45.5% did not undertake any form of quality assessment, and 34.8% did not reflect at all upon the limitations of their review. Furthermore, over half of the reviews (51.5%) did not report whether any digital evidence synthesis tool was used to conduct the review. Given the efficiency affordances that machine learning can bring to evidence synthesis (e.g., Stansfield et al., 2022; Tsou et al., 2020), as well as the enhanced transparency offered by visualisation tools such as EPPI Visualiser, it is surprising that the AIHEd community has not made more use of them (see Zhang & Neitzel, 2023). These inconsistencies, the absence of any methodological guidance in some reviews, and the frequent recourse to somewhat dated (yet arguably seminal) approaches by Kitchenham et al. (2004, 2007, 2009), which predate the first and subsequently updated PRISMA guidelines (Moher et al., 2009; Page et al., 2021), underscore an urgent need for contemporary, stringent, and universally adopted review guidelines within AIEd, as well as within the wider field of EdTech (e.g., Jing et al., 2023) and educational research at large (e.g., Chong et al., 2023).
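To illustrate the kind of efficiency affordance machine learning brings to evidence synthesis, priority screening tools rank unscreened records so that those most likely to be relevant are screened first. The sketch below is a deliberately simplified, hypothetical stand-in (word-overlap scoring against already-included studies) for the trained classifiers used in real tools such as EPPI Reviewer; the titles and the scoring scheme are invented for illustration.

```python
# Hypothetical illustration of priority screening: rank unscreened records
# by vocabulary overlap with already-included studies, a toy stand-in for
# the machine-learning classifiers used in evidence synthesis tools.
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens from a title or abstract."""
    return re.findall(r"[a-z]+", text.lower())


def rank_for_screening(included_titles, unscreened_titles):
    """Order unscreened titles so the most relevant are screened first."""
    # Build a frequency table of words seen in the included studies.
    vocab = Counter(t for title in included_titles for t in tokens(title))

    def score(title):
        return sum(vocab[t] for t in set(tokens(title)))

    return sorted(unscreened_titles, key=score, reverse=True)


# Invented example records.
included = [
    "Artificial intelligence in higher education: a systematic review",
    "Machine learning for student performance prediction",
]
unscreened = [
    "Soil chemistry of alpine meadows",
    "A review of artificial intelligence applications in education",
]
print(rank_for_screening(included, unscreened)[0])
```

In practice, such ranking lets reviewers reach a saturation point earlier and, combined with transparent reporting of which tool was used, directly addresses the gaps noted above.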

Conclusion
This tertiary review synthesised the findings of 66 AIHEd evidence syntheses, with a view to mapping the field and gaining an understanding of authorship patterns, research quality, key topics, common findings, and potential research gaps in the literature. Future research will explore the full corpus of 307 AIEd evidence syntheses located across various educational levels, providing further insight into applications and future directions, alongside further guidance for the conduct of evidence synthesis. While AI offers promising avenues for enhancing educational experiences and outcomes, significant ethical, methodological, and pedagogical challenges need to be addressed in order to harness its full potential.

Data availability

All data is available to access via the EPPI Centre ( https://eppi.ioe.ac.uk/cms/Default.aspx?tabid=3917 ). This includes the web database ( https://eppi.ioe.ac.uk/eppi-vis/login/open?webdbid=322 ) and the search strategy information on the OSF ( https://doi.org/10.17605/OSF.IO/Y2AFK ).

https://chat.openai.com/ .

https://openai.com/dall-e-2 .

https://blog.google/technology/ai/bard-google-ai-search-updates/ .

https://ai.meta.com/blog/large-language-model-llama-meta-ai/ .

https://www.europarl.europa.eu/news/en/headlines/society/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence .

https://education.nsw.gov.au/about-us/strategies-and-reports/draft-national-ai-in-schools-framework .

https://www.ed.gov/news/press-releases/us-department-education-shares-insights-and-recommendations-artificial-intelligence .

Otherwise known as a review of reviews (see Kitchenham et al., 2009 ; Sutton et al., 2019 ).

As of 6th December 2023, https://scholar.google.com/scholar?oi=bibs&hl=en&cites=6006744895709946427 .

According to the journal website on Springer Open (see Zawacki-Richter et al., 2019 ).

As of 6th December 2023, it has been cited 2,559 times according to Science Direct and 4,678 times according to Google Scholar.

https://doi.org/10.17605/OSF.IO/Y2AFK .

https://eppi.ioe.ac.uk/cms/Default.aspx?tabid=3917 .

https://eppi.ioe.ac.uk/eppi-vis/login/open?webdbid=322 .

For more information about EPPI Mapper and creating interactive evidence gap maps, as well as using EPPI Visualiser, see https://eppi.ioe.ac.uk/cms/Default.aspx?tabid=3790 .

https://www.crd.york.ac.uk/PROSPERO/ .

https://idesr.org/ .

See Additional file 1 : Appendix A for a tabulated list of included study characteristics and https://eppi.ioe.ac.uk/eppi-vis/login/open?webdbid=322 for the interactive web database.

It should be noted that Cardona et al. ( 2023 ) was originally published in 2020 but has since been indexed in a 2023 journal issue. Their review has been kept as 2020.

https://eppi.ioe.ac.uk/cms/Default.aspx?alias=eppi.ioe.ac.uk/cms/er4& .

https://www.covidence.org/ .

https://2024.hci.international/papers .

Two bibliometric studies (Gudyanga, 2023 ; Hinojo-Lucena et al., 2019 ) focused on trends in AI research (countries, journals etc.) and did not specify particular applications.

These are not included in this corpus, as they include results from other educational levels.

*Indicates that the article is featured in the corpus of the review

Abdelkader, H. E., Gad, A. G., Abohany, A. A., & Sorour, S. E. (2022). An efficient data mining technique for assessing satisfaction level with online learning for higher education students during the COVID-19 pandemic. IEEE Access, 10 , 6286–6303. https://doi.org/10.1109/ACCESS.2022.3143035

*Abu Saa, A., Al-Emran, M., & Shaalan, K. (2019). Factors affecting students’ performance in higher education: A systematic review of predictive data mining techniques. Technology, Knowledge and Learning, 24 (4), 567–598. https://doi.org/10.1007/s10758-019-09408-7

*Alam, A., & Mohanty, A. (2022). Foundation for the Future of Higher Education or ‘Misplaced Optimism’? Being Human in the Age of Artificial Intelligence. In M. Panda, S. Dehuri, M. R. Patra, P. K. Behera, G. A. Tsihrintzis, S.-B. Cho, & C. A. Coello Coello (Eds.), Innovations in Intelligent Computing and Communication (pp. 17–29). Springer International Publishing. https://doi.org/10.1007/978-3-031-23233-6_2

*Algabri, H. K., Kharade, K. G., & Kamat, R. K. (2021). Promise, threats, and personalization in higher education with artificial intelligence. Webology, 18 (6), 2129–2139.

*Alkhalil, A., Abdallah, M. A., Alogali, A., & Aljaloud, A. (2021). Applying big data analytics in higher education: A systematic mapping study. International Journal of Information and Communication Technology Education, 17 (3), 29–51. https://doi.org/10.4018/IJICTE.20210701.oa3

Allman, B., Kimmons, R., Rosenberg, J., & Dash, M. (2023). Trends and Topics in Educational Technology, 2023 Edition. TechTrends Linking Research & Practice to Improve Learning, 67 (3), 583–591. https://doi.org/10.1007/s11528-023-00840-2

*Alotaibi, N. S., & Alshehri, A. H. (2023). Prospers and obstacles in using artificial intelligence in Saudi Arabia higher education institutions—The potential of AI-based learning outcomes. Sustainability, 15 (13), 10723. https://doi.org/10.3390/su151310723

*Alyahyan, E., & Düştegör, D. (2020). Predicting academic success in higher education: Literature review and best practices. International Journal of Educational Technology in Higher Education, 17 (1), 1–21. https://doi.org/10.1186/s41239-020-0177-7

Arksey, H., & O’Malley, L. (2005). Scoping studies: Towards a methodological framework. International Journal of Social Research Methodology, 8 (1), 19–32. https://doi.org/10.1080/1364557032000119616

*Banihashem, S. K., Noroozi, O., van Ginkel, S., Macfadyen, L. P., & Biemans, H. J. (2022). A systematic review of the role of learning analytics in enhancing feedback practices in higher education. Educational Research Review, 37 , 100489. https://doi.org/10.1016/j.edurev.2022.100489

*Bearman, M., Ryan, J., & Ajjawi, R. (2023). Discourses of artificial intelligence in higher education: A critical literature review. Higher Education, 86 (2), 369–385. https://doi.org/10.1007/s10734-022-00937-2

*Bhattacharjee, K. K. (2019). Research Output on the Usage of Artificial Intelligence in Indian Higher Education - A Scientometric Study. In 2019 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM) (pp. 916–919). IEEE. https://doi.org/10.1109/ieem44572.2019.8978798

Bodily, R., Leary, H., & West, R. E. (2019). Research trends in instructional design and technology journals. British Journal of Educational Technology, 50 (1), 64–79. https://doi.org/10.1111/bjet.12712

Bond, M. (2018). Helping doctoral students crack the publication code: An evaluation and content analysis of the Australasian Journal of Educational Technology. Australasian Journal of Educational Technology, 34 (5), 168–183. https://doi.org/10.14742/ajet.4363

Bond, M., Bedenlier, S., Marín, V. I., & Händel, M. (2021). Emergency remote teaching in higher education: mapping the first global online semester. International Journal of Educational Technology in Higher Education . https://doi.org/10.1186/s41239-021-00282-x

Bond, M., Zawacki-Richter, O., & Nichols, M. (2019). Revisiting five decades of educational technology research: A content and authorship analysis of the British Journal of Educational Technology. British Journal of Educational Technology, 50 (1), 12–63. https://doi.org/10.1111/bjet.12730

Booth, A., Carroll, C., Ilott, I., Low, L. L., & Cooper, K. (2013). Desperately seeking dissonance: Identifying the disconfirming case in qualitative evidence synthesis. Qualitative Health Research, 23 (1), 126–141. https://doi.org/10.1177/1049732312466295

Bozkurt, A., & Sharma, R. C. (2023). Challenging the status quo and exploring the new boundaries in the age of algorithms: Reimagining the role of generative AI in distance education and online learning. Asian Journal of Distance Education. https://doi.org/10.5281/zenodo.7755273

Bozkurt, A., Xiao, J., Lambert, S., Pazurek, A., Crompton, H., Koseoglu, S., Farrow, R., Bond, M., Nerantzi, C., Honeychurch, S., Bali, M., Dron, J., Mir, K., Stewart, B., Costello, E., Mason, J., Stracke, C. M., Romero-Hall, E., Koutropoulos, A., Toquero, C. M., Singh, L., Tlili, A., Lee, K., Nichols, M., Ossiannilsson, E., Brown, M., Irvine, V., Raffaghelli, J. E., Santos-Hermosa, G., Farrell, O., Adam, T., Thong, Y. L., Sani-Bozkurt, S., Sharma, R. C., Hrastinski, S., & Jandrić, P. (2023). Speculative futures on ChatGPT and generative Artificial Intelligence (AI): A collective reflection from the educational landscape.  Asian Journal of Distance Education , 18(1), 1–78. http://www.asianjde.com/ojs/index.php/AsianJDE/article/view/709/394

*Buchanan, C., Howitt, M. L., Wilson, R., Booth, R. G., Risling, T., & Bamford, M. (2021). Predicted influences of artificial intelligence on nursing education: Scoping review. JMIR Nursing, 4 (1), e23933. https://doi.org/10.2196/23933

Buntins, K., Bedenlier, S., Marín, V., Händel, M., & Bond, M. (2023). Methodological approaches to evidence synthesis in educational technology: A tertiary systematic mapping review. MedienPädagogik, 54 , 167–191. https://doi.org/10.21240/mpaed/54/2023.12.20.X

*Burney, I. A., & Ahmad, N. (2022). Artificial Intelligence in Medical Education: A citation-based systematic literature review. Journal of Shifa Tameer-E-Millat University, 5 (1), 43–53. https://doi.org/10.32593/jstmu/Vol5.Iss1.183

*Cardona, T., Cudney, E. A., Hoerl, R., & Snyder, J. (2023). Data mining and machine learning retention models in higher education. Journal of College Student Retention: Research, Theory and Practice, 25 (1), 51–75. https://doi.org/10.1177/1521025120964920

Centre for Reviews and Dissemination (UK). (1995). Database of Abstracts of Reviews of Effects (DARE): Quality-assessed Reviews. https://www.ncbi.nlm.nih.gov/books/NBK285222/ . Accessed 4 January 2023.

*Chaka, C. (2023). Fourth industrial revolution—a review of applications, prospects, and challenges for artificial intelligence, robotics and blockchain in higher education. Research and Practice in Technology Enhanced Learning, 18 (2), 1–39. https://doi.org/10.58459/rptel.2023.18002

Chalmers, H., Brown, J., & Koryakina, A. (2023). Topics, publication patterns, and reporting quality in systematic reviews in language education. Lessons from the international database of education systematic reviews (IDESR). Applied Linguistics Review . https://doi.org/10.1515/applirev-2022-0190

*Charow, R., Jeyakumar, T., Younus, S., Dolatabadi, E., Salhia, M., Al-Mouaswas, D., Anderson, M., Balakumar, S., Clare, M., Dhalla, A., Gillan, C., Haghzare, S., Jackson, E., Lalani, N., Mattson, J., Peteanu, W., Tripp, T., Waldorf, J., Williams, S., & Wiljer, D. (2021). Artificial intelligence education programs for health care professionals: Scoping review. JMIR Medical Education, 7 (4), e31043. https://doi.org/10.2196/31043

Chen, X., Zou, D., Xie, H., Cheng, G., & Liu, C. (2022). Two decades of artificial intelligence in education: Contributors, collaborations, research topics, challenges, and future directions. Educational Technology and Society, 25 (1), 28–47. https://doi.org/10.2307/48647028

Chong, S. W., Bond, M., & Chalmers, H. (2023). Opening the methodological black box of research synthesis in language education: Where are we now and where are we heading? Applied Linguistics Review . https://doi.org/10.1515/applirev-2022-0193

Chu, H.-C., Hwang, G.-H., Tu, Y.-F., & Yang, K.-H. (2022). Roles and research trends of artificial intelligence in higher education: A systematic review of the top 50 most-cited articles. Australasian Journal of Educational Technology, 38 (3), 22–42. https://doi.org/10.14742/ajet.7526

Cobos, C., Rodriguez, O., Rivera, J., Betancourt, J., Mendoza, M., León, E., & Herrera-Viedma, E. (2013). A hybrid system of pedagogical pattern recommendations based on singular value decomposition and variable data attributes. Information Processing and Management, 49 (3), 607–625. https://doi.org/10.1016/j.ipm.2012.12.002

*Crompton, H., & Burke, D. (2023). Artificial intelligence in higher education: the state of the field. International Journal of Educational Technology in Higher Education . https://doi.org/10.1186/s41239-023-00392-8

*Crow, T., Luxton-Reilly, A., & Wuensche, B. (2018). Intelligent tutoring systems for programming education. In R. Mason & Simon (Eds.), Proceedings of the 20th Australasian Computing Education Conference (pp. 53–62). ACM. https://doi.org/10.1145/3160489.3160492

Daoudi, I. (2022). Learning analytics for enhancing the usability of serious games in formal education: A systematic literature review and research agenda. Education and Information Technologies, 27 (8), 11237–11266. https://doi.org/10.1007/s10639-022-11087-4

*Darvishi, A., Khosravi, H., Sadiq, S., & Weber, B. (2022). Neurophysiological measurements in higher education: A systematic literature review. International Journal of Artificial Intelligence in Education, 32 (2), 413–453. https://doi.org/10.1007/s40593-021-00256-0

*de Oliveira, T. N., Bernardini, F., & Viterbo, J. (2021). An Overview on the Use of Educational Data Mining for Constructing Recommendation Systems to Mitigate Retention in Higher Education. In 2021 IEEE Frontiers in Education Conference (FIE) (pp. 1–7). IEEE. https://doi.org/10.1109/FIE49875.2021.9637207

*Del Gobbo, E., Guarino, A., Cafarelli, B., Grilli, L., & Limone, P. (2023). Automatic evaluation of open-ended questions for online learning. A systematic mapping. Studies in Educational Evaluation, 77 , 101258. https://doi.org/10.1016/j.stueduc.2023.101258

Desmarais, M. C., & Baker, R. S. D. (2012). A review of recent advances in learner and skill modeling in intelligent learning environments. User Modeling and User-Adapted Interaction, 22 , 9–38.

Digital Solution Foundry, & EPPI-Centre. (2023). EPPI-Mapper (Version 2.2.3) [Computer software]. UCL Social Research Institute, University College London. http://eppimapper.digitalsolutionfoundry.co.za/#/

Dillenbourg, P., & Jermann, P. (2007). Designing integrative scripts. In Scripting computer-supported collaborative learning: Cognitive, computational and educational perspectives (pp. 275–301). Springer US.

Doroudi, S. (2022). The intertwined histories of artificial intelligence and education. International Journal of Artificial Intelligence in Education, 1–44.

*Fahd, K., Venkatraman, S., Miah, S. J., & Ahmed, K. (2022). Application of machine learning in higher education to assess student academic performance, at-risk, and attrition: A meta-analysis of literature. Education and Information Technologies, 27 (3), 3743–3775. https://doi.org/10.1007/s10639-021-10741-7

*Fariani, R. I., Junus, K., & Santoso, H. B. (2023). A systematic literature review on personalised learning in the higher education context. Technology, Knowledge and Learning, 28 (2), 449–476. https://doi.org/10.1007/s10758-022-09628-4

*Fichten, C., Pickup, D., Asunsion, J., Jorgensen, M., Vo, C., Legault, A., & Libman, E. (2021). State of the research on artificial intelligence based apps for post-secondary students with disabilities. Exceptionality Education International, 31 (1), 62–76. https://doi.org/10.5206/EEI.V31I1.14089

*Fontaine, G., Cossette, S., Maheu-Cadotte, M.-A., Mailhot, T., Deschênes, M.-F., Mathieu-Dupuis, G., Côté, J., Gagnon, M.-P., & Dubé, V. (2019). Efficacy of adaptive e-learning for health professionals and students: A systematic review and meta-analysis. British Medical Journal Open, 9 (8), e025252. https://doi.org/10.1136/bmjopen-2018-025252

*Frangoudes, F., Hadjiaros, M., Schiza, E. C., Matsangidou, M., Tsivitanidou, O., & Neokleous, K. (2021). An Overview of the Use of Chatbots in Medical and Healthcare Education. In P. Zaphiris & A. Ioannou (Eds.), Lecture Notes in Computer Science. Learning and Collaboration Technologies: Games and Virtual Environments for Learning (Vol. 12785, pp. 170–184). Springer International Publishing. https://doi.org/10.1007/978-3-030-77943-6_11

Gough, D., Oliver, S., & Thomas, J. (Eds.). (2012). An introduction to systematic reviews . SAGE.

Grainger, M. J., Bolam, F. C., Stewart, G. B., & Nilsen, E. B. (2020). Evidence synthesis for tackling research waste. Nature Ecology & Evolution, 4 (4), 495–497. https://doi.org/10.1038/s41559-020-1141-6

*Grunhut, J., Wyatt, A. T., & Marques, O. (2021). Educating Future Physicians in Artificial Intelligence (AI): An integrative review and proposed changes. Journal of Medical Education and Curricular Development, 8 , 23821205211036836. https://doi.org/10.1177/23821205211036836

Guan, X., Feng, X., & Islam, A. A. (2023). The dilemma and countermeasures of educational data ethics in the age of intelligence. Humanities and Social Sciences Communications . https://doi.org/10.1057/s41599-023-01633-x

*Gudyanga, R. (2023). Mapping education 4.0 research trends. International Journal of Research in Business and Social Science, 12 (4), 434–445. https://doi.org/10.20525/ijrbs.v12i4.2585

Gusenbauer, M., & Haddaway, N. R. (2020). Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of google scholar, Pubmed and 26 other resources. Research Synthesis Methods, 11 (2), 181–217. https://doi.org/10.1002/jrsm.1378

Haddaway, N. R., Collins, A. M., Coughlin, D., & Kirk, S. (2015). The role of google scholar in evidence reviews and its applicability to grey literature searching. PLoS ONE, 10 (9), e0138237. https://doi.org/10.1371/journal.pone.0138237

*Hamam, D. (2021). The New Teacher Assistant: A Review of Chatbots’ Use in Higher Education. In C. Stephanidis, M. Antona, & S. Ntoa (Eds.), Communications in Computer and Information Science. HCI International 2021—Posters (Vol. 1421, pp. 59–63). Springer International Publishing. https://doi.org/10.1007/978-3-030-78645-8_8

Han, B., Nawaz, S., Buchanan, G., & McKay, D. (2023). Ethical and Pedagogical Impacts of AI in Education. In International Conference on Artificial Intelligence in Education (pp. 667–673). Cham: Springer Nature Switzerland.

*Harmon, J., Pitt, V., Summons, P., & Inder, K. J. (2021). Use of artificial intelligence and virtual reality within clinical simulation for nursing pain education: A scoping review. Nurse Education Today, 97 , 104700. https://doi.org/10.1016/j.nedt.2020.104700

*Hellas, A., Ihantola, P., Petersen, A., Ajanovski, V. V., Gutica, M., Hynninen, T., Knutas, A., Leinonen, J., Messom, C., & Liao, S. N. (2018). Predicting Academic Performance: A Systematic Literature Review. In ITiCSE 2018 Companion, Proceedings Companion of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education (pp. 175–199). Association for Computing Machinery. https://doi.org/10.1145/3293881.3295783

Hew, K. F., Hu, X., Qiao, C., & Tang, Y. (2020). What predicts student satisfaction with MOOCs: A gradient boosting trees supervised machine learning and sentiment analysis approach. Computers & Education, 145 , 103724. https://doi.org/10.1016/j.compedu.2019.103724

Higgins, S., Xiao, Z., & Katsipataki, M. (2012). The impact of digital technology on learning: A summary for the Education Endowment Foundation. Education Endowment Foundation. https://eric.ed.gov/?id=ED612174

*Hinojo-Lucena, F.-J., Aznar-Diaz, I., Romero-Rodríguez, J.-M., & Cáceres-Reche, M.-P. (2019). Artificial Intelligence in Higher Education: A Bibliometric Study on its Impact in the Scientific Literature. Education Sciences . https://doi.org/10.3390/educsci9010051

Ho, I. M., Cheong, K. Y., & Weldon, A. (2021). Predicting student satisfaction of emergency remote learning in higher education during COVID-19 using machine learning techniques. PLoS ONE . https://doi.org/10.1371/journal.pone.0249423

Holmes, W., Porayska-Pomsta, K., Holstein, K., Sutherland, E., Baker, T., Shum, S. B., ... & Koedinger, K. R. (2021). Ethics of AI in education: Towards a community-wide framework. International Journal of Artificial Intelligence in Education , 1–23.

*Hwang, G.‑J., Tang, K.‑Y., & Tu, Y.‑F. (2022). How artificial intelligence (AI) supports nursing education: Profiling the roles, applications, and trends of AI in nursing education research (1993–2020). Interactive Learning Environments , https://doi.org/10.1080/10494820.2022.2086579

*Ifenthaler, D., & Yau, J.Y.-K. (2020). Utilising learning analytics to support study success in higher education: A systematic review. Educational Technology Research & Development, 68 (4), 1961–1990. https://doi.org/10.1007/s11423-020-09788-z

İpek, Z. H., Gözüm, A. İC., Papadakis, S., & Kallogiannakis, M. (2023). Educational Applications of the ChatGPT AI System: A Systematic Review Research. Educational Process International Journal . https://doi.org/10.22521/edupij.2023.123.2

Jing, Y., Wang, C., Chen, Y., Wang, H., Yu, T., & Shadiev, R. (2023). Bibliometric mapping techniques in educational technology research: A systematic literature review. Education and Information Technologies . https://doi.org/10.1007/s10639-023-12178-6

Kalz, M. (2023). AI destroys principles of authorship. A scary case from educational technology publishing . https://kalz.cc/2023/09/15/ai-destroys-principles-of-authorship.-a-scary-case-from-educational-technology-publishing

Khosravi, H., Shum, S. B., Chen, G., Conati, C., Tsai, Y. S., Kay, J., ... & Gašević, D. (2022). Explainable artificial intelligence in education. Computers and Education: Artificial Intelligence, 3, 100074.

*Kirubarajan, A., Young, D., Khan, S., Crasto, N., Sobel, M., & Sussman, D. (2022). Artificial Intelligence and Surgical Education: A Systematic Scoping Review of Interventions. Journal of Surgical Education, 79 (2), 500–515. https://doi.org/10.1016/j.jsurg.2021.09.012

Kitchenham, B. (2004). Procedures for Performing Systematic Reviews . Keele. Software Engineering Group, Keele University. https://www.inf.ufsc.br/~aldo.vw/kitchenham.pdf

Kitchenham, B., & Charters, S. (2007). Guidelines for Performing Systematic Literature Reviews in Software Engineering: Technical Report EBSE 2007-001. Keele University and Durham University.

Kitchenham, B., Pearl Brereton, O., Budgen, D., Turner, M., Bailey, J., & Linkman, S. (2009). Systematic literature reviews in software engineering—A systematic literature review. Information and Software Technology, 51 (1), 7–15. https://doi.org/10.1016/j.infsof.2008.09.009

Kitchenham, B., Pretorius, R., Budgen, D., Pearl Brereton, O., Turner, M., Niazi, M., & Linkman, S. (2010). Systematic literature reviews in software engineering—A tertiary study. Information and Software Technology, 52 (8), 792–805. https://doi.org/10.1016/j.infsof.2010.03.006

*Klímová, B., & Ibna Seraj, P. M. (2023). The use of chatbots in university EFL settings: Research trends and pedagogical implications. Frontiers in Psychology, 14 , 1131506. https://doi.org/10.3389/fpsyg.2023.1131506

Lai, J. W., & Bower, M. (2019). How is the use of technology in education evaluated? A systematic review. Computers & Education, 133 , 27–42. https://doi.org/10.1016/j.compedu.2019.01.010

Lai, J. W., & Bower, M. (2020). Evaluation of technology use in education: Findings from a critical analysis of systematic literature reviews. Journal of Computer Assisted Learning, 36 (3), 241–259. https://doi.org/10.1111/jcal.12412

*Lee, J., Wu, A. S., Li, D., & Kulasegaram, K. M. (2021). Artificial Intelligence in Undergraduate Medical Education: A Scoping Review. Academic Medicine, 96 (11S), S62–S70. https://doi.org/10.1097/ACM.0000000000004291

*Li, C., Herbert, N., Yeom, S., & Montgomery, J. (2022). Retention Factors in STEM Education Identified Using Learning Analytics: A Systematic Review. Education Sciences, 12 (11), 781. https://doi.org/10.3390/educsci12110781

*Li, F., He, Y., & Xue, Q. (2021). Progress, Challenges and Countermeasures of Adaptive Learning: A Systematic Review. Educational Technology and Society, 24 (3), 238–255. https://eric.ed.gov/?id=EJ1305781

Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P. A., Clarke, M., Devereaux, P. J., Kleijnen, J., & Moher, D. (2009). The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: Explanation and elaboration. BMJ (clinical Research Ed.), 339 , b2700. https://doi.org/10.1136/bmj.b2700

Linnenluecke, M. K., Marrone, M., & Singh, A. K. (2020). Conducting systematic literature reviews and bibliometric analyses. Australian Journal of Management, 45 (2), 175–194. https://doi.org/10.1177/0312896219877678

*Liz-Domínguez, M., Caeiro-Rodríguez, M., Llamas-Nistal, M., & Mikic-Fonte, F. A. (2019). Systematic literature review of predictive analysis tools in higher education. Applied Sciences, 9 (24), 5569. https://doi.org/10.3390/app9245569

Lo, C. K. (2023). What Is the Impact of ChatGPT on Education? A Rapid Review of the Literature. Education Sciences, 13 (4), 410. https://doi.org/10.3390/educsci13040410

Lodhi, P., Mishra, O., Jain, S., & Bajaj, V. (2018). StuA: An intelligent student assistant. International Journal of Interactive Multimedia and Artificial Intelligence, 5 (2), 17–25. https://doi.org/10.9781/ijimai.2018.02.008

Mahmood, A., Sarwar, Q., & Gordon, C. (2022). A Systematic Review on Artificial Intelligence in Education (AIE) with a focus on Ethics and Ethical Constraints. Pakistan Journal of Multidisciplinary Research, 3(1). https://pjmr.org/pjmr/article/view/245

*Manhiça, R., Santos, A., & Cravino, J. (2022). The use of artificial intelligence in learning management systems in the context of higher education: Systematic literature review. In 2022 17th Iberian Conference on Information Systems and Technologies (CISTI) (pp. 1–6). IEEE. https://doi.org/10.23919/CISTI54924.2022.9820205

*Maphosa, M., & Maphosa, V. (2020). Educational data mining in higher education in sub-saharan africa. In K. M. Sunjiv Soyjaudah, P. Sameerchand, & U. Singh (Eds.), Proceedings of the 2nd International Conference on Intelligent and Innovative Computing Applications (pp. 1–7). ACM. https://doi.org/10.1145/3415088.3415096

*Maphosa, V., & Maphosa, M. (2021). The trajectory of artificial intelligence research in higher education: A bibliometric analysis and visualisation. In 2021 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD) (pp. 1–7). IEEE. https://doi.org/10.1109/icabcd51485.2021.9519368

Marin, V. I., Buntins, K., Bedenlier, S., & Bond, M. (2023). Invisible borders in educational technology research? A comparative analysis. Education Technology Research & Development, 71 , 1349–1370. https://doi.org/10.1007/s11423-023-10195-3

*McConvey, K., Guha, S., & Kuzminykh, A. (2023). A Human-Centered Review of Algorithms in Decision-Making in Higher Education. In A. Schmidt, K. Väänänen, T. Goyal, P. O. Kristensson, A. Peters, S. Mueller, J. R. Williamson, & M. L. Wilson (Eds.), Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1–15). ACM. https://doi.org/10.1145/3544548.3580658

McHugh, M. L. (2012). Interrater reliability: the kappa statistic. Biochemia Medica . https://doi.org/10.11613/BM.2012.031

Moher, D., Liberati, A., Tetzlaff, J., & Altman, D. G. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. BMJ (clinical Research Ed.), 339 , b2535. https://doi.org/10.1136/bmj.b2535

Moher, D., Shamseer, L., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., Shekelle, P., Stewart, L. A., PRISMA-P Group. (2015). Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement. Systematic Reviews, 4 (1), 1. https://doi.org/10.1186/2046-4053-4-1

*Moonsamy, D., Naicker, N., Adeliyi, T. T., & Ogunsakin, R. E. (2021). A Meta-analysis of Educational Data Mining for Predicting Students Performance in Programming. International Journal of Advanced Computer Science and Applications, 12 (2), 97–104. https://doi.org/10.14569/IJACSA.2021.0120213

OECD. (2021). AI and the Future of Skills, Volume 1: Capabilities and Assessments . OECD Publishing. https://doi.org/10.1787/5ee71f34-en

OECD. (2023). AI publications by country. Visualisations powered by JSI using data from OpenAlex. Accessed on 27/9/2023, www.oecd.ai

*Otoo-Arthur, D., & van Zyl, T. (2020). A Systematic Review on Big Data Analytics Frameworks for Higher Education—Tools and Algorithms. In EBIMCS '19: Proceedings of the 2019 2nd International Conference on E-Business, Information Management and Computer Science. Association for Computing Machinery. https://doi.org/10.1145/3377817.3377836

*Ouyang, F., Zheng, L., & Jiao, P. (2022). Artificial intelligence in online higher education: A systematic review of empirical research from 2011 to 2020. Education and Information Technologies, 27(6), 7893–7925. https://doi.org/10.1007/s10639-022-10925-9

Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., & Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ (Clinical Research Ed.), 372, n71. https://doi.org/10.1136/bmj.n71

Pennington, R., Saadatzi, M. N., Welch, K. C., & Scott, R. (2014). Using robot-assisted instruction to teach students with intellectual disabilities to use personal narrative in text messages. Journal of Special Education Technology, 29(4), 49–58. https://doi.org/10.1177/016264341402900404

Peters, M. D. J., Marnie, C., Colquhoun, H., Garritty, C. M., Hempel, S., Horsley, T., Langlois, E. V., Lillie, E., O'Brien, K. K., Tunçalp, Ö., Wilson, M. G., Zarin, W., & Tricco, A. C. (2021). Scoping reviews: Reinforcing and advancing the methodology and application. Systematic Reviews, 10(1), 263. https://doi.org/10.1186/s13643-021-01821-3

Peters, M. D. J., Marnie, C., Tricco, A. C., Pollock, D., Munn, Z., Alexander, L., McInerney, P., Godfrey, C. M., & Khalil, H. (2020). Updated methodological guidance for the conduct of scoping reviews. JBI Evidence Synthesis, 18(10), 2119–2126. https://doi.org/10.11124/JBIES-20-00167

Petticrew, M., & Roberts, H. (2006). Systematic Reviews in the Social Sciences. Blackwell Publishing.


*Pinto, A. S., Abreu, A., Costa, E., & Paiva, J. (2023). How Machine Learning (ML) is Transforming Higher Education: A Systematic Literature Review. Journal of Information Systems Engineering and Management, 8(2), 21168. https://doi.org/10.55267/iadt.07.13227

Polanin, J. R., Maynard, B. R., & Dell, N. A. (2017). Overviews in Education Research. Review of Educational Research, 87(1), 172–203. https://doi.org/10.3102/0034654316631117

Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv. https://arxiv.org/abs/2205.01833

*Rabelo, A., Rodrigues, M. W., Nobre, C., Isotani, S., & Zárate, L. (2023). Educational data mining and learning analytics: A review of educational management in e-learning. Information Discovery and Delivery. https://doi.org/10.1108/idd-10-2022-0099

Rader, T., Mann, M., Stansfield, C., Cooper, C., & Sampson, M. (2014). Methods for documenting systematic review searches: A discussion of common issues. Research Synthesis Methods, 5(2), 98–115. https://doi.org/10.1002/jrsm.1097

*Rangel-de Lázaro, G., & Duart, J. M. (2023). You can handle, you can teach it: Systematic review on the use of extended reality and artificial intelligence technologies for online higher education. Sustainability, 15(4), 3507. https://doi.org/10.3390/su15043507

Reid, J. (1995). Managing learner support. In F. Lockwood (Ed.), Open and distance learning today (pp. 265–275). Routledge.

Rethlefsen, M. L., Kirtley, S., Waffenschmidt, S., Ayala, A. P., Moher, D., Page, M. J., & Koffel, J. B. (2021). PRISMA-S: An extension to the PRISMA Statement for Reporting Literature Searches in Systematic Reviews. Systematic Reviews, 10(1), 39. https://doi.org/10.1186/s13643-020-01542-z

Rios-Campos, C., Tejada-Castro, M. I., Del Viteri, J. C. L., Zambrano, E. O. G., Núñez, J. B., & Vara, F. E. O. (2023). Ethics of artificial intelligence. South Florida Journal of Development, 4(4), 1715–1729. https://doi.org/10.46932/sfjdv4n4-022

Robinson, K. A., Brunnhuber, K., Ciliska, D., Juhl, C. B., Christensen, R., & Lund, H. (2021). Evidence-based research series—paper 1: What evidence-based research is and why is it important? Journal of Clinical Epidemiology, 129, 151–157. https://doi.org/10.1016/j.jclinepi.2020.07.020

*Saghiri, M. A., Vakhnovetsky, J., & Nadershahi, N. (2022). Scoping review of artificial intelligence and immersive digital tools in dental education. Journal of Dental Education, 86(6), 736–750. https://doi.org/10.1002/jdd.12856

*Salas-Pilco, S., Xiao, K., & Hu, X. (2022). Artificial intelligence and learning analytics in teacher education: A systematic review. Education Sciences, 12(8), 569. https://doi.org/10.3390/educsci12080569

*Salas-Pilco, S. Z., & Yang, Y. (2022). Artificial intelligence applications in Latin American higher education: A systematic review. International Journal of Educational Technology in Higher Education. https://doi.org/10.1186/s41239-022-00326-w

*Sapci, A. H., & Sapci, H. A. (2020). Artificial Intelligence Education and Tools for Medical and Health Informatics Students: Systematic Review. JMIR Medical Education, 6(1), e19285. https://doi.org/10.2196/19285

*Sghir, N., Adadi, A., & Lahmer, M. (2022). Recent advances in Predictive Learning Analytics: A decade systematic review (2012–2022). Education and Information Technologies, 28, 8299–8333. https://doi.org/10.1007/s10639-022-11536-0

Shamseer, L., Moher, D., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., Shekelle, P., & Stewart, L. A. (2015). Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: Elaboration and explanation. BMJ (Clinical Research Ed.), 350, g7647. https://doi.org/10.1136/bmj.g7647

Shea, B. J., Reeves, B. C., Wells, G., Thuku, M., Hamel, C., Moran, J., Moher, D., Tugwell, P., Welch, V., Kristjansson, E., & Henry, D. A. (2017). AMSTAR 2: A critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ (Clinical Research Ed.), 358, j4008. https://doi.org/10.1136/bmj.j4008

Sikström, P., Valentini, C., Sivunen, A., & Kärkkäinen, T. (2022). How pedagogical agents communicate with students: A two-phase systematic review. Computers & Education, 188, 104564. https://doi.org/10.1016/j.compedu.2022.104564

Siontis, K. C., & Ioannidis, J. P. A. (2018). Replication, duplication, and waste in a quarter million systematic reviews and meta-analyses. Circulation: Cardiovascular Quality and Outcomes, 11(12), e005212. https://doi.org/10.1161/CIRCOUTCOMES.118.005212

*Sourani, M. (2019). Artificial Intelligence: A Prospective or Real Option for Education? Al Jinan الجنان, 11(1), 23. https://digitalcommons.aaru.edu.jo/aljinan/vol11/iss1/23

Stansfield, C., Stokes, G., & Thomas, J. (2022). Applying machine classifiers to update searches: Analysis from two case studies. Research Synthesis Methods, 13(1), 121–133. https://doi.org/10.1002/jrsm.1537

Stern, C., & Kleijnen, J. (2020). Language bias in systematic reviews: You only get out what you put in. JBI Evidence Synthesis, 18(9), 1818–1819. https://doi.org/10.11124/JBIES-20-00361

Sutton, A., Clowes, M., Preston, L., & Booth, A. (2019). Meeting the review family: Exploring review types and associated information retrieval requirements. Health Information and Libraries Journal, 36(3), 202–222. https://doi.org/10.1111/hir.12276

Tamim, R. M., Bernard, R. M., Borokhovski, E., Abrami, P. C., & Schmid, R. F. (2011). What forty years of research says about the impact of technology on learning. Review of Educational Research, 81(1), 4–28. https://doi.org/10.3102/0034654310393361

Thomas, J., Graziosi, S., Brunton, J., Ghouze, Z., O’Driscoll, P., Bond, M., & Koryakina, A. (2023). EPPI Reviewer: Advanced software for systematic reviews, maps and evidence synthesis [Computer software]. EPPI Centre Software. UCL Social Research Institute. London. https://eppi.ioe.ac.uk/cms/Default.aspx?alias=eppi.ioe.ac.uk/cms/er4

Tran, L., Tam, D. N. H., Elshafay, A., Dang, T., Hirayama, K., & Huy, N. T. (2021). Quality assessment tools used in systematic reviews of in vitro studies: A systematic review. BMC Medical Research Methodology, 21(1), 101. https://doi.org/10.1186/s12874-021-01295-w

Tricco, A. C., Lillie, E., Zarin, W., O'Brien, K. K., Colquhoun, H., Levac, D., Moher, D., Peters, M. D. J., Horsley, T., Weeks, L., Hempel, S., Akl, E. A., Chang, C., McGowan, J., Stewart, L., Hartling, L., Aldcroft, A., Wilson, M. G., Garritty, C., & Straus, S. E. (2018). PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Annals of Internal Medicine, 169(7), 467–473. https://doi.org/10.7326/M18-0850

Tsou, A. Y., Treadwell, J. R., Erinoff, E., et al. (2020). Machine learning for screening prioritization in systematic reviews: Comparative performance of Abstrackr and EPPI-Reviewer. Systematic Reviews, 9, 73. https://doi.org/10.1186/s13643-020-01324-7

*Ullrich, A., Vladova, G., Eigelshoven, F., & Renz, A. (2022). Data mining of scientific research on artificial intelligence in teaching and administration in higher education institutions: A bibliometrics analysis and recommendation for future research. Discover Artificial Intelligence. https://doi.org/10.1007/s44163-022-00031-7

*Urdaneta-Ponte, M. C., Mendez-Zorrilla, A., & Oleagordia-Ruiz, I. (2021). Recommendation Systems for Education: Systematic Review. Electronics, 10(14), 1611. https://doi.org/10.3390/electronics10141611

*Williamson, B., & Eynon, R. (2020). Historical threads, missing links, and future directions in AI in education. Learning, Media & Technology, 45(3), 223–235. https://doi.org/10.1080/17439884.2020.1798995

Woolf, B. P. (2010). Building intelligent interactive tutors: Student-centered strategies for revolutionizing e-learning. Morgan Kaufmann.

Wu, T., He, S., Liu, J., Sun, S., Liu, K., Han, Q. L., & Tang, Y. (2023). A brief overview of ChatGPT: The history, status quo and potential future development. IEEE/CAA Journal of Automatica Sinica, 10(5), 1122–1136.

Yu, L., & Yu, Z. (2023). Qualitative and quantitative analyses of artificial intelligence ethics in education using VOSviewer and CitNetExplorer. Frontiers in Psychology, 14, 1061778. https://doi.org/10.3389/fpsyg.2023.1061778

Zawacki-Richter, O. (2023). Umbrella Review in ODDE. Herbsttagung der Sektion Medienpädagogik (DGfE), 22 September 2023.

Zawacki-Richter, O., Kerres, M., Bedenlier, S., Bond, M., & Buntins, K. (Eds.). (2020). Systematic Reviews in Educational Research. Springer Fachmedien. https://doi.org/10.1007/978-3-658-27602-7

*Zawacki-Richter, O., Marín, V. I., Bond, M., & Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education—where are the educators? International Journal of Educational Technology in Higher Education. https://doi.org/10.1186/s41239-019-0171-0

*Zhai, C., & Wibowo, S. (2023). A systematic review on artificial intelligence dialogue systems for enhancing English as foreign language students’ interactional competence in the university. Computers and Education: Artificial Intelligence, 4, 100134. https://doi.org/10.1016/j.caeai.2023.100134

Zhang, Q., & Neitzel, A. (2023). Choosing the Right Tool for the Job: Screening Tools for Systematic Reviews in Education. Journal of Research on Educational Effectiveness. https://doi.org/10.1080/19345747.2023.2209079

*Zhang, W., Cai, M., Lee, H. J., Evans, R., Zhu, C., & Ming, C. (2023). AI in Medical Education: Global situation, effects and challenges. Education and Information Technologies. https://doi.org/10.1007/s10639-023-12009-8

Zheng, Q., Xu, J., Gao, Y., Liu, M., Cheng, L., Xiong, L., Cheng, J., Yuan, M., OuYang, G., Huang, H., Wu, J., Zhang, J., & Tian, J. (2022). Past, present and future of living systematic review: A bibliometrics analysis. BMJ Global Health. https://doi.org/10.1136/bmjgh-2022-009378

*Zhong, L. (2022). A systematic review of personalized learning in higher education: Learning content structure, learning materials sequence, and learning readiness support. Interactive Learning Environments. https://doi.org/10.1080/10494820.2022.2061006

*Zulkifli, F., Mohamed, Z., & Azmee, N. A. (2019). Systematic research on predictive models on students’ academic performance in higher education. International Journal of Recent Technology and Engineering, 8(23), 357–363. https://doi.org/10.35940/ijrte.B1061.0782S319


This research has not received any funding.

Author information

Authors and affiliations

EPPI Centre, University College London, London, UK

Melissa Bond

Knowledge Center for Education, University of Stavanger, Stavanger, Norway

National Institute of Teaching, London, UK

Melissa Bond, Violeta Negrea, Emily Oxley & Sin Wang Chong

Institute for Teaching and Learning Innovation, The University of Queensland, St Lucia, Australia

Hassan Khosravi

Centre for Change and Complexity in Learning, Education Futures, University of South Australia, Adelaide, Australia

Maarten De Laat, Phuong Pham & George Siemens

Halmstad University, Halmstad, Sweden

Nina Bergdahl

Stockholm University, Stockholm, Sweden

International Education Institute, University of St Andrews, St Andrews, UK

Sin Wang Chong


Contributions

MB, HK, MDL, PP and GS all contributed to the initial development of the review and were involved in the searching and screening stages. All authors except GS were involved in data extraction. MB, HK, MDL, NB, VN, EO, and GS synthesised the results and wrote the article, with editing suggestions also provided by PP and SWC.

Corresponding author

Correspondence to Melissa Bond .

Ethics declarations

Competing interests.

There are no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Appendix A. List of studies in the corpus by thematic focus.

Additional file 2:

Appendix B. Types of evidence synthesis published in AIEd higher education.

Additional file 3:

Appendix C. Journals and conference proceedings.

Additional file 4:

Appendix D. Top 7 journals by evidence synthesis types.

Additional file 5:

Appendix E. Institutional affiliations.

Additional file 6:

Appendix F. Author disciplinary affiliation by evidence synthesis types.

Additional file 7:

Appendix G. Geographical distribution of authors.

Additional file 8:

Appendix H. Geographical distribution by evidence synthesis type.

Additional file 9:

Appendix I. Co-authorship and international research collaboration.

Additional file 10:

Appendix J. Digital evidence synthesis tools (DEST) used in AIHEd secondary research.

Additional file 11:

Appendix K. Quality assessment.

Additional file 12:

Appendix L. Benefits and Challenges identified in ‘General AIEd’ reviews.

Additional file 13:

Appendix M. Research Gaps.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Bond, M., Khosravi, H., De Laat, M. et al. A meta systematic review of artificial intelligence in higher education: a call for increased ethics, collaboration, and rigour. Int J Educ Technol High Educ 21, 4 (2024). https://doi.org/10.1186/s41239-023-00436-z


Received : 04 October 2023

Accepted : 13 December 2023

Published : 19 January 2024

DOI : https://doi.org/10.1186/s41239-023-00436-z


  • Artificial Intelligence
  • Evidence synthesis
  • Tertiary review
  • Research methods
  • Quality assessment
  • Intelligent tutoring systems
  • Adaptive systems
  • Personalisation
  • Automatic assessment

AI in Education: A Systematic Literature Review


Introduction

The use of technology in education dates back to the emergence of first-generation computers and their subsequent updated versions (Schindler et al., 2017). Teachers used computers for teaching, research, and recording students' grades, among other tasks; students, in turn, used them for studying, research, and problem solving. Computers have also served as an educational resource in their own right (analogous to a library or laboratory), as well as a means of maintaining databases of student information (Jones, 1985). The use of technology in education has advanced considerably with the emergence of artificial intelligence (AI), in which machines are designed to mimic humans. Artificial intelligence is "the science and engineering of making intelligent machines", or "a machine that behaves in a way that could be considered intelligent if it was a human being" (McCarthy, 2007).

The term artificial intelligence (AI) was coined by John McCarthy for the Dartmouth artificial intelligence conference in 1956, where leading researchers from different disciplines converged to discuss topics such as the abstraction of content from sensory inputs and the relationship of randomness to creative thinking, developing the concept of "thinking machines". Most participants envisaged that computers could eventually mimic human intelligence; their biggest question was how and when it would happen. Today, AI is developing and spreading across the world at a remarkable rate (Tegmark, 2015) and plays an increasingly important role in our daily lives. As AI and machine learning catch on with many people, their use in devices, applications, and services is becoming widespread (Zawacki-Richter et al., 2019). Examples of AI applications and services include Google Duplex, a conversational agent that can carry out specific verbal tasks over the phone, such as making a reservation or appointment, and FaceApp, which uses AI to identify people tagged in other photos on Facebook. Other intelligent appliances, such as autonomous vacuum cleaners, are further examples. As indicated earlier, the role of AI in education cannot be overemphasized: Yuki and Sophia, both humanoid robots, are examples of AI applications in education (Retto, 2017).

AI is broadly categorized into two domains: weak (or domain-specific) AI, which focuses on specific problems, and strong (or general) AI, which has the ability to perform general intelligent actions (Berker, 2018). Stephen Hawking and other researchers have warned that strong AI may lead to chaos and the destruction of mankind, while other AI researchers have suggested that the emergence of AI in education might displace teachers. In the context of this paper, we refer to AI as weak AI, since machines have not yet acquired the capability to perform general intelligent actions.

Studies, conducted mainly in developed countries, have concentrated on the challenges of AI's disruption of education, while the opportunities and benefits of AI in education have received comparatively little attention. This study is one of the few to provide an integrated overview of the opportunities, benefits, and challenges that artificial intelligence (AI) adoption presents to education, complemented by the Technological-Organizational-Environmental (TOE) theoretical framework as a lens for discussing the challenges of AI adoption in education.

The objective of the study is to analyze the state of the art of AI technology in education by investigating the challenges, opportunities, and benefits of adopting AI in education. The study reviews relevant literature to understand the current research focus and to provide an in-depth understanding of AI technology in education that can guide educators and researchers in designing new educational models. The study will also serve as a reference for future research on related work.

This paper is structured as follows: Section 1 presents the introduction and background to the study. Section 2 reviews the state of the art on the types of AI systems in education, the challenges, opportunities, and benefits of AI in education, and the TOE theoretical framework. Section 3 presents the research methodology for the literature review. Section 4 discusses the opportunities, benefits, and challenges of AI adoption identified in the literature, together with the practical implications of the findings. Finally, Section 5 concludes with future research topics and the limitations of the research.


  • Open access
  • Published: 19 April 2023

AI literacy in K-12: a systematic literature review

  • Lorena Casal-Otero, ORCID: orcid.org/0000-0002-0906-4321
  • Alejandro Catala, ORCID: orcid.org/0000-0002-3677-672X
  • Carmen Fernández-Morante, ORCID: orcid.org/0000-0003-4398-3361
  • Maria Taboada, ORCID: orcid.org/0000-0002-2353-596X
  • Beatriz Cebreiro, ORCID: orcid.org/0000-0003-2064-915X
  • Senén Barro, ORCID: orcid.org/0000-0001-6035-540X

International Journal of STEM Education, volume 10, Article number: 29 (2023)


The successful irruption of AI-based technology in our daily lives has led to a growing educational, social, and political interest in training citizens in AI. Education systems now need to train students at the K-12 level to live in a society where they must interact with AI. Thus, AI literacy is a pedagogical and cognitive challenge at the K-12 level. This study aimed to understand how AI is being integrated into K-12 education worldwide. We conducted a search process following the systematic literature review method using Scopus. 179 documents were reviewed, and two broad groups of AI literacy approaches were identified, namely learning experience and theoretical perspective. The first group covered experiences in learning technical, conceptual and applied skills in a particular domain of interest. The second group revealed that significant efforts are being made to design models that frame AI literacy proposals. There were hardly any experiences that assessed whether students understood AI concepts after the learning experience. Little attention has been paid to the undesirable consequences of an indiscriminate and insufficiently thought-out application of AI. A competency framework is required to guide the didactic proposals designed by educational institutions and define a curriculum reflecting the sequence and academic continuity, which should be modular, personalized and adjusted to the conditions of the schools. Finally, AI literacy can be leveraged to enhance the learning of disciplinary core subjects by integrating AI into the teaching process of those subjects, provided the curriculum is co-designed with teachers.

Introduction

In recent years, the convergence of huge computing power, massive amounts of data and improved machine learning algorithms has led to remarkable advances in Artificial Intelligence (AI) based technologies, which are set to be the most socially and economically disruptive technologies ever developed (Russell, 2021). The irruption of AI-based technology in our daily lives (e.g., robot vacuum cleaners, real-time location and search systems, virtual assistants, etc.) has generated a growing social and political interest in educating citizens about AI. The scientific community has also begun to engage in this education after detecting a significant gap in the understanding of AI, based on comments and fears expressed by citizens about this technology (West & Allen, 2018). Therefore, integrating AI into curricula is necessary to train citizens who must increasingly live and act in a world with a significant presence of AI.

It is worth noting that AI education addresses not only the learning of the scientific and technological foundations of AI, but also the knowledge and critical reflection on how a trustworthy AI should be developed and the consequences of not doing so. Hence, it is crucial to incorporate AI teaching from the earliest stages of education (Heintz, 2021). However, although some countries are making significant efforts to promote AI teaching in K-12 (Touretzky et al., 2019a), this is being implemented through highly varied AI training experiences, such as data-driven design (Vartiainen et al., 2021), interactive data visualizations (Chittora & Baynes, 2020; von Wangenheim et al., 2021), virtual reality and robotics (Narahara & Kobayashi, 2018), games (Giannakos et al., 2020), or even based on combined workshop series (Lee et al., 2021). To date, there are very few methodological proposals on how to introduce the AI curriculum in K-12 education (Lee et al., 2020).

Since the development of a field requires prior research, we propose in this paper to identify and examine the way in which AI literacy is developing in K-12 around the world, to draw conclusions and guide teaching proposals for AI literacy in K-12. By highlighting and discussing the pros and cons of the different approaches and experiences in the literature, we aim to inspire new initiatives and guide the actors involved, from decision-makers, for example in education policy, to teachers involved in their conception, design and implementation. We also hope to raise awareness of the importance of learning about AI from an early age, emphasizing the key aspects of this training and, hopefully, fueling the debate that needs to be fostered in our research community.

Integration of AI into the K-12 curriculum

As a scientific-technological field, AI is just a few decades old. The name was coined in 1956, and since then different disciplines (such as computer science, mathematics, philosophy, neuroscience, or psychology) have contributed to its development from an interdisciplinary focus. AI is oriented to comprehend, model, and replicate human intelligence and cognitive processes into artificial systems. Currently, it covers a wide range of subfields such as machine learning, perception, natural language processing, knowledge representation and reasoning, computer vision, among many others (Russell & Norvig, 2021).

Starting in the 1970s, AI began to emerge in educational contexts through tools specifically designed to support learning, teaching, and the management of educational institutions. Since many jobs are now AI-related and will continue to increase in the coming years, some researchers believe that AI education should be considered as important as literacy in reading and writing (Kandlhofer et al., 2016). Its highly interdisciplinary character is another factor to consider. AI literacy can be defined as a set of skills that enable a solid understanding of AI through three priority axes: learning about AI, learning about how AI works, and learning for life with AI (Long & Magerko, 2020; Miao et al., 2021). The first axis focuses on understanding AI concepts and techniques to enable the recognition of which artifacts/platforms use AI and which do not. The second axis addresses the understanding of how AI works, to effectively interact with it. The third axis seeks to understand how AI can affect our lives, allowing us to critically evaluate its technology. Thus, AI literacy goes beyond the use of AI applications in education, such as Intelligent Tutoring Systems (ITS) (du Boulay, 2016).

The teaching of knowledge in AI has traditionally been carried out at the university level, focused on students who study disciplines closely related to computing and ICT in general. In recent years, AI learning has also started to be relevant both in university programs with diverse study backgrounds (Kong et al., 2021), as well as at the K-12 level (Kandlhofer & Steinbauer, 2021; Tedre et al., 2021). However, teaching AI at the K-12 level is not yet prevalent in formal settings and is considered challenging. Experts believe it is important to have some thought on what AI education should look like at the K-12 level so that future generations can become informed citizens who understand the technologies they interact with in their daily lives (Touretzky et al., 2019a). Training children and teenagers will allow them to understand the basics of the science and technology that underpins AI, its possibilities, its limits and its potential social and economic impact. It also stimulates and better prepares them to pursue further studies related to AI or even to become creators and developers of AI themselves (Heintz, 2021).

Nowadays, research on AI teaching is still scarce (Chai et al., 2020a, 2020b; Lee et al., 2020). The acquisition of knowledge in AI represents a great pedagogical challenge for both experts and teachers, and a cognitive challenge for students (Micheuz, 2020). Some countries are making significant efforts to promote AI education in K-12 (Touretzky et al., 2019b), by developing relatively comprehensive curriculum guidelines (Yue et al., 2021). Through interviews with practitioners and policy makers from three different continents (America, Asia and Europe), some studies report on continuing works to introduce AI in K-12 education (He et al., 2020). Some other work focuses on examining and comparing AI curricula in several countries (Yue et al., 2021). In addition, there are a growing number of AI training experiences that explore pathways to optimize AI learning for K-12 students. However, most of them are somehow thematically limited as they do not adequately address key areas of AI, such as planning, knowledge representation and automated reasoning (Nisheva-Pavlova, 2021). Additionally, due to the rapid growth of AI, there is a need to understand how educators can best leverage AI techniques for the academic success of their students. Zhai et al. (2021) recommend that educators work together with AI experts to bridge the gap between technique and pedagogy.

Using a systematic review method, our research aims to present an overview of current approaches to understand how AI is taught worldwide. Several studies have conducted systematic reviews concerning applications of AI in education. Zhai et al. (2021) analyzed how AI was applied to the education domain from 2010 to 2020. Their review covers research on AI-based learning environments, from their construction to their application and integration in the educational environment. Guan et al. (2020) reviewed the main themes and trends in AI research in education over the past two decades. The authors found that research on the use of AI techniques to support teaching or learning has stood the test of time and that learner profiling models and learning analytics have proliferated in the last two decades. Ng et al. (2022) examined learner types, teaching tools and pedagogical approaches in AI teaching and learning, mainly in university computer science education. Chen et al. (2020) covered education enhanced by AI techniques aimed to back up teaching and learning. All these studies have focused on the main role that AI has played in educational applications over the last decades. However, in light of the recent need to consider how AI education should be approached at the K-12 level (Kandlhofer et al., 2016; Long & Magerko, 2020; Miao et al., 2021; Touretzky et al., 2019b), it would be of great value to structure and characterize the different approaches used so far to develop AI literacy in K-12, as well as to identify research gaps to be explored. Recently, Yue et al. (2022) analyzed the main components of the pedagogical design in 32 empirical studies in K-12 AI education and Su et al. (2022) examined 14 learning experiences carried out in the Asian-Pacific region. These components included target audience, setting, duration, contents, pedagogical approaches to teaching, and assessment methods. Sanusi et al. (2022) reviewed research on teaching machine learning in K-12 from four perspectives: curriculum development, technology development, pedagogical development, and teacher training development. The findings of the study revealed that more studies are needed on how to integrate machine learning into subjects other than computer science. Crompton et al. (2022) carried out a systematic review on the use of AI as a supporting tool in K-12 teaching, which entails an interesting but narrower scope. Our study extends previous reviews on K-12 AI research by emphasizing how the current approaches are integrating AI literacy in K-12 education worldwide.

Research question

To begin the systematic review, a single research question (RQ) was formulated.

RQ: How are current approaches integrating AI literacy into K-12 education worldwide?

In essence, the RQ aims to investigate the characterization of the different approaches being employed to incorporate AI education in K-12. The following subsections in the methodology describe the search and the data collection process in such a way that an answer to the RQ can be provided in a replicable and objective fashion.

The research method chosen for this study was the systematic literature review (SLR), following the guidelines proposed by Kitchenham (2004). Accordingly, the following subsections summarize and document the key steps of this method.

Search process

We used Scopus to implement the search process. Scopus provides an integrated search facility to find relevant papers in its database based on curated metadata. It includes primary bibliographic sources published by Elsevier, Springer, ACM, and IEEE, among others, and provides comprehensive coverage of journals and top-ranked conferences within the fields of interest. We did not limit our search to specific journals or conference proceedings, as there is not yet a clearly established body of literature on the subject. All searches were performed on title, keywords and abstract, and were conducted between 21 October 2021 and 9 March 2023.

To define the search string, we ran an initial search and found only a few papers focused on 'literacy', whereas the vast majority referred to the broader term 'education'. We therefore decided to use both search terms (key issue 1 in Table 1). As some recent works combine the terms 'Artificial Intelligence' and 'education'/'literacy' into single terms such as 'AI literacy' or 'AI education', these were added to the search string (key issue 2 in Table 1). The educational stage was also included in the search string (key issue 3 in Table 1). As the search term 'education' also returns AI-based learning environments, which are outside the scope of our review, we explicitly added negated terms to exclude both computer-based learning and intelligent tutoring systems (key issue 4 in Table 1). A final decision was whether to use the term 'Artificial Intelligence' as a single umbrella term or to add narrower terms for AI subfields (e.g., machine learning). After a preliminary inspection of a few relevant papers, we observed that such specific terms usually co-occur with 'Artificial Intelligence' in education papers, so they were regarded as unnecessary. Thus, to capture the essence of our RQ, we built the complete search string from the search terms shown in Table 1. This resulted in the following complete search string in Scopus:

TITLE-ABS-KEY((((literacy OR education) AND (artificial AND intelligence)) OR ("AI literacy" OR "AI education")) AND ("primary school" OR "secondary school" OR k-12 OR "middle school" OR "high school") AND NOT ("computer-based learning") AND NOT ("intelligent tutoring system"))

We included peer-reviewed papers published on topics related to literacy and education on AI at school. We then excluded papers whose usage of AI was limited to (1) supporting computer-based learning only, with no focus on learning about AI, or (2) supporting AI-based assessment or tutoring. We also excluded papers that targeted college students and those limited to K-12 programming/CS concepts as a prerequisite for learning about AI in the future. Following these inclusion and exclusion criteria, our search in Scopus returned an initial list of 750 documents. After screening titles, abstracts, keywords and full texts, we obtained a final list of 179 documents.

Data collection, extraction and synthesis strategy

Data collection and extraction were performed, discussed, and coordinated through regular meetings. After inspecting and discussing 10% of the papers over multiple meetings, the authors agreed on the annotations presented in Table 2. This process was important as it allowed us to build a data annotation scheme that emerged empirically from the sampled papers. A copy of each paper was also kept for easy review in case of doubts or disagreements.

The data extraction resulted in a spreadsheet with the metadata of the papers that passed the inclusion and exclusion criteria, and a document with the list of paper IDs together with the rest of the annotations. Python scripts were used to further process the metadata (e.g., counting participating countries and computing frequencies) and to produce a more complete bibliographic report with histograms and overview counts. A more qualitative analysis, based on paper reading and annotations, was carried out to answer the research question.
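As a rough illustration of the kind of metadata processing described above (the authors' actual scripts are not published, and the records and field names below are hypothetical), the per-country participation counts can be sketched in a few lines of Python:

```python
from collections import Counter

# Hypothetical Scopus-style records: one entry per paper, with the
# list of countries of the authors' affiliations.
papers = [
    {"id": "P001", "countries": ["United States"]},
    {"id": "P002", "countries": ["Hong Kong", "China"]},
    {"id": "P003", "countries": ["Finland"]},
    {"id": "P004", "countries": ["United States", "Hong Kong"]},
]

# Count how many papers each country participates in. A country is
# counted at most once per paper, even if several authors share it.
participation = Counter()
for paper in papers:
    participation.update(set(paper["countries"]))

print(participation.most_common())
```

A histogram such as Fig. 5 is then just a bar plot of these counts.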

The results were organized into two subsections. The first subsection is a bibliometric analysis of the reviewed studies, which is based on the metadata provided by Scopus. The second subsection provides a qualitative analysis of the studies, which is based on the extracted data annotations (see Table 2 ). Both analyses are complementary and together deliver a better understanding of the research articles retrieved.

Bibliometric analysis

Figure  1 shows that the annual scientific production has been modest. It gained traction in 2016 and increased sharply in 2020.

figure 1

Annual scientific production: number of papers by year

Most of the contributions are conference publications (126 papers), while 52 are journal articles and one is a book chapter (Fig.  2 ).

figure 2

Type of contributions: number of papers by type

Eighty out of the 179 papers have at least one citation in Scopus. Thirteen papers have 10 or more citations, and the most cited papers are Long and Magerko (2020) and Touretzky et al. (2019b). Figure 3 summarizes the number of contributions by publisher, where Springer, IEEE and ACM stand out, followed by Elsevier. As for journals, no single journal concentrates the publication of articles. Nevertheless, some journals are especially relevant and well known in the community, such as the International Journal of Child-Computer Interaction, Computers and Education: Artificial Intelligence, the International Journal of Artificial Intelligence in Education, and IEEE Transactions on Education.

figure 3

Frequency of publishers: number of papers by publisher

As for conferences, Fig. 4 summarizes the main conference events where papers have been published. It includes flagship conferences such as CHI and AAAI, top-ranked conferences such as HRI or SIGCSE, and several noteworthy events (IDC, ICALT, ITiCSE, VL/HCC, to name a few). It is worth mentioning that AAAI has received contributions in recent years, which confirms the field's interest in broadening the discussion to education. There are also additional publications associated with satellite AAAI events, such as workshops published in CEUR-WS, that deal with the issue under study. Although such contributions may sometimes be short, we decided to include them when relevant. For instance, the works published in Herrero et al. (2020) and Micheuz (2020) include the German countrywide proposal for educating about AI through a 6-module course focused on explaining how AI works, fostering the social discourse on AI and reducing existing misconceptions. Aguar et al. (2016), on the other hand, describe teaching AI through an optional course that does not count toward final grades.

figure 4

Main conference events: number of papers by conference

The analysis did not reveal particularly outstanding institutions (see Table 3 for a summary). Among the 299 affiliated institutions, we mostly find universities and research centers, along with a few collaborative associations. The most active institutions are the Chinese University of Hong Kong, the University of Eastern Finland and MIT, whose authors participated in 19, 11 and 10 contributions, respectively.

Finally, the retrieved papers were co-authored by 643 different authors affiliated with research institutions from 42 countries. Figure 5 shows the histogram of participation by country. Of the 179 papers reviewed, most were written by authors affiliated with institutions in the same country; only 32 papers involved authors from several countries. Remarkably, in these cases at least one author is from the US, Hong Kong or China.

figure 5

Country participation: number of papers by country

Literature analysis

By analyzing the data extracted, the papers were classified into two broad thematic categories according to the type of educational approach, namely, learning experience and theoretical perspective. The first category covers AI learning experiences focused on understanding a particular AI concept/technique or using specific tools/platforms to illustrate some AI concepts. The second category involves initiatives for the implementation of AI education for K-12 through the development of guidelines, curriculum design or teacher training, among others. Each main category was further subdivided into other subcategories to structure the field and characterize the different approaches used in developing AI literacy in K-12. Figure  6 shows all the identified categories and subcategories.

figure 6

Taxonomy of approaches to AI learning in K-12

Learning experiences focused on understanding AI

This category covers learning experiences aimed at experimenting with and becoming familiar with AI concepts and techniques. Based on the priority axes in AI literacy (Long & Magerko, 2020; Miao et al., 2021), we identified experiences aimed at acquiring basic AI knowledge to recognize artifacts that use AI, learning how AI works, learning tools for AI, and learning to live with AI.

Learning to recognize artifacts using AI

This subcategory refers to experiences aimed at understanding the AI concepts and techniques that make it possible to recognize which artifacts/platforms use AI and which do not. Four studies were found in this subcategory. They are proposals aimed at helping young people understand and demystify AI through different types of activities, including discussions after watching AI-related movies (Tims et al., 2012), computer-based simulations of human-like behaviors (Ho et al., 2019), experimenting as active users of social robots (Gonzalez et al., 2017) and programming AI-based conversational agents (Van Brummelen et al., 2021b).

Learning about how AI works

This topic covers proposals designed to understand how AI works in order to make user interaction with AI easier and more effective. In this type of proposal the focus is on methodology, and learning is achieved through technology (Kim et al., 2023). The objective is to provide a better understanding of a particular aspect of reality in order to carry out a project or solve a problem (Lenoir & Hasni, 2016). The activities are supported by active experiences based on building and creating intelligent devices to achieve an understanding of AI concepts, following Papert's idea of constructionism.

These experiences are mainly focused on teaching AI subfields such as ML or AI algorithms applied to robotics. Understanding the principles of ML, its workflows and its role in everyday practices to solve real-life problems has been the main objective of some studies (Burgsteiner et al., 2016 ; Evangelista et al., 2019 ; Lee et al., 2020 ; Sakulkueakulsuk et al., 2019 ; Vartiainen et al., 2021 ). In addition, there are also experiences focused on unplugged activities that simulate AI algorithms. For example, through classic games such as Mystery Hunt, one can learn how to traverse a graph without being able to see beyond the next path to be traversed (blind search) (Kandlhofer et al., 2016 ). Similarly, the AI4K12 initiative (Touretzky et al., 2019b ) collects a large set of activities and resources to simulate AI algorithms.
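The "blind search" idea behind such unplugged games can also be shown directly in code. The sketch below is illustrative only (the maze and names are invented, not taken from any reviewed curriculum): it explores a graph one step at a time, exactly like a player who cannot see beyond the next path:

```python
from collections import deque

# Illustrative maze: each room lists the rooms reachable from it.
maze = {
    "entrance": ["hall", "cellar"],
    "hall": ["library", "kitchen"],
    "cellar": ["tunnel"],
    "library": [],
    "kitchen": ["treasure"],
    "tunnel": [],
    "treasure": [],
}

def blind_search(start, goal):
    """Breadth-first (blind) search: only the next unexplored room is visible."""
    frontier = deque([[start]])  # paths waiting to be extended
    visited = {start}
    while frontier:
        path = frontier.popleft()
        room = path[-1]
        if room == goal:
            return path
        for nxt in maze[room]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None  # goal unreachable

print(blind_search("entrance", "treasure"))
```

In an unplugged classroom version, students act out the frontier queue by physically lining up the paths to try next.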

Learning tools for AI

This topic includes approaches that involve learning about AI support tools. The development of intelligent devices in the context of teaching AI requires specific programming languages or age-appropriate tools. Many of the tools currently available are focused on ML, with the aim of demystifying this learning in K-12 education (Wan et al., 2020 ). Some of them are integrated into block-based programming languages (such as Scratch or App Inventor) (Toivonen et al., 2020 ; von Wangenheim et al., 2021 ), enabling the deployment of the ML models built into games or mobile applications. Other approaches use data visualization and concepts of gamification to engage the student in the learning process (Reyes et al., 2020 ; Wan et al., 2020 ) or combine traditional programming activities with ML model building (Rodríguez-García et al., 2020 ).

This type of proposal aims to introduce AI through tools that enable the use of AI techniques. It is therefore an approach focused on learning by using AI-oriented tools. In this vein, different experiences have focused on learning programming tools for applications based on machine learning (Reyes et al., 2020; Toivonen et al., 2020; von Wangenheim et al., 2021; Wan et al., 2020), robotics (Chen et al., 2017; Eguchi, 2021; Eguchi & Okada, 2020; Holowka, 2020; Narahara & Kobayashi, 2018; Nurbekova et al., 2018; Verner et al., 2021), and programming and application creation (Chittora & Baynes, 2020; Giannakos et al., 2020; Kahn et al., 2018; Kelly et al., 2008; Park et al., 2021). Some of these tools use Scratch-based coding platforms to make AI-based programming attractive to children. In Kahn et al. (2018), students play around with machine learning to classify self-captured images using a block-based coding platform.

There are also experiences in which other types of environments are used to facilitate learning (Aung et al., 2022 ). In (Holowka, 2020 ; Verner et al., 2021 ), students can learn reinforcement learning through online simulation. In (Narahara & Kobayashi, 2018 ), a virtual environment helps students generate data in a playful setting, which is then used to train a neural network for the autonomous driving of a toy car-lab. In (Avanzato, 2009 ; Croxell et al., 2007 ), students experiment with different AI-based tasks through robotics-oriented competitions.

Learning for life with AI

This subcategory covers experiences aimed at understanding how AI can affect our lives, thus providing us with the skills to critically assess the technology. In Vachovsky et al. (2016), technically rigorous AI concepts are contextualized through their impact on society. There are also experiences in which students explore how a robot equipped with AI components can be used in society (Eguchi & Okada, 2018), program conversational agents (Van Brummelen et al., 2021b), or learn to recognize credible but fake media products (videos, photos) generated using AI-based techniques (Ali et al., 2021a, 2021b).

The ethical and philosophical implications of AI have also been addressed in some experiences (Ali et al., 2021a, 2021b; Ellis et al., 2005), whereas others focus on training students to participate in present-day society and become critical consumers of AI (Alexandre et al., 2021; Cummings et al., 2021; Díaz et al., 2015; Kaspersen et al., 2022; Lee et al., 2021; Vartiainen et al., 2020).

Proposals for implementation of AI learning at the K-12 level

Some countries are making efforts to promote AI education in K-12. In the U.S., intense work is being carried out on the integration of AI in schools, and among these schemes AI4K12 stands out (Heintz, 2021). This scheme is especially interesting since it defines national guidelines for future curricula, highlighting the essential collaborative work between developers, teachers and students (Touretzky et al., 2019a). This idea of co-creation is also stressed in other schemes (Chiu, 2021). In the U.S. we can also mention the proposal by the Massachusetts Institute of Technology, an AI curriculum that aims to engage students with its social and ethical implications (Touretzky et al., 2019a). Although the United States is working intensively on integrating this knowledge into the curriculum, AI is so far not widely offered in most K-12 schools (Heintz, 2021).

In China, the Ministry of Education has integrated AI into the compulsory secondary school curriculum (Ottenbreit-Leftwich et al., 2021; Xiao & Song, 2021). Among the Chinese schemes we can mention the AI4Future initiative of the Chinese University of Hong Kong (CUHK), which promotes a co-creation process to implement AI education (Chiu et al., 2021). In Singapore, a program for AI learning in schools has also been developed, in which K-12 children learn AI interactively; however, the program is hindered by a lack of adequately trained teachers (Heintz, 2021). In Germany, there are several initiatives to pilot AI-related projects and studies (Micheuz, 2020), including the launch of a national initiative to teach a holistic view of AI. This initiative consists of a 6-module course aimed at explaining how AI works, stimulating a social discourse on AI and clarifying the abundant existing misconceptions (Micheuz, 2020). Canada has also designed an AI course for high schools, intended to empower students with knowledge about AI and covering both its philosophical and conceptual underpinnings and its practical aspects, the latter achieved by building AI projects that solve real-life problems (Nisheva-Pavlova, 2021).

The literature also highlights the different approaches that AI literacy should focus on: curriculum design, AI subject design, student perspective, teacher training, resource design and gender diversity. All these approaches are described in depth below.

AI literacy curriculum design

Approaches to curriculum development differ widely, ranging from the product-centered model (technical-scientific perspective) to the process-centered model (learner perspective) (Yue et al., 2021). AI literacy can be introduced in primary and secondary education depending on the age and computer literacy of the students. To do this, it is necessary to define the core competencies for AI literacy along three dimensions: AI concepts, AI applications, and AI ethics and security (Long & Magerko, 2020; Wong et al., 2020). Research has focused on the understanding of concepts, the functional roles of AI, and the development of problem-solving skills (Woo et al., 2020). This has led to proposals to redefine the curriculum (Han et al., 2019; Malach & Vicherková, 2020; Zhang et al., 2020), supported by different ideas about what K-12 students should know (Chiu et al., 2021; Sabuncuoglu, 2020; Touretzky et al., 2019b). Several countries have already made curricular proposals (Alexandre et al., 2021; Micheuz, 2020; Nisheva-Pavlova, 2021; Ottenbreit-Leftwich et al., 2021; Touretzky et al., 2019b; Xiao & Song, 2021), arguing that curricular design must include elements such as content, product, process and praxis (Chiu, 2021). It is also advisable for AI learning to follow the computational thinking model (Shin, 2021), contextualizing the proposed curriculum (Eguchi et al., 2021; Wang et al., 2020) and providing it with the necessary resources for teachers (Eguchi et al., 2021). In this sense, emerging initiatives highlight the need to involve teachers in co-creating a curriculum associated with their context (Barlex et al., 2020; Chiu et al., 2021; Dai et al., 2023; Lin & Brummelen, 2021; Yau et al., 2022).

AI as a subject in K-12 education

Traditionally, computer science and new technologies have been included in the educational system either through a specific subject integrated into the curriculum or through extracurricular activities. In this sense, different proposals have suggested integrating AI as a subject in K-12 education (Ellis et al., 2009; Knijnenburg et al., 2021; Micheuz, 2020; Sperling & Lickerman, 2012), in short-term courses (around 15 h) divided into learning modules focused on classical and modern AI (Wong, 2020), or through MOOCs (Alexandre et al., 2021).

Student perspective on AI Literacy

Student-focused studies explore and analyze students' attitudes and prior knowledge in order to make didactic proposals adapted to the learner. Some measure students' intention and interest in learning AI (Bollin et al., 2020; Chai et al., 2020a, 2020b, 2021; Gao & Wang, 2019; Harris et al., 2004; Sing et al., 2022; Suh & Ahn, 2022), whereas others discuss their views on the integration of technologies in the education system (Sorensen & Koefoed, 2018) and on teaching–learning support tools in AI (Holstein et al., 2019).

Teacher training in AI

Teachers are key players in the integration of AI literacy in K-12, as shown by the numerous studies that examine this issue (An et al., 2022; Bai & Yang, 2019; Chiu & Chai, 2020; Chiu et al., 2021; Chounta et al., 2021; Judd, 2020; Kandlhofer et al., 2019, 2021; Kim et al., 2021; Korenova, 2016; Lin et al., 2022; Lindner & Berges, 2020; Oh, 2020; Summers et al., 1995; Wei et al., 2020; Wu et al., 2020; Xia & Zheng, 2020). This approach places teachers at the center, bearing in mind what they need to know in order to integrate AI into K-12 (Itmazi & Khlaif, 2022; Kim et al., 2021). The literature analyzed reports on the factors that influence the knowledge of novice teachers (Wei, 2021) and focuses on teacher training in AI (Lindner & Berges, 2020; Olari & Romeike, 2021). AI training proposals can thus be found aimed both at pre-service teachers (Xia & Zheng, 2020) and at practicing educators. Training schemes focus on teachers' technological knowledge to facilitate their professional development (Wei et al., 2020) through the TPACK (Technological, Pedagogical and Content Knowledge) model (Gutiérrez-Fallas & Henriques, 2020). Studies focusing on teachers' opinions on curriculum development in AI are relevant (Chiu & Chai, 2020), as are their self-efficacy in relation to ICT (Wu et al., 2020), their opinions on the tools that support the teaching–learning process in AI (Holstein et al., 2019) and their training in technologies (Cheung et al., 2018; Jaskie et al., 2021). These elements are central to the design of an AI literacy strategy in K-12. Both the co-design of ML curricula between AI researchers and K-12 teachers and the assessment of the impact of these educational interventions are important issues today.
At present, there is a shortage of teachers trained in AI, and working with pre-service teachers (Xia & Zheng, 2020) or with teachers in schools (Chiu et al., 2021) is proposed as an effective solution. One of the most interesting analyses of teacher competency proposes acquiring this skill for teaching AI in K-12 through the analysis of AI curricula and resources using TPACK. This model, formulated by Mishra and Koehler (2006), aims to define the different types of knowledge that teachers need in order to integrate ICT effectively in the classroom. In this regard, it is suggested that teachers introducing AI to K-12 students require TPACK to build a suitable environment and facilitate project-based classes that solve problems using AI technologies (Kim et al., 2021).

AI literacy support resources

Research using this approach focuses on presenting resources that support AI literacy (Kandlhofer & Steinbauer, 2021), considering that the creation of resources and repositories is a priority in supporting this teaching–learning process (Matarić et al., 2007; Mongan & Regli, 2008). However, these resources largely lack an interdisciplinary approach and do not embody a general approach to AI development (Sabuncuoglu, 2020).

Gender diversity in AI literacy

AI education, as a broad branch of computer science, also needs to address the issue of gender diversity. Lack of gender diversity can impact the lives of the people for whom AI-based systems are developed. The literature highlights the existence of proposals designed with a perspective toward gender, where the activities designed are specifically aimed at girls (Ellis et al., 2009 ; Jagannathan & Komives, 2019 ; Perlin et al., 2005 ; Summers et al., 1995 ; Vachovsky et al., 2016 ; Xia et al., 2022 ).

The huge impact that AI is having on our lives, at work and in every type of organization and business sector is easily recognizable today. No one doubts that AI is one of the most disruptive technologies in history, if not the most. In recent years, the expectations generated by AI, far from being deflated, have only grown. We are still a long way from general-purpose AI, but the application of AI to solve real problems has already taken hold for a wide range of purposes. It is therefore necessary for young people to know how AI works, as this learning will make it easier for them to use these technologies in their daily lives, both to learn and to interact with others.

Like any other technology, the potential uses and abuses of AI go hand in hand with its disruptive capacity. Many social groups and governments are expressing concern about the possible negative consequences of AI misuse. Although it is crucial to adequately regulate the use of AI, education is as important, if not more important, than regulation. Everything, whether good or bad, stems from the education received. Thus, education systems must prepare students for a society in which they will have to live and interact with AI. AI education will enable young people to discover how these tools work and, consequently, to act responsibly and critically. Therefore, AI literacy has become a relevant and strategic issue (Chiu & Chai, 2020 ).

This systematic review has focused on analyzing AI teaching–learning proposals in K-12 globally. The results confirm that the teaching of basic AI-related concepts and techniques at the K-12 level is scarce (Kandlhofer et al., 2016). Our work shows that there have been, on the one hand, different AI learning experiences and, on the other, proposals for the implementation of AI literacy made at the political level and by different experts. The learning experiences described show that AI literacy in schools has focused on technical, conceptual, and applied skills in some domains of interest. Proposals for AI implementation, especially those defined by the US and China, reveal that significant efforts are being made to design models that frame AI literacy proposals.

We also found that there are hardly any AI learning experiences that have analyzed learning outcomes, e.g., through assessments of learners’ understanding of AI concepts. Obviously, this is a result of the infancy of these AI learning experiences at the K-12 level. However, it is important for learning experiences to be based on clearly defined competencies in a particular AI literacy framework, such as those proposed in the literature (Alexandre et al., 2021 ; Han et al., 2019 ; Long & Magerko, 2020 ; Malach & Vicherková, 2020 ; Micheuz, 2020 ; Ottenbreit-Leftwich et al., 2021 ; Touretzky et al., 2019a ; Wong et al., 2020 ; Xiao & Song, 2021 ; Zhang et al., 2020 ). Recently, Van Brummelen et al. ( 2021a ) designed a curriculum for a five-day online workshop based on the specific AI competencies proposed by Long and Magerko ( 2020 ). They used several types of questionnaires to assess the quality of the program through the knowledge acquired by the students in these competencies. Therefore, clearly defined competency-based learning experiences can provide a rigorous assessment of student learning outcomes.

The research shows that clear guidelines are needed on what students are expected to learn about AI in K-12 (Chiu, 2021; Chiu & Chai, 2020; Lee et al., 2020). These studies highlight the need for a competency framework to guide the design of didactic proposals for AI literacy in K-12 educational institutions. Such a framework would provide a benchmark describing the areas of competency that K-12 learners should develop, around which specific educational projects can be designed. Furthermore, it would support the definition of a curriculum reflecting sequence and academic continuity (Woo et al., 2020). Such a curriculum should be modular and personalized (Gong et al., 2019) and adjusted to the conditions of the schools (Wang et al., 2020). In the teaching of AI, an exploratory approach should be adopted, integrating science, computer science and integral practice (Wang et al., 2020). It should also address the ethical dimension, which is fundamental to the literacy of K-12 students as it enables them to understand the basic principles of AI (Henry et al., 2021). This training facilitates the development of students' critical capacity, which is necessary to understand that technology is not neutral and to benefit from and make appropriate use of it. Ethics, complementary to legal norms, enhances the democratic quality of society by setting legitimate limits in the shaping of technological life. In this sense, different AI literacy proposals in K-12 already support addressing the ethical, social and security issues linked to AI technologies (Eguchi et al., 2021; Micheuz, 2020; Wong et al., 2020). Moreover, designing for social good could foster or help to motivate learning about AI (Chai et al., 2021). Without a doubt, all this will contribute to a more democratic society. Due to the gender gap in issues related to computer science, it is also necessary to address the gender perspective.
In this vein, the research proposes, among other strategies, focusing AI literacy on real-world elements, since this approach favors the motivation of girls and their greater involvement in learning (Jagannathan & Komives, 2019). However, little attention is paid to the undesirable consequences of an indiscriminate and insufficiently thought-out application of AI, both in higher education and especially in K-12. For example, the increase in socio-economic inequality between and within countries, resulting from the increasing automation of employment, is of particular concern. This is leading to growing inequality in wages and in the preservation of human employment, yet it is not usually a subject of interest in education.

Currently, the challenges of this AI literacy require an interdisciplinary and critical approach (Henry et al., 2021 ). We believe that AI literacy can be leveraged to enhance the learning of disciplinary core subjects by integrating AI into the teaching process of those subjects. AI literacy should rely on transferring AI knowledge and methods to core subjects, allowing education to cross disciplinary boundaries, but staying within the framework of disciplinary core subjects. To achieve this change, educators need to take a closer look at the current capabilities of AI. This would enable them to identify all options to improve the core of educational practice and thus optimize the educational process. For example, understanding and using word clouds is a powerful educational strategy to enhance education in core subjects such as science (e.g., to facilitate object classification), language (e.g. to enable the matching of different topics or authors’ works), music (e.g., to support the analysis of song lyrics) or social sciences (e.g., to assist in comparing different discourses). Since AI is highly interdisciplinary in nature, it has a broad projection on multiple fields and problems that require a transversal and applied approach. For example, the basic algorithms of ML could be taught in Mathematics and related disciplines, the design of supervised classifiers could be performed for the study of taxonomies in Biology, natural language processing could be used to make the study of a language more attractive, or the ethical issues surrounding AI could be discussed in Philosophy and Social Sciences subjects.
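The word-cloud strategy mentioned above rests on a simple computation that students themselves can perform: a word cloud is just a word-frequency table drawn with font sizes proportional to counts. The sketch below is illustrative only (the texts, stopword list and function name are invented for this example, not taken from any reviewed study); it compares the vocabulary of two "discourses" as might be done in a social sciences class:

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset({"the", "a", "of", "and", "to"})):
    """Lowercase a text, split it into words and count them, ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

# Two illustrative discourses to compare.
speech_a = "The forest and the river belong to the people of the valley."
speech_b = "The factory brings work and work brings wealth to the valley."

freq_a = word_frequencies(speech_a)
freq_b = word_frequencies(speech_b)

# A word-cloud tool would simply draw each word with a size
# proportional to its count; the comparison itself is in the numbers.
print(freq_a.most_common(3))
print(freq_b.most_common(3))
```

Contrasting which words dominate each table makes the "comparing different discourses" activity concrete without requiring any ML machinery.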

Finally, for this meaningful learning to take place, AI teaching must be addressed through holistic, active, and collaborative pedagogical strategies in which real problem solving is the starting point of the learning process. An important gap regarding the integration of AI in K-12 concerns teachers, as it is unclear how to prepare and involve them in the process (Chiu & Chai, 2020 ). Teachers’ attitudes towards AI have a significant influence on the effectiveness of using AI in education. Teachers can swing between total resistance and overconfidence; the former may arise from inadequate, inappropriate, irrelevant, or outdated professional development. On the one hand, teachers must be sufficiently digitally competent to integrate AI into the teaching–learning processes of their subjects. Teacher training is therefore also necessary, following a framework of standard competencies. This should include new ways of organizing the professional role of teachers, as well as enhancing students’ attitudes towards these changes. On the other hand, research reveals that it is essential for didactic proposals to be co-designed and implemented by the teachers at the schools involved (Henry et al., 2021 ), for teachers to undergo training in specific AI topics, and for this knowledge to be integrated into non-computer subjects (Lin & Brummelen, 2021 ). To this end, it is crucial to identify the perception and knowledge that teachers have about AI and involve them in the design of curricular proposals (Chiu, 2021 ; Chiu & Chai, 2020 ; Chiu et al., 2021 ).

This study aimed to understand how AI literacy is being integrated into K-12 education. To achieve this, we conducted a search process following the systematic literature review method and using Scopus. Two broad groups of AI literacy approaches were identified, namely learning experiences and theoretical perspectives. The study revealed that learning experiences in schools have focused mainly on technical and applied skills limited to a specific domain, without rigorously assessing student learning outcomes. By contrast, the US and China are leading the way with AI literacy implementation schemes that are broader in scope and more ambitious in approach. However, there is still a need to test these initiatives through comprehensive learning experiences that incorporate an analysis of learning outcomes. This work has allowed us to draw several conclusions that can be considered in the design of AI literacy proposals in K-12. Firstly, AI literacy should be based on an interdisciplinary and competency-based approach and integrated into the school curriculum. There is no need to include a new AI subject in the curriculum; rather, AI literacy should build on the competencies and content of existing disciplinary subjects and be integrated into them. Given the interdisciplinary nature of AI, AI education can break disciplinary boundaries and adopt a global, practical, and active approach in which project-based and contextualized work plays an important role. Secondly, AI literacy should be leveraged to extend and enhance learning in curricular subjects. Finally, AI literacy must prioritize the competency of teachers and their active participation in the co-design of didactic proposals, together with pedagogues and AI experts.

Availability of data and materials

The last revision round required updating the review. Thus, Additional file 1 contains a .csv file listing the reviewed papers that are not cited in the text. The papers cited in the text already appear in the References section and are therefore not included in the Additional file.

1 Conference categorization and ranking based on the GII-GRIN-SCIE (GGS) Conference Ratings: https://scie.lcc.uma.es/

Aguar, K., Arabnia, H. R., Gutierrez, J. B., Potter, W. D., & Taha, T. R. (2016). Making CS inclusive: An overview of efforts to expand and diversify CS education. In International Conference on Computational Science and Computational Intelligence (CSCI). (pp. 321–326). https://doi.org/10.1109/CSCI.2016.0067

Alexandre, F., Becker, J., Comte, M. H., Lagarrigue, A., Liblau, R., Romero, M., & Viéville, T. (2021). Why, What and How to help each citizen to understand artificial intelligence? KI Kunstliche Intelligenz, 35 (2), 191–199. https://doi.org/10.1007/s13218-021-00725-7

Ali, S., DiPaola, D., Lee, I., Hong, J., & Breazeal, C. (2021a). Exploring generative models with middle school students. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. (pp. 1–13). https://doi.org/10.1145/3411764.3445226

Ali, S., DiPaola, D., Lee, I., Sindato, V., Kim, G., Blumofe, R., & Breazeal, C. (2021b). Children as creators, thinkers and citizens in an AI-driven future. Computers and Education Artificial Intelligence . https://doi.org/10.1016/j.caeai.2021.100040

An, X., Chai, C. S., Li, Y., Zhou, Y., Shen, X., Zheng, C., & Chen, M. (2022). Modeling English teachers’ behavioral intention to use artificial intelligence in middle schools. Education and Information Technologies . https://doi.org/10.1007/s10639-022-11286-z

Aung, Z. H., Sanium, S., Songsaksuppachok, C., Kusakunniran, W., Precharattana, M., Chuechote, S., & Ritthipravat, P. (2022). Designing a novel teaching platform for AI: A case study in a Thai school context. Journal of Computer Assisted Learning, 38 (6), 1714–1729. https://doi.org/10.1111/jcal.12706

Avanzato, R. L. (2009). Autonomous outdoor mobile robot challenge. Computers in Education Journal (July–September 2009).

Bai, H., & Yang, S. (2019, October). Research on the Sustainable Development Model of Information Technology Literacy of Normal Students Based on Deep Learning Recommendation System. In 2019 4th International Conference on Mechanical, Control and Computer Engineering (ICMCCE). (pp. 837–840). https://doi.org/10.1109/ICMCCE48743.2019.00192

Barlex, D., Steeg, T., & Givens, N. (2020). Teaching about disruption: A key feature of new and emerging technologies. Learning to Teach Design and Technology in the Secondary School, 4 , 137–154. https://doi.org/10.4324/9780429321191-9

Bollin, A., Kesselbacher, M., & Mößlacher, C. (2020). Ready for computing science? A closer look at personality, interests and self-concept of girls and boys at secondary level. In  Informatics in Schools. Engaging Learners in Computational Thinking: 13th International Conference, ISSEP. (pp. 107–118). https://doi.org/10.1007/978-3-030-63212-0_9

Burgsteiner, H., Kandlhofer, M., & Steinbauer, G. (2016). IRobot: teaching the basics of artificial intelligence in high schools. Proceedings of the AAAI Conference on Artificial Intelligence . https://doi.org/10.1609/aaai.v30i1.9864

Chai, C.S., Lin, P.-Y., Jong, M.S.-Y., Dai, Y., Chiu, T.K., & Huang, B. (2020a). Factors influencing students' behavioral intention to continue artificial intelligence learning. In 2020 International Symposium on Educational Technology (ISET). (pp. 147–150).  https://doi.org/10.1109/ISET49818.2020.00040

Chai, C. S., Wang, X., & Xu, C. (2020b). An extended theory of planned behavior for the modelling of Chinese secondary school students’ intention to learn artificial intelligence. Mathematics, 8 (11), 1–18. https://doi.org/10.3390/math8112089

Chai, C.S., Lin, P.-Y., Jong, M.S.-Y., Dai, Y., Chiu, T.K., & Qin, J. (2021). Perceptions of and behavioral intentions towards learning artificial intelligence in primary school students. Educational Technology & Society , 24 (3), 89–101. Retrieved from https://www.jstor.org/stable/27032858

Chen, S., Qian, B., & Cheng, H. (2017). Voice recognition for STEM education using robotics. In Volume 9: 13th ASME/IEEE International Conference on Mechatronic and Embedded Systems and Applications. ASME . https://doi.org/10.1115/DETC2017-68368

Chen, M., Zhou, C., & Wu, Y. (2020). Research on the model and development status of information literacy self-improvement ability of primary and secondary school teachers. In Ninth International Conference of Educational Innovation through Technology (EITT). (pp. 87–91). https://doi.org/10.1109/EITT50754.2020.00021

Cheung, S. K., Lam, J., Li, K. C., Au, O., Ma, W. W., & Ho, W. S. (Eds.). (2018).  Technology in Education. Innovative Solutions and Practices: Third International Conference, ICTE 2018. Springer.

Chittora, S., & Baynes, A. (2020, October). Interactive Visualizations to Introduce Data Science for High School Students. In  Proceedings of the 21st Annual Conference on Information Technology Education.  (pp. 236–241). https://doi.org/10.1145/3368308.3415360

Chiu, T. K. F. (2021). A holistic approach to the design of artificial intelligence (AI) education for K-12 schools. TechTrends, 65 (5), 796–807. https://doi.org/10.1007/s11528-021-00637-1

Chiu, T. K., & Chai, C.-S. (2020). Sustainable curriculum planning for artificial intelligence education: A self-determination theory perspective. Sustainability (Switzerland) . https://doi.org/10.3390/su12145568

Chiu, T. K., Meng, H., Chai, C. S., King, I., Wong, S., & Yam, Y. (2021). Creation and evaluation of a pretertiary artificial intelligence (AI) curriculum. IEEE Transactions on Education, 65 (1), 30–39. https://doi.org/10.1109/TE.2021.3085878

Chounta, I.-A., Bardone, E., Raudsep, A., & Pedaste, M. (2021). Exploring teachers’ perceptions of artificial intelligence as a tool to support their practice in Estonian K-12 education. International Journal of Artificial Intelligence in Education, 32 , 725–755. https://doi.org/10.1007/s40593-021-00243-5

Crompton, H., Jones, M. V., & Burke, D. (2022). Affordances and challenges of artificial intelligence in K-12 education: a systematic review. Journal of Research on Technology in Education . https://doi.org/10.1080/15391523.2022.2121344

Croxell, J., Mead, R., & Weinberg, J. (2007). Designing robot competitions that promote ai solutions: Lessons learned competing and designing. Technical Report of the 2007 American Association of Artificial Intelligence. Spring Symposia , SS-07–09. (pp. 29–34).

Cummings D., Anthony M., Watson C., Watson A., & Boone S. (2021). Combating social injustice and misinformation to engage minority youth in computing sciences. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education. (pp.1006–1012). https://doi.org/10.1145/3408877.3432452 .

Dai, Y., Liu, A., Qin, J., Guo, Y., Jong, M. S. Y., Chai, C. S., & Lin, Z. (2023). Collaborative construction of artificial intelligence curriculum in primary schools. Journal of Engineering Education, 112 (1), 23–42. https://doi.org/10.1002/jee.20503

Díaz, J., Queiruga, C., Tzancoff, C., Fava, L., & Harari, V. (2015). Educational robotics and videogames in the classroom. In 2015 10th Iberian Conference on Information Systems and Technologies (CISTI) . Aveiro, Portugal. (pp. 1–6). https://doi.org/10.1109/CISTI.2015.7170616

du Boulay, B. (2016). Artificial intelligence as an effective classroom assistant. IEEE Intelligent Systems, 31 (6), 76–81. https://doi.org/10.1109/MIS.2016.93

Eguchi, A. (2021). AI-robotics and AI literacy. Studies in Computational Intelligence, 982 , 75–85. https://doi.org/10.1007/978-3-030-77022-8

Eguchi, A., & Okada, H. (2018). If you give students a social robot? - world robot summit pilot study. In  Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction.  (pp. 103–104). https://doi.org/10.1145/3173386.3177038

Eguchi, A., & Okada, H. (2020). Imagine the Future with Social Robots - World Robot Summit’s Approach: Preliminary Investigation. In M. Moro, D. Alimisis, & L. Iocchi (Eds.), Educational Robotics in the Context of the Maker Movement. Edurobotics 2018. Advances in Intelligent Systems and Computing, 946. Springer. https://doi.org/10.1007/978-3-030-18141-3_10

Eguchi, A., Okada, H., & Muto, Y. (2021). Contextualizing AI education for K-12 students to enhance their learning of AI literacy through culturally responsive approaches. KI Kunstliche Intelligenz, 35 (2), 153–161. https://doi.org/10.1007/s13218-021-00737-3

Ellis, G., Ory, E., Bhushan, N. (2005). Organizing a K-12 AI curriculum using philosophy of the mind. Engineering: Faculty Publications, Smith College. Retrieved from https://scholarworks.smith.edu/egr_facpubs/96

Ellis, G., Silva, K., Epstein, T., & Giammaria, N. (2009). Artificial intelligence in pre-college education: Learning within a philosophy of the mind framework. International Journal of Engineering Education, 25 (3), 511–522.

Evangelista, I., Blesio, G., & Benatti, E. (2019). Why are we not teaching machine learning at high school? A proposal. In 2018 World Engineering Education Forum - Global Engineering Deans Council (WEEF-GEDC). (pp. 1–6). https://doi.org/10.1109/WEEF-GEDC.2018.8629750

Gao, J., & Wang, L. (2019). Reverse thinking teaching discussion in high school information technology under new curriculum standards. In 14th International Conference on Computer Science & Education (ICCSE). (pp. 222–226). https://doi.org/10.1109/ICCSE.2019.8845429

Giannakos, M., Voulgari, I., Papavlasopoulou, S., Papamitsiou, Z., & Yannakakis, G. (2020). Games for artificial intelligence and machine learning education: Review and perspectives. Lecture Notes in Educational Technology . https://doi.org/10.1007/978-981-15-6747-6_7

Gong, X., Zhao, L., Tang, R., Guo, Y., Liu, X., He, J., … Wang, X. (2019). AI education system for primary and secondary schools. In  2019 ASEE Annual Conference & Exposition .

Gonzalez, A. J., Hollister, J. R., DeMara, R. F., Leigh, J., Lanman, B., Lee, S. Y., & Wilder, B. (2017). AI in informal science education: Bringing Turing back to life to perform the Turing test. International Journal of Artificial Intelligence in Education, 27 (2), 353–384. https://doi.org/10.1007/s40593-017-0144-1

Guan, C., Mou, J., & Jiang, Z. (2020). Artificial intelligence innovation in education: A twenty-year data-driven historical analysis. International Journal of Innovation Studies, 4 (4), 134–147. https://doi.org/10.1016/j.ijis.2020.09.001

Gutiérrez, L. F., & Henriques, A. (2020). Prospective mathematics teachers’ tpack in a context of a teacher education experiment. Revista Latinoamericana De Investigación En Matemática Educativa, 23 (2), 175–202. https://doi.org/10.12802/relime.20.2322

Han, X., Hu, F., Xiong, G., Liu, X., Gong, X., Niu, X., … Wang, X. (2019). Design of AI + curriculum for primary and secondary schools in Qingdao. In Chinese Automation Congress (CAC) . (pp. 4135–4140). https://doi.org/10.1109/CAC.2018.8623310

Harris, E., Lamonica, A., & Weinberg. JB. (2004) Interfacing the public and technology: a web controlled mobile robot. In Accessible hands-on artificial intelligence and robotics education: working papers of the 2004. AAAI spring symposium series. AAAI Press. (pp.106–110)

He, Y.-T., Guo, B.-J., Lu, J., Xu, Y.-P., & Gong, M. (2020). Research of scratch programming recommendation system based on med and knowledge graph. In 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE) . (pp. 2158–2163). https://doi.org/10.1109/ICMCCE51767.2020.00469

Heintz, F. (2021). Three interviews about K-12 AI education in America, Europe, and Singapore. KI Kunstliche Intelligenz, 35 (2), 233–237. https://doi.org/10.1007/s13218-021-00730-w

Henry, J., Hernalesteen, A., & Collard, A.-S. (2021). Teaching artificial intelligence to K-12 through a role-playing game questioning the intelligence concept. KI Kunstliche Intelligenz, 35 (2), 171–179. https://doi.org/10.1007/s13218-021-00733-7

Herrero, Á., Cambra, C., Urda, D., Sedano, J., Quintián, H., & Corchado, E. (Eds) (2020). 11th International Conference on European Transnational Educational (ICEUTE 2020). ICEUTE 2020. Advances in Intelligent Systems and Computing, 1266 . Springer. https://doi.org/10.1007/978-3-030-57799-5_8

Ho, J. W., Scadding, M., Kong, S. C., Andone, D., Biswas, G., Hoppe, H. U., & Hsu, T. C. (2019). Classroom activities for teaching artificial intelligence to primary school students. In  Proceedings of international conference on computational thinking education. The Education University of Hong Kong. (pp. 157–159).

Holowka, P. (2020). Teaching robotics during COVID-19: Machine learning, simulation, and AWS DeepRacer. In 17th International Conference on Cognition and Exploratory Learning in Digital Age, CELDA .

Holstein, K., McLaren, B. M., & Aleven, V. (2019). Designing for complementarity: Teacher and student needs for orchestration support in AI-enhanced classrooms. In S. Isotani, E. Millán, A. Ogan, P. Hastings, B. McLaren, & R. Luckin (Eds.), Artificial intelligence in education. AIED 2019. Lecture notes in computer science (p. 11625). Springer. https://doi.org/10.1007/978-3-030-23204-7_14

Itmazi, J., & Khlaif, Z. N. (2022). Science education in Palestine: Hope for a better future. Lecture Notes in Educational Technology . https://doi.org/10.1007/978-981-16-6955-2_9

Jagannathan, R. K., & Komives, C. (2019). Teaching by induction: Project-based learning for Silicon Valley. Journal of Engineering Education Transformations, 33 (1), 22–26. https://doi.org/10.16920/jeet/2019/v33i1/149003

Jaskie, K., Larson, J., Johnson, M., Turner, K., O’Donnell, M., Christen, J.B., & Spanias, A. (2021). Research experiences for teachers in machine learning. In IEEE Frontiers in Education Conference (FIE) . Lincoln, NE, USA. (pp. 1–5). https://doi.org/10.1109/FIE49875.2021.9637132

Judd, S. (2020). Activities for Building Understanding: How AI4ALL Teaches AI to Diverse High School Students. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education. (pp. 633–634). https://doi.org/10.1145/3328778.3366990

Kahn, K., Megasari, R., Piantari, E., & Junaeti, E. (2018). AI programming by children using Snap! block programming in a developing country. In Thirteenth European Conference on Technology Enhanced Learning . (p. 11082). https://doi.org/10.1007/978-3-319-98572-5

Kandlhofer, M., Steinbauer, G., Hirschmugl-Gaisch, S., & Huber, P. (2016). Artificial intelligence and computer science in education: From kindergarten to university. In IEEE Frontiers in Education Conference. (pp. 1–9). https://doi.org/10.1109/FIE.2016.7757570

Kandlhofer, M., Steinbauer, G., Lasnig, J.P., Baumann, W., Plomer, S., Ballagi, A., & Alfoldi, I. (2019). Enabling the creation of intelligent things: Bringing artificial intelligence and robotics to schools. In IEEE Frontiers in Education Conference (FIE) . (pp. 1–5). https://doi.org/10.1109/FIE43999.2019.9028537

Kandlhofer, M., & Steinbauer, G. (2021). AI k-12 education service. KI Kunstliche Intelligenz, 35 (2), 125–126. https://doi.org/10.1007/s13218-021-00715-9

Kandlhofer, M., Steinbauer, G., Lassnig, J., Menzinger, M., Baumann, W., Ehardt-Schmiederer, M., & Szalay, I. (2021). EDLRIS: A European driving license for robots and intelligent systems. KI Kunstliche Intelligenz, 35 (2), 221–232. https://doi.org/10.1007/s13218-021-00716-8

Kaspersen, M. H., Bilstrup, K. E. K., Van Mechelen, M., Hjort, A., Bouvin, N. O., & Petersen, M. G. (2022). High school students exploring machine learning and its societal implications: Opportunities and challenges. International Journal of Child-Computer Interaction, 34 , 1–12. https://doi.org/10.1016/j.ijcci.2022.100539

Kelly, J., Binney, J., Pereira, A., Khan, O., & Sukhatme, G. (2008). Just add wheels: Leveraging commodity laptop hardware for robotics and ai education. In  Proceedings of AAAI Education Colloquium , 22.

Kim, K., Kwon, K., Ottenbreit-Leftwich, A., Bae, H., & Glazewski, K. (2023). Exploring middle school students’ common naive conceptions of Artificial Intelligence concepts, and the evolution of these ideas. Education and Information Technologies . https://doi.org/10.1007/s10639-023-11600-3

Kim, S., Jang, Y., Choi, S., Kim, W., Jung, H., Kim, S., & Kim, H. (2021). Analyzing teacher competency with TPACK for K-12 AI education. KI Kunstliche Intelligenz, 35 (2), 139–151. https://doi.org/10.1007/s13218-021-00731-9

Kitchenham, B. (2004). Procedures for performing systematic reviews (Vol. 33, pp. 1–26). Keele: Keele University.

Knijnenburg, B., Bannister, N., & Caine, K. (2021). Using mathematically-grounded metaphors to teach AI-related cybersecurity. In IJCAI-21 Workshop on Adverse Impacts and Collateral Effects of Artificial Intelligence Technologies (AIofAI) .

Kong, S. C., Cheung, W. M. Y., & Zhang, G. (2021). Evaluation of an artificial intelligence literacy course for university students with diverse study backgrounds. Computers and Education: Artificial Intelligence . https://doi.org/10.1016/j.caeai.2021.100026

Korenova, L. (2016). Digital technologies in teaching mathematics on the faculty of education of the Comenius University in Bratislava. In 15 Conference on Applied Mathematics. Slovak University of Technology in Bratislava. (pp. 690–699).

Lee, S., Mott, B., Ottenbriet-Leftwich, A., Scribner, A., Taylor, S., Glazewski, K.,…Lester, J. (2020). Designing a collaborative game-based learning environment for AI-infused inquiry learning in elementary school classrooms. In Proceedings of the 2020 ACM conference on innovation and technology in computer science education . (pp. 566–566). https://doi.org/10.1145/3341525.3393981

Lee, I., Ali, S., Zhang, H., Dipaola, D., & Breazeal, C. (2021). Developing middle school students’ AI literacy. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education (SIGCSE '21). (pp. 191–197). https://doi.org/10.1145/3408877.3432513

Lenoir, Y., & Hasni, A. (2016). Interdisciplinarity in primary and secondary school: Issues and perspectives. Creative Education, 7 (16), 2433–2458. https://doi.org/10.4236/ce.2016.716233

Lin, P., & Brummelen, J. (2021). Engaging teachers to co-design integrated AI curriculum for K-12 classrooms. In CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems . (pp.1–12). https://doi.org/10.1145/3411764.3445377

Lin, X. F., Chen, L., Chan, K. K., Peng, S., Chen, X., Xie, S., & Hu, Q. (2022). Teachers’ perceptions of teaching sustainable artificial intelligence: A design frame perspective. Sustainability, 14 (13), 1–20. https://doi.org/10.3390/su14137811

Lindner, A., & Berges, M. (2020). Can you explain AI to me? Teachers’ pre-concepts about artificial intelligence. In IEEE Frontiers in Education Conference (FIE). (pp. 1–9). https://doi.org/10.1109/FIE44824.2020.9274136

Long, D., & Magerko, B. (2020). What is AI literacy? Competencies and design considerations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. (pp. 1–16). https://doi.org/10.1145/3313831.3376727

Malach, J., & Vicherková, D. (2020). Background of the Revision of the Secondary School Engineering Curriculum in the Context of the Society 4.0. In M. Auer, H. Hortsch & P. Sethakul (Eds). The Impact of the 4th Industrial Revolution on Engineering Education. ICL Advances in Intelligent Systems and Computing, vol 1135. Springer. https://doi.org/10.1007/978-3-030-40271-6_27

Matarić, M.J., Koenig, N., & Feil-Seifer, D. (2007). Materials for enabling hands- on robotics and stem education. In AAAI Spring Symposium: Semantic Scientific Knowledge Integration. (pp. 99–102). http://www.aaai.org/Papers/Symposia/Spring/2007/SS-07-09/SS07-09-022.pdf

Miao, F., Holmes, W., Huang, R., & Zhang, H. (2021). AI and education: A guidance for policymakers . UNESCO Publishing.

Micheuz, P. (2020). Approaches to Artificial Intelligence as a Subject in School Education. In T. Brinda, D. Passey, & T. Keane (Eds), Empowering Teaching for Digital Equity and Agency. OCCE 2020. IFIP Advances in Information and Communication Technology, 595. Springer. https://doi.org/10.1007/978-3-030-59847-1_1

Mishra, P., & Koehler, M. J. (2006). Technological pedagogical content knowledge: A framework for teacher knowledge. Teachers College Record, 108 (6), 1017–1054. https://doi.org/10.1111/j.1467-9620.2006.00684.x

Mongan, W.M., & Regli, W.C. (2008). A cyber-infrastructure for supporting K-12 engineering education through robotics, WS-08-02, 68–73.

Narahara, T., & Kobayashi, Y. (2018). Personalizing homemade bots with plug & play ai for steam education. In SIGGRAPH Asia 2018 technical briefs . (pp. 1–4). https://doi.org/10.1145/3283254.3283270

Ng, D. T. K., Lee, M., Tan, R. J. Y., Hu, X., Downie, J. S., & Chu, S. K. W. (2022). A review of AI teaching and learning from 2000 to 2020. Education and Information Technologies . https://doi.org/10.1007/s10639-022-11491-w

Nisheva-Pavlova, M.M. (2021). AI courses for secondary and high school - comparative analysis and conclusions. In CEUR Workshop Proceedings, 3061. (pp. 9–16).

Nurbekova, Z., Mukhamediyeva, K., & Assainova, A. (2018). Educational robotics technologies in Kazakhstan and in the world: Comparative analysis, current state and perspectives. Astra Salvensis, 6 (1), 665–686.

Oh, W. (2020). Physics teachers’ perception of it convergence-based physics education. New Physics: Sae Mulli, 70 (8), 660–666. https://doi.org/10.3938/NPSM.70.660

Olari, V., & Romeike, R. (2021). Addressing AI and data literacy in teacher education: A review of existing educational frameworks. In WiPSCE '21: The 16th Workshop in Primary and Secondary Computing Education, 17. (pp. 1–2) https://doi.org/10.1145/3481312.3481351

Ottenbreit-Leftwich, A., Glazewski, K., Jeon, M., Hmelo-Silver, C., Mott, B., Lee, S., & Lester, J. (2021). How do elementary students conceptualize artificial intelligence? In SIGCSE '21: Proceedings of the 52nd ACM Technical Symposium on Computer Science Education . (p. 1261). https://doi.org/10.1145/3408877.3439642

Park, K., Mott, B., Lee, S., Glazewski, K., Scribner, J., Ottenbreit-Leftwich, A., & Lester, J. (2021). Designing a visual interface for elementary students to formulate AI planning tasks. In IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). (pp. 1–9). https://doi.org/10.1109/VL/HCC51201.2021.9576163

Perlin, K., Flanagan, M., & Hollingshead, A. (2005). The Rapunsel Project. In Subsol, G. (Eds). Virtual Storytelling. Using Virtual Reality Technologies for Storytelling. ICVS 2005. Lecture Notes in Computer Science , 3805. Springer. https://doi.org/10.1007/11590361_29

Reyes, A., Elkin, C., Niyaz, Q., Yang, X., Paheding, S., & Devabhaktuni, V. (2020). A preliminary work on visualization-based education tool for high school machine learning education. In IEEE Integrated STEM Education Conference (ISEC). (pp. 1–5). https://doi.org/10.1109/ISEC49744.2020.9280629

Rodríguez-García, J., Moreno-León, J., Román-González, M., & Robles, G. (2020). Introducing artificial intelligence fundamentals with LearningML: Artificial intelligence made easy. In TEEM'20: Eighth International Conference on Technological Ecosystems for Enhancing Multiculturality . (pp. 18–20). https://doi.org/10.1145/3434780.3436705

Russell, S. (2021). The history and future of AI. Oxford Review of Economic Policy, 37 (3), 509–520. https://doi.org/10.1093/oxrep/grab013

Russell, S., & Norvig, P. (2021). Artificial intelligence: A modern approach (Global ed.). Pearson Deutschland.

Sabuncuoglu, A. (2020). Designing one year curriculum to teach artificial intelligence for middle school. In Proceedings of the 2020 ACM conference on innovation and technology in computer science education . (pp. 96–102). https://doi.org/10.1145/3341525.3387364

Sakulkueakulsuk, B., Witoon, S., Ngarmkajornwiwat, P., Pataranutaporn, P., Surareungchai, W., Pataranutaporn, P., & Subsoontorn, P. (2019). Kids making AI: Integrating machine learning, gamification, and social context. In 2018 IEEE international conference on teaching, assessment, and learning for engineering (TALE) . (pp. 1005–1010). https://doi.org/10.1109/TALE.2018.8615249

Sanusi, I. T., Oyelere, S. S., Vartiainen, H., Suhonen, J., & Tukiainen, M. (2022). A systematic review of teaching and learning machine learning in K-12 education. Education and Information Technologies . https://doi.org/10.1007/s10639-022-11416-7

Shin, S. (2021). A study on the framework design of artificial intelligence thinking for artificial intelligence education. International Journal of Information and Education Technology, 11 (9), 392–397. https://doi.org/10.18178/ijiet.2021.11.9.1540

Sing, C. C., Teo, T., Huang, F., Chiu, T. K., & Xing Wei, W. (2022). Secondary school students’ intentions to learn AI: Testing moderation effects of readiness, social good and optimism. Educational Technology Research and Development, 70 (3), 765–782. https://doi.org/10.1007/s11423-022-10111-1

Sorensen, L., & Koefoed, N. (2018). The future of teaching—what are students’ expectations. In 2018 11th CMI International Conference: Prospects and Challenges Towards Developing a Digital Economy within the EU . (pp. 62–66). https://doi.org/10.1109/PCTDDE.2018.8624771

Sperling, A., & Lickerman, D. (2012). Integrating AI and machine learning in software engineering course for high school students. In Proceedings of the 17th ACM annual conference on Innovation and technology in computer science education . (pp. 244–249). https://doi.org/10.1145/2325296.2325354

Su, J., Zhong, Y., & Ng, D. T. K. (2022). A meta-review of literature on educational approaches for teaching AI at the K-12 levels in the Asia-Pacific region. Computers and Education: Artificial Intelligence, 3 , 1–18. https://doi.org/10.1016/j.caeai.2022.100065

Suh, W., & Ahn, S. (2022). Development and validation of a scale measuring student attitudes toward artificial intelligence. SAGE Open, 12 (2), 1–12. https://doi.org/10.1177/21582440221100463

Summers, B.G., Hicks, H., & Oliver, C. (1995). Reaching minority, female and disadvantaged students. In Proceedings Frontiers in Education 1995 25th Annual Conference. Engineering Education for the 21st Century , 1 . (992a4–16). https://doi.org/10.1109/FIE.1995.483030

Tedre, M., Toivonen, T., Kahila, J., Vartiainen, H., Valtonen, T., Jormanainen, I., & Pears, A. (2021). Teaching machine learning in K-12 classroom: Pedagogical and technological trajectories for artificial intelligence education. IEEE Access, 9 , 110558–110572. https://doi.org/10.1109/ACCESS.2021.3097962

Tims, H., Turner III, G., Cazes, G., & Marshall, J. (2012). Junior cyber discovery: Creating a vertically integrated middle school cyber camp. In  2012 ASEE Annual Conference & Exposition.  (pp. 25–867). Retrieved from https://peer.asee.org/21624

Toivonen, T., Jormanainen, I., Kahila, J., Tedre, M., Valtonen, T., Vartiainen, H. (2020). Co-designing machine learning apps in k-12 with primary school children. In  2020 IEEE 20th International Conference on Advanced Learning Technologies (ICALT). IEEE. (pp. 308–310). https://doi.org/10.1109/ICALT49669.2020.00099

Touretzky, D., Gardner-McCune, C., Breazeal, C., Martin, F., & Seehorn, D. (2019a). A year in K-12 AI education. AI Magazine, 40 (4), 88–90. https://doi.org/10.1609/aimag.v40i4.5289

Touretzky, D., Gardner-McCune, C., Martin, F., & Seehorn, D. (2019b). Envisioning AI for K-12: What should every child know about AI? Proceedings of the AAAI Conference on Artificial Intelligence, 33 (01), 9795–9799. https://doi.org/10.1609/aaai.v33i01.33019795

Vachovsky, M., Wu, G., Chaturapruek, S., Russakovsky, O., Sommer, R., & Fei-Fei, L. (2016). Towards more gender diversity in CS through an artificial intelligence summer program for high school girls. In  Proceedings of the 47th ACM technical symposium on computing science education.  (pp. 303–308). https://doi.org/10.1145/2839509.2844620

Van Brummelen, J., Heng, T., & Tabunshchyk, V. (2021a). Teaching tech to talk: K-12 conversational artificial intelligence literacy curriculum and development tools. Proceedings of the AAAI Conference on Artificial Intelligence, 35 (17), 15655–15663. https://doi.org/10.1609/aaai.v35i17.17844

Van Brummelen, J., Tabunshchyk, V., & Heng, T. (2021b). Alexa, can I program you? Student perceptions of conversational artificial intelligence before and after programming Alexa. In IDC '21: Interaction Design and Children. (pp. 305–313) https://doi.org/10.1145/3459990.3460730

Vartiainen, H., Tedre, M., & Valtonen, T. (2020). Learning machine learning with very young children: Who is teaching whom? International Journal of Child-Computer Interaction, 25 , 1–11. https://doi.org/10.1016/j.ijcci.2020.100182

Vartiainen, H., Toivonen, T., Jormanainen, I., Kahila, J., Tedre, M., & Valtonen, T. (2021). Machine learning for middle schoolers: Learning through data- driven design. International Journal of Child-Computer Interaction, 29 , 1–12. https://doi.org/10.1016/j.ijcci.2021.100281

Verner, I., Cuperman, D., & Reitman, M. (2021). Exploring robot connectivity and collaborative sensing in a high-school enrichment program. Robotics, 10 (1), 1–19. https://doi.org/10.3390/robotics10010013

von Wangenheim, C. G., Hauck, J. C., Pacheco, F. S., & Bueno, M. F. B. (2021). Visual tools for teaching machine learning in K-12: A ten-year systematic mapping. Education and Information Technologies, 26 (5), 5733–5778. https://doi.org/10.1007/s10639-021-10570-8

Wan, X., Zhou, X., Ye, Z., Mortensen, C., & Bai, Z. (2020). Smileyclus- ter: Supporting accessible machine learning in k-12 scientific discovery. In  proceedings of the Interaction Design and Children Conference.  (pp. 23–35). https://doi.org/10.1145/3392063.3394440

Wang, H., Liu, Y., Han, Z., & Wu, J. (2020). Extension of media literacy from the perspective of artificial intelligence and implementation strategies of artificial intelligence courses in junior high schools. In  2020 International Conference on Artificial Intelligence and Education (ICAIE).  (pp. 63–66). https://doi.org/10.1109/ICAIE50891.2020.00022

Wei, Y. (2021). Influence factors of using modern teaching technology in the classroom of junior middle school teachers under the background of artificial intelligence-analysis based on HLM. Advances in Intelligent Systems and Computing, 1282 , 110–118. https://doi.org/10.1007/978-3-030-62743-0_16

Wei, Q., Li, M., Xiang, K., & Qiu, X. (2020). Analysis and strategies of the professional development of information technology teachers under the vision of artificial intelligence. In  2020 15th International Conference on Computer Science & Education (ICCSE).  (pp. 716–721). https://doi.org/10.1109/ICCSE49874.2020.9201652

West, D.M., & Allen, J.R. (2018). How artificial intelligence is transforming the world. Report. Retrieved April 24, 2018, f rom https://www.brookings.edu/research/how-artificial-intelligence-is-transforming-the-world/

Wong, K.-C. (2020). Computational thinking and artificial intelligence education: A balanced approach using both classical AI and modern AI. CoolThink@ JC , 108.

Wong, G. K., Ma, X., Dillenbourg, P., & Huen, J. (2020). Broadening artificial intelligence education in k-12: Where to start? ACM Inroads, 11 (1), 20–29. https://doi.org/10.1145/3381884

Woo, H., Kim, J., Kim, J., & Lee, W. (2020). Exploring the ai topic composition of k-12 using nmf-based topic modeling. International Journal on Advanced Science, Engineering and Information Technology, 10 (4), 1471–1476. https://doi.org/10.18517/ijaseit.10.4.12787

Wu, D., Zhou, C., Meng, C., & Chen, M. (2020). Identifying multilevel factors influencing ICT self-efficacy of k-12 teachers in China. In  Blended Learning. Education in a Smart Learning Environment: 13th International Conference, ICBL 2020. (pp. 303–314). Springer International Publishing. https://doi.org/10.1007/978-3-030-51968-1

Xia, Q., Chiu, T. K., & Chai, C. S. (2022). The moderating effects of gender and need satisfaction on self-regulated learning through Artificial Intelligence (AI). Education and Information Technologies . https://doi.org/10.1007/s10639-022-11547-x

Xia, L., & Zheng, G. (2020). To meet the trend of AI: The ecology of developing ai talents for pre-service teachers in China. International Journal of Learning, 6 (3), 186–190. https://doi.org/10.18178/IJLT.6.3.186-190

Xiao, W., & Song, T. (2021). Current situation of artificial intelligence education in primary and secondary schools in China. In  The Sixth International Conference on Information Management and Technology. (pp. 1–4). https://doi.org/10.1145/3465631.3465980

Yau, K. W., Chai, C. S., Chiu, T. K., Meng, H., King, I., Wong, S. W. H., & Yam, Y. (2022). Co-designing artificial intelligence curriculum for secondary schools: A grounded theory of teachers' experience. In 2022 International Symposium on Educational Technology (ISET). (pp. 58–62). https://doi.org/10.1109/ISET55194.2022.00020

Yue, M., Dai, Y., Siu-Yung, M., & Chai, C.-S. (2021). An analysis of k-12 artificial intelligence curricula in eight countries. In  Proceedings of the 29th International Conference on Computers in Education.  (pp. 22–26).

Yue, M., Jong, M. S. Y., & Dai, Y. (2022). Pedagogical design of K-12 artificial intelligence education: A systematic review. Sustainability, 14 (23), 15620. https://doi.org/10.3390/su142315620

Zhai, X., Chu, X., Chai, C. S., Jong, M. S. Y., Istenic, A., Spector, M., & Li, Y. (2021). A review of artificial intelligence (AI) in education from 2010 to 2020. Complexity . https://doi.org/10.1155/2021/8812542

Zhang, N., Biswas, G., McElhaney, K.W., Basu, S., McBride, E., & Chiu, J.L. (2020). Studying the interactions between science, engineering, and computational thinking in a learning-by-modeling environment. In  International conference on artificial intelligence in education. (pp. 598–609). Springer.

Download references

Acknowledgements

The authors would like to thank the reviewers and editors, whose comments and feedback helped improve the original manuscript.

This work has partially been funded by the Spanish Ministry of Science, Innovation and Universities (PID2021-123152OB-C21), and the Consellería de Educación, Universidade e Formación Profesional (accreditation 2019–2022 ED431C2022/19 and reference competitive group, ED431G2019/04) and the European Regional Development Fund (ERDF), which acknowledges the CiTIUS—Centro Singular de Investigación en Tecnoloxías Intelixentes da Universidade de Santiago de Compostela as a Research Center of the Galician University System. This work also received support from the Educational Knowledge Transfer (EKT), the Erasmus + project (reference number 612414-EPP-1-2019-1-ES-EPPKA2-KA) and the Knowledge Alliances call (Call EAC/A03/2018).

Author information

Authors and affiliations

Pedagogy and Didactics Department, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain

Lorena Casal-Otero, Carmen Fernández-Morante & Beatriz Cebreiro

Departamento de Electrónica e Computación, Universidade de Santiago de Compostela, 15782, Santiago de Compostela, Spain

Alejandro Catala & Maria Taboada

Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, 15782, Santiago de Compostela, Spain

Alejandro Catala & Senén Barro


Contributions

All authors contributed significantly to this work at all stages: conceptualization, discussion, definition of the methodology, analysis, and writing, reviewing, and editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Alejandro Catala .

Ethics declarations

Ethics approval and consent to participate

This research was carried out in accordance with ethics recommendations. Because it is a systematic literature review, approval by the University ethics committee does not apply.

Competing interests

The authors declare that they have no competing interests and no other relevant financial or non-financial interests to disclose.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Additional listing.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Casal-Otero, L., Catala, A., Fernández-Morante, C. et al. AI literacy in K-12: a systematic literature review. IJ STEM Ed 10 , 29 (2023). https://doi.org/10.1186/s40594-023-00418-7


Received : 06 September 2022

Accepted : 30 March 2023

Published : 19 April 2023

DOI : https://doi.org/10.1186/s40594-023-00418-7


  • Secondary education
  • Teaching/learning strategies
  • Twenty-first century skills
  • Cultural and social implications




  • Open access
  • Published: 16 September 2024

Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis

  • Hye Kyung Jin,
  • Ha Eun Lee &
  • EunYoung Kim

BMC Medical Education, volume 24, Article number: 1013 (2024)

Background

ChatGPT, a recently developed artificial intelligence (AI) chatbot, has demonstrated improved performance in examinations in the medical field. However, an overall evaluation of the potential of the ChatGPT models (ChatGPT-3.5 and GPT-4) across a variety of national health licensing examinations has thus far been lacking. This study aimed to provide a comprehensive assessment of the ChatGPT models' performance in national licensing examinations for medicine, pharmacy, dentistry, and nursing through a meta-analysis.

Methods

Following the PRISMA protocol, full-text articles from MEDLINE/PubMed, EMBASE, ERIC, Cochrane Library, Web of Science, and key journals were reviewed from the time of ChatGPT's introduction to February 27, 2024. Studies were eligible if they evaluated the performance of a ChatGPT model (ChatGPT-3.5 or GPT-4); related to national licensing examinations in the fields of medicine, pharmacy, dentistry, or nursing; involved multiple-choice questions; and provided data that enabled the calculation of effect size. Two reviewers independently completed data extraction, coding, and quality assessment. The JBI Critical Appraisal Tools were used to assess the quality of the selected articles. Overall effect size and 95% confidence intervals [CIs] were calculated using a random-effects model.

Results

A total of 23 studies were included in this review, evaluating accuracy on four types of national licensing examinations. The selected articles were in the fields of medicine ( n  = 17), pharmacy ( n  = 3), nursing ( n  = 2), and dentistry ( n  = 1). They reported varying accuracy levels, ranging from 36% to 77% for ChatGPT-3.5 and from 64.4% to 100% for GPT-4. The overall effect size for the percentage of accuracy was 70.1% (95% CI, 65-74.8%), which was statistically significant ( p  < 0.001). Subgroup analyses revealed that GPT-4 demonstrated significantly higher accuracy in providing correct responses than its earlier version, ChatGPT-3.5. Additionally, in the context of health licensing examinations, the ChatGPT models exhibited greater proficiency in the following order: pharmacy, medicine, dentistry, and nursing. However, the lack of a broader set of questions, including open-ended and scenario-based questions, and significant heterogeneity were limitations of this meta-analysis.

Conclusions

This study sheds light on the accuracy of ChatGPT models in four national health licensing examinations across various countries and provides a practical basis and theoretical support for future research. Further studies are needed to explore their utilization in medical and health education by including a broader and more diverse range of questions, along with more advanced versions of AI chatbots.


Over the past decade, artificial intelligence (AI) technology has experienced rapid evolution, leading to significant advancements across various fields [ 1 , 2 ]. One of the most recent and notable advancements is ChatGPT by OpenAI (San Francisco, CA), a natural language processing program designed to generate human-like language [ 3 ]. Since its launch, this innovative technology has demonstrated applicability in a variety of domains, including healthcare, education, research, business, and industry [ 4 , 5 , 6 ]. Moreover, ChatGPT is a robust, evolving AI chatbot with considerable potential as a support resource for healthcare professionals, educators, and learners. For instance, in the realm of education, cutting-edge ChatGPT versions can aid students through personalized learning experiences and tutoring [ 7 ]. In the healthcare domain, it can assist medical professionals in diagnoses, treatment plans, and patient education by integrating medical knowledge with interactive dialogue [ 1 , 8 , 9 , 10 ].

With ongoing advancements in this novel technology, the accuracy of ChatGPT is expected to improve, thereby expanding its applicability, specifically in healthcare education. A study by Gilson et al. emphasized that such technology could achieve equivalence with third-year US medical students, highlighting its potential to provide logical explanations and informative feedback [ 7 ]. Additionally, in the field of medical research, ChatGPT's efficacy has been demonstrated in the United States Medical Licensing Examination (USMLE), in which it achieved accuracy rates of 40–60% [ 7 , 11 ]. The latest version, GPT-4, outperforms its predecessor, ChatGPT-3.5, in reliability and accuracy, delivering expert-level responses [ 12 ]. Furthermore, Yang et al. found that GPT-4's ability to respond to USMLE questions involving images resulted in a 90.7% accuracy rate across the entire USMLE examination, exceeding the passing threshold of approximately 60% accuracy [ 13 ]. Remarkably, GPT-4 also exhibited the capacity to address intricate interpersonal, ethical, and professional requirements within medical practice [ 14 ].

However, despite its capacity to significantly improve the efficiency of education, efforts to adopt it in healthcare professional education remain limited [ 15 ]. Specifically, this AI chatbot can produce inaccurate outputs [ 16 ], and the reported response accuracy rates vary among different medical examinations and medical fields [ 17 , 18 ]. Other studies have also shown that calculations are identified as a domain where large language models (LLMs) tend to exhibit comparatively lower precision [ 19 ] and fail to achieve the required accuracy level, as demonstrated in the National Medical Licensing Examination (NMLE), National Pharmacist Licensing Examination (NPLE), and National Nurse Licensing Examination (NNLE) in non-English language regions [ 20 , 21 ]. Potential reasons for the performance differences have been proposed, including variations in curricula and examination content [ 22 , 23 ]. Recent studies have shown that while ChatGPT’s accuracy has improved over time, its scores are still lower than those of medical students [ 24 , 25 ]. Consequently, existing studies are inconsistent and do not yield definitive conclusions.

Given these considerations, a meta-analysis is needed to offer a more comprehensive summary of the evidence regarding the performance of OpenAI’s ChatGPT series in four types of national healthcare-related licensing examinations conducted in a range of countries. Therefore, this study aimed to conduct a meta-analysis of studies reporting the performance of both versions of ChatGPT (ChatGPT-3.5 and GPT-4) across different healthcare-related national licensing examinations without restriction to specific countries. The specific research questions were as follows: (1) What is the overall effect size of ChatGPT models in the national licensing examinations for medicine, pharmacy, dentistry, and nursing? (2) Which ChatGPT model has the greatest influence on the effect sizes measured in national licensing examinations? (3) Which medical field examinations (e.g., medicine, pharmacy, dentistry, and nursing) have the greatest influence on the effect sizes measured in ChatGPT model research?

The reporting of this review and meta-analysis follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement [ 26 ]. See the completed PRISMA 2020 checklist for abstracts (Supplementary Table 1) and the PRISMA 2020 checklist (Supplementary Table 2).

Data sources

The protocol for the review was not registered. Relevant articles were identified by searching the following databases: MEDLINE/PubMed, EMBASE, Cochrane Library, ERIC, and Web of Science. To identify further relevant studies, hand searches of key issues of journals related to healthcare education, as well as the reference lists of the retrieved studies [ 11 , 13 , 14 , 21 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 ] and previously conducted systematic reviews [ 17 , 46 , 47 ], were carried out. The retrieved records were imported into an EndNote™ library. The search was restricted to articles published in English between November 30, 2022, the release date of ChatGPT, and February 27, 2024.

Search strategy

This study employed a structured search strategy that involved the use of specific search terms associated with ChatGPT (e.g., “Artificial intelligence” [MeSH Terms], “ChatGPT” [Title/Abstract], “GPT-4” [Title/Abstract], “Chatbot” [Title/Abstract], “Natural language processing” [Title/Abstract], “Large language models” [Title/Abstract]). These terms were combined with terms related to the licensing examination (e.g., “educational measurement” [MeSH Terms], “licensure, medical” [MeSH Terms], “licensure, pharmacy” [MeSH Terms], “licensure, dental” [MeSH Terms], “licensure, nursing” [MeSH Terms], “medical licensure exam*” [Title/Abstract]). Variations of this search strategy were applied across various sources. Controlled vocabulary such as MeSH (Medical Subject Headings) for MEDLINE (PubMed) and thesaurus terms from ERIC were utilized alongside advanced search techniques such as truncation for broader retrieval. The first author (HK) conducted the search, and the full search strategies are shown in Supplementary Table 3.
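For concreteness, the combined terms above can be assembled into a single PubMed query of roughly the following shape. This is an illustrative reconstruction from the terms listed; the authors' exact strings are given in Supplementary Table 3:

```text
("Artificial intelligence"[MeSH Terms] OR "ChatGPT"[Title/Abstract] OR
 "GPT-4"[Title/Abstract] OR "Chatbot"[Title/Abstract] OR
 "Natural language processing"[Title/Abstract] OR
 "Large language models"[Title/Abstract])
AND
("educational measurement"[MeSH Terms] OR "licensure, medical"[MeSH Terms] OR
 "licensure, pharmacy"[MeSH Terms] OR "licensure, dental"[MeSH Terms] OR
 "licensure, nursing"[MeSH Terms] OR "medical licensure exam*"[Title/Abstract])
```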

Inclusion and exclusion criteria

Two reviewers (HK and HE) independently reviewed the eligibility of studies for inclusion and exclusion. Disagreements between the reviewers were resolved by consensus or discussion with a third reviewer (EY). A study was considered for inclusion in the meta-analysis only if it met all of the following four criteria: (1) focused on evaluating the performance of a ChatGPT model (ChatGPT-3.5 or GPT-4); (2) related to national licensing examinations in the fields of medicine, pharmacy, dentistry, or nursing; (3) involved multiple-choice questions; and (4) provided data that enabled the calculation of effect size. Studies were excluded if they (1) were review articles or meta-analyses; (2) used any AI platform other than ChatGPT-3.5 or GPT-4; (3) were unrelated to national licensing exams; (4) lacked sufficient data for effect size calculation; (5) did not use multiple-choice questions (e.g., used open-ended questions); (6) were unavailable in full text; and (7) were not in the English language.

The outcome of this study was the accuracy of ChatGPT (GPT-3.5 and GPT-4) on the national licensing examinations for medicine, pharmacy, dentistry, and nursing.

Risk of bias in studies

The Joanna Briggs Institute (JBI) critical appraisal tool for analytical cross-sectional studies was used to assess the risk of bias [ 48 ]. This checklist consists of eight questions, and each item is scored as 1 (yes) or 0 (no, unclear, or not applicable), giving a total score of 8 if all questions are answered positively. Potential disagreements between the two reviewers were resolved by discussion. The risk of bias of each study was judged as follows: high risk if two or more domains were considered at high risk; moderate risk if one domain was considered at high risk or if two or more domains were considered unclear; and low risk if no domains were considered at high risk.
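The decision rule above can be stated compactly. The following sketch (not the authors' code; the function name and rating labels are ours) applies it to a list of per-domain JBI ratings:

```python
# Illustrative sketch of the stated risk-of-bias decision rule applied to
# per-domain JBI ratings ('low', 'high', or 'unclear' for each of 8 domains).
def classify_risk(domains):
    """Return the overall risk-of-bias judgment for one study."""
    n_high = domains.count("high")
    n_unclear = domains.count("unclear")
    if n_high >= 2:
        return "high risk"
    if n_high == 1 or n_unclear >= 2:
        return "moderate risk"
    return "low risk"

print(classify_risk(["low"] * 8))                     # → low risk
print(classify_risk(["high"] + ["low"] * 7))          # → moderate risk
print(classify_risk(["high", "high"] + ["low"] * 6))  # → high risk
```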

Data extraction and coding

Data extraction was conducted independently by two reviewers (HK and HE) using a dual-review process, and any discrepancies in the data were resolved by consensus or adjudication by a third reviewer (EY). The data extraction form was designed to collect relevant information from studies, including descriptive details, such as the first author, publication year, journal, and country. Additionally, the specific characteristics of the studies included the ChatGPT models used, field of study, number of questions, accuracy rates, and key results observed.

Each of the 23 included studies was coded by the first author (HK), who subsequently trained other coders on the procedures, allowing them to execute the coding task independently.

Independence violations occur when a single study yields multiple effect sizes [ 49 ]. This meta-analysis included 144 effect sizes from 23 studies and used multivariate data. Such data typically emerge when a study produces several effect sizes. To maintain the assumption of independence and to avoid information loss, this study adopted a “shifting unit of analysis” approach [ 50 ]. Specifically, in this meta-analysis, the total effect size was calculated using the study as the unit of analysis; however, for the subgroup analysis, individual effect sizes were used as the units to estimate the effect sizes.

Statistical analysis

A random-effects model was used to calculate the overall accuracy rate and for all testing of moderators, owing to the high level of heterogeneity among the included studies. All data analyses were conducted using Comprehensive Meta-Analysis version 4 (Biostat, Englewood, NJ, USA). The number of correctly answered questions and the total number of questions were used to calculate the effect size in each study as a proportion. A forest plot was employed to visually illustrate each study's contribution to the meta-analysis. Heterogeneity among the studies was measured using the Q statistic, and the I² statistic quantified the degree of heterogeneity. A sensitivity analysis was conducted to assess the robustness of the findings by excluding studies judged to be at high risk of bias. Additionally, subgroup analyses were performed to explore potential sources of variance with the moderator variables (ChatGPT models and fields of health licensing examinations). Meta-regression was not performed because no appropriate continuous variables were available to assess trends. Funnel plots, Egger's regression test, the Duval and Tweedie trim-and-fill method, and the classic fail-safe N test were used to evaluate publication bias. All analyses were performed at 95% CIs.
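The pooling procedure described above (accuracy proportions combined under a random-effects model) can be sketched as follows. This is a minimal illustration using a DerSimonian-Laird estimator on the logit scale with made-up study counts; it is not the review's actual CMA analysis:

```python
# Minimal random-effects pooling of accuracy proportions (hypothetical data):
# logit-transform each study's proportion, estimate between-study variance
# with DerSimonian-Laird, and back-transform the pooled logit to a proportion.
import math

def pool_proportions(correct, total, z=1.96):
    y, v = [], []
    for k, n in zip(correct, total):
        p = k / n
        y.append(math.log(p / (1 - p)))       # logit effect size
        v.append(1 / k + 1 / (n - k))         # variance of the logit
    w = [1 / vi for vi in v]                  # fixed-effect weights
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q and I^2 heterogeneity statistics
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird between-study variance tau^2
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # random-effects weights, pooled estimate, and 95% CI on the logit scale
    wstar = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(wstar, y)) / sum(wstar)
    se = math.sqrt(1 / sum(wstar))
    inv = lambda x: 1 / (1 + math.exp(-x))    # back-transform to a proportion
    return inv(mu), inv(mu - z * se), inv(mu + z * se), q, i2

est, lo, hi, q, i2 = pool_proportions([60, 145, 90], [100, 200, 150])
print(f"pooled accuracy {est:.1%} (95% CI {lo:.1%}-{hi:.1%}), I^2={i2:.1%}")
```

CMA implements this family of estimators internally; the sketch only shows the arithmetic that turns per-study correct/total counts into a pooled percentage with a confidence interval.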

Ethical considerations

This meta-analysis was conducted using previously published studies. Therefore, evaluation by the institutional review board was not considered necessary.

Study selection

The literature search yielded 433 articles after the initial search. After removing duplicates and performing title/abstract screening, a total of 31 articles were included for full-text review. Of these, one was excluded as it was reported in a language other than English, three for non-multiple-choice questions, and eight for insufficient data for calculating effect size. After assessing the articles for eligibility, 23 articles were included in this review (Fig.  1 ).

Fig. 1 PRISMA flowchart of included studies

Risk of bias assessment

The findings of the risk of bias assessment using the Joanna Briggs Institute (JBI) Critical Appraisal Checklist for analytical cross-sectional studies are presented in Table  1 . Among the studies, two were deemed to be at high risk [ 33 , 34 ] and 18 at low risk [ 11 , 13 , 21 , 27 , 29 , 30 , 31 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 ], with the most commonly observed weaknesses relating to the identification of confounding factors and the strategies to deal with them. The remaining three studies were considered to be at moderate risk [ 14 , 28 , 32 ].

Study characteristics

The characteristics of the 23 included studies [ 11 , 13 , 14 , 21 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 , 45 ] are shown in Table  2 . These studies were all published between 2022 and 2024. Five of the included studies were conducted in the United States [ 11 , 13 , 14 , 28 , 37 ], one in the UK [ 36 ], six in Japan [ 33 , 35 , 39 , 40 , 41 , 45 ], four in China [ 29 , 42 , 43 , 44 ], two in Taiwan [ 21 , 32 ], and one each in Australia [ 34 ], Belgium [ 38 ], Peru [ 30 ], Switzerland [ 31 ], and Saudi Arabia [ 27 ]. In terms of field, 17 articles focused on medicine [ 11 , 13 , 14 , 27 , 29 , 30 , 33 , 34 , 36 , 37 , 38 , 40 , 41 , 42 , 43 , 44 , 45 ], three on pharmacy [ 21 , 28 , 35 ], two on nursing [ 32 , 39 ], and one on dentistry [ 31 ]. The range of questions per examination varied from 21 questions related to communication skills, ethics, empathy, and professionalism in the USMLE [ 14 ] to 1510 questions across four separate exams in 2022 and 2023 in the Registered Nurse License Exam [ 32 ].

The performance of ChatGPT-3.5 across these studies varied considerably, ranging from 38% in the Japanese National Medical Licensure Examination [ 33 ] to 73% in the Peruvian National Licensing Medical Examination [ 30 ]. GPT-4's performance also varied but was generally superior, ranging from 64.4% in the Swiss Federal Licensing Examination in Dental Medicine [ 31 ] to 100% on USMLE questions assessing soft skills [ 14 ]. This consistent improvement across different exams and regions indicates GPT-4's enhanced capability to handle diverse medical licensing examinations with higher accuracy than its predecessor.

Publication bias

A funnel plot of the included studies revealed some asymmetry (Fig.  2 ). Egger's regression test returned a significant p-value ( p  = 0.028), suggesting the presence of publication bias. Conversely, the trim-and-fill analysis indicated that no studies needed to be trimmed or imputed, leaving the overall effect size unchanged. Additionally, Rosenthal's fail-safe number, calculated in CMA as 310, exceeds the conventional threshold of 5n + 10 [ 51 ], where n is the number of studies included in the meta-analysis, suggesting the absence of publication bias. Taken together, publication bias appears unlikely in this study.
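The fail-safe criterion can be checked directly (a trivial sketch; the helper name is ours):

```python
# Rosenthal's benchmark: a fail-safe N is considered reassuring when it
# exceeds 5n + 10, where n is the number of included studies.
def failsafe_ok(fail_safe_n, n_studies):
    return fail_safe_n > 5 * n_studies + 10

print(failsafe_ok(310, 23))  # 310 > 5*23 + 10 = 125 → True
```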

Fig. 2 Funnel plot for studies included in the meta-analysis

Overall analysis

A random-effects model was selected based on the assumption that effect sizes might vary among studies owing to variations in the examinations. The effect sizes in the primary studies were indeed heterogeneous (Q = 1201.303, df = 22, p  < 0.001), with an I² of 98.2%. The overall effect size for the percentage of performance of the ChatGPT models was 70.1% (95% CI, 65.0-74.8%). The forest plot of individual and overall effect sizes is shown in Fig.  3 .
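The reported I² follows directly from Q and its degrees of freedom via I² = max(0, (Q - df)/Q); a quick check with the values above:

```python
# I^2 expresses the share of total variation in effect sizes that is due to
# between-study heterogeneity rather than sampling error.
q, df = 1201.303, 22
i2 = max(0.0, (q - df) / q)
print(f"I^2 = {i2:.1%}")  # → I^2 = 98.2%
```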

Fig. 3 Forest plot of effect sizes using random-effects model

Subgroup analyses

Under a random-effects model, subgroup analyses were conducted with the two categorical variables: (1) the type of ChatGPT model (ChatGPT-3.5 vs. GPT-4); and (2) the field of health licensing examinations (medicine, pharmacy, dentistry, and nursing), as detailed in Table  3 . Other potential moderating factors were limited, as they were either too infrequently reported or inadequately described to facilitate a comprehensive subgroup analysis.

ChatGPT models

The effect size for the percentage of performance was 58.9% (95% CI, 56.2-61.6%) for ChatGPT-3.5 and 80.4% (95% CI, 78.6-82.0%) for GPT-4, a significant difference between the two model versions (Q = 177.027, p <  0.001).

Fields of health licensing examinations

The effect size for the percentage of performance was 71.5% (95% CI, 66.3-76.2%) in pharmacy, 69.7% (95% CI, 65.9-73.2%) in medicine, 63.2% (95% CI, 54.6-71.1%) in dentistry, and 61.8% (95% CI, 58.7-64.9%) in nursing, indicating a significant difference between the fields (Q = 15.334, p  = 0.002).

Sensitivity analysis

The sensitivity analysis demonstrated that removing high-risk studies did not significantly alter the overall accuracy, which remained at 71.3% (95% CI, 66.3-75.8%), indicating that the results were robust ( p  < 0.001), with an I² of 98.1% (Fig.  4 ).

Fig. 4 Sensitivity analysis omitting high-risk studies

Discussion

Overall, though exhibiting varying levels of risk of bias, the majority of the included studies were considered to be of high quality. The largest threats were related to the lack of identification of confounding factors and the strategies to address them. To the best of our knowledge, this is the first meta-analysis focusing on the performance of ChatGPT technology in four types of health licensing examinations, specifically comparing GPT-3.5 and GPT-4. Most previous studies have either focused solely on academic testing or knowledge assessment rather than national licensing exams [ 25 , 52 , 53 , 54 ] or have combined these contexts without distinction [ 17 , 46 , 47 ]. Additionally, they have primarily reported the accuracy rate of a single version of ChatGPT instead of comparing different versions [ 17 , 29 , 32 , 37 , 39 , 44 ].

The findings clearly indicate that GPT-4’s accuracy rates were significantly higher than those of ChatGPT-3.5 across various national medical licensing exams conducted in different countries. Furthermore, these results demonstrate that GPT-4 not only surpasses GPT-3.5 in overall accuracy but also shows more reliable performance, making it a more advanced tool for such assessments. A previous study suggested that achieving an accuracy rate exceeding 95% could make ChatGPT a reliable education tool [ 55 ]. Although it remains uncertain whether future models will attain this level of proficiency, the rapid development of these LLM technologies, owing to user feedback and deep learning, will lead to their potential utilization in learning and education in the medical field.

Regarding the types of ChatGPT models, this study provides insights into the potential of GPT-4, the latest version, particularly within the scope of its training dataset. Our finding is similar to that of a previous study on USMLE questions, which observed accuracy rates of 84.7% for GPT-4 and 56.9% for its earlier version [ 56 ]. Another study reported that GPT-4 had an accuracy of 86.1% and ChatGPT-3.5 an accuracy of 77.2% on the Peruvian National Licensing Medical Examination [ 30 ]. All of this suggests that GPT-4’s superior performance is likely attributable to its improved reasoning capabilities and critical thinking [ 57 ].

The current study revealed that, in the context of health licensing examinations, ChatGPT models exhibited the highest proficiency in pharmacy, followed by medicine, dentistry, and nursing. Our results were inconsistent with previous research from China indicating that ChatGPT exhibited greater proficiency on the NNLE, followed by the NMLE and NPLE, a discrepancy likely associated with differences in GPT model versions as well as in the difficulty and complexity of the test questions [ 20 ]. Moreover, variations in language or culture can lead to performance disparities. Similarly, a study by Seghier highlighted that these innovative models encounter linguistic challenges and emphasized that their performance on non-English responses was significantly lower [ 58 ]. Consequently, the present study recommends incorporating additional training data in languages other than English to improve performance, enabling ChatGPT to serve as an effective AI-assisted educational tool in broader educational contexts, for both students and professionals. In addition, while this meta-analysis includes studies from various educational fields, it is important to address the scarcity of research in dentistry, for which only one study was available. Further research in this area is necessary to ensure that the findings are robust and generalizable across different educational settings.

In addition to the suggestions discussed above, it must be noted that the rapid evolution of medical fields necessitates high-quality, up-to-date data to maintain performance. For instance, considering the annual introduction of new and updated medications and medical guidelines, ChatGPT might not always have access to the most recent information because its training data extends only to 2021. Furthermore, previous studies have reported that ChatGPT might provide incorrect information or plausible but wrong answers, commonly referred to as “hallucinations” [ 16 , 32 , 38 , 43 ]. The models used in this study, ChatGPT-3.5 and GPT-4, represent earlier and more recent versions, respectively, and as AI technology rapidly evolves, these versions may soon become obsolete and appear limited. Previous research has suggested that many limitations identified in ChatGPT-3.5, including hallucinations, still apply to GPT-4 [ 57 ]. Therefore, it is imperative to recognize both the limitations and the possibilities of these generative AI chatbots, and to ensure their continuous refinement and training, in order to achieve a more reliable resource for practical application.

Based on the results of this study, certain limitations that could affect the interpretation of our findings should be acknowledged. First, while this systematic review employed rigorous methods and has been reported in accordance with PRISMA, it is acknowledged that the protocol was not registered. Second, the variability in the inclusion criteria and the design of the studies across different countries poses a significant limitation. The structure, difficulty level, and content of health licensing examinations vary by country, which might influence the performance outcomes of ChatGPT models; this variability may therefore make it challenging to accurately assess the effectiveness of the AI models. Third, the questions included in the studies were multiple choice, covering only a limited scope of knowledge and requiring the selection of the best answer. This approach deviates significantly from real-world medical education settings; hence, specialized training data are needed for other types of questions, including open-ended and scenario-based formats. Fourth, it must be emphasized that although the ChatGPT versions demonstrated potential in answering licensing examination questions, they should not be relied upon exclusively for test preparation. As these LLMs often deliver false or unreliable information [ 59 ], it is essential to handle their output with caution and verify it against reliable educational resources, especially for high-stakes exams such as health licensing exams. Fifth, this study did not include any grey literature; incorporating such sources would help provide a more balanced and comprehensive view of the effectiveness of ChatGPT. Finally, this study concentrated solely on ChatGPT models, but it is important to consider that other LLMs, such as Google’s Bard, Microsoft’s Bing Chat, and Meta’s LLaMA, have also made significant advancements and are undergoing continual improvement.
Therefore, future research should explore the application of LLMs beyond ChatGPT to offer an up-to-date perspective on their efficacy in medical and health education.

This study evaluated the performance of both ChatGPT versions across four types of health licensing examinations using a meta-analysis. The findings indicated that GPT-4 significantly outperformed its predecessor, ChatGPT-3.5, in terms of accuracy in providing correct responses. Additionally, the ChatGPT models showed higher proficiency for pharmacy, followed by medicine, dentistry, and nursing. However, future research needs to incorporate larger and more varied sets of questions, as well as advanced generations of AI chatbots, to achieve a more in-depth understanding in health educational and clinical settings.

Data availability

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Abbreviations

China National Medical Licensing Examination

English version of China National Medical Licensing Examination

Japanese National Examination for Pharmacists

Large Language Models

National Medical Licensing Examination

National Nurse Licensing Examination

National Pharmacist Licensing Examination

Preferred Reporting Items for Systematic Reviews and Meta-Analysis

United States Medical Licensing Examination

Holzinger A, Keiblinger K, Holub P, Zatloukal K, Müller H. AI for life: trends in artificial intelligence for biotechnology. N Biotechnol. 2023;74:16–24. https://doi.org/10.1016/j.nbt.2023.02.001 .


Montejo-Ráez A, Jiménez-Zafra SM. Current approaches and applications in natural language processing. Appl Sci. 2022;12(10):4859. https://doi.org/10.3390/app12104859 .

OpenAI. Introducing ChatGPT. San Francisco. https://openai.com/blog/chatgpt . Accessed 10 2024.

Fui-Hoon Nah F, Zheng R, Cai J, Siau K, Chen L. Generative AI and ChatGPT: applications, challenges, and AI-human collaboration. J Inf Technol Case Appl Res. 2023;25(3):277–304. https://doi.org/10.1080/15228053.2023.2233814 .

Ray PP. ChatGPT: a comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet Things Cyber Phys Syst. 2023;3:121–54. https://doi.org/10.1016/j.iotcps.2023.04.003 .

Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med. 2023;388:1233–9. https://doi.org/10.1056/NEJMsr2214184 .

Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, et al. How does ChatGPT perform on the United States Medical Licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312. https://doi.org/10.2196/45312 .

Nakhleh A, Spitzer S, Shehadeh N. ChatGPT’s response to the diabetes knowledge questionnaire: implications for diabetes education. Diabetes Technol Ther. 2023;25(8):571–3. https://doi.org/10.1089/dia.2023.0134 .

Webb JJ. Proof of concept: using ChatGPT to teach emergency physicians how to break bad news. Cureus. 2023;15(5):e38755. https://doi.org/10.7759/cureus.38755 .

Huang Y, Gomaa A, Semrau S, Haderlein M, Lettmaier S, Weissmann T, et al. Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for Ai-assisted medical education and decision making in radiation oncology. Front Oncol. 2023;13:1265024. https://doi.org/10.3389/fonc.2023.1265024 .

Kung TH, Cheatham M, Medenilla A, Sillos C, de Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2:e0000198. https://doi.org/10.1371/journal.pdig.0000198 .

OpenAI. GPT-4 is OpenAI’s most advanced system, producing safer and more useful responses. https://openai.com/product/gpt-4 . Accessed 10 Jan 2024.

Yang Z, Yao Z, Tasmin M, Vashisht P, Jang WS, Ouyang F, et al. Performance of multimodal GPT-4V on USMLE with image: potential for imaging diagnostic support with explanations. medRxiv 2023.10.26.23297629. https://doi.org/10.1101/2023.10.26.23297629

Brin D, Sorin V, Vaid A, Soroush A, Glicksberg BS, Charney AW, et al. Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments. Sci Rep. 2023;13:16492. https://doi.org/10.1038/s41598-023-43436-9 .

O’Connor S, Yan Y, Thilo FJS, Felzmann H, Dowding D, Lee JJ. Artificial intelligence in nursing and midwifery: a systematic review. J Clin Nurs. 2023;32(13–14):2951–68. https://doi.org/10.1111/jocn.16478 .

Azamfirei R, Kudchadkar SR, Fackler J. Large language models and the perils of their hallucinations. Crit Care. 2023;27(1):120. https://doi.org/10.1186/s13054-023-04393-x .

Levin G, Horesh N, Brezinov Y, Meyer R. Performance of ChatGPT in medical examinations: a systematic review and a meta-analysis. BJOG. 2024;131:378–80. https://doi.org/10.1111/1471-0528.17641 .

Alfertshofer M, Hoch CC, Funk PF, Hollmann K, Wollenberg B, Knoedler S, et al. Sailing the seven seas: a multinational comparison of ChatGPT’s performance on medical licensing examinations. Ann Biomed Eng. 2024;52(6):1542–5. https://doi.org/10.1007/s10439-023-03338-3 .

Shakarian P, Koyyalamudi A, Ngu N, Mareedu L. An independent evaluation of ChatGPT on mathematical word problems (MWP). https://doi.org/10.48550/arXiv.2302.13814

Zong H, Li J, Wu E, Wu R, Lu J, Shen B. Performance of ChatGPT on Chinese national medical licensing examinations: a five-year examination evaluation study for physicians, pharmacists and nurses. BMC Med Educ. 2024;24(1):143. https://doi.org/10.1186/s12909-024-05125-7 .

Wang YM, Shen HW, Chen TJ. Performance of ChatGPT on the pharmacist licensing examination in Taiwan. J Chin Med Assoc. 2023;86(7):653–8. https://doi.org/10.1097/JCMA.0000000000000942 .

Price T, Lynn N, Coombes L, Roberts M, Gale T, de Bere SR, et al. The international landscape of medical licensing examinations: a typology derived from a systematic review. Int J Health Policy Manag. 2018;7(9):782–90. https://doi.org/10.15171/ijhpm.2018.32 .

Zawiślak D, Kupis R, Perera I, Cebula G. A comparison of curricula at various medical schools across the world. Folia Med Cracov. 2023;63(1):121–34. https://doi.org/10.24425/fmc.2023.145435 .

Rosoł M, Gąsior JS, Łaba J, Korzeniewski K, Młyńczak M. Evaluation of the performance of GPT-3.5 and GPT-4 on the Polish Medical final examination. Sci Rep. 2023;13(1):20512. https://doi.org/10.1038/s41598-023-46995-z .

Huh S. Are ChatGPT’s knowledge and interpretation ability comparable to those of medical students in Korea for taking a parasitology examination? A descriptive study. J Educ Eval Health Prof. 2023;20:1. https://doi.org/10.3352/jeehp.2023.20.1 .

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. https://doi.org/10.1136/bmj.n71 .

Aljindan FK, Al Qurashi AA, Albalawi IAS, Alanazi AMM, Aljuhani HAM, Falah Almutairi F, et al. ChatGPT conquers the Saudi medical licensing exam: exploring the accuracy of artificial intelligence in medical knowledge assessment and implications for modern medical education. Cureus. 2023;15(9):e45043. https://doi.org/10.7759/cureus.45043 .

Angel M, Patel A, Alachkar A, Baldi B. Clinical knowledge and reasoning abilities of AI large language models in pharmacy: a comparative study on the NAPLEX exam. bioRxiv 2023.06.07.544055. https://doi.org/10.1101/2023.06.07.544055

Fang C, Wu Y, Fu W, Ling J, Wang Y, Liu X, et al. How does ChatGPT-4 preform on non-english national medical licensing examination? An evaluation in Chinese language. PLOS Digit Health. 2023;2(12):e0000397. https://doi.org/10.1371/journal.pdig.0000397 .

Flores-Cohaila JA, García-Vicente A, Vizcarra-Jiménez SF, De la Cruz-Galán JP, Gutiérrez-Arratia JD, Quiroga Torres BG, et al. Performance of ChatGPT on the Peruvian national licensing medical examination: cross-sectional study. JMIR Med Educ. 2023;9:e48039. https://doi.org/10.2196/48039 .

Fuchs A, Trachsel T, Weiger R, Eggmann F. ChatGPT’s performance in dentistry and allergy-immunology assessments: a comparative study. Swiss Dent J. 2023;134(5). Epub ahead of print.

Huang H. Performance of ChatGPT on registered nurse license exam in Taiwan: a descriptive study. Healthcare (Basel). 2023;11(21):2855. https://doi.org/10.3390/healthcare11212855 .

Kataoka Y, Yamamoto-Kataoka S, So R, Furukawa TA. Beyond the pass mark: accuracy of ChatGPT and Bing in the national medical licensure examination in Japan. JMA J. 2023;6(4):536–8. https://doi.org/10.31662/jmaj.2023-0043 .

Kleinig O, Gao C, Bacchi S. This too shall pass: the performance of ChatGPT-3.5, ChatGPT-4 and New Bing in an Australian medical licensing examination. Med J Aust. 2023;219(5):237. https://doi.org/10.5694/mja2.52061 .

Kunitsu Y. The potential of GPT-4 as a support tool for pharmacists: analytical study using the Japanese national examination for pharmacists. JMIR Med Educ. 2023;9:e48452. https://doi.org/10.2196/48452 .

Lai UH, Wu KS, Hsu TY, Kan JKC. Evaluating the performance of ChatGPT-4 on the United Kingdom medical licensing assessment. Front Med (Lausanne). 2023;10:1240915. https://doi.org/10.3389/fmed.2023.1240915 .

Mihalache A, Huang RS, Popovic MM, Muni RH. ChatGPT-4: an assessment of an upgraded artificial intelligence chatbot in the United States medical licensing examination. Med Teach. 2024;46(3):366–72. https://doi.org/10.1080/0142159X.2023.2249588 .

Morreel S, Verhoeven V, Mathysen D. Microsoft Bing outperforms five other generative artificial intelligence chatbots in the Antwerp University multiple choice medical license exam. PLOS Digit Health. 2024;3(2):e0000349. https://doi.org/10.1371/journal.pdig.0000349 .

Taira K, Itaya T, Hanada A. Performance of the large language model ChatGPT on the National Nurse examinations in Japan: evaluation study. JMIR Nurs. 2023;6:e47305. https://doi.org/10.2196/47305 .

Takagi S, Watari T, Erabi A, Sakaguchi K. Performance of GPT-3.5 and GPT-4 on the Japanese medical licensing examination: comparison study. JMIR Med Educ. 2023;9:e48002. https://doi.org/10.2196/48002 .

Tanaka Y, Nakata T, Aiga K, Etani T, Muramatsu R, Katagiri S, et al. Performance of generative pretrained transformer on the national medical licensing examination in Japan. PLOS Digit Health. 2024;3(1):e0000433. https://doi.org/10.1371/journal.pdig.0000433 .

Tong W, Guan Y, Chen J, Huang X, Zhong Y, Zhang C, et al. Artificial intelligence in global health equity: an evaluation and discussion on the application of ChatGPT, in the Chinese national medical licensing examination. Front Med (Lausanne). 2023;10:1237432. https://doi.org/10.3389/fmed.2023.1237432 .

Wang H, Wu W, Dou Z, He L, Yang L. Performance and exploration of ChatGPT in medical examination, records and education in Chinese: pave the way for medical AI. Int J Med Inf. 2023;177:105173. https://doi.org/10.1016/j.ijmedinf.2023.105173 .

Wang X, Gong Z, Wang G, Jia J, Xu Y, Zhao J, et al. ChatGPT performs on the Chinese national medical licensing examination. J Med Syst. 2023;47(1):86. https://doi.org/10.1007/s10916-023-01961-0 .

Yanagita Y, Yokokawa D, Uchida S, Tawara J, Ikusaka M. Accuracy of ChatGPT on medical questions in the national medical licensing examination in Japan: evaluation study. JMIR Form Res. 2023;7:e48023. https://doi.org/10.2196/48023 .

Sumbal A, Sumbal R, Amir A. Can ChatGPT-3.5 pass a medical exam? A systematic review of ChatGPT’s performance in academic testing. J Med Educ Curric Dev. 2024;11:1–12. https://doi.org/10.1177/23821205241238641 .

Lucas HC, Upperman JS, Robinson JR. A systematic review of large language models and their implications in medical education. Med Educ. 2024;1–10. https://doi.org/10.1111/medu.15402 .

Moola S, Munn Z, Tufanaru C, Aromataris E, Sears K, Sfetcu R, et al. Chapter 7: systematic reviews of etiology and risk. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis. JBI; 2020. https://jbi.global/critical-appraisal-tools .

Becker BJ. Multivariate meta-analysis. In: Tinsley HEA, Brown SD, editors. Handbook of applied multivariate statistics and mathematical modeling. San Diego: Academic; 2000. pp. 499–525.


Cooper H. Synthesizing research: a guide for literature reviews. 3rd ed. Thousand Oaks, CA: Sage; 1998.


Rosenthal R. The file drawer problem and tolerance for null results. Psychol Bull. 1979;86(3):638–41. https://doi.org/10.1037/0033-2909.86.3.638 .

Mihalache A, Popovic MM, Muni RH. Performance of an artificial intelligence chatbot in ophthalmic knowledge assessment. JAMA Ophthalmol. 2023;141(6):589–97. https://doi.org/10.1001/jamaophthalmol.2023.1144 .

Humar P, Asaad M, Bengur FB, Nguyen V. ChatGPT is equivalent to first-year plastic surgery residents: evaluation of ChatGPT on the plastic surgery in-service examination. Aesthet Surg J. 2023;43(12):NP1085–9. https://doi.org/10.1093/asj/sjad130 .

Hopkins BS, Nguyen VN, Dallas J, Texakalidis P, Yang M, Renn A, et al. ChatGPT versus the neurosurgical written boards: a comparative analysis of artificial intelligence/machine learning performance on neurosurgical board-style questions. J Neurosurg. 2023;139(3):904–11. https://doi.org/10.3171/2023.2.JNS23419 .

Suchman K, Garg S, Trindade AJ. Chat generative pretrained transformer fails the multiple-choice American College of Gastroenterology self-assessment test. Am J Gastroenterol. 2023;118:2280–2. https://doi.org/10.14309/ajg.0000000000002320 .

Knoedler L, Alfertshofer M, Knoedler S, Hoch CC, Funk PF, Cotofana S, et al. Pure wisdom or potemkin villages? A comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE step 3 style questions: quantitative analysis. JMIR Med Educ. 2024;10:e51148. https://doi.org/10.2196/51148 .

OpenAI. GPT-4 technical report. https://cdn.openai.com/papers/gpt-4.pdf . Accessed 10 2024.

Seghier ML. ChatGPT: not all languages are equal. Nature. 2023;615(7951):216. https://doi.org/10.1038/d41586-023-00680-3 .

Mello MM, Guha N. ChatGPT and physicians’ malpractice risk. JAMA Health Forum. 2023;4(5):e231938. https://doi.org/10.1001/jamahealthforum.2023.1938 .


This research was supported by the Chung-Ang University Graduate Research Scholarship in 2023 and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2021R1A6A1A03044296).

Author information

Authors and affiliations

Research Institute of Pharmaceutical Sciences, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea

Hye Kyung Jin & EunYoung Kim

Data Science, Evidence-Based and Clinical Research Laboratory, Department of Health, Social, and Clinical Pharmacy, College of Pharmacy, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea

Hye Kyung Jin, Ha Eun Lee & EunYoung Kim

Division of Licensing of Medicines and Regulatory Science, The Graduate School of Pharmaceutical Management, and Regulatory Science Policy, The Graduate School of Pharmaceutical Regulatory Sciences, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, South Korea

EunYoung Kim


Contributions

Conceptualization HKJ and EYK; literature searches HKJ; data extraction and coding HKJ and HEL; data analysis HKJ; writing-original draft HKJ; writing-review and editing HKJ and EYK; overall study supervision EYK. All the authors have read and approved the final manuscript.

Corresponding author

Correspondence to EunYoung Kim .

Ethics declarations

Ethics approval and consent to participate.

This study did not involve human participants; therefore, this is not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .


About this article

Cite this article.

Jin, H.K., Lee, H.E. & Kim, E. Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis. BMC Med Educ 24 , 1013 (2024). https://doi.org/10.1186/s12909-024-05944-8


Received : 29 March 2024

Accepted : 22 August 2024

Published : 16 September 2024

DOI : https://doi.org/10.1186/s12909-024-05944-8


  • ChatGPT-3.5
  • National licensing examination
  • Healthcare professionals
  • Meta-analysis

BMC Medical Education

ISSN: 1472-6920


Relationship between mental health and students’ academic performance through a literature review

  • Open access
  • Published: 17 September 2024
  • Volume 4, article number 119, (2024)


  • Cynthia Lizeth Ramos-Monsivais 1 ,
  • Sonia Rodríguez-Cano 2 ,
  • Estefanía Lema-Moreira   ORCID: orcid.org/0000-0003-2286-4902 3 &
  • Vanesa Delgado-Benito 2  

Mindfulness has become increasingly popular as a means to improve physical and mental health. Its implementation transcends the boundaries of the disciplines that study its impact. The aim of this study is to identify and analyze the benefits of mindfulness on the mental health, academic performance, well-being, mindfulness, and prosocial behavior of university students, as well as to identify the most effective way to achieve habituation to the practice. A systematic review and analysis of papers published in the Scopus database was conducted. Publications on the implementation of mindfulness in higher education began in 2004. Research has been conducted in 22 countries: 15 European, 3 Asian, 2 North American, one Latin American, and one in Oceania; Spain is the only Spanish-speaking country represented. Academically, mindfulness stimulates creativity, exploratory thinking, critical thinking, and attention regulation, increases concentration, and improves the learning experience. In addition, immersive virtual reality experiences were found to positively influence habituation to mindfulness practice among university students.


1 Introduction

In recent decades, mindfulness has gained popularity as a technique for reducing stress, anxiety, and depression, as well as for increasing the well-being and quality of life of those who practice it [ 1 ]. Its origin lies in the Buddhist tradition, as a way to achieve clarity of thought [ 2 ]. Although this technique has been practiced in the East for more than 25 centuries, its popularity in the West is recent [ 3 ]. However, its application is expanding into more and more disciplines [ 4 ].

Social-emotional learning has been introduced into education. It refers to the training of attention through meditation techniques such as mindfulness, the most recent update of programs that pursue emotional intelligence [ 5 ]. This type of education is also known as contemplative education, which seeks to enhance the learning experience through reflection and personal perception [ 6 ].

Dr. Jon Kabat-Zinn defines mindfulness as “awareness that develops by paying concrete, sustained, deliberate, and non-judgmental attention to the present moment” [ 7 , p. 13]. It facilitates maintaining mental calm and training attention [ 8 ]; in addition to increasing mental clarity and awareness [ 9 ].

In terms of operability, three qualities that people develop while practicing mindfulness are recognized, along with three qualities related to how the practice is carried out. The first are observation, description, and participation; as for the mode of practice, it must be carried out with acceptance, in the present moment, and in an effective manner [ 10 ].

Mindfulness can be practiced formally and informally. In formal practice, a specific time is set aside daily for guided meditations. Informal practice brings awareness to daily activities. That is, paying attention to sensations and perceptions while walking, driving, eating, cleaning, among other activities [ 7 ].

Mindfulness has been shown to improve physical and mental health. In terms of physical health, it favors an increase in Brain-Derived Neurotrophic Factor (BDNF) [ 11 ]. In terms of mental health, it reduces symptoms of anxiety [ 12 ], stress [ 13 , 14 , 15 , 16 ], and depression [ 12 ]. It also facilitates coping with change and uncertainty [ 14 ] and increases well-being [ 17 ].

1.1 How might the efficacy of mindfulness be evaluated?

Blood tests can be used to measure the effectiveness of mindfulness. A reduction in levels of cortisol, the stress hormone [ 13 ], and an increase in BDNF can be observed after two weeks of practice [ 11 ]. Increased blood BDNF levels are a potential mediator between meditation practice and brain health [ 13 ]. BDNF measured in blood plasma or in saliva is called peripheral BDNF [ 18 ].

BDNF is a modulator that regulates neuron growth. It allows the creation of new dendrites, which improves communication between neurons; in other words, it promotes greater neuronal plasticity in the central and peripheral nervous systems [ 11 , 13 , 18 , 19 , 20 ]. Its main function is at the level of the hippocampus and cerebral cortex, structures linked to learning and memory [ 13 ].

BDNF is produced in the central nervous system and in peripheral tissues, and its production tends to decrease over time. Its absence is related to psychiatric and neurological disorders such as emotional burnout, anxiety, depression, and Alzheimer’s disease [ 13 ]. However, some activities stimulate its production: exercising, practicing yoga, undergoing controlled stress, traveling, acquiring new experiences, learning, and mindfulness [ 13 , 20 ].

1.2 What are the reasons for integrating mindfulness into higher education?

The increase in mental illness among college students has become a recognized concern [ 16 , 21 ], one that requires innovative interventions [ 22 , 23 ]. In this sense, mindfulness emerges as a proposed solution [ 12 ] to prevent and reduce professional burnout [ 24 ]. Thus, there is growing interest in its applications in higher education [ 25 , 26 ].

In addition to the physical and mental health benefits, mindfulness practice promotes better academic performance [ 8 , 27 , 28 ], including increased attention, learning, and thinking [ 29 ] and reduced pre-test anxiety [ 29 , 30 ].

Mindfulness practice also stimulates exploratory thinking [ 4 ], creative thinking [ 4 , 31 ], and critical thinking [ 2 ]. It increases spatial and sensory awareness [ 4 ], academic self-efficacy [ 32 , 33 ], productivity and task quality [ 8 ]; in addition to increasing the feeling of personal accomplishment [ 34 ].

On the other hand, it facilitates information retention [ 35 ], improves concentration [ 22 , 26 , 36 , 37 ] and attention self-regulation skills [ 32 , 37 , 38 ], and produces a perceived improvement in the overall learning experience [ 31 , 37 , 39 , 40 , 41 ]. This is because mindfulness is, in essence, brain training that facilitates focusing attention, a faculty that, for William James, the father of American psychology, constituted the root of judgment, character, and will [ 42 ].

1.3 Technological immersion in mindfulness

Studies show that technology is increasingly present in the field of mindfulness practice. Evidence of this is the introduction of video games such as Tenacity, developed at the University of Wisconsin to improve mindfulness through breathing exercises [ 5 ]. Mobile applications such as Headspace and Calm have also been developed to promote meditation techniques [ 43 , 44 ].

In addition to the above, immersive environments incorporating Virtual Reality (VR) have been developed to stimulate mindfulness practice. Home Meditation Studio, Tripp, and Maloka are some of the applications in which virtual reality enables mindfulness practice in fully immersive environments.

1.4 Virtual reality and mindfulness in education

VR makes it possible to experience alternative realities perceived atmospherically [ 45 ]. It is applied in disciplines such as medicine, engineering, mathematics, dentistry, and education [ 46 ]. In education it is used to improve academic performance [ 29 ] and to increase attention, creativity, flow state, and habituation to practice [ 47 ].

Pascual et al. [ 48 ] state that, although there are few studies evaluating mindfulness interventions using VR, it is considered a more effective platform than standalone mobile meditation apps for encouraging daily practice. Along those lines, the study by Miller et al. [ 49 ] indicates that VR-guided meditation practice is associated with increased positive affect compared to non-VR meditation.

In the case study by Malighetti et al. [ 50 ], techniques for developing emotional intelligence, such as increased awareness, identification of emotional states, increased resilience, and self-control, implemented through VR, allowed greater mental regulation of eating habits in patients with binge eating disorder. Relatedly, students with greater emotional regulation show greater self-efficacy [ 51 ].

VR mindfulness promotes mental health [ 52 ]. Studies show that it can reduce insomnia and stress [ 53 ] and improve learning [ 46 ]. Coupled with the above, Kwon et al. [ 30 ] found that incorporating virtual environments through VR is feasible for managing anxiety stemming from academic exams.

Kaplan-Rakowski et al.'s [ 29 ] study showed that students who meditated with VR performed better academically than those who meditated using videos. In Yang et al.'s [ 47 ] research, immersive virtual reality experiences were found to affect traits associated with students' creativity, such as flow state and attention. When students were assigned creative challenges, those who participated in immersive VR produced better quality products. They also maintained a more stable attention level than the control group.

VR can impact long-term learning. According to Mohring and Brendel [ 45 ], its use in the educational context needs to be reflected upon, because it triggers human perception with far-reaching consequences, and people using it hardly question the alternative reality experience it offers. Nevertheless, it can contribute significantly to students' training through the development of enhanced digital skills and increased mindfulness.

According to Mohring and Brendel [ 45 ] VR can trace the path towards mindfulness in different educational contexts: in teaching and in transforming the relationship between society and the environment. A view that coincides with Whewell et al. [ 54 ] who argue that these immersive experiences contribute to the development of enhanced digital skills, increased student engagement, cultural competence and global mindfulness in university students. VR can foster the conditions for students to become global change agents “within the spheres of entrepreneurship and education” [ 54 , p.1].

However, mindfulness benefits require continuous practice. According to the study by Pascual et al. [ 48 ], meditation sessions are associated with a decrease in anxiety. Therefore, identifying how to introduce and implement an effective program is of the utmost relevance for updating the current educational system.

In that sense, this research aims to identify programs that have been implemented to incorporate mindfulness into higher education. From its beginnings to the present, it analyzes the scientific literature to understand the evolution of its implementation. It identifies the countries where these programs are carried out, the universities that participate, the years they have been carried out and the types of documents published.

Mindfulness's documented benefits for mental health, academic performance, well-being, and students' awareness and prosocial behavior are discussed. Finally, technology, specifically virtual reality, is addressed as a medium that facilitates mindfulness practice stimulation and habituation.

Therefore, the following research questions were defined:

1. How many publications are published per year?
2. In what language are they published?
3. What kind of documents are published?
4. Which universities are involved in the research?
5. In which countries are mindfulness and higher education being studied?
6. What is the impact of mindfulness on higher education students' mental health?
7. What is the impact of mindfulness on higher education students' academic performance?
8. What is the impact of mindfulness on higher education students' well-being?
9. What is the impact of mindfulness on higher education students' conscientiousness and prosocial behaviour?
10. Is virtual reality the most effective medium for fostering mindfulness among higher education students?

An analysis of scientific publications in the Scopus database, accessed through an institutional account of the University of Burgos in Spain as part of a research stay, was carried out. The search was conducted using English keywords. The keywords used to elaborate the search string were mindfulness, meditation, university students and higher education students. This search string yielded 70 publications as of July 19, 2024.
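The construction of such a search string can be sketched as follows. This is illustrative only: the text lists the keywords but not the exact query syntax, so the `TITLE-ABS-KEY` field code and the boolean grouping are assumptions, not the authors' actual query.

```python
# Illustrative construction of a Scopus-style advanced search query.
# The TITLE-ABS-KEY field code and the OR/AND grouping are assumptions;
# the paper specifies only the keywords themselves.
topic_terms = ["mindfulness", "meditation"]
population_terms = ["university students", "higher education students"]

def scopus_query(topics, populations):
    topic_clause = " OR ".join(f'"{t}"' for t in topics)
    population_clause = " OR ".join(f'"{p}"' for p in populations)
    return f"TITLE-ABS-KEY(({topic_clause}) AND ({population_clause}))"

print(scopus_query(topic_terms, population_terms))
```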

All Scopus publication types were considered under the inclusion criteria: articles, book chapters, papers, reviews, books and short surveys, in English and Spanish. Articles whose information was not available, that were not aimed at higher education students, or that did not address any meditation technique were excluded.

An Excel document with the articles' information was extracted for analysis. One article was not available, so 69 documents were considered. Of these, 11 publications did not actually mention meditation techniques and were excluded, and 5 publications not directed at higher education students were not considered. This resulted in 53 selected research papers. Figure  1 illustrates the selection process.
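The screening steps can be expressed as successive filters over the exported records (a minimal sketch; the field names `available`, `mentions_meditation` and `higher_ed` are hypothetical labels for the three exclusion criteria, and the record counts follow the text: 70 retrieved, 1 unavailable, 11 without meditation techniques, 5 not aimed at higher education, 53 kept).

```python
# Screening sketch: 70 retrieved -> 69 available -> 58 mentioning
# meditation techniques -> 53 aimed at higher education students.
def screen(records):
    available = [r for r in records if r["available"]]
    with_meditation = [r for r in available if r["mentions_meditation"]]
    higher_ed = [r for r in with_meditation if r["higher_ed"]]
    return higher_ed

# Synthetic records reproducing the counts reported in the text
records = (
    [{"available": False, "mentions_meditation": True, "higher_ed": True}] * 1
    + [{"available": True, "mentions_meditation": False, "higher_ed": True}] * 11
    + [{"available": True, "mentions_meditation": True, "higher_ed": False}] * 5
    + [{"available": True, "mentions_meditation": True, "higher_ed": True}] * 53
)
assert len(records) == 70
assert len(screen(records)) == 53
```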

figure 1

Flow diagram

To answer questions 6, 7, 8 and 9, a subsequent analysis was carried out to identify the measurement variables used by the authors. The variables identified in the selected documents were divided into four categories: mental health, academic achievement, well-being, and prosocial awareness and behavior.

The mental health category includes 9 variables: reduction of stress, anxiety, depression, emotional exhaustion, depersonalization burnout and negative mood, as well as increased mental health, calmness and positive mood. Of the 53 documents, 4 address only mental health elements and 23 also include elements from other categories.

The academic achievement category comprises 16 variables: academic performance, clinical performance, exploratory thinking, critical thinking, creative thinking, productivity, task quality, academic speed, persistence, observation skills, attention regulation skills, information retention, academic self-efficacy and concentration, as well as improved learning experience and divergent and convergent creative writing. Of the 53 documents, 5 address only academic achievement elements and 19 also include elements from other categories.

The well-being category consists of 13 variables: increased life satisfaction, well-being, sense of belonging, emotional self-regulation, quality of life, self-compassion, physical activity, resilience, non-judgmental acceptance, perceived social support and sense of accomplishment, as well as improved dietary decision making and sleep quality. Of the 53 documents, 1 addresses only well-being elements and 20 also include elements from other categories.

In the awareness and prosocial behavior category, 14 variables were integrated: increased mindfulness skills, spatial awareness, sensory awareness, self-awareness, dispositional mindfulness, empathy, benevolence, prosocial behavior, collectivism, a sense of transcendence, universalism, mental clarity, responsibility and improved interpersonal relationships. Of the 53 documents, one addresses elements unique to prosocial awareness and behavior and 21 also include elements from other categories.

To answer question 10, an additional search integrating technology and virtual reality was included. Although this study is directed primarily at higher education students, research analysing the incorporation of mindfulness at other educational levels was considered for this question.

The results of the research are presented in this section. We start with the general findings and then answer the research questions.

3.1 General findings

Although all the investigations analysed are directed at higher education students, 27 do not specify the discipline or educational program in which the students are enrolled. Among those that do, mindfulness effectiveness is most frequently studied in medicine and nursing with six investigations, followed by engineering with four, and then anaesthesiology, arts and design, sciences, modern dance, law, midwifery, writing, pharmacy, literature, music, social work and design pedagogy with one each.

Regarding the duration of the programs, of the 53 studies analysed, 31 do not specify the duration of the practice in weeks, days or sessions. However, in six investigations the programs lasted 8 weeks and in five investigations, 6 weeks. The longest program consisted of 12 weeks and the shortest of 1 day. Regarding the keyword analysis, Fig.  2 shows the identified word networks.

figure 2

Visualization of keyword networks based on a VOSviewer version 1.6.20 elaboration

In this analysis, it was found that of the 418 keywords used, 30 occur at least 5 times. The words with the highest frequency of occurrence and greatest connectivity are mindfulness and meditation. Next, the research questions are answered.
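The thresholding described above (418 distinct keywords, of which 30 occur at least 5 times) corresponds to VOSviewer's minimum-occurrence filter, which can be sketched as a frequency count over the keyword lists of the 53 publications. The toy data below is hypothetical, not the study's actual keyword set.

```python
from collections import Counter

# Sketch of a minimum-occurrence keyword filter: count how often each
# keyword appears across all publications and keep those meeting the
# threshold (the study used a threshold of 5 occurrences).
def frequent_keywords(keyword_lists, min_occurrences=5):
    counts = Counter(kw for kws in keyword_lists for kw in kws)
    return {kw: n for kw, n in counts.items() if n >= min_occurrences}

# Toy example (not the real data): "mindfulness" appears in all records.
docs = [["mindfulness", "meditation"], ["mindfulness", "stress"],
        ["mindfulness"], ["mindfulness", "meditation"], ["mindfulness"]]
print(frequent_keywords(docs))  # {'mindfulness': 5}
```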

How many papers are published each year about mindfulness and higher education students?

According to Table  1 , publications on mindfulness in higher education began in 2004, and from 2014 onward publication rates remained steady. In the United States, the first publication was produced by Dr. Daniel Holland, associated with universities in Pennsylvania, Arkansas, Illinois, and Washington. At the University of Pennsylvania, the first program for developing resilience in children was developed. Furthermore, in the late 1990s, Drs. Martin Seligman and Mihaly Csikszentmihalyi, both affiliated with the same university, pioneered positive psychology [ 55 ].

Positive psychology was introduced into institutions in the form of positive education, a concept that succeeds emotional education. In addition to emotions, this approach incorporates other elements, such as meditation, in order to increase well-being [ 56 ].

In what language is mindfulness research published? Of the 53 documents in the collection, 50 are in English and three in Spanish.

What kinds of documents are published on mindfulness and higher education students? Publications were classified into five categories: articles, reviews, book chapters, presentations and books. As shown in Table  2 , each type of document appears in a different quantity.

Articles are the most frequently published document type, followed by review articles, presentations, book chapters, and books.

Which universities publish on mindfulness and higher education students?

A summary of the publications produced, the universities that participate in collaborations, and the most important findings are presented in this section according to the type of document, the language, and the year.

3.2 Spanish-language articles

Only three articles have been published in Spanish: one by the University of Almería in Spain in 2009, one by the University of Lisbon in Portugal in 2022 and one by the University of Granada in Spain in 2024. A study by Justo and Luque [ 57 ] demonstrated that mindfulness leads to a deepening of reflection and self-awareness, which in turn stimulates prosocial values such as benevolence, collectivism, and a sense of universalism and transcendence. Sobral and Caetano [ 58 ] conducted a study in which individual and collective activities, including mindfulness, were incorporated into two courses, using students' portfolios and teachers' notes. In the study by García-Pérez et al. [ 23 ], mindfulness is considered a starting point for safeguarding mental health and improving the well-being of university students.

3.3 Articles in English

In 2014, two English-language papers were published: one by Nottingham Trent University in the United Kingdom and one by Duke University Medical Center in the United States. Greeson et al. [ 59 ] found that the Koru mindfulness training program improved sleep and mindfulness skills, increased self-compassion, and decreased stress among college students.

According to Van Gordon et al. [ 3 ], the Meditation Awareness Training (MAT) program was evaluated with college students. Over the eight weeks of training, the students demonstrated improved well-being and self-regulation of thoughts, feelings, and behavior. A significant increase in dispositional mindfulness was also observed.

In 2015 only one paper was published, by Newcastle University in Australia. In this study, after 7 weeks of practicing mindfulness, students showed improved well-being and sleep quality, increased concentration and mental clarity, and a reduction in negative mood [ 22 ].

In 2016, two articles were published: one by Chatham University in the United States, and another in which two universities from different countries participated, the National University of Ireland and Coleraine University in the United Kingdom. In the study by Noone et al. [ 2 ], it was found that dispositional mindfulness facilitates critical thinking. In the research of Spadaro and Hunker [ 38 ], it was found that after 8 weeks of practicing mindfulness online, nursing students in the United States reduced anxiety and stress and increased mindfulness self-regulation skills.

There were three articles published in 2017. The first study was conducted by Ohio State University in the United States, the second by Ryerson University in Canada, and the third by the Department of Psychiatry at MoleMann Hospital for Mental Health in the Netherlands.

Using reflective writing and guided mindfulness meditations, Klatt [ 60 ] conducted research at Ohio State University to increase awareness of students' life goals. According to Schwind et al. [ 37 ], eight weeks of mindfulness and loving-kindness meditation practice reduced anxiety, improved the learning experience, and increased the sense of calm, concentration, and attention self-regulation skills among Canadian university students.

In the research of Van Dijk et al. [ 61 ], it was found that after 8 weekly two-hour sessions of the mindfulness-based stress reduction (MBSR) program, students from the Netherlands reduced anxiety and negative emotional states. Improved mental health, life satisfaction and increased mindfulness skills were also observed. However, empathy did not increase.

In 2018, three articles were published: one by the University of Seville in Spain, one by the National University of Ireland, and one resulting from an international collaboration between five institutions: the University of Southampton in the UK, the Helvetiapraxis Medical Centre in Switzerland, King's College London in the UK, the Coburg University of Applied Sciences and Arts in Germany and the Poznan University of Medical Sciences in Poland.

Research conducted by Bernárdez et al. [ 9 ] revealed that software engineering students at the University of Seville in Spain improved their academic self-efficacy after 6 weeks of practicing mindfulness.

Lynch et al. [ 25 ] evaluated mindfulness-based coping with university life (MBCUL), an adaptation of the MBSR program. After eight weeks, college students increased their mindfulness skills and decreased stress, anxiety, and depression. The study by Noone and Hogan [ 62 ] found that practicing mindfulness using the Headspace mobile app for 6 weeks, or 30 sessions, increased dispositional mindfulness but not critical thinking. Students at the National University of Ireland participated in this study.

There were three articles published in 2019: the first by Birmingham City University in the United Kingdom, the second by the University of Louisville in the United States, and the third by the University of Rhode Island in the United States.

A study conducted by Dutt et al. [ 84 ] from Birmingham City University demonstrated that mindfulness reduces stress and helps students make better dietary decisions. In the University of Rhode Island study, Lemay et al. [ 63 ] found that after 6 weeks of practicing vinyasa yoga, pharmacy students increased their mindfulness skills and reduced their levels of stress and anxiety. Weingartner et al. [ 39 ] found that mindfulness and compassion training increased mindfulness skills, dispositional mindfulness, and empathy in medical students at the University of Louisville. Interpersonal relationships, resilience, non-judgmental acceptance, observation skills, and the learning experience also improved.

In 2020, four papers were published: two from the United States, by the University of North Carolina and the University of Florida; one from the Federal University of Juiz de Fora in Brazil; and one from the Department of Psychological and Behavioural Sciences at the London School of Economics and Political Science in the United Kingdom.

At the University of North Carolina, a slow sensory experience linked to meditation techniques was introduced into the modern dance program to improve concentration [ 64 ]. According to the study by Bóo et al. [ 27 ], mindfulness increases academic performance, emotional self-regulation, and self-awareness in UK students. However, Damião et al. [ 65 ] found no significant increase in the mindfulness skills of medical students at the Federal University of Juiz de Fora, Brazil, following a 6-week mindfulness training program; stress, anxiety, and depression did not decrease, and quality of life and mental health also showed no change.

A study by Williams et al. [ 40 ] concluded that medical students at the University of Florida improved their mindfulness skills, perceived social support, empathy, and prosocial behavior after 11 weeks of participating in the Promoting Resilience in Medicine (PRIMe) program, although they did not reduce stress. General well-being and the learning experience also improved.

There were three articles published in 2021: the first by the University of Manitoba in Canada, the second by Bilkent University in Turkey, and the third by Johns Hopkins University in the United States. Altay and Porter [ 4 ] found that mindfulness practice among design students in Turkey increased non-judgmental acceptance, exploratory thinking, creative thinking, spatial awareness, sensory awareness, and empathy.

An evaluation of the effectiveness of the Headspace mobile application was conducted by Carullo et al. [ 33 ]. Over the course of four months, anesthesiology and medical students from the United States practiced mindfulness; depression levels were reduced and the sense of personal accomplishment increased. However, neither emotional exhaustion nor depersonalization burnout improved. Based on research conducted by Litwiller et al. [ 21 ] among college students in Canada, mindfulness, meditation, Tai Chi, yoga, exercise, and animal therapy have been found effective in reducing stress, anxiety, depression, and negative mood.

The year 2022 saw the publication of nine papers. The first was completed by Aix-Marseille University in France, the second by the Department of Anthropology at the University of Missouri in the United States, and the third by the University of Central Arkansas in the United States in collaboration with the University of Missouri. Papers were also submitted by the University of Illinois in the United States, Kirikkale University in Turkey, Arizona State University in the United States, the University of Seville in Spain, Brock University in Canada, and the University of Lisbon in Portugal.

Researchers in Turkey found that mindfulness practice increases life satisfaction among nursing students. According to Bernárdez et al. [ 8 ], mindfulness enhanced academic performance, productivity, task quality, and academic speed in Spanish students. Devillers-Réolon et al. [ 66 ] found that stress, anxiety, and depression were reduced in French university students; although their well-being improved, their ability to regulate attention did not.

Researchers at Arizona State University found that mindfulness practice increased concentration, non-judgmental acceptance, and resilience among arts and design students. An opinion survey conducted by Klonoff-Cohen [ 67 ] revealed that college students in Illinois believe meditation and mindfulness exercises are effective coping mechanisms. The study by Sensiper [ 26 ] from the Anthropology Department concluded that after 10 weeks of structured in-class meditations, mindfulness exercises, contemporary text readings, and reflective writing, college students exhibited reduced anxiety, improved well-being, increased emotional self-regulation, concentration, and dispositional mindfulness.

In the research of Sobral and Caetano [ 58 ], the University of Lisbon carried out a self-study on emotional education. Teachers evaluated the students' portfolios to identify recurrent problems, and students evaluated the mindfulness practices and the collective and individual projects.

Strickland et al. [ 68 ] reported that mindfulness combined with a modified version of Dr. Robert Boice’s blocked writers program increased positive mood and resilience to stress and anxiety in students and teachers in higher education.

According to Woloshyn et al. [ 31 ], mindful writing stimulates creative thinking and increases empathy and prosocial behavior in higher education students and teachers in Canada. A positive emotional state can also be achieved through non-judgmental acceptance, increased self-awareness, and self-compassion. In addition, it enhances well-being and the learning experience.

Six papers were published in 2023: one by the University of Rome in Italy, one by Griffith University in Australia, one resulting from a collaboration between the University of South Carolina and Winthrop University, both in the United States, and one from a collaboration between the Institute of Psychology of Loránd University in Hungary, the University of Vienna in Austria and the University of Artois in France.

Another paper is the result of a collaboration between the University of the West of England in the United Kingdom and Dongguk University in South Korea, and the final article resulted from a collaboration between the University of Limoges, the University of Montpellier and Paris Cité University in France and the University of Brussels in Belgium.

In the research by Fagioli et al. [ 32 ], university students in Italy practiced mindfulness online for 28 days; an improved sense of belonging, increased academic self-efficacy and improved attention self-regulation skills were observed. In the study by García et al. [ 69 ], mindfulness was practiced for 1 week, 5 min daily; this exercise reduced anxiety, increased physical activity and improved sleep in United States students. Nagy et al. [ 70 ] found that mindfulness practice can increase persistence in those with a strong disposition toward a growth mindset or mindfulness.

In the research of Hagège et al. [ 71 ], it was found that the Meditation-Based Ethics of Responsibility (MBER) program had a positive impact on the sense of responsibility and on convergent and divergent creative writing tasks in undergraduate science students. In undergraduate music therapy students, eight weeks of practicing mindfulness was found to reduce stress and improve mindfulness and well-being [ 72 ]. Pearson [ 73 ] explores strategies for introducing mindfulness into law education programs in Australia.

So far in 2024, three papers have been published: one by the Virginia Tech College of Engineering in the United States, one by a collaboration of Idaho State University and the University of Wisconsin Oshkosh, also in the United States, and one by Kaohsiung Medical University and Meiho University, both in Taiwan.

In the research of Giesler et al. [ 74 ], the Caring Action Leadership Mindfulness model is proposed to increase mental health and the sense of belonging in undergraduate social work students. In the study by Liu et al. [ 75 ], it was found that practicing mindfulness for 50 min a week for 8 weeks reduced stress and increased mindfulness skills in nursing students. On the other hand, Martini et al. [ 76 ] found that although most engineering students experienced reduced perceived stress, a sense of calm, increased energy, and greater concentration after practicing mindfulness, other students reported feeling more tired and distressed after meditation practice.

3.4 Book chapters

Book chapters are rare: one was published by Queen Margaret University in 2015 and one by the University of Surrey in 2020, both UK universities. In the Oberski et al. [ 35 ] study, it was documented that mindfulness in college students allows for increased information retention and a positive emotional state. In Kilner-Johnson and Udofia's [ 77 ] research, techniques for incorporating mindfulness into the humanities in higher education are proposed.

3.5 Books

Only one book has been published, by the University of Groningen in the Netherlands in 2021. This work addresses the benefits of incorporating mindfulness into higher education courses. It documents the results of the Munich model, named Mindfulness and Meditation in the University Context, and includes practical exercises with instructions for implementation in educational institutions.

3.6 Conferences

Three conference papers have been published from the United States: one in 2006 by the University of Arkansas, one by the University of Denver in Colorado in 2021, and one by Northeastern University in 2023. Holland [ 6 ] presents, through his personal experience, a course developed and implemented in some universities in the United States, while Wu [ 41 ] states that sonic meditation improves the learning experience of higher education students. In the study by Grahame et al. [ 78 ], it was found that daily mindfulness practice enables engineering undergraduates to reduce stress.

3.7 Reviews

Six reviews have been published. One was published in 2004 by Southeastern Illinois University in the United States. In 2017 there were two: one by the University of Portland in the United States and one by La Trobe University in Australia. In 2019, the Medical Department of the University of Amsterdam in the Netherlands also published a review. In 2021, a collaboration between three UK universities, Queen's University, the University of Suffolk and the University of York, was published. In 2024, another was published by Padjadjaran University in Indonesia.

Holland [ 79 ] outlines how mindfulness can be incorporated into higher education, the benefits this can bring for students with disabilities, and how it can promote health. McConville et al. [ 33 ] found that mindfulness reduces stress, anxiety, and depression, and increases mindfulness skills, empathy, a positive emotional state, and academic self-efficacy. Stillwell et al. [ 80 ] found that the MBSR program, yoga, breath work, meditation, and mindfulness all reduced stress in nursing students.

Breedvelt et al. [ 81 ] evaluated the effectiveness of meditation, yoga, and mindfulness on symptoms of depression, anxiety, and stress in college students. They concluded that most publications on mindfulness have a high risk of bias, are of poor quality, and do not specify which technique provides the benefits; it remains unclear whether mindfulness, yoga or another meditation technique is the effective component. McVeigh et al. [ 28 ] found that mindfulness practice in nursing students reduces stress and increases clinical academic performance and self-awareness. In the research of Yosep et al. [ 82 ], it was found that digital mindfulness delivered through audio and video is effective in improving the mental health of university students.

In which countries is research on mindfulness and higher education students most widely published?

Based on the description of the universities in question three, Fig.  3 illustrates the countries and locations where publications on mindfulness and higher education students have been published.

figure 3

Geographical location of countries where mindfulness research has been conducted. Source: own elaboration in the MapChart application [ 83 ]

As can be seen, the United States leads in research on mindfulness and higher education students, followed by the United Kingdom, Canada, Australia and Spain. Spain is the only Spanish-speaking country on the list.

On the other hand, although the research is carried out in 22 countries, the collaboration networks include 14 countries. Figure  4 shows the collaboration networks detected.

figure 4

Cross-country collaboration networks based on a VOSviewer version 1.6.20 elaboration

Figure  4 shows a collaborative network of 14 countries composed of four clusters: one formed by Austria, Belgium, Canada, France and Hungary in red; one by the United Kingdom, Turkey, South Korea and Ireland in green; one by Germany, Switzerland and Poland in blue; and one by Australia and the Netherlands in yellow.

What are the benefits of mindfulness practice for higher education students’ mental health?

Mindfulness practice reduces stress [ 21 , 25 , 28 , 33 , 38 , 59 , 63 , 66 , 80 , 84 ] anxiety [ 21 , 25 , 26 , 33 , 37 , 38 , 61 , 63 , 66 , 69 ] and depression [ 21 , 25 , 33 , 34 , 66 ].

Mindfulness reduces negative mood [ 21 , 22 , 61 ] and increases positive mood [ 31 , 33 , 35 , 68 ]. In research by Bernárdez et al. [ 9 ], mindfulness was found to reduce emotional exhaustion and depersonalization burnout, while Van Dijk et al. [ 61 ] found that it improves mental health. Schwind et al. [ 37 ] found that it increases the feeling of calm.

3.8 Stress reduction

In the studies of Devillers-Réolon et al. [ 66 ] and Spadaro and Hunker [ 38 ], mindfulness practice was conducted online and lasted 17 days and 8 weeks, respectively. In Greeson et al.'s [ 59 ] study, mindfulness was also practiced online using the Koru program, although its duration is not specified.

In Lynch et al.'s [ 25 ] research, the MBSR program was used for 8 weeks; Stillwell et al. [ 80 ] used the same program, although its duration is not specified. Of the five studies in which mindfulness is practiced traditionally through guided meditations, only that of Lemay et al. [ 63 ] indicates the duration: 6 weeks of 60-min sessions. The other investigations do not indicate weeks or practice sessions.

According to Yogeswaran and Morr [ 16 ] online mindfulness practice can be effective in addressing stress. However, at least for medical students, the evidence was not sufficient to prove its efficacy in decreasing symptoms of depression and anxiety. In contrast, the study by Ahmad et al. [ 12 ] found that, among university students in Toronto, Canada, internet-based Cognitive Behavioral Mindfulness Therapy interventions could reduce symptoms of anxiety, depression and stress after 8 weeks.

What are the benefits of mindfulness practice on higher education students’ academic performance?

Mindfulness increases clinical performance [ 28 ] and academic performance [ 8 , 27 , 28 ]. It stimulates exploratory thinking [ 4 ], creative thinking [ 4 , 31 ] and critical thinking [ 2 ].

It increases productivity, task quality and academic speed [ 8 ], as well as academic self-efficacy [ 9 , 32 , 33 ]; it also improves the learning experience [ 31 , 37 , 39 , 40 , 41 ] and observation skills [ 39 ].

Coupled with the above, it improves information retention [ 35 ], increases concentration [ 22 , 26 , 36 , 37 ], and attention self-regulation skills [ 32 , 37 , 38 ]. Another finding in relation to academic performance is that mindfulness can increase persistence in people with a strong disposition toward mindfulness or a growth mindset [ 70 ].

3.9 What benefits does mindfulness practice have on higher education students' well-being?

Mindfulness practice increases perceived social support [ 31 , 40 ], improves well-being [ 3 , 22 , 26 , 31 , 40 , 66 ] and improves dietary decision-making [ 84 ]. It also increases the sense of belonging [ 32 ], life satisfaction [ 61 , 85 ] and physical activity [ 69 ], and improves sleep quality [ 22 , 59 , 69 ]. Damião et al.'s [ 65 ] research showed no improvement in quality of life after the intervention.

Mindfulness increases self-compassion [ 31 , 59 ], the sense of personal achievement [ 34 ] and the self-regulation of thoughts, feelings and behaviors [ 3 , 26 , 27 ]. It stimulates the development of resilience for managing stress and anxiety [ 36 , 39 , 68 ], and it helps to manage the judgmental inner voice; that is, it facilitates non-judgmental acceptance [ 4 , 31 , 36 , 39 ].

3.10 What are the benefits of mindfulness practice on mindfulness and prosocial behavior in higher education students?

Mindfulness increases self-awareness [ 27 , 28 , 31 ], sensory and spatial awareness [ 4 ], mindfulness skills [ 25 , 33 , 39 , 40 , 59 , 61 , 63 ] and the disposition toward mindfulness [ 3 , 26 , 39 , 68 ].

It also stimulates prosocial behavior [ 40 ] and collectivism [ 31 , 57 ]. It increases empathy [ 4 , 31 , 33 , 39 , 40 ] and benevolence [ 57 ], improves interpersonal relationships [ 31 , 39 , 40 ] and clarity of thought [ 22 ], and increases the sense of universalism and transcendence [ 57 ].

3.11 Is virtual reality the most effective way to promote mindfulness among higher education students?

Virtual reality could facilitate habituation to mindfulness. In the study by Navarrete et al. [ 86 ], conducted with university medical students in Valencia, Spain, participants in the virtual reality program meditated twice as long as those who practiced only through regular guided meditation. Along these lines, Pascual et al. [ 48 ] found that health professionals who meditated using VR completed more sessions than those who did not use VR.

Likewise, the studies by Modrego-Alarcón et al. [ 15 ] and Miller et al. [ 49 ] found that students using VR achieved greater immersion and engaged in more mindfulness practice. Immersive virtual reality environments therefore favor habituation to mindfulness practice.

4 Discussion

The benefits of mindfulness in higher education students at the psychoemotional level have been widely documented [ 12 , 13 , 14 , 15 , 16 , 17 , 87 ]. One of the most frequently highlighted benefits of mindfulness in higher education students is the positive effect on self-esteem, as evidenced by the findings of several studies [ 88 , 89 ]. Additionally, mindfulness has been shown to reduce stress levels [ 25 , 33 , 39 , 40 , 59 , 61 , 63 , 90 ]. These types of benefits have also been observed in other demographic groups. For example, a study conducted by Chandna et al. [ 91 ] with an adult population demonstrated that mindfulness practice was associated with significant improvements in self-esteem and self-efficacy.

As previously stated, mindfulness practice has been identified as a potential solution to the emotional difficulties experienced by higher education students in the current context [ 12 ]. The positive effects of mindfulness on students’ psychoemotional well-being have been demonstrated in numerous studies [ 66 , 67 , 85 ]. It can thus be inferred that these benefits will also affect other areas of students’ lives, reducing their difficulties both psychoemotionally and academically, for example.

In terms of academic performance, the findings of Bóo et al. [ 27 ], Bernárdez et al. [ 8 ] and McVeigh et al. [ 28 ] are worthy of note. These effects are not exclusive to students in higher education. A study by Artika et al. [ 92 ] with a sample of 469 secondary school students indicates that mindfulness is a significant predictor of student participation in the school context, with improved self-esteem mediating the increase in participation. Similarly, Cordeiro et al. [ 93 ] conducted an experimental study with a control group of third-grade students and found that mindfulness significantly enhanced cognitive flexibility and handwriting fluency.

Prosocial behaviour has been identified as another key area of interest by a number of studies [ 4 , 22 , 31 , 33 , 39 , 40 , 57 ]. A study by Akhavan et al. [ 90 ] demonstrates the benefits of mindfulness practice in a sample of teachers, including enhanced relationships with students and reduced stress.

With regard to the manner in which these mindfulness programmes can be supported, the utilisation of VR has been found to confer considerable benefits [ 15 , 48 , 49 , 86 ]. This is primarily attributable to the degree of adherence to the programme. In their seminal work, Friedlander et al. [ 94 ] introduced the concept of the ‘therapeutic alliance’ to describe this phenomenon of patient adherence in a therapeutic context. They posited that it represents a crucial factor in the efficacy of any therapeutic intervention. In this case, although it is an educational context, the effects of such adherence are similar; therefore, it is worthwhile to explore the potential of the VR format as a key factor for the success of mindfulness.

5 Conclusions

The research questions initially posed have been addressed, yielding comprehensive data on the volume, language and year of publication of the various research projects. Notably, there has been a significant increase in publications over the past four years, with the article being the predominant format. As expected, the majority of publications are in English. It is also evident that universities in countries with an Anglo-Saxon tradition have published the most research on this topic, with the USA producing the highest volume of studies.

In answer to questions 6, 7, 8 and 9, mindfulness practice has been shown to promote mental health, academic performance, awareness, prosocial behaviour and well-being in the student population. Its positive impact does not depend on how the practice is delivered, whether through traditional guided meditations, mobile applications, videos, online exercises or virtual reality.

However, according to the available literature, some delivery formats make habituation easier to acquire, and additional benefits can be obtained by increasing the number of sessions completed or the minutes of practice. In answer to question 10, in the studies where VR was effective for mindfulness practice, students practiced for longer than those in the control groups. VR could therefore be a more effective way to introduce contemplative science, and meditation techniques in particular, in higher education.

The objective has been fulfilled by analysing the benefits of mindfulness on the mental health, academic performance, well-being, mindfulness and prosocial behaviour of university students, as well as by identifying the most effective way to achieve habituation to the practice. These benefits are highly relevant, and it would therefore be worthwhile to introduce mindfulness practice in the context of higher education.

6 Limitations and implications

One of the issues highlighted is the lack of comprehensive data that would allow a more thorough comparison. Aspects such as the geographical location of the study samples or the duration of the mindfulness programme applied are often unreported, which leaves the effectiveness of a large number of studies unclear. At the same time, this topic is becoming increasingly relevant, yet there is still no consensus among researchers.

With regard to prospective implications, it is evident that the implementation of mindfulness in educational settings offers substantial advantages. Consequently, higher education institutions should facilitate the availability of structured mindfulness programmes for students. Undoubtedly, this would prove to be a valuable addition to their psycho-emotional and academic development.

Data availability

The author confirms that all data generated or analysed during this study are included in this published article.

Palomo P, Rodrigues de Oliveira D, Pereira B, García J, Cebolla A, Baños R, Da Silva E, Demarzo M. Study protocol for a randomized controlled trial of mindfulness training with immersive technology (virtual reality) to improve the quality of life of patients with multimorbidity in primary care: the mindful-VR study. Annu Rev Cyberther Telemed. 2018;2018(16):140–7.

Noone C, Bunting B, Hogan M. Does mindfulness enhance critical thinking? Evidence for the mediating effects of executive functioning in the relationship between mindfulness and critical thinking. Front Psychol. 2016. https://doi.org/10.3389/fpsyg.2015.02043 .

Van Gordon W, Shonin E, Sumich A, Sundin E, Griffiths M. Meditation Awareness Training (MAT) for psychological well-being in a sub-clinical sample of university students: a controlled pilot study. Mindfulness. 2014;5:381–91. https://doi.org/10.1007/s12671-012-0191-5 .

Altay B, Porter N. Educating the mindful design practitioner. Think Skills Creativity. 2021. https://doi.org/10.1016/j.tsc.2021.100842 .

Goleman D, Senge P. Triple focus. Ediciones B; 2016.

Holland D. Contemplative education in unexpected places: teaching mindfulness in Arkansas and Austria. Teach Coll Rec. 2006;108(9):1842–61. https://doi.org/10.1111/j.1467-9620.2006.00764.x .

Kabat-Zinn J. Mindfulness para principiantes. Penguin Random House; 2023.

Bernárdez B, Durán A, Parejo J, Juristo N, Ruiz-Cortés A. Effects of mindfulness on conceptual modeling performance: a series of experiments. IEEE Trans Software Eng. 2022;48(2):432–52. https://doi.org/10.1109/TSE.2020.2991699 .

Bernárdez B, Durán A, Parejo J, Ruiz-Cortés A. An experimental replication on the effect of the practice of mindfulness in conceptual modeling performance. J Syst Softw. 2018;136:153–72. https://doi.org/10.1016/j.jss.2016.06.104 .

Dimidjian S, Linehan M. Defining an agenda for future research on the clinical application of mindfulness practice. Clin Psychol Sci Pract. 2003;10(2):166–71.

Gomutbutra P, Srikhamjak T, Sapinun L, Kunapun S, Yingchankul N, Apaijai N, Shinlapawittayatorn K, Phuackchantuck R, Chattipakorn N, Chattipakorn S. Effect of intensive weekend mindfulness-based intervention on BDNF, mitochondria function, and anxiety. A randomized, crossover clinical trial. Comp Psychoneuroendocrinol. 2022. https://doi.org/10.1016/j.cpnec.2022.100137 .

Ahmad F, El Morr C, Ritvo P, Othman N, Moineddin R, Ashfaq I, Bohr Y, Ferrari M, Fung W, Hartley L, Maule C, Mawani A, McKenzie K, Williams S. An eight-week, web-based mindfulness virtual community intervention for students’ mental health: randomized controlled trial. JMIR Ment Health. 2020. https://doi.org/10.2196/15520 .

Cahn B, Goodman M, Peterson C, Maturi R, Mills P. Yoga, meditation and mind-body health: increased BDNF, Cortisol awakening response, and altered inflammatory marker expression after a 3-month yoga and meditation retreat. Front Hum Neurosci. 2017. https://doi.org/10.3389/fnhum.2017.00315 .

Hernández-Ruiz E, Sebren A, Alderete C, Bradshaw L, Fowler R. Effect of music on a mindfulness experience: an online study. Arts Psychother. 2021. https://doi.org/10.1016/j.aip.2021.101827 .

Modrego-Alarcón M, López-del-Hoyo Y, García-Campayo J, Pérez-Aranda A, Navarro-Gil M, Beltrán-Ruiz M, Morillo H, Delgado-Suarez I, Oliván-Arévalo R, Montero-Marin J. Efficacy of a mindfulness-based programme with and without virtual reality support to reduce stress in university students: a randomized controlled trial. Behav Res Ther. 2021. https://doi.org/10.1016/j.brat.2021.103866 .

Yogeswaran V, Morr C. Mental health for medical students, what do we know today? Procedia Computer Science. 2022. https://doi.org/10.1016/j.procs.2021.12.245 .

Osorio M, Zepeda C, Carballido J. Towards a virtual companion system to give support during confinement. In: 3rd international conference of inclusive technology and education (CONTIE); 2020. p. 46–50. https://doi.org/10.1109/CONTIE51334.2020.00017

Gomutbutra P, Yingchankul N, Chattipakorn N, Chattipakorn S, Srisurapanont M. The effect of mindfulness-based intervention on brain-derived neurotrophic factor (BDNF): a systematic review and meta-analysis of controlled trials. Front Psychol. 2020. https://doi.org/10.3389/fpsyg.2020.02209 .

Huang E, Reichardt L. Neurotrophins: roles in neuronal development and function. Annu Rev Neurosci. 2001. https://doi.org/10.1146/annurev.neuro.24.1.677 .

Binder D, Scharfman H. Brain-derived neurotrophic factor. Growth Factors. 2004;22:123–31. https://doi.org/10.1080/08977190410001723308 .

Litwiller F, White C, Hamilton-Hinch B, Gilbert R. The impacts of recreation programs on the mental health of postsecondary students in North America: an integrative review. Leis Sci. 2021;44(1):96–120. https://doi.org/10.1080/01490400.2018.1483851 .

Van der Riet P, Rossiter R, Kirby D, Dluzewska T, Harmon C. Piloting a stress management and mindfulness program for undergraduate nursing students: student feedback and lessons learned. Nurse Educ Today. 2015;35(1):44–9. https://doi.org/10.1016/j.nedt.2014.05.003 .

García-Pérez L, Collado Fernández D, Lamas-Cepero JL, Ubago-Jiménez JL. Healthy pills: physical activity program for the prevention of mental health and improvement of resilience in university students. Intervention protocol. Retos. 2024;55:726–35. https://doi.org/10.47197/retos.v55.104012 .

Ramos C, Castañón M, Almendra Y, Monge L. Puede el Mindfulness Reducir el Burnout en los Docentes. Ciencia Latina Revista Científica Multidisciplinar. 2024;8(1):42–55. https://doi.org/10.37811/cl_rcm.v8i1.9384 .

Lynch S, Gander M, Nahar A, Kohls N, Walach H. Mindfulness-based coping with university life: a randomized wait-list controlled study. SAGE Open. 2018. https://doi.org/10.1177/2158244018758379 .

Sensiper S. Teaching meditation to college students within an historical and cultural context: a qualitative analysis of undergraduate reflections on contemplative practice. Curr Psychol. 2022;42:15356–67. https://doi.org/10.1007/s12144-022-02811-x .

Bóo S, Childs-Fegredo J, Cooney S, Datta B, Dufour G, Jones P, Galante J. A follow-up study to a randomised control trial to investigate the perceived impact of mindfulness on academic performance in university students. Couns Psychother Res. 2020. https://doi.org/10.1002/capr.12282 .

McVeigh C, Ace L, Ski C, Carswell C, Burton S, Rej S, Noble H. Mindfulness-based interventions for undergraduate nursing students in a university setting: a narrative review. Healthcare. 2021. https://doi.org/10.3390/healthcare9111493 .

Kaplan-Rakowski R, Johnson K, Wojdynski T. The impact of virtual reality meditation on college students’ exam performance. Smart Learn Environ. 2021. https://doi.org/10.1186/s40561-021-00166-7 .

Kwon J, Hong N, Kim K, Heo J, Kim J, Kim E. Feasibility of a virtual reality program in managing test anxiety: a pilot study. Cyberpsychol Behav Soc Netw. 2020;23(10):715–20. https://doi.org/10.1089/cyber.2019.0651 .

Woloshyn V, Obradović-Ratković S, Julien K, Rebek J, Sen P. Breathing our way into mindful academic writing: a collaborative autoethnography of an online writing community. J Furth High Educ. 2022;46(8):1135–48. https://doi.org/10.1080/0309877X.2022.2055450 .

Fagioli S, Pallini S, Mastandrea S, Barcaccia B. Effectiveness of a brief online mindfulness-based intervention for university students. Mindfulness. 2023. https://doi.org/10.1007/s12671-023-02128-1 .

McConville J, McAleer R, Hahne A. Mindfulness training for health profession students-the effect of mindfulness training on psychological well-being, learning and clinical performance of health professional students: a systematic review of randomized and non-randomized controlled trials. Explore (NY). 2017;13(1):26–45. https://doi.org/10.1016/j.explore.2016.10.002 .

Carullo P, Ungerman E, Metro D, Adams P. The impact of a smartphone meditation application on anesthesia trainee well-being. J Clin Anesth. 2021. https://doi.org/10.1016/j.jclinane.2021.110525 .

Oberski I, Murray S, Goldblatt J, DePlacido C. Contemplation & mindfulness in higher education. Glob Innov Teach Learn High Educ. 2014;11:317–40. https://doi.org/10.1007/978-3-319-10482-9_19 .

Henriksen D, Heywood W, Gruber N. Meditate to create: mindfulness and creativity in an arts and design learning context. Creativity Stud. 2022;15(1):147–68. https://doi.org/10.3846/cs.2022.13206 .

Schwind J, McCay E, Beanlands H, Schindel L, Martin J, Binder M. Mindfulness practice as a teaching-learning strategy in higher education: a qualitative exploratory pilot study. Nurse Educ Today. 2017. https://doi.org/10.1016/j.nedt.2016.12.017 .

Spadaro K, Hunker D. Exploring the effects of an online asynchronous mindfulness meditation intervention with nursing students on stress, mood, and cognition: a descriptive study. Nurse Educ Today. 2016;39:163–9. https://doi.org/10.1016/j.nedt.2016.02.006 .

Weingartner L, Sawning S, Shaw M, Klein J. Compassion cultivation training promotes medical student wellness and enhanced clinical care. BMC Med Educ. 2019. https://doi.org/10.1186/s12909-019-1546-6 .

Williams M, Estores I, Merlo L. Promoting resilience in medicine: the effects of a mind-body medicine elective to improve medical student well-being. Glob Adv Integr Med Health. 2020. https://doi.org/10.1177/2164956120927367 .

Wu J. An innovative and inclusive mindset: teaching embodied sonic meditation in higher education. In: Proceedings of the AES international conference; 2021. https://acortar.link/CzcWDc

Goleman D. Focus desarrollar la atención para alcanzar la excelencia. Kairós; 2013.

Gordon C, Posner J, Klein E, Mumm C (producers). La mente, en pocas palabras [documentary]. United States: Netflix; 2019.

Ramos C. Inteligencia de la pasión En búsqueda de una educación contemporánea integral e inteligentemente apasionada. RIDE Revista Iberoamericana para la Investigación y el Desarrollo Educativo. 2021. https://doi.org/10.23913/ride.v11i22.950 .

Mohring K, Brendel N. Producing virtual reality (VR) field trips – a concept for a sense-based and mindful geographic education. Geographica Helvetica. 2021;76:369–80. https://doi.org/10.5194/gh-76-369-2021 .

Frewen P, Oldrieve P, Law K. Teaching psychology in virtual reality. Scholarsh Teach Learn Psychol. 2022. https://doi.org/10.1037/stl0000341 .

Yang X, Lin L, Cheng P, Yang X, Ren Y, Huang Y. Examining creativity through a virtual reality support system. Educ Technol Res Dev. 2018;66:1231–54. https://doi.org/10.1007/s11423-018-9604-z .

Pascual K, Fredman A, Naum A, Patil C, Sikka N. Should mindfulness for health care workers go virtual? A mindfulness-based intervention using virtual reality and heart rate variability in the emergency department. Workplace Health Saf. 2023;71(4):188–94. https://doi.org/10.1177/21650799221123258 .

Miller M, Mistry D, Jetly R, Frewen P. Meditating in virtual reality 2: phenomenology of vividness, egocentricity and absorption-immersion. Mindfulness. 2021;12:1195–207. https://doi.org/10.1007/s12671-020-01589-y .

Malighetti C, Schnitzer C, Potter G, Nameth K, Brown T, Vogel E, Riva G, Runfola C, Safe D. Rescripting emotional eating with virtual reality: a case study. Annu Rev Cyberther Telemed. 2021;19:117–21.

Sun G, Lyu B. Relationship between emotional intelligence and self-efficacy among college students: the mediating role of coping styles. Discov Psychol. 2022;2:42. https://doi.org/10.1007/s44202-022-00055-1 .

Crosswell L, Yun G. Examining virtual meditation as a stress management strategy on college campuses through longitudinal, quasi-experimental research. Behaviour & Information Technology. 2020;41:864–78. https://doi.org/10.1080/0144929X.2020.1838609 .

De Zambotti M, Yuksel D, Kiss O, Barresi G, Arra N, Volpe L, King C, Baker F. A virtual reality-based mind-body approach to downregulate psychophysiological arousal in adolescent insomnia. Digital Health. 2022. https://doi.org/10.1177/20552076221107887 .

Whewell E, Caldwell H, Frydenberg M, Andone D. Changemakers as digital makers: connecting and co-creating. Educ Inf Technol. 2022;27:6691–713. https://doi.org/10.1007/s10639-022-10892-1 .

Seligman M. El circuito de la esperanza. Penguin Random House; 2018.

Ramos C. Inteligencia de la pasión: una perspectiva del comportamiento humano a través de la neurociencia. Fontamara. 2022. https://doi.org/10.29059/LUAT.307 .

Justo C, Luque M. Effects of a meditation program on values in a sample of university students. Electron J Res Educ Psychol. 2009;7(3):1157–74.

Sobral C, Caetano A. Addressing pedagogical tensions in emotional education at university: an integrative path. Hum Rev Revista Internacional De Humanidades. 2022;11(5):1–13. https://doi.org/10.37467/revhuman.v11.3873 .

Greeson J, Juberg M, Maytan M, James K, Rogers H. A randomized controlled trial of Koru: a mindfulness program for college students and other emerging adults. J Am Coll Health. 2014;62(4):222–33.

Klatt M. A contemplative tool: an exposé of the performance of self. J Transform Educ. 2017;15(2):122–36. https://doi.org/10.1177/1541344616683280 .

Van Dijk I, Lucassen P, Akkermans R, Van Engelen B, Van Weel C, Speckens A. Effects of mindfulness-based stress reduction on the mental health of clinical clerkship students: a cluster-randomized controlled trial. Acad Med. 2017;92(7):1012–21. https://doi.org/10.1097/ACM.0000000000001546 .

Noone C, Hogan M. A randomised active-controlled trial to examine the effects of an online mindfulness intervention on executive control, critical thinking and key thinking dispositions in a university student sample. BMC Psychol. 2018. https://doi.org/10.1186/s40359-018-0226-3 .

Lemay V, Hoolahan J, Buchanan A. Impact of a yoga and meditation intervention on students’ stress and anxiety levels. Am J Pharm Educ. 2019;83(5):7001. https://doi.org/10.5688/ajpe7001 .

Baran A. Sneaking meditation. J Dance Educ. 2022;22(1):23–31. https://doi.org/10.1080/15290824.2020.1765248 .

Damião A, Lucchetti A, da Silva O, Lucchetti G. Effects of a required large-group mindfulness meditation course on first-year medical students’ mental health and quality of life: a randomized controlled trial. J Gen Intern Med. 2020;35(3):672–8. https://doi.org/10.1007/s11606-019-05284-0 .

Devillers-Réolon L, Mascret N, Sleimen-Malkoun R. Online mindfulness intervention, mental health and attentional abilities: a randomized controlled trial in university students during COVID-19 lockdown. Front Psychol. 2022. https://doi.org/10.3389/fpsyg.2022.889807 .

Klonoff-Cohen H. College students’ opinions about coping strategies for mental health problems, suicide ideation, and self-harm during COVID-19. Front Psychol. 2022. https://doi.org/10.3389/fpsyg.2022.918891 .

Strickland D, Price-Blackshear M, Bettencourt B. Mindful writing for faculty and graduate students: a pilot mixed-methods study of effects of a six-week workshop. Innov Educ Teach Int. 2022. https://doi.org/10.1080/14703297.2022.2080099 .

García L, Ferguson S, Facio L, Schary D, Guenther C. Assessment of well-being using Fitbit technology in college students, faculty and staff completing breathing meditation during COVID-19: a pilot study. Ment Health Prev. 2023;30: 200280.

Nagy T, Sik K, Török L, Bőthe B, Takacs Z, Orosz G. Brief growth mindset and mindfulness inductions to facilitate task persistence after negative feedback. Collabra Psychol. 2023. https://doi.org/10.1525/collabra.74253 .

Hagège H, Ourmi M, Shankland R, Arboix-Calas F, Leys C, Lubart T. Ethics and meditation: a new educational combination to boost verbal creativity and sense of responsibility. J Intelligence. 2023. https://doi.org/10.3390/jintelligence11080155 .

Hwang M, Bunt L, Warner C. An eight-week zen meditation and music programme for mindfulness and happiness: qualitative content analysis. Int J Environ Res Public Health. 2023. https://doi.org/10.3390/ijerph20237140 .

Pearson M. Student perceptions of mindful reflection as a media law teaching tool. Aust J Rev. 2023;45(2):201–15. https://doi.org/10.1386/ajr_00132_1 .

Giesler F, Weeden M, Yee S, Ostler M, Brown R. CALM: caring action leadership mindfulness: a college counseling practice-based education model. Soc Work Educ. 2024. https://doi.org/10.1080/02615479.2024.2312940 .

Liu Y, Lee C, Wu L. A mindfulness-based intervention improves perceived stress and mindfulness in university nursing students: a quasi-experimental study. Sci Rep. 2024. https://doi.org/10.1038/s41598-024-64183-5 .

Martini L, Huerta M, Jurkiewicz J, Chan B, Bairaktarova D. Exploring students’ experiences with mindfulness meditations in a first-year general engineering course. Educ Sci. 2024. https://doi.org/10.3390/educsci14060584 .

Kilner-Johnson A, Udofia E. Using mindfulness meditation techniques to support peer-to-peer dialogue in seminars. In: Gravett K, Yakovchuk N, Kinchin I, editors. Enhancing student-centred teaching in higher education. Cham: Palgrave Macmillan; 2020. https://doi.org/10.1007/978-3-030-35396-4_19 .

Grahame K, Jay A, Gillen A, Freeman S. Meaningful moments: first-year student perceptions of mindfulness and meditation in the classroom. In: ASEE annual conference and exposition, conference proceedings, Baltimore, USA; 2023.

Holland D. Integrating mindfulness meditation and somatic awareness into a public educational setting. J Humanist Psychol. 2004;44(4):468–84. https://doi.org/10.1177/0022167804266100 .

Stillwell S, Vermeesch A, Scott J. Interventions to reduce perceived stress among graduate students: a systematic review with implications for evidence-based practice. Worldviews Evid Based Nurs. 2017;14(6):507–13. https://doi.org/10.1111/wvn.12250 .

Breedvelt J, Amanvermez Y, Harrer M, Karyotaki E, Gilbody S, Bockting C, Cuijpers P, Ebert D. The effects of meditation, yoga, and mindfulness on depression, anxiety, and stress in tertiary education students: a meta-analysis. Front Psychiatry. 2019. https://doi.org/10.3389/fpsyt.2019.00193 .

Yosep I, Suryani S, Mediani H, Mardhiyah A, Ibrahim K. Types of digital mindfulness: improving mental health among college students—a scoping review. J Multidiscip Healthc. 2024;17:43–53. https://doi.org/10.2147/JMDH.S443781 .

Mapchart. Create your own custom map; 2024. https://www.mapchart.net/world.html

Dutt S, Keyte R, Egan H, Hussain M, Mantzios M. Healthy and unhealthy eating amongst stressed students: considering the influence of mindfulness on eating choices and consumption. Health Psychol Rep. 2019;7(2):113–20. https://doi.org/10.5114/hpr.2019.77913 .

Aşık E, Albayrak S. The effect of mindfulness levels on the life satisfaction of nursing students. Perspect Psychiatr Care. 2022;58(3):1055–61. https://doi.org/10.1111/ppc.12898 .

Navarrete J, Martínez-Sanchis M, Bellosta-Batalla M, Baños R, Cebolla A, Herrero R. Compassionate embodied virtual experience increases the adherence to meditation practice. Appl Sci (Switzerland). 2021;11(3):1–16. https://doi.org/10.3390/app11031276 .

Huang T, Larsen K, Ried-Larsen M, Møller N, Andersen L. The effects of physical activity and exercise on brain-derived neurotrophic factor in healthy humans: a review. Scand J Med Sci Sports. 2014;24:1–10. https://doi.org/10.1111/sms.12069 .

Cheng MWT, Leung ML, Lau JCH. A review of growth mindset intervention in higher education: The case for infographics in cultivating mindset behaviors. Soc Psychol Educ Int J. 2021;24(5):1335–62. https://doi.org/10.1007/s11218-021-09660-9 .

Saraff S, Tiwari A, Rishipal. Effect of mindfulness on self-concept, self-esteem and growth mindset: evidence from undergraduate students. J Psychosoc Res. 2020;15(1):329–40. https://doi.org/10.32381/JPR.2020.15.01.28 .

Akhavan N, Walsh N, Goree J. Benefits of mindfulness professional development for elementary teachers: considerations for district and school-level leaders. J School Admin Res Dev. 2021;6:24–42. https://doi.org/10.32674/jsard.v6i1.2462 .

Chandna S, Sharma P, Moosath H. The mindful self: exploring mindfulness in relation with self-esteem and self-efficacy in Indian population. Psychol Stud. 2022;67:261–72. https://doi.org/10.1007/s12646-021-00636-5 .

Artika MY, Sunawan S, Awalya A. Mindfulness and student engagement: the mediation effect of self esteem. Jurnal Bimbingan Konseling. 2021;10(2):89–98. https://doi.org/10.15294/jubk.v10i2.47991 .

Cordeiro C, Magalhães S, Rocha R, Mesquita A, Olive T, Castro SL, Limpo T. Promoting third graders’ executive functions and literacy: a pilot study examining the benefits of mindfulness vs. relaxation training. Front Psychol. 2021;12: 643794. https://doi.org/10.3389/fpsyg.2021.643794 .

Friedlander ML, Escudero V, Heatherington L. La alianza terapéutica. En la terapia familiar y de pareja. Paidós; 2009.

Author information

Authors and affiliations.

Autonomous University of Tamaulipas, Nuevo Laredo, México

Cynthia Lizeth Ramos-Monsivais

University of Burgos, Burgos, Spain

Sonia Rodríguez-Cano & Vanesa Delgado-Benito

University of A Coruña, A Coruña, Spain

Estefanía Lema-Moreira

Contributions

C.L.R.M., S.M.C. and E.L.M. designed the study. C.L.R.M. and S.M.C. carried out the methodology and results sections. V.D.B. and C.L.R.M. wrote the Introduction. C.L.R.M. and E.L.M. wrote the Discussion, Conclusions and Limitations. All authors wrote and reviewed the manuscript.

Corresponding author

Correspondence to Estefanía Lema-Moreira .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

Reprints and permissions

About this article

Ramos-Monsivais, C.L., Rodríguez-Cano, S., Lema-Moreira, E. et al. Relationship between mental health and students’ academic performance through a literature review. Discov Psychol 4 , 119 (2024). https://doi.org/10.1007/s44202-024-00240-4

Received : 04 April 2024

Accepted : 09 September 2024

Published : 17 September 2024

DOI : https://doi.org/10.1007/s44202-024-00240-4


  • Academic performance
  • Mindfulness
  • University students
  • Virtual reality

Advertisement

  • Find a journal
  • Publish with us
  • Track your research



COMMENTS

  1. Artificial intelligence in education: A systematic literature review

    In contrast, a systematic literature review, through content analysis of research articles, can delve into research nuances that are of interest to researchers ... The AI education cluster primarily focuses on the education and learning of AI-related knowledge and skills. These include STEM and computer science-related knowledge and skills ...

  2. Role of AI chatbots in education: systematic literature review

    AI chatbots shook the world not long ago with their potential to revolutionize education systems in a myriad of ways. AI chatbots can provide immediate support by answering questions, offering explanations, and providing additional resources. Chatbots can also act as virtual teaching assistants, supporting educators through various means. In this paper, we try to understand the full benefits ...

  3. AI in Education: A Systematic Literature Review

    A review of available and relevant literature was done using the systematic review method to identify the current research focus and provide an in-depth understanding of AI technology in ...

  4. Artificial Intelligence in Education: A Systematic Literature Review

    Figure 1. Methodology for the literature review on AI and education. Figure 2 illustrates the evolution of research on "education" and "AI" from 1986 to 2024 in the fields of engineering ...

  5. Artificial intelligence in education: : A systematic literature review

    Abstract: Artificial intelligence (AI) in education (AIED) has evolved into a substantial body of literature with diverse perspectives. ... J.Q. Pérez, T. Daradoumis, J.M.M. Puig, Rediscovering the use of chatbots in education: A systematic literature review, Computer Applications in Engineering Education 28 (2020) 1549-1565.

  6. Artificial Intelligence in Education: a Systematic Review

    This systematic review presents a comprehensive synthesis of recent scientific findings concerning the disruptive effects of artificial intelligence on the educational sector. In light of the ...

  7. AI in Education: A Systematic Literature Review

    The purpose of this study is to analyze the opportunities, benefits, and challenges of AI in education. A review of available and relevant literature was done using the systematic review method to identify the current research focus and provide an in-depth understanding of AI technology in education for educators and future research directions.

  8. Systematic literature review on opportunities, challenges, and future

    Only articles relevant to AIEd were selected for this review. To identify relevant published articles, three of the authors collaboratively discussed and developed the criteria depicted in Fig. 1. Based on previous studies (Nigam et al., 2021), the search query [("AI" OR "artificial intelligence") AND "education"] was used to include papers with these terms in the titles ...

  9. Artificial Intelligence Applications in K-12 Education: A Systematic

    Additionally, technologies and environments that contributed to employing AI in education were discussed. To this end, a systematic literature review was conducted on articles and conference papers published between 2011 and 2021 in the Web of Science and Scopus databases. As the result of the initial search, 2075 documents were extracted and ...

  10. A systematic review of AI role in the educational system based on a

    To fill this gap, this review research proposes a conceptual framework from complex adaptive systems theory perspective, uses a systematic literature review approach to locate and summarize articles, and categorizes the roles of AI in the educational system. The review results indicate that when AI is added into an educational system, its roles ...

  11. PDF Role of AI chatbots in education: systematic literature review

    To summarize, incorporating AI chatbots in education brings personalized learning for students and time efficiency for educators. Students benefit from flexible study aid and skill development. However, concerns arise regarding the accuracy of information, fair assessment practices, and ethical considerations.

  12. Human-centred learning analytics and AI in education: A systematic

    To conduct the systematic literature review, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol (Page et al., 2021), which has four phases and aims to promote transparent reporting. We searched four reputable bibliographic databases, including Scopus, ACM Digital Library, IEEE Xplore, and Web of Science, to find high-quality peer-reviewed ...

  13. A meta systematic review of artificial intelligence in higher education

    Although the field of Artificial Intelligence in Education (AIEd) has a substantial history as a research domain, never before has the rapid evolution of AI applications in education sparked such prominent public discourse. Given the already rapidly growing AIEd literature base in higher education, now is the time to ensure that the field has a solid research and conceptual grounding. This ...

  14. AI's Role and Application in Education: Systematic Review

    A systematic review approach was used to identify the current research emphasis and offer an in-depth analysis of the function of AI technology in education. A total of 46 related articles were identified from the Scopus database. Three key themes emerged from the review: learning, education and teaching.

  15. PDF Benefits, Challenges, and Methods of Artificial Intelligence (AI

    As the use of AI chatbots in education and other fields increases, it is considered important to conduct systematic review studies for future research, to eliminate the confusion created by rapidly changing AI chatbot technologies, and to close the research gap in the field (Dwivedi et al., 2023). In the literature, a systematic literature review of 53 ...

  16. AI in Education: A Systematic Literature Review

    The purpose of this study is to analyze the opportunities, benefits, and challenges of AI in education. A review of available and relevant literature was done using the systematic review method to identify the current research focus and provide an in-depth understanding of AI technology in education for educators and future research directions.

  17. Role of AI chatbots in education: systematic literature review

    It is found that students primarily gain from AI-powered chatbots in three key areas: homework and study assistance, a personalized learning experience, and the development of various skills. AI chatbots shook the world not long ago with their potential to revolutionize education systems in a myriad of ways. AI chatbots can provide immediate support by answering questions, offering ...

  18. AI literacy in K-12: a systematic literature review

    The successful irruption of AI-based technology in our daily lives has led to a growing educational, social, and political interest in training citizens in AI. Education systems now need to train students at the K-12 level to live in a society where they must interact with AI. Thus, AI literacy is a pedagogical and cognitive challenge at the K-12 level. This study aimed to understand how AI is ...

  19. PDF Role of AI in Blended Learning: A Systematic Literature Review

    International Review of Research in Open and Distributed Learning, Volume 25, Number 1, February 2024. Role of AI in Blended Learning: A Systematic Literature Review. Yeonjeong Park (Department of Early Childhood Education, Honam University) and Min Young Doo (Department of Education, Kangwon National University; corresponding author). Abstract: As blended learning moved toward a new ...

  20. Artificial Intelligence in Education: A Review

    The purpose of this study was to assess the impact of Artificial Intelligence (AI) on education. Premised on a narrative and framework for assessing AI identified from a preliminary analysis, the scope of the study was limited to the application and effects of AI in administration, instruction, and learning. A qualitative research approach, leveraging the use of literature review as a research ...

  21. Role of AI chatbots in education: systematic literature review

    review (Okonkwo & Ade-Ibijola, 2021): a) Chatbots are used in education for teaching, administration, assessment, advisory, and research. b) Chatbots have the potential to enhance learning ...

  22. PDF The Role of Artificial Intelligence in Education: A Systematic

    This research aims to explore the application of Artificial Intelligence in education. In this systematic literature review, the research began by identifying articles related to Artificial Intelligence in the Scopus and Google Scholar databases through the Publish or Perish tool. There are four phases involved in literature mapping, namely ...

  23. Human-Centred Learning Analytics and AI in Education: a Systematic

    Despite a shift towards human-centred design in recent LA and AIED research, there remain gaps in our understanding of the importance of human control, safety, reliability, and trustworthiness in the design and implementation of these systems. We conducted a systematic literature review to explore these concerns and gaps.

  24. The use of artificial intelligence in school science: A systematic

    Artificial Intelligence is widely used across contexts and for different purposes, including the field of education. However, a review of the literature shows that while various review studies on the use of AI in education exist, a review focusing on science education remains missing. To address this gap, we carried out a systematic literature review between 2010 and 2021, driven by ...

  25. Strategic goals for artificial intelligence integration among STEM

    After consulting various studies, a systematic literature review approach was used to examine the strategic goals of AI integration among STEM academics and undergraduates in African higher education. The systematic review was carried out using the PRISMA procedure. Our objective was to identify the existing gaps and challenges to provide ...

  26. The Best Practices of Financial Management in Education: A Systematic

    DOI: 10.2139/ssrn.4892468. Vicente, R. S., Flores, L. C., Almagro, R. E., Amora, M. R. V., & Lopez, J. P. (2024). The Best Practices of Financial Management in Education: A Systematic Literature Review. SSRN Electronic ...

  27. A systematic literature review on the impact of artificial intelligence

    Artificial intelligence (AI) can bring both opportunities and challenges to human resource management (HRM). While scholars have been examining the impact of AI on workplace outcomes more closely over the past two decades, the literature falls short in providing a holistic scholarly review of this body of research. Such a review is needed in order to: (a) guide future research on the effects ...

  28. (PDF) A Systematic Literature Review on the Impacts of Using Chat GPT

    This systematic literature review explores the impact of using ChatGPT and other AI-driven tools in English as a Second Language (ESL) teaching and learning, employing a meta-analytic approach.

  29. Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations

    Background ChatGPT, a recently developed artificial intelligence (AI) chatbot, has demonstrated improved performance in examinations in the medical field. However, thus far, an overall evaluation of the potential of ChatGPT models (ChatGPT-3.5 and GPT-4) in a variety of national health licensing examinations is lacking. This study aimed to provide a comprehensive assessment of the ChatGPT ...

  30. Relationship between mental health and students' academic performance

    An analysis and systematic review of papers published in the Scopus database was conducted. It was found that publications on the implementation of mindfulness in higher education began in 2004. The studies span 22 countries: 15 European, 3 Asian, 2 North American, one Latin American, and one from Oceania.