
Text Mining in Education—A Bibliometrics-Based Systematic Review


1. Introduction

  • What has been the state of the art in the application of text mining methods in the field of education?
  • What are the main themes in using text mining in education in the 21st century, and how have they evolved?

2. Methodology

Selection Criteria and Data Collection

  • Scopus search term: (TITLE (“text mining” OR “text analytics” OR “text analysis” OR “natural language processing” OR “NLP” OR “writing analytics” OR “writing analysis” OR “language model” OR “computational linguistics”) AND TITLE-ABS-KEY (“teach*” OR “learn*” OR “educat*” OR “university” OR “college” OR “institution” OR “school” OR “student”)) AND PUBYEAR > 1999 AND (EXCLUDE (DOCTYPE, “re”)) AND (LIMIT-TO (LANGUAGE, “English”))
  • Web of Science search term: (TI = (“text mining” OR “text analytics” OR “text analysis” OR “natural language processing” OR “NLP” OR “writing analytics” OR “writing analysis” OR “language model” OR “computational linguistics”)) AND TS = (“teach*” OR “learn*” OR “educat*” OR “university” OR “college” OR “institution” OR “school” OR “student”) and Review Articles (Exclude–Document Types) and English (Languages)
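
For reproducibility, the Scopus string above can be assembled programmatically. The sketch below is illustrative only (the helper name and term lists are ours, and the exclusion/limit clauses are appended verbatim); it simply rebuilds the boolean query for pasting into Scopus Advanced Search or an API client.

```python
# Rebuild the Scopus boolean query from its two term groups (illustrative helper, not the authors' code).
method_terms = [
    "text mining", "text analytics", "text analysis", "natural language processing",
    "NLP", "writing analytics", "writing analysis", "language model", "computational linguistics",
]
education_terms = [
    "teach*", "learn*", "educat*", "university", "college", "institution", "school", "student",
]

def quoted_or(terms):
    """Join terms into an OR-ed, quoted clause, e.g. ("a" OR "b")."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

scopus_query = (
    f"(TITLE {quoted_or(method_terms)} "
    f"AND TITLE-ABS-KEY {quoted_or(education_terms)}) "
    "AND PUBYEAR > 1999 "
    'AND (EXCLUDE (DOCTYPE, "re")) '
    'AND (LIMIT-TO (LANGUAGE, "English"))'
)
print(scopus_query)
```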


  • Education-related terms (Group A): words that represent education, teaching, or learning (e.g., “distance learning”, “MOOCs”)
  • Text-related jargon (Group B): terms that deal with preparing, processing, presenting, or analysing text data (e.g., “word embedding”, “sentiment analysis”)
  • Data analysis techniques, jargon, or disciplines (Group C): terms that name a technique or part of a process concerned with the analysis of the data (e.g., “support vector machine”, “neural networks”)

Publications were included if they satisfied either of the following criteria (a minimal sketch of these rules follows this list):

  • Publications that have a text-related jargon term (Group B) as well as an education-related term (Group A) in their title
  • Publications that have a data-analysis-related technique, jargon, or discipline term (Group C) in their title, a text-related jargon term (Group B) in their title, abstract, or author keywords, and an education-related term (Group A) in their title
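
A minimal sketch of the two inclusion rules above, assuming each bibliographic record is a dictionary with title, abstract, and keywords fields (the field names and matching logic are illustrative assumptions, not the authors' actual screening code):

```python
# Apply the two inclusion rules to a single bibliographic record (illustrative sketch).
def contains_any(text, terms):
    """True if any term occurs in the text (case-insensitive substring match)."""
    text = text.lower()
    return any(term.lower() in text for term in terms)

def include(record, group_a, group_b, group_c):
    title = record["title"]
    title_abs_kw = " ".join([record["title"], record["abstract"], record["keywords"]])
    # Rule 1: Group B term and Group A term both in the title.
    rule_1 = contains_any(title, group_b) and contains_any(title, group_a)
    # Rule 2: Group C term in the title, Group B term anywhere in title/abstract/keywords,
    # and Group A term in the title.
    rule_2 = (contains_any(title, group_c)
              and contains_any(title_abs_kw, group_b)
              and contains_any(title, group_a))
    return rule_1 or rule_2
```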
Author Keyword | Group | Frequency
Natural language processing | B | 1502
Machine learning | C | 1031
Text mining | B | 1005
Deep learning | C | 401
NLP | B | 294
Sentiment analysis | B | 206
Artificial intelligence | C | 167
Language model | B | 165
Information extraction | C | 128
Text analysis | B | 125
Text classification | B | 124
Classification | C | 119
Social media | C | 115
Data mining | C | 109
Natural language | B | 102
Learning | A | 99
Text analytics | B | 95
Big data | C | 89
Neural networks | C | 85
Information retrieval | C | 78
Electronic health records | NA | 77
Speech recognition | C | 76
Transfer learning | C | 74
Natural language processing (nlp) | B | 73
Topic modelling | B | 73
Processing | C | 71
Ontology | B | 68
BERT | B | 67
Twitter | C | 66
Computational linguistics | B | 65
COVID-19 | NA | 64
Natural | C | 62
Language models | B | 60
Language processing | B | 60
Word embeddings | B | 60
Language modelling | B | 58
Named entity recognition | B | 58
Clustering | C | 57
Text | C | 57
LSTM | B | 53
Neural network | C | 52

3.1. Descriptive Analysis

3.2. Source Analysis

3.3. Author Analysis

3.4. Document Analysis

3.5. Conceptual Structure Analysis

4. Discussion and Conclusions

Author Contributions

Institutional Review Board Statement

Data Availability Statement

Conflicts of Interest

References

  • Liddy, E.D. Natural language processing. In Encyclopedia of Library and Information Science, 2nd ed.; Marcel Dekker, Inc.: New York, NY, USA, 2001.
  • Tan, A.H.; Ridge, K.; Labs, D.; Terrace, H.M.K. Text mining: The state of the art and the challenges. In Proceedings of the PAKDD 1999 Workshop on Knowledge Discovery from Advanced Databases, Beijing, China, 26–28 April 1999; Volume 8, pp. 65–70.
  • Denyer, D.; Tranfield, D. Producing a systematic review. In The Sage Handbook of Organizational Research Methods; Buchanan, D.A., Bryman, A., Eds.; Sage Publications Ltd.: Thousand Oaks, CA, USA, 2009; pp. 671–689.
  • Battal, A.; Afacan Adanır, G.; Gülbahar, Y. Computer Science Unplugged: A Systematic Literature Review. J. Educ. Technol. Syst. 2021, 50, 24–47.
  • Shin, D.; Shim, J. A Systematic Review on Data Mining for Mathematics and Science Education. Int. J. Sci. Math. Educ. 2021, 19, 639–659.
  • Ferreira-Mello, R.; André, M.; Pinheiro, A.; Costa, E.; Romero, C. Text mining in education. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1332.
  • Kerkhof, R. Natural Language Processing for Scoring Open-Ended Questions: A Systematic Review. Master’s Thesis, University of Twente, Enschede, The Netherlands, 2020.
  • Soni, S.; Kumar, P.; Saha, A. Automatic Question Generation: A Systematic Review. In Proceedings of the International Conference on Advances in Engineering Science Management & Technology (ICAESMT) 2019, Dehradun, India, 14–15 March 2019.
  • Dos Santos, V.; de Souza, É.F.; Felizardo, K.R.; Watanabe, W.M.; Vijaykumar, N.L.; Aluizio, S.M.; Júnior, A.C. Conceptual Map Creation from Natural Language Processing: A Systematic Mapping Study. Rev. Bras. de Informática na Educação 2019, 27, 150–176.
  • Higgins, J.P.; Thomas, J.; Chandler, J.; Cumpston, M.; Li, T.; Page, M.J.; Welch, V.A. Cochrane Handbook for Systematic Reviews of Interventions; John Wiley & Sons: Hoboken, NJ, USA, 2019.
  • Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
  • Singh, V.K.; Singh, P.; Karmakar, M.; Leta, J.; Mayr, P. The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis. Scientometrics 2021, 126, 5113–5142.
  • Stahlschmidt, S.; Stephen, D. Comparison of Web of Science, Scopus and Dimensions databases. In KB Forschungspoolprojekt; DZHW: Hannover, Germany, 2020.
  • Linnenluecke, M.K.; Marrone, M.; Singh, A.K. Conducting systematic literature reviews and bibliometric analyses. Aust. J. Manag. 2020, 45, 175–194.
  • Aria, M.; Cuccurullo, C. bibliometrix: An R-tool for comprehensive science mapping analysis. J. Inf. 2017, 11, 959–975.
  • He, W. Examining students’ online interaction in a live video streaming environment using data mining and text mining. Comput. Hum. Behav. 2013, 29, 90–102.
  • Hung, J.L.; Zhang, K. Examining mobile learning trends 2003–2008: A categorical meta-trend analysis using text mining techniques. J. Comput. High. Educ. 2012, 24, 1–17.
  • McNamara, D.S.; Crossley, S.A.; Roscoe, R. Natural language processing in an intelligent writing strategy tutoring system. Behav. Res. Methods 2013, 45, 499–515.
  • Crossley, S.; Paquette, L.; Dascalu, M.; McNamara, D.S.; Baker, R.S. Combining click-stream data with NLP tools to better understand MOOC completion. In Proceedings of the 6th International Conference on Learning Analytics & Knowledge, Edinburgh, UK, 25–29 April 2016; pp. 6–14.
  • Robinson, C.; Yeomans, M.; Reich, J.; Hulleman, C.; Gehlbach, H. Forecasting student achievement in MOOCs with natural language processing. In Proceedings of the 6th International Conference on Learning Analytics & Knowledge, Edinburgh, UK, 25–29 April 2016; pp. 383–387.
  • Yim, S.; Warschauer, M. Web-based collaborative writing in L2 contexts: Methodological insights from text mining. Lang. Learn. Technol. 2017, 21, 146–165.
  • Wang, F.; Ngo, C.W.; Pong, T.C. Structuring low-quality videotaped lectures for cross-reference browsing by video text analysis. Pattern Recognit. 2008, 41, 3257–3269.
  • Gibson, A.; Aitken, A.; Sándor, Á.; Buckingham Shum, S.; Tsingos-Lucas, C.; Knight, S. Reflective writing analytics for actionable feedback. In Proceedings of the 7th International Learning Analytics & Knowledge Conference, Vancouver, BC, Canada, 13–17 March 2017; pp. 153–162.
  • Shum, S.B.; Sándor, Á.; Goldsmith, R.; Wang, X.; Bass, R.; McWilliams, M. Reflecting on reflective writing analytics: Assessment challenges and iterative evaluation of a prototype tool. In Proceedings of the 6th International Conference on Learning Analytics & Knowledge, Edinburgh, UK, 25–29 April 2016; pp. 213–222.
  • Allen, L.K.; Snow, E.L.; McNamara, D.S. Are you reading my mind? Modeling students’ reading comprehension skills with Natural Language Processing techniques. In Proceedings of the 5th International Conference on Learning Analytics and Knowledge, Poughkeepsie, NY, USA, 16–20 March 2015; pp. 246–254.
  • Cobo, M.; López-Herrera, A.; Herrera-Viedma, E.; Herrera, F. An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the Fuzzy Sets Theory field. J. Inf. 2011, 5, 146–166.
  • Callon, M.; Courtial, J.P.; Laville, F. Co-word analysis as a tool for describing the network of interactions between basic and technological research: The case of polymer chemistry. Scientometrics 1991, 22, 155–205.
Year | Number of Publications
2000–2004 | 18
2005–2009 | 67
2010–2014 | 136
2015–2019 | 346
2020–January 2022 | 414
Author Keywords | Articles | Keywords-Plus (ID) | Articles
Natural language processing | 140 | Students | 306
Sentiment analysis | 131 | Natural language processing systems | 223
Machine learning | 122 | Data mining | 195
Text mining | 122 | Learning systems | 171
Deep learning | 64 | Sentiment analysis | 160
Artificial intelligence | 38 | Natural language processing | 152
E-learning | 37 | E-learning | 123
Educational data mining | 32 | Teaching | 110
Data mining | 29 | Text mining | 102
Text classification | 28 | Education | 94
Name | Number of Publications
Lecture Notes in Computer Science | 57
ACM International Conference Proceedings Series | 29
Advances in Intelligent Systems and Computing | 24
CEUR Workshop Proceedings | 19
Communications in Computer and Information Science | 17
Journal of Physics: Conference Series | 17
Pervasive Health: Pervasive Computing Technologies for Healthcare | 17
International Journal of Advanced Computer Science and Applications | 11
International Journal of Artificial Intelligence in Education | 11
IEEE Access | 10
First Author and Year | Total Citation (TC) | TC per Year
He W., 2013 | 120 | 13
Hung J.L., 2012 | 106 | 11
McNamara D., 2013 | 70 | 8
Crossley D., 2016 | 61 | 10
Robinson C., 2016 | 42 | 7
Yim S., 2017 | 37 | 7.4
Wang F., 2008 | 31 | 2
Gibson A., 2017 | 27 | 5
Buckingham Shum S., 2016 | 23 | 2
Allen L., 2015 | 23 | 3
Rank | Keyword | Frequency | Rank | Keyword | Frequency
1 | Natural language processing | 140 | 26 | Neural network | 12
2 | Sentiment analysis | 131 | 27 | Student feedback | 12
3 | Machine learning | 122 | 28 | Automated essay scoring | 11
4 | Text mining | 122 | 29 | Feedback | 11
5 | Deep learning | 64 | 30 | Intelligent tutoring systems | 11
6 | Artificial intelligence | 38 | 31 | LSTM | 11
7 | E-learning | 37 | 32 | Natural language processing (NLP) | 11
8 | Educational data mining | 32 | 33 | Support vector machine | 11
9 | Data mining | 29 | 34 | BERT | 10
10 | Text classification | 28 | 35 | Feature selection | 9
11 | Learning analytics | 27 | 36 | Natural language | 9
12 | Education | 26 | 37 | Teaching evaluation | 9
13 | Topic modelling | 24 | 38 | Text analytics | 9
14 | Opinion mining | 23 | 39 | Word embedding | 9
15 | Higher education | 19 | 40 | Word2vec | 9
16 | Classification | 18 | 41 | Assessment | 8
17 | NLP | 18 | 42 | Big data | 8
18 | Text analysis | 18 | 43 | COVID-19 | 8
19 | MOOC | 16 | 44 | IDA | 8
20 | Online learning | 15 | 45 | Plagiarism detection | 8
21 | Chatbot | 13 | 46 | SVM | 8
22 | Latent dirichlet allocation | 13 | 47 | Twitter | 8
23 | Learning | 13 | 48 | Named entity recognition | 7
24 | MOOCs | 13 | 49 | Natural language understanding | 7
25 | Information retrieval | 12 | 50 | Ontology | 7

Share and Cite

Ahadi, A.; Singh, A.; Bower, M.; Garrett, M. Text Mining in Education—A Bibliometrics-Based Systematic Review. Educ. Sci. 2022, 12, 210. https://doi.org/10.3390/educsci12030210

Ahadi A, Singh A, Bower M, Garrett M. Text Mining in Education—A Bibliometrics-Based Systematic Review. Education Sciences. 2022; 12(3):210. https://doi.org/10.3390/educsci12030210

Ahadi, Alireza, Abhay Singh, Matt Bower, and Michael Garrett. 2022. "Text Mining in Education—A Bibliometrics-Based Systematic Review" Education Sciences 12, no. 3: 210. https://doi.org/10.3390/educsci12030210

Supplementary Material

  • Externally hosted supplementary file 1. DOI: 10.5281/zenodo.5890421. Link: https://zenodo.org/record/5890421#.Yeu92f5BxjE. Description: Bib records of the publications.


Text Mining: Recently Published Documents


Automated Text Classification of Maintenance Data of Higher Education Buildings Using Text Mining and Machine Learning Techniques

Evaluation of the Synergy Degree of Industrial De-Capacity Policies Based on Text Mining: A Case Study of China's Coal Industry

Application of Informetrics on Financial Network Text Mining Based on Affective Computing

Recycling Behaviour: Mapping Knowledge Domain through Bibliometrics and Text Mining

Application of Text Mining for Clustering Tweet Data from the Blibli Account on Twitter Using K-Means Clustering

Social media is computer-based technology that facilitates the sharing of ideas, thoughts, and information through the building of virtual networks and communities. Twitter is one of the most popular social media platforms in Indonesia, with 78 million users, and businesses rely heavily on it for advertising. By knowing which types of tweet content are most often retweeted by their followers, businesses can use those types of content as a means of advertising to Twitter users. In this study, text mining is applied to cluster tweets from the @bliblidotcom Twitter account using the K-means clustering method, with the best number of clusters obtained from the silhouette coefficient method, in order to determine the types of tweet content most retweeted by @bliblidotcom followers. The tweets with the most retweets and favourites are discount offers and flash sales, so Blibli Indonesia could use this kind of tweet for advertising on Twitter; prize-quiz tweets are also well liked by the @bliblidotcom account's followers.
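
A minimal sketch of the clustering pipeline this abstract describes (K-means with the number of clusters chosen by the silhouette coefficient). The example tweets and the TF-IDF vectorisation step are our stand-ins; the original @bliblidotcom data and preprocessing are not reproduced here.

```python
# K-means clustering of short texts, picking k by the silhouette coefficient (illustrative sketch).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

tweets = [
    "Flash sale! 50% discount on electronics today",
    "Quiz time: answer and win vouchers",
    "Discount vouchers for loyal customers this weekend",
    "Win prizes in our weekly quiz",
    "New arrivals in fashion, check them out",
    "Free shipping on all fashion items",
]

X = TfidfVectorizer(stop_words="english").fit_transform(tweets)   # text -> numeric features

best_k, best_score = None, -1.0
for k in range(2, 5):                                   # candidate numbers of clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)                 # higher silhouette = better-separated clusters
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k}, silhouette = {best_score:.3f}")
```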

The Epilepsy Ontology: a community-based ontology tailored for semantic interoperability and text-mining

Motivation: Epilepsy is a multi-faceted, complex disorder that requires a precise understanding of its classification, diagnosis, treatment, and underlying disease mechanism. Although scattered resources are available on epilepsy, comprehensive and structured knowledge is missing. To promote multidisciplinary knowledge exchange and facilitate advancement in clinical management, especially in pre-clinical research, a disease-specific ontology is necessary. The presented ontology is designed to enable better interconnection between scientific community members in the epilepsy domain. Results: The Epilepsy Ontology (EPIO) is an assembly of structured knowledge on various aspects of epilepsy, developed according to Basic Formal Ontology (BFO) and Open Biological and Biomedical Ontology (OBO) Foundry principles. Concepts and definitions are collected from the latest International League Against Epilepsy (ILAE) classification, domain-specific ontologies, and the scientific literature. The ontology consists of 1,879 classes and 28,151 axioms (2,171 declaration axioms, 2,219 logical axioms) covering several aspects of epilepsy. It is intended to be used for data management and text-mining purposes.

Analysis of Trends in Public Reports on Central Java Province's “LaporGub..!” Service Using Text Mining with Fuzzy C-Means Clustering

Effective communication between the government and society is essential to achieve good governance. The government of Central Java Province provides a means for public complaints through an online aspiration and complaint service called “LaporGub..!”. To make it easier to group incoming reports, the topics of the reports are identified using clustering. Text mining is used to convert the text data into numeric data so that it can be processed further. Clustering methods are classified as soft (fuzzy) or hard clustering. Hard clustering divides data into clusters strictly, without any overlapping membership with other clusters. Soft clustering can place data in several clusters with a certain degree of membership. These graded membership values give fuzzy grouping more natural results than hard clustering, because objects at the boundary between several classes are not forced to fit fully into one class; instead, each object is assigned a degree of membership. Fuzzy c-means also places cluster centres more precisely than many other clustering methods, by refining the centres iteratively. The best number of clusters is selected based on the maximum silhouette coefficient. A word cloud, a form of text data visualization, is used to determine the dominant topic in each cluster. The results show that the maximum silhouette coefficient value for fuzzy c-means clustering is obtained with three clusters. The first cluster produces a word cloud about road conditions (449 reports), the second about COVID assistance (964 reports), and the third about fertilizer for farmers (176 reports). Reports about COVID assistance form the cluster with the most members.
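
To make the soft-clustering idea concrete, here is a from-scratch sketch of fuzzy c-means in NumPy (no external fuzzy-clustering library assumed); in the study, the rows of X would be vectorised report texts rather than the toy 2-D points used here.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Soft clustering: returns cluster centres and an (n_samples, c) membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                     # each point's memberships sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]    # membership-weighted centroids
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        new_U = 1.0 / ((dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        if np.abs(new_U - U).max() < tol:
            return centers, new_U
        U = new_U
    return centers, U

# Toy example: the boundary point receives intermediate membership in both clusters.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.5, 0.5]])
centers, U = fuzzy_c_means(X, c=2)
print(np.round(U, 2))
```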

Text visualization for geological hazard documents via text mining and natural language processing

Analysis of sebaceous gland carcinoma associated genes using network analysis to identify potentially actionable genes.

Eyelid sebaceous gland carcinoma (SGC) is a rare but life-threatening condition. However, there is limited computational research on the underlying protein interactions specific to eyelid sebaceous gland carcinoma. The aim of our study is to identify and analyse the genes associated with eyelid sebaceous gland carcinoma using text mining and to develop a protein-protein interaction (PPI) network to predict significant biological pathways using bioinformatics tools. Genes associated with eyelid sebaceous gland carcinoma were retrieved from the PubMed database using text mining with the key terms ‘eyelid’ and ‘sebaceous gland carcinoma’, excluding genes for ‘Muir-Torre Syndrome’. The interaction partners were identified using STRING. Cytoscape was used for visualization and analysis of the PPI network. Molecular complexes in the network were predicted using the MCODE plug-in and analysed for gene ontology terms using DAVID. The PubMed retrieval process identified 79 genes related to eyelid sebaceous gland carcinoma. The PPI network associated with eyelid sebaceous gland carcinoma comprised 79 nodes and 1768 edges. Network analysis using Cytoscape identified nine key genes and two molecular complexes enriched in the protein-protein interaction network. GO enrichment analysis identified the biological processes cell fate commitment, Wnt signalling pathway, retinoic acid signalling, and response to cytokines as enriched in our network. The genes identified in the study might play a pivotal role in understanding the underlying molecular pathways involved in the development and progression of eyelid sebaceous gland carcinoma. Furthermore, they may aid in the identification of candidate biomarkers and therapeutic targets for the treatment of eyelid sebaceous gland carcinoma.

Determining banking service attributes from online reviews: text mining and sentiment analysis

Purpose: The current study employs text mining and sentiment analysis to identify core banking service attributes and customer sentiment in online user-generated reviews. Additionally, the study explains customer satisfaction based on the identified predictors.
Design/methodology/approach: A total of 32,217 customer reviews were collected across 29 top banks on bankbazaar.com, posted from 2014 to 2021. In total, three conceptual models were developed and evaluated employing regression analysis.
Findings: The study revealed that all variables were statistically significant and affect customer satisfaction in their respective models, except the interest rate.
Research limitations/implications: The study is confined to the geographical representation of its subjects, i.e., Indian customers. A cross-cultural and socioeconomic background analysis of banking customers in different countries may help to better generalize the findings.
Practical implications: The study makes essential theoretical and managerial contributions to the existing literature on services, particularly the banking sector.
Originality/value: This paper is unique in that it focuses on banking customer satisfaction from online reviews and ratings using text mining and sentiment analysis.


  • Open access
  • Published: 02 November 2020

Comprehensive review of text-mining applications in finance

  • Aaryan Gupta 1 ,
  • Vinya Dengre 1 ,
  • Hamza Abubakar Kheruwala 1 &
  • Manan Shah 2  

Financial Innovation, volume 6, Article number: 39 (2020)


Text-mining technologies have substantially affected financial industries. As the data in every sector of finance have grown immensely, text mining has emerged as an important field of research in the domain of finance. Therefore, reviewing the recent literature on text-mining applications in finance can be useful for identifying areas for further research. This paper focuses on the text-mining literature related to financial forecasting, banking, and corporate finance. It also analyses the existing literature on text mining in financial applications and provides a summary of some recent studies. Finally, the paper briefly discusses various text-mining methods being applied in the financial domain, the challenges faced in these applications, and the future scope of text mining in finance.

Introduction

Today, technology is deeply integrated with everyone’s lives. Nearly every activity in modern life, from phone calls to satellites sent into space, has evolved exponentially with technology (Patel et al. 2020a , b , c ; Panchiwala and Shah 2020 ). The increasing ability to create and manage information has been an influential factor in the development of technology. According to the National Security Agency of the United States, 1826 petabytes on average are handled daily over the Internet (Hariri et al. 2019 ; Jaseena and David 2014 ). With the rapid increase in data and information communicated over the Internet, it has become necessary to regulate and ease the flow of the same (Ahir et al. 2020 ; Gandhi et al. 2020 ). A number of commercial and social applications have been introduced for these purposes. Aspects of data and information, such as security, research, and sentiment analysis, can be of great help to organisations, governments, and the public (Jani et al. 2019 ; Jha et al. 2019 ). There are various optimized techniques that aid us in tasks such as classification, summarisation, and ease of access and management of data, among others (Shah et al. 2020a , b ; Talaviya et al. 2020 ). Algorithms related to machine learning and deep learning (DL) are just some of the many algorithms that can be used to process the available information (Kakkad et al. 2019 ; Kundalia et al. 2020 ). Even though there is a massive amount of available information, the use of computational techniques can help us process information from top to bottom and analyse entire documents as well as individual words (Pandya et al. 2019 ; Parekh et al. 2020 ).

Human-generated ‘natural’ data in the form of text, audio, video, and so on are rapidly increasing (Shah et al. 2020a , b ). This has led to a rise in interest in methods and tools that can help extract useful information automatically from enormous amounts of unstructured data (Jaseena and David 2014 ; David and Balakrishnan 2011 ). One crucial method is text mining, which is a combined derivative of techniques such as data mining, machine learning, and computational linguistics, among others. Text mining aims to extract information and patterns from textual data (Talib et al. 2016b ; Fan et al. 2006 ). The trivial approach to text mining is manual, in which a human reads the text and searches for useful information in it. A more logical approach is automatic, which mines text in an efficient way in terms of speed and cost (Herranz et al. 2018 ; Sukhadia et al. 2020 ; Pathan et al. 2020 ).

According to the India Brand Equity Foundation (IBEF 2019 ), the Indian financial industry alone had US $340.48 billion in assets under management as of February 2019. This value only provides us with a limited indication of the actual size and reach of the global finance industry. Technology has paved the way for digitalisation in this rapidly growing behemoth. ‘FinTech’ is a developing domain in the finance industry, which has been defined as a union of finance and information technology (Zavolokina et al. 2016 ). Marrara et al. ( 2019 ) examined how FinTech relates to Italian small and medium-sized enterprises (SMEs), where FinTech has witnessed huge growth in terms of investment and development, and how it has proved fruitful for the SME market in a short amount of time. FinTech has popularised the use of data in the financial industry. This data is substantially in the form of structured or unstructured text. Therefore, traditionally and technically, textual data can be regarded as always having been a prevailing and essential element in the finance sector.

Unstructured textual data have been increasing rapidly in the finance industry (Lewis and Young 2019 ). This is where text mining has a lot of potential. Kumar and Ravi ( 2016 ) explored various applications in the financial domain in which text mining could play a significant role. They concluded that it had numerous applications in this industry, such as various kinds of predictions, customer relationship management, and cybersecurity issues, among others. Many novel methods have been proposed for analysing financial results in recent years, and artificial intelligence has made it possible to analyse and even predict financial outcomes based on historical data.

Finance has been an important force in human life since the earliest civilisations. It is noteworthy that from barter systems to cryptocurrencies, finance has always been associated with data, such as transactions, accounts, prices, and reports. Manual approaches to processing data have been reduced in use and significance over time. Researchers and practitioners have come to prefer digitised and automated approaches for studying and analysing financial data. Financial data contain a significant amount of latent information. If the latent information were to be extracted manually from a huge corpus of data, it might take years. Advancements in text mining have made it possible to efficiently examine textual data pertaining to finance. Bach et al. ( 2019 ) published a literature review on text mining for big-data analysis in finance. They structured the review in terms of three critical questions. These questions pertained to the intellectual core of finance, the text-mining techniques used in finance, and the data sources of financial sectors. Kumar and Ravi ( 2016 ) discussed the model presented by Vu et al. ( 2012 ) that implemented text mining on Twitter messages to perform sentiment analysis for the prediction of stock prices. They also mentioned the model of Lavrenko et al. ( 2000 ), which could classify news stories in a way that could help identify which of them affected trends in finance and to what degree. We will further discuss text-mining applications in finance in subsequent sections.

Apart from finance, we present a brief overview of text mining in other industries. On social media, people generate text data in the form of posts, blogs, and web forum activity, among many others (Agichtein et al. 2008 ). Despite the vast quantity of data available, the relatively low proportion of content of significant quality is still a problem (Kinsella et al. 2011 ), which is an issue that can be solved by text mining (Salloum et al. 2017 ). In the biomedical field too, there is a need for effective text-mining and classification methods (Krallinger et al. 2011 ). On e-commerce websites, text mining is used to prevent the repetition of information to the same audience (Da-sheng et al. 2009 ) and improve product listings through reviews (Kang and Park 2016 ; Ur-Rahman and Harding 2012 ). In healthcare, researchers have worked on applications such as the identification of healthcare topics directly from personal messages over the Internet (Lu 2013 ), classification of online data (Srivastava et al. 2018 ), and analysis of patient feedback (James et al. 2017 ). The agriculture industry has also used text mining in, for example, the classification of agricultural regulations (Espejo-Garcia et al. 2018 ), ontology-based agricultural text clustering (Su et al. 2012 ), and analysis of agricultural network public opinions (Lee 2019 ). Text mining has also been utilised in the detection of malicious web URLs which evolve over time and have complex features (Li et al. 2020a ; b , c ).

This paper discusses the use of text mining in the financial domain in detail, taking into consideration three major areas of application: financial forecasting, banking, and corporate finance. We also discuss the widely used methodologies and techniques for text mining in finance, the challenges faced by researchers, and the future scope for text-mining methods in finance.

Overview of text-mining methodologies

Text mining is a process through which the user derives high-quality information from a given piece of text. Text mining has seen a significant increase in demand over the last few years. Coupled with big data analytics, the field of text mining is evolving continuously. Finance is one major sector that can benefit from these techniques; the analysis of large volumes of financial data is both a need and an advantage for corporates, government, and the general public. This section discusses some important and widely used techniques in the analysis of textual data in the context of finance.

Sentiment analysis (SA)

One of the most important techniques in the field is SA. It has applications in numerous sectors. This technique extracts the underlying opinions within textual data and is therefore also referred to as opinion mining (Akaichi et al. 2013 ). It is of prime use in a number of domains, such as e-commerce platforms, blogs, online social media, and microblogs. The motives behind sentiment analysis can be broadly divided into emotion recognition and polarity detection. Emotion detection is focused on the extraction of a set of emotion labels, and polarity detection is more of a classifier-oriented approach with discrete outputs (e.g., positive and negative) (Cambria 2016 ).

There are two main approaches for SA, namely lexicon-based (dictionary-based) and machine learning (ML). The latter is further classified into supervised and unsupervised learning approaches (Xu et al. 2019 ; Pradhan et al. 2016 ). Lexicon-based approaches use SentiWordNet word maps, whereas ML considers SA as a classification problem and uses established techniques for it. In lexicon-based approaches, the overall score for sentiment is calculated by dividing the sentiment frequency by the sum of positive and negative sentiments. In ML approaches, the major techniques that are used are Naïve Bayes (NB) classifier and support vector machines (SVMs), which use labelled data for classification. SA using ML has an edge over the lexicon approach, as it doesn’t require word dictionaries that are highly costly. However, ML requires domain-specific datasets, which can be considered as a limitation (Al-Natour and Turetken 2020 ). After data preprocessing, feature selection is performed as per the requirement, following which one obtains the final results after the analysis of the given data as per the adopted approach (Hassonah et al. 2019 ).
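
A hedged sketch contrasting the two approaches just described. The tiny word lists, training sentences, and the (positives minus negatives) over (positives plus negatives) scoring variant are illustrative assumptions, not a real financial lexicon or labelled dataset.

```python
# Lexicon-based scoring vs. a supervised ML classifier for sentiment (illustrative sketch).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

POSITIVE = {"gain", "growth", "profit", "beat"}     # toy lexicon, not SentiWordNet
NEGATIVE = {"loss", "decline", "fraud", "miss"}

def lexicon_score(text):
    """Polarity in [-1, 1]: (positives - negatives) / (positives + negatives)."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

# ML approach: treat sentiment as supervised classification (here Naive Bayes on toy labels).
train_texts = ["quarterly profit beats forecast", "unexpected loss and steep decline",
               "strong growth in revenue", "fraud probe triggers sell-off"]
train_labels = ["positive", "negative", "positive", "negative"]

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(train_texts), train_labels)

print(lexicon_score("profit growth offsets small loss"))
print(clf.predict(vec.transform(["revenue growth accelerates"])))
```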

In the financial domain, stock market prediction is one of the applications in which SA has been used to predict future stock market trends and prices from the analysis of financial news articles. Joshi et al. ( 2016 ) compared three ML algorithms and observed that random forest (RF) and SVMs performed better than NB. Renault ( 2019 ) used StockTwits (a platform where people share ideas about the stock market) as a data source and applied five algorithms, namely NB, a maximum entropy method, a linear SVM, an RF, and a multilayer perceptron and concluded that the maximum entropy and linear SVM methods gave the best results. Over the years, researchers have combined deep learning methods with traditional machine learning techniques (e.g., construction of sentiment lexicon), thus obtaining more promising results (Yang et al. 2020 ).

Information extraction

Information extraction (IE) is used to extract predefined data types from a text document. IE systems mainly aim for object identification by extracting relevant information from the fragments and then putting all the extracted pieces in a framework. Post extraction, DiscoTEX (Discovery from TextEXtraction) is one of the core methods used to convert the structured data into meaningful data to discover knowledge from it (Salloum et al. 2018 ).

In finance, named-entity recognition (NER) is used for extracting predefined types of data from a document. In banking, transaction order documents of customers may come via fax, which results in very diverse documents because of the lack of a fixed template and creates the need for proper feature extraction to obtain a structured document (Emekligil et al. 2016 ).
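
As a concrete illustration of NER on a transaction-style sentence, the sketch below uses an off-the-shelf spaCy model (assumed installed via `pip install spacy` and `python -m spacy download en_core_web_sm`); the sentence itself is fabricated, and a production banking system would typically fine-tune or post-process such a model.

```python
# Off-the-shelf named-entity recognition applied to a fabricated transaction order (illustrative).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Please transfer USD 25,000 from ACME Ltd to Jane Doe at First National Bank by 3 March 2020.")

for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. ORG, PERSON, MONEY, DATE spans
```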

Natural language processing (NLP)

NLP is a part of the artificial intelligence domain and attempts to help transform imprecise and ambiguous messages into unambiguous and precise messages. In the financial sector, it has been used to assess a firm’s current and future performance, domain standards, and regulations. It is often used to mine documents to obtain insights for developing conclusions (Fisher et al. 2016 ). NLP can help perform various analyses, such as NER, which further helps in identifying the relationships and other information to identify the key concept. However, NLP lacks a dictionary list for all the named entities used for identification (Talib et al. 2016a ; b ).

As NLP is a pragmatic research approach to analyse the huge amount of available data, Xing et al. ( 2017 ) applied it to bridge the gap between NLP and financial forecasting by considering topics that would interest both the research fields. Figure  1 provides an intuitive grasp of natural language-based financial forecasting (NLFF).

Figure 1. An intersection of NLP and financial forecasting to illustrate the concept of NLFF (Xing et al. 2017).

Chen et al. ( 2020 ) discussed the role of NLP in FinTech in the past, present, and future. They reviewed three aspects, namely know your customer (KYC), know your product (KYP), and satisfy your customer (SYC). In KYC, a lot of textual data is generated in the process of acquiring information about customers (corporate sector and retail). With respect to KYP, salespersons are required to know all the attributes of their product, which again requires data in order to know the prospects, risks, and opportunities of the product. In SYC, salespersons/traders and researchers try to make the financial activities more efficient to satisfy the customers in the business-to-customer as well as customer-to-customer business models. Herranz et al. ( 2018 ) discussed the role of NLP in teaching finance and reported that it enhanced the transfer of knowledge within an environment overloaded with information.

Text classification

Text classification is a four-step process comprising feature extraction, dimension reduction, classifier selection, and evaluation. Feature extraction can be done with common techniques such as term frequency and Word2Vec; then, dimensionality reduction is performed using techniques such as principal component analysis and linear discriminant analysis. Choosing a classifier is an important step, and it has been observed that deep learning approaches have surpassed the results of other machine learning algorithms. The evaluation step helps in understanding the performance of the model; it is conducted using various parameters, such as the Matthews correlation coefficient (MCC), area under the ROC curve (AUC), and accuracy. Accuracy is the simplest of these to evaluate. Figure  2 shows an overview of the text classification process (Kowsari et al. 2019 ).

Figure 2. A general overview of the text classification process (Kowsari et al. 2019).

Brindha et al. ( 2016 ) compared the performance of various text classification techniques, namely NB, k-nearest neighbour (KNN), SVM, decision tree, and regression, and found that based on the precision, recall, and F1 measures, SVM provided better results than the others.
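
The four steps can be wired together in a few lines. The sketch below is an illustrative scikit-learn pipeline (TF-IDF features, a truncated-SVD reduction as a sparse-friendly stand-in for PCA, a linear SVM in line with Brindha et al.'s best performer, and accuracy as the evaluation metric); the texts and labels are placeholders, not a real dataset.

```python
# Four-step text classification: features -> dimension reduction -> classifier -> evaluation.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD          # PCA-like reduction that accepts sparse text features
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

texts = ["dividend raised after record earnings", "loan default rates climbing",
         "merger approved by regulator", "liquidity crunch hits lenders"] * 10
labels = ["positive", "negative", "positive", "negative"] * 10

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.25, random_state=0)

model = Pipeline([
    ("features", TfidfVectorizer()),                     # step 1: feature extraction
    ("reduce", TruncatedSVD(n_components=3, random_state=0)),  # step 2: dimension reduction
    ("classify", LinearSVC()),                           # step 3: classifier selection
])
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))  # step 4: evaluation
```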

Deep learning

Deep learning is a part of machine learning, which trains a data model to make predictions about new data. Deep learning has a layered architecture, where the input data goes into the lowest level and the output data is generated at the highest level. The input is transformed at the various middle levels by applying algorithms to extract features, transform features into factors, and then input the factors into the deeper layer again to obtain transformed features (Heaton et al. 2016 ). Widiastuti ( 2018 ) focused on the input data, as it plays an important role in the performance of any algorithm. The author concluded that modification of the network architecture with deep learning algorithms can markedly affect performance and provide good results.

In finance, deep learning solves the problem of complexity and ambiguity of natural language. Kraus and Feuerriegel ( 2017 ) used a corpus of 13,135 German ad hoc announcements in English to predict stock market movements and concluded that deep learning was better than the traditional bag-of-words approach. The results also showed that the long short-term memory models outperformed all the existing machine learning algorithms when transfer learning was performed to pre-train word embeddings.
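
A hedged sketch of the kind of LSTM classifier referenced above, using the Keras API (TensorFlow assumed installed). The vocabulary size, sequence length, and layer sizes are arbitrary choices, and pre-trained word embeddings could be loaded into the Embedding layer, as Kraus and Feuerriegel did via transfer learning.

```python
# LSTM over tokenised announcement text, predicting up/down price movement (illustrative sketch).
import tensorflow as tf

VOCAB_SIZE = 20000                                      # size of the tokeniser vocabulary (assumed)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),                       # padded sequences of 100 token ids (assumed)
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),         # could be initialised from pre-trained embeddings
    tf.keras.layers.LSTM(64),                           # long short-term memory layer over the sequence
    tf.keras.layers.Dense(1, activation="sigmoid"),     # probability of an upward movement
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# Training would then follow, e.g. model.fit(padded_token_ids, up_down_labels, epochs=5, batch_size=32)
```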

Review of text-mining applications in finance

As mentioned in earlier sections, this paper focuses on the applications of text mining in three sectors of finance, namely financial predictions, banking, and corporate finance. In the subsections, we review various studies. Some literature has been summarised in detail, and in the end, a tabular summary of some more studies is included. Figure  3 shows a summarised link between the text-mining techniques and their corresponding applications in the respective domains. Although the following subsections discuss the studies pertaining to each sector individually, there has also been research on techniques that can be applied to multiple financial sectors. One such system was proposed by Li et al. ( 2020a ), which was a classifier based on adaptive hyper-spheres. It could be helpful in tasks such as credit scoring, stock price prediction, and anti-fraud analysis.

Figure 3. An overview of how text mining can be used in the financial domain. The paper follows a systematic approach for reviewing text-mining applications, as depicted by the flowchart in the figure: the two independent entities, finance and text mining, are linked together to show the possible applications of various text-mining techniques in the respective financial domains.

Prediction of financial trends

Using the ever-expanding pool of textual data to improve the dynamics of the market has long been a practice in the financial industry. The increasing volume of press releases, financial data, and related news articles has been motivating continued and sophisticated analysis, dating back to the 1980s, in order to derive a competitive advantage (Xing et al. 2017 ). Abundant data investigated with text mining can deliver an advantage in a variety of scenarios. As per Tkáč and Verner ( 2016 ) and Schneider and Gupta ( 2016 ), among the many ideas covered in financial forecasting, from credit scoring to inflation rate prediction, a large proportion of focus is on stock market and forex prediction. Wen et al. ( 2019 ) proposed an idea regarding how retail investor attention can be used for evaluation of the stock price crash risk.

Wu et al. ( 2012 ) proposed a model that combined the features of technical analysis of stocks with sentiment analysis, as stock prices also depend on the decisions of investors who read stock news articles. They focused on obtaining the overall sentiment behind each news article and assigned it the respective sentiment based on the weight it carried. Next, using different indicators, such as price, direction, and volume, technical analysis was performed and the learning prediction model was generated. The model was used to predict Taiwan’s stock market, and the results proved to be more promising than models that employed either of the two. This indicates an efficient system that can be integrated with even better features in the future.

Al-Rubaiee et al. ( 2015 ) analysed the relationship between Saudi Twitter posts and the country’s stock market (Tadawul). They used a number of algorithms, such as SVM, KNN, and NB, to classify Arabic text for the purpose of stock trading. Their major focus was on properly preprocessing data before the analysis. By comparing the results, they found that SVM had the best recall, and KNN had the best precision. The one-to-one model that they built showcased the positive and negative sentiments as well as the closing values of the Tadawul All Share Index (TASI). The relationship between a rise in the TASI index and an increase in positive sentiments was found to be stronger than that between a decline in the index and negative sentiments. The researchers mentioned that in future work they would incorporate the Saudi stock market closing values and sentiment features on tweets to explore the patterns between the Saudi stock index and public opinion on Twitter.

Vijayan and Potey ( 2016 ) proposed a model based on recent news headlines that predicted the forex trends based on the given market situations. The information about the past forex currency pair trends was analysed along with the news headlines corresponding to that timeline, and it was assumed that the market would behave in the future as it had done in the past. The researchers focused on the elimination of redundancy, and their model focused on news headlines rather than on entire articles. Multilayer dimension reduction algorithms were used for text mining, the Synchronous Targeted Label Prediction algorithm was used for optimal feature reduction, and the J48 algorithm was used for the generation of decision trees. The main focus was on fundamental analysis that targeted unstructured textual data in addition to technical analysis to make predictions based on historical data. The J48 algorithm resulted in an improvement in the accuracy and performance of the overall system, better efficiency, and less runtime. In fact, the researchers reported that the algorithm could be applied to diverse subjects, such as movie reviews.

Nassirtoussi et al. ( 2015 ) proposed an approach for forex prediction wherein the major focus was on strengthening text-mining aspects that had not been focused upon in previous studies. Dimensionality reduction, semantic integration, and sentiment analysis enabled efficient results. The system predicted the directional movement of a currency pair based on news headlines in the sector from a few hours before. Again, headlines were taken into consideration for the analysis, and a multilayer algorithm was used to address semantics, sentiments, and dimensionality reduction. This model’s process was highly accurate, with results of up to 83%. The strong results obtained in that study demonstrate that the studied relationships exist. The models can be applied to other contexts as well.

Nikfarjam et al. ( 2010 ) discussed the components that constitute a forecasting model in this sector and the prototypes that had been recently introduced. The main components were compared with each other. Feature selection and feature weighting were used, either individually or in combination, to select news items and assign weights to them. Feature weighting was then used to calculate the weights for the given terms. The feature weighting methodology was based on the study by Fung et al. ( 2002 ), who assigned additional weights to enhance the term frequency-inverse document frequency (TF-IDF) weighting. For text classification, most researchers have applied SVMs to classify the input text into either good or bad news. Some researchers have used Bayesian classifiers, and some others have used a combination of binary classifiers to achieve the final classification decision. Many authors have focused on news features but have not equally addressed the available market data. The focus of most studies has been on the analysis of news and indicator values separately, which has proved to be less efficient. The combination of both market news and the status of market trends at the same time is expected to provide stronger results.
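
As a reference point for the weighting step discussed above, here is a minimal from-scratch TF-IDF computation (plain TF-IDF only; the enhanced weighting of Fung et al. is not reproduced), applied to tokenised toy headlines.

```python
# Minimal TF-IDF term weighting over a toy corpus of tokenised headlines (illustrative sketch).
import math
from collections import Counter

docs = [["rates", "cut", "boost", "stocks"],
        ["profit", "warning", "hits", "stocks"],
        ["rates", "held", "steady"]]

def tf_idf(docs):
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency per term
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # term frequency (normalised by document length) times inverse document frequency
        weights.append({t: (tf[t] / len(doc)) * math.log(n_docs / df[t]) for t in tf})
    return weights

for w in tf_idf(docs):
    print(w)
```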

Gupta et al. ( 2019 ) proposed a combination of two models: the primary model obtained the dataset for prediction, preprocessed the dataset using logistic regression to remove redundancy, and employed a genetic algorithm, KNN, and support vector regression (SVR). In a comparison of all three, KNN was the basis for their predictions, with an efficiency of more than 50%. The genetic algorithm was used next in search for better accuracy. In an attempt to further support the genetic algorithm, SVR was used, which gave the opening price for any day in the future. For sentiment analysis, Twitter was used, as it was considered the most popular source for related news. The model divided the tweets into two categories, and the rise or fall of the market was predicted taking into consideration the huge pool of keywords. In the end, the model had an accuracy of about 70–75%, which seems reasonable for a dynamic environment.

Nguyen et al. ( 2015 ) focused on sentiment analysis of social media. They obtained the sentiments behind specific topics of discussion about a company on social media and achieved promising results in comparison with the accuracy of stocks in the preceding year. Sentiments annotated by humans on social media with regard to stock prediction were analysed, and the percentage of desired sentiments was calculated for each class. For the remaining messages without explicit sentiments, a classification model was trained using the annotated sentiments on the dataset. For both of these tasks, an SVM was used as the classification model. In another study, after lemmatisation by CoreNLP, latent Dirichlet allocation (LDA) (Blei et al. 2003 ) was used as the generative probabilistic model. The authors also implemented the JST model (Lin and He 2009 ) and Aspect-based Sentiment Analysis for analysing topic sentiments for stock prediction. The study’s limitation was that the topics and models were selected beforehand. The accuracy was around 54%; however, the overall prediction in the model passed only if the stock went up or down. As the model just focused on sentiments and historical prices, the authors intended to add more factors to build a more accurate model.
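
A brief sketch of LDA topic modelling of the kind used in these studies, via gensim (assumed installed); the token lists stand in for preprocessed, lemmatised social-media messages, and the number of topics is an arbitrary choice.

```python
# Latent Dirichlet allocation over a toy corpus of preprocessed messages (illustrative sketch).
from gensim import corpora, models

texts = [["stock", "earnings", "growth"],
         ["dividend", "payout", "growth"],
         ["ceo", "resigns", "scandal"],
         ["regulator", "fines", "bank", "scandal"]]

dictionary = corpora.Dictionary(texts)                    # map tokens to integer ids
corpus = [dictionary.doc2bow(t) for t in texts]           # bag-of-words representation

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)
for topic_id, words in lda.print_topics():
    print(topic_id, words)                                # top weighted words per inferred topic
```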

Li et al. ( 2009 ) approached financial risk analysis through the available financial data on sentiments and used machine learning and sentiment analysis. The uniqueness of their study lay in the volume of data and the sentiment information used. A generalised autoregressive conditional heteroskedasticity (GARCH)-based artificial neural network and a GARCH-based SVM were used. A special training process, named the ‘dynamic training technique’, was applied because the data was non-stationary and noisy and could have resulted in overfitting. For analysing news, the semantic orientation-based approach was adopted, mainly because of the number of articles that were analysed in the study. The future work on this model was expected to include more input data and better sentiment analysis algorithms to obtain better results.

The use of sentiment analysis as a tool to facilitate investment and risk decisions by stock investors was demonstrated by Wu et al. ( 2014 ). Sina Finance, an experimental platform, was the basis for the collection of financial data for this model. The method incorporated machine learning based on SVM and GARCH with sentiment analysis. At the specific opening and closing times for each day, the GARCH-based SVM was used to identify the relations between the obtained information’s sentiment and stock price volatility. This model showed better results when predicting individual stocks rather than at the industry level. The machine learning approach was about 6% more accurate than the lexicon-based semantic approach, and it performed better with bigger datasets. The model performed better on datasets relating to small companies, as small companies were observed to be more sensitive to online reviews. The authors mentioned their future scope as expanding their dataset and attempting to create a more efficient sentiment calculation algorithm to increase the overall accuracy, similar to the one made by Li et al. ( 2009 ).

A slightly different approach was used by Ahmad et al. ( 2006 ), who focused on sentiment analysis of financial news streams in multiple languages. Three widely spoken languages, namely Arabic, Chinese, and English, were used for replication for automatic sentiment analysis. The authors adopted a local grammar approach using a local archive of the three languages. A statistical criterion in the training collection of texts helped in the identification of keywords. The most widely available corpus was for English, followed by Chinese and Arabic. Based on the frequencies of various words, the most widely utilised words were ranked and selected. Through manual evaluation, the accuracy of extraction ranged from 60 to 75%. A more robust evaluation of this model would be necessary for use in real-time markets, with the inclusion of more than one news vendor at a time.

Over the years, deep learning has become acknowledged as a useful machine learning technique that enables state-of-the-art results. It uses multiple layers to create representations and features from the input data. Text-mining analysis has also continuously evolved. Early basic models used lexicon-based analysis to account for a particular entity (sentiment analysis). Considering the complexity of language, a complete understanding of what any piece of text aims to convey requires a more complex analysis to identify and target relevant entities and related aspects (Dohaiha et al. 2018 ). The most important aspect is the relationship between the words in the text, and how that relationship determines the meaning of the content. Several language elements, such as implications (Ray and Chakrabarti 2019 ) and sarcasm, require high-level methods to handle. This problem calls for deep learning models that can help completely understand a given piece of text. Deep learning may incorporate time series analysis and aspect-based sentiment analysis, which enhances data mining, feature selection, and fast information retrieval. Deep learning models learn features during the process of learning. They create abstract representations of the given data and are therefore robust to local changes in the input data (Sohangir et al. 2018 ). Word embeddings target words that are similar in context. By measuring similarities between words (e.g., cosine similarity in the case of vectors), one can employ word embeddings in the initial data preprocessing layers for faster and more efficient NLP execution (Young et al. 2018 ).
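
To make the cosine-similarity point concrete, a small NumPy sketch follows; the three vectors are made-up stand-ins for learned word embeddings, not output from any particular embedding model.

```python
# Cosine similarity between (toy) word embedding vectors (illustrative sketch).
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

emb = {
    "profit": np.array([0.9, 0.1, 0.3]),
    "gain":   np.array([0.8, 0.2, 0.35]),
    "loss":   np.array([-0.7, 0.1, 0.2]),
}
print(cosine(emb["profit"], emb["gain"]))   # close to 1: words used in similar contexts
print(cosine(emb["profit"], emb["loss"]))   # much lower: dissimilar contexts
```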

The huge volume of streaming financial news and articles is impossible for humans to process, interpret, and apply on a daily basis. In a number of uses, such as portfolio construction, forecasting a financial time series is essential. The application of DL techniques on such data for forecasting purposes is of interest to industry professionals. It has been reported that repeated patterns of price movements can be estimated using econometric and statistical models (Souma et al. 2019 ). Even though the market is dynamic, a combination of deep learning models and past market trends is very useful for accurate predictions. In a comparison of real trades with market trades generated with the use of SA, Kordonis et al. ( 2016 ) found a considerable effect of sentiments on the predictions. Because of the promising results, the use of artificial intelligence and deep learning has attracted the interest of many researchers and practitioners to improve forecasting.

With the use of deep learning, one has to perform little work by hand, while being able to harness a large amount of computation and data. DL techniques that use distributed representation are considered state-of-the-art methods for a large variety of NLP problems. We expect these models to improve and get better at handling unlabelled data through the development and use of approaches such as reinforcement learning.

Owing to advances in technology, many factors can now be used in models that aim to predict market movements. Beyond price-based models, a number of related models include macroeconomic variables (e.g., investment). Although macroeconomic indicators are important, they tend to be updated infrequently. Unlike such economic factors, public mood and sentiment (Xing et al. 2018a, b) are dynamic and can be monitored almost instantaneously. For instance, behavioural science researchers have found that the stock market is affected by investor psychology (Daniel et al. 2001): depending on their mood states, investors make numerous decisions, a large proportion of which are risky. The impact of sentiment and attention measures on stock market volatility (Audrino et al. 2018) can be gauged through news articles, social media, and search engine results. Models that combine technical market indicators with sentiment obtained from these sources outperform those that rely on only one of the two (Li et al. 2009). In a study of optimal portfolio allocation, Malandri et al. (2018) combined historical data from the New York Stock Exchange with sentiment data and obtained comparatively better returns for the portfolios under consideration.

Empirical studies have shown that current market prices reflect recently published news, in line with the Efficient Market Hypothesis (Fama 1991). Rather than depending on existing information, price changes are markedly affected by new information. ML and DL methods have allowed data scientists to contribute to financial sector analysis and prediction (Picasso et al. 2019), and text-mining methods are increasingly used to inform trading decisions (Wu et al. 2012). Different kinds of models, including neural networks, are used to derive sentiment embeddings from news, tweets, and financial blogs. Mudinas et al. (2019) examined whether sentiment alone Granger-causes stock price changes; this did not yield promising results on its own, but integrating sentiment with prediction models produced better results. Sentiment is therefore not a determinant factor by itself, but it can be combined with prediction models to obtain better and more dynamic results.
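
The kind of Granger-causality check referred to above can be sketched on synthetic data as follows. The series, lag order, and statsmodels-based setup are illustrative assumptions, not a reproduction of the cited study; real analyses would use aligned daily returns and sentiment scores.

```python
# Rough sketch of a Granger-causality test between a sentiment series and returns,
# using toy synthetic data (not real market or sentiment data).
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
sentiment = rng.normal(size=200)
# Returns partly driven by yesterday's sentiment, plus noise (synthetic construction)
returns = 0.4 * np.roll(sentiment, 1) + rng.normal(scale=0.5, size=200)

data = pd.DataFrame({"returns": returns, "sentiment": sentiment})
# H0 at each lag: "sentiment does NOT Granger-cause returns"
results = grangercausalitytests(data[["returns", "sentiment"]], maxlag=2)
for lag, (tests, _) in results.items():
    print("lag", lag, "ssr F-test p-value:", round(tests["ssr_ftest"][1], 4))
```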

As discussed above, a plethora of proposals and approaches in relation to financial forecasting have been studied, the two main applications of which have been stock prediction and forex. The main focus of these studies was on obtaining sentiments from news headlines and not from entire articles. Researchers have used a variety of text-mining approaches to integrate the abundant amount of useful information with financial patterns. Table  1 summarises some more research studies that have been conducted in recent years on the subject of text mining in financial predictions.

Banking and related applications

Banking is one of the largest and fastest-growing industries in this era of globalisation, and it is moving towards adopting the most efficient practices in each of its departments. Total lending increased to US $1347.18 billion in the financial year 2017–2018, from US $429.92 billion, at a CAGR of 10.94% (Ministry of Commerce and Industry, Government of India, 2019). This huge rise is promoting strong economic growth, increasing incomes, trouble-free access to bank credit, and greater consumerism. In the midst of an IT revolution, competitive pressures have driven the rising importance and adoption of banking automation. IT enables various risk-control techniques and the smooth flow of transactions over electronic media, and it supports financial product innovation and development.

Gao and Ye ( 2007 ) proposed a framework for preventing money laundering with the help of the transaction histories of customers. They did this by identifying suspicious data from various textual reports from law enforcement agencies. They also mined unstructured databases and text documents for knowledge discovery in order to automatically extract the profiles of the entities that could be involved in money laundering. They employed SVM, decision trees, and Bayesian inference to develop a hierarchical structure of the suspicious reports and regression to identify hidden patterns.

Bholat et al. (2015) analysed the utility of text mining for central banks (CBs), which need a wide range of data sources to evaluate monetary and financial stability and to achieve policy objectives; for such tasks, text-mining techniques are more powerful than manual means. The authors elucidated two major aspects: the use of text as data for research purposes in CBs, and the text-mining techniques available for this purpose. For the former, they suggested that textual data in the form of social narratives can serve as financial indicators for risk and uncertainty management by applying topic clustering to the narratives. The latter aspect involved preprocessing the data to de-duplicate it, convert it into text files, and reduce it to tokens using various tokenisation techniques. Thereafter, text-mining techniques such as dictionary methods, vector space models, latent semantic analysis, LDA, and the NB algorithm were applied to the tokenised data. The authors concluded that, in aggregate, these techniques can be a very useful addition to the efficient functioning of a CB.
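
A minimal sketch of the tokenise-then-model step described above is given below, using scikit-learn's CountVectorizer and LatentDirichletAllocation on a few invented central-bank-style snippets. The corpus, number of topics, and parameter choices are assumptions for illustration only.

```python
# Hedged sketch: tokenisation plus LDA topic extraction on an invented mini-corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "inflation outlook and interest rate decision",
    "bank capital requirements and stress testing",
    "interest rate path amid inflation pressures",
    "liquidity coverage and capital buffers for banks",
]

vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)            # document-term matrix of token counts

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-4:]]
    print(f"topic {k}:", top_terms)             # most heavily weighted terms per topic
```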

Bach et al. (2019) stated that the huge amount of unstructured data from various sources has created a need for keyword extraction in the banking sector. They described four procedures for keyword extraction, drawn from the study by Bharti and Babu (2017), and discussed how keyword extraction can be used to retrieve relevant comments and documents and to compare banking institutions. They also reviewed other text-mining techniques that banks can utilise. NER was applied to large datasets to extract entities such as persons, locations, and organisations. Sentiment analysis was used to analyse customer opinions, which are crucial to a bank's functioning. Topic extraction was found to be useful mainly in credit banking. Social network analysis, a graph theory-based methodology for studying the structure of social media users, showed how customers are connected on social media and how influential they are in spreading information through networks of interest. Such social network analysis can then be coupled with text mining to identify keywords corresponding to customers' common interests.
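
As an illustration of the entity-extraction step mentioned here, the following sketch uses spaCy's pretrained English pipeline on an invented sentence. The model name and example text are assumptions, and production systems would typically use domain-adapted models.

```python
# Sketch of NER of the kind described (persons, organisations, locations),
# using spaCy's small English model on an invented sentence.
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes the model has been downloaded
doc = nlp("John Smith transferred funds from Acme Bank's London branch on Friday.")

for ent in doc.ents:
    print(ent.text, ent.label_)      # e.g. PERSON, ORG, GPE, DATE
```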

Yap et al. (2011) discussed the problem recreational clubs face in distinguishing potential defaulters from non-defaulters. They proposed a credit scoring model that utilised text mining to estimate the financial obligations of credit applicants. A scorecard was built from the past performance reports of borrowers, with different clubs using different criteria to evaluate the historical data. The data were split in a 70:30 ratio for training and validation, respectively. Three models were compared, namely a credit scorecard model, a logistic regression model, and a decision tree model, with accuracy rates of 72.0%, 71.9%, and 71.2%, respectively. Although the approach benefitted club administration, it had limitations, such as the limited quality of the scorecard and the bias introduced by evaluating new applicants with a model built on historical samples.
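
The 70:30 train/validate comparison of scoring models can be sketched roughly as follows on synthetic data. The features, models, and parameters below are illustrative assumptions, not the authors' dataset or exact models.

```python
# Illustrative 70:30 credit-scoring comparison on synthetic data (not the cited study).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("decision tree", DecisionTreeClassifier(max_depth=4))]:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```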

Xiong et al. (2013) devised a model for personal bankruptcy prediction using sequence mining techniques; the resulting sequences showed good predictive ability, giving the model potential value in many industries. A model-based k-means algorithm was designed for clustering categorical sequences. A comparative study of three approaches, namely SVM, credit scoring, and the proposed model, found accuracies of 89.3%, 80.54%, and 94.07%, respectively, so the sequence mining used in the proposed model outperformed the other two. For loss prediction, the KNN algorithm showed promising ability to identify bad accounts.

Bhattacharyya et al. (2011) explored the use of text mining in credit card fraud detection by evaluating two predictive models: one based on SVM, and the other combining random forest with logistic regression. They discussed various challenges in implementing the models and recommended that the models be kept up to date to account for evolving malpractice. The original dataset comprised more than 50 million real-world credit card transactions and was split into multiple datasets according to the requirements of the different techniques. Because the data were imbalanced, performance was measured not only by overall accuracy but also by sensitivity, specificity, and the area under the curve. Although the random forest model showed the highest overall accuracy of 96.2%, the study yielded other noteworthy observations: the accuracy of each model varied with the proportion of fraudulent cases, and all models exceeded 99% accuracy on a dataset with a 2% fraud rate. The authors concluded with suggestions for future work: modifying the models to make them more accurate and devising a more reliable approach for splitting datasets into training and testing sets.
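
Because accuracy alone is misleading on imbalanced fraud data, the following toy sketch shows how sensitivity, specificity, and AUC can be computed alongside accuracy. The labels and scores are synthetic assumptions, not the study's transactions.

```python
# Toy illustration of evaluation on imbalanced fraud data:
# accuracy, sensitivity, specificity, and AUC from synthetic predictions.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
y_true  = np.array([0] * 95 + [1] * 5)              # 5% fraud rate (synthetic)
y_score = np.where(y_true == 1,
                   rng.uniform(0.4, 1.0, 100),      # fraud cases score higher
                   rng.uniform(0.0, 0.6, 100))      # legitimate cases score lower
y_pred  = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy   :", (tp + tn) / len(y_true))
print("sensitivity:", tp / (tp + fn))               # recall on the fraud class
print("specificity:", tn / (tn + fp))
print("AUC        :", roc_auc_score(y_true, y_score))
```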

Kou et al. ( 2014 ) used data regarding credit approval and bankruptcy risk from credit card applications to analyse financial risks using clustering algorithms. They made evaluations based on 11 performance measures using multicriteria decision-making (MCDM) methods. A previous study by Kou et al. ( 2012 ) had proposed these MCDM methods for the evaluation of classification algorithms. In a later study (Kou et al. 2019 ), they employed these methods for assessing the feature selection methods for text classification.

In addition to the above-discussed literature in this section, Table  2 provides a summary of some more studies related to the banking finance industry. As visible in Table  2 , banking has a lot of different text-mining applications. Risk assessment, quality assessment, money laundering detection, and customer relationship management are just a few examples from the wide pool of possible text-mining applications in banking.

Applications in corporate finance

Corporate finance is an important aspect of the financial domain because it integrates a company’s functioning with its financial structure. Various corporate documents such as the annual reports of a company have a lot of hidden financial context. Text-mining techniques can be employed to extract this hidden information and also to predict the company’s future financial sustainability.

Guo et al. (2016) implemented text-mining algorithms that are widely used in accounting and finance. They merged the Thomson Reuters News Archive database with the News Analytics database; the former provides the original news, and the latter provides sentiment scores ranging from −1 to 1 for positive, negative, and neutral sentiment. To balance the dataset, 3000 news articles were randomly selected for training and 500 for testing. Three algorithms, namely NB, SVM, and a neural network, were run on the dataset, achieving overall accuracies of 58.7%, 78.2%, and 79.6%, respectively. As the neural network had the highest accuracy, the authors concluded that it is well suited to text mining-based finance studies. Another model based on semantic analysis, using LDA, was also implemented to extract document relationships and the most relevant information from the documents. According to the authors, this technique has proven advantageous in accounting and finance for examining analyst reports and financial reporting.

Lewis and Young (2019) discussed the importance of text mining for financial reports, favouring NLP methods. They highlighted the explosive growth of unstructured textual data in corporate reporting, which opens numerous possibilities for financial applications. According to the authors, NLP methods for text mining address two significant problems: first, they prevent overload by providing automated procedures for dealing with immense amounts of data; second, unlike human readers, they can identify important underlying latent features. The authors reviewed the methodologies widely used for financial reporting, including keyword searches and word counts, attribute dictionaries, NB classification, cosine similarity, and LDA. Factors such as limited access to text data resources and insufficient collaboration across sectors and disciplines were identified as challenges hindering progress in applying text mining to finance.
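
One of the methods listed, cosine similarity between report texts, can be sketched briefly as follows using TF-IDF vectors. The report snippets are invented placeholders rather than real filings.

```python
# Brief sketch of document-level cosine similarity over TF-IDF vectors
# (invented report snippets, illustration only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reports = [
    "revenue grew while operating costs declined",
    "revenue increased and costs were reduced",
    "the board approved a new sustainability policy",
]

tfidf = TfidfVectorizer().fit_transform(reports)
print(cosine_similarity(tfidf[0], tfidf[1]))  # similar wording, higher score
print(cosine_similarity(tfidf[0], tfidf[2]))  # different topic, lower score
```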

Arguing that corporate sustainability reports (CSR) have increased dramatically, become crucial from the financial reporting perspective, and are not amenable to manual analysis processes, Shahi et al. ( 2014 ) proposed an automated model based on text-mining approaches for more intelligent scoring of CSR reports. After preprocessing of the dataset, four classification algorithms were implemented, namely NB, random subspace, decision table, and neural networks. Various parameters were evaluated and the training categories and feature selection algorithms were tuned to determine the most effective model. NB with the Correlation-based Feature Selection (CFS) filter was chosen as the preferred model. Based on this model, software was designed for CSR report scoring that lets the user input a CSR report to get its score as an automated output. The software was tested and had an overall effectiveness of 81.10%. The authors concluded that the software could be utilised for other purposes such as the popularity of performance indicators as well.

Holton ( 2009 ) implemented a model for preventing corporate financial fraud with a different and interesting perspective. The author considered employee disgruntlement or employee dissatisfaction as a hidden indicator that is responsible for fraud. A minimal dataset of intra-company communication messages and emails on online discussion groups was prepared. After using document clustering for estimating that the data possess sufficient predictive power, the NB classifier was implemented to classify the messages into disgruntled/non-disgruntled classes, and an accuracy of 89% was achieved. The author proposed the use of the model for fraud risk assessment in corporations and organisations with the motivation that it can be used to prevent huge financial losses. The performance of other models such as neural networks and decision trees was to be compared in future work.

Chan and Franklin (2011) developed a new decision-support system that predicts the occurrence of an event by analysing patterns and extracting sequences from financial reports. After text preprocessing, textual information was generalised with the help of a shallow parser, which achieved an F-measure of 85%. The extracted information was stored in a separate database, from which event sequences were identified and extracted. A decision tree model was then applied to these sequences to create an inference engine that could predict the occurrence of new events based on the training sequences. With an 85:15 training-to-testing split, the model achieved an overall accuracy of 89.09%. The authors concluded that their model performed better and more robustly than prevailing models.

Humpherys et al. ( 2011 ) reviewed various text-mining methods and theories that have been proposed for the detection of corporate fraud in financial statements and subsequently devised a methodology of their own. Their dataset comprised the Management’s Discussion and Analysis section of corporate annual financial reports. After basic analysis and reduction, various statistical and machine learning algorithms were implemented on the dataset, among which the NB and C4.5 decision tree models both gave the highest accuracy of 67.3% for classifying 10-K reports into fraudulent and non-fraudulent. The authors suggested that their model can be used by auditors for detecting fraudulent statements in reports with the aid of the Agent99 analyser tool.

Loughran and McDonald (2011) argued that the word lists in the Harvard Dictionary, which is commonly used for textual analysis, are not suitable for financial text classification because many words that are negative in the Harvard list are not actually negative in a financial context. Using corporate 10-K reports as the data source, they created a new dictionary with word lists tailored to financial purposes and advised applying term weighting to the lists. The new word lists were compared with the Harvard word lists on multiple financial data items, such as 10-K filing returns, material weaknesses, and standardised unexpected earnings. Although no significant classification difference was observed between the word lists, the authors still recommended using their lists as a precaution against erroneous results.
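
Dictionary-based tone scoring of the kind discussed here can be sketched as follows. The word list below is a small invented stand-in, not the actual Loughran–McDonald lexicon, and real applications would also apply the term weighting the authors recommend.

```python
# Toy illustration of dictionary-based negative-tone scoring of a filing sentence;
# the word list is a placeholder, not the real financial lexicon.
import re

negative_words = {"loss", "impairment", "litigation", "default", "restatement"}

def negative_tone(text: str) -> float:
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for t in tokens if t in negative_words)
    return hits / len(tokens) if tokens else 0.0

filing = "The company recorded an impairment loss and faces ongoing litigation."
print(negative_tone(filing))   # share of negative words in the text
```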

Whereas other researchers have mostly focused on fraud detection and financial prediction from corporate financial reports, Song et al. (2018) focused on sentiment analysis of these reports with respect to the CSR score. Sentences in the sample reports were manually labelled as positive or negative to create training data for the machine learning algorithm. SVM was applied to the dataset with a 3:1 training-to-test split and achieved a precision ratio of 86.83%. An object library was then created, with objects referring to the internal and external environment of the company, and sentiment analysis was conducted on these objects. Six regression models were then developed to estimate the CSR score, with the model comprising the Political, Economic, Social, Technological, Environmental and Legal (PESTEL) factors, Porter's Five Forces, and Primary and Support Activities showing the best performance. The authors concluded that CSR plays a vital role in a company's sustainability and that their research could aid stakeholders in company-related decision-making.

There have been more studies on CSR reports and sustainability. Liew et al. ( 2014 ) analysed process industries for their sustainability trends with the help of CSR and sustainability reports of a large number of big companies. The RapidMiner tool was used for text preprocessing followed by generating frequency statistics, pruning, and further text refinement, which generated sustainability-related terms for analysis. The most occurring terms were taken into consideration to create a hierarchical tree model. Environment, health and safety, and social were identified as the key concepts for sustainability. Based on term occurrence and involvement, the authors classified the sustainability issues as specific, critical, rare, and general.

Table  3 presents some more studies on the applications of text mining in corporate finance. As evident from the table and the above-mentioned studies, the annual corporate reports are the most commonly used data source for text-mining applications.

Challenges and future scope

The financial sector is a significant driver of broader industry, and the increasing amount of data in this field has given rise to a number of applications that can be used to improve the field and achieve commercial objectives.

Figure 4 shows some common challenges faced by text-mining techniques in the financial sector. The huge amount of available data is highly unstructured and carries implicit as well as explicit meaning, so it needs proper preprocessing before it can be analysed. Although lexicon lists are available for various domains, lexicon-based approaches in finance require a domain-specific dictionary so that appropriate weights can be assigned to the corresponding aspects of a document. In addition, access to classified information remains restricted, which is a significant obstacle. Lastly, current techniques focus on producing static results that hold only for a given period of time; a system that applies text-mining techniques to dynamically obtained data and outputs real-time results would enable even better insights.

Figure 4. Major challenges to text mining in finance

The combination of text-mining techniques and financial data analytics can produce a model that can potentially be the most efficient model for this problem domain. The results obtained from mining textual data can be integrated with those from financial analysis, thereby providing models that focus on historical data as well as opinions from diverse sources.

This paper conducted an organised qualitative review of recent literature pertaining to three specific sectors of finance. First, this paper analysed the growing importance of text mining in predicting financial trends. While the prior consensus may have been that financial markets are unpredictable, text mining has challenged this notion. The second area of study was banking, which has seen constant growth in technological innovation over the years, especially in digitisation. Text mining has played a key role in supporting these advancements both directly and indirectly through combination with other technologies. Corporate finance was the third study area. We discussed the importance of text mining in enabling the utilisation of corporate reports and financial statements for serving various purposes in addition to supporting corporate sustainability goals. The use of text mining in financial applications is not limited to these sectors. Researchers are increasingly showing interest in text-mining applications and constantly seeking to build more accurate models. There are still many unexplored possibilities in the financial domain, and the related research can help develop more robust and accurate predictive and analytic systems.

Availability of data and materials

All relevant data and material are presented in the main paper.

Agichtein E, Castillo C, Donato D, Gionis A, Mishne G (2008) Finding high-quality content in social media. In: Proceedings of the international conference on web search and web data mining—WSDM ’08. https://doi.org/10.1145/1341531.1341557

Ahir K, Govani K, Gajera R, Shah M (2020) Application on virtual reality for enhanced education learning, military training and sports. Augment Hum Res 5:7


Ahmad K, Cheng D, Almas Y (2006) Multi-lingual sentiment analysis of financial news streams. In: Proceedings of science, pp 1–8

Akaichi J, Dhouioui Z, López-Huertas Pérez MJ (2013) Text mining facebook status updates for sentiment classification. In: 2013 17th international conference on system theory, control and computing (ICSTCC), Sinaia, 2013, pp 640–645. https://doi.org/10.1109/ICSTCC.2013.6689032

Al-Natour S, Turetken O (2020) A comparative assessment of sentiment analysis and star ratings for consumer reviews. Int J Inf Manage. https://doi.org/10.1016/j.ijinfomgt.2020.102132

AL-Rubaiee H, Qiu R, Li D (2015) Analysis of the relationship between Saudi twitter posts and the Saudi stock market. In: 2015 IEEE seventh international conference on intelligent computing and information systems (ICICIS). https://doi.org/10.1109/intelcis.2015.7397193

Audrino F, Sigrist F, Ballinari D (2018) The impact of sentiment and attention measures on stock market volatility. Available at SSRN: https://ssrn.com/abstract=3188941 or https://doi.org/10.2139/ssrn.3188941

Aureli S (2017) A comparison of content analysis usage and text mining in CSR corporate disclosure. Int J Digit Account Res 17:1–32

Bach MP, Krsti Z, Seljan S, Turulja L (2019) Text mining for big data analysis in financial sector: a literature review. Sustainability 2019(11):1277

Bharti SK, Babu KS (2017) Automatic keyword extraction for text summarization: a survey. CoRR. abs/1704.03242.

Bhattacharyya S, Jha S, Tharakunnel K, Westland JC (2011) Data mining for credit card fraud: a comparative study. Decis Support Syst 50(3):602–613

Bholat D, Hansen S, Santos P, Schonhardt-Bailey C (2015) Text mining for central banks: handbook. Centre Cent Bank Stud 33:1–19


Bidulya Y, Brunova E (2016) Sentiment analysis for bank service quality: a rule-based classifier. In: 2016 IEEE 10th international conference on application of information and communication technologies (AICT). https://doi.org/10.1109/icaict.2016.7991688

Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(2003):993–1022

Brindha S, Prabha K, Sukumaran S (2016) A survey on classification techniques for text mining. In: 2016 3rd international conference on advanced computing and communication systems (ICACCS), Coimbatore, 2016, pp 1–5. https://doi.org/10.1109/ICACCS.2016.7586371

Bruno G (2016) Text mining and sentiment extraction in central bank documents. In: 2016 IEEE international conference on big data (big data). https://doi.org/10.1109/bigdata.2016.7840784

Cambria E (2016) Affective computing and sentiment analysis. IEEE Intell Syst 31(2):102–107. https://doi.org/10.1109/MIS.2016.31

Chakraborty V, Chiu V, Vasarhelyi M (2014) Automatic classification of accounting literature. Int J Account Inf Syst 15(2):122–148

Chan SWK, Franklin J (2011) A text-based decision support system for financial sequence prediction. Decis Support Syst 52(1):189–198

Chaturvedi D, Chopra S (2014) Customers sentiment on banks. Int J Comput Appl 98(13):8–13

Chen CC, Huang HH, Chen HH (2020) NLP in FinTech applications: past, present and future

Cook A, Herron B (2018) Harvesting unstructured data to reduce anti-money laundering (AML) compliance risk, pp 1–10

Daniel K, Hirshleifer D, Teoh S (2001) Investor psychology in capital markets: evidence and policy implications. J Monet Econ 49:139–209. https://doi.org/10.1016/S0304-3932(01)00091-5

Da-sheng W, Qin-fen Y, Li-juan L (2009) An efficient text classification algorithm in E-commerce application. In: 2009 WRI world congress on computer science and information engineering. https://doi.org/10.1109/csie.2009.346

David JM, Balakrishnan K (2011) Prediction of key symptoms of learning disabilities in school-age children using rough sets. Int J Comput Electr Eng Hong Kong 3(1):163–169

Dohaiha H, Prasad PWC, Maag A, Alsadoon A (2018) Deep learning for aspect-based sentiment analysis: a comparative review. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2018.10.003

Elagamy MN, Stanier C, Sharp B (2018) Stock market random forest-text mining system mining critical indicators of stock market movements. In: 2018 2nd international conference on natural language and speech processing (ICNLSP). https://doi.org/10.1109/icnlsp.2018.8374370

Emekligil E, Arslan S, Agin O (2016) A bank information extraction system based on named entity recognition with CRFs from noisy customer order texts in Turkish. In: Knowledge engineering and semantic web, pp 93–102

Espejo-Garcia B, Martinez-Guanter J, Pérez-Ruiz M, Lopez-Pellicer FJ, Javier Zarazaga-Soria F (2018) Machine learning for automatic rule classification of agricultural regulations: a case study in Spain. Comput Electron Agric 150:343–352

Fama EF (1991) Efficient capital markets: II. J Finance 46(5):1575–1617. https://doi.org/10.2307/2328565

Fan W, Wallace L, Rich S, Zhang Z (2006) Tapping the power of text mining. Commun ACM 49(9):76–82

Feuerriegel S, Gordon J (2018) Long-term stock index forecasting based on text mining of regulatory disclosures. Decis Support Syst 112:88–97

Fisher I, Garnsey M, Hughes M (2016) Natural language processing in accounting, auditing and finance: a synthesis of the literature with a roadmap for future research. Intell Syst Account Finance Manag. https://doi.org/10.1002/isaf.1386

Fritz D, Tows E (2018) Text mining and reporting quality in German banks—a cooccurrence and sentiment analysis. Univers J Account Finance 6(2):54–81

Fung G, Yu J, Lam W (2002) News sensitive stock trend prediction. Adv Knowl Discov Data Min. https://doi.org/10.1007/3-540-47887-6_48

Gandhi M, Kamdar J, Shah M (2020) Preprocessing of non-symmetrical images for edge detection. Augment Hum Res 5:10. https://doi.org/10.1007/s41133-019-0030-5

Gao Z, Ye M (2007) A framework for data mining-based anti-money laundering research. J Money Laund Control 10(2):170–179

Gemar G, Jiménez-Quintero JA (2015) Text mining social media for competitive analysis. Tour Manag Stud 11(1):84–90

Gulaty M (2016) Aspect-based sentiment analysis in bank reviews. https://doi.org/10.13140/RG.2.1.2072.3445

Guo L, Shi F, Tu J (2016) Textual analysis and machine leaning: crack unstructured data in finance and accounting. J Finance Data Sci 2(3):153–170

Gupta R, Gill NS (2012) Financial statement fraud detection using text mining. Int J Adv Comput Sci Appl 3(12):189–191

Gupta A, Simaan M, Zaki MJ (2016) Investigating bank failures using text mining. In: 2016 IEEE symposium series on computational intelligence (SSCI). https://doi.org/10.1109/ssci.2016.7850006

Gupta A, Bhatia P, Dave K, Jain P (2019) Stock market prediction using data mining techniques. In: 2nd international conference on advances in science and technology, pp 1–5

Hagenau M, Liebmann M, Neumann D (2013) Automated news reading: stock price prediction based on financial news using context-capturing features. Decis Support Syst 55(3):685–697

Hájek P, Olej V (2013) Evaluating sentiment in annual reports for financial distress prediction using neural networks and support vector machines. In: Communications in computer and information science, pp 1–10.

Hariri RH, Fredericks EM, Bowers KM (2019) Uncertainty in big data analytics: survey, opportunities, and challenges. J Big Data. https://doi.org/10.1186/s40537-019-0206-3

Hassonah M, Al-Sayyed R, Rodan A, Al-Zoubi A, Aljarah I, Faris H (2019) An efficient hybrid filter and evolutionary wrapper approach for sentiment analysis of various topics on Twitter. Knowl Based Syst. https://doi.org/10.1016/j.knosys.2019.105353

Heaton JB, Polson NG, Witte JH (2016) Deep learning in finance. arXiv:1602.06561

Heidari M, Felden C (2015) Financial footnote analysis: developing a text mining approach. In: Int'l conf. data mining, pp 10–16

Herranz S, Palomo J, Cruz M (2018) Building an educational platform using NLP: a case study in teaching finance. J Univ Comput Sci 24:1403

Holton C (2009) Identifying disgruntled employee systems fraud risk through text mining: a simple solution for a multi-billion dollar problem. Decis Support Syst 46(4):853–864

Humpherys SL, Moffitt KC, Burns MB, Burgoon JK, Felix WF (2011) Identification of fraudulent financial statements using linguistic credibility analysis. Decis Support Syst 50(3):585–594

IBEF (2019) https://www.ibef.org/download/financial-services-april-2019.pdf

James TL, Calderon EDV, Cook DF (2017) Exploring patient perceptions of healthcare service quality through analysis of unstructured feedback. Expert Syst Appl 71:479–492

Jani K, Chaudhuri M, Patel H, Shah M (2019) Machine learning in films: an approach towards automation in film censoring. J Data Inf Manag. https://doi.org/10.1007/s42488-019-00016-9

Jaseena KU, David JM (2014) Issues, challenges, and solutions: big data mining. In: Natarajan Meghanathan et al. (eds) NeTCoM, CSIT, GRAPH-HOC, SPTM—2014, pp 131–140

Jha K, Doshi A, Patel P, Shah M (2019) A comprehensive review on automation in agriculture using artificial intelligence. Artif Intell Agric 2:1–12

Joshi K, Bharathi N, Jyothi R (2016) Stock trend prediction using news sentiment analysis. Int J Comput Sci Inf Technol 8:67–76. https://doi.org/10.5121/ijcsit.2016.8306

Junqué de Fortuny E, De Smedt T, Martens D, Daelemans W (2014) Evaluating and understanding text-based stock price prediction models. Inf Process Manag 50(2):426–441

Kakkad V, Patel M, Shah M (2019) Biometric authentication and image encryption for image security in cloud framework. Multiscale Multidiscip Model Exp Des. https://doi.org/10.1007/s41939-019-00049-y

Kamaruddin SS, Hamdan AR, Bakar AA (2007) Text mining for deviation detection in financial statements. In: Proceedings of the international conference on electrical engineering and informatics. Institut Teknologi Bandung, Indonesia, 2007, June 17–19

Kang T, Park DH (2016) The effect of expert reviews on consumer product evaluations: a text mining approach. J Intell Inf Syst 22(1):63–82

Kinsella S, Passant A, Breslin JG (2011) Topic classification in social media using metadata from hyperlinked objects. Adv Inf Retr. https://doi.org/10.1007/978-3-642-20161-5_20

Kloptchenko A, Eklund T, Karlsson J, Back B, Vanharanta H, Visa A (2004) Combining data and text mining techniques for analysing financial reports. Intell Syst Account Finance Manag 12(1):29–41

Kordonis J, Symeonidis S, Arampatzis A (2016) Stock price forecasting via sentiment analysis on twitter. https://doi.org/10.1145/3003733.3003787 .

Kou G, Lu Y, Peng Y, Shi Y (2012) Evaluation of classification algorithms using MCDM and rank correlation. Int J Inf Technol Decis Mak. https://doi.org/10.1142/S0219622012500095

Kou G, Peng Yi, Wang G (2014) Evaluation of clustering algorithms for financial risk analysis using MCDM methods. Inf Sci 275:1–12. https://doi.org/10.1016/j.ins.2014.02.137

Kou G, Yang P, Peng Yi, Xiao F, Chen Y, Alsaadi F (2019) Evaluation of feature selection methods for text classification with small datasets using multiple criteria decision-making methods. Appl Soft Comput 86:105836. https://doi.org/10.1016/j.asoc.2019.105836

Kowsari K, Jafari Meimandi K, Heidarysafa M, Mendu S, Barnes L, Brown D (2019) Text classification algorithms: a survey. Information 10:150

Krallinger M, Vazquez M, Leitner F, Salgado D, Chatr-aryamontri A, Winter A, Perfetto L, Briganti L, Licata L, Iannuccelli M, Castagnoli L, Cesareni G, Tyers M, Schneider G, Rinaldi F, Leaman R, Gonzalez G, Matos S, Kim S, Wilbur WJ, Rocha L, Shatkay H, Tendulkar AV, Agarwal S, Liu F, Wang X, Rak R, Noto K, Elkan C, Lu Z, Dogan RI, Fontaine JF, Andrade-Navarro MA, Valencia A (2011) The protein–protein interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text. BMC Bioinform 12(Suppl 8):S3. https://doi.org/10.1186/1471-2105-12-s8-s3

Kraus M, Feuerriegel S (2017) Decision support from financial disclosures with deep neural networks and transfer learning. Decis Support Syst. https://doi.org/10.1016/j.dss.2017.10.001

Krstić Ž, Seljan S, Zoroja J (2019) Visualization of big data text analytics in financial industry: a case study of topic extraction for Italian banks (September 12, 2019). In: 2019 ENTRENOVA conference proceedings. https://ssrn.com/abstract=3490108 or https://doi.org/10.2139/ssrn.3490108

Kumar BS, Ravi V (2016) A survey of the applications of text mining in financial domain. Knowl Based Syst 114:128–147

Kundalia K, Patel Y, Shah M (2020) Multi-label movie genre detection from a movie poster using knowledge transfer learning. Augment Hum Res 5:11. https://doi.org/10.1007/s41133-019-0029-y

Lavrenko V, Schmill M, Lawrie D, Ogilvie P, Jensen D, Allan J (2000) Mining of concurrent text and time series. In: KDD-2000 Workshop on text mining, vol 2000. Citeseer, pp 37–44

Lee CT (2019) Early warning mechanism of agricultural network public opinion based on text mining. Revista De La Facultad De Agronomia De La Universidad Del Zulia, 36

Lee B, Park JH, Kwon L, Moon YH, Shin Y, Kim G, Kim H (2018) About relationship between business text patterns and financial performance in corporate data. J Open Innov Technol Market Complex. https://doi.org/10.1186/s40852-018-0080-9

Lewis C, Young S (2019) Fad or future? Automated analysis of financial text and its implications for corporate reporting. Account Bus Res 49(5):587–615

Li N, Liang X, Li X, Wang C, Wu DD (2009) Network Environment and Financial Risk Using Machine Learning and Sentiment Analysis. Human Ecol Risk Assess Int J 15(2):227–252. https://doi.org/10.1080/10807030902761056

Li T, Kou G, Peng Y, Shi Y (2020a) Classifying with adaptive hyper-spheres: an incremental classifier based on competitive learning. IEEE Trans Syst Man Cybern Syst 50(4):1218–1229. https://doi.org/10.1109/TSMC.2017.2761360

Li X, Wu P, Wang W (2020b) Incorporating stock prices and news sentiments for stock market prediction: a case of Hong Kong. Inf Process Manag. https://doi.org/10.1016/j.ipm.2020.102212

Li T, Kou G, Peng Yi (2020c) Improving malicious URLs detection via feature engineering: linear and nonlinear space transformation methods. Inf Syst 91:101494. https://doi.org/10.1016/j.is.2020.101494

Liew WT, Adhitya A, Srinivasan R (2014) Sustainability trends in the process industries: a text mining-based analysis. Comput Ind 65(3):393–400

Lin C, He Y (2009) Joint sentiment/topic model for sentiment analysis. In: Proceeding of the 18th ACM conference on information and knowledge management—CIKM ’09. https://doi.org/10.1145/1645953.1646003

Loughran T, Mcdonald B (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J Finance 66(1):35–65

Lu Y (2013) Automatic topic identification of health-related messages in online health community using text classification. SpringerPlus 2(1):309

Malandri L, Xing F, Orsenigo C, Vercellis C, Cambria E (2018) Public mood-driven asset allocation: the importance of financial sentiment in portfolio management. Cogn Comput. https://doi.org/10.1007/s12559-018-9609-2

Marrara S, Pejic Bach M, Seljan S, Topalovic A (2019) FinTech and SMEs—the Italian case. https://doi.org/10.4018/978-1-5225-7805-5.ch002

Matthies B, Coners A (2015) Computer-aided text analysis of corporate disclosures—demonstration and evaluation of two approaches. Int J Digit Account Res 15:69–98

Mudinas A, Zhang D, Levene M (2019) Market trend prediction using sentiment analysis: lessons learned and paths forward. arXiv:1903.05440

Nan L, Xun L, Xinli L, Chao W, Desheng DW (2009) Network environment and financial risk using machine learning and sentiment analysis. Hum Ecol Risk Assess Int J 15(2):227–252

Nassirtoussi AK, Aghabozorgi S, Wah TY, Ngo DC (2015) Text mining of news-headlines for FOREX market prediction: A Multi-layer Dimension Reduction Algorithm with semantics and sentiment. Expert Syst Appl 42(1):306–324. https://doi.org/10.1016/j.eswa.2014.08.004

Nguyen TH, Shirai K, Velcin J (2015) Sentiment analysis on social media for stock movement prediction. Expert Syst Appl 42(24):9603–9611

Nikfarjam A, Emadzadeh E, Muthaiyah S (2010) Text mining approaches for stock market prediction. In: 2010 the 2nd international conference on computer and automation engineering (ICCAE). https://doi.org/10.1109/iccae.2010.5451705

Nopp C, Hanbury A (2015) Detecting risks in the banking system by sentiment analysis. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 591–600

Panchiwala S, Shah MA (2020) Comprehensive study on critical security issues and challenges of the IoT world. J Data Inf Manag. https://doi.org/10.1007/s42488-020-00030-2

Pandya R, Nadiadwala S, Shah R, Shah M (2019) Buildout of methodology for meticulous diagnosis of K-complex in EEG for aiding the detection of Alzheimer’s by artificial intelligence. Augment Hum Res. https://doi.org/10.1007/s41133-019-0021-6

Parekh V, Shah D, Shah M (2020) Fatigue detection using artificial intelligence framework. Augment Hum Res 5:5

Patel D, Shah Y, Thakkar N, Shah K, Shah M (2020a) Implementation of artificial intelligence techniques for cancer detection. Augment Hum Res. https://doi.org/10.1007/s41133-019-0024-3

Patel D, Shah D, Shah M (2020b) The intertwine of brain and body: a quantitative analysis on how big data influences the system of sports. Ann Data Sci. https://doi.org/10.1007/s40745-019-00239-y

Patel H, Prajapati D, Mahida D, Shah M (2020c) Transforming petroleum downstream sector through big data: a holistic review. J Petrol Explor Prod Technol. https://doi.org/10.1007/s13202-020-00889-2

Pathan M, Patel N, Yagnik H, Shah M (2020) Artificial cognition for applications in smart agriculture: a comprehensive review. Artif Intell Agric. https://doi.org/10.1016/j.aiia.2020.06.001

Pejic Bach M, Krstić Ž, Seljan S, Turulja L (2019) Text mining for big data analysis in financial sector: a literature review. Sustainability 11:1277. https://doi.org/10.3390/su11051277

Picasso A, Merello S, Ma Y, Oneto L, Cambria E (2019) Technical analysis and sentiment embeddings for market trend prediction. Expert Syst Appl 135:60–70. https://doi.org/10.1016/j.eswa.2019.06.014

Pradhan MV, Vala J, Balani P (2016) A survey on sentiment analysis algorithms for opinion mining. Int J Comput Appl 133:7–11. https://doi.org/10.5120/ijca2016907977

Ray P, Chakrabarti A (2019) A mixed approach of deep learning method and rule-based method to improve aspect level sentiment analysis. Appl Comput Inform. https://doi.org/10.1016/j.aci.2019.02.002

Renault T (2019) Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages. Digit Finance. https://doi.org/10.1007/s42521-019-00014-x

Sabo T (2017) Applying text analytics and machine learning to assess consumer financial complaints. In: Proceedings of the SAS global forum 2017 conference. SAS Institute Inc., Cary NC. https://support.sas.com/resources/papers/proceedings17/SAS0282-2017.pdf

Salloum S, Al-Emran M, Monem A, Shaalan K (2017) A survey of text mining in social media: facebook and twitter perspectives. Adv Sci Technol Eng Syst J 2:127–133. https://doi.org/10.25046/aj020115

Salloum S, Mostafa A, Monem A, Shaalan K (2018) Using text mining techniques for extracting information from research articles. https://doi.org/10.1007/978-3-319-67056-0_18

Schneider MJ, Gupta S (2016) Forecasting sales of new and existing products using consumer reviews: a random projections approach. Int J Forecast 32(2):243–256

Schumaker RP, Chen H (2009) Textual analysis of stock market prediction using breaking financial news. ACM Trans Inf Syst 27(2):1–19

Shah D, Isah H, Zulkernine F (2018a) Predicting the effects of news sentiments on the stock market. In: 2018 IEEE international conference on big data (big data). https://doi.org/10.1109/bigdata.2018.8621884

Shah T, Shaikh I, Patel A (2018b) Comparison of different kernels of support vector machine for predicting stock prices. Int J Eng Technol 9(6):4288–4291

Shah G, Shah A, Shah M (2019) Panacea of challenges in real-world application of big data analytics in healthcare sector. Data Inf Manag. https://doi.org/10.1007/s42488-019-00010-1

Shah D, Dixit R, Shah A, Shah P, Shah M (2020) A Comprehensive analysis regarding several breakthroughs based on computer intelligence targeting various syndromes. Augment Hum Res 5:14. https://doi.org/10.1007/s41133-020-00033-z

Shah K, Patel H, Sanghvi D, Shah M (2020) A comparative analysis of logistic regression, random forest and KNN models for the text classification. Augment Hum Res 5:12. https://doi.org/10.1007/s41133-020-00032-0

Shahi AM, Issac B, Modapothala JR (2014) Automatic analysis of corporate sustainability reports and intelligent SCORING. Int J Comput Intell Appl 13(01):1450006. https://doi.org/10.1142/s1469026814500060

Shirata CY, Takeuchi H, Ogino S, Watanabe H (2011) Extracting key phrases as predictors of corporate bankruptcy: empirical analysis of annual reports by text mining. J Emerg Technol Account 8(1):31–44

Sohangir S, Wang D, Pomeranets A et al (2018) Big data: deep learning for financial sentiment analysis. J Big Data 5:3. https://doi.org/10.1186/s40537-017-0111-6

Song Y, Wang H, Zhu M (2018) Sustainable strategy for corporate governance based on the sentiment analysis of financial reports with CSR. Financ Innov. https://doi.org/10.1186/s40854-018-0086-0

Souma W, Vodenska I, Aoyama H (2019) Enhanced news sentiment analysis using deep learning methods. J Comput Soc Sci 2:33–46. https://doi.org/10.1007/s42001-019-00035-x

Srivastava SK, Singh SK, Suri JS (2018) Healthcare text classification system and its performance evaluation: a source of better intelligence by characterizing healthcare text. J Med Syst. https://doi.org/10.1007/s10916-018-0941-6

Su Y, Wang R, Chen P, Wei Y, Li C, Hu Y (2012) Agricultural ontology based feature optimization for agricultural text clustering. J Integr Agric 11(5):752–759

Sukhadia A, Upadhyay K, Gundeti M, Shah S, Shah M (2020) Optimization of smart traffic governance system using artificial intelligence. Augment Hum Res 5:13. https://doi.org/10.1007/s41133-020-00035-x

Sumathi N, Sheela T (2017) Opinion mining analysis in banking system using rough feature selection technique from social media text. Int J Mech Eng Technol 8(12):274–289

Talaviya T, Shah D, Patel N, Yagnik H, Shah M (2020) Implementation of artificial intelligence in agriculture for optimisation of irrigation and application of pesticides and herbicides. Artif Intell Agric. https://doi.org/10.1016/j.aiia.2020.04.002

Talib R, Hanif MK, Ayesha S, Fatima F (2016a) Text mining: techniques, applications and issues. Int J Adv Comput Sci Appl 7(11):414–418

Talib R, Kashif M, Ayesha S, Fatima F (2016b) Text mining: techniques, applications and issues. Int J Adv Comput Sci Appl. https://doi.org/10.14569/IJACSA.2016.071153

Tkáč M, Verner R (2016) Artificial neural networks in business: two decades of research. Appl Soft Comput 38:788–804

Ur-Rahman N, Harding JA (2012) Textual data mining for industrial knowledge management and text classification: a business oriented approach. Expert Syst Appl 39(5):4729–4739

Vijayan R, Potey MA (2016) Improved accuracy of FOREX intraday trend prediction through text mining of news headlines using J48. Int J Adv Res Comput Eng Technol 5(6):1862–1866

Vu TT, Chang S, Ha QT, Collier N (2012) An experiment in integrating sentiment features for tech stock prediction in twitter. In: Workshop on information extraction and entity analytics on social media data, COLING, Mumbai, India, pp 23–38

Wang B, Huang H, Wang X (2012) A novel text mining approach to financial time series forecasting. Neurocomputing 83:136–145

Wen F, Xu L, Ouyang G, Kou G (2019) Retail investor attention and stock price crash risk: evidence from China. Int Rev Financ Anal 65:101376. https://doi.org/10.1016/j.irfa.2019.101376

Widiastuti N (2018) Deep learning—now and next in text mining and natural language processing. IOP Conf Ser Mater Sci Eng 407:012114. https://doi.org/10.1088/1757-899X/407/1/012114

Wu JL, Su CC, Yu LC, Chang PC (2012) Stock price predication using combinational features from sentimental analysis of stock news and technical analysis of trading information. Int Proc Econ Dev Res. https://doi.org/10.7763/ipedr

Wu DD, Zheng L, Olson DL (2014) A decision support approach for online stock forum sentiment analysis. IEEE Trans Syst Man Cybern Syst 44(8):1077–1087

Xing FZ, Cambria E, Welsch RE (2017) Natural language based financial forecasting: a survey. Artif Intell Rev 50(1):49–73

Xing FZ, Cambria E, Welsch RE (2018a) Natural language based financial forecasting: a survey. Artif Intell Rev 50:49–73. https://doi.org/10.1007/s10462-017-9588-9

Xing F, Cambria E, Welsch R (2018b) Intelligent asset allocation via market sentiment views. IEEE Comput Intell Mag 13:25–34. https://doi.org/10.1109/MCI.2018.2866727

Xiong T, Wang S, Mayers A, Monga E (2013) Personal bankruptcy prediction by mining credit card data. Expert Syst Appl 40(2):665–676

Xu G, Yu Z, Yao H, Li F, Meng Y, Wu X (2019) Chinese text sentiment analysis based on extended sentiment dictionary. IEEE Access 7:43749–43762. https://doi.org/10.1109/ACCESS.2019.2907772

Yang Li, Li Y, Wang J, Sherratt R (2020) Sentiment analysis for E-commerce product reviews in Chinese based on sentiment lexicon and deep learning. IEEE Access 8:1–1. https://doi.org/10.1109/ACCESS.2020.2969854

Yap BW, Ong SH, Husain NHM (2011) Using data mining to improve assessment of credit worthiness via credit scoring models. Expert Syst Appl 38(10):13274–13283

Young T, Hazarika D, Poria S, Cambria E (2018) Recent trends in deep learning based natural language processing [review article]. IEEE Comput Intell Mag 13(3):55–75. https://doi.org/10.1109/MCI.2018.2840738

Yusuuf H, Shihabeldeen A (2019) Using text mining to predicate exchange rates with sentiment indicators. J Bus Theory Pract 7(2):60–75

Zavolokina L, Dolata M, Schwabe G (2016) The FinTech phenomenon: antecedents of financial innovation perceived by the popular press. Financ Innov. https://doi.org/10.1186/s40854-016-0036-7

Acknowledgements

The authors are grateful to Nirma University and Department of Chemical Engineering, School of Technology, Pandit Deendayal Petroleum University for the permission to publish this research.

Not applicable.

Author information

Authors and Affiliations

Department of Computer Science, Nirma University, Ahmedabad, Gujarat, India

Aaryan Gupta, Vinya Dengre & Hamza Abubakar Kheruwala

Department of Chemical Engineering, School of Technology, Pandit Deendayal Petroleum University, Gandhinagar, Gujarat, 382007, India

Contributions

All the authors made substantial contributions to this manuscript. AG, VD, HA and MS participated in drafting the manuscript. AG, VD and HA wrote the main manuscript; all the authors discussed the results and their implications at all stages. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Manan Shah .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Gupta, A., Dengre, V., Kheruwala, H.A. et al. Comprehensive review of text-mining applications in finance. Financ Innov 6 , 39 (2020). https://doi.org/10.1186/s40854-020-00205-1

Received : 29 January 2020

Accepted : 17 September 2020

Published : 02 November 2020

DOI : https://doi.org/10.1186/s40854-020-00205-1

  • Text mining
  • Machine learning
  • Financial forecasting
  • Sentiment analysis
  • Corporate finance

Technology and Big Data Are Changing Economics: Mining Text to Track Methods

The last 40 years have seen huge innovations in computing technology and data availability. Data derived from millions of administrative records or by using (as we do) new methods of data generation such as text mining are now common. New data often requires new methods, which in turn can inspire new data collection. If history is any guide, some methods will stick and others will prove to be a flash in the pan. However, the larger trends towards demanding greater credibility and transparency from researchers in applied economics and a “collage” approach to assembling evidence will likely continue.

We are grateful to Lawrence Katz for helpful comments. We thank Dana Scott for outstanding research assistance, and Tilmann Herchenroder for excellent research assistance in the early stages of the project. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.

  • Open access
  • Published: 14 January 2015

Using text mining for study identification in systematic reviews: a systematic review of current approaches

  • Alison O’Mara-Eves 1 ,
  • James Thomas 1 ,
  • John McNaught 2 ,
  • Makoto Miwa 3 &
  • Sophia Ananiadou 2  

Systematic Reviews volume  4 , Article number:  5 ( 2015 ) Cite this article

An Erratum to this article was published on 28 April 2015

The large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities.

Five research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged?

We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis to synthesise findings.

The evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable.

On the whole, most studies suggested that a saving in workload of between 30% and 70% might be possible, though sometimes this saving is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall).

Conclusions

Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in ‘live’ reviews. Text mining may also be used cautiously as a ‘second screener’. Using text mining to eliminate studies automatically should be considered promising, but not yet fully proven. In highly technical/clinical areas, it may be used with a high degree of confidence; in other disciplines, more developmental and evaluative work is needed.

The problem: lack of precision in systematic searches

Systematic reviews are a widely used method to bring together the findings from multiple studies in a reliable way and are often used to inform policy and practice, such as guideline development [ 1 , 2 ]. Whilst they are often associated with medical research and randomised controlled trials, they can be used to address any research question using any relevant type of research [ 3 ]. A critical feature of a systematic review is the application of scientific methods to uncover and minimise bias and error in the selection and treatment of studies [ 4 , 5 ]. However, the large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way both complex and time consuming [ 6 ].

In order to minimise the impact of publication bias [ 7 ], reviewers make efforts to identify all relevant research for inclusion in systematic reviews. This has always been a challenging and time-consuming aspect of reviewing, but the challenge is growing due to the increase in the number of databases to search and the number of papers and journals being published; moreover, as recent work has suggested that there is an inbuilt North American bias in many major bibliographic databases (e.g. PubMed), a wide range of smaller databases needs to be searched in order to identify research for reviews that aim to maximise external validity [ 8 ]. In practice, this means adopting a multi-layered approach to searching which combines: extensive Boolean searches of electronic bibliographic databases, specialised registers and websites; with individual approaches to authors and key informants; and the following of ‘citation trails’ (identifying which papers are cited by a relevant study and which papers in turn cite the paper that it is reported in) [ 9 ]. Of these three approaches, searching databases yields around three quarters of the studies finally included [ 10 ].

Unfortunately, the specificity of sensitive electronic searches of bibliographic databases is low (for definitions of specificity, recall and other key metrics, see Table  1 ). Reviewers often need to look manually through many thousands of irrelevant titles and abstracts in order to identify the much smaller number of relevant ones [ 7 ]; a process known as screening . Reviews that address complex health issues or that deal with a range of interventions (e.g. a typical public health review might be concerned with ‘interventions to promote physical activity’) are often those that have the most challenging numbers of items to screen. Given that an experienced reviewer can take between 30 seconds and several minutes to evaluate a citation [ 11 ], the work involved in screening 10,000 citations is considerable (and the screening burden in some reviews is considerably higher than this) (see also [ 12 ]).

Reviewers are thus faced with two competing demands. Reviews that are to be used to inform policy and practice often need to be completed to externally defined (often short) timetables within limited budgets; but in order for a review to be an accurate reflection of the state of knowledge in a given area, it needs to be comprehensive.

The need to complete reviews to tight timescales has led (particularly in health technology assessments and other rapid reviews) to the adoption of highly pragmatic (and relatively specific ) strategies to searching in order to limit the number of studies to screen—even though relevant research is probably missed because of this [ 16 ]. Limiting the recall of a search may undermine one of the most important principles of a systematic review: that its results are based on an unbiased set of studies. The key problem—which this paper aims to begin to address—is that there are currently no widely accepted alternative ways of dealing with this issue. Reviews are at risk of either limiting their searches to such a degree that the validity of their findings is questionable or of increasing the time and resources they require and thus risk being unable to inform policy and practice.

Proposed ‘solution’: the (semi)-automation of screening

Broadly speaking, text mining is defined as the process of discovering knowledge and structure from unstructured data (i.e., text) [ 17 , 18 ]. In the context of finding research for inclusion in a review, we are interested in automated techniques of discovering whether a given study (described by a title and abstract) is relevant to our review [ 19 , 20 ]. There are two ways of using text mining that are particularly promising for assisting with screening in systematic reviews: one aims to prioritise the list of items for manual screening so that the studies at the top of the list are those that are most likely to be relevant; the second method uses the manually assigned include/exclude categories of studies in order to ‘learn’ to apply such categorisations automatically [ 19 ]; whilst the technologies to perform each may be similar, we separate them here as they are conceptually distinct. The prioritisation of relevant items may not appear to reduce workload (if all citations are to be screened manually anyway), but when there are large numbers of studies to screen manually, identifying most of the relevant ones quickly enables some members of a reviewing team to begin the next stages of the review, whilst the remainder of mostly irrelevant citations are screened by other team members. This reduces the time from review commencement to completion, even if the total workload remains the same.
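To make the two uses concrete, the short sketch below trains a simple probabilistic classifier on citations that have already been screened and then either ranks the unscreened citations for prioritised manual screening or applies an explicit include/exclude cut-off. It is an illustration only, not a tool evaluated in this review; the function and variable names are ours, and labels are assumed to be coded 1 for include and 0 for exclude.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def rank_unscreened(labelled_texts, labels, unscreened_texts):
    # Learn term weights from the titles and abstracts screened so far
    vectoriser = TfidfVectorizer(ngram_range=(1, 2))
    X_train = vectoriser.fit_transform(labelled_texts)
    X_new = vectoriser.transform(unscreened_texts)
    model = LogisticRegression(max_iter=1000, class_weight="balanced")
    model.fit(X_train, labels)
    p_include = model.predict_proba(X_new)[:, 1]   # probability that each citation is an include
    priority_order = p_include.argsort()[::-1]     # screening prioritisation: likely includes first
    auto_decisions = p_include >= 0.5              # or automatic in/out classification
    return priority_order, auto_decisions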

By reducing the burden of screening in reviews, new methodologies using text mining may enable systematic reviews both to be completed more quickly (thus meeting exacting policy and practice timescales and increasing their cost efficiency) and to minimise the impact of publication bias and reduce the chances that relevant research will be missed (by enabling reviewers to increase the recall of their searches). In turn, by facilitating more timely and reliable reviews, this methodology has the potential to improve decision-making across the health sector and beyond.

The research problem

Whilst the logic behind applying text mining to the screening stage of systematic reviews has intuitive appeal, there are obvious concerns that might be raised by the systematic review community [ 21 ]. Firstly, there is not a lot of information about text mining written for systematic review audiences. The vast majority of papers on this topic are produced by computer scientists in journals and conference proceedings in the field of medical informatics or artificial intelligence. This means that they are not particularly accessible to systematic reviewers who need to make decisions about their review processes, both in terms of the level of technical detail presented in the reports and in the exposure such papers would have in systematic review communities.

Secondly, for these technologies to achieve broad uptake, they should be accessible to systematic reviewers without the need for a computer scientist to write bespoke code or undertake custom processing of text for individual reviews. Specialist advice may be required, but it should be akin to the need for occasional specialist statistical advice, rather than being at the level of operating the text mining tools. Any implementation issues need to be identified and resolved before rolling such technologies out to the intended users.

Thirdly, there are various ways in which workload could be reduced through these technologies (reducing number needed to screen; text mining as a second screener; increasing the rate (speed) of screening and improving workflow through screening prioritisation). However, not all technologies allow all types of workload reduction to be achieved. In order to make informed decisions about using such technologies, systematic reviewers need to know which technologies can be used for which workload reduction goal.

Fourthly, systematic reviews are a relatively new area in which text mining technologies have been applied. Some of the assumptions of text mining technologies in other applications do not hold when transferred to the review context. For instance, systematic reviewers generally place strong emphasis on high recall—that is, a desire to identify all the relevant includable studies—even if that means a vast number of irrelevant studies need to be considered to find them. When applied in other areas, precision (reducing the number of irrelevant items) and accuracy (correctly classifying items as relevant or irrelevant) are typically more valued. To be acceptable to the systematic review community, new technologies must address the particular challenges and demands of this context (We should also note at this point that we have no guarantee of perfect recall even with current methods, as search strategies are tailored to the resource available to screen results, and humans are likely to make mistakes during their manual sifting through records.).

Finally, the methods, their relative success and the metrics used to evaluate them have not yet been pulled together in a systematic way; this current study aims to fill that research gap.

Aims and research questions of the review

The primary aim of this review is to gather and present the available research evidence on existing methods for text mining related to the title and abstract screening stage in a systematic review, including the performance metrics used to evaluate these technologies a . The purpose of this is to inform systematic reviewers of the current state of text mining methods for use in reducing workload at the screening stage, with a consideration of the potential benefits and challenges when implementing such technologies. Whilst we have explored the more technical aspects of text mining technologies in our data extraction, the intended audience of this paper are users of the technologies rather than computer scientists, and so technical issues are largely dealt with at a conceptual level.

Following directly from the research problem as delineated above, we looked to answer the following questions:

●  What is the state of the evidence base related to automating (or semi-automating) the screening stage (based on titles and abstracts) of a systematic review? Specifically,

   –  What methods are available; and

   –  How has the field developed over time?

●  How has the workload reduction issue been evaluated? Specifically,

   –  What has been compared, using what research study designs?

   –  What metrics are available for evaluating the performance of the approaches?

●  What are the stated purposes of (semi-)automating the screening stage through text mining in terms of workload reduction, what types of methods have been used to address each purpose, and how effective were they?

●  How, and with what effect, have key contextual problems of applying text mining to systematic review screening been addressed, specifically as relates to the following challenges:

   –  The importance of high recall for systematic reviews?

   –  The risk of hasty generalisation when training from a certain pool of known includes and excludes?

   –  The problem of imbalanced datasets, in which there are typically many more excludes than includes?

   –  Applying the technologies to review updates?

●  What challenges to implementation emerge from reviewing the evidence base?

We conducted a systematic review of research papers on applications of text mining to assist in identifying relevant studies for inclusion in a systematic review. The protocol is available from the authors on request.

Information management

All records of research identified by searches were uploaded to the specialist systematic review software, EPPI-Reviewer 4, for duplicate stripping and screening [ 22 ]. This software recorded the bibliographic details of each study considered by the review, where studies were found and how, reasons for their inclusion or exclusion, descriptive and evaluative codes and text about each included study, and the data used and produced during synthesis.

Search methods

Database and website searches were conducted in December 2013. Sources were searched from 2005 onwards. This date was chosen because, according to Jonnalagadda and Petitti [ 23 ], the first proposed application of text mining to screening in systematic reviews was in 2005 (though this was not an evaluation of a method and so was not included in our review).

Details of the electronic search strategy, including databases searched and terms used, can be found in Additional file 1 : Appendix A; the PRISMA flow diagram can be viewed in Additional file 2 : Flow diagram.

We also included papers known to the team and as recommended by colleagues. We checked the reference lists of all included studies for additional relevant studies. We also followed forward citation recommendations in Science Direct. A cut-off for identifying studies for inclusion in the review was set at 28 February 2014.

After all searches were completed, 1,253 records were identified. These were screened for relevance to our review using the inclusion criteria outlined below.

Inclusion criteria

Studies were screened in a two-stage screening process. First, records were assessed against the following criteria based on their titles and abstracts:

●  Must be published after 2004

●  Must be relevant to text mining

●  Must be relevant to the screening (document selection) stage of a systematic review (or a review of the evidence that follows systematic principles, such as health technology assessment (HTA) or guidelines development)

After initial piloting of the first-stage criteria to establish a common understanding, records were screened once by two researchers (AOM and JT) who are familiar with systematic reviewing and text mining methods. Any records of doubtful relevance were marked with a ‘query’ tag and discussed by the two researchers until agreement was reached (agreement was always reached, and so recourse to a third reviewer was not required).

The full-text documents of records that met these criteria ( n  = 69) were retrieved and proceeded to the second stage of screening. The criteria for assessing the full-text documents were:

●  Must be relevant to text mining methods or metrics

●  Must be relevant to the screening stage of a systematic review (or similar evidence review)

●  Must not be a general discussion of the use of text mining in systematic review screening; that is, the record must present a detailed method or evaluation of a method

The second stage of screening was conducted by one researcher (AOM), with queried records checked by the second researcher (JT) (reviewer agreement was 100% at this stage). After full-text screening, a total of 44 records were identified as relevant to the review questions.

Data extraction

Data extraction was conducted by one researcher (AOM) and checked for accuracy and completeness by a second researcher (JT) and discrepancies resolved by a second check and/or discussion. We extracted and recorded information on the following broad issues (see Additional file 1 : Appendix B for the full data extraction tool, Appendix C for the list of studies included in the review and Appendix D for the characteristics of included studies):

●  Bibliographic details

●  Evaluation context (details of review datasets tested)

●  Evaluation of active learning (if applicable) (see below for definition)

●  Evaluation of classifier

●  Evaluation of feature selection

●  Implementation issues

●  About the evaluation (the methodology and metrics used)

●  Study type descriptors

●  Critical appraisal

●  Comments and conclusions

Extraction consisted of two types of data: direct quotations from the papers, which were gathered through line-by-line coding of the papers; and categorical data, which were gathered by noting the presence or absence of certain characteristics. These two types of data were collected simultaneously. For example, a tick box was checked if a study reported using a support vector machine (SVM) classifier, and line-by-line coding of text that described the SVM was associated with that tick box in the EPPI-Reviewer 4 software [ 22 ].

Synthesis methods

The reviewers discussed the key issues that needed to be covered in the review, as well as themes that had emerged through extracting data from the studies. On that basis, an outline structure for the synthesis was developed. Under the outline subheadings, a narrative was developed that drew on both the line-by-line coded text and the categorical data. The categorical data allowed for the generation of frequency tables and cross tabulations that described the state of the evidence base; whilst the coded text allowed for a richer interrogation of the emerging themes.

The results are presented in order of the research questions posed. Since some issues raised apply beyond the systematic review context, which limited the range of papers about text mining that we formally included, we have inserted some commentary (entitled ‘further information on this topic’) where information from other domains may illuminate a specific issue.

Development of the evidence base

In this section, we address research question 1: What is the state of the evidence base related to automating (or semi-automating) the screening stage (based on titles and abstracts) of a systematic review?

Chronological developments

Our 44 included studies fall within the 8 years between January 2006 and January 2014, an average of roughly 5.5 evaluations a year. As can be seen in the timeline presented in Figure 1, almost every year saw the evaluation of a newly applied type of classifier or some new consideration of the application of text mining to screening. Indeed, most papers present a new ‘twist’ that distinguishes them from those that came before, with very few replications or comparisons between papers. The developments highlighted in the timeline are those which we had defined a priori in our data extraction tool, and they therefore also reflect how the synthesis below is structured; they should be considered indicative of interesting developments rather than a comprehensive list of every innovation (for example, also worthy of note are the decision trees by Frunza and colleagues in 2010 [ 24 ], and dual supervision and elicited utility by Wallace et al., also in 2010 [ 25 ]).

Figure 1. Brief timeline of developments in the use of text mining technologies for reducing screening burden in systematic reviews.

This suggests a rapidly evolving evidence base (It also has implications for the later parts of this synthesis, as it is difficult to come to any overarching conclusions about which approach works best.).

Workload reduction approaches

In this section, we address research question 2: What are the stated purposes of (semi-)automating the screening stage through text mining in terms of workload reduction, and what types of methods have been used to address each purpose?

It is evident from the literature that there are several possible ways to reduce screening workload. The approaches that have received attention in terms of text mining are: reducing the number of items that need to be screened manually; reducing the number of people needed to screen the items; increasing the rate (or speed) of screening; and improving workflow. Table  2 shows the number of studies that implicitly or explicitly addressed each of these approaches. Each of these will be discussed in turn.

Reducing the number of items that need to be screened

In many reviews, the number of items to be screened is very large. For example, 4 out of the 31 Cochrane Collaboration systematic reviews published in March 2014 had over 10,000 items to screen [ 26 – 29 ]. This can be a particular problem for searches for certain types of study designs, such as searches for non-randomised controlled trials, for which database filters are not available or consistently used [ 30 ]. Large numbers of items to screen are even more common in non-clinical disciplines, in which search strategies tend to be broader in response to broader research questions, less precise or consistent terminology and the lack of controlled vocabularies; for example, EPPI-Centre reviews on topics in public health, education and social care regularly exceed 20,000 items to be screened. At its most extreme, one review identified upward of 800,000 items and another over 1 million items to be screened (see [ 31 ] for a description of such ‘extreme reviewing’). Given that an experienced reviewer can take between 30 seconds and several minutes to evaluate a citation [ 11 ], the work involved in screening even as ‘few’ as several thousand citations is considerable.

An obvious solution to reducing workload is therefore to reduce the number of items that need to be screened manually. Historically, the volume of records returned from a search was determined in part through the search strategy: the number of records identified could be reduced either through searching fewer sources or through carefully constructed database queries. The latter approach usually adopted an emphasis on the precision of the search over its recall. However, some method guidelines specifically recommend favouring recall over precision in order to avoid missing relevant studies (e.g., the Campbell Collaboration’s guide to information retrieval and the US Institute of Medicine of the National Academies [ 32 , 33 ]).

Therefore, resource-efficient approaches that maximise recall are needed, and a number of different models have been identified here. The vast majority of studies included in the review ( n  = 30) implicitly or explicitly propose using text mining for the purpose of reducing the number of studies that need to be screened manually. Within this set of studies, there are two main approaches to excluding items from a review. The first approach is to use a classifier that makes explicit in/out decisions; 23 studies evaluated this approach [ 11 , 14 , 23 , 25 , 34 – 51 ]. The second approach is to use a ranking or prioritisation system and then exclude items that fall below some threshold or criterion, or that lie within a ‘negative prediction zone’ [ 31 , 52 – 57 ]; seven studies used this approach. Whilst many classifiers employing the first approach inherently assign some kind of score that indicates confidence in how likely an item is to be an include or exclude (akin to the ranking in the second approach), this is usually ‘hidden’ from the reviewer such that the decisions are presented as complete. In contrast, the second approach may require a reviewer to continue manual screening until the (reviewer-specified) criterion is met.

It is important to note that a further approach, active learning, can fit loosely into both of the abovementioned camps. Active learning (evaluated in nine studies [ 11 , 23 , 25 , 31 , 40 , 45 , 48 , 49 , 58 ]) is an iterative process whereby the accuracy of the predictions made by the machine is improved through interaction with reviewers. The reviewer—or review team—provides an initial sample of include/exclude decisions that the machine ‘learns’ from; the machine subsequently generates a ranked list and requests the reviewer to provide decisions on items high in the list that it will learn the most from. The machine adapts its decision rule, incorporating the information from the additional items, and generates a new list of items for the reviewer to screen. This process continues, with the number of reviewer decisions growing and a greater number of relevant items found than would otherwise be the case, until a given stopping criterion is reached and the process ends. Although the final include/exclude decisions for any items not screened manually come from the classifier, the human screener still has some control over the training process and the point at which manual screening ceases.
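The loop below is a minimal sketch of that cycle, not the code of any of the nine evaluated systems: a classifier is refitted on the decisions made so far, the unscreened items it currently ranks highest are passed back to the reviewer, and the process repeats until a stopping criterion is reached. The names ask_reviewer and seed_labels, the batch size and the ‘certainty’-style selection rule are all assumptions made for the example, and the initial labels must contain at least one include and one exclude.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

def active_learning_screen(citations, seed_labels, ask_reviewer,
                           batch_size=25, max_manual=2000):
    X = TfidfVectorizer().fit_transform(citations)
    labelled = dict(seed_labels)                # {citation index: 1 include / 0 exclude}
    model = SGDClassifier(loss="log_loss")      # logistic regression fitted by SGD
    while len(labelled) < max_manual:
        idx = list(labelled)
        model.fit(X[idx], [labelled[i] for i in idx])
        remaining = [i for i in range(len(citations)) if i not in labelled]
        if not remaining:
            break
        p_include = model.predict_proba(X[remaining])[:, 1]
        # ask the reviewer about the items the model currently ranks as most likely to be relevant
        for j in np.argsort(p_include)[::-1][:batch_size]:
            labelled[remaining[j]] = ask_reviewer(citations[remaining[j]])
    return model, labelled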

In all cases, authors reported that the systems tested led to a reduction in workload; however, given the diversity of approaches and the lack of overlap (replication) between evaluations, it is impossible to conclude whether one approach is better than another in terms of performance. Reported performance ranged from a reduction in manual screening workload of less than 10% (e.g. [ 41 ]) up to more than 90% (e.g. [ 48 ]). Where expressed as a workload reduction, studies tended to report savings of between approximately 40% and 50% of the work (e.g. [ 25 , 40 , 41 , 55 ]). Studies differed from one another in terms of the recall that they aimed for. Some expressed results in terms of 95% recall (e.g. [ 23 ]), whereas others expressed their results in terms of retrieving all relevant studies (e.g. [ 48 ]). Razavi and colleagues took a critical perspective with regard to manual decisions too, concluding that ‘Since the machine learning prediction performance is generally on the same level as the human prediction performance, using the described system will lead to significant workload reduction for the human experts involved in the systematic review process’ [ 44 ].

Text mining as a second screener

Methods guidance for conducting systematic reviews often suggests that more than one person should screen all (or some proportion) of the records returned by the searches (e.g., the Institute of Medicine (Washington, DC) states in Standard 3.3.3. ‘Use two or more members of the review team, working independently, to screen and select studies’ [ 33 ]). The rationale behind this approach is that a single screener can inadvertently introduce bias into the study selection process either because of their interpretation of the inclusion criteria or through their understanding of the content of titles and abstracts. Moreover, given the volume of records to be reviewed, it is conceivable that some relevant records might ‘slip through the net’. It is believed that if there is consistency in the inclusion decisions amongst two or more independent screeners, then the screening process is not likely to be biased. This, however, becomes a very labour-intensive process—particularly when the number of records to screen is high. Although some guidance suggests that, if sufficient inter-reviewer reliability is achieved, it is acceptable to ‘double screen’ only a proportion of the records when there is a large number to screen, this can still add a substantial amount of resource to an already time-consuming procedure.

To combat this workload issue, six papers have advocated the use of text mining as a second screener: replacing or supplementing the additional human reviewer that would be required at this stage [ 24 , 30 , 59 – 62 ]. In this model, one human reviewer screens all of the records and the machine acts as the independent check (or presents a vastly reduced list of items to be screened to an additional human reviewer). The evaluations of workload reduction in this area have all been on a classifier model, in which explicit in/out decisions are made by the machine. Results from the evaluations are positive—the classifiers had good agreement with the human reviewer/s. Three of these papers were authored by Bekhuis and colleagues [ 30 , 59 , 60 ], who report that their approach could reduce manual workload by between 88% and 98% [ 60 ]. Frunza and colleagues report two studies in this area [ 24 , 61 ] and Garcia one study [ 62 ]. Like Bekhuis, they report positive results from their evaluations, though they present their findings in terms of high recall rather than workload reduction, and so a direct comparison cannot be made.

Increasing the rate of screening

An alternative approach to those above, which emphasise reducing the number of items that need to be screened manually, is to aid researchers in coming to a decision about each item more quickly; that is, to increase the rate of screening. To achieve this, visual data mining (VDM) approaches attempt to create a visual representation of the connections between documents (using term similarity and/or author connections) to assist the screener in easily identifying studies that are likely to be similar to each other. Thus, once a relevant document is identified, the screener can quickly scan other documents that appear to be similar to it (and, similarly, quickly identify documents that are likely to be excluded). The approach assumes that humans can make a decision about a study’s relevance faster using this additional visual information than relying on the textual information in the titles and abstracts alone [ 13 ].
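None of the VDM tools evaluated below is reproduced here, but the sketch that follows shows the kind of term-similarity information such a display could be built on: a document-to-document similarity matrix and a rough two-dimensional layout for plotting. The function name and the choice of TF-IDF with a truncated SVD projection are our assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def document_map(citation_texts):
    X = TfidfVectorizer(stop_words="english").fit_transform(citation_texts)
    similarity = cosine_similarity(X)                        # how alike each pair of citations is
    coords = TruncatedSVD(n_components=2).fit_transform(X)   # 2-D coordinates for a visual display
    return similarity, coords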

Five evaluations of visual data mining were identified [ 13 , 14 , 63 – 65 ], all in the field of software engineering. The evaluations of visual data mining differ from evaluations of other text mining approaches in that they employ a controlled trial evaluation design to compare the speed and accuracy with which a human can screen items using VDM or without using VDM. The results suggest that humans can screen faster with VDM aids than without, although the accuracy of the human screeners does not appear to change substantially [ 13 , 14 , 63 – 65 ].

A second approach to speeding up the rate of screening that is embedded within approaches to reducing the number needed to screen is through efficient citation assignment . The only example that was identified of this type was by Wallace and colleagues [ 49 ]. In that paper, the authors emphasise that most review teams have a combination of expert and novice screeners. Within the context of an active learning approach, they developed an algorithm that incorporates both information about the relevance of each item and the expected time that it will take to annotate that item; on that basis, the algorithm selects citations specifically for expert and novice reviewers to label. The authors reported that this approach enabled more items to be screened in the same amount of time compared with typical active learning approaches.

Improving workflow efficiency through screening prioritisation

Screening prioritisation is ultimately a form of efficient citation assignment, in that it aims to present reviewers with an ordered list of the items, with the items that are most likely to be relevant to their review at the top of the list. However, it differs from the model described by Wallace et al. [ 49 ] in that it is not necessarily embedded within an approach that is attempting to reduce the number needed to screen and it does not differentially assign items to different types of reviewers (i.e., experts versus novices).

There are various proposed benefits of this approach to workflow efficiency. One is that reviewers gain a better understanding of the inclusion criteria earlier in the process, as they encounter more examples of relevant studies sooner than would otherwise be the case. It also enables the retrieval of the full text of documents to start sooner than can occur when citations are screened essentially at random. This can be important, as obtaining the full-text reports brings forward their full-text screening, the checking of their bibliographies and, critically, enables contact to be made with study authors much earlier in the review. It is also possible that this will make the screening process faster, once the vast majority of relevant studies are identified, as the screeners become more confident that items later in the list are less likely to be relevant. This could also help with the problem of over-inclusiveness that is often experienced in reviews, in which reviewers tend to be cautious and include many more items at this early stage than ultimately make it into the review.

Cohen highlighted another potential benefit: ‘In reviews with searches that result in a large number of citations to be screened for retrieval, reviewing the documents in order of their likely importance would be particularly useful. The remainder of the citations could be screened over the following months, perhaps by the members of the team with less experience, whilst the work of reviewing the includable studies is ongoing’ ([ 66 ] p. 692) (An ongoing project at the EPPI-Centre, which had a large volume of items to be screened (>38,000) but with a very tight timeframe, has taken advantage of this benefit [ 67 ].).

There are also potential benefits for review updates. Cohen stated that ‘by reviewing the most likely important documents before other documents, the human reviewers or curators are more likely to be able to “get up to speed” on the current developments within a domain more quickly’ ([ 68 ] p. 121). In quite a different application of text mining to the screening process, Cohen later explored the use of prioritisation for identifying when a review update was required, which would involve sending alerts to the review team when likely relevant new studies are published [ 69 ].

In other words, this approach emphasises improving workflow in a review and has proposed benefits for efficiency beyond reducing workload in the title and abstract screening phase. Four studies adopted a prioritisation approach to improve workflow [ 58 , 66 , 68 , 69 ]. All four evaluations reported benefits of this approach.

Note that screening prioritisation can also be used to reduce the number of items needed to be screened if a screening cut-off criterion is established (see section on this workload reduction approach, above). Seven studies that have used screening prioritisation did so to reduce the number needed to screen and reported benefits in terms of the amount of work saved [ 31 , 52 – 57 ]. (Again, the metrics and processes varied, so it is not possible to estimate overall or mean statistics across these studies).

Specific issues relating to the use of text mining in systematic reviews

In this section, we address research question 3: How have key contextual problems of applying text mining to systematic review screening been addressed? These reflect the challenges that need to be addressed when applying methods developed for other applications to the case of systematic review screening.

The importance of high recall for systematic reviews

As mentioned in the ‘Background’ section, recall is often prioritised over precision in systematic reviews. This is because it is generally considered to be critical to retrieve all relevant items to avoid biasing the review findings. The importance of high recall of relevant studies is likely to be critical in the acceptability and uptake of text mining techniques by the systematic review community. Indeed, the authors of one paper reflected that ‘If those who rely on systematic review to develop guidelines and policy demand 100% recall and informatics approaches such as ours are not able to guarantee 100% recall, the approaches may be doomed’ ([ 23 ] p. 15).

Many of the studies in this review explicitly refer to the importance of high recall and the implications it might have for text mining applications in this area (studies which discuss the importance of high recall include [ 11 , 23 , 24 , 30 , 38 , 40 , 41 , 44 , 48 , 49 , 53 , 54 , 58 , 60 , 61 , 70 ]). However, few of the studies directly built into the technology an approach to maximising recall. Those that did directly attempt to maximise recall are discussed below.

Voting or committee approaches for ensuring high recall

One approach to ensuring that studies are not missed is to use a voting or committee approach. Essentially, multiple classifiers are run simultaneously, and then a ‘vote’ is taken on each item to determine whether it is likely to be relevant or not. A conservative approach would be to put forward for human screening any item that receives at least one ‘include vote’ (e.g., Wallace et al. [ 11 ]); an approach that places additional emphasis on precision might set a minimum number of agreeing votes (e.g., >50% of the classifiers must agree that an item is an include [ 44 ]).
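The sketch below illustrates the committee idea in its simplest form; it is our own example rather than the implementation used in any of the studies discussed here. Each member is trained on a different random half of the labelled records, every unscreened item collects one vote per member that predicts ‘include’, and the votes_needed threshold controls how conservative the final rule is.

import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def committee_screen(labelled_texts, labels, unscreened_texts,
                     n_members=5, votes_needed=1):
    vectoriser = TfidfVectorizer()
    X_train = vectoriser.fit_transform(labelled_texts)
    X_new = vectoriser.transform(unscreened_texts)
    votes = [0] * X_new.shape[0]
    for _ in range(n_members):
        # each committee member sees a different random training sample
        sample = random.sample(range(len(labels)), k=max(2, len(labels) // 2))
        member = MultinomialNB().fit(X_train[sample], [labels[i] for i in sample])
        for i, prediction in enumerate(member.predict(X_new)):
            votes[i] += int(prediction == 1)
    # votes_needed=1 is the conservative rule: a single 'include' vote sends the item to a human
    return [v >= votes_needed for v in votes]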

The appeal of such approaches is that the classification decision is less susceptible to missing studies that do not resemble the training set of includes, because each classifier can start with a different training set. Several studies have used this approach, with different numbers of classifiers used in the committee. Razavi used a committee of five classifiers [ 44 ]; Wallace and Frunza used (up to) eleven classifiers [ 11 , 24 , 61 ]; Ma used two classifiers [ 40 ]. Only Frunza has considered whether the number of votes makes a difference, as discussed below [ 24 , 61 ].

In Frunza (2010), if at least one decision for an abstract was to include it in the systematic review, then the final label was ‘Included’ [ 24 ]. They then tested whether the number of votes (i.e., number of classifiers) made a difference to recall and precision. They concluded that the 2-vote technique is superior to the other voting techniques (1-vote, 3-vote, 4-vote) in terms of the F measure and work saved over sampling (WSS). The highest level of recall was achieved through the 4-vote technique. The success of combined human-machine screening was similar in their later study [ 61 ], with the conclusion that the 2-vote technique was the best performer. Importantly, Frunza noted that precision decreased slightly when the human decisions were added to the machine decisions (i.e., the human incorrectly included some items). This might be relevant to the observation that human screeners tend to be over-inclusive (discussed in a later section).

(We will return to the issue of ‘voting’ approaches below, in the section on ‘Hasty generalisation’).

Specialist algorithms

At least three types of classifiers have been modified to include a specialist algorithm that adjusts the learning rate of the classifier to penalise false negatives. Cohen et al. applied a ‘false negative learning rate’ to their voting perceptron classifier expressing this as a ‘cost-proportionate rejection sampling’ strategy [ 36 ]. Matwin et al. added a heuristic weight factorization technique to their complement naïve Bayes (CNB) algorithm to maximise recall when their original algorithm had unacceptably low recall (<95%) [ 41 ]. Bekhuis also modified a complement naïve Bayes classifier by optimising the decision parameters using F3: a summary measure of performance that overweights recall relative to precision [ 60 ]. Wallace and colleagues modified their support vector machine approach to penalise more severely for false negatives compared with false positives [ 48 ].

All of these studies were retrospective evaluations in which the performance of a classifier was compared against completed include decisions and all reported good results in terms of recall and workload reduction. Future evaluations of this approach should consider whether the amount and/or quality of the training data make a difference to the ability of these modifications to adequately penalise false negatives. The reason for this is that, if used in a ‘live’ review, there might be only a small number of human-labelled items in the training set to be able to determine whether the classifier has incorrectly rejected a relevant study. If there are only a small number of includable studies in the entire dataset, then such penalties might not be implementable.

Human input

Ma proposed using active learning as a method for assuring high recall [ 40 ]. The logic behind this is that the algorithm continues to ‘learn’ as more items are manually screened and so the decision rule is adaptable and less reliant on the initial training set. However, Ma’s [ 40 ] results suggest that recall actually declined when active learning was added to a support vector machine or decision tree classifier and made no difference to the recall of a naïve Bayes classifier. Further research on this is needed to determine why this might be the case.

Hasty generalisation

The term ‘hasty generalisation’ refers to a bias which can occur because the features in the training set are not representative of the population; as opposed to other forms of ‘biased training sets’ (e.g. where bias occurs from non-randomised sampling). If the initial training set of documents in a systematic review is not fully representative of the range of documents which are of interest, it is possible that these documents will be missing from the set of studies identified as relevant through automation (see [ 25 ]). To exclude relevant studies due to their use of different terminology from those that are included would be to inject a systematic bias which would be unacceptable in the vast majority of reviews.

Several methods for dealing with this have been evaluated or discussed: drawing on reviewer domain knowledge, using patient active learning methods and employing an ensemble of classifiers that vote on whether an item should be included or not. These are elaborated on in the following sections.

Reviewer domain knowledge

Some studies evaluated or discussed drawing on the knowledge of the human reviewers to play a part in the text mining process. This is particularly suited to active learning approaches. Jonnalagadda and colleagues suggested that, in active learning, ‘the dynamically changing query set, which decides which document will be presented next, could be easily modified at any stage by removing or adding terms to the query set. In this way, the possibility of not finding documents that use different words could be further minimised by allowing active participation of the users in defining the terms in the query set’ ([ 23 ] p. 15). They did not, however, test this approach empirically.

In addition to other text mining methods, Shemilt et al. employed an approach that used ‘reviewer terms’ (terms specified by the review team as being indicative of an includable or excludable study) [ 31 ]. The text contained in each title-abstract record that was yet to be screened was analysed and the number of relevant and irrelevant terms they contained was calculated. A simple ratio of these values was then generated, and items were ranked according to this ratio. The authors argue that ‘The purpose of this method is to act as a counterpoint to the automated technologies; whereas in ATR [automatic term recognition] and AC [automatic classification], the results are heavily determined by those studies already identified as being relevant; RT [reviewer terms] offers another perspective on potential relevance, offering some protection against the problem of hasty generalization’ ([ 31 ] p. 45). This might offer reassurance to review teams that no relevant items are being erroneously discarded and is an easy approach to implement if the reviewers are familiar with the key terminology.

A more holistic approach was evaluated by Wallace et al. [ 25 ]. As in Shemilt et al. (above), reviewers provided terms that were indicative of includes and excludes (although the terms were ranked in order of ‘indicativeness’ in the Wallace paper). Wallace et al. suggested that combining prior reviewer knowledge with the machine model could be more effective at avoiding hasty generalisation and tested a variety of combinations in terms of the timing at which the reviewer knowledge rankings were emphasised relative to the machine labelling. They concluded that beginning with a bias towards the reviewer rankings and subsequently decreasing its importance as labelling proceeds would be the most effective way of combining reviewer domain knowledge in the process; however, they also noted ‘How this should be done precisely remains a problem for future work’ ([ 25 ] p. 8).

In addition, in a study which came to light after our formal searches were complete, Small et al. utilised reviewer ‘labelled features’ within what they called a ‘constrained weight space SVM’ [ 71 ]. They found that, by allowing reviewers to influence the decisions made by the classifier, it is possible to obtain better results with smaller samples of training records.

Patient active learning

‘Patient active learning’ was first proposed by Wallace et al. as a means of overcoming hasty generalisation using an active learning approach [ 11 ]. The distinguishing feature of ‘patient’ active learning is that training is based on different ‘views’ of the records (e.g. classifiers based on titles or abstract or MeSH terms) which are selected at random at each iteration of the active learning process. The additional variability that this approach injects into the process above the use of a single ‘view’ aims to ensure that the system as a whole is exposed to as wide a variety of relevant studies as possible and thus does not overly narrow the range of items it considers to be relevant.

Wallace and colleagues evaluated four different active learning strategies and found that patient active learning outperformed the others [ 11 ]. In a study which replicated some of Wallace’s work on the same data, Miwa and colleagues evaluated a range of active learning enhancements and found that patient active learning is certainly better than some strategies, though not as good as others [ 45 ].

Voting or committee approaches for dealing with hasty generalisation

The concept of a committee of classifiers was earlier introduced for helping to ensure high recall. Given that hasty generalisation would logically lead to lower recall, it is unsurprising that this approach has also been suggested as a solution to hasty generalisation.

Two studies explicitly refer to this approach. Miwa et al. reported that voting showed some improvement over non-voting approaches, especially for one particularly ‘messy’ dataset with respect to the terminology used in that review topic [ 45 ]. Shemilt et al. did not compare voting with non-voting approaches but ran the classifier multiple times and then manually screened only those items that were consistently classified as being relevant [ 31 ]. This approach seems likely to have increased precision at the expense of sensitivity.

Dealing with imbalanced datasets

At the title and abstract screening stage of a typical systematic review, the dataset is imbalanced in that there are usually far more excluded studies than included studies. One paper reported a median search precision (number of included studies divided by total number of items located through searching) of 2.9% across 94 health-related systematic reviews [ 72 ]. This translates to an imbalance in which there are approximately 33.5 times as many excludes as includes. Search precision can be much less than this, resulting in even greater imbalances.

In text mining evaluations, this is referred to as the ‘class imbalance’ problem (where ‘class’ refers to the designation as an include or an exclude). It is a problem for text mining as there are far fewer relevant items compared to non-relevant items on which to train the classifier or text mining technology. Also, Wallace et al. state that ‘class imbalance presents a problem for classification algorithms, because they have typically been optimised for accuracy, rather than the recall of a particular class’ ([ 11 ] p. 5). Since it is possible to have high accuracy even if a system produces many false negatives [ 73 ], this could be a problem for systematic reviews where missing relevant studies is highly undesirable.

To counter the class imbalance, various methods have been proposed. They generally rely on up-weighting the includes or down-weighting the excludes, or on undersampling the excludes used in the training set. The various approaches are described in the following sections.

Weighting

Weighting approaches assign greater weights to positive instances (includes) than to negative instances (excludes). Generally, the weight given to the positive class is set to the ratio of the number of negative instances to the number of positive instances, so that the smaller class carries comparable influence during training.
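As a minimal illustration (not the exact scheme used by any study reviewed here, and assuming includes are coded 1 and excludes 0), such a weight can be passed directly to a standard classifier:

from sklearn.svm import LinearSVC

def weighted_screening_classifier(n_includes, n_excludes):
    # give every include enough weight that the two classes balance during training
    include_weight = n_excludes / n_includes
    return LinearSVC(class_weight={1: include_weight, 0: 1.0})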

Miwa et al. reported that weighted active learning models performed better than an un-weighted method or an aggressive undersampling method (described below) on a variety of imbalanced datasets [ 45 ]. This was particularly the case when weighting was used in conjunction with a ‘certainty’ approach, in which the next items to be annotated in the active learning process were selected because they had the highest probability of being relevant to the review, based on the output of classifiers trained on previously annotated items.

Cohen et al. also reported good results for a weighted model, in which they modified their voting perceptron classifier to incorporate a false negative learning rate (FNLR) [ 36 ]. Across 15 reviews, they found that the FNLR should be proportional to the ratio of negative to positive samples in the dataset in order to maximise performance.

Undersampling

Undersampling involves using fewer non-relevant studies in the training set than might be expected given their prevalence in the entire dataset. Two different types of undersampling have been tested in this context: random and aggressive.

Random undersampling involves randomly selecting a training set with the same number of relevant and non-relevant studies. This approach was adopted in four studies that did not compare random undersampling to other methods for dealing with class imbalance [ 11 , 31 , 39 , 48 ].

Ma compared five undersampling methods with their active learning naïve Bayes classifier—one of which was random undersampling [ 40 ]. Method 1 involved selecting the negative examples whose average distances (a measure of similarity/dissimilarity) to the three farthest positive examples are the smallest; Method 2 involved selecting the negative examples whose average distances to the three closest positive examples are the smallest; Method 3 involved selecting the negative examples whose average distances to the three closest positive examples are the largest; Method 4 involved removing those examples that participated in Tomek links (see [ 74 ] for a definition); Method 5 involved selecting negative examples randomly. Ma concluded that random undersampling did not perform the best. ‘In general, the first and third undersampling methods work well with all feature selection methods. We have a very high recall after performing undersampling techniques. However, we have a big trade-off in precision’ ([ 40 ] p. 75).

Aggressive undersampling as defined by Wallace (in the context of active learning) involves discarding the majority examples (i.e., excludes) nearest the current separating hyperplane [ 11 ]. The separating hyperplane represents the border between the two classes: includes and excludes. Therefore, by throwing away those nearest to the hyperplane, we are discarding those that are the most ambiguous as to whether they should be in the include or exclude class. As such, the items that are more likely to be excludes are sent to the human reviewer for manual screening, which are then used to retrain the classifier. The logic behind this approach is to ‘explicitly push the decision boundary away from the minority class [includes], as it has been observed that when there is class imbalance, SVMs are prone to discovering hyperplanes that are closer to the minority class than the ideal separating boundary, resulting in false negatives’ ([ 11 ] p. 5).
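The following is a hedged sketch of that idea rather than Wallace et al.’s own code; the function name and the keep_fraction parameter are ours. It fits a linear SVM on the currently labelled data, measures each exclude’s distance from the separating hyperplane and discards the excludes that lie closest to it.

import numpy as np
from sklearn.svm import LinearSVC

def aggressive_undersample(X, y, keep_fraction=0.5):
    y = np.asarray(y)                          # 1 = include, 0 = exclude
    distances = np.abs(LinearSVC().fit(X, y).decision_function(X))
    excludes = np.where(y == 0)[0]
    furthest_first = excludes[np.argsort(distances[excludes])[::-1]]
    kept_excludes = furthest_first[: int(len(furthest_first) * keep_fraction)]
    # return the row indices to keep: all includes plus only the least ambiguous excludes
    return np.concatenate([np.where(y == 1)[0], kept_excludes])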

Wallace (2010a) [ 11 ] compared naive random sampling and aggressive undersampling in their evaluation of active learning with an SVM classifier. They concluded that aggressive undersampling performed better [ 11 ]. Miwa et al. compared aggressive undersampling with a range of other options and found that whilst it outperformed the other strategies at the beginning of the active learning sequence, other methods overtook it as screening progressed [ 45 ].

It is difficult to draw conclusions across the papers, as the two that conducted a comparison differed in many other dimensions (classifier, reviews tested, etc.). This requires further exploration.

Cohen and colleagues observed that any kind of sampling strategy can result in the exclusion of a large proportion of the possible sample available from which the classifier can ‘learn’ [ 66 ]. ‘To address this, we sample the nontopic data, creating several different priming SVM models, and extract the support vectors from each of these models to use as priming vectors. The nontopic data are rejection sampled, that is, sampled without replacement. The probabilities of inclusion for each sample within a given nontopic are adjusted so that approximately the same number of samples from each nontopic is included.’ In their experiments they used 20 resamples.

Other methods for dealing with class imbalance

Some authors claimed that certain classifiers are particularly well suited to imbalanced datasets. Bekhuis, Frunza, Kouznetsov and Matwin claimed that complement naïve Bayes (CNB) is suitable for imbalanced data, particularly when implemented in Weka [ 24 , 30 , 41 , 54 , 60 , 61 ]. Frunza and colleagues compared CNB with other classifiers (decision trees, support vector machines, instance-based learning and boosting) but concluded that CNB always performed better; it is not clear, however, whether this is because of the class imbalance problem or other differences between the approaches [ 24 , 61 ].

Some authors have suggested that the selection of features for text mining might be important in addressing class imbalances. Although they did not test it in their paper, Bekhuis et al. suggested that selecting features within the positive (include) and negative (exclude) classes before grid optimization, rather than across all items, would be appropriate for dealing with class imbalance [ 30 ]. Frunza explicitly compared classifiers that had been ‘boosted’ in terms of having more representative features for the included class (a balanced dataset) with typical feature selection technique (imbalanced dataset) but found no significant difference between these two approaches [ 24 ].

Updates versus ‘new’ reviews

Out of the 44 studies, the context of 36 was a new review, eight a review update, and for two studies the review context was not the primary area of investigation (the issue was the performance of classifiers). The context of new reviews is challenging, because there is so little training material available at the start of screening on which to conduct any machine learning. Whilst the concept of obtaining an unbiased set of training material using a random sample is widely employed, Wallace and colleagues have outlined an explicit iterative method to determine whether the variation in likely ‘includes’ has been explored adequately enough for active learning to begin [ 11 ]. They do this drawing on the work of Brinker who has developed methods for incorporating diversity in active learning by evaluating the stability of a measure of similarity between ‘included’ citations between iterations [ 75 ]. Once the measure of similarity ceases to change between iterations, the sample can be considered ready to perform active learning.

In contrast, whilst the review update might appear to be the more straightforward situation, since there are preexisting citation decisions on which to ‘learn’, some of the earliest work included in our review—by Cohen—shows that review updates face many challenges of their own [ 35 , 66 , 68 , 69 ]. In particular, the issue of ‘concept drift’ looms large over the review update. As Bekhuis points out, there are many changing variables in a review update—the team, the searches and even aspects of the question may all change—and the data from the original review may cease to be a reliable indicator of what should be included in the new one [ 60 ]. Dalal and colleagues attempted to mitigate the effects of concept drift but were not entirely successful [ 70 ].

Further information on this topic

Online learning methods which treat datasets as a stream, updating their model for each instance and discarding it after updates, can be used for new reviews. Some online learning algorithms adapt their models quickly to newly arriving data and can be adapted to deal with slight concept drift [ 76 ]. Domain adaptation, multi-task learning and transfer learning can improve models for a specific review by using related information from other reviews and problems. Such learning methods support the learning of multiple, related review targets [ 77 ].

How has the workload reduction issue been evaluated?

The following section addresses research question 4: How has the workload reduction issue been evaluated? There are three aspects that we explore: what has been compared; through what research design; and what metrics were used to evaluate the performance of the technologies.

What has been compared, using what research design?

The vast majority of evaluations used a retrospective design; that is, they assessed performance against the ‘gold standard’ judgements made in a completed systematic review [ 11 , 25 , 30 , 34 , 36 – 45 , 47 , 48 , 51 , 52 , 55 , 56 , 59 – 62 , 66 , 68 , 70 ] ( n  = 27). In contrast, prospective designs are those in which the technology was assessed in a ‘live’ context; that is, as the review was being conducted. Seventeen studies employed a prospective design, of which five were self-described as ‘case studies’ [ 31 , 46 , 50 , 57 , 63 ], four were controlled trials [ 13 , 14 , 64 , 65 ], and eight were other prospective designs [ 23 , 24 , 35 , 49 , 53 , 54 , 58 , 69 ].

The type of design is important, as prospective designs have the potential to tell us more about how the text mining technologies might work when implemented in ‘real life’. Whilst retrospective simulations are essential in determining the relative performance of different classifiers or establishing the optimal parameters of a classifier, some of the difficulties of implementing such technologies in a live review cannot be taken into account adequately (e.g., reviewer over-inclusiveness at different stages of the process, which might ‘mislead’ the classifier about what an include ‘looks like’). Moreover, many of the evaluations are of relatively ‘neat’ datasets, in that they have a sufficient number of includes on which to train (even if they are the minority class). How does text mining cope when there is a tiny number of includes, or in a so-called ‘empty’ review, in which there are no included studies? b

Related to the issue of how the technologies were evaluated is the question of what was evaluated. Most of the evaluations conducted to date ( n  = 29) make some form of comparison between different algorithms or methods for text mining [ 11 , 23 – 25 , 30 , 34 , 36 , 37 , 39 – 43 , 45 , 49 , 51 – 55 , 58 , 60 – 62 , 66 , 68 – 70 ]. The main issues evaluated are: the relative effectiveness of different methods for classifying studies (i.e. ‘classifiers’ and different options for using them (‘kernels’)); how different approaches to ‘feature selection’ (the way that aspects of studies, such as their titles, abstracts and MeSH headings, are encoded for machine learning) impact on performance; how effective different approaches to separating different pieces of ‘intelligence’ about the study (e.g. titles versus abstracts) are; and whether performance differs depending on how many studies are used for the initial training. The remaining 16 evaluations do not compare aspects of the methodology; rather, they report on the effectiveness of one chosen method for implementing text mining [ 13 , 14 , 31 , 35 , 38 , 44 , 46 – 48 , 50 , 56 , 57 , 63 – 65 ].

Unsurprisingly, study design is associated with certain types of comparisons (see Table  3 ). The four controlled trials all compared human performance with machine performance but did not compare different aspects of text mining technologies. None of the five case studies compared text mining features either, with an emphasis instead on how workload could be reduced in an ongoing review. The retrospective simulation studies tended to compare more features of text mining than other prospective studies, perhaps because of the comparative ease with which adaptations to the text mining approach can be made in a retrospective evaluation.

Metrics for assessing classifier performance

In this section, we address the third of these aspects: what metrics are available for evaluating the performance of the approaches, in terms of both effectiveness and efficiency? The metrics are defined in Table  1 and presented there in order from the most popular to the least. Most studies reported more than one performance metric and generally considered the importance of both identifying relevant studies and reducing workload for the reviewers.

There are various arguments used throughout the literature as to which metric is the most appropriate. It should be noted that not all metrics are suitable for all evaluation designs or text mining technology types. For instance, coverage is only suitable for active learning approaches, whilst Cohen noted that ‘If the task is not to separate documents into positive and negative groups, but instead to prioritise which documents should be reviewed first and which later, then precision, recall and F measure do not provide sufficient information’ (p. 121) [ 68 ].

Measures that allow the trade-off between recall and precision to be taken into account on a review-by-review basis seem particularly useful, as they let reviewers change the relative importance of these two metrics depending on priorities in a given review. These notably include the F measure, work saved over sampling and utility, which are summarised below.

F measure is a weighted harmonic mean of precision and recall. The weighting can be determined on a review-by-review basis, allowing reviewers to assess the relative importance of recall and precision in their context.

Work saved over sampling (WSS) indicates how much work (in terms of number of items needed to screen) is saved over and above the work saved by simple sampling for a given level of recall. It is typical to use a recall level of 0.95. See Cohen et al. [ 36 ].

Utility is relevant for active learning approaches and is calculated based on yield and burden. Yield represents the fraction of includes in the data pool that are identified by a given method, and burden represents the fraction of items in the data pool that have to be annotated/reviewed by reviewers. The formula to calculate utility includes a weighting factor so that the reviewers can specify the relative importance of yield and burden. This weighting factor has been established for some contexts but might need to be re-established for application in other settings [ 25 ].
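To make these metrics concrete, the sketch below computes F_β and WSS from confusion-matrix counts, following the standard definitions cited above, together with a yield/burden-based utility; the exact weighting scheme for utility varies between studies, so the form used here is an illustrative assumption rather than the precise formula from [ 25 ].

```python
# Hedged sketch of screening-evaluation metrics from confusion-matrix counts.
# tp/fp/tn/fn are counts over the screened dataset; beta weights recall (or yield)
# relative to precision (or burden).

def f_beta(tp, fp, fn, beta=1.0):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def wss(tp, fp, tn, fn, recall_level=0.95):
    """Work saved over sampling at a fixed recall level (after Cohen et al.)."""
    n = tp + fp + tn + fn
    return (tn + fn) / n - (1 - recall_level)

def utility(yield_, burden, beta=19.0):
    """Illustrative weighted trade-off between yield and burden (assumed form)."""
    return (beta * yield_ + (1 - burden)) / (beta + 1)

# Example: beta=3 overweights recall (as in F3); beta=0.5 favours precision (F0.5).
print(f_beta(tp=80, fp=40, fn=20, beta=3))
```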

It is clear from the three metrics above that there is a subjective element to the performance metrics, as it is up to the evaluators to determine thresholds and weighting values. Whilst this has the advantage of making the metrics tailored to the review and evaluation context, it (a) makes it difficult to compare across studies that use different thresholds/weights in their calculations, and (b) means that it is not always transparent or justified how the thresholds/weights were selected.

Evaluation metrics that emphasise high recall

As mentioned above, many studies discussed the importance of high recall without necessarily making explicit adaptations to their text mining approach. They do, however, consider the importance of high recall in their choice of metric when evaluating the performance of the text mining technology. Examples included:

●  Bekhuis (2012) used F3—a summary measure that overweights recall relative to precision—because they felt this was more in keeping with reviewer behaviour (than a metric which weights them equally) [ 59 ]

●  Kouznetsov (2010) used false negatives (relevant articles mistakenly ranked at the bottom of a ranked list) as their primary performance measure [ 54 ]

●  Wallace (2011) [ 58 ] used U19—a weighted metric in which recall is 19 times as important as cost. The value of 19 was determined through an expert consultation process [ 25 ] (see Wallace [ 11 ])

●  Dalal (2013) evaluated performance using a range of probability thresholds to better consider the impact on observed performance of using different recall and precision trade-offs: one metric was based on ‘sensitivity-maximising thresholds’ whilst another ‘preserved good sensitivity whilst substantially reducing the error rate [false positives]’ (p. 348) [ 70 ]

In contrast to most of the studies in this review, Dalal (2013) argued that ‘neither error minimization nor sensitivity maximisation are absolute goals’ (p. 348) [ 70 ]. In fact, Fiszman and colleagues (2008, 2010) used the F0.5 measure, which weights precision more highly than recall [ 38 , 53 ]. They argue that clinical practice guideline developers value precision more than recall and therefore performance should be evaluated on this basis. This suggests that the relative importance of recall and precision might vary from context to context, and a high recall should not be assumed to be more important than high precision (though in most systematic review guidance—and practice—maximising recall is prioritised).

Evaluation metrics that account for class imbalance

As with the issue of the importance of high recall in systematic reviews, some authors have reflected the class imbalance problem in their choice of evaluation measure. Cohen (2010) argued that the AUC is independent of class prevalence [ 24 , 35 ], whilst Frunza [ 24 ] reported the F measure for the same reason. The choice of evaluation metric should consider whether class imbalance is likely to bias the results.

Further information on this topic

We should note that other evaluation metrics can also account for class imbalance. For example, if both true positives and true negatives matter, ROC-AUC is a reasonable choice, but if only the true positives (the relevant studies) are of interest, PR-AUC may be preferable [ 78 ]. See also [ 79 ].
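A brief scikit-learn sketch of the two summary measures mentioned above; the labels and scores are invented purely for illustration.

```python
# ROC-AUC summarises ranking quality over both classes; average precision
# (a PR-AUC estimate) focuses on how well the minority 'include' class is ranked.
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]                         # imbalanced toy labels
scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.4, 0.35, 0.6, 0.7, 0.9]  # classifier scores

print("ROC-AUC:", roc_auc_score(y_true, scores))
print("PR-AUC (average precision):", average_precision_score(y_true, scores))
```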

Implementation challenges

The following section attempts to answer research question 5: What challenges to implementation emerge from reviewing the evidence base? Whilst almost all of the papers concluded that text mining was a ‘promising’ approach to reduce workload in the screening stage of a systematic review, it was not always clear how these technologies would be rolled out for use in ‘live’ reviews. A few issues became clear that need to be considered for the knowledge gained in these studies to have practical application (all of which apply to other uses of automation and semi-automation in systematic reviews [ 80 ]).

Deployed systems

Only six different systems (reported in 12 papers) are currently ‘deployed’—that is, are in a packaged system that a reviewer could use without having to do any computer programming. Some are bespoke systematic review systems, whereas others are more generic software for predictive analytics which can be used in a systematic review. The bespoke systems for systematic reviews which were used in evaluations in this review are: Abstrackr [ 49 , 50 ], EPPI-Reviewer [ 31 , 57 ], GAPScreener [ 51 ] and Revis [ 64 ]. Many generic software applications support the kinds of machine learning evaluated in this review; the two that were used in our included papers were Pimiento [ 62 ] and RapidMiner [ 59 , 60 ]. However, even though no programming may be required to use these tools, reviewers using the systems are likely to require some training to be able to use them. Given concerns about the need for high recall, imbalanced datasets, etc., these are not packages that can be used without understanding some of the behind-the-scenes decisions that are made with respect to handling the data.

Replication of evaluations

Only one study in the evidence base represents a true replication of another study (Felizardo [ 65 ]). There are some partial replications that used the same dataset; notably, Cohen and colleagues and Matwin and colleagues had an ongoing correspondence in the Journal of the American Medical Informatics Association in which they presented results across the same review datasets using different classifiers and parameters. Most studies differ in many ways: datasets used, classifiers tested, feature selection processes applied, citation portions viewed, comparisons made, study designs employed, metrics used for evaluation, etc. This makes it impossible to compare results across studies directly. It also makes it difficult to conclude whether any particular aspect of the abovementioned differences is particularly important to adopt or fruitful to explore in future research.

It is hoped that future evaluations will attempt more replications of the same methodological applications but on different datasets, to determine whether findings hold when applied to new topic areas. For instance, Miwa [ 45 ] reported that a particular approach did not perform as well on ‘messy’ social science datasets as it did for ‘cleaner’ clinical datasets that had been used elsewhere (though other enhancements can make up for some of this deficit)—these sorts of partial replications of the method are helpful in understanding the cross-review and cross-disciplinary applicability of the evaluation findings [ 45 ].

Scalability

A further concern is whether some of the approaches will work on very large datasets—that is, can they be ‘scaled up’ from the small datasets used in the evaluations to the larger datasets that are often encountered in systematic reviews. The largest evaluation was on a dataset of more than 1 million citations [ 31 ], although that was a case study (and an extreme one at that!); the second largest evaluation was on a dataset of 47,274 [ 24 ]. However, the vast majority were conducted on review datasets that were well below 5,000 items, with the smallest datasets being only 57 items (20 in the training set, 37 in the test set; [ 64 , 65 ]).

Given that the purpose of using such technologies in systematic reviews is to reduce screening workload, it seems appropriate to test them on datasets for which the workload is large or even unmanageable. Although we can extrapolate from the smaller datasets to larger reviews, there is a limit to how much we can assume that the technologies will be able to detect true positives in such large (and thereby presumably more diverse) datasets.

The issue of scalability is particularly relevant to the visual text mining approaches, as discussed earlier in the paper. Consideration will need to be given to how to represent connections between papers visually when there are many items in the dataset; the visual display could become too overwhelming to be of any use in aiding human information processing. Either such tools will need to be adapted for scaling up, or an upper threshold on the number of items in the dataset might need to be established.

Methods such as stream-based active learning are promising for handling large-scale data [ 81 ]. Stream-based active learning is closely related to online learning (discussed above), but because it does not need to store all of the instances used in active learning, it can scale to very large datasets.

Suitability: appropriateness of text mining for a given review

This systematic review has aimed to identify all the relevant studies concerning the use of text mining for screening, finding that it is a relatively new field with many gaps in the evidence base. One significant gap is the limited range of topics and types of study within the reviews which have been used to evaluate the text mining methods. On the whole, they are concerned with identifying RCTs in clinical areas, and there are almost no examples outside the health and biomedical sector apart from a discrete set in the area of software engineering. This is not surprising, since these are the areas in which text mining for other purposes is most common, but it is an important area for future research, because more general literature is harder to mine owing to the variability of its concepts, categorisation and terminology.

Bekhuis and Demner-Fushman tested this explicitly in their study of 2010, looking for non-randomised, as well as randomised, controlled trials (though still in the medical domain) [ 59 ]. Their findings are promising, though they are concerned about the possibility of ‘over-fitting’ and the danger of building a classifier that does not recognise the true scope of relevant studies. They identify a specific type of SVM classifier and conclude that their method may be able to identify non-randomised studies with a high degree of recall—as long as the citations on which the machine learning can ‘train’ encapsulate the full range of the potentially relevant studies. Miwa et al. test explicitly the difference in performance of the same machine learning approaches between ‘clinical’ and ‘social science’ reviews [ 45 ]. They found that text mining performance was slightly poorer in the social scientific literature than the clinical domain and that certain enhancements could improve this.

Wallace and colleagues suggest a method to be used in review updates which enables reviewers to determine whether a semi-automated approach is viable [ 48 ]. They recommend a ‘cross-fold validation’ test, whereby the database of studies from the original review is split into parts (say, 10) and the classifier successively trained on 90% of the data, leaving 10% for assessing its performance. Performance is then averaged over the 10 iterations and, if acceptable, the use of automation for the update of that specific review can be recommended.
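The kind of check described above can be sketched roughly as follows; the classifier, fold count and the recall threshold used to call the update ‘viable’ are illustrative assumptions rather than the published settings of Wallace and colleagues.

```python
# Estimate, by cross-validation on the original review's screening decisions,
# how well a classifier would recall 'includes' before committing to
# semi-automation for the review update.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

def update_automation_viable(texts, labels, min_recall=0.95):
    X = TfidfVectorizer(stop_words="english").fit_transform(texts)
    clf = LinearSVC(class_weight="balanced")
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    recalls = cross_val_score(clf, X, labels, cv=cv, scoring="recall")
    return recalls.mean() >= min_recall, recalls.mean()
```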

Most text mining systems used in systematic reviews rely on shallow information, e.g. bag-of-words features and combinations of them such as kernels. Natural language processing techniques such as syntactic parsing can be employed to engineer more discriminative features. Furthermore, unsupervised feature learning or dimensionality reduction approaches can be employed to build feature representations suited to specific domains, as well as to find queries that relieve the hasty generalisation problem mentioned earlier [ 82 ].
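As an illustration of moving beyond a plain bag-of-words, the sketch below combines word n-gram TF-IDF features with truncated SVD as an unsupervised dimensionality-reduction step; the n-gram range and number of components are arbitrary choices.

```python
# Richer-than-bag-of-words features: word n-grams plus an unsupervised,
# low-dimensional representation (latent semantic analysis via truncated SVD).
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

feature_pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2, sublinear_tf=True),
    TruncatedSVD(n_components=200, random_state=0),  # dense, topic-like features
)
# X_features = feature_pipeline.fit_transform(citation_texts)  # citation_texts assumed available
```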

Over-inclusive screeners

The success of most automated approaches relies upon ‘gold standard’ training data; that is, citations that the machine can assume have been correctly designated as relevant or irrelevant. Using these data, the machine is then able to build a model to designate such classifications automatically. Usually, these gold standard training data take the form of decisions made by reviewers when screening a proportion of the studies of interest. Unfortunately, these decisions may not actually be ‘gold standard’ training data, because reviewers are trained to be over-inclusive, and to retrieve the full text whenever they are in doubt—even if the most likely final decision is that it is irrelevant. Such decisions may mislead the classifier and generate a model which incorrectly classifies irrelevant studies as relevant. Bekhuis et al. acknowledge this as a potential problem, but go on to argue that, to ‘be worthwhile, a classifier must return performance better than this baseline to ensure reduced labor’ [ 60 ]: a pragmatic way of looking at how machine learning might potentially assist in systematic reviews. Frunza et al. also encountered this challenge, finding that the best way of mitigating the effects of reviewer over-inclusivity was to base the machine learning on designations that were the result of two reviewers’ opinions—after disagreements had been resolved [ 61 ]. This solution is clearly only possible when two reviewers are reviewing every abstract—something which is common, but by no means universal, practice.

A machine learning-based method able to deal with over-inclusive screening as well as data imbalance is cost-sensitive learning [ 83 ]. Cost-sensitive learning assigns different misclassification costs to different types of error and adapts machine learning methods to task-specific criteria. It is as competitive as, or better than, sampling methods for unbalanced datasets [ 84 ], and it is also employed in active learning [ 85 ].
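A small sketch of the cost-sensitive idea using class weights in scikit-learn; the specific cost ratio (a missed include penalised ten times more heavily than a wrongly included exclude) is an illustrative assumption.

```python
# Cost-sensitive learning: make misclassifying an 'include' (class 1) far more
# costly than misclassifying an 'exclude' (class 0), pushing the decision
# boundary towards higher recall on the minority class.
from sklearn.svm import LinearSVC

cost_sensitive_clf = LinearSVC(class_weight={0: 1, 1: 10})
# cost_sensitive_clf.fit(X_train, y_train)  # features and labels assumed available
```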

Summary of key findings

This review asked five research questions, which we have addressed through synthesising the evidence from 44 evaluations of the use of text mining for reducing screening workload in systematic reviews.

The first research question related to the state of the evidence base, which we conclude to be both active and diverse. The timeline indicates that the field is evolving rapidly, with new issues being tackled almost every year since its application to systematic reviews. However, this also hints at an issue that was elaborated on throughout this paper—that is, there is almost no replication between studies or collaboration between research teams, making it difficult to establish any overall conclusions about best approaches.

The second research question related to the purpose of using text mining to reduce workload and the methods used for each purpose. For reducing the number needed to be screened, it is reasonable to assume that the more interactive approach offered by a ranking or prioritisation system and the active learning approaches will have greater user appeal than a strict classifier approach in ‘new’ reviews (as opposed to review updates). This is because reviewers might be uncomfortable with handing over too much control to an automated system. Also, when using a ranking or prioritisation approach, reviewers are able to search more sensitively than is currently the norm and screen the same number of studies as they currently would; the effort spent screening manually would thus be focused on those studies identified as being the most relevant retrieved in the search, enabling these reviews to identify more relevant studies than is currently the case.

For using text mining to replace a second human screener, classifiers were used to make explicit in/out decisions and those decisions were compared with a human reviewer. This approach is likely to have strong appeal amongst the systematic review community because, whilst it reduces the resources required to screen items, 100% of the items identified through searching are still viewed by a human screener. This could combat concerns about false negatives assigned by an automated screener. A further potential benefit of such a system is that it ‘could deliver quality assurance both by confirming concordant decisions and by naming studies associated with discordant decisions for further consideration’ (Bekhuis [ 60 ], p. 9) (One possible weakness of this approach is that it necessarily assumes that any mistakes made by the human screener are essentially at random, and not because of some systematic misapplication of the inclusion criteria, which might be picked up and addressed if two reviewers were working in tandem.).

Reducing workload by increasing the rate (or speed) of screening was a little researched topic, exclusively limited to the visual data mining approach and largely championed by one research group. A major limitation of these evaluations—and potentially for the wider applicability of these approaches—is that the approach has only been tested on very small datasets. The largest dataset consisted of only 261 items to be screened [ 13 ]. It is unclear whether such an approach could be scaled up to be applied in other disciplines in which thousands of items might need to be screened, though the authors argue that upscaling is indeed possible. The efficient citation assignment approach evaluated by Wallace et al. [ 49 ] may also be promising for larger reviews where the expertise of the reviewers is known.

Improving workflow efficiency through screening prioritisation is likely to appeal to systematic reviewers as it allows reviewers to screen 100% of the titles and abstracts while offering a range of benefits. Benefits discussed in the literature included: understanding the inclusion criteria sooner; getting up to speed on new developments in review updates; starting full-text document retrieval sooner; and starting the data extraction and synthesis processes in parallel with screening the ‘tail end’ of the list of items (in which there are expected to be very few or zero relevant items).

The third research question related to the contextual problems of applying text mining to systematic review screening and how they have been addressed in the literature. We found various attempts to address the importance of high recall for systematic reviews (vote counting; specialist algorithms; and human input). Whilst all evaluations reported good recall, the studies used different adaptations; so it is impossible to conclude whether any approach is better than another—and in which context. However, human input is likely to have intuitive appeal to systematic reviewers, as it allows for a human sense-check of the terminology preferences determined by the machine.

One important distinction to make when evaluating the utility of machine learning in screening is whether one is creating a new review or updating an existing one. Given the existence of the preexisting data for review updates, it is often possible to know in advance the likely performance of using text mining, enabling reviewers to make an informed decision about its potential in that specific review. Such a situation does not pertain in new reviews, and the risk of hasty generalisation is a ‘known unknown’ here, as are the risks and benefits of adopting a semi-automated approach.

The lack of replication and testing outside the biomedical sphere makes it difficult to draw conclusions about the general effectiveness of these technologies. Certainly, where technical jargon is utilised, most approaches appear to offer efficiency savings; and in the few instances of their application outside the medical domain they again can be effective, though potentially slightly less so.

The fourth research question considered how the workload reduction issue has been evaluated. Here, it was impossible to synthesise study findings quantitatively, because each used different technologies in (usually) different reviews. On the whole, most suggested that a saving in workload of between 30% and 70% might be possible (with some a little higher or a little lower than this), though sometimes the saving in workload is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall).

The fifth research question considered the challenges to implementation that emerged from reviewing the evidence base. Here, we found few deployed systems, which limits the ability of reviewers to try out these technologies, but also, given the limitations in the evidence base identified above, there is probably a need for specialist advice whenever they are used in a live review—and certainly if workload reduction is planned (i.e. if their use extends beyond prioritising screening). We also found a lack of replication studies, which makes it difficult to compare the efficacy of different approaches across review contexts, and few evaluations outside the biomedical domain. Challenges in using such technologies include questions about how they might scale to large reviews and how to model accurate classifiers when the decisions made by reviewers are likely to err on the side of caution, and hence be over-inclusive.

Strengths and limitations of this review

To the best of our knowledge, this is the first systematic review that has brought together evidence concerning the use of text mining for screening in systematic reviews. We have identified a varied, innovative and potentially extremely important evidence base—which one day may do much to improve review efficiency and so improve decision-making. We hope that this review will help the different areas of the field to ‘speak’ to one another and so facilitate the development of the field as a whole.

As there are no other systematic reviews of this area, we had a broad review question, which encompassed any approach. This has enabled us to identify the cross-cutting issues in the field but has limited the quantity of technical information that we have been able to present. For example, a narrower review focused solely on active learning might be able to delve into the specifics in more detail.

An inevitable limitation due to setting the scope of the review to evaluations of text mining approaches within systematic reviews is that relevant research in other areas is excluded. For example, if we had reviewed all potentially relevant research about text mining and active learning (an almost impossible task!), other technologies and approaches, beyond those so far evaluated in systematic reviews, might well have come to light. Whilst this limitation was impossible to avoid, it is nevertheless a significant limitation, because only a small subset of possible approaches to, for example, feature selection/enrichment and distance analytics, have been tested within the systematic review literature. The field of text mining contains many more possibilities—and some may be more effective and appropriate than those so far evaluated.

A limitation which applies to any systematic review is that we may not have managed to find every relevant study. This was highlighted to us during the peer review process when another relevant study came to light. This study was focused on a text mining approach and utilised data from systematic reviews as its test scenario [ 71 ]. There may be other papers like this one which we have inadvertently missed.

Further possibilities

It is interesting to note that text mining approaches to support screening have followed the human reviewer’s initial approach of using titles, abstracts and keywords. The human reviewer will retrieve full text for further review, but typically text mining approaches so far have not processed full text in support of the screening process. There are essentially three issues to consider here. Firstly, there is the issue of how well a title, abstract and metadata can satisfy a complex information need. For example, regarding use of an abstract to determine what claims are being made, Blake found that, in biomedicine, fewer than 8% of the scientific claims made in full-text articles were to be found in their abstracts, which would certainly motivate the need to process full text [ 86 ].

Cohen and colleagues have investigated more widely the implications for text mining of processing abstracts as opposed to full-text articles, and moreover mention a second issue, to do with problems that may arise for systems in going from the processing of abstracts to the processing of full text, but note that there are opportunities to be exploited in so doing [ 87 ]. Text mining technology has, however, improved greatly since that publication. There are now text mining systems that process large amounts of full text and that support sophisticated semantic search. For example, Europe PubMed Central, a large archive for the Life Sciences, showcases on its Labs site a semantic search system, EvidenceFinder, that is underpinned by deep parsing, conducted in a cloud environment, of some 2.5 m articles to yield over 83 m searchable facts ( http://labs.europepmc.org/evf ).

Text mining can increasingly handle deep analysis of full-text context, at scale, thus it would be natural to move towards exploiting such a capability in support of systematic reviews. However, this leads into the third issue, concerning copyright, licencing and lawful access to full-text content for text mining purposes. Reviewers already run into this issue when they find that their institution does not subscribe to some journal, for example. However, even if one’s institution does have the relevant subscription, licencing terms may explicitly disallow text mining or allow it but place constraints on use of its results. This is a hot topic, with researchers claiming that ‘the right to read is the right to mine’ (Open Knowledge Foundation). Open Access publications are not subject to the same constraints as subscription-based content; however, there is growing concern amongst researchers and funding bodies that opportunities are being lost to advance knowledge and boost innovation and growth due to restrictive copyright and licencing regimes that are unsuited to the digital age [ 88 , 89 ]. Most recently, the UK has passed legislation to legalise text mining for non-commercial use ( http://www.legislation.gov.uk/uksi/2014/1372/regulation/3/made ). There is thus a valuable opportunity for the systematic reviewing community in the UK at least to work closely with its text mining community to exploit the benefits of full-text processing, particularly to improve screening and to reduce the need for humans to laboriously move from abstract to full text to carry out a more specific check for relevance.

The use of automation to assist in study selection is possibly the most advanced of all the areas in which automation in systematic reviews is being developed; other applications range from formulating the review question and automating data extraction and quality assessment to writing sections of the report [ 90 – 93 ].

Recommendations

Recommendations for research

●  More replications using the same text mining methods on different datasets are required.

●  Likewise, different methods using the same dataset are also needed in order genuinely to compare one with another.

●  To facilitate the above, data on which evaluations are based should be made public as often as possible.

●  The testing of the methods reviewed here in other disciplines is urgently required. For example, the field of Development Studies may be more complex and thus demand more of the text mining (promoting more innovation to overcome new hurdles).

Recommendations for reviewing practice

●  Reviewers should engage with the computer science community to develop and evaluate methods and systems jointly.

●  Using text mining to prioritise the order in which items are screened should be considered safe and ready for use in ‘live’ reviews.

●  The use of text mining as a ‘second screener’ may be used cautiously in the knowledge that the assumption is that the human reviewer is not missing relevant studies systematically.

●  The use of text mining to eliminate studies automatically should be considered promising, but not yet fully proven. In highly technical/clinical areas, it may be used with a high degree of confidence; but more developmental and evaluative work is needed in other disciplines.

Whilst there is a relatively abundant and active evidence base evaluating the use of text mining for reducing workload in screening for systematic reviews, it is a diverse and complex literature. The vast array of different issues explored makes it difficult to draw any conclusions about the most effective approach. There are, however, key messages regarding the complexity of applying text mining to the systematic review context and the challenges that implementing such technologies in this area will encounter. Future research will particularly need to address: the issue of replication of evaluations; the suitability of the technologies for use across a range of subject-matter areas; and the usability and acceptability of using these technologies amongst systematic review (non-computer scientist) audiences.

a. A ‘method’, in the context of this review, is the application of a specific technology or a process within a systematic review. This is a somewhat broad definition which includes, for example, both the use of a classifier to classify citations as being relevant/irrelevant; and also the ‘active learning’ approach, which incorporates a classifier as part of its process. This broad definition reflects the practical purpose of this review—we are interested in approaches that can be applied in systematic reviews, and these may be individual tools, combinations of tools or processes for using them.

b. The practicalities of implementing text mining in live reviews are the subject of a current project by the EPPI-Centre and NaCTeM, which aims to address some of these issues. Project URL: http://www.ioe.ac.uk/research/63969.html .

Abbreviations

CNB: complement naïve Bayes

FNLR: false negative learning rate

HTA: health technology assessment

LISTA: Library, Information Science & Technology Abstracts

NLP: natural language processing

SVM: support vector machine

VDM: visual data mining

WSS: work saved over sampling.

Gough D, Elbourne D: Systematic research synthesis to inform policy, practice and democratic debate. Soc Policy Soc 2002, 1: 225–36.


Gough D, Oliver S, Thomas J: An Introduction to Systematic Reviews . London: Sage; 2012.


Gough D, Thomas J, Oliver S: Clarifying differences between review designs and methods. Syst Rev 2012, 1 (28): doi:10.1186/2046-4053-1-28

Chalmers I, Hedges L, Cooper H: A brief history of research synthesis. Eval Health Prof 2002, 25: 12–37. 10.1177/0163278702025001003


Mulrow C: Rationale for systematic reviews. BMJ 1994, 309: 597–9. 10.1136/bmj.309.6954.597


Bastian H, Glasziou P, Chalmers I: Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med 2010., 7 (9) :

Lefebvre C, Manheimer E, Glanville J: Searching for studies (chapter 6). In Cochrane Handbook for Systematic Reviews of Interventions Version 510 [updated March 2011] . Edited by: Higgins J, Green S. Oxford: The Cochrane Collaboration; 2011.

Gomersall A, Cooper C: Database selection bias and its affect on systematic reviews: a United Kingdom perspective. In Joint Colloquium of the Cochrane and Campbell Collaborations . Keystone, Colorado: The Campbell Collaboration; 2010.

Harden A, Peersman G, Oliver S, Oakley A: Identifying primary research on electronic databases to inform decision-making in health promotion: the case of sexual health promotion. Health Educ J 1999, 58: 290–301. 10.1177/001789699905800310

Sampson M, Barrowman N, Moher D, Clifford T, Platt R, Morrison A, et al .: Can electronic search engines optimize screening of search results in systematic reviews: an empirical study. BMC Med Res Methodol 2006., 6 (7) :

Wallace B, Trikalinos T, Lau J, Brodley C, Schmid C: Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinformatics 2010., 11 (55) :

Allen I, Olkin I: Estimating time to conduct a meta-analysis from number of citations retrieved. JAMA 1999, 282 (7) : 634–5. 10.1001/jama.282.7.634


Felizardo K, Andery G, Paulovich F, Minghim R, Maldonado J: A visual analysis approach to validate the selection review of primary studies in systematic reviews. Inf Softw Technol 2012, 54 (10) : 1079–91. 10.1016/j.infsof.2012.04.003

Malheiros V, Hohn E, Pinho R, Mendonca M: A visual text mining approach for systematic reviews. In Empirical Software Engineering and Measurement, 2007 ESEM 2007 First International Symposium on: 2007 2007 . Piscataway: IEEE; 2007:245–54.

Miroslav K, Matwin S: Addressing the curse of imbalanced training sets: one-sided selection. Proceedings of the Fourteenth International Conference on Machine Learning: 1997 1997.

Watt A, Cameron A, Sturm L, Lathlean T, Babidge W, Blamey S, et al .: Rapid reviews versus full systematic reviews: an inventory of current methods and practice in health technology assessment. Int J Technol Assess Health Care 2008, 24 (2) : 133–9.

Ananiadou S, McNaught J: Text Mining for Biology and Biomedicine . Boston/London: Artech House; 2006.

Hearst M: Untangling Text Data Mining. Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL 1999): 1999 1999, 3–10.

Thomas J, McNaught J, Ananiadou S: Applications of text mining within systematic reviews. Res Synth Methods 2011, 2 (1) : 1–14. 10.1002/jrsm.27

Ananiadou S, Okazaki N, Procter R, Rea B, Sasaki Y, Thomas J: Supporting systematic reviews using text mining. Soc Sci Comput Rev 2009, 27: 509–23. 10.1177/0894439309332293

Thomas J: Diffusion of innovation in systematic review methodology: why is study selection not yet assisted by automation? OA Evid Based Med 2013, 1 (2) : 12.

Thomas J, Brunton J, Graziosi S: EPPI-Reviewer 4.0: Software for Research Synthesis . London: EPPI-Centre Software, Social Science Research Unit, Institute of Education; 2010.

Jonnalagadda S, Petitti D: A new iterative method to reduce workload in systematic review process. Int J Comput Biol Drug Des 2013, 6 (1–2) : 5–17.


Frunza O, Inkpen D, Matwin S: Building systematic reviews using automatic text classification techniques. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters: 2010 2010 . Beijing China: Association for Computational Linguistics; 2010:303–11.

Wallace B, Small K, Brodley C, Trikalinos T: Active learning for biomedical citation screening. KDD 2010; Washington USA 2010.

Lavoie M, Verbeek J: Devices for preventing percutaneous exposure injuries caused by needles in healthcare personnel. Cochrane Database Syst Rev 2014., 2014 (3) :

Mischke C, Verbeek J, Saarto A, Lavoie MC, Pahwa M, Ijaz S: Gloves, extra gloves or special types of gloves for preventing percutaneous exposure injuries in healthcare personnel. Cochrane Database Syst Rev 2014., 2014 (3) :

Martin A, Saunders D, Shenkin S, Sproule J: Lifestyle intervention for improving school achievement in overweight or obese children and adolescents. Cochrane Database Syst Rev 2014., 2014 (3) :

Fletcher-Watson S, McConnell F, Manola E, McConachie H: Interventions based on the Theory of Mind cognitive model for autism spectrum disorder (ASD). Cochrane Database Syst Rev 2014., 2014 (3) :

Bekhuis T, Demner-Fushman D: Screening nonrandomized studies for medical systematic reviews: a comparative study of classifiers. Artif Intell Med 2012, 55 (3) : 197–207. 10.1016/j.artmed.2012.05.002

Shemilt I, Simon A, Hollands G, Marteau T, Ogilvie D, O’Mara-Eves A, et al .: Pinpointing needles in giant haystacks: use of text mining to reduce impractical screening workload in extremely large scoping reviews. Res Synth Methods 2013, 13: 1218. n/a-n/a

Hammerstrøm K, Wade A, Jørgensen A: Searching for Studies: A Guide to Information Retrieval for Campbell Systematic Reviews . Keystone, Colorado: Campbell Collaboration; 2010.

Institute of Medicine of the National Academies: Finding what works in health care: standards for systematic reviews . Washington, DC: Institute of Medicine of the National Academies; 2011.

Cohen A: Performance of support-vector-machine-based classification on 15 systematic review topics evaluated with the WSS@95 measure. J Am Med Inform Assoc 2011, 18: 104.

Cohen A, Ambert K, McDonagh M: A prospective evaluation of an automated classification system to support evidence-based medicine and systematic review. AMIA Annual Symposium 2010, 121–5.

Cohen A, Hersh W, Peterson K, Yen P-Y: Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc 2006, 13 (2) : 206–19. 10.1197/jamia.M1929

Cohen A: An effective general purpose approach for automated biomedical document classification. In AMIA Annual Symposium Proceedings, vol. 13 . Washington, DC: American Medical Informatics Association; 2006:206–19.

Fiszman M, Ortiz E, Bray BE, Rindflesch TC: Semantic Processing to Support Clinical Guideline Development. AMIA 2008 Symposium Proceedings: 2008 2008 2008, 187–91.

Kim S, Choi J: Improving the performance of text categorization models used for the selection of high quality articles. Healthc Informatics Res 2012, 18 (1) : 18–28. 10.4258/hir.2012.18.1.18

Ma Y: Text Classification on Imbalanced Data: Application to Systematic Reviews Automation . Ottawa: University of Ottawa; 2007.

Matwin S, Kouznetsov A, Inkpen D, Frunza O, O’Blenis P: A new algorithm for reducing the workload of experts in performing systematic reviews. J Am Med Inform Assoc 2010, 17 (4) : 446–53. 10.1136/jamia.2010.004325

Matwin S, Kouznetsov A, Inkpen D, Frunza O, O’Blenis P: Performance of SVM and Bayesian classifiers on the systematic review classification task. J Am Med Inform Assoc 2011, 18: 104–5.

Matwin S, Sazonova V: Correspondence. J Am Med Inform Assoc 2012, 19: 917. 10.1136/amiajnl-2012-001072

Razavi A, Matwin S, Inkpen D, Kouznetsov A: Parameterized Contrast in Second Order Soft Co-Occurrences: A Novel Text Representation Technique in Text Mining and Knowledge Extraction. In 2009 Ieee International Conference on Data Mining Workshops: 2009 2009 . New York: Ieee; 2009:471–6.


Miwa M, Thomas J, O’Mara-Eves A, Ananiadou S: Reducing systematic review workload through certainty-based screening. J Biomed Inform 2014, 51: 242–53. doi:10.1016/j.jbi.2014.06.005

Sun Y, Yang Y, Zhang H, Zhang W, Wang Q: Towards evidence-based ontology for supporting Systematic Literature Review. In Proceedings of the EASE Conference 2012: 2012 2012 . Ciudad Real Spain: IET; 2012.

Tomassetti F, Rizzo G, Vetro A, Ardito L, Torchiano M, Morisio M: Linked data approach for selection process automation in systematic reviews. Evaluation & Assessment in Software Engineering (EASE 2011), 15th Annual Conference on: 2011 2011; Durham 2011, 31–5.

Wallace B, Small K, Brodley C, Lau J, Schmid C, Bertram L, et al .: Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining. Genet Med 2012, 14: 663–9. 10.1038/gim.2012.7

Wallace B, Small K, Brodley C, Lau J, Trikalinos T: Modeling Annotation Time to Reduce Workload in Comparative Effectiveness Reviews. Proc ACM International Health Informatics Symposium: 2010 2010 2010, 28–35.

Wallace B, Small K, Brodley C, Lau J, Trikalinos T: Deploying an interactive machine learning system in an evidence-based practice center: abstrackr. In Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium: 2012 . New York: ACM; 2012:819–24.

Yu W, Clyne M, Dolan S, Yesupriya A, Wulf A, Liu T, et al .: GAPscreener: an automatic tool for screening human genetic association literature in PubMed using the support vector machine technique. BMC Bioinformatics 2008., 205 (9) :

Choi S, Ryu B, Yoo S, Choi J: Combining relevancy and methodological quality into a single ranking for evidence-based medicine. Inf Sci 2012, 214: 76–90.

Fiszman M, Bray BE, Shina D, Kilicoglu H, Bennett GC, Bodenreider O, et al .: Combining relevance assignment with quality of the evidence to support guideline development. Stud Health Technol Inform 2010, 160 (1) : 709–13.


Kouznetsov A, Japkowicz N: Using classifier performance visualization to improve collective ranking techniques for biomedical abstracts classification. In Advances in Artificial Intelligence, Proceedings: 2010 . Berlin: Springer-Verlag Berlin; 2010:299–303.

Kouznetsov A, Matwin S, Inkpen D, Razavi A, Frunza O, Sehatkar M, et al .: Classifying biomedical abstracts using committees of classifiers and collective ranking techniques. In Advances in Artificial Intelligence, Proceedings: 2009 . Berlin: Springer-Verlag Berlin; 2009:224–8.

Martinez D, Karimi S, Cavedon L, Baldwin T: Facilitating biomedical systematic reviews using ranked text retrieval and classification. Proceedings of the 13th Australasian Document Computing Symposium: 2008; Hobart Australia 2008, 53.

Thomas J, O’Mara A: How can we find relevant research more quickly? In NCRM MethodsNews . UK: NCRM; 2011:3.

Wallace B, Small K, Brodley C, Trikalinos T: Who should label what? Instance allocation in multiple expert active learning. Proc SIAM International Conference on Data Mining: 2011 2011, 176–87.

Bekhuis T, Demner-Fushman D: Towards automating the initial screening phase of a systematic review. Stud Health Technol Inform 2010, 160 (1) : 146–50.


Bekhuis T, Tseytlin E, Mitchell K, Demner-Fushman D: Feature engineering and a proposed decision-support system for systematic reviewers of medical evidence. PLoS One 2014, 9 (1) : e86277. 10.1371/journal.pone.0086277

Frunza O, Inkpen D, Matwin S, Klement W, O’Blenis P: Exploiting the systematic review protocol for classification of medical abstracts. Artif Intell Med 2011, 51 (1) : 17–25. 10.1016/j.artmed.2010.10.005

García Adevaa J, Pikatza-Atxa J, Ubeda-Carrillo M, Ansuategi-Zengotitabengoa E: Automatic text classification to support systematic reviews in medicine. Expert Syst Appl 2014, 41 (4) : 1498–508. 10.1016/j.eswa.2013.08.047

Felizardo K, Maldonado J, Minghim R, MacDonell S, Mendes E: An extension of the systematic literature review process with visual text mining: a case study on software engineering. Unpublished. Downloadable from: http://www.teses.usp.br/teses/disponiveis/55/55134/tde-18072012–102032/publico/Thesis.pdf

Felizardo K, Salleh N, Martins R, Mendes E, MacDonell S, Maldonado J: Using visual text mining to support the study selection activity in systematic literature reviews. Empirical Software Engineering and Measurement (ESEM), 2011 International Symposium on: 2011; Banff 2011, 77–86.

Felizardo R, Souza S, Maldonado J: The use of visual text mining to support the study selection activity in systematic literature reviews: a replication study. Replication in Empirical Software Engineering Research (RESER), 2013 3rd International Workshop on: 2013; Baltimore 2013, 91–100.

Cohen A, Ambert K, McDonagh M: Cross-topic learning for work prioritization in systematic review creation and update. J Am Med Inform Assoc 2009, 16: 690–704. 10.1197/jamia.M3162

Brunton G, Caird J, Sutcliffe K, Rees R, Stokes G, Stansfield C, et al .: Depression, Anxiety, Pain and Quality of Life in People Living with Chronic Hepatitis C: A Systematic Review and Meta-Analysis . London: EPPI Centre, Social Science Research Unit, Institute of Education, University of London; 2014.

Cohen A: Optimizing feature representation for automated systematic review work prioritization. AMIA Annual Symposium Proceedings: 2008 2008, 121–5.

Cohen A, Ambert K, McDonagh M: Studying the potential impact of automated document classification on scheduling a systematic review update. BMC Med Inform Decis Mak 2012, 12 (1) : 33. 10.1186/1472-6947-12-33

Dalal S, Shekelle P, Hempel S, Newberry S, Motala A, Shetty K: A pilot study using machine learning and domain knowledge to facilitate comparative effectiveness review updating. Med Decis Making 2013, 33 (3) : 343–55. 10.1177/0272989X12457243

Small K, Wallace B, Brodley C, Trikalinos T: The constrained weight space SVM: learning with ranked features. In Proceedings of the 28th International Conference on Machine Learning . Bellevue, WA, USA: ICML; 2011.

Sampson M, Tetzlaff J, Urquhart C: Precision of healthcare systematic review searches in a cross-sectional sample. Res Synth Methods 2011, 2: 119–25. 10.1002/jrsm.42

Sasaki Y: Automatic text classification. University of Manchester: presentation 2008.

Tomek I: Two modifications of CNN. IEEE Trans Syst Man Cybern 1976, SMC-6 (11) : 769–72.

Brinker K: Incorporating diversity in active learning with support vector machines. In Proceedings of the 20th International Conference on Machine Learning: 2003 . Palo Alto: AAAI Press; 2003:59–66.

Gama J, Žliobaitė A, Bifet A, Pechenizkiy M, Bouchachia A: A survey on concept drift adaptation. ACM Comput Surv (CSUR) 2014, 46 (4) : 44.

Pan SJ, Yang Q: A survey on transfer learning. IEEE Trans Knowl Data Eng 2010, 22 (10): 1345–59.

Davis J, Goadrich M: The relationship between Precision-Recall and ROC curves. In ICML '06 Proceedings of the 23rd international conference on Machine learning 2006 . New York, NY, USA: ACM; 2006.

García V, Mollineda R, Sánchez J: A bias correction function for classification performance assessment in two-class imbalanced problems. Knowl Based Syst 2014, 59: 66–74.

Tsafnat G, Dunn A, Glasziou P, Coiera E: The automation of systematic reviews. BMJ 2013., 346 (f139) :

Settles B: Active Learning Literature Survey. Computer Sciences Technical Report 1648 . Wisconsin: University of Wisconsin–Madison; 2009.

Sarveniazi A: An actual survey of dimensionality reduction. Am J Comput Math 2014, 4: 55–72. 10.4236/ajcm.2014.42006

Elkan C: The foundations of cost-sensitive learning. In International Joint Conference on Artificial Intelligence: 2001 . Seattle, Washington: Morgan Kaufmann Publishers Inc; 2001.

Cao P, Zhao D, Zaiane O: An optimized cost-sensitive SVM for imbalanced data learning. In Advances in Knowledge Discovery and Data Mining: 2013 . Berlin Heidelberg: Springer; 2013:280–92.

Margineantu D: Active cost-sensitive learning. In Proceedings of the 19th International Joint Conference on Artificial Intelligence: 2005 . Burlington: Morgan Kaufmann Publishers Inc; 2005.

Blake C: Beyond genes, proteins, and abstracts: identifying scientific claims from full-text biomedical articles. J Biomed Inform 2010, 43: 173–89. 10.1016/j.jbi.2009.11.001

Cohen K, Johnson H, Verspoor K, Roeder C, Hunter L: The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinformatics 2010., 11 (492) :

Truyens M, Van Eecke P: Legal aspects of text mining. Comput Law Secur Rev 2014, 302 (2) : 153–70.

Reichman J, Okediji R: When copyright law and science collide: empowering digitally integrated research methods on a global scale. Minn Law Rev 2012, 96 (4) : 1362–480.

Tsafnat G, Glasziou P, Choong M, Dunn A, Galgani F, Coiera E: Systematic review automation technologies. Syst Rev 2014, 3 (1) : 74. 10.1186/2046-4053-3-74

Kiritchenko S, de Bruijn B, Carini S, Martin J, Sim I: ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Med Inform Decis Mak 2010, 10 (1) : 56. 10.1186/1472-6947-10-56

Marshall I, Kuiper J, Wallace B: Automating risk of bias assessment for clinical trials. In BCB '14 Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics . New York, NY, USA: ACM; 2014:88–95.

Summerscales R: Automatic Summarization of Clinical Abstracts for Evidence-Based Medicine . Chicago, Illinois: Graduate College of the Illinois Institute of Technology; 2013.


Acknowledgements

This work was supported by grants awarded by the UK Medical Research Council: Identifying relevant studies for systematic reviews and health technology assessments using text mining [Grant No. MR/J005037/1, also MR/L01078X/1: Supporting Evidence-based Public Health Interventions using Text Mining ].

Author information

Authors and affiliations

Evidence for Policy and Practice Information and Coordinating (EPPI)-Centre, Social Science Research Unit, UCL Institute of Education, University of London, London, UK

Alison O’Mara-Eves & James Thomas

The National Centre for Text Mining and School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK

John McNaught & Sophia Ananiadou

Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, 468-8511, Japan

Makoto Miwa


Corresponding author

Correspondence to James Thomas.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JT and AOM conceived the study and wrote the protocol. AOM conducted the principal data extraction and took the lead on writing this paper. JT undertook second data extraction and journal submission, assisted with writing the paper, and made amendments in the light of peer reviewers’ comments. JM, MM and SA contributed domain expertise and wrote sections of the paper. The authors authored three of the studies included in the review [ 31 , 45 , 57 ]. All authors read and approved the final manuscript.

An erratum to this article is available at http://dx.doi.org/10.1186/s13643-015-0031-5 .

Electronic supplementary material


Additional file 1: Appendix A. Search strategy. Appendix B. Data extraction tool. Appendix C. List of studies included in the review ( n = 44). Appendix D. Characteristics of included studies. (DOC 151 KB)

Additional file 2: Flow diagram. (DOC 58 KB)

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/ .

The Creative Commons Public Domain Dedication waiver ( https://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

O’Mara-Eves, A., Thomas, J., McNaught, J. et al. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev 4 , 5 (2015). https://doi.org/10.1186/2046-4053-4-5


Received : 07 September 2014

Accepted : 10 December 2014

Published : 14 January 2015

DOI : https://doi.org/10.1186/2046-4053-4-5


Keywords: Text mining; Study selection; Review efficiency



Recent trends of green human resource management: Text mining and network analysis

  • Research Article
  • Published: 05 July 2022
  • Volume 29, pages 84916–84935 (2022)


  • Chetan Sharma (ORCID: orcid.org/0000-0001-5401-8503),
  • Sumit Sakhuja &
  • Shivinder Nijjer


Issues of the environmental crisis are being addressed by researchers, governments, and organizations alike. Green human resource management (GHRM) is one field receiving substantial research focus, since it is targeted at greening firms and making them eco-friendly. This research reviews 317 articles on GHRM published in the Scopus database from 2008 to 2021. The study applies text mining, latent semantic analysis (LSA), and network analysis to explore trends in GHRM research and to establish the relationship between the quantitative and qualitative GHRM literature. The study has been carried out using the KNIME and VOSviewer tools. As a result, the research identifies five recent research trends in GHRM using K-means clustering. Future researchers can build on these identified trends to address environmental issues, make the environment eco-friendly, and motivate firms to implement GHRM in their practices.
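For readers unfamiliar with the pipeline summarised above, the following is a minimal Python/scikit-learn analogue of a TF-IDF, LSA and K-means workflow; the authors used KNIME and VOSviewer, so every parameter here (term weighting, component count, a cluster count of five) is purely illustrative.

```python
# Illustrative LSA + K-means trend detection over article abstracts
# (a scikit-learn analogue of the KNIME-based workflow described above).
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_abstracts(abstracts, n_trends=5, n_components=100):
    tfidf = TfidfVectorizer(stop_words="english", min_df=2)
    X = tfidf.fit_transform(abstracts)
    lsa = TruncatedSVD(n_components=n_components, random_state=0)
    X_lsa = lsa.fit_transform(X)                                  # latent semantic analysis
    km = KMeans(n_clusters=n_trends, n_init=10, random_state=0).fit(X_lsa)

    # Label each trend by the highest-weighted terms of its centroid in term space.
    terms = tfidf.get_feature_names_out()
    centroids = lsa.inverse_transform(km.cluster_centers_)
    top_terms = [
        [terms[i] for i in centroid.argsort()[::-1][:10]] for centroid in centroids
    ]
    return top_terms, km.labels_
```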


Introduction

In the current era, the entire globe faces unprecedented environmental issues (Rajabpour et al. 2022). This era marks the epoch of globalization, digitalization, and technology, which have penetrated human lives to the extent that technology is required to perform even mundane routine tasks. In the past decade, environmental degradation and climate change have posed significant global threats such as natural disasters (droughts, heat waves, wildfires), resulting in economic losses (Shafaei et al. 2020a). In addition, industrialization contributes to increasing global warming throughout the world; human actions, automation, and many other factors are responsible (UNEP 2020). In 2019, the Covid-19 virus shook the entire world, and in 2020 it was declared a global pandemic. The pandemic has damaged, and continues to damage, national economies. However, restrictions imposed by governments on human movement across nations to contain the spread of the virus brought about a slight environmental rejuvenation: according to the UNEP (United Nations Environment Programme), the pandemic slowed carbon dioxide emissions by 7% compared to previous years. Despite this, the world is still experiencing a 3% rise in temperature due to global warming (UNEP 2020). Various environmentalists, researchers, governments, and organizations have come forward to put mitigation efforts into action, and agencies like the UNCC (United Nations Climate Change) have drafted guidelines on environmental issues that are presented on international platforms (UNCC 1997; UNFCC 2007; Agreement 2016).

This is an important topic because ecological problems are ultimately about mindfulness and awareness, and the critical asset that can be used to address most of the issues mentioned above is people. Work on green human resource management has a rich background, and the management of human resources is central to any organization's strategy for fostering growth among its workers and entities (Vuong and Sid 2020). HRM serves as the organization's formal system for managing human resources and ensures that its vision and mission are achieved. Recruitment, hiring, onboarding, on-the-job orientation, training and retraining, performance evaluation, and employee discipline are all aspects of human resource management (Li et al. 2020). Until the mid-1990s, the HR department was not addressing strategic issues for practitioners and academics; in the twenty-first century, it has become an essential ally in expanding any business or organization, as increased global competition forced organizations to make policy and system changes. Since the dawn of globalization in the early nineteenth century, people everywhere have also been concerned about its environmental consequences.

In response to environmental concerns, governments, businesses, and public and private sector organizations have committed to implementing green human resource management (GHRM), and HRM has accordingly shifted its approach toward GHRM (Rondinelli and Berry 2000; Victor 2011). Various international bodies dealing with environmental issues have encouraged firms to move their culture from HRM to GHRM (Wehrmeyer 2017). Wehrmeyer edited the book "Greening people: human resources and environmental management," where the first mention of "green" in HRM occurred, and the concept of GHRM entered the research literature after the 1990s.

Human resource management began its journey a long time ago as a means of improving organizations (Sharma et al. 2022). The human resources function was born when Robert Owen and Charles Babbage put forward a simple idea during the Industrial Revolution in eighteenth-century Europe. In the early 1900s, the HR department was referred to as the personnel department. HR departments play an essential role throughout the employee lifecycle, from recruitment to retention (Dulebohn et al. 1995), and personnel departments received official recognition in 1921 from the National Institute of Industrial Psychology (NIIP). Using HRM policies, an organization can boost its employees' productivity and commitment by influencing their motivation, ability, and availability for work-related responsibilities (Sharma et al. 2022). HRM thus has a long history of strengthening every sector that contributes to the economy.

A company's HR department strongly influences how well its employees perform: when employees in any industry are selected, trained, inducted, monitored, rewarded, and promoted efficiently, the company can produce goods that help realize its vision. HRM encourages employees to perform at their peak to get the most value from their time with the company, and human talent can be utilized efficiently and effectively through it. According to Ehrlich (1997), a company's growth and value are evaluated by how well its HR system works and how well it treats its employees. As noted above, increased global competition has pushed organizations to change their policies and systems, and concern about the environmental consequences of globalization has grown in parallel; governments, businesses, and public and private sectors have therefore committed to implementing GHRM in response to environmental concerns (Rondinelli and Berry 2000; Victor 2011).

This section outlines the existing approaches available in the literature on green human resource management. Firms all over the globe are also playing a significant role in protecting the environment by adopting policies and practices that ensure sustainability. The human resource management (HRM) department is responsible for designing, implementing, and maintaining this sustainability culture (Collings et al. 2018). In addition, research shows that the HRM department is associated with recruiting people, making strategies, and providing facilities to employees for the organization's benefit (Heneman et al. 2000).

Renwick and colleagues described GHRM as the integration of human resource management and corporate environmental management (Renwick et al. 2008). Green HRM is also considered a combination of green policies and human resource policies (Jamal et al. 2021): HRM encompasses recruitment, performance, benefits, rewards, training and development, and other employee-related tasks, and for each of these processes, green policies that incorporate sustainability provisions and address environmental issues are provided by governments and international bodies (Jabbour et al. 2010; Daily and Huang 2001; Jackson et al. 2011; Sarkis et al. 2010). Jackson et al. (2011) revised the definition of GHRM, stating that it is about greening HRM practices in their functional and competitive dimensions. Developing, implementing, and maintaining systems that make employees and organizations green in order to achieve environmental goals contributes to ecological sustainability (Yong et al. 2020a). Numerous initiatives have emerged by combining "green" with other fields, such as green marketing (Grant 2008), green finance (Bebbington 2001), and green retailing (Lai et al. 2010), and the integration of green thinking into HRM continues to evolve.

Many literature reviews have been conducted in GHRM (Paulet et al. 2021; Pham et al. 2020; Shahriari et al. 2019). According to Renwick et al. (2013), organizations' limited understanding of how to develop green abilities and give employees opportunities to participate in environmental management efforts constrains how GHRM practices influence employee motivation to take part in environmental activities, and environmental management (EM) improvement efforts may be hindered because organizations are not using the full range of GHRM practices. Jackson et al. (2011) discuss different functions of HRM practices and stimulate the HRM field to expand its role in environmental sustainability, describing the multiple opportunities for integrating strategic HRM and environmental management. Jabbour and de Sousa Jabbour (2016) link GHRM and green supply chain management as an important subject area of HRM; their study proposed an integrated framework for GHRM and GSCM together with a research agenda for this integration, and highlighted the implications of GHRM-GSCM integration for scholars, managers, and practitioners in organizational sustainability and sustainable supply chains. Ren et al. (2018) provided the rationale for introducing the concept of GHRM for effective environmental management within organizations; they presented GHRM's theoretical foundations, empirical development, measures, and the factors that give rise to GHRM practices, advocated the need for understanding and quantification, and constructed a model incorporating the influencing factors. Yong et al. (2020b) provided a systematic literature review on GHRM to identify its focus areas, approaches, and scope, studying five focus areas to determine performance outcomes at the organizational and individual levels.

Most of these previous works have used a subjective approach to conducting a literature review, which may be limited by the researchers' bias. Furthermore, several studies and surveys describe how the database was collected manually, and no mathematical or machine learning techniques were implemented to automatically interpret the corpus and derive the key findings. Manual reviews are subjective, limit the raw number of studies considered, and in some cases suffer from bias owing to the variable experience and skill sets of reviewers (Yong 2020). Evidence from the last two decades highlights a further drawback of manual review: it does not support systematic inter- and intra-document comparison of critical terms, methods, and findings, so the identified research trends and future directions depend on the authors' viewpoint, skill set, and experience.

To overcome these limitations, this study conducts a systematic literature review (SLR) on the Scopus database of GHRM. The primary aim of the research work is to implement a semi-automated literature review that is less biased in interpreting recent trends in the field. The semi-automated, quantitative analysis of the dataset collected from Scopus has been performed using latent semantic analysis (LSA), a natural language processing (NLP) technique (Hoblos 2020). The primary differences between a systematic manual review and a semi-automatic review are shown in Fig. 1. To develop the semi-automatic analysis routine, the authors use the KNIME tool, which is available as open-source software to all researchers (Alam and Yao 2019). The analysis carried out in this work will help the research community find the core areas and recent trends of GHRM. This technique has already found diverse applications in other domains to reveal their current directions and core areas (Fortuna et al. 2006; Kulkarni et al. 2014; Kundu et al. 2015; Rani et al. 2017; Yalcinkaya and Singh 2015). However, to the best of the researchers' knowledge, there is a lack of empirical work that articulates recent trends in GHRM.

Figure 1. Comparison of systematic manual vs semi-automatic review

Thus, the uniqueness of this study lies in the approach followed in carrying out the systematic literature review. Previous studies focused on thematic analysis of the literature, providing knowledge about GHRM and the practices, models, and policies adopted by organizations. Instead, our focus is on extracting information from unstructured data using data mining techniques. As a result, the analysis yields the essential terms from the collected corpus together with a relevance score for each term. Moreover, it gives the semantic similarity between keywords, which helps in identifying recent trends for future researchers. Hence, this study will help researchers find keywords, key topics, and evolving research areas in GHRM using natural language processing.

A systematic literature review follows the PRISMA guidelines (2009), which specify the items that need to be reported when performing systematic reviews and meta-analyses. These items include reporting the identification, screening, eligibility, and inclusion of relevant studies for the quantitative analysis (Aguilar-Hernandez et al. 2021; Det Udomsap and Hallinger 2020). A revision of these guidelines in 2020 mandated the inclusion of checklists, explanation, elaboration, and flow diagrams when performing an SLR (Page et al. 2021). Therefore, this study follows the revised PRISMA guidelines (2020) to collect and analyze the qualitative publications on GHRM.

Many strategies have been proposed in the literature for predicting research trends, including trends in green human resource management practices. LSA (latent semantic analysis), one of the methods of NLP, helps in identifying and understanding the composition of documents through term frequency (TF) and inverse document frequency (IDF) scores (Aizawa 2003). It is considered one of the best methods to extract and infer meaningful relations between words stored as a bag of words (BOW) (Yalcinkaya and Singh 2015). LSA is an objective method for analyzing text data to answer the research questions formulated by researchers (Evangelopoulos et al. 2012), and it is a proven mathematical model that approximates how the human brain interprets words and draws semantics from them (Ding 2005).
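As an illustration of how LSA operates on such a corpus, the sketch below builds a TF-IDF matrix and decomposes it with truncated SVD using scikit-learn; the toy abstracts, the number of topics, and the variable names are assumptions made for the example and do not reproduce the KNIME workflow used in this study.

```python
# Minimal LSA sketch with scikit-learn (the study itself uses KNIME); toy abstracts stand in for the corpus
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

abstracts = [
    "green human resource management and environmental sustainability in firms",
    "green training improves employee environmental performance",
    "green recruitment, green rewards and organizational sustainability",
    "green supply chain management and green hrm practices",
    "employee green behavior and environmental commitment in hotels",
]

# TF-IDF weighted document-term matrix (bag of words with English stop words removed)
vectorizer = TfidfVectorizer(stop_words="english", lowercase=True)
X = vectorizer.fit_transform(abstracts)            # shape: (n_documents, n_terms)

# LSA = truncated SVD of the TF-IDF matrix; the paper extracts five topics from its 317 documents
lsa = TruncatedSVD(n_components=2, random_state=42)
doc_topic = lsa.fit_transform(X)                   # document loadings on each latent topic

# High-loading terms per latent topic, analogous to the terms reported per cluster in the paper
terms = vectorizer.get_feature_names_out()
for k, component in enumerate(lsa.components_):
    top_terms = [terms[i] for i in component.argsort()[::-1][:6]]
    print(f"Latent topic {k}: {top_terms}")
```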

The primary objective of this work is to uncover and predict recent trends and core areas of GHRM. Network analysis related to the research questions is done using the VOSviewer tool and presented in the results section later in this article. The analysis is structured around the following broad research questions (RQs):

RQ1: Who are the leading researchers and top publishers in GHRM?

RQ2: What is the relationship between quantitative and qualitative literature of GHRM?

RQ3: What are the recent trends and future direction in GHRM?

The structure of the study is shown graphically in Fig.  2 .

Figure 2. Paper structure

Methodology

The proposed methodology of this study has been graphically depicted in Fig.  3 .

Figure 3. Proposed methodology

Selection of string

The first step in a systematic literature review is the formulation of a search string, developed by following the guidelines of Kitchenham and Charters (2007). To develop the string, key terms were taken from the topic and from the keywords of relevant articles and combined using the Boolean operators AND and OR. Applying AND between keywords restricts the search results to articles that contain all of the key keywords, whereas applying OR includes all articles incorporating one or more of the selected keywords. The primary keywords for this study are "green" and "human resource management." The final search string, after applying various combinations of keywords and Boolean operations, is "green human resource" OR "green human resource management" OR "ghrm" OR "green hrm."
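To make the Boolean logic of the final string concrete, the short sketch below checks whether a record matches any of the OR-combined keyword phrases; the record fields and the function name are illustrative assumptions and do not replicate the Scopus query engine itself.

```python
# Illustrative filter mirroring the OR-combined search string (not the actual Scopus query engine)
SEARCH_TERMS = ("green human resource", "green human resource management", "ghrm", "green hrm")

def matches_search_string(title: str, keywords: str) -> bool:
    """True if any of the OR-combined phrases appears in the title or author keywords."""
    text = f"{title} {keywords}".lower()
    return any(term in text for term in SEARCH_TERMS)

# Hypothetical record for demonstration
print(matches_search_string("Green HRM practices and employee behaviour",
                            "green human resource management; sustainability"))   # -> True
```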

Data source

Although various digital libraries can be used to retrieve articles related to Green HRM, Scopus was chosen for this research. Prior comparative studies of Scopus, Web of Science, and other scholarly databases, such as Singh et al. (2021), Zhu and Liu (2020), and Harzing and Alakangas (2016), indicate that Scopus has broader coverage of articles and is a comprehensive and reliable data source (Tseng et al. 2019), justifying its suitability for this review. The finalized search string was therefore applied to the Scopus database. This returned 318 results, of which 317 studies were retained for analysis. A sample of the obtained corpus is depicted in Fig. 4; each record contains the paper's title, authors, year, publication source, keywords, and abstract for effective LSA analysis.

Figure 4. Corpus sample

Since the analysis is based on text mining, the KNIME tool has been used to conduct this study as it is an open-source tool with text processing features (Fillbrunn et al. 2017 ). The tool is easy to use and allows sharing of workflows among the authors (Dietz and Berthold 2016 ). Network analysis is performed using the VOSviewer tool (open-source software).

KNIME workflows used in this study

The workflow developed in KNIME for conducting a meta-analysis is shown in Fig.  5 .

Figure 5. Meta-analysis KNIME workflow

To apply text mining, the text first needs to be pre-processed in KNIME. The pre-processing steps include POS (part of speech) tagging, removing numbers from the documents, removing (English) stop words, and stemming the words in each paper (Tseng et al. 2019). POS tagging operates on the tokens extracted from each document, and these tokens form the BOW (bag of words), a dictionary of words used to conduct the LSA. During normalization, all documents are converted to lower or upper case using the case converter node in KNIME, preparing each paper within the corpus for the text mining application; this case conversion is essential to neutralize any bias due to case sensitivity.

Furthermore, stop word filtering removes all words in the text that carry no information, such as punctuation marks, numbers, and language stop words (is, am, are, of) (Evangelopoulos et al. 2012). Finally, stemming is used to reduce redundancy in the data, which improves the efficiency of the text mining process using LSA (Feldman and Sanger 2006). The KNIME workflow used for document pre-processing is shown in Fig. 6.

Figure 6. KNIME workflow to conduct experiment

A sample of the pre-processing steps and their output at each stage, applied to the two example documents below, is shown in Table 1.

Document 1: Green human resource management deals with environmental sustainability and protecting the environment. This is under the green moment.

Document 2: Various organizations adopt green practices in their daily work culture. Green policies and procedures have been involved in the system of organizations for better sustainability.
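As a code-based counterpart to the KNIME pre-processing nodes, the sketch below applies the same chain (case conversion, tokenization and POS tagging, number/punctuation and stop-word removal, stemming) to the two sample documents using NLTK; it is a minimal illustration under the assumption that the listed NLTK resources can be downloaded, not the workflow actually used in the study.

```python
# NLTK sketch of the pre-processing chain described above (the study uses KNIME nodes instead)
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# One-time resource downloads; the names cover both older and newer NLTK releases
for resource in ("punkt", "punkt_tab", "stopwords",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(resource, quiet=True)

documents = [
    "Green human resource management deals with environmental sustainability and protecting "
    "the environment. This is under the green moment.",
    "Various organizations adopt green practices in their daily work culture. Green policies and "
    "procedures have been involved in the system of organizations for better sustainability.",
]

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

for doc in documents:
    tokens = word_tokenize(doc.lower())                       # case conversion + tokenization
    tagged = nltk.pos_tag(tokens)                             # POS tagging of each token
    cleaned = [word for word, _tag in tagged
               if word.isalpha() and word not in stop_words]  # drop numbers, punctuation, stop words
    stemmed = [stemmer.stem(word) for word in cleaned]        # stemming to reduce redundancy
    print(stemmed)                                            # bag-of-words tokens for this document
```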

Result analysis

The following section presents the meta-analysis and network analysis results to answer the research questions laid at the beginning of the article.

Meta-analysis

Meta-analysis primarily focuses on descriptive statistics of the corpus of articles. Since the corpus contains data on year, authors, countries, subject areas, and so on, different metadata summaries can be generated per article, year, subject area, author, etc.
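Comparable metadata summaries can be produced with a few lines of pandas on a Scopus CSV export, as sketched below; the file name is hypothetical, and the column names ("Year", "Source title", "Authors") follow the usual Scopus export format but should be verified against the actual file.

```python
# Sketch of the meta-analysis step on a Scopus CSV export (file name and column names are assumptions)
import pandas as pd

corpus = pd.read_csv("scopus_ghrm_export.csv")               # hypothetical export of the 317 articles

print(corpus["Year"].value_counts().sort_index())            # publications per year (cf. Fig. 7)
print(corpus["Source title"].value_counts().head(10))        # top 10 journals (cf. Fig. 8)

# Most prolific authors (cf. Table 2); the author separator may be ';' or ',' depending on the export
authors = corpus["Authors"].str.split(";").explode().str.strip()
print(authors.value_counts().head(10))
```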

Top 10 leading authors

Table 2 shows the top 10 leading researchers in GHRM with their number of documents and citations found in the selected corpus.

This implies that Jabbour has the maximum number of publications on Green HRM and is also the most cited author. All the remaining leading authors have a publication count of fewer than 10 in GHRM, while Jabbour's is 27.

Publications by year

Figure 7 shows that articles in the field of GHRM began to emerge in 2008, and the number of papers contributed per year has risen steadily since. The maximum number of publications in a single year was 102, in 2020. This implies that GHRM is a very active area of research at present: researchers are increasingly interested in incorporating green practices into HRM to achieve sustainability, manage environmental issues, and so on.

Figure 7. Year-wise publication analysis

Publications by journals

Figure 8 shows the top 10 journals by publication count. The highest number of publications is in the Journal of Cleaner Production (28 articles), representing approximately 10% of the total papers used for analysis. This is followed by Sustainability, which also has a substantial publication count, and then by the International Journal of Human Resource Management and the International Journal of Manpower.

Figure 8. Top 10 dominating journals in GHRM

Term frequency (TF)-inverse document frequency (IDF)

A document-term matrix is developed from the corpus for ease of text analysis; it records the absence/presence or frequency of each term in a particular document. Using this matrix, popular or recurring terms can be retrieved. For this purpose, a metric called the term frequency-inverse document frequency (TF-IDF) score is generated. This study generated TF-IDF scores from the corpus to identify the most heavily weighted articles relative to the others and to extract the terms with high TF-IDF scores. TF-IDF is a technique used under the umbrella of NLP for classification and text summarization (Jones 1972; Ramos 2003; Wu et al. 2008). It derives the importance of the terms extracted from the selected corpus: TF-IDF generates weights for the words extracted from the corpus into the BOW, depending on each term's occurrence in a particular document (Yalcinkaya and Singh 2015).

TF refers to term frequency and measures how often a specific word occurs in a document. It is estimated as the number of times the term appears in the document divided by the total number of terms in that document (Artama et al. 2020; Kim et al. 2020). IDF, the inverse document frequency, measures how rare a term is across the whole corpus. The mathematical formulas for TF-IDF are shown in the equations below.
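In their standard form, consistent with the description above, the two equations can be written as

$$\mathrm{TF}(t,d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}} \qquad (1)$$

$$\mathrm{IDF}(t) = \log\frac{N}{\left|\{\, d \in D : t \in d \,\}\right|} \qquad (2)$$

where $f_{t,d}$ is the number of occurrences of term $t$ in document $d$, $N$ is the total number of documents in the corpus $D$, and the TF-IDF weight of term $t$ in document $d$ is the product $\mathrm{TF}(t,d) \times \mathrm{IDF}(t)$.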

TF-IDF values for the corpus used in this study are shown in Table 3; they were computed automatically by the KNIME workflow using Eqs. 1 and 2. In this study, the top 50 terms encountered most frequently in the BOW were taken from the corpus.

The corpus contains 317 articles extracted from Scopus, and BOW had 3147 unique tokens to make a dictionary of words. Out of these 3147 tokens, the top 20 frequent words have been represented graphically in Fig.  9 .

Figure 9. Top 20 frequent terms

The full document-term matrix is of size 3147 terms × 317 documents. Because many of the terms extracted from a corpus are likely to be noisy or redundant, and to keep the presentation manageable, only the top 50 terms × 317 documents are described. Using clustering, the authors identified five prominent latent topics from the corpus.

The terms in the matrix are the unique words from the corpus, and during the transformation step, weights are assigned to the terms according to their importance in each particular document. The TF-IDF scores obtained for the corpus can then be used to cluster the documents.

Recent trends identification using K-means clustering

To derive insights from the articles extracted from the Scopus database on GHRM, the papers can be clustered and topics then assigned based on the terms that load onto each cluster. Since the BOW provides K terms per document as feature vectors, K-means clustering is applied to these data to derive a topic solution. Selecting the appropriate number of topics is challenging, but Deerwester et al. (1990) suggest that five is a suitable number of topics for a corpus of 317 documents. The five topics extracted using KNIME, with their high-loading terms, are presented in Table 4. These five topics or clusters can now be labelled based on their TF-IDF scores, and the labels are treated as the recent trends in GHRM on which further research in this field can be focused.
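The sketch below illustrates this clustering step with scikit-learn rather than KNIME: documents represented by TF-IDF vectors are grouped with K-means, and the terms closest to each centroid are printed as candidate cluster labels. The toy abstracts and the parameter values are assumptions for illustration; the study itself clusters the full 317-document corpus into five topics.

```python
# K-means clustering of TF-IDF document vectors (scikit-learn sketch; the study uses KNIME)
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "green recruitment and green training improve environmental performance",
    "green hrm practices and employee green behavior in hotels",
    "green supply chain management and green training in manufacturing firms",
    "corporate social responsibility, green human capital and firm performance",
    "eco-innovation culture and organizational sustainability outcomes",
    "green performance management and employee environmental commitment",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)

n_clusters = 2                                   # the paper uses five clusters for its full corpus
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
labels = km.fit_predict(X)

# Terms closest to each centroid serve as candidate labels (cf. Tables 4 and 5)
terms = vectorizer.get_feature_names_out()
order = km.cluster_centers_.argsort()[:, ::-1]
for c in range(n_clusters):
    print(f"Cluster {c}: {[terms[i] for i in order[c, :8]]}")
```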

Labeling the clusters

The clusters extracted using K-means clustering on the TF-IDF scores can be further refined based on their highest-scoring terms. After analyzing the terms in each cluster, appropriate labels are assigned. The labelled clusters are then grouped according to their weight and interpreted as the recent trends for future research, or as areas of research needing more attention. Similarly, the highest-loading articles per cluster and their corresponding scores can also be generated (Table 5).

Network analysis

The co-authorship analysis depicts the links between authors and their co-authors. Such analysis is instrumental in identifying which authors act as lead researchers and how they network to spread their research, which can be highly useful in the strategic planning of research and development capacity-building programs (Morel et al. 2009). The co-authorship network for this work, generated using the VOSviewer tool, is shown in Fig. 10. Each color represents one network of authors. Since Jabbour is at the center of the network and shown in bold, he is the main contributor with the most extensive network; node sizes likewise depict the density of the networks (Fig. 10).

Figure 10. Researcher network
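Although the network in Fig. 10 is produced with VOSviewer, an equivalent co-authorship graph can be sketched programmatically; the networkx example below assumes author lists parsed from the Scopus export (the three shown are illustrative) and uses node degree as a simple proxy for how central an author is.

```python
# Minimal co-authorship network sketch with networkx (the study itself uses VOSviewer)
import itertools
import networkx as nx

# Illustrative author lists per paper, as would be parsed from the Scopus 'Authors' field
papers = [
    ["Jabbour C.J.C.", "Teixeira A.A.", "Latan H."],
    ["Jabbour C.J.C.", "Renwick D.W.S."],
    ["Renwick D.W.S.", "Redman T.", "Maguire S."],
]

G = nx.Graph()
for authors in papers:
    for a, b in itertools.combinations(authors, 2):   # every co-author pair shares an edge
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1                    # repeated collaborations strengthen the link
        else:
            G.add_edge(a, b, weight=1)

# Degree as a simple indicator of the most connected (lead) researchers, cf. Fig. 10
for author, degree in sorted(G.degree, key=lambda item: item[1], reverse=True):
    print(author, degree)
```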

Figure 11. Ten leading countries with article count

The countries contributing to GHRM are shown in Fig. 12, and Fig. 11 shows the document counts of the ten leading countries, from which it can be seen that China leads in GHRM publications. The most highly cited research in the area of GHRM is also published from this country (Dumont et al. 2017; Kim et al. 2019; Jabbour 2013; 2015; Jackson et al. 2011; Shafaei et al. 2020b; Renwick et al. 2013; Teixeira et al. 2016). Such cross-country networking can enable the development of better policies and learning from the best practices of other cultures. For example, authors from China and Canada collaborate on GHRM research, and researchers from each country can leverage the best practices of the other.

Figure 12. Countries contributing to GHRM

Network analysis can also be applied to keywords to reveal the directions in which research on a particular topic is heading; this can also provide a starting point for new studies. The network analysis of keywords is shown in Fig. 13, and the word cloud generated for the essential words in the corpus, based on their frequency, is shown in Fig. 14.

Figure 13. Network analysis of keywords

Figure 14. Word cloud based on frequency of words in BOW
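A frequency-based word cloud like the one in Fig. 14 can be generated from the BOW counts with the wordcloud package, as sketched below; the token frequencies shown are illustrative assumptions rather than the actual counts from the 3147-token dictionary.

```python
# Word cloud from bag-of-words frequencies (assumes the `wordcloud` package is installed)
from collections import Counter
from wordcloud import WordCloud

# Illustrative token frequencies; in the study these come from the full BOW of the corpus
frequencies = Counter({
    "green": 120, "hrm": 95, "sustainability": 60, "environmental": 55,
    "performance": 40, "employee": 38, "practices": 35, "training": 20,
})

cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate_from_frequencies(frequencies)
cloud.to_file("ghrm_word_cloud.png")   # writes an image similar in spirit to Fig. 14
```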

Recent trends and future directions in the field of GHRM

This work attempted to identify the existing trends in GHRM research. Findings from the keyword network analysis (Fig. 13) reveal that the existing literature on Green HRM has drawn on specific theories: Ability-Motivation-Opportunity (AMO) theory, grounded theory, and the resource-based view (RBV). Besides this, it is worth mentioning that works whose keywords focus on investigating GHRM in an employee context also occur frequently. The most frequent keywords relate to employee green behavior, employee ambidexterity, the study of generational differences, environmental commitment, green human capital, artificial intelligence, and organizational barriers in the context of GHRM. The study also shows that India still has scope to produce more work in this field, and the keywords mentioned above can be a good starting point. Additionally, the keywords depict the areas of application of GHRM researched in the literature, viz. hospitals, hotels, faculty, and airlines.

From the topics and clusters extracted using K-means clustering applied to the TF-IDF scores, it is seen that in GHRM more attention needs to be paid to environmental and organizational sustainability. The labelled trends in GHRM are shown in Fig. 15.

Figure 15. Recent trends

The first theme that emerged from the analysis of recent trends in Green HRM research is 'Practices for Organization and Environment Sustainability'. Previous works have demonstrated that management practices can promote green behaviour among employees and thereby promote sustainability (Rubel et al. 2021). When a firm incorporates green practices in its policies and procedures, individual employees exhibit green behaviour, which automatically enables organizational sustainability. For example, looking for individuals with green values promotes the accomplishment of organizational green goals (Saifulina et al. 2020). Some practices followed by organizations to transform HRM into GHRM, reported by Zia et al. (2020), Mehrajunnisa et al. (2021), and Suharti and Sugiarto (2020), are listed below:

  • Online portals for attendance, leave applications, salary slips, and other services
  • Using green printing practices
  • Green manufacturing
  • Green dumping of employees' identity cards
  • Sharing a task or job between two employees
  • Use of video and teleconferencing to conduct interviews or meetings
  • Using recycled materials like paper, water, and waste material
  • Use of teleworking (working from home, using the internet)
  • Providing online training and development services
  • Use of company or public transport rather than private vehicles
  • Using green payroll services
  • Use of carpooling services with the employees
  • Electronic records of organization data
  • Using paperless strategies
  • Awareness campaigns for energy saving

Organizations must follow the practices mentioned above to get involved in the Green Revolution. Furthermore, practices and policies should be addressed globally on a common platform so that a shared set of guidelines is developed that organizations and countries can follow (Soomro et al. 2021).

Previous works also support the notion that, to promote sustainability and make firms environment friendly, the importance of Green HRM practices needs to be acknowledged. Rajiani et al. (2016) view GHRM as an innovation, suggesting that to shift firms toward becoming pro-environmental, firms need to bring about process innovations in GHRM practices. Through the adoption and implementation of GHRM practices, firms influence employees' ability (through appropriate recruitment) and motivation (through green incentives), which fosters environmental commitment among employees and aids in greening the firms and the environment. Masri and Jaaron (2017) also demonstrated that GHRM practices contribute to environmental performance, with green recruitment being the most influential and green training the least influential contributor. This again reflects AMO theory: when an organization recruits employees with green aptitude, it signals environmental commitment, and in this way firms can strategize HR practices to gain a green edge. In addition, ecological training contributes to operational performance (Kerdpitak n.d.), and environmental and human resource management practices significantly affect firm performance.

Studies have established a direct relationship between corporate social responsibility and firm performance, as well as an indirect relationship through green behavior (Úbeda-García et al. 2021). When Green HRM practices are adopted in a firm, they promote environmental performance, leading to organizational optimization, which in turn enhances the financial, operational, and social performance of the firm (Zaid et al. 2018). Eco-innovation, that is, innovation targeted at promoting environmental sustainability, is also encouraged when a firm adopts Green HRM practices; by adopting an eco-innovation culture, the firm's performance is further enhanced (Muisyo and Qin 2021), for example by rewarding green innovations by employees (Suharti and Sugiarto 2020). HRM is the department that takes care of employees and motivates them to adopt green policies. Employee behavior plays a vital role in adopting green practices, and this behavior directly relates to performance (Amjad et al. 2021); therefore, organizations should include reward policies to motivate their employees to accept green policies. From the day an employee is recruited until their exit from the organization, HRM handles all the policies and practices that make the employee green. Employee ethics and health also play an essential role in performance, so health-related issues must be addressed and included in policies so that employees stay healthy. Healthy employees can attend work regularly, and their higher performance directly impacts organizational performance.

Global practices or strategies like green performance management enrich individual green competencies, thereby contributing to an environment-friendly culture at the workplace and accomplishing organizational green goals (Chakraborty et al. 2020). Globally, firms compete to gain an advantage, and Green HRM is a pivotal catalyst for promoting sustainability, the most sought-after factor in the current business environment (Ogbeibu et al. 2020). Green HRM also enhances non-green work outcomes such as economic performance, corporate image, and ethicality among employees (Suharti and Sugiarto 2020). Ethical behavior is also promoted through the increased perceived meaningfulness of work (Al-Hawari et al. 2021). The relationship between green behaviour and the health outcomes of employees is almost cyclic (French 2005): exhibiting green behaviour helps keep the firm's environment green, which improves both the physical and mental health of employees, and this in turn motivates employees to exhibit pro-environmental behaviour to reap the health benefits.

Understanding such themes and recent trends can aid the researchers in identifying areas for future research. Besides this, researchers can explore how GHRM can contribute to global recognition and how it can be implemented to make the environment more sustainable. Human existence depends on environmental factors, so organizations must transform themselves into Green organizations to save our environment and save the earth.

Human resource management’s primary goal is to recruit the right people at the right place and time. Human resource management continuously focuses on the performance, engagement, productivity, innovation, and sustainability of the organization and employees (Tayali and Sakyi 2020 ). The human resource department is concerned with advertising, recruitment, selection, training, deployment, performance, rewards, etc. The process of advertising to recruit people is not that eco-friendly and less concerned about environmental sustainability. Due to climate change, our environment is degrading daily and forcing organizations and people to adopt green policies. All human resource processes are taken online, and less paper or green technologies are used to recruit people. Green human resources focus on environmental sustainability, but as this is the growing stage, future researchers concentrate on the areas that need more attention in terms of ecological sustainability. In this article, the author provided recent trends based on the extracted keywords using latent semantic analysis, which may need more attention. Areas that need more attention or that researchers can work on are Practices for Organization and Environment Sustainability, Global Strategies for GHRM, Behaviour Management, Performance Factors, Ethical and Health Benefits, and Eco-Innovation Responsibilities. Natural language processing (NLP) is an emerging area in this current era, and text mining is gaining more interest among researchers. In this study, the author implements the NLP on the Scopus database. In NLP, text mining and LSA are used to carry out this research.

Clusters were extracted using the K-means clustering algorithm and then used to identify the recent trends in GHRM. The identified areas need to be explored by future researchers for better implementation of green policies in daily practice. This study tried to uncover the different aspects of GHRM. Green practices mean different things to different people and organizations, but according to multiple researchers, the Green Revolution is the way to save our environment by implementing various procedures. Humans adapt to technology very quickly, yet technology also affects our environment, as every individual is aware; to raise awareness, organizations introduce campaigns from time to time aimed at motivating people to conserve natural and personal resources for a better environment. This study gives researchers an overview of the practices and policies proposed in the literature for implementing green policies, identifies the publishers that offer a platform to showcase GHRM research globally, and represents graphically the countries leading in the field of GHRM. Through the research areas identified, the study covers the main aspects of GHRM and concludes that every identified research area relates to GHRM and helps in environmental sustainability.

Through such research, progressive policies and practices may be provided to organizations to resolve environmental issues. In addition, the GHRM literature should be identified and disseminated to raise awareness of the implementation of green practices. The text mining, LSA, and network analysis used in this study will help researchers see the challenges in GHRM and work toward implementing green practices to resolve environmental issues.

Agreement, Paris (2016) The Paris Agreement. https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement

Aguilar-Hernandez GA, Dias Rodrigues JF, Tukker A (2021) Macroeconomic, social and environmental impacts of a circular economy up to 2050: a meta-analysis of prospective studies. J Clean Prod 278(January):123421. https://doi.org/10.1016/j.jclepro.2020.123421


Aizawa A (2003) An information-theoretic perspective of Tf–Idf measures. Inf Process Manage 39(1):45–65. https://doi.org/10.1016/S0306-4573(02)00021-3

Al-Hawari MA, Quratulain S, Melhem SB (2021) How and when frontline employees’ environmental values influence their green creativity? Examining the Role of Perceived Work Meaningfulness and Green HRM Practices. J Clean Prod 310:127598

Alam S, Yao N (2019) The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis. Comput Math Organ Theory 25(3):319–335. https://doi.org/10.1007/s10588-018-9266-8

Amjad F, Abbas W, Zia-UR-Rehman M, Baig SA, Hashim M, Khan A, Rehman H-u (2021) Effect of green human resource management practices on organizational sustainability: the mediating role of environmental and employee performance. Environ Sci Pollut Res 28(22):28191–28206

Artama M, Sukajaya IN, Indrawan G 2020 Classification of official letters using TF-IDF method. In Journal of Physics: Conference Series, 1516:012001. Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1516/1/012001

Balakrishnan C, Suchithra B, Kasturi S, Janani K (2018) A study on the role of human resource process in translating green policy into practice. Int J Pharm Res Adv Sci Res. https://doi.org/10.31838/ijpr/2018.10.04.048

Bebbington J (2001) “Sustainable development: a review of the international development, business and accounting literature. Account Forum 25:128–57

Cabral C, Dhar RL (2019) Green competencies: construct development and measurement validation. J Clean Prod 235(October):887–900. https://doi.org/10.1016/j.jclepro.2019.07.014

Chakraborty M, Biswas SK, Purkayastha B (2020) Data mining using neural networks in the form of classification rules: a review, 2020 4th International Conference on Computational Intelligence and Networks. (CINE). IEEE, Kolkata, p 1–6

Collings DG, Wood GT, Szamosi LT (2018) Human resource management: a critical approach. In: Human resource management. Routledge, pp 1–23

Daily BF, Huang S-C 2001 Achieving sustainability through attention to human resource factors in environmental management. Int J Oper Prod Manag

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Info Sci 41(6):391–407

Det Udomsap A, Hallinger P 2020 A bibliometric review of research on sustainable construction, 1994–2018.” Journal of Cleaner Production. Elsevier Ltd. https://doi.org/10.1016/j.jclepro.2020.120073

Dietz C, Berthold MR (2016) KNIME for open-source bioimage analysis: a tutorial. Adv Anat Embryol Cell Biol 219(May):179–197. https://doi.org/10.1007/978-3-319-28549-8_7

Ding CHQ (2005) A probabilistic model for latent semantic indexing. J Am Soc Inform Sci Technol 56(6):597–608. https://doi.org/10.1002/asi.20148

Dulebohn JH, Ferris GR, Stodd JT (1995) The history and evolution of human resource management. Handbook Human Resour Manage 7(9):18–41

Dumont J, Shen J, Deng X (2017) Effects of green HRM practices on employee workplace green behavior: the role of psychological green climate and employee green values. Hum Resour Manage 56(4):613–627. https://doi.org/10.1002/hrm.21792

Ehrlich CJ (1997) Human resource management: a changing script for a changing world. Human Resour Manage (1986-1998) 36(1):85

Elshaer IA, Abu EE, Sobaih MA, Azzaz AMS (2021) The effect of green human resource management on environmental performance in small tourism enterprises: mediating role of pro-environmental behaviors. Sustainability (switzerland) 13(4):1–17. https://doi.org/10.3390/su13041956

Evangelopoulos N, Zhang X, Prybutok VR (2012) Latent semantic analysis: five methodological recommendations. Eur J Inf Syst 21(1):70–86

Feldman R, Sanger J (2006) The text mining handbook. The Text Mining Handbook. Cambridge University Press.  https://doi.org/10.1017/cbo9780511546914

Fillbrunn A, Dietz C, Pfeuffer J, Rahn R, Landrum GA, Berthold MR 2017 “KNIME for reproducible cross-domain analysis of life science data.” Journal of Biotechnology. Elsevier B.V. https://doi.org/10.1016/j.jbiotec.2017.07.028

Fortuna B, Mladenič D, Grobelnik M 2006 Semi-automatic construction of topic ontologies. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4289 LNAI:121–31. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11908678_8 .

French E (2005) The effects of health, wealth, and wages on labour supply and retirement behaviour. Rev Econ Stud 72(2):395–427

Grant J (2008) Green marketing. Strateg Dir 24(6):25–27. https://doi.org/10.1108/02580540810868041

Harzing A-W, Alakangas S (2016) Google Scholar, Scopus and the Web of Science: a longitudinal and cross-disciplinary comparison. Scientometrics 106(2):787–804

Heneman RL, Tansky JW, Michael Camp S (2000) Human resource management practices in small and medium-sized enterprises: unanswered questions and future research perspectives. Entrep Theory Pract 25(1):11–26. https://doi.org/10.1177/104225870002500103

Hoblos J 2020 Experimenting with latent semantic analysis and latent dirichlet allocation on automated essay grading. In 2020 7th International Conference on Social Network Analysis, Management and Security, SNAMS 2020. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SNAMS52053.2020.9336533

Jabbour (2015) Environmental training and environmental management maturity of brazilian companies with ISO14001: empirical evidence. J Clean Prod 96(June):331–338. https://doi.org/10.1016/j.jclepro.2013.10.039

Jabbour CJC (2013) Environmental training in organisations: from a literature review to a framework for future research. Resour Conserv Recycl 74(May):144–155. https://doi.org/10.1016/j.resconrec.2012.12.017

Jabbour CJ, Chiappetta FC, Santos A, Nagano MS (2010) Contributions of HRM throughout the stages of environmental management: methodological triangulation applied to companies in Brazil. Int J Hum Resour Manage 21(7):1049–1089

Jabbour CJC, de Lopes Sousa Jabbour AB (2016) Green human resource management and green supply chain management: linking two emerging agendas. J Clean Prod 112:1824–33

Jackson SE, Renwick DWS, Jabbour CJC, Muller-Camen M (2011) State-of-the-art and future directions for green human resource management. Ger J Res Hum Resour Manage 25(2):99–116. https://doi.org/10.1688/1862-0000

Jamal T, Zahid M, Martins JM, Mata MN, Rahman HU, Mata PN (2021) Perceived green human resource management practices and corporate sustainability: multigroup analysis and major industries perspectives. Sustainability 13(6):3045

Jones KS 1972 A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation. MCB UP Ltd. https://doi.org/10.1108/eb026526

Kerdpitak C (n.d) The effects of environmental management and HRM practices on the operational performance in Thai pharmaceutical industry. Syst Rev Pharm 11 https://doi.org/10.5530/srp.2020.2.83

Khan NU, Wenya Wu, Saufi RBA, Sabri NAA, Shah AA (2021) Antecedents of sustainable performance in manufacturing organizations: a structural equation modeling approach. Sustainability (switzerland) 13(2):1–23. https://doi.org/10.3390/su13020897


Kim S, Park H, Lee J (2020) Word2vec-based latent semantic analysis (W2V-LSA) for topic modeling: a study on blockchain technology trend analysis. Expert Syst Appl 152:113401

Kim YJ, Kim WG, Choi HM, Phetvaroon K (2019) The effect of green human resource management on hotel employees’ eco-friendly behavior and environmental performance. Int J Hosp Manag 76(January):83–93. https://doi.org/10.1016/j.ijhm.2018.04.007

Kitchenham B, Charters S (2007) Guidelines for performing systematic literature reviews in software engineering

Kulkarni SS, Apte UM, Evangelopoulos NE (2014) The use of latent semantic analysis in operations management research. Decis Sci 45(5):971–994. https://doi.org/10.1111/deci.12095

Kundu A, Jain V, Kumar S, Chandra C 2015 A journey from normative to behavioral operations in supply chain management: a review using Latent Semantic Analysis. Expert Systems with Applications. Elsevier Ltd. https://doi.org/10.1016/j.eswa.2014.08.035

Lai KH, Cheng TCE, Tang AKY (2010) Green retailing: factors for success. Calif Manage Rev 52(2):6–31. https://doi.org/10.1525/cmr.2010.52.2.6

Li X, Mai Z, Yang L, Zhang J (2020) Human resource management practices, emotional exhaustion, and organizational commitment–with the example of the hotel industry. J China Tour Res 16(3):472–486

Malik SY, Mughal YH, Azam T, Cao Y, Wan Z, Zhu H, Thurasamy R (2021) Corporate social responsibility, green human resources management, and sustainable performance: is organizational citizenship behavior towards environment the missing link? Sustainability (switzerland) 13(3):1–24. https://doi.org/10.3390/su13031044

Masri HA, Jaaron AAM (2017) Assessing green human resources management practices in palestinian manufacturing context: an empirical study. J Clean Prod 143(February):474–489. https://doi.org/10.1016/j.jclepro.2016.12.087

Mehrajunnisa M, Jabeen F, Faisal MN, Mehmood K (2021) Prioritizing green HRM practices from policymaker’s perspective. Int J Organ Anal 30(3):652–678

Morel CM, Serruya SJ, Penna GO, Guimarães R (2009) Co-authorship network analysis: a powerful tool for strategic planning of research, development and capacity building programs on neglected diseases. PLoS Negl Trop Dis 3(8):e501

Muisyo PK, Qin Su (2021) Enhancing the FIRM’S green performance through green HRM: the moderating role of green innovation culture. J Clean Prod 289(March):125720. https://doi.org/10.1016/j.jclepro.2020.125720

Neto AS, Jabbour CJC, Jabbour ABLDS (2014) Green training supporting eco-innovation in three Brazilian companies: practices and levels of integration. Ind Commer Train 46(7):387–392. https://doi.org/10.1108/ICT-02-2014-0010

Ogbeibu S, Emelifeonwu J, Senadjki A, Gaskin J, Kaivo-oja J (2020) Technological turbulence and greening of team creativity, product innovation, and human resource management: implications for sustainability. J Clean Prod 244:118703

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L et al (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Syst Rev 10(1):1–11. https://doi.org/10.1186/s13643-021-01626-4

Paulet R, Holland P, Morgan D 2021 A meta-review of 10 years of green human resource management: is green HRM headed towards a roadblock or a revitalisation?. Asia Pacific Journal of Human Resources. John Wiley and Sons Inc. https://doi.org/10.1111/1744-7941.12285 .

Pham NT, Hoang HT, Phan QPT (2020) Green human resource management: a comprehensive review and future research agenda. Int J Manpow 41(7):845–878. https://doi.org/10.1108/IJM-07-2019-0350

Rajabpour E, Fathi MR, Torabi M (2022) Analysis of factors affecting the implementation of green human resource management using a hybrid fuzzy AHP and type-2 fuzzy DEMATEL approach. Environ Sci Pollut Res. https://doi.org/10.1007/s11356-022-19137-7

Rajiani I, Musa H, Hardjono B (2016) Ability, motivation and opportunity as determinants of green human resources management innovation. Res J Bus Manag 10(1–3):51–57. https://doi.org/10.3923/rjbm.2016.51.57

Ramos J (2003) Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning (vol. 242, No. 1, pp. 29–48)

Rani M, Dhar AK, Vyas OP (2017) Semi-automatic terminology ontology learning based on topic modeling. Eng Appl Artif Intell 63(August):108–125. https://doi.org/10.1016/j.engappai.2017.05.006

Ren S, Tang G, Jackson SE (2018) Green human resource management research in emergence: a review and future directions. Asia Pac J Manag 35(3):769–803

Renwick D, Redman T, Maguire S (2008) Green HRM: a review, process model, and research agenda. University of Sheffield Management School Discussion Paper 1:1–46


Renwick DWS, Redman T, Maguire S (2013) Green human resource management: a review and research agenda*. Int J Manag Rev 15(1):1–14. https://doi.org/10.1111/j.1468-2370.2011.00328.x

Rondinelli DA, Berry MA (2000) Environmental citizenship in multinational corporations: social responsibility and sustainable development. Eur Manag J 18(1):70–84

Rubel MRB, Kee DMH, Rimi NN (2021) The influence of green HRM practices on green service behaviors: the mediating effect of green knowledge sharing. Employee Relations: The International Journal.

Saifudin A, Havidz Aima M, Sutawidjaya AH, Sugiyono. (2021) Hospital digitalization in the era of Industry 4.0 based on GHRM and service quality. Int J Data Netw Sci 5(2):107–114. https://doi.org/10.5267/j.ijdns.2021.2.004

Saifulina N, Carballo-Penela A, Ruzo-Sanmartín E (2020). Sustainable HRM and green HRM: The role of green HRM in influencing employee pro-environmental behavior at work. Sustain Environ Res 2(3)

Sarkis J, Gonzalez-Torre P, Adenso-Diaz B (2010) Stakeholder pressure and the adoption of environmental practices: the mediating effect of training. J Oper Manag 28(2):163–176

Shafaei A, Nejati M, Yusoff YM (2020a) Green human resource management: a two-study investigation of antecedents and outcomes. Int J Manpow 41(7):1041–1060. https://doi.org/10.1108/IJM-08-2019-0406

Shafaei A, Nejati M, Yusoff YM (2020b) Green human resource management: a two-study investigation of antecedents and outcomes TT - green human resource management. Int J Manpow 41(7):1041–1060. https://doi.org/10.1108/IJM-08-2019-0406

Shahriari B, Hassanpoor A, Navehebrahim A, Jafarinia S (2019) A systematic review of green human resource management. Evergreen Joint Journal of Novel Carbon Resource Sciences & Green Asia Strategy 6(2):177–189

Sharma C, Ahmad S, Singh S (2022) Impact of human resource practices on individual and organization growth. International Conference on Decision Aid Sciences and Applications (DASA) 2022:937–941. https://doi.org/10.1109/DASA54658.2022.9765203

Singh VK, Singh P, Karmakar M, Leta J, Mayr P (2021) The journal coverage of Web of Science, Scopus and Dimensions: a comparative analysis. Scientometrics 126(6):5113–5142

Soomro MM, Wang Y, Tunio RA, Aripkhanova K, Ansari MI (2021) Management of human resources in the green economy: does green labour productivity matter in low-carbon development in China. Environ Sci Pollut Res 28(42):59805–59812

Suharti L, Sugiarto A (2020) A qualitative study of green hrm practices and their benefits in the organization: an Indonesian company experience. Bus: Theor Pract 21(1):200–211. https://doi.org/10.3846/btp.2020.11386

Tayali EM, Sakyi KA (2020) Reputable relevant realistic reliable and rigorous human resource management strategic approaches and practices in the 21st century. Adv Soc Sci Res J 7(6):600–621

Teixeira AA, Jabbour CJC, Jabbour ABLDS, Latan H, Oliveira JHCD (2016) Green training and green supply chain management: evidence from brazilian firms. J Clean Prod 116(March):170–176. https://doi.org/10.1016/j.jclepro.2015.12.061

Tseng M-L, Islam MS, Karia N, Fauzi FA, Afrin S (2019) A literature review on green supply chain management: trends and future challenges. Resour Conserv Recycl 141:145–162

Úbeda-García M, Claver-Cortés E, Marco-Lajara B, Zaragoza-Sáez P (2021) Corporate social responsibility and firm performance in the hotel industry. The mediating role of green human resource management and environmental outcomes. J Bus Res 123:57–69

UNCC (1997) Kyoto Protocol. https://unfccc.int/kyoto_protocol

UNEP (2020) Emissions Gap Report 2020. https://www.unep.org/emissions-gap-report-2020

UNFCC (2007) Bali Climate Change Conference. https://unfccc.int/process-and-meetings/conferences/past-conferences/bali-climate-change-conference-december-2007/bali-climate-change-conference-december-2007-0

Victor DG (2011) The collapse of the Kyoto Protocol and the struggle to slow global warming. Princeton University Press

Vuong B, Sid S (2020) The impact of human resource management practices on employee engagement and moderating role of gender and marital status: an evidence from the Vietnamese banking industry. Manag Sci Lett 10(7):1633–1648

Wehrmeyer W (2017) Greening people: Human Resources and Environmental Management. Routledge

Wu HC, Luk RWP, Wong KF, Kwok KL (2008) Interpreting TF-IDF term weights as making relevance decisions. ACM Trans Inform Syst 26(3). https://doi.org/10.1145/1361684.1361686

Xiang L, Yang YC (2020) Factors influencing green organizational citizenship behavior. Social Behavior and Personality 48(9). https://doi.org/10.2224/SBP.8754

Yalcinkaya M, Singh V (2015) Patterns and trends in building information modeling (BIM) research: a latent semantic analysis. Autom Constr 59(November):68–80. https://doi.org/10.1016/j.autcon.2015.07.012

Yong JY, Yusliza MY, Fawehinmi OO (2020a) Green human resource management: a systematic literature review from 2007 to 2019. Benchmarking 27(7):2005–2027. https://doi.org/10.1108/BIJ-12-2018-0438

Yong JY, Yusliza MY, Jabbour CJC, Ahmad NH (2020b) Exploratory cases on the interplay between green human resource management and advanced green manufacturing in light of the ability-motivation-opportunity theory. J Manag Dev 39(1):31–49. https://doi.org/10.1108/JMD-12-2018-0355

Yusliza MY, Tanveer MI, Fawehinmi OO, Yong JY, Ahmad A (2019) Systematic literature review on green human resource management: Green health, safety and welfare as new dimension. In Proceedings of the 33rd International Business Information Management Association Conference, IBIMA 2019: Education Excellence and Innovation Management through Vision 2020. International Business Information Management Association, IBIMA,  pp 181–191

Zaid AA, Jaaron AAM, Bon AT (2018) The impact of green human resource management and green supply chain management practices on sustainable performance: an empirical study. J Clean Prod 204:965–979. https://doi.org/10.1016/j.jclepro.2018.09.062

Zhu J, Liu W (2020) A tale of two databases: the use of Web of Science and Scopus in academic papers. Scientometrics 123(1):321–335. https://doi.org/10.1007/s11192-020-03387-8

Zia Y, Farhad A, Bashir F, Qureshi KN, Ahmed G (2020) Content-based dynamic superframe adaptation for internet of Medical Things. Int J Distrib Sens Netw 16(2):1550147720907032

Download references

Author information

Authors and Affiliations

Chitkara University, Solan, Himachal Pradesh, India

Chetan Sharma

Chitkara Business School, Chitkara University, Rajpura, Punjab, India

Sumit Sakhuja & Shivinder Nijjer


Contributions

Chetan Sharma collected the data from Scopus, preprocessed it in the KNIME tool, performed the network analysis in VOSviewer, and contributed to writing the paper. Dr. Sumit Sakhuja designed the KNIME workflow used to execute the analysis. Dr. Shivinder Nijjer contributed to writing the paper and carried out the final proofreading and error correction.
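
As a rough illustration of the text-mining core of the pipeline described above (Scopus export, preprocessing, latent semantic analysis, then network analysis), the following minimal Python sketch applies TF-IDF weighting and LSA via truncated SVD to exported abstracts. It is not the authors' KNIME workflow; the file name scopus_ghrm_export.csv, the Abstract column, and the choice of 10 latent topics are assumptions made purely for illustration.

```python
# Illustrative sketch only (not the authors' KNIME workflow).
# Assumed inputs: a Scopus CSV export named "scopus_ghrm_export.csv"
# with an "Abstract" column; 10 latent topics is an arbitrary choice.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Load the exported bibliographic records
records = pd.read_csv("scopus_ghrm_export.csv")
abstracts = records["Abstract"].fillna("").tolist()

# Step 1: basic preprocessing and TF-IDF term weighting
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english", min_df=2)
tfidf = vectorizer.fit_transform(abstracts)

# Step 2: latent semantic analysis via truncated SVD of the TF-IDF matrix
lsa = TruncatedSVD(n_components=10, random_state=42)
doc_topics = lsa.fit_transform(tfidf)

# Step 3: list the top-weighted terms for each latent dimension
terms = vectorizer.get_feature_names_out()
for i, component in enumerate(lsa.components_):
    top = [terms[j] for j in component.argsort()[::-1][:8]]
    print(f"Topic {i + 1}: {', '.join(top)}")
```

The term co-occurrence or bibliographic-coupling networks visualised in VOSviewer would be built separately from the same exported records.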

Corresponding author

Correspondence to Chetan Sharma.

Ethics declarations

Ethics approval

The authors adhered to all ethical standards in accordance with the journal's policy.

Consent to participate

Not applicable.

Consent for publication

I consent to the publication of this article on behalf of all authors.

Competing interests

The authors declare no competing interests.

Additional information

Responsible Editor: Philippe Garrigues

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Sharma, C., Sakhuja, S. & Nijjer, S. Recent trends of green human resource management: Text mining and network analysis. Environ Sci Pollut Res 29, 84916–84935 (2022). https://doi.org/10.1007/s11356-022-21471-9


Received: 13 August 2021

Accepted: 10 June 2022

Published: 05 July 2022

Issue Date: December 2022

DOI: https://doi.org/10.1007/s11356-022-21471-9


Keywords

  • Green human resource management
  • Latent semantic analysis
  • Text mining
