Disease transmission control
Lockdown, social-distancing, quarantine centers, hospitalization
High
High
Source: [58–103, 155], [104–107].
Interestingly, many responses, such as hoarding, harassment, and discrimination, were noted in both developed and developing nations, not only at the beginning of the pandemic but also a year later, during the second wave. Isolated incidents were also reported; for example, claims that 5G networks accelerate the spread of COVID-19 led to attacks on several 5G phone masts in the UK. In India, pets were killed and pet lovers were attacked after WhatsApp messages spread the misinformation that ‘animals spread coronavirus’ [72]. Many African countries, such as Nigeria, a multi-ethnic, multi-cultural and multi-religious country, reported vaccine hesitancy, especially within religious belief systems that treat causation as coincidence rather than seeking explanations for phenomena that seem coincidental [108]. The impacts certainly extended beyond these reported events, and further studies are needed to obtain the complete picture. The rapid flow of misinformation led the WHO and various countries to take several actions to control rumors and myths. In countries affected by infodemics, such as India and Bangladesh, health authorities also issued notices restricting COVID-19 communication on social media platforms, with violations subject to legal action [95, 109]. Despite several efforts to bust myths, the gap between actual and perceived risk persisted, affecting the response at different levels.
The WHO, in its six-point action plan, addresses the public in its very first point and notes that “the public must be effectively prepared for the critical measures that are needed to help suppress the spread and protect vulnerable groups, like the elderly and those with underlying health conditions” [153]. The reality was different. In many countries, top-down communication and information flow can be linked to the gaps in response that occurred at the local level. Various extreme measures proposed and implemented across countries caught people unprepared either to understand or to adhere to the conditions imposed. The following paragraphs discuss some of these measures:
Lockdown: Administrative risk communication about lockdown focused purely on the need to control COVID-19 transmission, without adequately explaining its possible impacts or addressing other related uncertainties. The local population, on the other hand, feared uncertainties relating to food, employment, and their overall future beyond the crisis [110]. After India declared a total lockdown on March 24, 2020, thousands of laborers were stranded without work, money, or any option other than returning to their hometowns. People's concerns ranged from paying their children's education fees to feeding their families in both the short and the long term. As Prakash, an auto-rickshaw driver in Kerala (the first state in India to report COVID-19), put it, the “virus doesn't worry me much as the uncertainty that awaits on the other side of the crisis” [111]. The lockdown decision caused a mass exodus of laborers, industrial workers, and unorganized-sector employees from megacities such as Delhi and Mumbai to rural areas across the country, which indicated both unpreparedness and inadequate risk communication about the measures taken or promised to the affected population [112]. Along with the fear of hunger and loss of livelihood, many people also carried the risk of spreading COVID-19 to distant rural areas. A similar trend of domestic migration was also noted in Bangladesh, Malaysia, and New Zealand. However, in contrast to New Zealand, people in developing countries such as India and Bangladesh were given very little time to travel or make arrangements to follow lockdown or social-distancing guidelines [84, 112, 113]. Many countries, including India and Taiwan, adopted partial lockdowns during the second wave owing to widespread social and economic impacts. In Taiwan, small businesses and tourism were halted even in remote indigenous communities.
Aware of how insufficient health and medical resources were in these marginal areas, local communities adopted strict measures to prevent people from entering, including their own younger members who were studying and working in the cities. Public participation in implementing government instructions in Taiwan was very high. A similar situation was reported in African indigenous communities. These cases point to varying levels of trust and cooperation between local communities and governments across countries, which global risk communication rarely addresses.
Quarantine: Quarantine measures are vital in controlling the spread of COVID-19, as they reduce social interaction, maintain physical distancing to prevent spread, and help facilitate the contact tracing needed to limit outbreak cluster growth [114]. However, compliance with quarantine orders requires high levels of trust and confidence in officials, as well as adequate risk communication to develop an understanding of the risk posed by breaking quarantine [115]. Unfortunately, inadequately planned communication also led to a loss of trust in the government's planned response. When governments imposed quarantine measures for public safety, even educated people and professionals chose to run away, as seen in India and Nigeria. On March 11, 2020, three of the 17 pilots and flight attendants who flew 14 Chinese medical doctors and medical supplies from China to Nigeria boycotted the Lagos quarantine center provided by the Lagos State Government. They left the quarantine center and went home, despite knowing that this could put their family members and anyone they came into contact with in grave danger [65]. Similarly, many migrant laborers in the Indian states of Uttar Pradesh and Bihar broke quarantine to go home, as no police or security personnel were present to stop them [66]. While inadequate arrangements may be one cause, such incidents also indicate a gap in risk communication about the safety of affected individuals and their families during quarantine.
Social-distancing: The term social distancing refers to the practice of maintaining a greater physical distance from people, usually 6 feet or more, to avoid the spread of disease. However, it comes with other challenges, such as psychological fallout, mental health problems, or reduced care for those who need it most, e.g. the elderly [116]. Moreover, the measure is not feasible in many high-risk areas, e.g. marginal communities residing in densely populated areas such as the slums of Bangladesh [156]. Living in close proximity, these communities cannot maintain social distance despite official orders, which give little guidance on what to do in such situations. Inadequate communication about, and public understanding of, social distancing were also reflected in the rush to buy liquor in India, in breach of social distancing, as soon as the government eased some lockdown conditions [117]. This indicates that the communication did not address the risk involved or how to manage it when services resumed. The term itself also drew reactions for its literal meaning, a concern acknowledged by the Risk Communication & Community Engagement Technical Officer at the WHO Regional Office for South-East Asia, who suggested that social distancing should be understood as physical distance and social connection [118]. However, real-time communication and practices did not reflect this understanding at the ground level. The lack of community participation is evident in the many cases where community leaders themselves became sources of misinformation or breached social distancing. In South Korea, a religious leader said God would protect those who attended the gathering from COVID-19 infection [119]. However, this group became infected and turned into a key source of COVID-19 spread in South Korea.
Similar cases occurred in Malaysia, where two major clusters of infection originated from religious gatherings that ignored the Ministry of Health's advice. The first gathering was a three-day Islamic Tablighi gathering at the Sri Petaling mosque, Kuala Lumpur, held from February 27 to March 1 and attended by nearly 16,000 people [120]. By April 11, 2020, 40.2% of the cases in Malaysia were related to this Tablighi gathering. The gathering also became a source of international spread, to Brunei, India, and Indonesia [121]. In India, legal cases were registered against Tabligh participants for spreading the disease [83]. In Taiwan, on the other hand, ICTs were used to alert people to social distancing rules.
Vaccination: Throughout the COVID-19 pandemic, vaccination remained an important point of discussion. Pandemic communication initially emphasized herd immunity, which was soon replaced by the urgent need to vaccinate the entire world [122, 123]. Formal vaccination could only start at the end of 2020. Although vaccination was not made mandatory, it created misunderstandings and apprehension among people and government agencies regarding those who were unvaccinated. On the one hand, vaccine hesitancy was fueled by untimely deaths after vaccination among people who had neither COVID-19 nor other diseases [124]. On the other hand, governments kept vaccinating people with vaccines, some of which had not been fully approved by the WHO [125]. This also led to drives for building trust in COVID-19 vaccination. Many governments used multiple channels, including social media and ICTs, to change public perception and extend vaccination to the entire population. A survey revealed that more than 80% of Taiwanese people approved of the government's handling of the crisis (Wang et al., 2020). Regarding vaccination, despite rumors about severe side effects and a high mortality rate after vaccination, Taiwan's Central Epidemic Command Center (CECC), together with independent experts in medical science, publicly corrected the rumors and explained the effectiveness of the vaccines and the importance of herd immunity for society. As of October 2021, first-dose coverage in Taiwan had reached 70%. India also reached and celebrated the milestone of 1 billion vaccine doses by the end of October 2021 [69]. These campaigns, however, left a gap in addressing concerns about lives lost either to extreme COVID-19 measures or to vaccination.
The aforementioned examples highlight that while the public was the main target of risk communication, it was excluded from the formal risk communication process. The communications also failed to clarify the public's role as a key stakeholder in risk communication beyond the expectation that people would follow orders, a shortcoming that can be seen as a direct cause of the info-demic and its unintended impacts.
Public participation or involvement in the understanding, communication, and management of global risks is not just a recommendation for good governance but also crucial for its success [126]. Renn [127] identified four types of risk communication: documentation, information, dialogue, and involvement. Although all four types of communication were observed for COVID-19, the first two dominated the process in most countries. Dialogue with and involvement of the public were not only insufficient at times but also discouraged in several instances. The scenario, however, varied across countries depending on risk perceptions rooted in the complexity of the situation, past experiences, or socio-cultural context. The review of COVID-19 information flow highlighted some trends in risk communication that led to differential impacts on the ground. The nature of risk communication as experienced in different countries can be classified into three broad categories (see Fig. 1):
Diagrammatic illustration of levels of risk communication and outcome.
Info-demic: Info-demic represents a situation of excessive risk communication, wherein multiple stakeholders share their perceptions, fears, knowledge, or thoughts about risk or response without much consideration of the overall impact. A mix of risk information from official and multiple unofficial sources creates confusion, fear, stress, and loss of trust. The impacts of such communication can be severe, including suicides, harassment, xenophobia, and hoarding of essential goods, as noted in countries such as India, Bangladesh, Nigeria, and the USA (see Table 1). Notably, many of these incidents occurred in urban areas, which became epicentres of the pandemic because of high population concentration, pre-existing inequalities, and the heightened socio-economic impacts of lockdown [128]. While the ‘public’ is seen as the dominant source of misinformation or info-demic, in many of these cases the exclusion of the community as a responsible stakeholder in timely risk communication may also be an important cause. Excessive fear under very high uncertainty can be seen as one reason behind such outbursts of miscommunication. Further, governments' interpretation and use of terms, and their war-like framing of the response, also added to this fear (see [129, 130]). In this situation, the administration had to deal not only with actual COVID-19 cases but also with several other issues that emerged from miscommunication, including violence, large-scale unemployment, and loss of trust in the government.
Ideal risk communication: Ideal risk communication represents a situation where varied risk communications from different stakeholders are aligned to resolve the issues associated with a hazard. Various governments tried to achieve this with or without sufficient public participation. While control measures can help manage risk over a short time span, they become problematic when hazard exposure is prolonged. Some countries, however, not only acted proactively but also encouraged public participation as a shared responsibility for risk communication and management. Africa's Ubuntu philosophy has been noted in social psychology as a framework for dealing with COVID-19 that is based on community consensus and participation [131]. Past experience in dealing with Ebola, together with community participation, not only helped West Africa and the Democratic Republic of the Congo manage the COVID-19 response but was also found essential for avoiding misinformation during the crisis [132]. Efforts to enhance community participation were also observed in other countries. Mongolia's preliminary stakeholder engagement plan emphasized inclusive and culturally sensitive risk communication for various affected, interested, and vulnerable groups, with a clear operational procedure for grievance redress [2]. Sweden, on the other hand, was seen as an outlier in Europe when it adopted a different approach to COVID-19 that allowed essential services to remain open. Nevertheless, the plan was backed by local people, and the impact of shared responsibility was evident in voluntary social distancing, reduced mobility, and precautionary public behavior [157]. While the number of COVID-19 cases was high in the country, the response seems to have reduced the side effects of COVID-19 in terms of its impact on the economy and mental health [133, 134].
New Zealand is another good example, where people were not only given time to prepare for lockdown but also allowed to move about locally for better health and well-being [113, 135]. While it is debatable which country's approach can be considered ideal, it can be argued that an enhanced degree of community engagement, with the community as a responsible stakeholder, is an essential element of ideal risk communication.
Inadequate risk communications: This reflects a situation of inadequate communication that gives little or no clarity about the hazard, its impact, or the measures to manage the risk. The situation leads to a high dependence on rumors that cause anxiety, fear, confusion, or loss of trust among people. While the reasons for inadequate risk communication vary, such as high uncertainty or insufficient information, the resulting gap can lead to high exposure and loss of life, as seen in Iran and Italy in the first wave [136]. While local vulnerabilities and conditions can be seen as causes of the high mortality rate, the existence of a gap in risk communication that could assure people of their safety cannot be denied.
As the public is frequently the ultimate target of risk communication, people's understanding, concerns, role, and participation become critical aspects of risk communication. In practice, however, the information shared with them depends on the availability of information about the hazard, the associated uncertainties, and prior knowledge of best practices for managing the situation, and it follows a certain direction of formal information flow, as noted for COVID-19 (Table 2). The table highlights how the nature of information changes in terms of quantity, emotion, uncertainty, and understanding as it moves from the scientists assessing risks to the various bodies understanding and communicating risks for their management. By the time information reaches the public, it is diversified, tends to be more confusing and emotionally charged, and generates varied responses from different communities.
Tentative flow of risk communication across various stakeholders during COVID-19 to date.
A sudden emergence of information with the potential to disrupt normal life not only creates fear but also generates varied reactions from the public. In such a case, understanding the risk, and even the vulnerabilities, only partially fulfils the purpose of risk communication. For example, health authorities' warnings about the vulnerability of the elderly population led to their further isolation and to ageism [81, 137]. Similarly, increasing cases of racism during the pandemic led to risk communication articles focusing on how to communicate without fuelling anti-Chinese sentiment [138]. Although modifying risk communication can help manage a specific situation, it is difficult to address every problem for every section of society. Despite risk communication and various efforts to bust myths at different levels, gaps tend to persist in addressing all issues and reaching every community, particularly those without internet access [97, 152]. Studies argue that it is a mistake to treat the ‘public’ as a single stakeholder, as it comprises strata of varied socio-economic, cultural, and political communities [27].
At the same time, there are examples of countries that effectively engaged communities in risk communication within a top-down approach. In Taiwan, for example, alongside daily press conferences by the CECC, ICTs were deployed during COVID-19 for disseminating information on infections, resource allocation (such as where to get masks and tests in one's neighborhood), social distancing, and vaccination nationwide. Here, in addition to the top-down approach, a bottom-up mechanism was established through several smartphone apps to facilitate participatory citizen communication. Singapore is also noted globally for its high preparedness and successful risk communication, promoting strong community engagement and placing less emphasis on extreme measures such as lockdown despite following a top-down approach [139].
The increasingly homogeneous response across countries rests on the underlying assumption that information shared about risk is likely to be received, understood, and responded to in a similar manner, with some modifications. Consequently, efforts focus on either refining risk communication based on overall feedback from the ground or busting individual myths (e.g. [27]). However, the compromised role of the public as a key stakeholder in risk communication creates a wider gap not only in how information is received but also in how it is responded to, as seen in the cases of info-demic or inadequate risk communication. Moreover, although rapid transmission tends to generate valuable data, such as concentrated impacts and measures (e.g. demographically informed COVID-19 policy [140]), it takes time for governments to mobilize and make policy decisions at the national level. Messages and responses can be adapted with greater ease and rigor at the local level through a participatory approach.
At any given time, the local population deals with various circumstances situated in a dynamic reality, in which people are not only pressed by day-to-day concerns but also exposed to multiple hazards [141]. A specific risk communication frequently fails to address most of the other concerns people may be dealing with. For example, earthquakes in Croatia and Delhi were a reminder that vulnerability to other natural hazards persisted while all response mechanisms were focused on the pandemic [142]. In Delhi, COVID-19 measures led to the complete closure of all public spaces, including park gates, during lockdown, which left people confused, with no option to move out of their houses or apartments, and put them at higher risk during earthquakes. Such events, however, leave little time or scope for improvising risk communication, and the role of communities in managing local risks becomes all the more important. Studies note that a participative approach to risk communication can effectively trigger adaptive behaviors [143]. COVID-19 also increased mental and emotional stress, which further underlines the significance of the community not only for informing people about risks and responsible behavior but also for providing physical, psychological, and emotional support during a disaster. The community's role is also essential in establishing trust in the information and support provided by governments or international organizations such as the WHO, given the varied impacts and socio-cultural responses [158].
Effective public participation is essential not only for dealing with the info-demic but also for making effective use of local indigenous knowledge and wisdom in dealing with disasters. The value of public participation and indigenous knowledge has been repeatedly emphasized in the disaster literature in the form of community-based disaster risk reduction [144, 145]. However, it has yet to be explored in terms of risk communication. Although the use of ICTs to inform the public about best practices and bust myths and rumors has increased tremendously, its use for effective community engagement is still nascent and requires further research. Risk communication must be interactive and inclusive, as risk emerges in a complex socio-economic and political context and is perceived and responded to accordingly [146]. Achieving this would require comprehensive planning and evaluation of risk communication that addresses the local context, constraints, and local knowledge to facilitate effective and responsible community participation.
In this rapidly changing world, it is crucial that disaster risk communication addresses increased public exposure to, and participation in, the globalization of knowledge, economy, and information flow. While global guidelines are useful, they lack structure, specific guidance, and, to an extent, the scope to encompass all possible diversity. Although countries choose their own responses and methods of risk communication, a gap is noted in top-down risk communication, which may overlook various local risks. Communities, on the other hand, not only face multiple risks but also receive information and risk messages from multiple sources that could induce confusion or dilemmas in their decision-making. Ensuring an effective community response thus requires a shift from a top-down approach focusing on what should or should not be done to exploratory and interactive risk communication that recognizes public emotions, builds trust, and understands heuristics and the socio-cultural context of power relations and cultural practices that affect the local response. For this, it is essential that risk communication for emergencies or disasters is inclusive and engages with communities at the local level, where new risks are formed and responded to.
Although several models have been developed for engaging communities and using a participatory approach to risk communication, there is limited research on their application in different socio-cultural contexts. Further research is needed on structures and guidelines that engage communities as responsible stakeholders in the process of risk communication, so that they are not only informed about the risks but also empowered to make informed decisions that incorporate and respect local socio-economic and cultural diversity, varied risks, and the local governance system.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
The authors are grateful to Dr Emma Hudson Doyle, World Social Science Fellow and Senior Lecturer at Massey University, for her timely and detailed comments that helped to improve the paper. The authors are also grateful to the Global Young Academy and the International Science Council for providing platforms for researchers to connect and encouraging them to work together, which made this paper possible. The authors also thank the reviewers for their valuable comments and suggestions.
BMC Infectious Diseases, volume 24, Article number: 620 (2024)
Currently, several studies have observed that chronic hepatitis B virus infection is associated with the pathogenesis of kidney disease. However, the extent of the correlation between hepatitis B virus infection and the chronic kidney disease risk remains controversial.
In the present study, we searched seven English- and Chinese-language databases for all eligible literature. A random-effects model was used to conduct the meta-analysis, and the quality of included studies was assessed using the Newcastle-Ottawa Quality Scale.
In this analysis, a total of 31 studies reporting the association between hepatitis B virus infection and chronic kidney disease risk were included. The results showed a significant positive association between hepatitis B virus infection and the risk of chronic kidney disease (pooled OR, 1.20; 95% CI, 1.12–1.29), indicating that hepatitis B virus infection is associated with an increased risk of developing chronic kidney disease.
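As a reading aid, the pooled estimate can be re-expressed on the log scale, where odds ratios are usually pooled. The sketch below assumes the reported 95% CI is a symmetric Wald interval on the log-odds scale (ln OR ± 1.96 × SE); because the published bounds are rounded, the recovered standard error, z statistic, and P-value are approximations and are not values reported by the study. The function names are illustrative only.

```python
import math


def log_ci_to_se(lower: float, upper: float, z_crit: float = 1.959964) -> float:
    """Approximate SE of ln(OR) from a 95% CI.

    Assumes a Wald interval ln(OR) +/- z_crit * SE, so the CI width on the
    log scale equals 2 * z_crit * SE.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def wald_z(or_point: float, se_log: float) -> float:
    """z statistic for the null hypothesis OR = 1 (that is, ln OR = 0)."""
    return math.log(or_point) / se_log


def two_sided_p(z: float) -> float:
    """Two-sided normal P-value, P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Reported pooled estimate: OR 1.20, 95% CI 1.12-1.29 (rounded values).
se = log_ci_to_se(1.12, 1.29)
z = wald_z(1.20, se)
print(f"SE[ln OR] ~ {se:.3f}, z ~ {z:.2f}, two-sided P ~ {two_sided_p(z):.1e}")
```

The back-calculated z is far above 1.96, which is consistent with the abstract's statement that the association is significant.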
This study found that hepatitis B virus infection was associated with a significantly increased risk of chronic kidney disease. However, the current study cannot establish whether this relationship is causal. More comprehensive prospective longitudinal studies are therefore needed to further explore and explain the association between hepatitis B virus and the risk of developing chronic kidney disease.
Chronic kidney disease (CKD) is a major non-communicable disease associated with high morbidity and mortality and is commonly defined as persistent urinary abnormalities, structural abnormalities, or impaired renal excretory function [1, 2]. In CKD, kidney function gradually declines and can progress to end-stage renal disease (ESRD) with irreversible damage [3]. Patients with CKD are estimated to account for more than 10% of the world's population, and prevalence increases with age [4, 5]. In addition, researchers have found that both morbidity and mortality from CKD have risen dramatically over the past 30 years and that this upward trend is projected to continue through 2029 [6]. CKD is therefore considered a growing global public health problem.
Currently, about 296 million people worldwide are infected with hepatitis B virus (HBV), a leading cause of cirrhosis and liver cancer [7]. Beyond its effects on the liver, several studies have found that chronic HBV infection is associated with the pathogenesis of kidney diseases such as polyarteritis nodosa (PAN) and glomerulonephritis (GN) [8]. An increasing number of studies have recently examined the relationship between HBV infection and CKD, but the extent of the association remains controversial. A large U.S. cohort study found that HBV infection was associated with an increased risk of developing CKD and ESRD [9], whereas a cross-sectional study of a Chinese population found no direct relationship between HBV infection and the risk of developing CKD [10]. A recent meta-analysis showed that HBV infection is associated with an increased risk of CKD in the general adult population [11]. Recent publications on the relationship between HBV and CKD [12, 13, 14] provide an opportunity to reassess this association and may offer additional scientific evidence. Therefore, in this study, we assessed the association between HBV infection and the risk of CKD in the general adult population through a meta-analysis of observational studies.
Literature search strategy.
All relevant studies published up to March 20, 2023 were identified by searching seven English- and Chinese-language databases: China National Knowledge Infrastructure (CNKI), Wanfang, China Science and Technology Journal Database (VIP), PubMed, Web of Science, Embase, and the Cochrane Library. The search terms included “hepatitis B virus infection”, “chronic hepatitis B”, “HBV”, “chronic kidney disease”, “CKD”, and “chronic renal insufficiency”, and the search formulas were adapted to the requirements of each database. In addition, the reference lists of reviews and original articles were searched manually. Supplementary Material 1 details the specific search formula used for each database.
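The way such search terms are typically combined can be sketched as follows: synonyms for one concept are joined with OR, and the HBV and CKD concepts are joined with AND, so a record must mention both. This is a hypothetical simplification; the database-specific formulas actually used are given in Supplementary Material 1, and real formulas usually add field tags or controlled-vocabulary headings (for example, MeSH in PubMed), which are omitted here.

```python
# Hypothetical concept groups built from the search terms listed in the text.
HBV_TERMS = ["hepatitis B virus infection", "chronic hepatitis B", "HBV"]
CKD_TERMS = ["chronic kidney disease", "CKD", "chronic renal insufficiency"]


def or_block(terms: list[str]) -> str:
    """Join synonyms for one concept with OR, quoting each phrase."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*concepts: list[str]) -> str:
    """Combine concept blocks with AND: a record must match every concept."""
    return " AND ".join(or_block(c) for c in concepts)


query = build_query(HBV_TERMS, CKD_TERMS)
print(query)
```

Keeping each concept in its own OR block makes it easy to add synonyms per database without changing the overall AND structure.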
No language restrictions were applied to the included studies, but reviews, abstracts, letters, and articles without full text or valid data were excluded. When more than one study reported overlapping data, the most recent study was included. In addition, studies had to meet the following requirements: (a) the study design was a cohort, case-control, or cross-sectional study; (b) HBV infection was defined as detection of serum HBsAg and/or HBV DNA by PCR [15]; (c) the study outcome was the incidence or prevalence of CKD (glomerular filtration rate (GFR) < 60 mL/min/1.73 m² or albuminuria ≥ 30 mg/24 hours), ESRD, or a composite renal outcome due to CKD [1]; and (d) the study reported adjusted risk estimates or sufficient data to calculate them.
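Criteria (a) to (c) can be read as simple predicates on study- and participant-level data. The sketch below uses a hypothetical record layout (`Participant` and its field names are illustrative, not taken from the study) and checks only the thresholds stated above; it does not model the persistence requirement in the CKD definition, or the ESRD and composite renal outcomes.

```python
from dataclasses import dataclass
from typing import Optional

ELIGIBLE_DESIGNS = {"cohort", "case-control", "cross-sectional"}  # criterion (a)


@dataclass
class Participant:
    """Hypothetical participant-level record; field names are illustrative."""
    hbsag_positive: bool
    hbv_dna_pcr_positive: bool
    egfr: Optional[float] = None                # mL/min/1.73 m^2
    albuminuria_mg_24h: Optional[float] = None  # mg/24 h


def eligible_design(design: str) -> bool:
    """Criterion (a): cohort, case-control, or cross-sectional design."""
    return design.strip().lower() in ELIGIBLE_DESIGNS


def hbv_infected(p: Participant) -> bool:
    """Criterion (b): serum HBsAg and/or HBV DNA detected by PCR."""
    return p.hbsag_positive or p.hbv_dna_pcr_positive


def meets_ckd_threshold(p: Participant) -> bool:
    """Criterion (c), CKD part only: GFR < 60 or albuminuria >= 30 mg/24 h.

    Missing measurements count as not meeting that threshold; persistence
    over time, ESRD, and composite renal outcomes are not modeled.
    """
    low_gfr = p.egfr is not None and p.egfr < 60
    high_albumin = p.albuminuria_mg_24h is not None and p.albuminuria_mg_24h >= 30
    return low_gfr or high_albumin


example = Participant(hbsag_positive=True, hbv_dna_pcr_positive=False,
                      egfr=72.0, albuminuria_mg_24h=45.0)
print(eligible_design("Cohort"), hbv_infected(example), meets_ckd_threshold(example))
```

Note the boundary handling: a GFR of exactly 60 does not qualify, while albuminuria of exactly 30 mg/24 h does, matching the strict and non-strict inequalities in criterion (c).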
Two authors independently extracted information from the retrieved literature according to the inclusion and exclusion criteria, and disagreements were analyzed and resolved by a third researcher. The extracted information mainly included (a) the sample size of the study, (b) details of the study design, (c) patient characteristics, and (d) outcome indicators as defined above.
The quality of the 20 included case-control and cohort studies was assessed using the Newcastle-Ottawa Quality Scale (NOS) [ 16 ]. The scale covers three components: selection of study subjects, comparability between groups, and assessment of outcome or exposure. Points were awarded when the information reported in an article matched the description in the scale. Studies scoring below 4 were classified as low quality, those scoring 5–6 as moderate quality, and those scoring above 7 as high quality. The 11 included cross-sectional studies were assessed with the adapted NOS [ 17 ]. Scores of 6–10, 4–5, and 0–3 were rated as high, moderate, and low quality, respectively. Only studies rated as moderate or high quality were included in the meta-analysis.
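The adapted-NOS rating bands for the cross-sectional studies reduce to a small lookup. The sketch below encodes only the bands stated above (6–10 high, 4–5 moderate, 0–3 low) and the rule that only moderate- and high-quality studies enter the meta-analysis. It is illustrative: the function names and example scores are hypothetical.

```python
def rate_adapted_nos(score: int) -> str:
    """Map an adapted-NOS score (0-10) for a cross-sectional study to a quality band."""
    if not 0 <= score <= 10:
        raise ValueError(f"adapted NOS score must be in 0..10, got {score}")
    if score >= 6:
        return "high"
    if score >= 4:
        return "moderate"
    return "low"


def eligible_for_pooling(scores: dict[str, int]) -> list[str]:
    """Keep only studies rated moderate or high quality."""
    return [study for study, s in scores.items() if rate_adapted_nos(s) != "low"]


# Hypothetical adapted-NOS scores for three cross-sectional studies.
example = {"study_A": 7, "study_B": 5, "study_C": 3}
print(eligible_for_pooling(example))  # ['study_A', 'study_B']
```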
The meta-analysis was performed in Stata 17.0. Odds ratios ( OR ) or hazard ratios ( HR ) with their 95% confidence intervals ( CI ) were used as effect sizes. Heterogeneity between study results was assessed with the I² statistic and Cochran's Q test, and it was considered large when I² ⩾ 50% or P < 0.05. In that case, a random-effects model was used to calculate the pooled effect size; otherwise, a fixed-effects model was used. When heterogeneity was significant, meta-regression and subgroup analyses were used to explore its sources. Sensitivity analysis was performed by omitting one study at a time. Potential publication bias was assessed with Begg's test, Egger's test, and a funnel plot. All P-values were two-sided.
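The pooling and model-selection rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Stata analysis:

- Each study contributes a log OR or log HR, with a standard error back-calculated from its 95% CI.
- Studies are weighted by inverse variance.
- The between-study variance uses the DerSimonian-Laird estimator. This is a common default, assumed here because the software settings are not reported.
- The stated rule switches to random effects when I² ≥ 50% or the Q test gives P < 0.05.

The helper names `chi2_sf`, `log_and_se` and `pool` are hypothetical.

```python
import math
from dataclasses import dataclass


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail, erfc(sqrt(x/2)), plus (df - 1) / 2 correction terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def log_and_se(ratio: float, lo: float, hi: float, z: float = 1.959964) -> tuple[float, float]:
    """Convert a reported OR/HR and its 95% CI to (log ratio, standard error)."""
    return math.log(ratio), (math.log(hi) - math.log(lo)) / (2 * z)


@dataclass
class Pooled:
    model: str      # "fixed" or "random"
    ratio: float    # pooled OR/HR, back-transformed from the log scale
    ci_low: float
    ci_high: float
    q: float        # Cochran's Q
    q_p: float      # P value of the Q test
    i2: float       # I^2 in percent
    tau2: float     # DerSimonian-Laird between-study variance


def pool(log_ratios: list[float], ses: list[float], z: float = 1.959964) -> Pooled:
    """Inverse-variance pooling with the I^2 / Q-test switch to random effects."""
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, log_ratios)) / sw
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, log_ratios))
    df = len(log_ratios) - 1
    q_p = chi2_sf(q, df) if df > 0 else 1.0
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Decision rule from the methods: I^2 >= 50% or P < 0.05 -> random effects.
    if i2 >= 50.0 or q_p < 0.05:
        model, w = "random", [1.0 / (se ** 2 + tau2) for se in ses]
    else:
        model = "fixed"
    sw = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, log_ratios)) / sw
    se_mean = math.sqrt(1.0 / sw)
    return Pooled(model, math.exp(mean), math.exp(mean - z * se_mean),
                  math.exp(mean + z * se_mean), q, q_p, i2, tau2)
```

In use, each study's reported ratio and CI would first pass through `log_and_se`, and the resulting lists of log ratios and standard errors would be fed to `pool`.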
A search of the seven Chinese and English databases plus a manual search of reference lists identified 12,801 records, which were managed in EndNote. After screening against the inclusion and exclusion criteria, 31 studies were eligible [ 10 , 12 , 13 , 14 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 ], including three identified by manual searching. Of these, the studies by Hwang JC et al. [ 25 ] and Tartof SY et al. [ 37 ] had first been included in a systematic review and meta-analysis in 2019 [ 45 ], and the study by Chen YC et al. [ 19 ] in 2020 [ 11 ]. In total, 11 of the included articles were cross-sectional studies, 16 were cohort studies, and the remaining 4 were case-control studies. Figure 1 shows the screening process, and Table 1 shows the general characteristics of the included studies.
Flowchart of the selection of studies for inclusion in the meta-analysis
According to the NOS quality assessment of the included literature, a total of 13 case-control or cohort studies and 11 cross-sectional studies were considered to be of high quality, and the other 7 included cohort studies were of moderate quality. The proportion of high-quality studies is 77.4% (24/31). Details used to rate the quality of the studies are shown in Supplementary Tables 1 and 2 .
A random-effects model was used to pool the 31 included studies reporting on the association between HBV infection and CKD risk. As shown in Fig. 2 , HBV infection was significantly and positively associated with the risk of CKD (pooled OR , 1.20; 95% CI , 1.12–1.29). HBV-infected individuals therefore had a higher risk of developing CKD. Heterogeneity across studies was large ( I² = 85.7%, P < 0.001).
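The reported heterogeneity can be related to Cochran's Q through the usual definition of I². Assume I² was computed this way with k = 31 studies, giving k - 1 = 30 degrees of freedom. The reported I² = 85.7% then implies a Q statistic of roughly 210. This value is back-calculated for illustration only; the study does not report it.

```latex
I^2 = \max\!\left(0,\ \frac{Q - (k-1)}{Q}\right) \times 100\%
\quad\Longrightarrow\quad
Q = \frac{k-1}{1 - I^2} = \frac{30}{1 - 0.857} \approx 209.8
```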
Forest plot of association meta-analysis of HBV and CKD risk
To explore the sources of heterogeneity, meta-regression was performed on five study-level factors: study type, region, reference year, study outcome, and sample size. As shown in Table 2 , none of these factors explained the heterogeneity when all five were entered into the regression model simultaneously. Subgroup analyses by the same five factors (Supplementary Figs. 1 – 5 ) likewise did not reveal a source of heterogeneity.
As shown in Fig. 3 , a leave-one-out sensitivity analysis, in which each study was omitted in turn, was used to evaluate the influence of individual studies on the pooled OR . The results were stable: after excluding any single study, the pooled OR ranged from 1.17 (95% CI , 1.09–1.26) to 1.22 (95% CI , 1.13–1.31). For publication bias, Egger's regression test and Begg's test gave P values of 0.862 and 0.139, respectively. Together with the funnel plot (Fig. 4 ), these results indicated no evidence of publication bias.
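The leave-one-out procedure can be illustrated with a short sketch under three assumptions:

- Each study is summarized by a log ratio and its standard error.
- The pooled estimate is recomputed with a DerSimonian-Laird random-effects model. This matches the random-effects analysis used for the main result, although the estimator in the authors' Stata run is not stated.
- The function names are hypothetical.

The minimum and maximum of the returned list correspond to the range reported above.

```python
import math


def dl_random_effects(log_ratios: list[float], ses: list[float]) -> float:
    """DerSimonian-Laird random-effects pooled log ratio."""
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, log_ratios)) / sw
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, log_ratios))
    df = len(log_ratios) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    return sum(wi * yi for wi, yi in zip(w_star, log_ratios)) / sum(w_star)


def leave_one_out(log_ratios: list[float], ses: list[float]) -> list[float]:
    """Pooled OR after omitting each study in turn (one entry per omitted study)."""
    results = []
    for i in range(len(log_ratios)):
        ys = log_ratios[:i] + log_ratios[i + 1:]
        ss = ses[:i] + ses[i + 1:]
        results.append(math.exp(dl_random_effects(ys, ss)))
    return results
```

Reporting `min(...)` and `max(...)` of the returned list, together with each re-pooled CI, reproduces the kind of range given above.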
Sensitivity analysis of the association between HBV and CKD risk
Funnel plot of the association between HBV and CKD risk
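Egger's test, one of the two publication-bias tests reported above, regresses each study's standardized effect (log ratio divided by its standard error) on its precision (1/SE). An intercept far from zero suggests funnel-plot asymmetry. The sketch below is a plain ordinary-least-squares illustration of that textbook definition, not the authors' Stata output:

- It returns the intercept's t statistic, which would be compared with a t distribution on k - 2 degrees of freedom. The P value step is omitted to stay within the standard library.
- Begg's rank-correlation test is not shown.
- The function name is hypothetical.

```python
import math


def egger_test(log_ratios: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Egger's regression of y/se on 1/se by ordinary least squares.

    Returns (intercept, slope, t statistic of the intercept).
    Assumes the points are not perfectly collinear (nonzero residual variance).
    """
    k = len(log_ratios)
    if k < 3:
        raise ValueError("Egger's test needs at least three studies")
    x = [1.0 / se for se in ses]                       # precision
    t = [y / se for y, se in zip(log_ratios, ses)]     # standardized effect
    x_bar, t_bar = sum(x) / k, sum(t) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxt = sum((xi - x_bar) * (ti - t_bar) for xi, ti in zip(x, t))
    slope = sxt / sxx
    intercept = t_bar - slope * x_bar
    rss = sum((ti - intercept - slope * xi) ** 2 for xi, ti in zip(x, t))
    s2 = rss / (k - 2)                                 # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, slope, intercept / se_intercept
```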
A strong link between HBV infection and kidney disease has been recognized over the past few decades [ 46 , 47 ]. However, the relationship between HBV infection and CKD risk remains controversial. This study pooled the relevant existing studies in a meta-analysis of CKD risk among HBV-infected adults in the general population. People infected with HBV had a higher risk of developing CKD than uninfected people (pooled OR , 1.20; 95% CI , 1.12–1.29). In the sensitivity analysis, no single study substantially altered the result, and no publication bias was observed.
A growing number of studies have examined the relationship between HBV infection and CKD risk. Earlier meta-analyses found no significant association, with pooled estimates of 1.05 (95% CI 0.56–1.98) and 2.22 (95% CI 0.95–3.50), respectively [ 17 , 48 ]. A more recent meta-analysis by Fabrizi F et al. found that HBV infection increased the risk of CKD ( OR , 1.19; 95% CI 1.11–1.27) [ 11 ]. A recent case-control study in a Chinese population also found HBV infection to be associated with an increased risk of CKD ( OR , 2.099; 95% CI 1.128–3.907) [ 14 ]. Consistent with the recent meta-analysis, we found that HBV infection was associated with an increased risk of developing chronic kidney disease.
Substantial heterogeneity was present among the included studies ( I² = 85.7%, P < 0.001). Meta-regression and subgroup analyses showed that study type, region, reference year, study outcome, and sample size were not its sources. Although we included studies providing adjusted estimates ( HR / OR ), residual confounding may remain and cannot be excluded as a source of heterogeneity. Moreover, because complete covariate information was not reported across studies, we could not explore the sources of heterogeneity more comprehensively. For example, differences in the inclusion and exclusion criteria of the individual studies may account for part of the high heterogeneity.
The mechanisms underlying the association between HBV infection and CKD have not been fully elucidated. Nonetheless, a relationship between chronic HBV infection and kidney disease was reported more than fifty years ago [ 49 ]. The deposition of immune complexes in the kidney has been suggested to play a key role in the pathogenesis of HBV-related nephropathy [ 50 ]. This is likely because low-molecular-weight HBeAg (3 × 10⁵ Da) crosses the glomerular basement membrane and forms subepithelial immune deposits, leading to glomerular and tubulointerstitial damage and contributing to the decline in renal function [ 51 , 52 ]. Second, Deng et al. showed that excessive apoptosis of renal proximal tubular cells may also contribute to renal injury in patients with chronic HBV infection [ 53 ]. In addition, six nucleotide analogues (NAs) have been approved for the treatment of chronic HBV infection. However, all NAs are excreted renally and carry some risk of nephrotoxicity [ 54 ]. Dosing should therefore be adjusted according to the patient's overall clinical status to avoid renal impairment [ 55 ].
Our study has several strengths. First, it synthesizes several recently published large studies on the relationship between HBV infection and CKD risk, providing more reliable evidence. Second, the included studies came from Asia, Europe, and the Americas, giving broader international coverage. Overall, our results are similar to those of related articles recently published by other researchers.
Nevertheless, there are some limitations to this study. Firstly, the included studies contained a large proportion of case-control studies and cohort studies, which may be subject to selection bias and recall bias. Secondly, the inclusion of a large proportion of cross-sectional studies in this study made it difficult to establish a causal association between HBV infection and risk of CKD. Thirdly, our subgroup analysis could not explain the source of heterogeneity. In addition, although this study developed strict inclusion and exclusion criteria and used the NOS scale to assess the quality of the included articles during the screening process, there was still a degree of subjectivity in the assessment of the literature.
In conclusion, this study found that HBV infection was associated with a significantly increased risk of CKD. However, the present data cannot establish a causal relationship. More comprehensive prospective longitudinal studies are needed to further clarify the association between hepatitis B virus infection and the risk of developing chronic kidney disease.
Data sharing is not applicable to this paper as no datasets were generated or analyzed for this study.
Abbreviations.
CI: 95% confidence intervals
CNKI: Chinese National Knowledge Infrastructure
ESRD: End-stage renal disease
GFR: Glomerular filtration rate
GN: Glomerulonephritis
HR: Hazard ratios
NAs: Nucleotide analogues
NOS: Newcastle-Ottawa Quality Scale
PAN: Polyarteritis nodosa
Chen TK, Knicely DH, Grams ME. Chronic kidney disease diagnosis and management: a review. JAMA. 2019;322(13):1294–304.
Romagnani P, Remuzzi G, Glassock R, Levin A, Jager KJ, Tonelli M, et al. Chronic kidney disease. Nat Rev Dis Primers. 2017;3:17088.
Feng X, Hou N, Chen Z, Liu J, Li X, Sun X, et al. Secular trends of epidemiologic patterns of chronic kidney disease over three decades: an updated analysis of the global burden of disease study 2019. BMJ Open. 2023;13(3):e064540.
Liu W, Zhou L, Yin W, Wang J, Zuo X. Global, regional, and national burden of chronic kidney disease attributable to high sodium intake from 1990 to 2019. Front Nutr. 2023;10:1078371.
Hill NR, Fatoba ST, Oke JL, Hirst JA, O’Callaghan CA, Lasserson DS, et al. Global prevalence of chronic kidney disease - a systematic review and meta-analysis. PLoS ONE. 2016;11(7):e0158765.
Li Y, Ning Y, Shen B, Shi Y, Song N, Fang Y, et al. Temporal trends in prevalence and mortality for chronic kidney disease in China from 1990 to 2019: an analysis of the global burden of disease study 2019. Clin Kidney J. 2023;16(2):312–21.
Hsu YC, Huang DQ, Nguyen MH. Global burden of hepatitis B virus: current status, missed opportunities and a call for action. Nat Rev Gastroenterol Hepatol. 2023;20(8):524–37.
Cacoub P, Asselah T. Hepatitis B virus infection and extra-hepatic manifestations: a systemic disease. Am J Gastroenterol. 2022;117(2):253–63.
Geng XX, Tian Z, Liu Z, Chen XM, Xu KJ. Associations between hepatitis B infection and chronic kidney disease: 10-year results from the U.S. National Inpatient Sample. Enferm Infecc Microbiol Clin (Engl Ed). 2021;39(1):14–21.
Zhang H, Xu H, Wu R, Yu G, Sun H, Lv J, et al. Association of hepatitis C and B virus infection with CKD and impact of hepatitis C treatment on CKD. Sci Rep. 2019;9(1):1910.
Fabrizi F, Cerutti R, Donato FM, Messa P. HBV infection is a risk factor for chronic kidney disease: systematic review and meta-analysis. Rev Clin Esp. 2020;221(10):600–11.
Geng XX, Tian Z, Liu Z, Chen XM, Xu KJ. Associations between hepatitis B infection and chronic kidney disease: 10-year results from the U.S. National Inpatient Sample. Enferm Infecc Microbiol Clin. 2021;39(1):14–21.
Lin S, Wang M, Liu Y, Huang J, Wu Y, Zhu Y, et al. Concurrence of HBV infection and non-alcoholic fatty liver disease is associated with higher prevalence of chronic kidney disease. Clin Res Hepatol Gastroenterol. 2021;45(2):101483.
Liu Y, Wang X, Xu F, Li D, Yang H, Sun N, et al. Risk factors of chronic kidney disease in chronic hepatitis B: a hospital-based case-control study from China. J Clin Transl Hepatol. 2022;10(2):238–46.
Jeng WJ, Papatheodoridis GV, Lok ASF. Hepatitis B. Lancet. 2023;401(10381):1039–52.
Stang A. Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. Eur J Epidemiol. 2010;25(9):603–5.
Fabrizi F, Donato FM, Messa P. Association between hepatitis B virus and chronic kidney disease: a systematic review and meta-analysis. Ann Hepatol. 2017;16(1):21–47.
Cai J, Fan X, Mou L, Gao B, Liu X, Li J, et al. Association of reduced renal function with hepatitis B virus infection and elevated alanine aminotransferase. Clin J Am Soc Nephrol. 2012;7(10):1561–6.
Chen YC, Li CY, Tsai SJ, Chen YC. Nationwide cohort study suggests that nucleos(t)ide analogue therapy decreases dialysis risk in Taiwanese chronic kidney disease patients acquiring hepatitis B virus infection. World J Gastroenterol. 2018;24(8):917–28.
Chen YC, Su YC, Li CY, Hung SK. 13-year nationwide cohort study of chronic kidney disease risk among treatment-naïve patients with chronic hepatitis B in Taiwan. BMC Nephrol. 2015;16:110.
Du Y, Zhang S, Hu M, Wang Q, Liu N, Shen H, et al. Association between hepatitis B virus infection and chronic kidney disease: a cross-sectional study from 3 million population aged 20 to 49 years in rural China. Med (United States). 2019;98(5):e14262.
Du Y, Zhang S, Hu M, Wang Q, Shen H, Zhang Y, et al. Prevalence of chronic kidney disease markers: evidence from a three-million married population with fertility desire in rural China. Sci Rep. 2017;7(1):2710.
Fang J, Li W, Tan M, Peng X, Tan Z, Wang W. Effect of different hepatitis B infection status on the prognosis of active lupus nephritis treated with immunosuppression: a retrospective analysis of 177 patients. Int J Rheum Dis. 2018;21(5):1060–7.
Hong YS, Ryu S, Chang Y, Caínzos-Achirica M, Kwon MJ, Zhao D, et al. Hepatitis B virus infection and development of chronic kidney disease: a cohort study. BMC Nephrol. 2018;19(1):353.
Hwang JC, Jiang MY, Lu YH, Weng SF. Impact of HCV infection on diabetes patients for the risk of end-stage renal failure. Med (Baltim). 2016;95(3):e2431.
Kim SE, Jang ES, Ki M, Gwak GY, Kim KA, Kim GA, et al. Chronic hepatitis B infection is significantly associated with chronic kidney disease: a population-based, matched case-control study. J Korean Med Sci. 2018;33(42):e264.
Kong XL, Ma XJ, Su H, Xu DM. Relationship between occult hepatitis B virus infection and chronic kidney disease in a Chinese population-based cohort. Chronic Dis Translational Med. 2016;2(1):55–60.
Lai T-S, Lee M-H, Yang H-I, You S-L, Lu S-N, Wang L-Y, et al. High hepatitis C viral load and genotype 2 are strong predictors of chronic kidney disease. Kidney Int. 2017;92(3):703–9.
Lee JJ, Lin MY, Chang JS, Hung CC, Chang JM, Chen HC, et al. Hepatitis C virus infection increases risk of developing end-stage renal disease using competing risk analysis. PLoS ONE. 2014;9(6):e100790.
Lee JJ, Lin MY, Yang YH, Lu SN, Chen HC, Hwang SJ. Association of hepatitis C and B virus infection with CKD in an endemic area in Taiwan: a cross-sectional study. Am J Kidney Dis. 2010;56(1):23–31.
Lin MY, Chiu YW, Lee CH, Yu HY, Chen HC, Wu MT, et al. Factors associated with CKD in the elderly and nonelderly population. Clin J Am Soc Nephrol. 2013;8(1):33–40.
Mocroft A, Neuhaus J, Peters L, Ryom L, Bickel M, Grint D, et al. Hepatitis B and C co-infection are independent predictors of progressive kidney disease in HIV-positive, antiretroviral-treated adults. PLoS ONE. 2012;7(7):e40245.
Nguyen MH, Lim JK, Burak Ozbay A, Fraysse J, Liou I, Meyer N, et al. Advancing age and comorbidity in a US insured population-based cohort of patients with chronic hepatitis B. Hepatology. 2019;69(3):959–73.
Senghore T, Su FH, Lin YS, Chu FY, Yeh CC. Association between hepatitis B virus infection and chronic kidney disease in university students receiving physical check-ups: a cross-sectional study. J Experimental Clin Medicine (Taiwan). 2013;5(5):181–6.
Si J, Yu C, Guo Y, Bian Z, Qin C, Yang L, et al. Chronic hepatitis B virus infection and risk of chronic kidney disease: a population-based prospective cohort study of 0.5 million Chinese adults. BMC Med. 2018;16(1):93.
Su SL, Lin C, Kao S, Wu CC, Lu KC, Lai CH, et al. Risk factors and their interaction on chronic kidney disease: a multi-centre case control study in Taiwan. BMC Nephrol. 2015;16:83.
Tartof SY, Hsu JW, Wei R, Rubenstein KB, Hu H, Arduino JM, et al. Kidney function decline in patients with CKD and untreated hepatitis C infection. Clin J Am Soc Nephrol. 2018;13(10):1471–8.
Vu V, Trinh S, Le A, Johnson T, Hoang J, Jeong D, et al. Hepatitis B and renal function: a matched study comparing non-hepatitis B, untreated, treated and cirrhotic hepatitis patients. Liver Int. 2019;39(4):655–66.
Zeng Q, Gong Y, Dong S, Xiang H, Wu Q. Association between exposure to hepatitis B virus and chronic kidney disease in China. J Int Med Res. 2014;42(5):1178–84.
Cheng AY, Kong AP, Wong VW, So WY, Chan HL, Ho CS, et al. Chronic hepatitis B viral infection independently predicts renal outcome in type 2 diabetic patients. Diabetologia. 2006;49(8):1777–84.
Huang JF, Chuang WL, Dai CY, Ho CK, Hwang SJ, Chen SC, et al. Viral hepatitis and proteinuria in an area endemic for hepatitis B and C infections: another chain of link? J Intern Med. 2006;260(3):255–62.
Ishizaka N, Ishizaka Y, Seki G, Nagai R, Yamakado M, Koike K. Association between hepatitis B/C viral infection, chronic kidney disease and insulin resistance in individuals undergoing general health screening. Hepatol Res. 2008;38(8):775–83.
Lo MK, Lee KF, Chan NN, Leung WY, Ko GT, Chan WB, et al. Effects of gender, helicobacter pylori and hepatitis B virus serology status on cardiovascular and renal complications in Chinese type 2 diabetic patients with overt nephropathy. Diabetes Obes Metab. 2004;6(3):223–30.
Zhang L, Zhang P, Wang F, Zuo L, Zhou Y, Shi Y, et al. Prevalence and factors associated with CKD: a population study from Beijing. Am J Kidney Dis. 2008;51(3):373–84.
Fabrizi F, Cerutti R, Ridruejo E. Hepatitis B virus infection as a risk factor for chronic kidney disease. Expert Rev Clin Pharmacol. 2019;12(9):867–74.
Baig S, Alamgir M. The extrahepatic manifestations of hepatitis B virus. J Coll Physicians Surg Pak. 2008;18(7):451–7.
Lhotta K. Beyond hepatorenal syndrome: glomerulonephritis in patients with liver disease. Semin Nephrol. 2002;22(4):302–8.
Cai QC, Zhao SQ, Shi TD, Ren H. Relationship between hepatitis B virus infection and chronic kidney disease in Asian populations: a meta-analysis. Ren Fail. 2016;38(10):1581–8.
Combes B, Shorey J, Barrera A, Stastny P, Eigenbrodt EH, Hull AR, et al. Glomerulonephritis with deposition of Australia antigen-antibody complexes in glomerular basement membrane. Lancet. 1971;2(7718):234–7.
Ren J, Wang L, Chen Z, Ma ZM, Zhu HG, Yang DL, et al. Gene expression profile of transgenic mouse kidney reveals pathogenesis of hepatitis B virus associated nephropathy. J Med Virol. 2006;78(5):551–60.
Shah AS, Amarapurkar DN. Spectrum of hepatitis B and renal involvement. Liver Int. 2018;38(1):23–32.
Chan TM. Hepatitis B and renal disease. Curr Hepat Rep. 2010;9(2):99–105.
Deng CL, Song XW, Liang HJ, Feng C, Sheng YJ, Wang MY. Chronic hepatitis B serum promotes apoptotic damage in human renal tubular cells. World J Gastroenterol. 2006;12(11):1752–6.
Liaw YF, Raptopoulou-Gigi M, Cheinquer H, Sarin SK, Tanwandee T, Leung N, et al. Efficacy and safety of entecavir versus adefovir in chronic hepatitis B patients with hepatic decompensation: a randomized, open-label study. Hepatology. 2011;54(1):91–100.
Pipili C, Cholongitas E, Papatheodoridis G. Review article: nucleos(t)ide analogues in patients with chronic hepatitis B virus infection and chronic kidney disease. Aliment Pharmacol Ther. 2014;39(1):35–46.
The authors would like to express their gratitude to all participants for their cooperation.
This work was supported by the Natural Science Foundation of Fujian Province (No. 2020J01607) and Natural Science Foundation of Fujian Province (No. 2023J01628).
Danjing Chen, Rong Yu and Shuo Yin contributed equally to this work.
Department of Epidemiology and Health Statistics, Fujian Provincial Key Laboratory of Environment Factors and Cancer, School of Public Health, Fujian Medical University, Fuzhou, 350122, People’s Republic of China
Danjing Chen, Rong Yu, Shuo Yin, Wenxin Qiu, Jiangwang Fang & Xian-e Peng
Department of Epidemiology and Health Statistics, Key Laboratory of Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Ministry of Education, Fujian Medical University, Xuefu North Road 1st, Shangjie Town, Minhou County, Fuzhou, Fujian, 350108, China
Xian-e Peng
Study concept and design: PXE; Collection and assembly of data: CDJ, YR, YS and QWX; Data analysis and interpretation: CDJ, YR, YS and FJW; Manuscript writing and review: CDJ, YR, YS and PXE. All authors read and approved the final manuscript.
Correspondence to Xian-e Peng .
This article does not contain any research conducted by the authors on human participants or animals.
Not applicable.
The authors declare no competing interests.
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Cite this article.
Chen, D., Yu, R., Yin, S. et al. Hepatitis B virus infection as a risk factor for chronic kidney disease: a systematic review and meta-analysis. BMC Infect Dis 24 , 620 (2024). https://doi.org/10.1186/s12879-024-09546-z
Received : 24 August 2023
Accepted : 20 June 2024
Published : 22 June 2024
DOI : https://doi.org/10.1186/s12879-024-09546-z
ISSN: 1471-2334
BMC Medical Informatics and Decision Making volume 24, Article number: 178 (2024)
This study aimed to develop and validate a quantitative index system for evaluating the data quality of Electronic Medical Records (EMR) in disease risk prediction using Machine Learning (ML).
The index system was developed in four steps: (1) a preliminary index system was outlined based on literature review; (2) we utilized the Delphi method to structure the indicators at all levels; (3) the weights of these indicators were determined using the Analytic Hierarchy Process (AHP) method; and (4) the developed index system was empirically validated using real-world EMR data in a ML-based disease risk prediction task.
The synthesis of review findings and the expert consultations led to the formulation of a three-level index system with four first-level, 11 second-level, and 33 third-level indicators. The weights of these indicators were obtained through the AHP method. Results from the empirical analysis illustrated a positive relationship between the scores assigned by the proposed index system and the predictive performances of the datasets.
The proposed index system for evaluating EMR data quality is grounded in extensive literature analysis and expert consultation. Moreover, the system's high reliability and suitability have been affirmed through empirical validation.
The novel index system offers a robust framework for assessing the quality and suitability of EMR data in ML-based disease risk predictions. It can serve as a guide in building EMR databases, improving EMR data quality control, and generating reliable real-world evidence.
The onset of the digital health era has led to a paradigm shift in health management, from reactive treatment to proactive prevention [ 1 ]. Intelligent disease risk prediction has become a vital strategy in proactive health management, aiming to identify potential risk factors and prevent the progression of diseases. By harnessing Artificial Intelligence (AI) technologies and Machine Learning (ML) approaches, healthcare professionals can gain valuable insights into diseases and develop more effective preventive treatment plans [ 2 , 3 ].
Johnson [ 4 ] applied four different ML-based models to predict subsequent deaths or cardiovascular events in a cohort of 6,892 patients. The study found that the ML-based model had superior discrimination ability compared to traditional coronary Computed Tomography (CT) scores in identifying patients at risk of adverse cardiovascular events. Electronic medical records (EMR) data, as a valuable real-world data source, plays a critical role in disease risk prediction using ML techniques [ 4 ]. An EMR refers to a digital version of a patient’s medical record, encompassing medical history, medications, test results, and other relevant information [ 5 , 6 ]. Healthcare providers commonly utilize EMRs to document and track patient information, enabling comprehensive decision-making regarding patient care. Furthermore, clinical researchers can leverage de-identified EMR data to study disease patterns, develop novel treatments, and advance medical knowledge. The integration of ML with EMRs has recently shown significant improvements in predicting patient outcomes, such as identifying individuals with suspected coronary artery disease [ 7 ] or forecasting the likelihood of open-heart surgery [ 8 ]. These advancements highlight the potential of ML in enhancing the efficiency of clinical decision-making [ 9 ].
Nevertheless, several studies have raised concerns about the quality of EMRs in clinical research, emphasizing issues such as lack of data standardization, incomplete or missing clinical data, and discrepancies in data types and element representations [ 10 , 11 ]. Ensuring the quality of EMR data is crucial, as it forms the bedrock for effective utilization of EMRs. High-quality EMR data not only supplies robust evidence, but also accelerates the clinical research process, shortens its timeline, and reduces associated risks. Therefore, controlling and evaluating EMR data quality are pivotal in upholding the overall quality and integrity of clinical research.
The body of literature evaluating EMR data quality in clinical research is growing [ 12 , 13 ]. However, published clinical studies that apply ML techniques to EMR data frequently overlook data quality or use assessment methods lacking expert knowledge or evidential support. Although methods for data quality evaluation have been described in the informatics literature, researchers without specialized knowledge in this field may find it difficult to choose an evaluation method suited to the available data and research question [ 14 ]. Furthermore, existing quality assessment frameworks rely primarily on qualitative approaches, making objective measurement of quality and suitability challenging.
In this paper, we aim to develop and validate a quantitative index system for evaluating the quality of EMR data in disease risk prediction using ML. The proposed index system is intended to guide the use of EMR data in research, improve the quality of EMR data within a Hospital Information System (HIS), and facilitate clinical decision-making research based on EMR data. With the proposed index system, researchers and healthcare professionals can make informed decisions about using EMR data for ML-based disease prediction research, ultimately improving patient care and advancing medical knowledge.
In this paper, we present the development of a quantitative index system, depicted in Fig. 1 , designed to support quality control of EMR data in disease prediction models. The development process combined the Delphi method with the analytic hierarchy process (AHP). In addition, an empirical study was undertaken to validate the effectiveness of the index system using real-world EMR data in intelligent disease risk prediction.
Workflow of the study
Preliminary indicator identification, definition, and organization.
The initial set of indicators was determined through a comprehensive review of studies published before September 27, 2021, retrieved from the PubMed database. The search query “(machine learning) AND (electronic medical records) AND (disease prediction)” returned 549 papers. Studies were included if the research data were related to EMR or HIS and disease risk was predicted using ML techniques. Review articles and papers of low relevance were excluded; after abstract screening, 225 papers meeting the exclusion criteria were removed.
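The literature search can be rerun programmatically through NCBI's E-utilities `esearch` endpoint. The sketch below only builds the request URL; `fetch_count` performs the network call. Its assumptions are:

- Publication date (`datetype=pdat`) is used as the date field.
- 1900/01/01 is an arbitrary lower bound, because E-utilities requires `mindate` and `maxdate` together.
- 2021/09/26 stands in for "before September 27, 2021".

Because PubMed indexing changes over time and the authors' date field is not reported, a rerun may not return exactly 549 records.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, mindate: str, maxdate: str) -> str:
    """Build an esearch URL restricted by publication date (dates as YYYY/MM/DD)."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",   # publication date (assumed date field)
        "mindate": mindate,   # E-utilities needs mindate and maxdate together
        "maxdate": maxdate,
        "retmode": "json",
        "retmax": 0,          # only the hit count is needed
    }
    return f"{ESEARCH}?{urlencode(params)}"


def fetch_count(url: str) -> int:
    """Return the number of matching PubMed records (requires network access)."""
    with urlopen(url, timeout=30) as resp:
        return int(json.load(resp)["esearchresult"]["count"])


QUERY = "(machine learning) AND (electronic medical records) AND (disease prediction)"
url = build_esearch_url(QUERY, "1900/01/01", "2021/09/26")
```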
Further screening was conducted by reading the full papers to eliminate studies that did not involve EMR or HIS data or utilized disease prediction methods other than machine learning. Additionally, 18 relevant papers were included by examining the reference lists of the selected studies. Ultimately, a total of 229 papers were retained for the development of the preliminary index system. The detailed process of paper screening is illustrated in Fig. 2 .
Flowchart of paper screening
Upon analyzing the review results, we formulated an initial multi-level index system consisting of four first-level, 11 second-level, and 33 third-level indicators. The first-level indicators represent broad dimensions of data quality, while the second-level indicators correspond to the general dimensions specifically for EMR data quality. The third-level indicators capture specific dimensions relevant to EMR-based disease prediction models.
We used the AHP method to determine the weights of the first- and second-level indicators in the three-level index system. The weights of the third-level indicators were calculated using percentages or binary values according to their definitions. The calculation formulas of these third-level indicators were then assessed in the subsequent Delphi consultation.
Questionnaire compilation and expert consultation.
We conducted a Delphi consultation to gather expert feedback on the preliminary index system. The consultation questionnaire, provided in Additional file 1, consists of four parts: the experts' basic information (see Table S1), their familiarity with and judgment basis for AI-based disease prediction (see Tables S2–S3), evaluation tables for the preliminary index system (see Tables S4–S6), and an evaluation table for the calculation formulas of the third-level indicators (see Table S7). The importance of each preliminary indicator was rated on a 5-point Likert scale ranging from “very unimportant” to “very important”. To allow the preliminary index system to be extended, three additional options were provided: delete, modify, and add new indicator(s). For the calculation formulas, experts were asked to answer yes or no and, if the answer was no, to suggest a modification.
A total of twenty experts specializing in healthcare/EMR data governance and medical AI were selected for the Delphi consultation. The inclusion criteria were as follows: (1) holding a Ph.D. degree or being a senior technical associate; (2) having more than two years of research experience in related fields; (3) being familiar with the construction and evaluation of EMR data; and (4) being able to give feedback in a timely manner. We conducted a single round of consultation because our panel was relatively small and homogeneous [ 15 ].
To assess the consistency and reliability of the questionnaire feedback, we calculated four metrics: the experts' positive coefficient, the expert authority coefficient (Cr), the coefficient of variation (CV), and Kendall's coefficient of concordance. The experts' positive coefficient was determined from the questionnaire response rate; a response rate of 70% or higher is considered satisfactory [ 16 ]. Cr was calculated as the average of the familiarity coefficient (Cs) and the judgment coefficient (Ca) and reflects the reliability of the expert consultation; a Cr value of 0.7 or above is considered acceptable. The CV, the standard deviation of the experts' ratings for an indicator divided by their mean, measures how closely experts agree on that indicator; a CV below 0.25 is expected and indicates a high level of consistency [ 17 ]. Kendall's coefficient of concordance evaluates the overall agreement among experts across all indicators in the system. It ranges from zero to one, and a value greater than 0.2 is considered acceptable [ 18 ]. All statistical analyses were performed using Microsoft Excel and IBM SPSS 25.0.
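The four consultation metrics can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' Excel/SPSS workflow: it assumes the CV uses the sample standard deviation, that ratings are arranged as one row per expert and one column per indicator, and that Kendall's W applies the standard correction for tied ratings (ties are common on a 5-point Likert scale). All function names are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev
from typing import List, Sequence


def positive_coefficient(returned: int, distributed: int) -> float:
    """Experts' positive coefficient, taken here as the questionnaire response rate."""
    return returned / distributed


def authority_coefficient(cs: float, ca: float) -> float:
    """Expert authority coefficient Cr: mean of familiarity (Cs) and judgment (Ca)."""
    return (cs + ca) / 2


def coefficient_of_variation(ratings: Sequence[float]) -> float:
    """CV of one indicator's ratings: sample standard deviation divided by the mean."""
    return stdev(ratings) / mean(ratings)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank one expert's ratings 1..n; tied ratings share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def kendalls_w(ratings: Sequence[Sequence[float]]) -> float:
    """Kendall's W with tie correction; rows are experts, columns are indicators."""
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    expected = m * (n + 1) / 2  # mean of the indicators' rank sums
    s = sum((r - expected) ** 2 for r in rank_sums)
    ties = sum(t ** 3 - t for row in ratings for t in Counter(row).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


# Values reported in the Results: 17 of 20 experts responded; Cs = 0.88, Ca = 0.90.
print(round(positive_coefficient(17, 20), 3))
print(round(authority_coefficient(0.88, 0.90), 2))
```

W equals 1 when every expert ranks the indicators identically and 0 when the rank sums are all equal; the thresholds quoted above (response rate of at least 70%, Cr of at least 0.7, CV below 0.25, W above 0.2) would then be applied to these outputs.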
We applied the AHP method, a well-known technique in multiple-criteria decision-making [ 19 ], to determine the weights of indicators at each level. AHP enables the quantification of criteria and opinions that are difficult to measure numerically, and its use of pairwise comparisons and eigenvector-based weighting reduces the influence of subjective bias on its outcomes [ 20 ].
In this study, the AHP procedure consisted of three steps. First, we obtained the experts' importance ratings for each indicator. We then averaged these ratings for each indicator and performed pairwise comparisons among indicators at the same level that belong to the same upper-level indicator, using the ratios of their mean ratings to construct a judgment matrix for each such group.
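The first step can be sketched as follows, assuming, as the text describes, that each pairwise comparison is the ratio of two indicators' mean ratings. The indicator names and ratings are hypothetical placeholders, not the study's data; other AHP variants instead map rating differences onto Saaty's 1–9 scale.

```python
from statistics import mean
from typing import Dict, List, Sequence


def mean_ratings(ratings: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Average the experts' Likert ratings for each indicator."""
    return {name: mean(scores) for name, scores in ratings.items()}


def judgment_matrix(means: Sequence[float]) -> List[List[float]]:
    """Pairwise comparison matrix with a_ij = mean_i / mean_j (reciprocal by construction)."""
    return [[m_i / m_j for m_j in means] for m_i in means]


# Hypothetical ratings from four experts for three sibling indicators.
ratings = {
    "indicator_A": [5, 5, 4, 5],
    "indicator_B": [4, 4, 3, 4],
    "indicator_C": [3, 4, 3, 3],
}
means = mean_ratings(ratings)
matrix = judgment_matrix(list(means.values()))
for name, row in zip(means, matrix):
    print(name, [round(a, 3) for a in row])
```

Because every entry is a ratio of the same set of means, the resulting matrix has ones on the diagonal and satisfies a_ij × a_ji = 1.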
Second, we derived the priority (eigen)vector of each judgment matrix by normalizing the matrix; a larger vector component for an indicator represents a higher relative importance. The relative weights of indicators at the same level were determined by standardizing these vectors. For the first-level indicators, the relative weights were equal to the absolute weights. For the second- and third-level indicators, the absolute weights were calculated by multiplying their relative weights by the absolute weight of their parent indicator at the upper level.
Third, we performed a consistency test using the consistency ratio (CR) to evaluate the consistency of each judgment matrix. A CR below 0.1 indicates that a judgment matrix is acceptably consistent and that the resulting weights can be considered valid [ 21 ]. The steps of the AHP method are illustrated in Fig. 3 .
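The second step can be sketched with the common column-normalization approximation of the principal eigenvector: normalize each column of the judgment matrix, then average each row. That the authors used exactly this approximation is an assumption; the matrix and the parent weight of 0.3 are hypothetical.

```python
from typing import List, Sequence


def priority_vector(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Approximate the principal eigenvector: normalize each column, then average each row."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]


def absolute_weights(relative: Sequence[float], parent_weight: float) -> List[float]:
    """Absolute weight of each child = its relative weight x the parent's absolute weight."""
    return [w * parent_weight for w in relative]


# Hypothetical ratio-based judgment matrix built from mean ratings 4, 2 and 2.
mean_scores = [4.0, 2.0, 2.0]
matrix = [[a / b for b in mean_scores] for a in mean_scores]
relative = priority_vector(matrix)
print(relative)
print(absolute_weights(relative, parent_weight=0.3))  # hypothetical parent weight
```

The returned vector already sums to 1, so it serves directly as the standardized relative weights. For a perfectly consistent ratio matrix like this one, the approximation coincides with the exact principal eigenvector, which is simply proportional to the mean ratings.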
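The consistency test can be sketched as below. It assumes that λmax is estimated from the approximate priority vector and uses Saaty's commonly tabulated random index (RI) values; CI = (λmax − n)/(n − 1) and CR = CI/RI. Both example matrices are hypothetical.

```python
from typing import List, Sequence

# Saaty's random consistency index (RI) for matrix orders 1-10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def priority_vector(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Approximate principal eigenvector via column normalization and row averaging."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]


def consistency_ratio(matrix: Sequence[Sequence[float]]) -> float:
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    w = priority_vector(matrix)
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


consistent = [[a / b for b in (4.0, 2.0, 2.0)] for a in (4.0, 2.0, 2.0)]
cyclic = [[1, 3, 1 / 3], [1 / 3, 1, 3], [3, 1 / 3, 1]]  # A > B, B > C, yet C > A
for name, m in (("consistent", consistent), ("cyclic", cyclic)):
    cr = consistency_ratio(m)
    print(name, round(cr, 3), "acceptable" if cr < 0.1 else "revise")
```

A matrix built from ratios of mean ratings is consistent by construction (CR ≈ 0), whereas circular preferences such as the `cyclic` example exceed the 0.1 threshold.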
Flowchart of the AHP method
To further validate the suitability of the proposed index system, an empirical study was conducted using real-world EMR data for disease risk prediction.
To ensure a fair assessment, we generated multiple datasets from a single EMR data resource. The chosen data resource needed to be large-scale, open-access, and regularly updated. Once the data resource was identified, we constructed several datasets covering different patient populations while keeping the same set of attributes.
For each dataset, we computed the scores of 33 third-level indicators using the established calculation formulas. The weights of the proposed index system were applied to obtain weighted scores for all indicators within each dataset. The overall score of a dataset was subsequently computed by summing the scores of the first-level indicators.
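The aggregation described above can be sketched as follows. The nested structure mirrors the three-level hierarchy, but the indicator names, absolute weights, and raw scores are hypothetical placeholders rather than the values in Table 1. Raw third-level scores are assumed to lie between 0 and 1, for example a completeness percentage or a 0/1 check.

```python
from typing import Dict

# Hypothetical fragment of the hierarchy: first level -> second level ->
# {third-level indicator: absolute weight}. Names and weights are placeholders.
WEIGHTS: Dict[str, Dict[str, Dict[str, float]]] = {
    "first_level_1": {
        "second_level_1a": {"third_1a_i": 0.20, "third_1a_ii": 0.10},
        "second_level_1b": {"third_1b_i": 0.15},
    },
    "first_level_2": {
        "second_level_2a": {"third_2a_i": 0.30, "third_2a_ii": 0.25},
    },
}


def first_level_scores(weights, raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Sum weight x raw score over all third-level indicators under each first-level indicator."""
    return {
        first: sum(
            w * raw_scores[third]
            for thirds in seconds.values()
            for third, w in thirds.items()
        )
        for first, seconds in weights.items()
    }


def overall_score(scores: Dict[str, float]) -> float:
    """Overall dataset score = sum of the first-level indicator scores."""
    return sum(scores.values())


# Hypothetical raw scores for one dataset.
raw = {"third_1a_i": 0.95, "third_1a_ii": 1.0, "third_1b_i": 0.8,
       "third_2a_i": 1.0, "third_2a_ii": 0.9}
scores = first_level_scores(WEIGHTS, raw)
print(scores, round(overall_score(scores), 3))
```

Because the absolute weights sum to 1, a dataset that scores 1 on every third-level indicator reaches the maximum overall score of 1.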
In the context of disease risk prediction, we considered three widely used ML models: logistic regression (LR), support vector machine (SVM), and random forest (RF). LR is a traditional classification algorithm used to estimate the probability of an event occurring [ 22 ]. SVM finds a maximum-margin decision boundary and, with a nonlinear kernel function, implicitly maps input data into a higher-dimensional space, which makes it effective for complex relationships and nonlinear patterns [ 23 ]. RF is an ensemble method that combines the predictions of multiple decision trees; it has shown great success in disease risk prediction tasks by reducing overfitting and improving predictive accuracy [ 24 ]. For our analysis, we used the scikit-learn Python library [ 25 ] to implement LR, SVM, and RF.
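In our study, we conducted a reliability analysis to examine the relationship between the scores of our constructed datasets and the performance of the predictive models. We applied Pearson correlation to assess linear relationships [ 26 ] and Spearman rank correlation to assess monotonic, possibly nonlinear, relationships [ 27 ]. The Pearson correlation coefficient was calculated using the formula:
\[ r = \frac{\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(y_{i}-\overline{y}\right)^{2}}} \]
where n is the number of paired observations.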
Here, \(x_{i}\) and \(y_{i}\) are the i-th paired observations of the two variables (a dataset's score and the corresponding model performance), while \(\overline{x}\) and \(\overline{y}\) denote their means. A Pearson correlation coefficient near 1 or -1 indicates a strong linear relationship between dataset scores and model performance, whereas a value close to 0 suggests a very weak linear relationship.
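The Pearson coefficient can be computed directly from its definition, as in the sketch below. The scores and AUC values in the usage lines are made up for illustration and are not the study's results; in practice a statistics package would also report the p-value.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired observations x_i and y_i."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / (sqrt(s_xx) * sqrt(s_yy))


# Illustrative (made-up) dataset scores and matching AUC values.
scores = [0.91, 0.93, 0.94, 0.95, 0.97]
aucs = [0.80, 0.82, 0.85, 0.84, 0.90]
print(round(pearson_r(scores, aucs), 3))
```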
Similarly, the Spearman correlation coefficient was calculated using the formula:
\[ \rho = 1-\frac{6\sum_{i=1}^{n} d_{i}^{2}}{n\left(n^{2}-1\right)} \]
Here, \(d_{i}\) is the difference between the ranks of the i-th observation on the two variables, and n is the total number of observations; this closed form assumes there are no tied ranks. A Spearman correlation coefficient near 1 or -1 indicates a strong monotonic (not necessarily linear) relationship, while a value close to 0 suggests a very weak monotonic relationship.
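The closed-form Spearman coefficient can be sketched as follows. The sketch deliberately rejects tied values, because the formula is exact only without ties. The usage lines show why Spearman captures monotonic rather than strictly linear trends: y = x² gives ρ = 1, whereas Pearson's r would be below 1 for the same data.

```python
from typing import List, Sequence


def ranks_without_ties(values: Sequence[float]) -> List[int]:
    """1-based ranks; the closed-form Spearman formula assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: compute Pearson's r on average ranks instead")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)), with d_i the rank difference of pair i."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    rx, ry = ranks_without_ties(x), ranks_without_ties(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 2 for v in x]))  # monotonic but nonlinear relationship
print(spearman_rho(x, [5, 4, 3, 2, 1]))      # perfectly reversed ordering
```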
In both analyses, a p-value below 0.05 was considered statistically significant. A significant correlation between the scores of our constructed datasets and the performance of the predictive models supports the reliability of the proposed index system in evaluating the data quality of EMR for intelligent disease risk prediction.
In the Delphi consultation, a total of twenty experts were invited, of which 17 actively participated, yielding a response rate of 85.0%. Out of the 17 experts, 16 provided feedback that met the credibility criteria for a Delphi study, resulting in an effective response rate of 94.1%. These response rates reflect a high degree of expert engagement.
Most of the participating experts were male, held Ph.D. degrees, and specialized in medical informatics or medical AI. Over half of the experts were aged between 40 and 50 years, and 62.5% had between 10 and 20 years of work experience. Moreover, 68.7% of the experts occupied senior associate positions or higher. For detailed information, see Table S8 in Additional file 3 .
The degree of expert authority (Cr) is defined by two factors: the expert's familiarity with the consultation content (Cs) and the basis of expert judgment (Ca). Of the 16 participating experts, 7 were found to be very familiar with the content, while 9 were relatively familiar. This indicates an overall sound understanding of the field among the experts. Only two experts exhibited a low judgment basis, suggesting that the majority of the experts were well-equipped to offer informed judgment. Details of expert familiarity and judgment basis can be found in Table S9 in Additional file 3 .
Cr was calculated to be 0.89, with Cs and Ca values of 0.88 and 0.90, respectively. These values indicate a high level of expert authority and reliability in the consultation results. The CV values for the first-level indicators were less than 0.16, for the second-level indicators were less than 0.20, and for the third-level indicators were no more than 0.25. These low CV values indicate a high level of consistency among experts' scores for the preliminary indicators at each level. Kendall's coefficients of concordance for all three levels were greater than 0.30, indicating a substantial level of agreement among the experts. Additionally, the p-values for the preliminary second- and third-level indicators were very small, further confirming the consistency of experts' scores for each preliminary indicator. Overall, the results demonstrate a high level of consistency and reliability in the experts' assessments for each preliminary indicator.
Experts' comments focused on changes to the definition of indicators. After further discussions with experts, all preliminary indicators were included in the final weighted three-level index system, as shown in Table 1 . No new indicators were added to the system. The index system comprises four first-level indicators, 11 second-level indicators, and 33 third-level indicators, with the weights determined using the AHP method and percentages.
The first-level indicators represent a series of data quality characteristics that determine the suitability of EMR data for disease risk intelligent prediction research. The second-level indicators give a concrete representation of the first-level indicators, making their scope and evaluation easier for users to understand. The third-level indicators further specify the second-level indicators, setting clear quality requirements at different levels of granularity in the EMR dataset, such as data records, data elements, and data element values, so that users can see clearly what is evaluated and how. For detailed information on the indicators, please see Additional file 2 .
In this empirical study, the MIMIC-III clinical database was chosen as the representative real-world EMR data resource. MIMIC-III (Footnote 1) is a large, freely accessible database containing comprehensive health-related data from more than 46,000 patients admitted to intensive care units (ICUs) at the Beth Israel Deaconess Medical Center between 2001 and 2012 [ 28 ]. For this study, we used MIMIC-III v1.4, the latest version, released in 2016 [ 29 ]; using a single fixed release keeps the underlying EMR data consistent across all constructed datasets.
Sepsis is a leading cause of mortality among ICU patients, highlighting the importance of accurate sepsis risk prediction for precise treatments in the ICU [ 30 ]. Hence, we selected sepsis as the disease prediction task using the MIMIC-III database. Potential predictors were extracted from the records of vital signs, routine blood examinations [ 31 ], liver function tests [ 32 ] and demographic information. The outcome variable for the prediction task is the occurrence of sepsis. Furthermore, we obtained five different populations of ICU patients with a high risk of sepsis from the MIMIC-III database. The number of patients in each population, categorized as elderly (> 80 years old), long-stay (> 30 days of length of stay, LLOS), ischemic stroke, acute renal failure (ARF), and cirrhosis (CIR), is presented in Table S10 in Additional file 3 .
Using the proposed index system, we evaluated the five datasets and scored each indicator according to its weight in the system. The scores of the indicators that differed across datasets are listed in Table S11 in Additional file 3 . Table 2 presents the scores of the first-level indicators. Note that the operability indicator received the same score, 0.251, for all five datasets, because all of them were derived from a single data resource.
In terms of overall scores, the LLOS dataset achieved the highest score (0.966), indicating the highest quality, while the ARF dataset obtained the lowest (0.907). These scores reflect each dataset's quality and suitability for disease risk prediction as assessed by the proposed index system.
Additional data processing was conducted to prepare the datasets for training ML models. To address the missing values, median imputation was applied to predictors with a small proportion of missing values in each dataset. To mitigate potential bias arising from imbalanced datasets, we applied undersampling on the majority class to achieve a balanced ratio of 1:1. Each dataset was then randomly split into 80% for training and 20% for testing. To ensure fairness in model comparison, the predictors were normalized, and a tenfold cross-validation was performed during the training process.
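The first three preprocessing steps can be sketched with the Python standard library as below. This is an illustrative sketch, not the authors' code: the toy rows, the `None` encoding of missing values, and the fixed random seeds are assumptions. Normalization and tenfold cross-validation are omitted because they are usually delegated to library utilities.

```python
import random
from statistics import median
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def median_impute(rows: Sequence[Row]) -> List[List[float]]:
    """Replace missing values (None) with the median of each predictor's observed values."""
    n_cols = len(rows[0])
    medians = [median([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def undersample(rows: Sequence, labels: Sequence[int], seed: int = 0):
    """Randomly drop majority-class samples until both classes are equal in size (1:1)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def split_80_20(rows: Sequence, labels: Sequence[int], seed: int = 0):
    """Shuffle, then use the first 80% for training and the remaining 20% for testing."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(0.8 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Hypothetical toy data: two predictors and a binary sepsis label (1 = sepsis).
X = [[1.0, None], [3.0, 4.0], [None, 6.0], [2.0, 5.0], [4.0, None], [5.0, 7.0]]
y = [1, 0, 0, 0, 1, 0]
X_balanced, y_balanced = undersample(median_impute(X), y)
X_train, y_train, X_test, y_test = split_80_20(X_balanced, y_balanced)
print(len(X_train), len(X_test), sum(y_balanced), len(y_balanced))
```

The steps follow the order described in the text. A common safeguard against information leakage is to compute the imputation medians (and any normalization statistics) on the training split only and then apply them to the test split.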
Regarding model hyperparameters, the LR model used the 'liblinear' solver. The SVM model used a radial basis function (RBF) kernel, with the regularization parameter C set to 1.0 and gamma set to 'scale'. The RF model was constructed with 10 trees (n_estimators = 10), a maximum tree depth of 7 (max_depth = 7), and max_features = 'auto', which for scikit-learn's random forest classifier means that the square root of the number of features is considered at each split.
Model performance was evaluated using accuracy (ACC), precision, and the area under the receiver operating characteristic curve (AUC). Accuracy is the proportion of correct predictions among all predictions made by a model. Precision is the proportion of true positives among all positive predictions made by a model. AUC summarizes a binary classifier's ability to rank positive cases above negative cases across all classification thresholds [ 33 ].
Table 3 displays the model performance on the five datasets. For all three models, the LLOS dataset yielded the highest performance on all three evaluation metrics, whereas the ARF dataset yielded the lowest performance in most cases, the exception being precision for the LR model.
The relationships between dataset scores and model performance metrics were analyzed as follows. First, a normality test was conducted on each pair of variables (the dataset scores and one model's performance metric). If the pair passed the normality test, a Pearson correlation analysis was performed; otherwise, a Spearman correlation analysis was conducted. Table 4 shows that all correlations except LR-Precision were strongly positive and statistically significant, with the SVM-Precision correlation showing the strongest effect.
We developed a quantitative evaluation index system to assess the suitability of EMRs for disease risk intelligent prediction research. The index system was validated through an empirical study using MIMIC-III datasets to predict sepsis. Three popular ML models were trained, and datasets with higher scores achieved better predictive performance across all three models. This result is consistent with a previous study showing the impact of data quality on prediction performance [ 34 ]. Additionally, the association analyses revealed a strong positive relationship between dataset scores and model performance for nearly every combination of ML model and evaluation metric. These findings support the effectiveness of the proposed index system in evaluating the quality of EMR data for disease risk prediction using ML techniques.
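The three metrics can be sketched from their definitions as follows. The AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as one half), which equals the area under the ROC curve. The labels, scores, and 0.5 threshold are hypothetical, and the study itself presumably relied on library implementations.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positives / all positive predictions (0.0 if nothing was predicted positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in y_score]  # hypothetical 0.5 decision threshold
print(accuracy(y_true, y_pred), precision(y_true, y_pred), roc_auc(y_true, y_score))
```

Accuracy and precision depend on the chosen threshold, whereas AUC uses only the ranking of the scores, which is why it is reported alongside the threshold-based metrics.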
Compared with general frameworks for evaluating EMR data quality, our index system was constructed by incorporating both the quality characteristics of EMR data and the specific research activities involved in ML-based disease risk prediction. It differs from the frameworks developed by Johnson [ 35 ] and Lv [ 36 ], which summarized literature on general medical data rather than on EMR data specifically. Although Weiskopf [ 37 ] required EMR data as a condition of their literature search, they did not explicitly address the use of EMR data in developing their framework. The proposed index system considers not only the practical foundation of EMR data but also the data processing operations and operational objectives of EMRs at different stages of prediction model construction. This focus on its research purpose enhances the explanatory power of the evaluation index system.
Another significant contribution of the proposed index system is the quantitative evaluation of EMR data quality in disease risk prediction. This provides researchers with guidance or standards for quantifying the EMR datasets for specific research purposes. Most current EMR data quality evaluation systems for clinical research rely on qualitative indicators [ 38 ]. Qualitative indicators are often based on typical cases, statements, and supporting materials, which may lack objectivity. While the study of Weiskopf [ 37 ] incorporated quantitative evaluation, it still relied on subjective scoring of each dimension by experts to calculate the mean value. The evaluation model proposed by Zan [ 39 ] utilized objective measurement indicators, but it primarily focused on binary classification and only included first-level indicators.
The proposed three-level index system was developed using a combination of qualitative and quantitative approaches. The names and definitions of the indicators at all three levels were constructed through an extensive literature review and expert consultation. The first-level indicators correspond to the core qualitative aspects of evaluating EMR data quality in ML-based disease risk prediction. The second-level indicators refine the first-level qualitative indicators. The third-level indicators are quantitative and can be obtained through objective calculations, such as the coverage bias of the outcome variables under the integrity dimension. Through the AHP method, weights are assigned hierarchically: the absolute weight of each lower-level indicator is its relative weight multiplied by the absolute weight of its parent indicator.
Our study has several limitations. First, the weights of the third-level indicators were calculated from simple percentages, which may neglect differences in the importance of individual indicators. Second, the empirical study was conducted using only the MIMIC-III v1.4 database. Although MIMIC-III is widely used in research, relying on a single data resource may limit the generalizability of our findings, and certain indicators for comparing data resources are hard to validate without diverse EMR data resources. Hence, future validation studies using other data resources should be conducted to ensure the robustness of the proposed index system.
In this paper, we developed a quantitative three-level index system, which included four first-level, 11 second-level, and 33 third-level indicators, to evaluate the EMR data quality in ML-based disease risk prediction. The reliability of the proposed index system has been verified through an empirical study with real-world data.
The proposed index system can benefit both researchers who use EMR data and EMR data managers. For researchers, it provides a measure of the suitability of EMR data for ML-based disease risk prediction. For EMR data managers, it can guide EMR database construction and improve EMR data quality control. Ultimately, we hope that the proposed index system can promote the generation of real-world evidence from reliable real-world EMR data.
All data generated or analyzed during this study are included in this published article and its additional files.
https://physionet.org/content/mimiciii/1.4/
Abbreviations
AI: Artificial intelligence
AHP: Analytic hierarchy process
ARF: Acute renal failure
AUC: Area under curve
Ca: Judgment coefficient
CR: Consistency ratio
Cr: Expert authority coefficients
Cs: Familiarity coefficient
CT: Computed tomography
CV: Coefficient of variation
HIS: Hospital information system
ICU: Intensive care units
LLOS: > 30 Days of length of stay
LR: Logistic regression
SVM: Support vector machine
RF: Random forest
Waldman SA, Terzic A. Healthcare evolves from reactive to proactive. Clin Pharmacol Ther. 2019;105(1):10.
Razzak MI, Imran M, Xu G. Big data analytics for preventive medicine. Neural Comput Appl. 2020;32:4417–51.
Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94–8.
Dzau VJ, Balatbat CA. Health and societal implications of medical and technological advances. Sci Transl Med. 2018;10(463):eaau4778.
Institute of Medicine. The Computer-Based Patient Record: An Essential Technology for Health Care. Washington DC: National Academy Press; 1997.
Ambinder EP. Electronic health records. J Oncol Pract. 2005;1(2):57.
Motwani M, Dey D, Berman DS, et al. Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multi centre prospective registry analysis. Eur Heart J. 2017;38(7):500–7.
Allyn J, Allou N, Augustin P, et al. A comparison of a machine learning model with EuroSCORE II in predicting mortality after elective cardiac surgery: a decision curve analysis. PLoS ONE. 2017;12(1): e0169772.
Geissbuhler A, Miller RA. Clinical application of the UMLS in a computerized order entry and decision-support system. Proc AMIA Symp. 1998;1998:320–4.
Field D, Sansone SA. A special issue on data standards. OMICS. 2006;10(2):84–93.
Mead CN. Data interchange standards in healthcare IT–computable semantic interoperability: now possible but still difficult, do we really need a better mousetrap? J Healthc Inf Manag. 2006;20(1):71–8.
Reimer AP, Milinovich A, Madigan EA. Data quality assessment framework to assess electronic medical record data for use in research. Int J Med Inform. 2016;90:40–7.
Johnson SG, Speedie S, Simon G, et al. Application of an ontology for characterizing data quality for a secondary use of EHR data. Appl Clin Inform. 2016;7(1):69–88.
Ozonze O, Scott PJ, Hopgood AA. Automating electronic health record data quality assessment. J Med Syst. 2023;47(1):23.
Strasser A. Delphi method variants in IS research: a taxonomy proposal. In: PACIS 2016 Proceedings. 2016. https://aisel.aisnet.org/pacis2016/224 . Accessed 5 July 2023.
Babbie ER. The Practice of Social Research. Mason, OH: CENGAGE Learning Custom Publishing; 2014.
Bryman A. Social Research Methods. London, England: Oxford University Press; 2015.
Ruan Y, Song S, Yin Z, et al. Comprehensive evaluation of military training-induced fatigue among soldiers in China: A Delphi consensus study. Front Public Health. 2022;10:1004910.
Shim JP. Bibliographical research on the analytic hierarchy process (AHP). Socio-Econ Plann Sci. 1989;23(3):161–7.
Ho W. Integrated analytic hierarchy process and its applications–A literature review. Eur J Oper Res. 2008;186(1):211–28.
Lane EF, Verdini WA. A consistency test for AHP decision makers. Decis Sci. 1989;20(3):575–90.
Nusinovici S, Tham YC, Yan MYC, et al. Logistic regression was as good as machine learning for predicting major chronic diseases. J Clin Epidemiol. 2020;122:56–69.
Watanabe T, Kessler D, Scott C, et al. Disease prediction based on functional connectomes using a scalable and spatially-informed support vector machine. Neuroimage. 2014;96:183–202.
Yang L, Wu H, Jin X, et al. Study of cardiovascular disease prediction model based on random forest in eastern China. Sci Rep. 2020;10(1):5245.
Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine learning in Python. J Machine Learn Res. 2011;12:2825–30.
Cohen I, Huang Y, Chen J, et al. Pearson correlation coefficient. In: Noise reduction in speech processing. Heidelberg: Springer; 2009:1–4.
Hauke J, Kossowski T. Comparison of values of Pearson’s and Spearman’s correlation coefficients on the same sets of data. Quaestion Geograph. 2011;30(2):87–93.
Johnson AEW, Pollard TJ, Shen L, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3(1):1–9.
Johnson AE, Pollard TJ, Mark RG. MIMIC-III clinical database (version 1.4). PhysioNet. 2016. https://doi.org/10.13026/C2XW26 .
Mayr FB, Yende S, Angus DC. Epidemiology of severe sepsis. Virulence. 2014;5(1):4–11.
Lan P, Wang SJ, Shi QC, et al. Comparison of the predictive value of scoring systems on the prognosis of cirrhotic patients with suspected infection. Medicine. 2018;97(28): e11230.
Lan P, Pan K, Wang S, et al. High serum iron level is associated with increased mortality in patients with sepsis. Sci Rep. 2018;8(1):11072.
Saito T, Rehmsmeier M. Precrec: fast and accurate precision-recall and ROC curve calculations in R. Bioinformatics. 2017;33(1):145–7.
Ferencek A, Kljajić BM. Data quality assessment in product failure prediction models. J Decis Syst. 2020;29(Suppl 1):79–86.
Johnson SG, Speedie S, Simon G, et al. A data quality ontology for the secondary use of EHR data. AMIA Annu Symp Proc. 2015;2015:1937–46.
Tian Q, Chen Y, Han Z, et al. Research on evaluation indexes of clinical data quality. J Med Inform. 2020;41(10):9–17.
Weiskopf NG, Bakken S, Hripcsak G, et al. A data quality assessment guideline for electronic health record data reuse. EGEMS (Wash DC). 2017;5(1):14.
Kahn MG, Callahan TJ, Barnard J, et al. A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. EGEMS (Wash DC). 2016;4(1):1244.
Cai L, Zhu Y. The challenges of data quality and data quality assessment in the big data era. Data Sci J. 2015;14:2–2.
The authors would like to thank all those who participated in the expert consultation.
This work was supported by the Chinese Academy of Medical Sciences Initiative for Innovative Medicine (Grant No. 2021-I2M-1–057 and Grant No. 2021-I2M-1–056), National Key Research and Development Program of China (Grant No. 2022YFC3601001), and National Social Science Fund of China (Grant No. 21BTQ069).
Jiayin Zhou and Jie Hao contributed equally to this work.
Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, People’s Republic of China
Jiayin Zhou, Jie Hao, Mingkun Tang, Haixia Sun, Jiao Li & Qing Qian
Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, People’s Republic of China
Jiayang Wang
All authors contributed to this study. QQ led and designed the study. HS designed the study and structured the manuscript. JZ drafted and revised the manuscript. JH drafted, revised the manuscript, and provided assistance with experiment interpretation. JW and MT conducted the empirical study. JL provided critical revision. All authors read and approved the final manuscript.
Correspondence to Qing Qian .
Ethics approval and consent to participate.
Ethics approval was not deemed necessary for this study by the Ethics Committee at the Institute of Medical Information & Library, Chinese Academy of Medical Sciences, in accordance with national guidelines and local legislation. Written informed consent was obtained from all participants. All methods were performed in accordance with the relevant guidelines and regulations. This study was conducted in compliance with the Declaration of Helsinki.
Not applicable.
The authors declare no competing interests.
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary material 1, Supplementary material 2, Supplementary material 3.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Cite this article.
Zhou, J., Hao, J., Tang, M. et al. Development of a quantitative index system for evaluating the quality of electronic medical records in disease risk intelligent prediction. BMC Med Inform Decis Mak 24 , 178 (2024). https://doi.org/10.1186/s12911-024-02533-z
Received : 05 July 2023
Accepted : 13 May 2024
Published : 24 June 2024
DOI : https://doi.org/10.1186/s12911-024-02533-z
ISSN: 1472-6947
Background Arteriovenous malformation (AVM)-associated aneurysms are a high-risk feature that predisposes AVMs to rupture. Infratentorial AVMs have been shown to have a greater incidence of associated aneurysms; however, the existing data are outdated and biased. The aim of our research was to compare the incidence of aneurysms associated with supratentorial versus infratentorial AVMs.
Methods Patients were identified from our institutional AVM registry, which includes all patients with an intracranial AVM diagnosis since 2000, regardless of treatment. Records were reviewed for clinical details, AVM characteristics, nidus location (supratentorial or infratentorial), and presence of associated aneurysms. Statistical comparisons were made using Fisher’s exact or Wilcoxon rank sum tests as appropriate. Multivariable logistic regression analysis determined independent predictors of AVM-associated aneurysms. As a secondary analysis, a systematic literature review was performed, where studies documenting the incidence of AVM-associated aneurysms stratified by location were of interest.
Results From 2000 to 2024, 706 patients with 720 AVMs were identified, of which 152 (21.1%) were infratentorial. Intracranial hemorrhage was the most common AVM presentation (42.1%). The incidence of associated aneurysms was greater in infratentorial AVMs than in supratentorial cases (45.4% vs 20.1%; P<0.0001). Multivariable logistic regression demonstrated that infratentorial nidus location was the sole independent predictor of an associated aneurysm (odds ratio 2.9; P<0.0001). The systematic literature review identified eight studies satisfying the inclusion criteria. Aggregate analysis indicated that infratentorial AVMs were more likely to harbor an associated aneurysm (OR 1.7) and to present as ruptured (OR 3.9), P<0.0001.
Conclusions In this modern consecutive patient series, infratentorial nidus location was a significant predictor of an associated aneurysm and hemorrhagic presentation.
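As a rough illustration of how an odds ratio relates to the incidence figures in the Results, the sketch below converts each group's proportion into odds and takes their ratio. It is a minimal, unadjusted calculation that assumes only the rounded percentages reported (45.4% and 20.1%); it is not the study's multivariable logistic regression, which reported an adjusted OR of 2.9, and the function and constant names are hypothetical.

```python
def odds(p: float) -> float:
    """Convert a proportion (strictly between 0 and 1) into odds p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    return p / (1.0 - p)


def crude_odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Unadjusted odds ratio comparing the event proportions of two groups."""
    return odds(p_exposed) / odds(p_unexposed)


# Rounded proportions taken from the reported Results (assumption: no raw counts used).
P_INFRATENTORIAL = 0.454  # associated-aneurysm incidence, infratentorial AVMs
P_SUPRATENTORIAL = 0.201  # associated-aneurysm incidence, supratentorial AVMs

if __name__ == "__main__":
    crude = crude_odds_ratio(P_INFRATENTORIAL, P_SUPRATENTORIAL)
    print(f"crude OR = {crude:.2f}")  # prints "crude OR = 3.31"
```

The crude value (about 3.3) differs from the adjusted 2.9 because the regression accounts for other covariates, so the two figures answer different questions and should not be compared directly; rounding in the published percentages also introduces a small error.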
https://doi.org/10.1136/jnis-2024-022003
X @GaborTothMD
Contributors All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by MD, MM, and NT. Manuscript preparation was performed by MD and all authors participated in manuscript revisions. All authors reviewed and approved the final manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
In seeking to comment on, and evaluate, the latest developments in risk communication research, this literature review is organized around two themes that dominated the published literature since 2010. We first review recent debates over the current state and future directions of the field. These debates include various perspectives on how the ...
V.T. Covello and others published Risk communication: A review of the literature (1986).
A search strategy for risk communication in clinical trials was designed in collaboration with a Senior Information Specialist (skilled in developing and running search strategies to identify relevant scientific literature) and informed by systematic reviews of risk communication in treatment and screening contexts and supplemented with trial ...
First is the need to disseminate meaningful information to politicians, public health professionals, and the public. Second is the need to communicate that information in formats that are readily understandable. It is not the time, in the midst of a pandemic, to devise a scientific risk and public health communication strategy.
Some of the risk communication literature is too general in terms of recognizing the nuance of the locus of risk, and the role(s) of stakeholders and communicators, which limit understanding that could extend and enrich current risk communication literature. ... Miller AN, Sellnow T, Neuberger L, et al. (2017) A systematic review of literature ...
The Evolving Field of Risk Communication. Dominic Balog-Way, Katherine McComas, and John Besley. The 40th Anniversary of the Society for Risk Analysis presents an apt time to step back ...
The 40th Anniversary of the Society for Risk Analysis presents an apt time to step back and review the field of risk communication. In this review, we first evaluate recent debates over the field's current state and future directions. ... and social risk communication scholarship appearing in the published literature since 2010. Studies on ...
A systematic review on risk communication published since has shown that visual aids, such as icon arrays and bar ... Sixsmith J., Barry M., Núñez-Córdoba J., Oroviogoicoechea-Ortega C. and Guillén-Grima F. (2013). A literature review on effective risk communication for the prevention and control of communicable diseases in Europe ...
The literature on risk communication is diverse and some areas of risk communication are still without strong evidence [4]. This review discusses the importance of effective risk communication and summarises the evidence behind the various methods of presenting risk information. Why is risk communication important?
This systematic literature review aims to provide a coherent foundation for empirical studies of interpersonal discussion on risk. Specifically, it summarizes existing research into the reciprocal relationship between interpersonal discussion on risk and individual-level risk perception. ... Leaving this field of study to risk communication ...
The Risk-Perception and Risk-Communication Literature (p. 151): … sensible for inducing a protective behavior than for the more typical "law-breaking" behavior. However, fear appeals can be useful if they are paired with mechanisms for reducing associated anxiety and fear. Incentives and other positive reinforcement (e.g., lottery prizes, coupons, or
There are four basic risk and high concern communication theories: trust determination theory, negative dominance theory, mental noise theory, and risk perception theory. Risk communication will be successful only if carefully planned and designed for the specific situation and audience. Technical language and jargon are useful as professional ...
The review demonstrates that there is an impressive body of literature on risk communication relevant to the prevention and control of communicable diseases. This literature is complicated, however, by blurred definitions and overlap between risk communication and crisis communication.
transpire. Effective risk communication practices are imperative to ensuring that society is equipped with the best tools needed to manage the repercussions of COVID-19. In the following sections, I will first explore the best practices for engaging in risk communication through a literature review. Next, I will present a case study on the State of
Communicators often find it challenging to prioritize the public and manage their comments during risk communication. This study explored the effects of comments as interactivity cues on news diffusion while considering situational factors under the framework of the Situational Theory of Problem Solving in the context of the US-China trade conflict.
KEY WORDS: Literature review; probability information; risk communication
… risks.(1-3) The risk's probability may be one of the outcomes of a risk analysis, in addition to, for example, details about the people at risk and the exposure level.(4,5) Numerous studies have been conducted about how (i.e., in which format) to present the ...
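The snippets above concern the format in which probability information is presented, including visual aids such as icon arrays. The sketch below shows, under stated assumptions, how a single probability could be rendered as a natural frequency ("k out of N") and as a plain-text icon array. The function names, the default denominator of 100, the ten-per-row layout, and the example risk of 0.07 are all hypothetical choices for illustration, not a method from any of the reviewed studies.

```python
def count_affected(p: float, denominator: int) -> int:
    """Number of people out of `denominator` affected at probability p (rounded)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    # Python's round() sends exact halves to the nearest even integer.
    return round(p * denominator)


def natural_frequency(p: float, denominator: int = 100) -> str:
    """Express a probability as a natural frequency such as '7 out of 100'."""
    return f"{count_affected(p, denominator)} out of {denominator}"


def icon_array(p: float, denominator: int = 100, per_row: int = 10,
               filled: str = "X", empty: str = ".") -> str:
    """Plain-text icon array: one symbol per person, affected people filled in first."""
    k = count_affected(p, denominator)
    icons = [filled] * k + [empty] * (denominator - k)
    rows = ["".join(icons[i:i + per_row]) for i in range(0, denominator, per_row)]
    return "\n".join(rows)


if __name__ == "__main__":
    risk = 0.07  # hypothetical probability, not taken from any study cited here
    print(f"Risk: {risk:.0%}, or {natural_frequency(risk)}")
    print(icon_array(risk))
```

Keeping the denominator fixed across messages is a deliberate design choice in this sketch: comparing "7 out of 100" with "20 out of 100" is easier than comparing fractions with different denominators.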
Risk communication includes any type of two-way communication among different stakeholders, ... In the literature review, no systematic review was found to identify the components and models of disaster risk communication. Based on this study's results, 115 components were identified in five groups (message, message sender, message receiver ...
However, the incoherence of variables affecting risk communication in various studies makes it difficult to plan for disaster risk communication. This study aims to identify and classify the in ...
Preferred mechanism for risk communication. Figure 2 describes preferences for risk communication for our entire sample. If considered to be at high risk for breast cancer, 52.9% would prefer to receive the results by telephone with a healthcare professional, followed by 47.1% preferring a face-to-face meeting with a healthcare professional.
The 40th Anniversary of the Society for Risk Analysis presents an apt time to step back and review the field of risk communication. In this review, we first evaluate recent debates over the field's current state and future directions. Our takeaway is that efforts to settle on a single, generic version of what constitutes risk communication will ...
Risk assessment is a critical sub-process in information security risk management (ISRM) that is used to identify an organization's vulnerabilities and threats as well as evaluate current and planned security controls. Therefore, adequate resources and return on investment should be considered when reviewing assets. However, many existing frameworks lack granular guidelines and mostly ...
Executive Summary This review examines the current body of literature on risk communication related to communicable diseases, focusing on: (i) definitions and theories of risk communication; (ii) methodologies, tools and guidelines for risk communication research, policy and implementation; and (iii) implications, insights and key lessons learned from the application of risk communication ...
The literature review methodology has been described under different terms in the literature (Whittemore and Knafl 2005). Webster & Watson recommended a structured approach that focuses on the main journals and academic databases, which can speed up the identification of relevant papers. This research uses a descriptive approach (Durach, Kembro, and Wieland, 2015), based on gaps, themes, research agendas, framed ...
Risk communication here is used as an overarching concept that includes various communications pertaining to disaster risks including but not limited to risk assessment, warnings, forecasts, risk awareness, and crisis communication. It provides a comprehensive desktop review of various sources and literature, including websites and online ...
Background Currently, several studies have observed that chronic hepatitis B virus infection is associated with the pathogenesis of kidney disease. However, the extent of the correlation between hepatitis B virus infection and the chronic kidney disease risk remains controversial. Methods In the present study, we searched all eligible literature in seven databases in English and Chinese. The ...
The review brings together the current body of literature on risk communication on communicable diseases in a concise reference document that can be used to inform the development of evidence ...
This study aimed to develop and validate a quantitative index system for evaluating the data quality of Electronic Medical Records (EMR) in disease risk prediction using Machine Learning (ML). The index system was developed in four steps: (1) a preliminary index system was outlined based on literature review; (2) we utilized the Delphi method to structure the indicators at all levels; (3) the ...