Alpha
Constructs correlation matrix.
Construct | PPC | SE | TA | SA | TRU | PU | PEOU | BI |
---|---|---|---|---|---|---|---|---|
SE | 0.499 | | | | | | | |
TA | −0.009 | 0.153 | | | | | | |
SA | 0.360 | 0.411 | −0.086 | | | | | |
TRU | 0.378 | 0.563 | 0.105 | 0.610 | | | | |
PU | 0.340 | 0.347 | −0.177 | 0.734 | 0.439 | | | |
PEOU | 0.441 | 0.697 | −0.057 | 0.658 | 0.672 | 0.683 | | |
BI | 0.405 | 0.523 | −0.115 | 0.764 | 0.610 | 0.853 | 0.840 | |
HTMT analysis of discriminant validity.
Construct | PPC | SE | TA | SA | TRU | PU | PEOU | BI |
---|---|---|---|---|---|---|---|---|
SE | 0.556 | | | | | | | |
TA | 0.015 | 0.153 | | | | | | |
SA | 0.405 | 0.403 | 0.081 | | | | | |
TRU | 0.415 | 0.575 | 0.096 | 0.644 | | | | |
PU | 0.397 | 0.340 | 0.177 | 0.720 | 0.437 | | | |
PEOU | 0.520 | 0.726 | 0.058 | 0.693 | 0.729 | 0.712 | | |
BI | 0.444 | 0.525 | 0.115 | 0.757 | 0.622 | 0.852 | 0.881 | |
Structural equation modeling was used to analyze the proposed research model with AMOS 24.0. The results covered both absolute and incremental fit indices (see Table 6). All the values fall within the suggested ranges [ 105 ], which indicates that the proposed model fits the data well and that the data are adequate for further path analysis.
Goodness-of-fit test.
Category | Measure | Acceptable Values | Value |
---|---|---|---|
Absolute fit indices | Chi-square/d.f. | 1–5 | 2.248 |
| | GFI | 0.90 or above | 0.913 |
| | SRMR | 0.08 or below | 0.065 |
| | RMSEA | 0.08 or below | 0.055 |
| | NFI | 0.90 or above | 0.920 |
Incremental fit indices | IFI | 0.90 or above | 0.954 |
| | TLI | 0.90 or above | 0.942 |
| | CFI | 0.90 or above | 0.953 |
Note: GFI = goodness-of-fit index; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation; NFI = normed fit index; IFI = incremental fit index; TLI = Tucker–Lewis index; CFI = comparative fit index.
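The comparison of the reported fit indices with their acceptable ranges can be checked mechanically. The following is a minimal Python sketch and not part of the original analysis, which was run in AMOS 24.0. The thresholds and values are copied from the goodness-of-fit table; the names `FIT_CRITERIA` and `check_fit` are illustrative assumptions.

```python
# Thresholds and reported values copied from the goodness-of-fit table.
# Each entry: measure -> (lower bound, upper bound, reported value).
# None means the table does not specify that bound.
FIT_CRITERIA = {
    "Chi-square/d.f.": (1.0, 5.0, 2.248),
    "GFI": (0.90, None, 0.913),
    "SRMR": (None, 0.08, 0.065),
    "RMSEA": (None, 0.08, 0.055),
    "NFI": (0.90, None, 0.920),
    "IFI": (0.90, None, 0.954),
    "TLI": (0.90, None, 0.942),
    "CFI": (0.90, None, 0.953),
}


def check_fit(criteria):
    """Return {measure: bool} telling whether each value lies in its range."""
    results = {}
    for name, (low, high, value) in criteria.items():
        above_low = low is None or value >= low
        below_high = high is None or value <= high
        results[name] = above_low and below_high
    return results


if __name__ == "__main__":
    for name, ok in check_fit(FIT_CRITERIA).items():
        print(f"{name:16s} {'acceptable' if ok else 'outside range'}")
```

Note that some indices are acceptable when they are high (GFI, NFI, IFI, TLI, CFI) and others when they are low (SRMR, RMSEA), which is why the sketch stores a lower and an upper bound separately.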
Path analysis was conducted through SEM to examine the relationships among variables. The results of the path analysis can be found in Figure 3 and Table 7. Ten out of fourteen hypotheses were supported or partially supported. Behavior intention was predicted by perceived usefulness, perceived ease of use and trust, which together explained 64.34% of its variance. Perceived usefulness was the strongest predictor, followed by perceived ease of use and perceived trust. Moreover, perceived usefulness was predicted by perceived ease of use, mobile self-efficacy and self-actualization, which explained 51.09% of its variance. Self-efficacy, technology anxiety and self-actualization explained 30.07% of the variance in perceived ease of use. Perceived trust was influenced by self-efficacy, which explained 54.02% of its variance.
Results of SEM. Note: * p < 0.1; ** p < 0.05; *** p < 0.01.
Results of hypotheses testing.
Hypothesis | Path Direction | Path Coefficient | p-Value | Result |
---|---|---|---|---|
H1-1 | PU → BI | 0.655 | *** | Supported |
H1-2 | PEOU → BI | 0.458 | *** | Supported |
H1-3 | PEOU → PU | 0.595 | *** | Supported |
H2 | TRU → BI | 0.068 | 0.028 ** | Supported |
H3-1 | PPC → PEOU | 0.015 | 0.209 | Not supported |
H3-2 | PPC → PU | 0.078 | 0.277 | Not supported |
H4-1 | SE → PEOU | 0.407 | *** | Supported |
H4-2 | SE → PU | −0.216 | 0.005 ** | Supported |
H4-3 | SE → TRU | 0.735 | *** | Supported |
H5-1 | TA → PEOU | −0.019 | 0.090 * | Partially supported |
H5-2 | TA → PU | −0.015 | 0.188 | Not supported |
H6-1 | SA → PEOU | 0.367 | *** | Supported |
H6-2 | SA → PU | 0.332 | *** | Supported |
Note: * p < 0.1; ** p < 0.05; *** p < 0.01.
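Because perceived usefulness, perceived ease of use and trust sit between the aging-related characteristics and behavior intention, a rough sense of the mediated paths can be obtained by multiplying the coefficients along each chain. The sketch below is illustrative only. It assumes the path coefficients in the table are standardized, it uses the simple product-of-coefficients rule, and it does not test significance (which would normally require bootstrapping in the SEM software). The names `PATHS` and `indirect_effect` are hypothetical, and the products are not results reported by the study.

```python
# Path coefficients copied from the hypotheses table (supported paths only).
PATHS = {
    ("PU", "BI"): 0.655,
    ("PEOU", "BI"): 0.458,
    ("PEOU", "PU"): 0.595,
    ("TRU", "BI"): 0.068,
    ("SE", "PEOU"): 0.407,
    ("SE", "PU"): -0.216,
    ("SE", "TRU"): 0.735,
    ("SA", "PEOU"): 0.367,
    ("SA", "PU"): 0.332,
}


def indirect_effect(*constructs):
    """Multiply the coefficients along a chain such as SE -> PEOU -> BI."""
    effect = 1.0
    for source, target in zip(constructs, constructs[1:]):
        effect *= PATHS[(source, target)]
    return effect


if __name__ == "__main__":
    chains = [
        ("SE", "PEOU", "BI"),
        ("SE", "PU", "BI"),
        ("SE", "TRU", "BI"),
        ("SA", "PU", "BI"),
    ]
    for chain in chains:
        print(" -> ".join(chain), round(indirect_effect(*chain), 3))
```

Under these assumptions, the chain through perceived usefulness carries a negative sign for self-efficacy while the chains through ease of use and trust are positive, which mirrors the mixed role of self-efficacy described in the text.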
In terms of the influences of aging characteristics, mobile self-efficacy significantly influences perceived usefulness, perceived trust, and perceived ease of use. Technology anxiety has a marginally significant negative influence on perceived ease of use. Self-actualization significantly influences perceived usefulness and perceived ease of use.
Accordingly, the current study contributes to the prior literature in several ways. To begin with, although previous research has examined the introduction of new media and the acceptance of the latest technology, limited attention has been given to VUI [ 3 ], especially in the context of China, which has one of the largest elderly populations in the world [ 12 ]. Despite the current popularity of VUI, older adults’ adoption intention has been largely overlooked in China. This study addresses the research gap by proposing a model that provides insights into the factors that influence older adults’ adoption of VUI in China. In addition, little literature has comprehensively discussed the characteristics of VUI and older adults by incorporating the construct of trust and aging-related characteristics (i.e., perceived physical conditions, mobile self-efficacy, technology anxiety, self-actualization). To address this gap, this study started from TAM and extended the model to gain a more thorough insight into the behavior of the elderly in this digital era. Results revealed that three factors determined Chinese older adults’ adoption of VUI: perceived usefulness, perceived ease of use and trust.
Specifically, the results reveal several important findings. Consistent with previous studies on TAM [ 2 ], the findings confirm that perceived usefulness, perceived ease of use, and trust are three important factors that explain Chinese older adults’ adoption of VUI. Results further reveal that aging-related characteristics influence older adults’ perception of ease of use, usefulness and trust. This study finds a positive relationship between trust and the adoption intention of VUI. Trust has been demonstrated to be an important factor in the contexts of e-commerce, e-government and technology adoption [ 48 , 108 ]. In the context of VUI, as VUI systems need to perform monitoring functions all the time, users have to share their daily conversations with the systems. The exposure of personal information can make users feel uncomfortable and vulnerable, which hinders their adoption of VUI. In this case, trust becomes a crucial factor. Users’ belief that their personal information will be protected can largely alleviate their negative feelings and facilitate their adoption of VUI. Consistent with prior research that found a role of trust in young adults’ adoption of VUI in the U.S. [ 40 ], the results of this study show a similar pattern: users who have a higher degree of trust have a stronger intention to adopt VUI.
This study also reveals the influences of aging-related characteristics. Among them, perceived physical conditions did not show any significant influence on perceived usefulness, perceived ease of use or perceived trust. These findings are consistent with previous studies [ 61 ]. One possible explanation is that good health serves as a precondition for older adults’ adoption of VUI, but the perceived physical condition itself does not naturally lead to a stronger adoption intention. In other words, relatively healthy physical conditions give older adults acceptable physical and cognitive capabilities for using VUI. For instance, good hearing enables older adults to use VUI, but better hearing does not increase their intention to use VUI. Most likely, perceived physical conditions are influenced by other factors, such as technology anxiety.
Contrary to our hypothesis, no significant influence of technology anxiety is found on perceived usefulness or trust. In line with previous studies [ 61 ], technology anxiety lowers older adults’ perception of the ease of use of VUI: a marginally significant negative influence of technology anxiety is found on perceived ease of use ( p < 0.1). One explanation is that the benefits of VUI are already well acknowledged by older adults, so anxiety does not significantly influence their perception of usefulness. Unlike other interaction methods (e.g., GUI) that require considerable effort to learn, VUI is highly similar to natural speech in daily life. Such similarity makes older adults feel that VUI is familiar and easy to learn. The anxiety triggered by technology might be largely alleviated by the intuitiveness of VUI. Thus, no significant influences of technology anxiety on perceived usefulness or trust were detected.
In terms of mobile self-efficacy, as expected, it positively affects perceived ease of use and trust. Extensive experience with mobile devices makes users better able to learn VUI, and thus they have a more positive perception of ease of use. Similarly, their experience with other technological applications, such as e-commerce, also translates into higher trust in VUI. Through their previous experience, they understand that technology providers have an obligation to protect users’ personal information and that there are laws and rules prohibiting its misuse. Therefore, older adults who have a higher level of mobile self-efficacy form a higher degree of trust in VUI. However, a higher level of self-efficacy does not bring a higher perception of usefulness. Instead, high self-efficacy is found to lower older adults’ perception of the usefulness of VUI. This finding suggests that older adults with a higher level of self-efficacy are more resistant to VUI. Specifically, older adults who are skillful at traditional interaction methods may feel that these methods satisfy their needs and that switching to VUI is unnecessary. Consequently, they have a negative perception of the usefulness of VUI.
As for self-actualization, consistent with our hypotheses, it positively relates to perceived usefulness, perceived ease of use, and perceived trust. Self-actualization is an intrinsic motivation to make achievements [ 87 ]. In line with previous studies that show that a higher level of self-actualization is associated with older adults’ adoption of new technologies [ 61 , 72 , 109 ], this study further confirms this notion by revealing the positive relationship between a higher level of self-actualization and the perception of VUI. Chinese older adults view using VUI as a chance for personal development.
Chinese older adults’ adoption of smart devices remains relatively low [ 110 ]. Complicated interaction is one of the barriers to older adults’ effective usage of smart devices. Using VUI as an interaction method could help older adults use smart products effectively. This study finds that older adults’ adoption of VUI is predicted by perceived usefulness, perceived ease of use and trust. These factors also serve as mediators for the influences of technology anxiety, mobile self-efficacy and self-actualization on older adults’ adoption of VUI. These findings have valuable implications for developers and promoters seeking to develop better VUI and plan better communication strategies to facilitate adoption by older adults.
Developers should improve the speech recognition and language processing quality of VUI, as older adults show a higher adoption intention when they perceive VUI as more useful and easier to use. Both the usefulness and the ease of use of VUI rely on speech recognition accuracy and natural language processing capability. More accurate recognition of users’ voice commands and better comprehension of users’ intended meanings further improve VUI’s usefulness and ease of use. Specifically, to improve perceived usefulness, developers should carefully assess the contexts in which VUIs are used. VUIs can be particularly helpful for complex interaction tasks that require multiple steps, such as searching and navigation tasks. They would also be useful in tasks that are difficult for older adults due to declining capabilities, such as typing and dialing tasks.
To improve users’ perception of ease of use, developers can also make the voice interaction simple and intuitive. Incorporating interpersonal communication techniques into VUI can be particularly helpful for older adults. Designers can consider creating a personality for VUIs, which can largely reduce the psychological distance perceived by older adults. Designers should carefully consider how to create a desirable personality, including gender, tone and speaking style. As older adults often have reduced cognitive capacity, it would be helpful to use short words and phrases that are easy to remember, such as ‘OK’ and ‘got it’. When it is necessary to highlight certain information, it would also be useful to slow down the speech rate and raise the volume of the voice output.
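The recommendation to slow down and raise the volume for highlighted information can be expressed through SSML, the W3C speech markup that many voice platforms accept. The sketch below assumes the target platform supports the standard `<prosody>` element with its `rate` and `volume` attributes; the helper names `emphasize` and `build_prompt` and the sample sentence are hypothetical and not taken from the study.

```python
from xml.sax.saxutils import escape


def emphasize(text, rate="slow", volume="loud"):
    """Wrap text in an SSML <prosody> element that slows and raises the voice."""
    return f'<prosody rate="{rate}" volume="{volume}">{escape(text)}</prosody>'


def build_prompt(plain, highlighted):
    """Combine a normally spoken part and a highlighted part in one SSML document."""
    return f"<speak>{escape(plain)} {emphasize(highlighted)}</speak>"


if __name__ == "__main__":
    # A short confirmation followed by the information to highlight.
    print(build_prompt("OK.", "Your reminder is set for eight in the morning."))
```

Keeping the confirmation short and applying the slower, louder delivery only to the key information follows the two suggestions in the paragraph above; the actual rate and volume values would need to be tuned with older users.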
Moreover, it is important to improve trust between older adults and VUI. Developers could explore new technological solutions to improve privacy when using VUI. When promoting VUI, marketers could highlight the sophisticated technologies used to protect privacy as well as the agreements with users for protecting their personal information. Policymakers could also explain the legal regulations that protect users’ information and the serious consequences of misusing it.
This study further shows the influences of mobile self-efficacy, technology anxiety, and self-actualization, which are useful for developers and marketers. Older adults who have a higher level of mobile self-efficacy show a higher perception of ease of use and trust, but a lower perception of the usefulness of VUI. This indicates that a higher level of mobile self-efficacy makes older adults less receptive to the benefits of VUI. When promoting VUI, marketers need different communication strategies for older adults with low and high levels of mobile self-efficacy. It is necessary to highlight the benefits of VUI, especially its relative advantages over previous interaction methods. It would also be possible to first target older adults who have a low level of mobile self-efficacy. Moreover, VUI appears to be an intuitive interaction method, and thus the influence of technology anxiety is relatively limited: technology anxiety is found to have only a marginally significant negative relationship with perceived ease of use. Therefore, developers and marketers do not need to devote extensive effort to reducing technology anxiety. In addition, self-actualization is found to positively influence perceived ease of use, perceived usefulness, and trust. This finding suggests that marketers should convey the message that using VUI is a channel for personal development. Marketers could use multiple channels to convey these messages, such as short videos on social media and graphic posters in public places. These efforts could facilitate older adults’ adoption of VUI.
Older adults show resistance to adopting smart home devices, although they can gain substantial benefits from smart home systems. Integrating VUI into smart home systems is a promising way to facilitate older adults’ adoption of these systems. The results of this research provide implications not only for older adults’ adoption of VUI but also for their adoption of smart home systems.
When integrating VUI into smart home devices, developers should pay particular attention to users’ perception of ease of use and usefulness. For some smart products, such as smart speakers, the integration of VUI can largely improve users’ perception of ease of use and usefulness because smart speakers provide various functions that require complex interactions. In this case, integrating VUI largely reduces older adults’ learning burden, which improves their perceptions of the ease of use and usefulness of smart speakers in general. In contrast, for products that require only simple interactions, integrating VUI may not be an optimal choice because the improvements in perceived usefulness and ease of use remain limited. For instance, users interact with a cleaning robot, whose function is to clean floors autonomously, by pressing a start button, which is direct and simple. Upon completion, users have to physically handle the robot to empty the dust container. Thus, because the interaction is simple and physical handling is still required, adding VUI to cleaning robots might not greatly improve users’ perception of ease of use and usefulness. As developing and integrating VUI into smart devices is costly, developers should carefully consider whether VUI is appropriate for a given smart device.
This study also shows the influences of aging-related characteristics on older adults’ adoption of VUI, which may also help explain their adoption of smart home devices. Specifically, mobile self-efficacy may lower users’ perceptions of the usefulness of smart home devices, similar to its effect on perceptions of VUI. Users who are very familiar with current mobile devices may feel that these devices sufficiently satisfy their needs and that it is not necessary to switch to smart devices. Therefore, to promote older adults’ adoption of smart home devices, it would be worthwhile to highlight the benefits provided by smart home devices and to target users who are less familiar with mobile devices.
In addition, we found a positive relationship between self-actualization and adoption of VUI. It is possible that self-actualization also positively influences older adults’ adoption of smart home devices. When older adults have a higher level of self-actualization, they are more motivated to adopt VUI because they view learning VUI as a chance for personal development. Similarly, for older adults with high self-actualization, learning to use smart home systems could also become an opportunity for them to gain new experiences. Thus, to promote smart home devices, companies should highlight self-actualization messages and target older adults who have a relatively high level of self-actualization.
Although this study was carefully prepared, it has several limitations. We collected the data online. According to CNNIC, 70% of older adults in China are frequent users of the Internet and mobile Internet [ 110 ], and the adoption of smartphones exceeds 80%. The high penetration rate of smartphones and the Internet makes it feasible to collect data online. As VUI is often integrated with smart products, the online sampling method is also suitable. However, older adults who are less active online might not be covered in this sample. In other words, whether these results apply to older adults who are not frequent users of the Internet still requires further validation, which is a topic for future research. Moreover, this study provides evidence on the potential usage of VUI by the target population. A future study can use field experiments to validate the current findings. Specifically, it would be interesting to recruit elderly participants who have some hands-on experience with VUI, which can result in more specific guidelines for developing usable VUI for older adults.
In addition, the average age of participants is 59, which places them in the group labeled as young-old adults. This group of older adults is large in China, and thus it is worthwhile to focus on it. However, this group could differ from older adults aged over 65. Therefore, future research could replicate this study with older participants. Moreover, this study focuses on VUI adoption intention and older adults’ general perception of VUI. In other words, although older adults are willing to adopt VUI in their daily lives, their actual usage and continued usage remain unknown. Older adults’ actual usage might be influenced by other factors, such as usability and usage scenarios. Future research could conduct user studies to identify usability issues with VUI and generate guidelines for VUI development, which can further facilitate the adoption of VUI.
VUI has gained popularity over the past decade. It has been integrated with various smart home devices and developed for many usage scenarios. The benefits of VUI should be available to everyone, including older adults, who make up 25% of the overall population in China. This study investigates the factors that influence older adults’ adoption of VUI in China. On the basis of TAM, this study proposes a theoretical model to predict older adults’ adoption of VUI by incorporating the construct of trust and aging-related characteristics (i.e., perceived physical conditions, mobile self-efficacy, technology anxiety, self-actualization). A survey was conducted with 420 participants who are current or potential users of VUI. Data were analyzed through SEM, and the proposed theoretical model showed a good fit with the data. Results further revealed that older adults’ adoption is determined by perceived usefulness, perceived ease of use and trust. These factors also mediate the influences of aging-related characteristics on older adults’ adoption of VUI. Specifically, mobile self-efficacy is found to positively influence trust and perceived ease of use, but to negatively influence perceived usefulness. Self-actualization positively influences perceived usefulness and perceived ease of use. Technology anxiety only exerts a marginally significant influence on perceived ease of use. No significant influences of perceived physical conditions were found. These results extend the TAM and STAM by incorporating additional variables. These results also provide valuable implications for practice.
Conceptualization, Y.Y. and P.C.; methodology, Y.Y.; software, Y.Y.; validation, Y.Y.; formal analysis, Y.S. and P.C.; investigation, Y.Y. and P.C.; resources, P.C.; data curation, Y.Y.; writing—original draft preparation, Y.Y.; writing—review and editing, Y.S. and P.C.; visualization, Y.Y.; supervision, P.C.; project administration, P.C.; funding acquisition, Y.S. and P.C. All authors have read and agreed to the published version of the manuscript.
This research was funded by Humanities and Social Science projects of the Ministry of Education in China, grant number 20YJC760009; Shenzhen Science and Technology Innovation Commission under Shenzhen Fundamental Research Program, grant number JCYJ20190806142401703; the Fundamental Research Funds for the Central Universities, grant number YJ202203.
Not applicable.
Informed consent was obtained from all subjects involved in the study.
Conflicts of interest.
The authors declare no conflict of interest.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Since the invention of computers, human-computer interaction has gone through various stages, from keyboard and mouse to touch screens, to name a few. Meanwhile, the user interface has undergone tremendous changes in recent years. The voice interface is gradually replacing the graphical user interface and is quickly becoming a common part of in-vehicle experiences. Products that use voice as the primary interface are becoming more popular by the day, and the number of users continues to grow. This case study explores the application of the voice interface in the automotive field.
The key takeaway is that we should not separate speech from UI design. Just like when we are at a live music show, all five of our senses work together. When and where to apply GUI or VUI really depends on the situation: some information is easier to process when we see it, while in other cases VUI is more suitable. Here are a few examples. GUI: showing a long list of options; charts with a large amount of data; product information and product comparison. VUI: simple commands and user instructions; warnings and notifications.
How do we analyze users’ intentions in VUI?
Users will only believe that a computer understands them when the feedback they get meets their expectations. A complete conversational structure in natural language has a ‘start module’ and an ‘end module’, with ‘nodes’ in between topics. Analyzing user intent using replacement: we can derive a variety of user needs and responses by dissecting a fairly general user intent into components and recombining the components to derive a series of more complex user needs. Take autonomous driving as an example. Suppose users in the car strike up a conversation to ask about the weather. Users ask about the weather not only to obtain information about the weather; we should also be able to expand the topic and add dimensions to the conversation with related topics such as safety, travel, health, food and mood. Adobe XD is a powerful prototyping tool for voice interfaces. When users articulate a particular word or phrase, the utterance triggers the speech-to-text engine, and the prototype reacts with the words or sentences defined by the designer.
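The replacement idea described above can be sketched in a few lines of Python: a general intent is split into components, and each component is swapped for related alternatives to enumerate more specific user needs. The component names and example values below are hypothetical, chosen to mirror the in-car weather example; this is an illustration of the idea, not the case study's own tooling.

```python
from itertools import product

# Hypothetical components of a general "ask about the weather" intent.
COMPONENTS = {
    "time": ["now", "this afternoon", "tomorrow"],
    "place": ["here", "at my destination"],
    "concern": ["the weather", "safety", "travel"],
}


def expand_intent(components):
    """Recombine one choice per component into a list of candidate user needs."""
    keys = list(components)
    return [dict(zip(keys, combo)) for combo in product(*components.values())]


def to_utterance(need):
    """Render a candidate need as a sample utterance for a voice prototype."""
    return f"Tell me about {need['concern']} {need['place']} {need['time']}."


if __name__ == "__main__":
    needs = expand_intent(COMPONENTS)
    print(len(needs), "candidate needs")
    print(to_utterance(needs[0]))
```

Each generated utterance could then be entered as a trigger phrase in a voice prototype, with the designer defining the response for the combinations that matter most.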
Background: early detection of dementia and Mild Cognitive Impairment (MCI) is of utmost significance nowadays, and smart conversational agents are becoming more and more capable. DigiMoCA, an Alexa-based voice application for the screening of MCI, was developed and tested. Objective: to evaluate the acceptability and usability of DigiMoCA, considering the perception of end-users and cognitive assessment administrators, through standard evaluation questionnaires. Method: a sample of 46 individuals and 24 evaluators participated in this study. End-users were fairly heterogeneous in their demographic and neuro-psychological characteristics. Evaluators were mostly health and social care professionals, relatively well-balanced in terms of gender, career background and years of experience. Results: end-users’ acceptability ratings were generally positive (above 3 on a 5-point scale for all dimensions) and improved significantly after the interaction with DigiMoCA. Administrators also rated the usability of DigiMoCA, with an average score of 5.86/7 and with high internal consistency ( \(\alpha \) = 0.95). Conclusion: although there is still room for improvement in terms of user satisfaction and voice interface, DigiMoCA is perceived as an acceptable, accessible and usable cognitive screening tool, both by individuals being tested and test administrators.
In a world with an ever-increasing human lifespan, the quality of life of senior adults is becoming more and more relevant. According to WHO [ 1 ], the percentage of the population over the age of 60 will increase by 34% between 2020 and 2030, and with it, the prevalence of neuro-psychiatric disorders, particularly dementia, which have an extremely high impact on people’s well-being and on their social and economic circumstances.
Mild Cognitive Impairment (MCI) is the transition stage between healthy aging and dementia and is characterized by subtle cognitive deficits that do not meet the criteria for diagnosis of a major neuro-cognitive disorder (DSM-V) [ 2 ]. These difficulties can manifest themselves in areas such as memory, attention, language, orientation or decision making. Thus, detecting MCI in its early stages is beneficial in preventing the progression of the disease, and, in certain cases, in slowing down some of its symptoms. However, in most cases the detection of cognitive deficits occurs when the symptoms are already evident and when the underlying neurological disorder has already been present for some time [ 3 ], which means that the disease has progressed. The traditional screening method for early detection of cognitive impairment involves the use of clinically-validated gold-standard tests that assess the cognitive state of a person.
The inception of these tests traces back to the second half of the 20th century. One of the first widely used screening tools was the Mini-Mental State Examination (MMSE), published by Folstein [ 4 ] in 1975; it includes items of orientation, concentration, attention, verbal memory, naming and visuospatial skills. In the 1980s, the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) was developed [ 5 ]; it included 7 items, namely word recall, naming, commands, constructional praxis, ideational praxis, orientation and word recognition.
One of the limitations of these evaluation instruments is the fact that they are dementia-oriented, particularly Alzheimer’s. Therefore, in later years other screening tools were created, e.g., the Montreal Cognitive Assessment (MoCA) [ 6 ] test, which has a 90% sensitivity for MCI detection (MMSE is not sensitive to MCI). Its telephone version (T-MoCA) [ 7 , 8 ] is also validated and has a strong correlation with MoCA with a Pearson coefficient of 0.74.
The fact that MoCA is oriented toward MCI detection makes it suitable as a screening tool for early diagnosis.
In this context, the use of Information and Communication Technologies (ICT) could be a valuable tool for the early detection of MCI cases in a reliable and efficient way, where smart conversational agents are a disruptive technology with the potential to help detect neuro-psychiatric disorders in early stages [ 9 , 10 ]. Note that the penetration of these technological tools among senior adults is not as high as in other age groups, which makes these tools even more relevant.
Previous research demonstrated that it is possible to implement a voice-based version of a gold standard test for cognitive assessment using conversational agents [ 11 ]. More specifically, DigiMoCA, an Alexa voice application based on T-MoCA, was developed and tested with actual elderly people using a smart speaker.
DigiMoCA makes use of Alexa’s voice recognition and natural language processing services, and is able to store and retrieve session data in DynamoDB (Amazon’s NoSQL database service) persistently. Additionally, DigiMoCA utilizes prosodic annotations to adapt the speech rate to the user, and collects the response time to each item using a statistical estimation of round-trip times. This information is subsequently used to enhance DigiMoCA’s CI screening performance. DigiMoCA was evaluated using the Paradigm for Dialogue System Evaluation (PARADISE), yielding a confusion matrix with a Kappa coefficient \(\kappa = 0.901\). This means DigiMoCA understands the user approximately 90% of the time, which is equivalent to “almost perfect” [ 12 ] in terms of task completion performance.
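For readers unfamiliar with the Kappa coefficient, the following sketch shows how the standard Cohen's kappa is computed from a confusion matrix: the observed agreement is corrected for the agreement expected by chance. The matrix values are made up for illustration and are not the DigiMoCA data, `cohens_kappa` is a hypothetical helper name, and PARADISE's own formulation may estimate chance agreement somewhat differently.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: expected, columns: recognized)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Illustrative two-class matrix with 90 of 100 cases on the diagonal.
    example = [[45, 5], [5, 45]]
    print(round(cohens_kappa(example), 3))
```

In this made-up example the raw agreement is 90% while kappa is 0.8, which shows that kappa is a chance-corrected measure and is generally lower than raw accuracy.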
The main objective of this work is to analyze the acceptability and usability of DigiMoCA through a user interaction pilot study [ 13 ]. For this, the perception of senior end-users as well as administrators was collected by means of standard evaluation questionnaires, and the outcomes were analyzed using standard statistical procedures.
Thus, the research question posed is:
Is the screening tool DigiMoCA acceptable and usable for the cognitive evaluation of senior adults, both by them and their evaluators?
Section 2 describes the sample of participants, the study design and the data analysis carried out; Section 3 presents and discusses the findings of the study, from the point of view of both the senior end-users and the administrators; finally, Section 4 summarizes the results of this research.
This user-interaction study included the participation of 46 senior end-users and 24 sector-related professionals. According to previous relevant works [ 14 , 15 ], in order to calculate the number of participants for a pilot study we need to take into account: (1) the parameters to be estimated; (2) that at least 30 participants are involved; (3) a minimum confidence interval of 80% is required. The present study fits all three criteria.
Senior end-users participated through two associations: Parque Castrelos Daycare Center (PCDC) and the Association of Relatives of Alzheimer’s Patients (AFAGA), both located in the city of Vigo (Spain). Before the start of each study, applications were submitted to the Research Ethics Committee of Pontevedra-Vigo-Ourense, containing: (1) the objectives of the study, main and secondary; (2) the methodology proposed, i.e. tests and questionnaires to administer, inclusion and exclusion criteria, recruiting procedure within the association, sample size and structure, and detailed schedule; (3) security concerns and how to address them (anonymization and encryption); (4) ethical and legal aspects, particularly regarding data privacy; and finally, (5) a copy of the informed consent to be signed in advance by all participants. Both applications, for AFAGA and PCDC, were approved by the corresponding dictums with registration codes 2021/213 and 2023/115 respectively.
Inclusion criteria for senior participants consisted mainly of being over the age of 65 and not having an advanced state of dementia or any other psychological pathology, or any auditory/vocal disability. Table 1 collects the demographic characteristics of the end-user participants, classified by cognitive group. The mean age was 78.61 ± 6.75, with 65% of them being female. We can see that the number of individuals is fairly distributed per group. For cognitive state classification, we used the Global Deterioration Scale (GDS) [ 16 ], which is a widely utilized scale that describes the stage of cognitive impairment, with higher GDS score meaning more deterioration. For additional information, we also show the results of the T-MoCA evaluation (16.25 ± 3.28 for healthy users (HC), 16.25 ± 3.28 for users with MCI and 16.25 ± 3.28 for users with dementia (AD)), as well as the Memory Failures of Everyday (MFE) [ 17 ] questionnaire and the Instrumental Activities of the Daily Living (IADL) scale [ 18 ].
Administrator participants, on the other hand, were affiliated with several associations, namely the Unit of Psychogerontology at the University of Santiago de Compostela, the Galicia Sur Health Research Institute, the Multimedia Technology Group at the University of Vigo, and also AFAGA and PCDC. Table 2 depicts the information about these participants, where we can see that they are predominantly from the health field. The sample has a 58.33% female composition, mostly with middle-aged participants, and fairly evenly distributed among different backgrounds. We can also see a variety in terms of seniority, ranging from less than 5 years of experience (29.17%) to more than 20 (20.83%).
The study was organized along 3 different sessions: during the first one, T-MoCA, MFE and IADL questionnaires were administered; during the second, and after at least two weeks in between, DigiMoCA administration took place. Finally, again after two or more weeks, a second administration of DigiMoCA was carried out during the third session.
Before the first and after the second conversation with the agent, participants were asked to answer a Technology Acceptance Model (TAM) [ 20 ] questionnaire, which covers how users come to accept a technological system. In order to determine the acceptability of the conversational agent by participants, the designed TAM questionnaire addressed 3 dimensions:
Perceived usefulness (PU) . It measures whether a participant finds the smart speaker useful, both as a general concept, and specifically during the cognitive assessment sessions.
Perceived ease-of-use (PEoU) . It measures whether the conversation with the speaker was comfortable and straightforward for the user, purely in terms of communication.
Perceived satisfaction (PS) . It measures whether the user enjoyed the utilization of the speaker, and whether they prefer it to a human counterpart (i.e., another person conducting T-MoCA as an interviewer).
The resulting questionnaire consisted of a 5-point Likert rating scale composed of 6 items, 2 for each main dimension (1 meaning strongly negative/disagree, 5 strongly positive/agree, 3 neutral). For reference, the TAM questionnaire used is available in Section 1 , translated to English.
In addition to studying how end-users interacted with DigiMoCA, another study was conducted to gather the opinions of cognitive evaluation administrators on its usability and user-friendliness. These were individuals who were either responsible for administering cognitive assessment tools to older adults, or had a background of expertise related to application development and voice assistants. A 7-point Likert scale questionnaire based on the Post-Study System Usability Questionnaire (PSSUQ) [ 21 ] was used (1 meaning strongly disagree, 7 strongly agree, 4 neutral). The English translation of the PSSUQ questionnaire used is available in Section 2 .
The PSSUQ-based questionnaire was designed to evaluate 3 usability dimensions, together with an overall usability score:
System usefulness: measures the ease of use and convenience. In the designed version, includes the average scores of items 1 to 8.
Information quality: measures the usefulness of the information and messages provided by the application. Includes average scores of questions 9 to 14.
Interface quality: measures the friendliness and functionality of the user interface of the system. Includes average scores of items 15 to 17 of the questionnaire.
Overall: measures overall usability, computed as the average of the scores of all items (1 to 18 in our case).
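The subscale scoring described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' actual analysis code: the names `SUBSCALES` and `pssuq_scores` are hypothetical, and the input is assumed to be one respondent's ratings keyed by item number (1 to 18), each on the 7-point scale.

```python
from statistics import mean

# Item ranges per dimension, as listed in the text (hypothetical constant name).
SUBSCALES = {
    "SYSUSE": range(1, 9),      # items 1 to 8
    "INFOQUAL": range(9, 15),   # items 9 to 14
    "INTERQUAL": range(15, 18), # items 15 to 17
    "OVERALL": range(1, 19),    # all items, 1 to 18
}


def pssuq_scores(answers):
    """Average one respondent's ratings per dimension.

    `answers` maps an item number (1..18) to a rating on the 7-point scale.
    """
    return {
        name: mean(answers[item] for item in items)
        for name, items in SUBSCALES.items()
    }


# Made-up ratings for a single respondent, purely for illustration.
example = {item: 6 for item in range(1, 19)}
example[15] = 4
scores = pssuq_scores(example)
```

Group-level values such as those reported per gender or field would then be obtained by averaging these per-respondent scores within each group.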
The following statistical instruments were used to assess acceptability:
Fundamental statistics: mean, standard deviation and percentages.
Cronbach’s alpha ( \(\alpha \) ) [ 22 ] to estimate the reliability, and specifically the internal consistency, of the responses. It is widely used in psychological test construction and interpretation, and it measures how closely test items are related to one another, that is, whether they measure the same construct. When test items are closely related to each other, Cronbach’s alpha will be closer to 1; if they are not, it will be closer to 0. In this study, we use this metric to evaluate the internal consistency of the responses to the TAM (end-user centered) and PSSUQ (administrator centered) questionnaires. It is computed as follows:
\[ \alpha = \frac{k}{k-1}\left( 1 - \frac{\sum _{i=1}^{k} \sigma _i^2}{\sigma _x^2} \right) \]
where:
k is the number of items/questions included.
\(\sigma _i^2\) is the variance of each item across all responses.
\(\sigma _x^2\) is the total variance, including all items.
According to Gliem [ 23 ], a good interpretation of the value of Cronbach’s alpha regarding internal consistency is: \(\alpha > 0.9\) means “excellent”; \(\alpha > 0.8\) means “good”; \(\alpha > 0.7\) means “acceptable”; \(\alpha > 0.6\) means “questionable”; and anything below 0.6 is considered an indicator of low internal consistency.
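As a minimal sketch of the computation above (not the authors' notebook code), Cronbach’s alpha and its verbal interpretation can be written with the Python standard library alone. The function names are hypothetical, and \(\sigma _x^2\) is assumed to be the variance of each respondent's summed score, which is the usual reading of "total variance".

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item across all respondents (sigma_i^2).
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of the per-respondent total score (sigma_x^2, assumed definition).
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def interpret_alpha(alpha):
    """Map alpha to the labels attributed to Gliem in the text."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "acceptable"
    if alpha > 0.6:
        return "questionable"
    return "low"


# Made-up answers of four respondents to a two-item dimension (5-point scale).
sample = [[4, 5], [3, 3], [5, 5], [2, 3]]
label = interpret_alpha(cronbach_alpha(sample))
```

With only two items per TAM dimension, alpha depends on a single inter-item covariance, which is worth keeping in mind when reading the per-dimension values.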
Student T-tests [ 24 ] were used for comparison of pre-pilot and post-pilot questionnaires, giving insight on the evolution of the acceptability perception of the participants during the administration. Statistical significance was measured by means of p-values.
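Cohen’s d [ 25 ]: measures the effect size of T-tests, and is computed as the standardized mean difference between two groups (in this case, pre-pilot and post-pilot), i.e., the difference between the means divided by the square root of the average of both variances:
\[ d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left( \sigma _1^2 + \sigma _2^2 \right) / 2}} \]
where \(\bar{x}_1\) , \(\bar{x}_2\) are the means and \(\sigma _1^2\) , \(\sigma _2^2\) the variances of the two groups.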
Based on Tellez’s analysis [ 26 ], the interpretation of Cohen’s d is as follows: \(d < 0.2\) is “trivial effect”; \(0.2< d < 0.5\) is “small effect”; \(0.5< d < 0.8\) is “medium effect”; and \(d > 0.8\) means “large effect”.
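The effect-size definition and thresholds above can be illustrated with a short standard-library sketch. The function names are hypothetical, and two choices are assumptions rather than statements from the text: the sign convention (post minus pre) and the assignment of the exact boundary values 0.2, 0.5 and 0.8 to the higher category. The percentage change between averages, which is reported alongside d in the results, is included for convenience.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(pre, post):
    """Difference of means over the square root of the average of both variances."""
    pooled = sqrt((variance(pre) + variance(post)) / 2)
    return (mean(post) - mean(pre)) / pooled  # sign convention is an assumption


def interpret_d(d):
    """Label the magnitude of d; boundary values go to the higher category (assumption)."""
    size = abs(d)
    if size < 0.2:
        return "trivial effect"
    if size < 0.5:
        return "small effect"
    if size < 0.8:
        return "medium effect"
    return "large effect"


def percent_change(pre, post):
    """Percentage change between the pre and post averages."""
    return 100 * (mean(post) - mean(pre)) / mean(pre)


# Made-up Likert answers to one item before and after the administration.
pre_answers = [2, 3, 3, 4, 2]
post_answers = [4, 4, 3, 5, 4]
effect = interpret_d(cohens_d(pre_answers, post_answers))
```

The p-value of the accompanying t-test is left out of this sketch, since it requires the t distribution, which the standard library does not provide.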
Statistical analysis was performed using the Google Sheets online tool, as well as Google Colab with Jupyter notebooks written in Python. Several commonly-used data analysis libraries were used (e.g., NumPy, Pandas, Pingouin).
This section presents and analyzes the main results obtained regarding the usability and acceptability of DigiMoCA, both from the end-users’ perspective (sample of n = 46) as well as the administrators’ (n = 24).
As explained in Section 2 , users completed the TAM questionnaire before and after the administration of DigiMoCA. The questionnaire included two sections, each with the 3 dimensions and 6 questions: one focused on technology in general, and another focused on DigiMoCA and conversational agents.
Table 3 presents the results of TAM’s 3-dimensional scale, taken from the post evaluation, regarding DigiMoCA’s section. Most relevant results are:
Perceived usefulness: a value of 3.87 ± 0.92 was obtained including all groups, with the highest rating within the MCI group (4.11 ± 0.92) and the lowest from the HC group (3.42 ± 0.93). Regarding the internal consistency of the answers, a value of \(\alpha \) = 0.63 was obtained, with the most internally consistent group being HC ( \(\alpha \) = 0.76) and the lowest MCI ( \(\alpha \) = 0.42).
Perceived ease of use: a value of 3.98 ± 0.96 was obtained including all groups. Once again, the highest mean value was found in the MCI group (4.14 ± 0.99), whereas the lowest rating was also obtained within the HC group (3.83 ± 0.96). In terms of internal consistency, a value of \(\alpha \) = 0.73 was obtained overall, being the HC group the most internally consistent ( \(\alpha \) = 0.96) and MCI the least one ( \(\alpha \) = 0.28).
Perceived satisfaction: including all groups we observe a value of 3.27 ± 1.21, in this case with the best rating coming from the AD group (3.47 ± 1.16), and the worst rating from the HC group (3.00 ± 1.22). Regarding the internal consistency, a value of \(\alpha \) = 0.41 was obtained, with the most internally consistent group being HC again ( \(\alpha \) = 0.56) and the least one being MCI ( \(\alpha \) = 0.25).
Overall, we consider these results to be rather positive: none of the ratings drop below 3 (out of 5) on average, either considering the overall sample or any particular group/sub-sample. This means that regardless of the level of cognitive deterioration, the users find DigiMoCA useful, easy to use and satisfactory.
Regarding the internal consistency, however, it is only “acceptable” for one of the dimensions (PEoU), with a worryingly low value for the PS dimension. We believe this inconsistency to be caused by the disparity of results obtained from the two questions regarding PS: the first asks about whether participants “liked to use DigiMoCA”, and the second whether they would rather “use DigiMoCA instead of T-MoCA”. We observe that the answers to the second part (i.e., after interacting with the agent) are considerably lower than to the first, perhaps due to the comparison between a human-robot interaction and a human-human interaction (which is usually strongly preferred by this demographic group).
Additionally, we can observe a tendency for the MCI group to give the highest ratings but with lowest internal consistency, whereas the HC group usually gives the lowest ratings but with highest internal consistency. One possible explanation for this behavior is that cognitive impairment can interfere with consistent reasoning; it is also likely that users with MCI had more trouble understanding the full implications of the questions posed, giving less consistent answers. Certainly, it is reasonable to believe that healthy users are generally more sensitive to the intrusiveness of these evaluations, hence the slightly lower ratings.
Tables 4 and 5 present the results of the perception variation between pre-administration and post-administration of DigiMoCA. Table 4 contains the results regarding the section about technology in general, while Table 5 contains the results of the section about conversational agents. Again, data is classified by TAM dimensions (rows), including the results for each individual question (“.1" and “.2" for each dimension). We also have the results obtained classified by cognitive group (columns): HC, MCI, AD and the whole sample.
The main objective of this analysis is to determine whether the acceptability perception of users has a significant change after the administration of DigiMoCA. For this, we performed a student’s T-test with pre and post questions, and obtained three metrics: percentage change between the averages, Cohen’s d and the statistical significance p . The following paragraphs address the main findings of this process.
Regarding the technology section, there is a percentage increase in all items of the first two dimensions: +6.17% for PU.1 (d = 0.33), +3.05% for PU.2 (d = 0.11), +5.26% for PEoU.1 (d = 0.17) and +9.00% for PEoU.2 (d = 0.44). However, there is only one item (PEoU.2) that exhibits a significant change (p = 0.010). Both items from the PS dimension remain essentially unchanged. Therefore, generally speaking, we can establish that the administration does not significantly change the acceptability of technology in senior adults, but we do observe a non-significant positive change in both PU and PEoU items. Furthermore, if we look at the sample sub-groups independently, we can also observe a positive non-significant change in the vast majority of items, only one of them being significant (PEoU.2 for AD group with +17.08% change; d = 0.84, p = 0.007).
With respect to the conversational agents section, the acceptability shows a more noticeable improvement across most items, three of them being statistically significant, and we also find the first item with a “large effect” size: PU.1 with +59.14% (d = 1.06, p < 0.001), PEoU.2 with +13.71% (d = 0.65, p = 0.005), PS.1 with +12.22% (d = 0.61, p = 0.005). We should also note that the PS.2 item has a significant decrease of -24.24% (d = 0.95, p < 0.001), but we do not think this particular item is a good representative of the PS dimension since, as stated previously, the pre and post questions are different, and thus it should be taken with a grain of salt. If we look at sample sub-groups independently, we can see that none of the significant changes are in the HC group, while most are concentrated in the MCI group: +85.84% (d = 1.29, p < 0.001) for PU.1, +21.61% (d = 1.04, p = 0.007) for PEoU.2, and +16.90% (d = 1.14, p = 0.003) for PS.1. Within the AD group, only PU.1 is statistically significant (+58.59%, d = 1.02, p = 0.013).
In light of the results discussed, it seems reasonable to affirm that the acceptability of conversational agents by senior adults improves significantly after the interaction with DigiMoCA. To support this, we found that at least one item exhibits a statistically significant (p < 0.05) positive change in all 3 dimensions, and if we discard item PS.2, which as pointed out above is probably not accurate, all items have an increase in acceptability across all groups.
In addition to the end-user interaction study, an additional study was carried out in order to measure the usability perception of DigiMoCA from cognitive assessment administrators and professionals. For this, we employed the PSSUQ questionnaire with items rated in a 7-point Likert scale, which is widely used to measure user’s perceived satisfaction of a software system. Table 6 summarizes the results, which are also categorized by gender, field of occupation and years of experience:
Overall usability (OVERALL): we obtain a 5.86 ± 1.24 mean value for all participants and all items. The mean rating does not excessively change based on gender or career experience, although the average rating for participants in the technological field was slightly higher (6.26 ± 0.94). The internal consistency obtained was “excellent" ( \(\alpha \) = 0.95) overall, with some slight differences based on gender ( \(\alpha \) being 0.88 for males and 0.97 for females), field of expertise ( \(\alpha \) = 0.96 for health field, 0.90 for technological field) and experience ( \(\alpha \) = 0.91 for administrators with 10+ years of experience, 0.97 for the ones with less than 10).
System usefulness (SYSUSE): including items 1 to 8, we obtain a mean value of 5.96 ± 1.14 for all participants. Again the mean rating is not considerably affected by gender or career experience, but we do obtain a slightly higher value of 6.36 ± 0.94 for participants in the technological field. As for the internal consistency of the answers, we get an “excellent" \(\alpha \) = 0.91 for the whole sample, although it does drop to just “good" for the male group ( \(\alpha \) = 0.85) and the most experienced participants ( \(\alpha \) = 0.88). The lowest internal consistency is found within the technological field, with an “acceptable" \(\alpha \) = 0.76.
Information quality (INFOQUAL): the mean value obtained from items 9 to 14 was 5.74 ± 1.44 overall. Once again, the highest differences found were based on the field of expertise: the technological field group had the highest mean value of 6.17 ± 1.22, while the lowest value was obtained from the health field group (5.63 ± 1.48). The overall internal consistency was \(\alpha \) = 0.90, and we do find differences between the demographic groups: higher consistency for females ( \(\alpha \) = 0.96) than males ( \(\alpha \) = 0.74); higher consistency for the health field group ( \(\alpha \) = 0.91) than the technological group ( \(\alpha \) = 0.79); and higher consistency for the least experienced individuals ( \(\alpha \) = 0.93) than the most experienced ( \(\alpha \) = 0.84).
Interface quality (INTERQUAL): including items 15 to 17, the overall mean rating was 5.81 ± 1.11. For this dimension the mean value for the technological field group was the highest (6.11 ± 0.65), and the mean value for the least experienced group was the lowest (5.71 ± 1.23). As for the internal consistency, this was the dimension with the lowest overall value, an “acceptable” \(\alpha \) = 0.77. Again we find considerable differences between demographic groups: higher for females ( \(\alpha \) = 0.88) than males ( \(\alpha \) = 0.34), higher for the health field ( \(\alpha \) = 0.80) than the technological field ( \(\alpha \) = 0.27), and higher for the less experienced group ( \(\alpha \) = 0.90) than for participants with 10+ years of experience ( \(\alpha \) = 0.42). This is the only dimension where the internal consistency drops below an “acceptable” level, probably due to the small number of items it considers (only three).
In light of the presented results, we observe that the overall usability perception is generally positive, slightly under 6 out of 7 points, and never drops below 5 for any of its dimensions, even if considering specific demographic groups based on gender, career field and experience.
We do observe a pattern between the groups: females provide slightly lower ratings than males, but with a higher internal consistency. The exact same happens between the health field group (i.e., slightly lower ratings and higher consistency) and the technological field group, as well as between the most experienced group and the least one. The fact that this pattern repeats across groups is expected, and it is probably due to the fact that the groups are overlapping: more males than females work in the tech field, and the males happen to be younger on average than females (34.9 years old vs. 40.14, cf. Table 2 ), hence the difference found between different seniority groups. Furthermore, we noticed that participants from the medical field made more comments suggesting improvement areas than participants from the technical field, particularly regarding the user interface.
As to why this pattern occurs, we believe it is justified since DigiMoCA is inherently a technological and disruptive screening tool. Therefore, it is to be expected that professionals from the technological field are more keen to using it, and generally more interested in it and curious about how it works. Conversely, it also makes sense that professionals from the health field are more “skeptical" and less interested, since the health field is generally more stable and less prone to disruptive changes [ 27 ], and certainly more people-oriented than tool-oriented.
Finally, the fact that the information and interface-related items obtain a slightly lower rating across all groups is justified, as one of the main drawbacks of using a voice-only communication channel is the restriction of the user interface, which lacks visual user interaction. This probably means that the PSSUQ questionnaire should be adapted in this context to new ICT tools based on conversational agents, where questions about the user interface either need a reformulation or simply to be excluded.
This paper discusses a user-interaction pilot study analyzing the usability and acceptability of DigiMoCA, a digital Alexa-based cognitive impairment screening tool based on T-MoCA, from both the end-users’ and the administrators’ perspectives.
In the case of end-users, a TAM questionnaire was utilized, administered both before and after DigiMoCA. Overall, the results show that users accept DigiMoCA, giving it a 3+ score in all three TAM dimensions, meaning that they perceive it as useful, easy to use and satisfactory. The perceived ease of use was particularly positive and internally consistent, with a mean score of 3.98. Additionally, the pre vs. post analysis shows that, while the acceptability of technology does not change significantly after the administration of DigiMoCA, the perceived acceptability of conversational agents specifically improves significantly. All three dimensions have an item with a statistically significant positive change. Moreover, the vast majority of non-significant changes were also positive.
In the case of test administrators, a PSSUQ questionnaire was used. Its results show that DigiMoCA is considered usable (mean score 5.86) very consistently ( \(\alpha \) = 0.95), with a score of 5+ out of 7 for all the dimensions and demographic groups. System usefulness was rated consistently higher than information and interface quality, and we find the biggest demographic differences between the health field group and the technological field group.
The sample size is one of the main limitations of the study. To estimate an ideal sample size, we first obtained an estimate of the prevalence of AD in Spain (10.285%; see Footnote 1). Then, for the required confidence level of 95%, we would need n = 142 participants per study group, which is far from the sample size achieved so far.
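The text does not state how n = 142 was derived. One plausible reading, offered here purely as an assumption, is the standard sample-size formula for estimating a proportion, n = z² · p · (1 − p) / e², with a margin of error e of 5%. Under that assumption the reported figure is reproduced, as the following sketch shows (the function name and the `margin` parameter are hypothetical).

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(prevalence, confidence=0.95, margin=0.05):
    """Sample size needed to estimate a proportion within +/- margin.

    The 5% margin of error is an assumption; it is not stated in the text.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * prevalence * (1 - prevalence) / margin ** 2)


# Prevalence of AD in Spain as quoted in the text (10.285%).
n_required = sample_size_for_proportion(0.10285)
```

With these inputs the formula gives roughly 141.8, which rounds up to the 142 participants per group quoted above.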
Future lines of work include further characterizing the sample and carrying out a study of acceptability and usability according to the technological training of the participants, including their relationship with technology throughout their lives. Additionally, it could be worth analyzing more objective metrics, such as participants’ response times, which could enrich the study of DigiMoCA.
Ongoing work addresses the improvement of the perceived satisfaction from using DigiMoCA, by making it more friendly, while also improving its interface and the information provided to the user, compensating for the limitations of voice-only interaction. As these aspects are improved so that user interaction with conversational agents is perceived as closer and closer to that with human administrators, the distinctive affordability and accessibility of smart assistant-based tests can effectively set them apart as a powerful screening technology.
All data supporting the findings of this study are available within the paper and its Supplementary Files, particularly the user responses to the usability and acceptability questionnaires.
Footnote 1. Source: Clinical Practice Guideline on Comprehensive Care for People with Alzheimer’s Disease and other dementias https://portal.guiasalud.es/wp-content/uploads/2018/12/GPC_484_Alzheimer_AIA- QS_resum.pdf
WHO (2023) Un decade of healthy ageing: Plan of action. https://cdn.who.int/media/docs/default-source/decade-of-healthy-ageing/decade-proposal-final-apr2020-en.pdf?sfvrsn=b4b75ebc_28
APA (2013) Diagnostic and statistical manual of mental disorders, 5th Edn. https://doi.org/10.1176/appi.books.9780890425596
Kowalska M, Owecki M, Prendecki M, Wize K, Nowakowska J, Kozubski W, Lianeri M, Dorszewska J (2017) Aging and neurological diseases. In: Senescence, IntechOpen, Ch. 5. https://doi.org/10.5772/intechopen.69499
Gallegos M, Morgan M, Cervigni M, Martino P, Murray J, Calandra M, Razumovskiy A, Caycho-Rodríguez T, Arias Gallegos W (2022) 45 years of the mini-mental state examination (mmse): A perspective from ibero-america. Dementia & Neuropsychologia. https://doi.org/10.1590/1980-5764-dn-2021-0097
Kueper J, Speechley M, Montero-Odasso M (2018) The alzheimer’s disease assessment scale–cognitive subscale (adas-cog): Modifications and responsiveness in pre-dementia populations. a narrative review. Journal of Alzheimer’s Disease. https://doi.org/10.3233/JAD-170991
Nasreddine Z, Phillips N, Bédirian V, Charbonneau S, Whitehead V, Collin I, Cummings J, Chertkow H (2005) The montreal cognitive assessment, moca: A brief screening tool for mild cognitive impairment. J Am Geriatr Soc 53:695–9. https://doi.org/10.1111/j.1532-5415.2005.53221.x
Katz M, Wang C, Nester C, Derby C, Zimmerman M, Lipton R, Sliwinski M, Rabin L (2021) T-moca: A valid phone screen for cognitive impairment in diverse community samples. Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring 13. https://doi.org/10.1002/dad2.12144
Nasreddine ZS (2021) Moca test: Validation of a five-minute telephone version. Alzheimer’s & Dementia 17. https://doi.org/10.1002/alz.057817
Pacheco-Lorenzo MR, Valladares-Rodríguez SM, Anido-Rifón LE, Fernández-Iglesias MJ (2021) Smart conversational agents for the detection of neuropsychiatric disorders: A systematic review. Journal of Biomedical Informatics 113. https://doi.org/10.1016/j.jbi.2020.103632
Otero-González I, Pacheco-Lorenzo MR, Fernández-Iglesias MJ, Anido-Rifón LE (2024) Conversational agents for depression screening: A systematic review. International Journal of Medical Informatics. https://doi.org/10.1016/j.ijmedinf.2023.105272
Pacheco-Lorenzo M, Fernández-Iglesias MJ, Valladares-Rodriguez S, Anido-Rifón LE (2023) Implementing scripted conversations by means of smart assistants. Software: Practice and Experience 53. https://doi.org/10.1002/spe.3182
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics. https://doi.org/10.2307/2529310
Valladares-Rodriguez S, Fernández-Iglesias MJ, Anido-Rifón L, Facal D, Rivas-Costa C, Pérez-Rodríguez R (2019) Touchscreen games to detect cognitive impairment in senior adults. a user-interaction pilot study, International Journal of Medical Informatics 127. https://doi.org/10.1016/j.ijmedinf.2019.04.012
Lancaster GA, Dodd S, Williamson PR (2004) Design and analysis of pilot studies: recommendations for good practice. Journal of Evaluation in Clinical Practice. https://doi.org/10.1111/j.2002.384.doc.x
Cocks K, Torgerson DJ (2013) Sample size calculations for pilot randomized trials: a confidence interval approach. Journal of Clinical Epidemiology. https://doi.org/10.1016/j.jclinepi.2012.09.002
Reisberg B, Torossian C, Shulman M, Monteiro I, Boksay I, Golomb J, Benarous F, Ulysse A, Oo T, Vedvyas A, Rao J, Marsh K, Kluger A, Sangha J, Hassan M, Alshalabi M, Arain F, Sh N, Buj M, Shao Y (2018) Two year outcomes, cognitive and behavioral markers of decline in healthy, cognitively normal older persons with global deterioration scale stage 2 (subjective cognitive decline with impairment). Journal of Alzheimer’s disease: JAD. https://doi.org/10.3233/JAD-180341
Montejo P, Peña M, Sueiro M (2012) The memory failures of everyday questionnaire (mfe): Internal consistency and reliability. The Spanish Journal of Psychology. https://doi.org/10.5209/rev_SJOP.2012.v15.n2.38888
Graf C (2008) The lawton instrumental activities of daily living (iadl) scale, AJN. American Journal of Nursing. https://doi.org/10.1097/01.NAJ.0000314810.46029.74
CSIC (2023) Un perfil de las personas mayores en españa 2023. https://envejecimientoenred.csic.es/wp-content/uploads/2023/10/enred-indicadoresbasicos2023.pdf
Abu Rbeian AH, Owda A, Owda M (2022) A technology acceptance model survey of the metaverse prospects, AI. https://doi.org/10.3390/ai3020018
Lewis JR (1992) Psychometric evaluation of the post-study system usability questionnaire: The pssuq. Proceedings of the Human Factors Society Annual Meeting. https://doi.org/10.1177/154193129203601617
Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika. https://doi.org/10.1007/BF02310555
Gliem JA, Gliem RR (2003) Calculating, interpreting, and reporting cronbach’s alpha reliability coefficient for likert-type scales. https://hdl.handle.net/1805/344
Mishra P, Singh U, Pandey CM, Mishra P, Pandey G (2019) Application of student’s t-test, analysis of variance, and covariance. Annals of Cardiac Anaesthesia. https://doi.org/10.4103/aca.ACA_94_19
Thalheimer W, Cook S (2002) How to calculate effect sizes from published research: A simplified methodology. Work-Learning Research. https://api.semanticscholar.org/CorpusID:145490810
Tellez A, Garcia Cadena C, Corral-Verdugo V (2015) Effect size, confidence intervals and statistical power in psychological research. Psychology in Russia: State of the Art. https://doi.org/10.11621/pir.2015.0303
Nadarzynski T, Miles O, Cowie A, Ridge D (2019) Acceptability of artificial intelligence (ai)-led chatbot services in healthcare: A mixed-methods study. Digital Health. https://doi.org/10.1177/2055207619871808
We acknowledge the contributions and support of author’s colleague Noelia Lago, as well as the staff at AFAGA (Miriam Fortes and Maxi Rodríguez) and Centro de Día Parque Castrelos (Ángeles Álvarez), and all of the participants of this study, without whom this work would not be possible.
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Funding for open access charge: CISUG/Universidade de Vigo. This work has been partially funded by Ministerio de Ciencia e Innovación, project SAPIENS- Services and applications for a healthy aging [grant PID2020-115137RB-I00 funded by MCIN/AEI/10.13039/501100011033] and by the Ministry of Science, Innovation and Universities [grant FPU19/01981] (Formación de Profesorado Universitario).
Authors and affiliations.
atlanTTic, University of Vigo, 36310, Vigo, Spain
Moisés R. Pacheco-Lorenzo, Luis E. Anido-Rifón & Manuel J. Fernández-Iglesias
Department of Electronics and Computing, USC, 15782, Santiago de Compostela, Santiago de Compostela, Spain
Sonia M. Valladares-Rodríguez
Moisés R. Pacheco-Lorenzo : administration of questionnaires, statistical analysis and writing.
Sonia Valladares-Rodriguez : statistical analysis and writing.
Manuel J. Fernández-Iglesias : supervision, writing, review and editing.
Luis E. Anido-Rifón : supervision, writing, review and editing.
Correspondence to Moisés R. Pacheco-Lorenzo .
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
Pacheco-Lorenzo, M.R., Anido-Rifón, L.E., Fernández-Iglesias, M.J. et al. Will senior adults accept being cognitively assessed by a conversational agent? a user-interaction pilot study. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05558-z
Accepted : 23 May 2024
Published : 15 June 2024
DOI : https://doi.org/10.1007/s10489-024-05558-z
Published on 18.6.2024 in Vol 26 (2024)
Authors of this article:
1 Inserm, Sorbonne Université, université Paris 13, Laboratoire d’informatique médicale et d’ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France
2 Service de santé publique et information médicale, CHU de Saint Etienne, 42000 Saint-Etienne, France
3 Institut National de la Santé et de la Recherche Médicale, Université Jean Monnet, SAnté INgéniérie BIOlogie St-Etienne, SAINBIOSE, 42270 Saint-Priest-en-Jarez, France
Marie-Christine Jaulent, PhD
Sorbonne Université
université Paris 13, Laboratoire d’informatique médicale et d’ingénierie des connaissances en e-santé, LIMICS, F-75006
15 rue de l'école de Médecine
Paris, 75006
Phone: 33 144279108
Email: [email protected]
Background: To mitigate safety concerns, regulatory agencies must make informed decisions regarding drug usage and adverse drug events (ADEs). The primary pharmacovigilance data stem from spontaneous reports by health care professionals. However, underreporting poses a notable challenge within the current system. Explorations into alternative sources, including electronic patient records and social media, have been undertaken. Nevertheless, social media’s potential remains largely untapped in real-world scenarios.
Objective: The challenge faced by regulatory agencies in using social media is primarily attributed to the absence of suitable tools to support decision makers. An effective tool should enable access to information via a graphical user interface, presenting data in a user-friendly manner rather than in their raw form. This interface should offer various visualization options, empowering users to choose representations that best convey the data and facilitate informed decision-making. Thus, this study aims to assess the potential of integrating social media into pharmacovigilance and enhancing decision-making with this novel data source. To achieve this, our objective was to develop and assess a pipeline that processes data from the extraction of web forum posts to the generation of indicators and alerts within a visual and interactive environment. The goal was to create a user-friendly tool that enables regulatory authorities to make better-informed decisions effectively.
Methods: To enhance pharmacovigilance efforts, we have devised a pipeline comprising 4 distinct modules, each independently editable, aimed at efficiently analyzing health-related French web forums. These modules were (1) web forums’ posts extraction, (2) web forums’ posts annotation, (3) statistics and signal detection algorithm, and (4) a graphical user interface (GUI). We showcase the efficacy of the GUI through an illustrative case study involving the introduction of the new formula of Levothyrox in France. This event led to a surge in reports to the French regulatory authority.
Results: Between January 1, 2017, and February 28, 2021, a total of 2,081,296 posts were extracted from 23 French web forums. These posts contained 437,192 normalized drug-ADE couples, annotated with the Anatomical Therapeutic Chemical (ATC) Classification and Medical Dictionary for Regulatory Activities (MedDRA). The analysis of the Levothyrox new formula revealed a notable pattern. In August 2017, there was a sharp increase in posts related to this medication on social media platforms, which coincided with a substantial uptick in reports submitted by patients to the national regulatory authority during the same period.
Conclusions: We demonstrated that conducting quantitative analysis using the GUI is straightforward and requires no coding. The results aligned with prior research and also offered potential insights into drug-related matters. Our hypothesis received partial confirmation because the final users were not involved in the evaluation process. Further studies, concentrating on ergonomics and the impact on professionals within regulatory agencies, are imperative for future research endeavors. We emphasized the versatility of our approach and the seamless interoperability between different modules over the performance of individual modules. Specifically, the annotation module was integrated early in the development process and could undergo substantial enhancement by leveraging contemporary techniques rooted in the Transformers architecture. Our pipeline holds potential applications in health surveillance by regulatory agencies or pharmaceutical companies, aiding in the identification of safety concerns. Moreover, it could be used by research teams for retrospective analysis of events.
Social media as a complementary data source for pharmacovigilance.
One primary mission of regulatory agencies such as the FDA (Food and Drug Administration) or the EMA (European Medicines Agency) is to monitor drug usage and adverse drug events (ADEs) to mitigate the risks associated with drugs within the population. This task entails analyzing diverse data sources, including clinical trials, postmarketing surveillance, spontaneous reporting systems, and published scientific literature. Despite the wealth of available data, some ADEs are not always detected promptly, largely because of underreporting. In France, for instance, underreporting was estimated to range between 78% and 99% from 1997 to 2002 [ 1 ]. To tackle this challenge, several countries have implemented systems allowing patients to report ADEs.
Additional sources for detecting ADEs have been under exploration, such as electronic patient records [ 2 - 4 ] and social media platforms [ 5 - 9 ]. While some argue that social media alone cannot serve as a primary source for signal detection [ 10 ], it can be viewed as a valuable secondary source for monitoring emerging adverse drug reactions or reinforcing signals previously identified through spontaneous reports stored in traditional pharmacovigilance databases [ 11 ]. In a prior study by the authors, patient profiles and reported ADEs found in web forums were compared with those in the French Pharmacovigilance Database (FPVD). The forums tended to represent younger patients, more women, less severe cases, and a higher incidence of psychiatric disorder–related ADEs compared with the FPVD [ 12 ]. Moreover, forums reported a greater number of unexpected ADEs. Over the past decade, several tools for evaluating social media posts have been described in the literature [ 13 ]. Specifically, effective ADE detection in social media necessitates both quantitative and qualitative analyses of data [ 14 ].
Qualitative assessment entails evaluating whether users’ messages contain pertinent information for an assessment akin to a pharmacovigilance case report. This includes details such as the patient’s age and gender, the severity of the case, the expectedness and timeline of the adverse event, time-to-onset, dechallenge (outcome upon drug withdrawal), and rechallenge (outcome upon drug reintroduction). For instance, GlaxoSmithKline Inc. implemented the qualitative approach Insight Explorer, which facilitates the collection of extensive data for causality and quality assessment. Users can input data including personal information (eg, age range, gender) and product details (eg, name, route of administration, duration of use, dosage). This approach was adapted for the WEB-RADR (Recognizing Adverse Drug Reactions) project to manually construct a gold standard of curated patient-authored text [ 15 ].
Quantitative evaluation involves analyzing extracted data using descriptive and analytical statistics, such as signal detection and change-point analysis. Numerous projects have been undertaken to monitor ADEs on social media. One of the earliest projects is the PREDOSE (Prescription Drug Abuse Online Surveillance and Epidemiology) project [ 5 ], which investigates the illicit use of pharmaceutical opioids reported in web forums. While the PREDOSE project showcased the potential of leveraging social media for opioid monitoring, notable limitations are the lack of deidentification and signal detection methods. MedWatcher Social, a monitoring platform for health-related web forums, Twitter, and Facebook, represents a prototype application developed in 2014 [ 16 ]. Yeleswarapu et al [ 6 ] outlined a semiautomatic pipeline that applies natural language processing (NLP) tasks to extract ADEs from MEDLINE abstracts and user comments from health-related websites. However, this pipeline was not intended for routine use.
The Domino’s interface [ 17 ], developed in 2018 by the University of Bordeaux in France and funded by the French Medicines Agency (Agence nationale de sécurité du médicament et des produits de santé [ANSM]), was designed to analyze drug misuses in health-related web forums using NLP methods and the summary of product characteristics. Initially tailored for antidepressant drugs, this tool does not primarily focus on ADE surveillance.
Another pipeline, described by Nikfarjam et al in 2019 [ 7 ], used a neural network–based named entity recognition system specifically designed for user-generated content in social media. This platform is dedicated to identifying the association of cutaneous ADEs with cancer therapy drugs. The study focused on a selection of drugs and only examined 8 ADEs.
Magge et al [ 8 ] described a pipeline aimed at the extraction and normalization of adverse drug mentions on Twitter. Their pipeline consisted of an ADE classifier designed to identify tweets mentioning an ADE, which were then mapped to a MedDRA (Medical Dictionary for Regulatory Activities Terminology) code. However, the normalization process was confined to the ADEs present in the training set. Neither Nikfarjam’s nor Magge’s pipeline provides a graphical user interface.
Some private companies also offer tools for analyzing social media for pharmacovigilance purposes. For instance, the DETECT platform was developed as part of a collaborative project in France by Kappa Santé [ 18 ]. This system enabled the labeling of posts with known controlled vocabulary concepts, and signal detection was conducted [ 19 ]. Within the scope of this project, Expert System Company implemented BIOPHARMA Navigator to extract web forum posts, while the Luxid Annotation Server provided web services for the automatic annotation of posts.
An important finding from the studies of the last decade is that, while regulatory agencies have begun using data sources beyond spontaneous reports, social media has yet to be fully leveraged in real-world settings because the available solutions are immature. Most of these solutions are proofs of concept that lack scalability and are difficult for experts to evaluate routinely, mainly because they lack a graphical user interface to present information.
Our aim was to assess the potential of integrating social media into pharmacovigilance and enhancing decision-making with this novel data source. To achieve this, our objective was to develop and assess a pipeline that processes data from the extraction of web forum posts to the generation of indicators and alerts within a visual and interactive environment. The goal was to create a user-friendly tool that enables regulatory authorities to make better-informed decisions effectively.
This article presents the design and implementation of our pipeline dedicated to harnessing posts from social media. In addition, we showcase the use of the pipeline through a specific use case, emphasizing the importance of monitoring drugs in social media to better address patients’ expectations.
The PHARES project (Pharmacovigilance in Social Networks), funded from 2017 to 2019 by the French ANSM, aimed to develop a software suite (a pipeline) enabling pharmacovigilance users to analyze social networks, particularly messages posted on forums. The objective of the pipeline is to facilitate routine use through continuous post extraction and quantitative data analysis from web forums, specifically tailored for the French language.
The pipeline is made up of 4 modules, each referring to its own methods ( Figure 1 ):
The Scraper module, which extracts posts from forums using a previously developed tool, Vigi4Med (V4M) scraper [ 9 ], and produces a comma-separated values (CSV) file filled with the texts extracted.
The Annotation module, which extracts elements of interest from the posts and registers annotations in CSV files, with each line representing an annotation of an ADE or a drug. When a causality relationship is identified, both an ADE and a drug are annotated on the same line.
The Statistical module, which performs quantitative analysis on the annotated posts, generating numerical data, tables, or figures.
The Interface module, which supports query definition and visualization of results.
The methodology used to evaluate the PHARES pipeline involved comparing its performance with existing platforms mentioned above, in accordance with a set of criteria established with prospective PHARES users. The criteria, specific to each module, are as follows:
V4M Scraper is an open-source tool designed for data extraction from web forums [ 9 ]. Its primary functions are optimizing scraping time, filtering out posts primarily focused on advertisements, and structuring the extracted data semantically. The module takes as input a configuration file that contains the URL of the targeted forum. The algorithm navigates through forum pages and generates resource description framework (RDF) triplets for each extracted element, allowing for potential alignment with external semantic resources. A caching mechanism maintains a local copy of previously visited pages, which avoids redundant requests to websites for already scraped web pages, for example after an error or during testing. V4M Scraper was customized for the PHARES project, as indicated by the red elements in Figure S1 in Multimedia Appendix 1 . The database format (Figure S2 in Multimedia Appendix 1 ) was implemented to enhance interaction with the interface. Specifically, the main scraping script was adjusted to produce a simplified tabular format (CSV) of the extracted data and to store these data in a database, which facilitates input to the subsequent module of the pipeline (annotation). The scraper was also customized to enable a continuous scraping routine, wherein data extracted from web forums are automatically and regularly annotated and registered. A log file was integrated into the scraper structure to record the last scraped element, so that the daily scraping routine always resumes from that point. An automation tool (crontab) schedules the execution of the pipeline for each forum on a daily basis at a specific time.
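The log-based resumption described above can be illustrated with a minimal Python sketch. It is not the V4M Scraper code: the function names, the JSON log layout, and the assumption that post identifiers increase over time are all hypothetical and serve only to show the idea of restarting from the last scraped element.

```python
import json
from pathlib import Path


def load_last_scraped(log_path):
    """Return the identifier of the last scraped post, or None on a first run."""
    path = Path(log_path)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8")).get("last_post_id")


def scrape_new_posts(posts, log_path):
    """Keep only posts newer than the logged one, then update the log.

    `posts` is a list of (post_id, text) tuples ordered from oldest to newest;
    increasing identifiers are an assumption made for this sketch.
    """
    last_id = load_last_scraped(log_path)
    new_posts = [(pid, text) for pid, text in posts
                 if last_id is None or pid > last_id]
    if new_posts:
        # Record the newest identifier so that the next daily run resumes here.
        Path(log_path).write_text(
            json.dumps({"last_post_id": new_posts[-1][0]}), encoding="utf-8")
    return new_posts
```

In such a setup, a daily scheduled job would call `scrape_new_posts` once per forum, each forum having its own log file.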
A total of 23 public French health-related web forums were selected through Google searches and from a list of certified health websites provided by the HON Foundation, in collaboration with the French National Health Authority (HAS). The selection criteria required websites to be hosted in France, to feature a discussion board or space for sharing experiences, and to have more than 10 patient contributions. Furthermore, Twitter posts are collected through the Twitter API and then processed by the same modules used for web forum posts.
Entities corresponding to drugs and pathological conditions in social media were identified and annotated using an NLP pipeline [ 20 ]. First, conditional random fields were used to account for global dependencies [ 21 ]. Specifically, the model considers the entire sequence when making predictions for individual tokens. This approach is advantageous for entity extraction tasks, as the presence of an entity in one part of the text can influence the likelihood of other entities in the vicinity. Second, a support vector machine was used to predict the causality relationship between an entity identified as a drug and another entity identified as an ADE. The annotation method used in this module was implemented at an early stage of the pipeline’s design. Currently, the named entity recognition task of this module is undergoing revision to incorporate more recent advancements in NLP algorithms [ 22 - 26 ].
In a third step, the detected annotations were normalized using codes from the MedDRA and the Anatomical Therapeutic Chemical (ATC) classification to ensure they were suitable for signal detection purposes.
MedDRA is an international medical hierarchical terminology comprising 5 levels used to code potential ADEs in pharmacovigilance. The highest level is the system organ class, which is further divided into high-level group terms, then into high-level terms, preferred terms (PTs), and finally lowest level terms. Typically, the PT level is used in pharmacovigilance signal detection.
The ATC classification system is a drug classification used in France for pharmacovigilance purposes. It categorizes the active ingredients of drugs based on the organ system they primarily affect. The classification comprises 5 levels: the anatomical main group (consisting of 14 main groups), the therapeutic subgroup, the therapeutic/pharmacological subgroup, the chemical/therapeutic/pharmacological subgroup, and the chemical substance. Typically, the fifth level (chemical substance) is used in pharmacovigilance signal detection.
The outputs of the annotation module are CSV files with the following variables:
In these CSV files, each line consists of an adverse drug event (ADE) annotation, a drug annotation, or both when a causality relationship has been identified between the drug and the ADE. Table 1 provides a sample of the database.
In a prior study, we selected posts where at least one ADE associated with 6 drugs (agomelatine, baclofen, duloxetine, exenatide, strontium ranelate, and tetrazepam) had been detected by this algorithm. A manual review revealed that among 5149 posts, 1284 (24.94%) were validated as pharmacovigilance cases [ 12 ]. The fundamental metrics used to assess the performance of the annotation module were precision (P), recall (R), and their harmonic mean, the F1-score. To calculate these metrics, it is necessary to evaluate false negatives for nonrecognition of relevant terms, false positives for irrelevant recognitions, and true positives for correct recognitions. Precision, recall, and F1-score are defined as follows:
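$$\text{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad \text{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad F_1\text{-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{1}$$

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.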
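As a worked illustration of Equation 1, the short Python sketch below computes the 3 metrics from raw counts. The counts in the usage lines are hypothetical and are not results from this study; the function assumes that the denominators are nonzero.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1-score from annotation counts.

    tp: correct recognitions, fp: irrelevant recognitions,
    fn: relevant terms that were not recognized.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 correct, 10 irrelevant, 30 missed recognitions.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```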
In the “Results” section, we present a comparison of the performance of the annotation module with the performance of state-of-the-art methods [ 8 , 22 , 25 , 26 ].
Forum name | Post ID | Date | Time | ADE verbatim | ADE normalized | Concept unique identifier | Drug verbatim | Drug normalized | Active ingredient | MedDRA code | ATC code |
---|---|---|---|---|---|---|---|---|---|---|---|
Atoute | 7354 | October 8, 2018 | 21:37:00 | Maux de tête | Céphalée | C0018681 | Lévothyrox | LEVOTHYROX | Levothyroxine sodique | — | H03AA01 |
Atoute | 7354 | October 8, 2018 | 21:37:00 | Maux de tête | Céphalée | C0018681 | Calcium | — | — | — | — |
Atoute | 7354 | October 8, 2018 | 21:37:00 | Nodules cancereux | — | — | Lévothyrox | LEVOTHYROX | Levothyroxine sodique | — | H03AA01 |
Atoute | 7354 | October 8, 2018 | 21:37:00 | Nodules cancereux | — | — | Calcium | — | — | — | — |
Atoute | 7354 | October 8, 2018 | 21:37:00 | Fatigue | Fatigue | C0015672 | Lévothyrox | LEVOTHYROX | Levothyroxine sodique | 10016256 | H03AA01 |
Atoute | 7354 | October 8, 2018 | 21:37:00 | fatigue | Fatigue | C0015672 | Calcium | — | — | 10016256 | — |
Atoute | 7354 | October 8, 2018 | 21:37:00 | Perte de poids | Poids diminué | C0043096 | Lévothyrox | LEVOTHYROX | Levothyroxine sodique | 10048061 | H03AA01 |
Atoute | 7354 | October 8, 2018 | 21:37:00 | Perte de poids | Poids diminué | C0043096 | Calcium | — | — | 10048061 | — |
a ADE: adverse drug event.
b MedDRA: Medical Dictionary for Regulatory Activities Terminology.
c ATC: Anatomical Therapeutic Chemical.
d No data are available for this slot.
This module generates general statistics and diagrams for web forums or Twitter. It provides data such as the number of annotated posts (related to the drug, the ADE, or both), the count of drug-ADE pairs identified, and the distribution of ADEs’ MedDRA-PTs. In addition, a change-point analysis method was used to detect significant changes over time in the mean number of posts mentioning the drug and ADE [ 27 ].
In addition, several statistical signal detection methods were implemented to generate potential signals. Safety signals, which provide information on adverse events that may be caused by a medicine, were further evaluated by pharmacovigilance experts to determine the causal relationship between the medicine and the reported adverse event.
The statistical module implements 3 signal detection methods, including 2 well-known and frequently used disproportionality signal detection methods: the proportional reporting ratio (PRR) [ 28 ] and the reporting odds ratio (ROR) [ 29 ]. In addition, a complementary method, a logistic regression–based signal detection method known as the class imbalanced subsampling lasso [ 30 ], was used.
PRR and ROR are akin to a relative risk and an odds ratio, respectively. However, they differ in their denominators: as the number of exposed patients is typically unknown in pharmacovigilance databases, the denominator in PRR and ROR calculations is the number of cases reported in the pharmacovigilance database.
PRR and ROR are specific to each drug-ADE pair and can be directly computed from the contingency table ( Table 2 ).
 | Adverse drug event of interest | Other adverse drug events |
---|---|---|
Drug of interest | a | b |
Other drugs | c | d |
The PRR compares the proportion of an ADE among all the ADEs reported for a specific drug with the same proportion for all other drugs in the database (Equation 2). A PRR significantly greater than 1 suggests that the ADE is more frequently reported for patients taking the drug of interest, while a PRR equal to 1 suggests independence between the 2 variables.
$$\mathrm{PRR} = \frac{a/(a+b)}{c/(c+d)} \tag{2}$$
The ROR quantifies the strength of the association between drug administration and the occurrence of the ADE. It represents the ratio of the odds of drug administration when the ADE is present to the odds of drug administration when the ADE is absent (Equation 3). When the 2 events are independent, the ROR equals 1. An ROR significantly greater than 1 suggests that drug administration is associated with the presence of the ADE.
$$\mathrm{ROR} = \frac{a\,d}{b\,c} \tag{3}$$
We considered events over posts for the calculation of disproportionality statistics. If the same drug-ADE pair was identified multiple times within a post, the pair was counted as many times as it occurred in the calculation.
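The following Python sketch shows how the cells of Table 2 and the 2 disproportionality statistics could be computed from a list of annotated drug-ADE pairs. It is an illustration under stated assumptions rather than the code of the statistical module: the function names and the example pairs are hypothetical, and zero cells (which make the ratios undefined) are not handled.

```python
def contingency_counts(pairs, drug, ade):
    """Count the cells a, b, c, d of Table 2 from (drug, ADE) pairs.

    `pairs` holds one tuple per annotated drug-ADE pair. Repeated pairs are
    kept, so a pair found twice in a post is counted twice.
    """
    a = b = c = d = 0
    for pair_drug, pair_ade in pairs:
        if pair_drug == drug:
            if pair_ade == ade:
                a += 1
            else:
                b += 1
        elif pair_ade == ade:
            c += 1
        else:
            d += 1
    return a, b, c, d


def prr(a, b, c, d):
    """Proportional reporting ratio (Equation 2)."""
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting odds ratio (Equation 3)."""
    return (a * d) / (b * c)


# Hypothetical annotations, for illustration only.
pairs = ([("drug A", "fatigue")] * 20 + [("drug A", "nausea")] * 80
         + [("drug B", "fatigue")] * 10 + [("drug B", "nausea")] * 190)
cells = contingency_counts(pairs, drug="drug A", ade="fatigue")
print(cells, prr(*cells), ror(*cells))
```

With these made-up counts, fatigue accounts for 20% of the pairs for drug A against 5% for the other drug, which is what a PRR above 1 expresses.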
Disproportionality analysis has certain limitations, including the confounding effect resulting from coreported drugs and the masking effect, where the background relative reporting rate of an ADE is distorted by extensive reporting on the ADE with a specific drug or drug group. Caster et al [ 31 ] demonstrated through 2 real case examples how multivariate regression–based approaches can address these issues. Harpaz et al also suggested that logistic regression could be used for safety surveillance [ 32 ]. These approaches were initially designed for pharmacovigilance case reports; we hypothesize that they may also be applicable to posts.
The logistic regression model specifically focuses on a particular ADE or a group of ADEs. It involves creating a vector that represents the presence (1) or absence (0) of the ADE of interest in the pharmacovigilance case (in our case, in the post). Additionally, a matrix is generated to represent the administration or nonadministration of all drugs in the database by the patient (1 for administration and 0 for nonadministration). Figure S3 in Multimedia Appendix 1 illustrates an example of using logistic regression. In our case, we assumed that if a drug was annotated in the post, it was taken by the patient. The logistic regression aims to predict the probability of the presence of the ADE of interest (ADE=1) based on the presence of all $N_m$ drugs in the database (Equation 4), where $X$ represents the distribution of the presence/absence of the drugs. The adjusted factors included only concomitant medications, as patient-related factors are often missing in web forums’ posts. Therefore, we did not need to address the impact of missing data, which should be evaluated when necessary.
$$\ln\frac{P(X \mid \mathrm{ADE}=1)}{P(X \mid \mathrm{ADE}=0)} = a + b_1 \times \mathrm{Drug}_1 + \dots + b_i \times \mathrm{Drug}_i + \dots + b_{N_m} \times \mathrm{Drug}_{N_m} \tag{4}$$
The selection of the drugs depends on the parameter $b_i$. If $b_i < 0$, the drug $i$ decreases the risk of the ADE, and if $b_i > 0$, the drug $i$ increases the risk of the ADE.
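To make the inputs of the regression concrete, the sketch below builds the 0/1 outcome vector and the 0/1 drug matrix from annotated posts. The post structure (a dict with hypothetical `drugs` and `ades` fields) and the function name are assumptions made for this illustration; fitting the penalized regression itself is left to a statistics library.

```python
def build_design(posts, ade_of_interest):
    """Return (drug_names, X, y) for the logistic regression.

    y[k] is 1 if the ADE of interest is annotated in post k, else 0.
    X[k][i] is 1 if drug i is annotated in post k (assumed taken), else 0.
    """
    drug_names = sorted({drug for post in posts for drug in post["drugs"]})
    column = {drug: i for i, drug in enumerate(drug_names)}
    X, y = [], []
    for post in posts:
        row = [0] * len(drug_names)
        for drug in post["drugs"]:
            row[column[drug]] = 1
        X.append(row)
        y.append(1 if ade_of_interest in post["ades"] else 0)
    return drug_names, X, y


# Hypothetical annotated posts, for illustration only.
posts = [
    {"drugs": ["drug A", "drug B"], "ades": ["fatigue"]},
    {"drugs": ["drug B"], "ades": []},
    {"drugs": ["drug C"], "ades": ["nausea"]},
]
drug_names, X, y = build_design(posts, "fatigue")
print(drug_names, X, y)
```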
Then, 2 sets are defined: $S_1$, the set of the $n_1$ posts in which the ADE of interest is annotated, and $S_0$, the set of the $n_0$ posts in which it is not.
In our case, $n_0 \gg n_1$, indicating a significant imbalance toward posts lacking annotations of the ADEs of interest. To address this issue, we took a subsample with a more favorable ratio of posts with annotated ADEs versus those without. Additionally, to enhance result stability, we conducted multiple draws instead of just one.
In practice, we generated $B$ subsamples. Each subsample was constructed by randomly drawing, with replacement, $n_1$ posts from $S_1$ and $R$ posts from $S_0$, where $R = \max(4 n_1, 4 N_m)$. The choice of $4 n_1$ was inspired by case-control studies, while $4 N_m$ was included to ensure an adequate number of observations considering the multitude of predictors.
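The subsampling step can be sketched in a few lines of Python. This is only an illustration of the drawing scheme described above, with hypothetical function and parameter names; the lasso-penalized regression fitted on each subsample is not shown.

```python
import random


def draw_subsamples(s1, s0, n_drugs, n_subsamples, seed=0):
    """Draw B subsamples, each made of n1 posts from S1 and R posts from S0.

    s1: posts annotated with the ADE of interest; s0: all other posts.
    Draws are made with replacement, and R = max(4 * n1, 4 * N_m).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n1 = len(s1)
    r = max(4 * n1, 4 * n_drugs)
    return [(rng.choices(s1, k=n1), rng.choices(s0, k=r))
            for _ in range(n_subsamples)]


# Hypothetical post identifiers: 5 posts with the ADE, 100 without.
subsamples = draw_subsamples(s1=list(range(5)), s0=list(range(100, 200)),
                             n_drugs=3, n_subsamples=10)
cases, controls = subsamples[0]
print(len(subsamples), len(cases), len(controls))
```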
We implemented a change-point analysis method described in [ 27 ] to detect whether there was a change in the evolution over time of a chosen statistic, such as the number of a specific drug-ADE pair, the number of ADEs associated with a specific drug, or the number of drugs associated with a specific ADE. The method uses the Cumulative Sum (CUSUM) algorithm to analyze the evolution of statistics over time, comparing current values with the period mean. It identifies breakpoints by calculating the highest difference in statistical values and comparing it with random samples. The process repeats for periods before and after detected breakpoints until no more are found.
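The procedure summarized above can be sketched as follows. This is a simplified reading of a CUSUM-based change-point search with random reordering of the values, not necessarily the exact procedure of [ 27 ]: the confidence threshold, the number of random samples, the minimum segment size, and all function names are assumptions made for the illustration.

```python
import random


def cusum(values):
    """Cumulative sums of deviations from the period mean (starting at 0)."""
    mean = sum(values) / len(values)
    sums = [0.0]
    for value in values:
        sums.append(sums[-1] + (value - mean))
    return sums


def find_change_points(values, n_boot=1000, confidence=0.95,
                       min_size=4, seed=0, offset=0):
    """Return the indices at which a new regime starts, in increasing order."""
    if len(values) < min_size:
        return []
    sums = cusum(values)
    observed_range = max(sums) - min(sums)

    # Reorder the values at random and count how often the CUSUM range
    # stays below the observed one.
    rng = random.Random(seed)
    shuffled = list(values)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)
        boot = cusum(shuffled)
        if max(boot) - min(boot) < observed_range:
            below += 1
    if below / n_boot < confidence:
        return []

    # The breakpoint is where the CUSUM is farthest from zero; k is the
    # number of points in the first segment.
    k = max(range(1, len(values)), key=lambda i: abs(sums[i]))
    left = find_change_points(values[:k], n_boot, confidence,
                              min_size, seed, offset)
    right = find_change_points(values[k:], n_boot, confidence,
                               min_size, seed, offset + k)
    return left + [offset + k] + right


# Hypothetical monthly counts with a jump at index 10.
monthly_posts = [1] * 10 + [10] * 10
print(find_change_points(monthly_posts))
```

The recursion on the segments before and after each detected breakpoint mirrors the repetition described in the text, and it stops when a segment shows no significant change or becomes too short.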
The user interface module facilitates user interaction with the pipeline in a user-friendly manner. The interface comprises a dashboard divided into 2 main parts. The left dark column ( Figure 2 ) serves as a control sidebar, where users can select parameters to filter the data, including the forum, period, drug(s) according to the ATC classification, and ADE(s) according to a level in the MedDRA hierarchy. On the right side of the interface, various visualizations are available, organized into several tabs such as “Forum Statistics” and “Consultation of Posts,” with additional tabs for statistics that become active upon querying.
Before applying a specific query, the interface provides general information about the currently available data ( Figure 2 ), including the total annotated posts since 2017 (n=2,081,296) and total annotations since 2017 (n=2,454,310). In addition, a “Consultation of Tweets” tab (not visible in the figure) displays the total annotated tweets since March 2020 (n=46,153).
Furthermore, several tabs corresponding to different types of statistics, including “Forums Statistics” and “Twitter Statistics,” provide general statistics and diagrams for web forums and Twitter. Examples of these are pie charts showing forum distribution, line charts depicting the evolution of drug and ADE mentions, histograms displaying ADE distribution by system organ class, and line charts illustrating the temporal trend of posts containing the drug and an ADE, as shown in Figures 3 and 4 . The “Annotations Plot” tab displays annotations of drugs and adverse effects selected by the user, along with forum information, PTs, high-level terms, high-level group terms, dates, and hours. The “Logistic Regression” tab allows users to choose parameters for applying logistic regression. In the “Disproportionality” tab, users can choose between the PRR and ROR methods, with the time evolution of the chosen method displayed. The “Change-Point” tab enables analysis of temporal evolution, with identified breakpoints indicated. The “Consultation of Posts” and “Consultation of Tweets” tabs provide details on annotated posts/tweets, including downloadable tables. The statistical module performs calculations based on user queries, updating the interface accordingly. If multiple drugs or adverse events are selected, they are treated as new entities for analysis.
The interface was implemented using the R language and environment (R Foundation) for statistical computing and graphics [ 33 ], leveraging the Shiny package [ 34 ] for development.
A statement by an Institutional Review Board was not required because we used only publicly available data that do not necessitate Institutional Review Board review.
This study complied with the European General Data Protection Regulation (GDPR), which has been in force since 2018 in Europe [ 35 ]. The GDPR enhances the protection of individuals by introducing the right to be informed about the processing of personal data. However, informing each user individually may be impractical. Therefore, the GDPR introduces 2 legal conditions where informed consent is not mandatory, which can be interpreted as supporting the processing of web forum posts for pharmacovigilance (Article 9): “(e) processing relates to personal data which are manifestly made public by the data subject; [. . .] (i) processing is necessary for reasons of public interest in the area of public health, such as [. . .] ensuring high standards of quality and safety of health care and of medicinal products . . ..” The GDPR also requires data processing to “not permit or no longer permits the identification of data subjects” (Article 89). Deidentification was conducted during the extraction of posts from web forums to ensure privacy [ 9 ]. User identifiers in the main RDF file were hashed using the SHA-1 algorithm [ 36 ]. The correspondence between these hashed identifiers and the original keys is presented in RDF triplets in a separate file, referred to as the “keys file.” Therefore, the only way to retrieve the original authors’ identities is by combining the main RDF file containing the hashed identifiers with the keys file, which is kept in a secured location. Moreover, all our data processing was carried out on a secured server with restricted access.
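The separation between deidentified data and the keys file can be illustrated with Python's standard `hashlib` module. The sketch is hypothetical: the post fields and function names are assumptions, and plain dictionaries stand in for the RDF files used by the scraper.

```python
import hashlib


def pseudonymize(user_id):
    """Return the SHA-1 hexadecimal digest of a forum user identifier."""
    return hashlib.sha1(user_id.encode("utf-8")).hexdigest()


def split_identifiers(posts):
    """Replace author names with digests and collect the digest-to-author keys.

    `posts` is a list of dicts with hypothetical "author" and "text" fields.
    The returned `keys` mapping plays the role of the separate keys file.
    """
    deidentified, keys = [], {}
    for post in posts:
        digest = pseudonymize(post["author"])
        keys[digest] = post["author"]
        deidentified.append({"author": digest, "text": post["text"]})
    return deidentified, keys


# Hypothetical post, for illustration only.
data, keys = split_identifiers([{"author": "user42", "text": "..."}])
print(data[0]["author"], len(keys))
```

Only someone holding both outputs can map a digest back to a user name, which is why the keys are stored apart from the main data.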
The primary outcome of this study is the operational PHARES pipeline itself. Daily extraction and annotation of posts are initiated and imported into the database linked to the user interface. In this paper, the platform’s use will be demonstrated through a specific use case on the analysis of Levothyrox ADE mentions in forums (discussed later). In addition, we conducted a comparative analysis of the PHARES pipeline with the existing platforms mentioned in the “Introduction” section, based on the criteria listed in the “Methods” section.
Of the 10 identified pipelines, half were public and half were private. While 8 out of 10 focused on ADEs, only 4 were designed for routine usage. Five scrapers were open source, and all posts from considered websites were extracted by only 6 of the scrapers (with others extracting posts under certain conditions). Six scraped web forum posts, but only 3 performed deidentification. Additionally, 4 pipelines focused on the French language. A total of 6 pipelines displayed the temporal evolution of the number of posts, but only 1 conducted a change-point analysis. Signal detection methods were performed by only 4 of them, with none displaying the temporal evolution of the PRR nor a logistic regression–based method. Finally, 6 of them had an interface ( Table 3 ).
Pipeline | General: focus on ADEs | General: routine usage | General: public/private | Scraper: all posts | Scraper: deidentification | Scraper: web forums | Scraper: open source | Annotation: French language | Statistics: temporal evolution | Statistics: change-point analysis | Signal detection: signal detection | Signal detection: PRR temporal evolution | Signal detection: logistic regression | Interface |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PREDOSE | X | ✓ | Public | ✓ | X | ✓ | ✓ | X | ✓ | X | X | X | X | ✓ |
Insight Explorer | ✓ | X | Private | X | X | X | ✓ | X | X | X | X | X | X | ✓ |
MedWatcher Social | ✓ | ✓ | Public | X | X | ✓ | ✓ | X | ✓ | X | ✓ | X | X | ✓ |
Yeleswarapu et al [ ] | ✓ | X | Private | X | X | X | X | X | X | X | ✓ | X | X | X |
Domino | X | ✓ | Public | ✓ | X | ✓ | ✓ | ✓ | ✓ | X | X | X | X | ✓ |
Nikfarjam et al [ ] | ✓ | X | Public and Private | X | X | X | X | X | X | X | X | X | X | X |
Magge et al [ ] | ✓ | X | Public | ✓ | X | X | ✓ | X | ✓ | X | X | X | X | X |
ADR-PRISM | ✓ | X | Public and Private | ✓ | ✓ | ✓ | X | ✓ | ✓ | X | ✓ | X | X | ✓ |
Kappa Santé | ✓ | ✓ | Private | ✓ | ✓ | ✓ | X | ✓ | ✓ | ✓ | ✓ | X | X | ✓ |
Expert System | ✓ | X | Private | ✓ | ✓ | ✓ | X | ✓ | X | X | X | X | X | ✓ |
a PHARES: Pharmacovigilance in Social Networks.
b The X symbol means that the characteristic is missing and the symbol ✓ means the characteristic is fulfilled.
c ADE: adverse drug event.
d PRR: proportional reporting ratio.
e PREDOSE: Prescription Drug Abuse Online Surveillance and Epidemiology.
f ADR-PRISM: Adverse Drug Reaction from Patient Reports in Social Media.
We also compared the performance of our annotation process with that of current state-of-the-art methods ( Table 4 ).
While the annotation module demonstrated good performance for named entity recognition (F1-score=0.886), it remains slightly below the state of the art. Presently, in medical texts, the best performances are achieved by Hussain et al [ 25 ] and Ding et al [ 26 ] for the named entity recognition task, and by Xia [ 22 ] for the relationship extraction task. On Twitter, known for its notably more complex data, Hussain et al [ 25 ] achieved slightly better results than our annotator, while Ding et al [ 26 ] achieved slightly worse results.
Annotator | Language | Data | Natural language processing method | Named entity recognition (precision; recall; F1-score) | Relationship extraction (precision; recall; F1-score) |
---|---|---|---|---|---|
PHARES | French | Patient’s web drug review | Conditional random fields and support vector machines | 0.926; 0.845; 0.886 | 0.683; 0.956; 0.797 |
Magge et al [ ] | English | SMM4H-2020 data set (Twitter/X) | BERT neural networks | 0.82; 0.76; 0.78 | — |
Xia [ ] | English | Medical texts | HAMLE model | — | 0.929; 0.914; 0.921 |
Hussain et al [ ] | English | Medical texts (PubMed) and Twitter | BERT | 0.982; 0.964; 0.976 (PubMed) and 0.840; 0.861; 0.896 (X/Twitter) | — |
Ding et al [ ] | English | Medical texts (PubMed) and Twitter | BGRU + char LSTM attention + auxiliary classifier | 0.867; 0.948; 0.906 (PubMed) and 0.785; 0.914; 0.844 (Twitter) | — |
a The 2 categories are entity recognition, which is the detection of a drug or ADE mention, and relationship extraction, which is the detection of a relation between a drug and an ADE.
b PHARES: Pharmacovigilance in Social Networks.
c BERT: Bidirectional Encoder Representations from Transformer.
d Not available.
e HAMLE: Historical Awareness Multi-Level Embedding.
f BGRU: Bidirectional Gated Recurrent Unit.
g LSTM: Long-Short-Term-Memory.
From January 1, 2017, to February 28, 2021, a total of 2,081,296 posts were extracted from 23 French web forums ( Table 5 ). We obtained 713,057 normalized annotations of drugs, 1,527,004 normalized annotations of ADEs, and 437,192 annotations of normalized drug-ADE couples. The number of posts annotated with at least one normalized drug-ADE couple was 125,279 (6.02%). Table 5 summarizes the number of posts extracted per forum, the publication dates, and the description of each web forum. For 1 forum, the publication dates were not available. A total of 9 were generalist health forums, 3 were specialized for parents of a young baby, 2 for families, 3 for mothers, 2 specialized in thyroid issues, 1 for pregnant women, 1 for women, 1 for parents of a teenager or for teenagers, 1 for sports persons, and 1 specialized in rare diseases.
Forum | Extracted posts, n | Publication date of the first extracted post | Publication date of the last extracted post | Description |
---|---|---|---|---|
thyroideNEW | 451,253 | February 15, 2001 | February 25, 2021 | Specialized in thyroid issues |
doctissimoSante | 248,691 | March 19, 2003 | January 16, 2021 | Generalist health forum |
doctissimoNutrition | 183,730 | December 30, 2002 | January 16, 2021 | Specialized in nutrition |
infoBebe | 127,341 | November 30, 2000 | March 08, 2019 | Specialized for parents of a young baby |
atoute | 118,415 | February 05, 2005 | February 28, 2021 | Generalist health forum |
notreFamille | 97,098 | March 16, 2000 | October 26, 2017 | Specialized for families |
magicMaman | 96,713 | June 14, 1999 | February 22, 2021 | Specialized for mothers |
doctissimoMed | 95,531 | August 05, 2002 | January 15, 2021 | Generalist health forum |
doctissimoGrossesse | 93,449 | November 09, 2006 | January 15, 2021 | Specialized for pregnant women |
thyroide | 73,376 | September 25, 2001 | January 07, 2019 | Specialized in thyroid issues |
aufeminin | 72,732 | April 05, 2001 | January 09, 2020 | Specialized for women |
mamanVie | 69,167 | June 07, 2006 | April 10, 2019 | Specialized for mothers |
onmeda | 61,428 | July 25, 2001 | February 24, 2021 | Generalist health forum |
ados | 58,181 | June 20, 2006 | March 08, 2019 | Specialized for parents of a teenager or for teenagers |
carenity | 52,659 | May 16, 2011 | August 29, 2020 | Generalist health forum |
famili | 51,844 | November 06, 2000 | November 17, 2019 | Specialized for families |
babyFrance | 43,806 | January 20, 2003 | April 30, 2018 | Specialized for parents of young baby |
bebeMaman | 38,450 | — | — | Specialized for mothers of young baby |
alloDocteurs | 15,833 | June 15, 2009 | February 09, 2021 | Generalist health forum |
reboot | 9383 | May 04, 2016 | February 25, 2021 | Generalist health forum |
futura | 6765 | May 12, 2003 | February 22, 2021 | Generalist health forum |
sportSante | 6350 | May 10, 2011 | January 14, 2020 | Specialized for sportsperson |
maladieRares | 4827 | October 09, 2012 | May 14, 2020 | Specialized in rare diseases |
queChoisir | 4250 | June 16, 2003 | February 11, 2021 | Generalist health forum |
a Not available.
To demonstrate the usage of the pipeline, we chose to focus on Levothyrox as a case study. Levothyrox is a drug prescribed in France since 1980 for hypothyroidism and circumstances where it is necessary to limit the thyroid-stimulating hormone. In 2017, a new formula of Levothyrox, differing from the 30-year-old drug at the excipient level (with lactose being replaced by mannitol and citric acid in the new formula), was marketed with widespread media coverage. In parallel, an unexpected increase in notifications of ADEs for this drug was detected. Viard et al [ 37 ] were unable to find any pharmacological rationale to explain that signal. Approximately 32,000 adverse effects were reported by patients in France in 2017, representing 42% of all the ADEs collected yearly [ 38 ]. Most of these notifications concerned the new formulation of Levothyrox and led to the “French Levothyrox crisis.” In 2017, 1664 notifications of ADEs were spontaneously reported by patients to the Pharmacovigilance Center of Nice. Among the 1544 reviewed notifications, 1372 concerned Levothyrox while only 172 concerned other drugs [ 37 ].
In this use case, the study period was from January 1, 2017, to February 28, 2021, and the drugs included were 2 drugs from the “H03AA Thyroid hormones” ATC class: “Levothyroxine sodium” and “associations of levothyroxine and liothyronine.” A total of 17 forums were selected as they included at least one post with information about these drugs. Posts from these forums were extracted, annotated, and analyzed through the pipeline ( Table 6 ). Signal detection methods were applied to “tiredness,” an ADE chosen because it frequently appeared with Levothyrox in our data. A signal is detected when the lower bound of the 95% CI of the logarithm of the PRR is greater than 0. For logistic regression, we applied the tenth quantile. A total of 11,340 posts contained an annotation concerning the drugs of interest. Figure S4 in Multimedia Appendix 1 illustrates the source and evolution over time of these posts. The 50,127 annotations of Levothyrox principally originated from the Vivre sans thyroïde forum and were mostly posted in mid-2017 ( Figure 4 , Table 6 ). The results of the statistical analysis were displayed by the user interface.
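The PRR decision rule stated above can be sketched in a few lines of Python. This is an illustrative sketch, not the code of the PHARES statistics module: it assumes the usual 2×2 contingency table (a = posts with the drug and the ADE, b = posts with the drug without the ADE, c = posts with other drugs and the ADE, d = posts with other drugs without the ADE) and the standard large-sample standard error of the logarithm of the PRR. The function names and the counts in the usage example are invented for illustration and are not study data.

```python
import math


def log_prr_interval(a, b, c, d, z=1.96):
    """Return (lower, estimate, upper) for ln(PRR) with a 95% CI by default.

    Assumes all four cell counts are nonzero (no continuity correction).
    """
    prr = (a / (a + b)) / (c / (c + d))
    log_prr = math.log(prr)
    # Large-sample standard error of ln(PRR)
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return log_prr - z * se, log_prr, log_prr + z * se


def is_signal(a, b, c, d):
    """A signal is raised when the lower bound of the CI of ln(PRR) exceeds 0."""
    lower, _, _ = log_prr_interval(a, b, c, d)
    return lower > 0


if __name__ == "__main__":
    # Invented counts: 30 of 100 drug posts mention the ADE,
    # against 100 of 1000 posts about other drugs.
    print(log_prr_interval(30, 70, 100, 900))
    print(is_signal(30, 70, 100, 900))
```

Recomputing this interval on cumulative counts month by month would give a curve of the kind described for the temporal evolution of the PRR.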
ADEs annotated with Levothyrox were mainly from system organ classes: general disorders and administration site conditions (29.6%), metabolism and nutrition disorders (11.6%), and endocrine disorders (11.4%). The PTs mostly found in association with Levothyrox are listed in Table 7 . All this information is accessible in the interface module (Figure S5 in Multimedia Appendix 1 ).
We chose the PT “tiredness” for the signal detection analysis. A total of 85,976 posts were annotated with either one of the drugs of interest or the ADE tiredness. Among them, 1841 Levothyrox-tiredness couples were found, mostly in 2017 ( Table 7 ).
Figure 5 illustrates the time evolution of the PRR for the Levothyrox-tiredness couple. Figure S6 in Multimedia Appendix 1 displays the source and evolution over time of French web forums’ posts for this couple. A signal is consistently generated throughout the period as the logarithm of the PRR is always greater than 0.
Forum | Value, n | Cumulative frequency, % |
---|---|---|
Vivre sans thyroïde | 41,211 | 82.21 |
Doctissimo Santé | 4230 | 90.65 |
Doctissimo Grossesse | 1476 | 93.60 |
Doctissimo Nutrition | 1177 | 95.94 |
Carenity | 863 | 97.67 |
Allo docteurs | 502 | 98.67 |
Atoute | 170 | 99.01 |
Doctissimo medicaments | 166 | 99.34 |
Que choisir | 85 | 99.51 |
Maladie rares | 76 | 99.66 |
Au feminin | 58 | 99.77 |
Sport santé | 50 | 99.87 |
Onmeda | 48 | 99.97 |
Famili | 7 | 99.98 |
Futura | 5 | 99.99 |
Maman vie | 2 | 100.00 |
Magic maman | 1 | 100.00 |
Preferred terms | Values, n |
---|---|
Pain | 1882 |
Tiredness | 1841 |
Faintness | 1267 |
Hypothyroidism | 1110 |
Dizziness | 912 |
Insomnia | 627 |
Palpitations | 571 |
Hyperthyroidism | 568 |
Malignant tumor | 560 |
Anxiety | 498 |
Overdose | 490 |
Nervous tension | 484 |
Myalgia | 409 |
Nausea | 388 |
Stress | 380 |
Diarrhea | 354 |
Tachycardia | 322 |
Muscle spasms | 321 |
Convulsions | 302 |
Arthralgia | 276 |
A total of 11 drugs were found to be associated with tiredness using logistic regression: paclitaxel, pegfilgrastim, Levothyrox, glatiramer acetate, escitalopram ferrous sulfate, the combination of Levothyrox and liothyronine, secukinumab, methotrexate, bismuth potassium, tetracycline, and metronidazole.
Change-point analysis was conducted on the monthly evolution of the number of Levothyrox-ADE couples detected in web forums. Six breakpoints were identified ( Figure 6 ), and 3 of them correlated with an increase in the number of ADEs found with Levothyrox on web forums. These increases occurred in August 2017 and in September and December 2018.
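As a rough illustration of what a change-point analysis does with a monthly count series, the sketch below locates a single shift in the mean by minimizing the within-segment squared error. This is a deliberately simplified stand-in and not necessarily the method used in the statistics module; the function name and the example series are invented.

```python
def best_split(series):
    """Return (index, cost) of the split that minimizes the total
    within-segment sum of squared deviations from each segment mean."""

    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_index, best_cost = None, float("inf")
    for k in range(1, len(series)):  # both segments must be non-empty
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index, best_cost


if __name__ == "__main__":
    # Invented monthly counts with an abrupt increase at position 4
    monthly_counts = [10, 12, 11, 9, 50, 52, 49, 51]
    print(best_split(monthly_counts))
```

Applying the same search recursively to each resulting segment (binary segmentation) is one common way to obtain several breakpoints, each of which can then be checked for whether it corresponds to an increase or a decrease.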
This use case demonstrates that the results obtained through the pipeline, particularly in the context of Levothyrox, align with findings in the literature derived from more traditional data sources such as case reports in pharmacovigilance (see the “Discussion” section). It underscores the potential of leveraging such a pipeline to monitor a drug, not only retrospectively but also in real time using social media. Consequently, PHARES could help uncover new signals in pharmacovigilance.
To align with our objective, we implemented and evaluated a pipeline that processes data from the extraction of web forum posts to the generation of indicators and alerts within a visual and interactive environment. Through this pipeline, we demonstrated that quantitative analysis can be conducted through the interface without requiring the user to code. We found that it is feasible to obtain information about a drug’s ADEs that is consistent with the literature, as well as to identify unexpected ADEs and the dates of significant events related to a drug. This underscores the relevance and utility of such a pipeline.
A conceptual contribution of this research was the proposal of a methodology for designing a pipeline to facilitate pharmacovigilance studies on web forums. This involved describing 4 independent modules and outlining their interactions. Additionally, another contribution was the adaptation of certain pharmacovigilance analysis methods for the examination of data extracted from web forum posts. The logistic regression–based method presented in this article was originally tailored for pharmacovigilance cases to consider co-prescriptions of drugs. We have adapted it to suit the analysis of pharmacovigilance data extracted from web forum posts.
The PHARES pipeline offers added value compared with previous pipelines in terms of the criteria set, which reflects an analysis of experts’ needs for routine monitoring of ADEs in social media. Unlike previous approaches, the scrapers used in PHARES routinely perform deidentification, and the inclusion of change-point analysis, the evolution of PRRs over time, and a logistic regression–based signal detection method were previously unavailable. The temporal evolution of the number of posts and a signal detection method are also seldom supported. Designed for routine usage and focused on ADEs, all posts from selected web forums are scraped and deidentified using an open-source scraper.
The period and selected web forums differed between our study and that of Audeh et al [ 38 ]: Audeh et al [ 38 ] covered the period from January 2015 to December 2017, while our study spanned from January 2017 to February 2021. Additionally, Audeh et al [ 38 ] included only 1 web forum specialized in thyroid issues, whereas we incorporated this specific forum along with 16 others. The main ADEs associated with Levothyrox in our study align with those found by Audeh et al [ 38 ] on similar data, albeit without using the interface. In our study, the 10 most frequent symptoms were pain, tiredness, faintness, hypothyroidism, dizziness, insomnia, palpitations, hyperthyroidism, malignant tumor, and anxiety. By contrast, Audeh et al [ 38 ] reported tiredness, weight gain, pain, ganglions, hot flush, chilly, inflammation, faintness, weight loss, and discomfort.
Furthermore, the PHARES pipeline surpasses previous efforts, particularly regarding several criteria. These include the annotation tool, where only 4 pipelines were identified using a French annotator tool. In terms of available statistics, only 1 pipeline met both criteria we identified. Regarding signal detection, among the 3 criteria identified, 5 pipelines matched with only 1, while the remaining 5 matched with none.
In the use case, the change-point analysis detected a notable increase in the number of ADEs associated with Levothyrox in August 2017, a few months after the introduction of the new formula in March 2017. This surge coincided with the initial declaration to the pharmacovigilance network and a petition initiated by patients to reintroduce the former formula in June 2017. We compared these findings with results from a pharmacovigilance study based on spontaneous reporting. Out of 1554 notifications spontaneously addressed by patients to the Pharmacovigilance Center of Nice from January 1, 2017, to December 31, 2017, 1372 were related to the new formula of Levothyrox, representing 7342 ADEs. Our comparison with these data clarified our findings. The 10 most frequently reported ADEs in these notifications closely resembled our own results [ 37 ]. These were asthenia, headache, dizziness, hair loss, insomnia, cramps, weight gain, nausea, muscle pain, and irritability. Consequently, our results demonstrate coherence with the existing literature. This study illustrates the feasibility of identifying the date of significant events related to a drug. However, it is noteworthy that the detection of such events is not necessarily expedited through social media compared with the traditional pharmacovigilance system.
The method used in our annotation process was integrated at an early stage during the pipeline’s design. Regarding the identification of drugs and symptoms, our annotation process exhibited the following performances: precision=0.926, recall=0.845, and F1-score=0.886 [ 20 ]. Similarly, for discerning the relationship between the drug and the ADEs, the performances were precision=0.683, recall=0.956, and F1-score=0.797 [ 20 ]. This study was the first publication on using NLP methods to identify ADEs in French-language web forums. The annotation process was thus developed using contemporary state-of-the-art methodologies at the time. However, it would now stand to gain from the integration of more recent NLP algorithms for named entity recognition [ 8 , 23 , 24 ]. These newer algorithms offer comparable performances while effectively handling more complex data, thereby enhancing the efficacy of NLP analysis. However, because of our emphasis on the genericity of the approach and the interoperability between the different modules rather than solely focusing on the performance of each module, we opted not to use these algorithms. Nevertheless, contemporary state-of-the-art methods for annotating ADEs from social media posts encompass convolutional neural networks trained on top of pretrained word vectors for sentence-level classification [ 24 ] and transformers using the bidirectional encoder representations from transformers (BERT) language model [ 39 ]. Hussain et al [ 25 ] introduced a multitask neural network based on BERT with hyperparameter optimization capable of sentence classification and named entity recognition. This model achieved performances of precision=0.840, recall=0.861, and F1-score=0.896 on the Twitter (X)-TwiMed data set. Additionally, Magge et al [ 8 ] presented a pipeline consisting of 3 BERT neural networks designed to classify sentences, extract named entities, and normalize those entities to their respective MedDRA concepts. The performances of this model were as follows: precision=0.82, recall=0.76, and F1-score=0.78 on the SMM4H-2020 data set (Twitter/X). Thanks to our modular design, it will be straightforward to substitute our current annotation process with an enhanced model in the future.
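For readers less familiar with these metrics, the F1-score is conventionally defined as the harmonic mean of precision (P) and recall (R). The worked example below assumes this standard definition and uses the relationship extraction values reported for our annotator; how the cited studies averaged their scores across classes is not specified here, so other reported figures need not follow from the formula exactly.

```latex
F_1 = \frac{2 \cdot P \cdot R}{P + R}
    = \frac{2 \times 0.683 \times 0.956}{0.683 + 0.956}
    = \frac{1.305896}{1.639}
    \approx 0.797
```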
Several limitations should be acknowledged for future work. First, the scraper relies on the HTML structure of web forums, necessitating updates to its configuration files if a forum alters its page design. Additionally, our interface lacks the capability to incorporate alternate identifiers for drugs or ADEs. For instance, patients may commonly refer to the drug “baclofen” as “baclo” on social media platforms. Consequently, the number of posts pertaining to a drug or ADE could potentially be underestimated.
Forums must be selected before query execution to limit computation time. However, selecting forums based on the presence of information related to a particular drug or ADE can introduce bias into signal detection methods, particularly in disproportionality analysis, where the drug-ADE pair may be overrepresented. Another limitation in qualitative analysis of posts is the inability of users to edit annotations or record typical pharmacovigilance qualitative data.
The assumption that all drugs mentioned in a post were consumed simultaneously by the user, as applied in the logistic regression–based method, introduces an evident bias.
One limitation associated with the use of social media data pertains to fraudulent posts. The pseudonymity inherent in these platforms provides malevolent individuals with the opportunity to disseminate false rumors. Additionally, patients might post identical or similar messages across multiple discussion boards, or even multiple times on the same board. Thus, it is crucial to consider these factors to mitigate biases in signal detection.
In the short to medium term, our objectives are updating the annotation module to enhance accuracy, improving the qualitative analysis by enabling users to edit and correct annotations, and expanding the range of signal detection methods available in the statistics module.
This method could indeed be beneficial for identifying potential drug misuse and unknown ADEs [ 40 ]. By categorizing pathological terms found in web forums based on their presence in the summary of product characteristics, we can distinguish between indications, known ADEs, and potential instances of drug misuse or unexpected ADEs. However, it is important to note that considering all pathological terms found in the summary of product characteristics as indications might obscure cases of drug inefficiency. Therefore, a nuanced approach is necessary to ensure comprehensive and accurate analysis.
We next tested our pipeline from the perspective of end users. However, the hypothesis was only partially confirmed, indicating the need for further studies. These studies should include evaluations with ergonomic criteria.
In the long term, our vision is to expand this tool to encompass other languages and themes beyond pharmacovigilance. This includes areas such as drug misuse, the consumption of food supplements, and the use of illegal drugs. French web forums dedicated to recreational drug use already exist, providing a valuable source of data for such endeavors.
Our hypothesis focused on the challenge encountered by regulatory agencies in using social media, primarily because of the lack of appropriate decision-making tools. To tackle this challenge, we devised a pipeline consisting of 4 editable modules aimed at effectively analyzing health-related French web forums for pharmacovigilance purposes. Using this pipeline and its user-friendly interface, we successfully demonstrated the feasibility of conducting quantitative analyses without the need for coding. This approach yielded coherent results and holds the potential to reveal new insights about drugs.
A practical implication of our pipeline is its potential application in health surveillance by regulatory agencies such as the ANSM or pharmaceutical companies. It can be instrumental in detecting issues related to drug safety and efficacy in real time. Furthermore, research teams can leverage this tool to retrospectively analyze events and gain valuable insights into pharmacovigilance trends.
The annotation module was developed by François Morlane-Hondère, Cyril Grouin, Pierre Zweigenbaum, and Leonardo Campillos-Llanos from the Computer Science Laboratory for Mechanics and Engineering Sciences (LIMSI). Code review for the graphical user interface in R language was performed by Stevenn Volant under a contract with the Stat4Decision company. Stat4Decision was not involved in designing the study and writing this article. This work was funded by the Agence nationale de sécurité du médicament et des produits de santé (ANSM) through Convention No. 2016S076 and was supported by a PhD contract with Sorbonne Université.
Our data were extracted from web forums that do not allow data sharing. Thus, as we are not the owners of the data, we cannot make the data available. The scraper we developed to extract these data is open source and can be used to extract data from web forum posts. The tool as well as full documentation (in English and French) of the code and configuration file are available online [ 41 ].
None declared.
Vigi4Med Scraper structure, PHARES database structure, example of data representation, and source and evolution over time of web forum posts. PHARES: Pharmacovigilance in Social Networks.
ADE: adverse drug event
ANSM: Agence nationale de sécurité du médicament et des produits de santé
ATC: Anatomical Therapeutic Classification
BERT: Bidirectional Encoder Representations from Transformer
CSV: comma-separated values
CUSUM: Cumulative Sum
EMA: European Medicines Agency
FDA: Food and Drug Administration
FPVD: French Pharmacovigilance Database
GDPR: General Data Protection Regulation
HAS: French National Health Authority
MedDRA: Medical Dictionary for Regulatory Activities Terminology
NLP: natural language processing
PHARES: Pharmacovigilance in Social Networks
PREDOSE: Prescription Drug Abuse Online Surveillance and Epidemiology
PRR: proportional reporting ratio
PT: preferred term
RDF: resource description framework
ROR: reporting odds ratio
Recognizing Adverse Drug Reactions
Edited by A Mavragani; submitted 01.02.23; peer-reviewed by S Matsuda, L Shang; comments to author 06.07.23; revised version received 20.10.23; accepted 12.03.24; published 18.06.24.
©Pierre Karapetiantz, Bissan Audeh, Akram Redjdal, Théophile Tiffet, Cédric Bousquet, Marie-Christine Jaulent. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.06.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
These modules were (1) web forums' posts extraction, (2) web forums' posts annotation, (3) statistics and signal detection algorithm, and (4) a graphical user interface (GUI). We showcase the efficacy of the GUI through an illustrative case study involving the introduction of the new formula of Levothyrox in France.