a N/A: not applicable.
b E: expert.
Based on the external experts’ comments, the recommendations were reanalyzed. Of the 244 recommendations, 61 (25%) were deleted because they were duplicated or redundant, 48 (19.7%) were merged with other complementary recommendations, 62 (25.4%) were rewritten for clarification and language standardization, 14 (5.7%) were split into 2 or more recommendations, and 59 (24.2%) were not changed. This resulted in a preliminary list of 175 recommendations. Table 2 compares the external experts’ recommendations and the internal experts’ final decisions.
Comparison of external experts’ recommendations and internal experts’ decisions.
Type of action | External experts’ recommendations (N=263 a), n (%) | Internal experts’ decision (N=244), n (%) |
Deleted | 44 (16.7) | 61 (25.0) |
Merged | 108 (41.1) | 48 (19.7) |
Rewritten | 29 (11.0) | 62 (25.4) |
Split | 4 (1.5) | 14 (5.7) |
Not changed | 78 (29.7) | 59 (24.2) |
a Consensus was not possible for 19 recommendations.
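The counts and percentages in Table 2 can be sanity-checked with a short script; the dictionaries below simply transcribe the table, and the helper function is our own illustration:

```python
# Counts transcribed from Table 2.
external = {"Deleted": 44, "Merged": 108, "Rewritten": 29, "Split": 4, "Not changed": 78}
internal = {"Deleted": 61, "Merged": 48, "Rewritten": 62, "Split": 14, "Not changed": 59}

def with_percentages(counts):
    """Attach the percentage of the column total to each count."""
    total = sum(counts.values())
    return {k: (v, round(100 * v / total, 1)) for k, v in counts.items()}

# Column totals match the table headers (N=263 and N=244) ...
assert sum(external.values()) == 263
assert sum(internal.values()) == 244
# ... and their difference is the 19 recommendations without consensus (footnote a).
assert sum(external.values()) - sum(internal.values()) == 19
# Spot-check two reported percentages.
assert with_percentages(internal)["Deleted"] == (61, 25.0)
assert with_percentages(external)["Merged"] == (108, 41.1)
```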
The 175 recommendations were then categorized into 12 mutually exclusive principles (feedback, recognition, flexibility, customization, consistency, errors, help, accessibility, navigation, privacy, visual, and emotional) and, within each principle, organized into 2 levels of hierarchy according to their specificity/level of detail.
Of the 175 recommendations, 70 were categorized as level 1, comprising generic recommendations applicable to all digital solutions, and 105 were categorized as level 2, each linked to 1 level 1 recommendation and subdivided by type of digital solution/interaction paradigm. The two levels are linked in that level 2 recommendations detail how level 1 recommendations can be implemented. For example, the level 1 recommendation that “the system should be used efficiently and with a minimum of fatigue” is linked to a set of level 2 recommendations targeted at specific interaction paradigms, such as feet interaction and robotics: (1) “In feet interaction, the system should minimize repetitive actions and sustained effort, using reasonable operating forces and allowing the user to maintain a neutral body position,” and (2) “In robotics, the system should have an appropriate weight, allowing the person to move the robot easily (this can be achieved by using back drivable hardware).” Table 3 shows the distribution of the 175 recommendations.
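One minimal way to represent this two-level structure in code is sketched below; the class and field names are our own illustration (the study does not prescribe a data format), and the principle assigned to the fatigue example is an assumption:

```python
from dataclasses import dataclass, field

@dataclass
class Level2Recommendation:
    paradigm: str  # technology/interaction paradigm, eg, "feet interaction"
    text: str

@dataclass
class Level1Recommendation:
    principle: str  # one of the 12 principles
    text: str
    details: list = field(default_factory=list)  # linked level 2 recommendations

# The fatigue example from the text: one generic level 1 recommendation
# linked to paradigm-specific level 2 recommendations (wordings abridged).
fatigue = Level1Recommendation(
    principle="accessibility",  # assumption: the category is not stated in the text
    text="The system should be used efficiently and with a minimum of fatigue",
    details=[
        Level2Recommendation("feet interaction",
                             "Minimize repetitive actions and sustained effort..."),
        Level2Recommendation("robotics",
                             "Appropriate weight so the robot is easy to move..."),
    ],
)
assert len(fatigue.details) == 2
```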
Distribution of recommendations by level and category.
Category | Level 1 (N=70), n | Level 2 (N=105; by technology/interaction paradigm), n | Total (N=175), n |
Feedback | 6 | 5 | 11 |
Recognition | 5 | 12 | 17 |
Flexibility | 6 | 10 | 16 |
Customization | 7 | 6 | 13 |
Consistency | 2 | 2 | 4 |
Errors | 5 | 7 | 12 |
Help | 3 | 2 | 5 |
Accessibility | 8 | 23 | 31 |
Navigation | 6 | 6 | 12 |
Privacy | 3 | 5 | 8 |
Visual component | 16 | 22 | 38 |
Emotional component | 3 | 5 | 8 |
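The internal consistency of Table 3 (level 1 and level 2 counts summing to the row and column totals) can be verified in a few lines; the tuples transcribe the table:

```python
# (level 1 count, level 2 count) per category, transcribed from Table 3.
table3 = {
    "Feedback": (6, 5), "Recognition": (5, 12), "Flexibility": (6, 10),
    "Customization": (7, 6), "Consistency": (2, 2), "Errors": (5, 7),
    "Help": (3, 2), "Accessibility": (8, 23), "Navigation": (6, 6),
    "Privacy": (3, 5), "Visual component": (16, 22), "Emotional component": (3, 5),
}
level1_total = sum(l1 for l1, _ in table3.values())
level2_total = sum(l2 for _, l2 in table3.values())
assert level1_total == 70
assert level2_total == 105
assert level1_total + level2_total == 175
# Row totals, eg, accessibility (8 + 23 = 31) and visual component (16 + 22 = 38).
assert sum(table3["Accessibility"]) == 31
assert sum(table3["Visual component"]) == 38
```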
A total of 14 experts (8 females and 6 males) with a mean age of 35 (SD 8.8) years provided feedback on the recommendations. Experts were user interface designers (n=6, 43%) and user interface researchers (n=8, 57%) who had a background in design (n=8, 57%) or communication and technology sciences (n=6, 43%). The interviews lasted up to 2 hours each.
All 175 recommendations reached consensus on the usefulness question. However, for question 2 (Do you consider this recommendation mandatory?), there was consensus that 54 (77%) of the level 1 recommendations were mandatory; the remaining 16 (23%) level 1 recommendations were considered not mandatory by 5 (36%) to 9 (64%) of the experts. For the 105 level 2 recommendations, there was consensus that 91 (87%) were mandatory, and the remaining 14 were considered not mandatory by 5 (36%) to 9 (64%) of the experts.
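The proportions reported above follow directly from the counts; the consensus threshold itself is not stated in this excerpt, so the helper below only reproduces the arithmetic:

```python
def pct(part, whole):
    """Percentage rounded to the nearest integer, matching the reporting style."""
    return round(100 * part / whole)

assert pct(54, 70) == 77       # level 1 recommendations with consensus on being mandatory
assert pct(70 - 54, 70) == 23  # the remaining 16 level 1 recommendations
assert pct(91, 105) == 87      # level 2 recommendations with consensus
assert pct(5, 14) == 36        # 5 of the 14 experts
assert pct(9, 14) == 64        # 9 of the 14 experts
```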
Experts’ comments were aggregated into 5 main themes: (1) deletion or recategorization of recommendations, (2) consistency, (3) contradiction, (4) asymmetry, and (5) uncertainty. It was suggested that 1 recommendation be deleted (“The system should be free from errors”) and that another be moved from the visual component category to the emotional component category. No other suggestions were made regarding the structure of the recommendations. There were comments related to consistency, particularly regarding the need to use either British or American spelling throughout all recommendations and to consistently refer to “users” instead of “persons” or “individuals.” The remaining comments applied mostly to level 2 recommendations, for which experts identified contradictory recommendations (eg, accessibility: “In robotics, the system should meet the person’s needs, be slow, safe and reliable, small, easy to use, and have an appearance not too human-like, not patronizing or stigmatizing” vs emotional: “In robotics, the system should indicate aliveness by showing some autonomous behavior, facial expressions, hand/head gestures to motivate engagement, as well as changing vocal patterns and pace to show different emotions”). Experts also commented on the asymmetry in the number of level 2 recommendations linked to each level 1 recommendation and in the number of recommendations per type of technology and interaction paradigm. In addition, experts were uncertain about the accuracy of the measures indicated in the recommendations (eg, visual: “In robotics, the system graphical user interface and button elements should be sufficiently big in size, so they can be easily seen and used, about ~20 mm in case of touch screen, buttons” vs visual: “In feet interaction, the system should consider an appropriate interaction radius of 20 cm for tap, 30 cm at the front, and 25 cm at the back for kick”).
Based on the experts’ comments and the issues raised in the previous step, the term “users” was adopted throughout the recommendations, 1 recommendation was removed, and 1 was moved from the visual component to the emotional component. In addition, all level 1 recommendations for which no consensus was reached on whether they were mandatory were considered not mandatory (identified by using the word “may” in the recommendation). The internal panel also recognized that level 2 recommendations cannot be used to guide user interface design in their current state and that further work is needed. Therefore, a final list of 69 generic recommendations is proposed (Multimedia Appendix 2).
To the best of our knowledge, this is the first study that attempted to analyze and synthesize existing recommendations on user interface design. This was a complex task that generated a high number of interdependent recommendations, which could be organized into hierarchical levels and grouped according to usability principles. Level 1 recommendations are generic and can inform the user interface design of different types of technology and interaction paradigms, whereas level 2 recommendations are more specific and therefore apply to particular types of technology and interaction paradigms. However, the level of detail of the level 2 recommendations and the absence of evidence that they had been validated raised doubts about their validity.
The external experts’ suggestions formed the basis for our (the internal experts’) analysis. However, the two panels diverged in the number of recommendations to be deleted, merged, rewritten, split, or left unchanged. This was because, when analyzing the recommendations, the internal panel identified additional repeated or overly generic recommendations to delete beyond those already flagged by the external panel. These were likely missed due to the high number of recommendations, which made the analysis a time-consuming and complex task. Furthermore, changing 1 recommendation in line with the external experts’ suggestions often required changing other recommendations for coherence and consistency, increasing the number of recommendations that were rewritten. In addition, there was a lack of consensus among the external experts, leaving the final decision to us (the internal experts) and further contributing to the discrepancies.
Regarding the organization of the recommendations, the division into 2 hierarchical levels based on specificity/level of detail resulted from the external experts’ feedback and aimed to make the list of recommendations easier to consult. This type of hierarchization by level of detail was also used in previous studies aimed at synthesizing existing guidelines [23,36].
The recommendations were grouped into 12 categories that closely relate to existing usability principles (feedback, recognition, flexibility, customization, consistency, errors, help, accessibility, navigation, and privacy [18,37-39]). Usability principles are defined as broad “rules of thumb” or design guidelines that describe features of systems to guide the design of digital solutions [18,40]. Additionally, they are oriented toward improving user interaction [3] and impact the quality of the digital solution’s interface [41]. Therefore, organizing the recommendations in a way that maps onto these principles facilitates their practical use, as these usability principles are familiar to designers and are well established, well known, and accepted in the literature [23,42].
The results showed an asymmetry in the number of recommendations categorized under each of the 12 usability principles (eg, for level 1, consistency has 2 recommendations while the visual component has 16). This discrepancy suggests that some areas of user interface design, such as the visual component, might be better detailed, more complex, or more valued in the literature, but it can also suggest that the initial search was not comprehensive enough, as it included a reduced number of databases [32]. Nevertheless, the heterogeneity between categories does not diminish their relevance, as it is the set of recommendations as a whole that informs the user interface design of a digital solution.
The number of level 2 recommendations aggregated under each level 1 recommendation is also uneven. Most of the level 2 recommendations that resulted from this study concern web and mobile technologies because their use is widespread among the population [43] and they are therefore more likely to have design recommendations reported in the literature [23,31,44,45]. On the other hand, emerging technologies such as robotics and new interaction paradigms (eg, gestures, voice, and feet) present new challenges for researchers, and recommendations are still being formulated, resulting in fewer published specific recommendations [46-49]. Moreover, the level 2 recommendations raised doubts among experts, namely regarding (1) the lack of consensus on whether they were mandatory, (2) apparent contradictions between recommendations, and (3) uncertainty regarding the accuracy of some recommendations, particularly the very specific ones (eg, the recommendations on button sizes in millimeters). These aspects suggest that level 2 recommendations need further validation in future studies. No data were found on how the authors of the recommendations arrived at this level of detail or on how the exact recommendation might vary depending on the target users [50,51], the type of technology [49], the interaction paradigm [46], and the context of use [52]. Validation of the level 2 recommendations might be performed by gathering experts’ consensus on the adequacy of recommendations by type of technology/interaction paradigm and by involving real users to test whether specific user interfaces that fulfill the recommendations improve usability and user experience [50,53].
We believe that level 1 recommendations apply to different users, contexts, and technologies/interaction paradigms and that the necessary level of specificity will be given by level 2 recommendations, which can be further operationalized into more detailed recommendations (eg, creating level 3 recommendations under level 2 recommendations). For example, recommendation 1 from the recognition category states that “the system should consider the context of use, using phrases, words, and concepts that are familiar to the users and grounded in real conventions, delivering an experience that matches the system and the real world,” which illustrates applicability to different contexts such as health or education. Similarly, recommendation 1 from the flexibility category states that “the system should support both inexperienced and experienced users, be easy to learn, and to remember, even after an inactive period,” showing adaptability to different types of users. Nevertheless, the importance of each level 1 recommendation might vary. For example, recommendation 6 of the flexibility category, which states that “the system may make users feel confident to operate and take appropriate action if something unexpected happens,” was not considered mandatory by the panel of external experts. However, one might argue that it should be mandatory in the field of health, where feeling in control and acting immediately if something unexpected happens is of utmost importance. Therefore, both level 1 and level 2 recommendations require further validation not only across different types of technology and interaction paradigms but also for different target users and contexts of use. Also required are investigations to determine whether their use results in better digital solutions and, particularly in the health care field, increases adherence to and the effectiveness of interventions.
In summary, although this study constitutes a step toward a more standardized approach in the field of user interface design, the set of recommendations presented herein should not be seen as final but rather as guides that designers should critically appraise according to the context, type of technology, type of interaction, and the end users for whom the digital solution is intended.
The strength of this proposed set of recommendations is that it was developed from multiple sources and multiple rounds of expert feedback. However, although several experts were involved in different steps of the study, it cannot be guaranteed that their views are representative of the broader community of user interface design experts. Another limitation is that the initial search for recommendations might not have been comprehensive enough; nevertheless, the external experts were given the possibility of adding recommendations to the list, and none suggested additional recommendations. The list of level 2 recommendations is a work in progress that should be further discussed and refined for each technology/interaction paradigm. Finally, some types of technologies and interaction paradigms are not represented in the recommendations (eg, virtual reality), and it would be important to develop specific recommendations for all types of technologies and interaction paradigms in the future.
This work was supported by the SHAPES (Smart and Health Ageing through People Engaging in Supportive Systems) project, funded by the European Union’s Horizon 2020 Framework Programme for Research and Innovation (grant agreement 857159 - SHAPES - H2020 - SC1-FA-DTS - 2018-2020).
SHAPES | Smart and Health Ageing through People Engaging in Supportive Systems |
Multimedia Appendix 2.
Conflicts of Interest: None declared.
Theory, Analysis and Reviews on UX User Experience Research and Design
Charles Mauro CHFP
Important peer-reviewed and informally published recent research on user interface design and user experience (UX) design.
For the benefit of clients and colleagues, we have curated a list of approximately 70 recent research publications dealing with user interface design, UX design, and e-commerce optimization.
In our opinion, these publications represent some of the best formal research thinking on UI and UX design. These papers are also among the most widely downloaded and cited formal research on UI/UX design. We have referenced many of these studies in our work at MauroNewMedia.
Paywalls: As you will note in reviewing the following links and abstracts, most of the serious research on UI/UX design and optimization sits behind paywalls controlled by major publishers. However, in the end, good data is well worth the investment. Many links and other cited references are, of course, free.
Important disclaimer: We do not receive any form of compensation for citing any of the following content. Either Charles L. Mauro CHFP or Paul Thurman MBA has personally reviewed all papers and links in this list. Some of these references were utilized in the recent NYTECH UX talk given by Paul Thurman MBA, titled Critical New UX Design Optimization Research.
In addition to historical research papers, we frequently receive requests from colleagues, clients, and journalists for recommended reading lists on topics covering our expertise in UX design, usability research, and human factors engineering. These requests prompted us to pull from our research library (yes, we still have real books) 30+ books that our professional staff felt should be considered primary conceptual literature for anyone well-read in the theory and practice of UX design and research. Please follow the link for PulseUX’s compilation of the 30+ Best UX Design and Research Books of All Time.
Title: The influence of hedonic and utilitarian motivations on user engagement: The case of online shopping experiences
Abstract User experience seeks to promote rich, engaging interactions between users and systems. In order for this experience to unfold, the user must be motivated to initiate an interaction with the technology. This study explored hedonic and utilitarian motivations in the context of user engagement with online shopping. Factor analysis was performed to identify a parsimonious set of factors from the Hedonic and Utilitarian Shopping Motivation Scale and the User Engagement Scale based on responses from 802 shoppers. Multiple linear regression was used to test hypotheses with hedonic and utilitarian motivations (Idea, Social, Adventure/Gratification, Value and Achievement Shopping) and attributes of user engagement (Aesthetics, Focused Attention, Perceived Usability, and Endurability). Results demonstrate the salience of Adventure/Gratification Shopping and Achievement Shopping Motivations to specific variables of user engagement in the e-commerce environment and provide considerations for the inclusion of different types of motivation into models of engaging user experiences. Abstract Copyright © 2010 Elsevier B.V. All rights reserved.
Title: New Support for Marketing Analytics
Abstract Consumer surveys and myriad other forms of research have long been the grist for marketing decisions at large companies. But many firms have been reluctant to embrace the high-tech approach to data gathering and number crunching that falls under the rubric of marketing analytics, which uses advanced techniques to transform the tracking of promotional efforts, customer preferences, and industry developments into sophisticated branding and advertising campaigns. Fueled in part by Tom Peters and Robert Waterman’s seminal 1982 book In Search of Excellence , which coined the phrase “paralysis through analysis,” skepticism about the approach remains widespread, even in the face of a number of positive research results over the years. This new study, involving Fortune 1000 companies, offers yet more ammunition for supporters of marketing analytics. Abstract Copyright © 2013 Booz & Company Inc. All rights reserved.
Title: Video game values: Human-computer interaction and games
Abstract Current human–computer interaction (HCI) research into video games rarely considers how they are different from other forms of software. This leads to research that, while useful concerning standard issues of interface design, does not address the nature of video games as games specifically. Unlike most software, video games are not made to support external, user-defined tasks, but instead define their own activities for players to engage in. We argue that video games contain systems of values which players perceive and adopt, and which shape the play of the game. A focus on video game values promotes a holistic view of video games as software, media, and as games specifically, which leads to a genuine video game HCI. Abstract Copyright © 2006 Elsevier B.V. All rights reserved.
Title: When fingers do the talking: a study of text messaging
Abstract SMS or text messaging is an area of growth in the communications field. The studies described below consisted of a questionnaire and a diary study. The questionnaire was designed to examine texting activities in 565 users of the mobile phone. The diary study was carried out by 24 subjects over a period of 2 weeks. The findings suggest that text messaging is being used by a wide range of people for all kinds of activities and that for some people it is the preferred means of communication. These studies should prove interesting for those examining the use and impact of SMS. Abstract Copyright © 2004 Elsevier B.V. All rights reserved.
Title: Understanding factors affecting trust in and satisfaction with mobile banking in Korea: A modified DeLone and McLean’s model perspective
Abstract As mobile technology has developed, mobile banking has become accepted as part of daily life. Although many studies have been conducted to assess users’ satisfaction with mobile applications, none has focused on the ways in which the three quality factors associated with mobile banking – system quality, information quality and interface design quality – affect consumers’ trust and satisfaction. Our proposed research model, based on DeLone and McLean’s model, assesses how these three external quality factors can impact satisfaction and trust. We collected 276 valid questionnaires from mobile banking customers, then analyzed them using structural equation modeling. Our results show that system quality and information quality significantly influence customers’ trust and satisfaction, and that interface design quality does not. We present herein implications and suggestions for further research. Abstract Copyright © 2009 Elsevier B.V. All rights reserved.
Title: What is beautiful is usable
Abstract An experiment was conducted to test the relationships between users’ perceptions of a computerized system’s beauty and usability. The experiment used a computerized application as a surrogate for an Automated Teller Machine (ATM). Perceptions were elicited before and after the participants used the system. Pre-experimental measures indicate strong correlations between system’s perceived aesthetics and perceived usability. Post-experimental measures indicated that the strong correlation remained intact. A multivariate analysis of covariance revealed that the degree of system’s aesthetics affected the post-use perceptions of both aesthetics and usability, whereas the degree of actual usability had no such effect. The results resemble those found by social psychologists regarding the effect of physical attractiveness on the valuation of other personality attributes. The findings stress the importance of studying the aesthetic aspect of human–computer interaction (HCI) design and its relationships to other design dimensions. Abstract Copyright © 2000 Elsevier Science B.V. All rights reserved.
Title: UX Curve: A method for evaluating long-term user experience
Abstract The goal of user experience design in industry is to improve customer satisfaction and loyalty through the utility, ease of use, and pleasure provided in the interaction with a product. So far, user experience studies have mostly focused on short-term evaluations and consequently on aspects relating to the initial adoption of new product designs. Nevertheless, the relationship between the user and the product evolves over long periods of time and the relevance of prolonged use for market success has been recently highlighted. In this paper, we argue for the cost-effective elicitation of longitudinal user experience data. We propose a method called the “UX Curve” which aims at assisting users in retrospectively reporting how and why their experience with a product has changed over time. The usefulness of the UX Curve method was assessed in a qualitative study with 20 mobile phone users. In particular, we investigated how users’ specific memories of their experiences with their mobile phones guide their behavior and their willingness to recommend the product to others. The results suggest that the UX Curve method enables users and researchers to determine the quality of long-term user experience and the influences that improve user experience over time or cause it to deteriorate. The method provided rich qualitative data and we found that an improving trend of perceived attractiveness of mobile phones was related to user satisfaction and willingness to recommend their phone to friends. This highlights that sustaining perceived attractiveness can be a differentiating factor in the user acceptance of personal interactive products such as mobile phones. The study suggests that the proposed method can be used as a straightforward tool for understanding the reasons why user experience improves or worsens in long-term product use and how these reasons relate to customer loyalty. Abstract Copyright 2011 British Informatics Society Limited. 
Published by Elsevier B.V. All rights reserved.
Title: Heuristic evaluation: Comparing ways of finding and reporting usability problems
Abstract Research on heuristic evaluation in recent years has focused on improving its effectiveness and efficiency with respect to user testing. The aim of this paper is to refine a research agenda for comparing and contrasting evaluation methods. To reach this goal, a framework is presented to evaluate the effectiveness of different types of support for structured usability problem reporting. This paper reports on an empirical study of this framework that compares two sets of heuristics, Nielsen’s heuristics and the cognitive principles of Gerhardt-Powals, and two media of reporting a usability problem, i.e. either using a web tool or paper. The study found that there were no significant differences between any of the four groups in effectiveness, efficiency and inter-evaluator reliability. A more significant contribution of this research is that the framework used for the experiments proved successful and should be reusable by other researchers because of its thorough structure. Abstract Copyright © 2006 Elsevier B.V. All rights reserved.
Title: Socio-technical systems: From design methods to systems engineering
Abstract It is widely acknowledged that adopting a socio-technical approach to system development leads to systems that are more acceptable to end users and deliver better value to stakeholders. Despite this, such approaches are not widely practised. We analyse the reasons for this, highlighting some of the problems with the better known socio-technical design methods. Based on this analysis we propose a new pragmatic framework for socio-technical systems engineering (STSE) which builds on the (largely independent) research of groups investigating work design, information systems, computer-supported cooperative work, and cognitive systems engineering. STSE bridges the traditional gap between organisational change and system development using two main types of activity: sensitisation and awareness; and constructive engagement. From the framework, we identify an initial set of interdisciplinary research problems that address how to apply socio-technical approaches in a cost-effective way, and how to facilitate the integration of STSE with existing systems and software engineering approaches. Abstract Copyright © 2010 Elsevier B.V. All rights reserved.
Title: Five reasons for scenario-based design
Abstract Scenarios of human–computer interaction help us to understand and to create computer systems and applications as artifacts of human activity—as things to learn from, as tools to use in one’s work, as media for interacting with other people. Scenario-based design of information technology addresses five technical challenges: scenarios evoke reflection in the content of design work, helping developers coordinate design action and reflection. Scenarios are at once concrete and flexible, helping developers manage the fluidity of design situations. Scenarios afford multiple views of an interaction, diverse kinds and amounts of detailing, helping developers manage the many consequences entailed by any given design move. Scenarios can also be abstracted and categorized, helping designers to recognize, capture and reuse generalizations and to address the challenge that technical knowledge often lags the needs of technical design. Finally, scenarios promote work-oriented communication among stakeholders, helping to make design activities more accessible to the great variety of expertise that can contribute to design, and addressing the challenge that external constraints designers and clients face often distract attention from the needs and concerns of the people who will use the technology. Abstract Copyright © 2000 Elsevier Science B.V. All rights reserved.
Title: Needs, affect, and interactive products – Facets of user experience
Abstract Subsumed under the umbrella of User Experience (UX), practitioners and academics of Human–Computer Interaction look for ways to broaden their understanding of what constitutes “pleasurable experiences” with technology. The present study considered the fulfilment of universal psychological needs, such as competence, relatedness, popularity, stimulation, meaning, security, or autonomy, to be the major source of positive experience with interactive technologies. To explore this, we collected over 500 positive experiences with interactive products (e.g., mobile phones, computers). As expected, we found a clear relationship between need fulfilment and positive affect, with stimulation, relatedness, competence and popularity being especially salient needs. Experiences could be further categorized by the primary need they fulfil, with apparent qualitative differences among some of the categories in terms of the emotions involved. Need fulfilment was clearly linked to hedonic quality perceptions, but not as strongly to pragmatic quality (i.e., perceived usability), which supports the notion of hedonic quality as “motivator” and pragmatic quality as “hygiene factor.” Whether hedonic quality ratings reflected need fulfilment depended on the belief that the product was responsible for the experience (i.e., attribution). Abstract Copyright © 2010 Elsevier B.V. All rights reserved.
Title: The role of social presence in establishing loyalty in e-Service environments
Abstract Compared to offline shopping, the online shopping experience may be viewed as lacking human warmth and sociability as it is more impersonal, anonymous, automated and generally devoid of face-to-face interactions. Thus, understanding how to create customer loyalty in online environments (e-Loyalty) is a complex process. In this paper a model for e-Loyalty is proposed and used to examine how varied conditions of social presence in a B2C e-Services context influence e-Loyalty and its antecedents of perceived usefulness, trust and enjoyment. This model is examined through an empirical study involving 185 subjects using structural equation modeling techniques. Further analysis is conducted to reveal gender differences concerning hedonic elements in the model on e-Loyalty. Abstract Copyright © 2006 Elsevier B.V. All rights reserved.
Title: A framework for evaluating the usability of mobile phones based on multi-level, hierarchical model of usability factors
Abstract As a mobile phone has various advanced functionalities or features, usability issues are increasingly challenging. Due to the particular characteristics of a mobile phone, typical usability evaluation methods and heuristics, most of which are relevant to a software system, might not effectively be applied to a mobile phone. Another point to consider is that usability evaluation activities should help designers find usability problems easily and produce better design solutions. To support usability practitioners of the mobile phone industry, we propose a framework for evaluating the usability of a mobile phone, based on a multi-level, hierarchical model of usability factors, in an analytic way. The model was developed on the basis of a set of collected usability problems and our previous study on a conceptual framework for identifying usability impact factors. It has multi-abstraction levels, each of which considers the usability of a mobile phone from a particular perspective. As there are goal-means relationships between adjacent levels, a range of usability issues can be interpreted in a holistic as well as diagnostic way. Another advantage is that it supports two different types of evaluation approaches: task-based and interface-based. To support both evaluation approaches, we developed four sets of checklists, each of which is concerned, respectively, with task-based evaluation and three different interface types: Logical User Interface (LUI), Physical User Interface (PUI) and Graphical User Interface (GUI). The proposed framework specifies an approach to quantifying usability so that several usability aspects are collectively measured to give a single score with the use of the checklists. A small case study was conducted in order to examine the applicability of the framework and to identify the aspects of the framework to be improved. It showed that it could be a useful tool for evaluating the usability of a mobile phone. 
Based on the case study, we improved the framework so that usability practitioners can use it more easily and consistently. Abstract Copyright © 2011 British Informatics Society Limited. Published by Elsevier B.V. All rights reserved.
Title: Understanding the most satisfying and unsatisfying user experiences: Emotions, psychological needs, and context
Abstract The aim of this research was to study the structure of the most satisfying and unsatisfying user experiences in terms of experienced emotions, psychological needs, and contextual factors. 45 university students wrote descriptions of their most satisfying and unsatisfying recent user experiences and analyzed those experiences using the Positive and Negative Affect Schedule (PANAS) method for experienced emotions, a questionnaire probing the salience of 10 psychological needs, and a self-made set of rating scales for analyzing context. The results suggested that it was possible to capture variations in user experiences in terms of experienced emotions, fulfillment of psychological needs, and context effectively by using psychometric rating scales. The results for emotional experiences showed significant differences in 16 out of 20 PANAS emotions between the most satisfying and unsatisfying experiences. The results for psychological needs indicated that feelings of autonomy and competence emerged as highly salient in the most satisfying experiences and missing in the unsatisfying experiences. High self-esteem was also notably salient in the most satisfying experiences. The qualitative results indicated that most of the participants’ free-form qualitative descriptions, especially for the most unsatisfying user experiences, gave important information about the pragmatic aspects of the interaction, but often omitted information about hedonic and social aspects of user experience. Abstract Copyright © 2011 British Informatics Society Limited. Published by Elsevier B.V. All rights reserved.
Title: The Usability Metric for User Experience
Abstract The Usability Metric for User Experience (UMUX) is a four-item Likert scale used for the subjective assessment of an application’s perceived usability. It is designed to provide results similar to those obtained with the 10-item System Usability Scale, and is organized around the ISO 9241-11 definition of usability. A pilot version was assembled from candidate items, which was then tested alongside the System Usability Scale during usability testing. It was shown that the two scales correlate well, are reliable, and both align on one underlying usability factor. In addition, the Usability Metric for User Experience is compact enough to serve as a usability module in a broader user experience metric. Abstract Copyright © 2010 Elsevier B.V. All rights reserved.
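The four-item scoring behind UMUX can be made concrete. Below is a minimal sketch of the published scoring scheme (odd items positively worded, even items negatively worded, with the adjusted sum rescaled to 0-100); the function name and input validation are illustrative assumptions, not taken from the abstract itself:

```python
def umux_score(item1: int, item2: int, item3: int, item4: int) -> float:
    """Return a UMUX score on a 0-100 scale from four 1-7 Likert responses.

    Items 1 and 3 are positively worded (higher = better), so each
    contributes (response - 1); items 2 and 4 are negatively worded,
    so each contributes (7 - response). The adjusted sum (0-24) is
    then rescaled to 0-100.
    """
    for response in (item1, item2, item3, item4):
        if not 1 <= response <= 7:
            raise ValueError("Likert responses must be in 1..7")
    adjusted = (item1 - 1) + (7 - item2) + (item3 - 1) + (7 - item4)
    return adjusted / 24 * 100

# Best possible responses (7, 1, 7, 1) score 100; worst (1, 7, 1, 7) score 0.
```

The 0-100 range is what makes UMUX usable as a compact module inside a broader user experience metric, as the abstract notes.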
Title: User acceptance of mobile Internet: Implication for convergence technologies
Abstract Using the Technology Acceptance Model as a conceptual framework and a method of structural equation modeling, this study analyzes consumer attitudes toward Wi-Bro, drawing data from 515 consumers. Individuals’ responses to questions about whether they use/accept Wi-Bro were collected and combined with various factors modified from the Technology Acceptance Model.
The results of this study show that users’ perceptions are significantly associated with their motivation to use Wi-Bro. Specifically, perceived quality and perceived availability are found to have a significant effect on users’ extrinsic and intrinsic motivation. These new factors are found to be Wi-Bro-specific, acting as factors that enhance attitudes and intention. Abstract Copyright © 2007 Elsevier B.V. All rights reserved.
Title: Understanding purchasing behaviors in a virtual economy: Consumer behavior involving virtual currency in Web 2.0 communities
Abstract This study analyzes consumer purchasing behavior in Web 2.0, expanding the technology acceptance model (TAM), focusing on which variables influence the intention to transact with virtual currency. Individuals’ responses to questions about attitude and intention to transact in Web 2.0 were collected and analyzed with various factors modified from the TAM. The results of the proposed model show that subjective norm is a key behavioral antecedent to using virtual currency. In the extended model, the moderating effects of subjective norm on the relations among the variables were found to be significant. The new set of variables is virtual environment-specific, acting as factors enhancing attitudes and behavioral intentions in Web 2.0 transactions. Abstract Copyright © 2008 Elsevier B.V. All rights reserved.
Title: Fundamentals of physiological computing
Abstract This review paper is concerned with the development of physiological computing systems that employ real-time measures of psychophysiology to communicate the psychological state of the user to an adaptive system. It is argued that physiological computing has enormous potential to innovate human–computer interaction by extending the communication bandwidth to enable the development of ‘smart’ technology. This paper focuses on six fundamental issues for physiological computing systems through a review and synthesis of existing literature: (1) the complexity of the psychophysiological inference, (2) validating the psychophysiological inference, (3) representing the psychological state of the user, (4) designing explicit and implicit system interventions, (5) defining the biocybernetic loop that controls system adaptation, and (6) ethical implications. The paper concludes that physiological computing provides opportunities to innovate HCI, but complex methodological/conceptual issues must be fully tackled during the research and development phase if this nascent technology is to achieve its potential. Abstract Copyright © 2008 Elsevier B.V. All rights reserved.
Title: Modelling user experience with web sites: Usability, hedonic value, beauty and goodness
Abstract Recent research into user experience has identified the need for a theoretical model to build cumulative knowledge in research addressing how the overall quality or ‘goodness’ of an interactive product is formed. An experiment tested and extended Hassenzahl’s model of aesthetic experience. The study used a 2 × 2 × (2) experimental design with three factors: principles of screen design, principles for organizing information on a web page and experience of using a web site. Dependent variables included hedonic perceptions and evaluations of a web site as well as measures of task performance, navigation behaviour and mental effort. Measures, except Beauty, were sensitive to manipulation of web design. Beauty was influenced by hedonic attributes (identification and stimulation), but Goodness by both hedonic and pragmatic (user-perceived usability) attributes as well as task performance and mental effort. Hedonic quality was more stable with experience of web-site use than pragmatic quality and Beauty was more stable than Goodness. Abstract Copyright © 2008 Elsevier B.V. All rights reserved.
Title: Sample Size In Usability Studies
Abstract Usability studies are a cornerstone activity for developing usable products. Their effectiveness depends on sample size, and determining sample size has been a research issue in usability engineering for the past 30 years. In 2010, Hwang and Salvendy reported a meta-study on the effectiveness of usability evaluation, concluding that a sample size of 10±2 is sufficient for discovering 80% of usability problems (not five, as suggested earlier by Nielsen in 2000). Here, I show that the Hwang and Salvendy study ignored fundamental mathematical properties of the problem, severely limiting the validity of the 10±2 rule, and then reframe the issue of effectiveness and sample-size estimation in terms of the practices and requirements commonly encountered in industrial-scale usability studies. Abstract Copyright © 2013 ACM, Inc.
Title: An experimental study of learner perceptions of the interactivity of web-based instruction
Abstract An effectively designed interaction mechanism creates a shortcut for human–computer interaction. Most studies in this area have concluded that the higher the level of interactivity, the better, especially regarding interactive websites applied in the fields of business and education. Previous studies have also suggested that designs with a higher level of interactivity result in higher learner evaluations of websites. However, little research has examined learner perceptions as they interact with web-based instruction (WBI) systems in a situation with limited time. To assist learners in acquiring knowledge quickly, the interactivity design must make the web learning environment easier to use by reducing the complexity of the interface. The aim of the present study is to explore learner perceptions of three WBI systems with different interaction levels under time limitations. This study was therefore designed to provide a new framework to design systems with different degrees of interactivity, and to examine learners’ perceptions of these interaction elements. Three WBI systems were developed with different degrees of interactivity from high to low, and a between-subject experiment was conducted with 45 subjects. The results of the experiment indicate that a higher level of interactivity does not necessarily guarantee a higher perception of interactivity in a short-term learning situation. Therefore, the instructors must pay attention to modifying or selecting appropriate interactive elements that are more suitable for various learning stages. The findings provide insights for designers to adopt different degrees of interactivity in their designs that will best fulfill various learners’ needs. Abstract Copyright © 2011 British Informatics Society Limited. Published by Elsevier B.V. All rights reserved.
Title: Age differences in the perception of social presence in the use of 3D virtual world for social interaction
Abstract 3D virtual worlds are becoming increasingly popular as a tool for social interaction, with the potential of augmenting the user’s perception of physical and social presence. Thus, this technology could be of great benefit to older people, providing home-bound older users with access to social, educational and recreational resources. However, so far there have been few studies looking into how older people engage with virtual worlds, as most research in this area focuses on younger users. In this study, an online experiment was conducted with 30 older and 30 younger users to investigate age differences in the perception of presence in the use of virtual worlds for social interaction. Overall, we found that factors such as navigation and prior experience with text messaging tools played a key role in older people’s perception of presence. Both physical and social presence were found to be linked to the quality of social interaction for users of both age groups. In addition, older users displayed proxemic behavior more similar to proxemic behavior in the physical world than younger users did. Abstract Copyright © 2012 British Informatics Society Limited. Published by Elsevier B.V. All rights reserved.
Title: Human error and information systems failure: the case of the London ambulance service computer-aided despatch system project
Abstract Human error and systems failure have been two constructs that have become linked in many contexts. In this paper we particularly focus on the issue of failure in relation to that group of software systems known as information systems. We first review the extant theoretical and empirical work on this topic. Then we discuss one particular well-known case — that of the London ambulance service computer-aided despatch system (Lascad) project — and use it as a particularly cogent example of the features of information systems failure. We maintain that the tendency to analyse information systems failure solely from a technological standpoint is limiting, that the nature of information systems failure is multi-faceted, and hence cannot be adequately understood purely in terms of the immediate problems of systems construction. Our purpose is also to use the generic material on IS failure and the specific details of this particular case study to critique the issues of safety, criticality, human error and risk in relation to systems not currently well considered in relation to these areas. Abstract Copyright © 1999 Elsevier B.V. All rights reserved.
Title: Feminist HCI meets facebook: Performativity and social networking sites
Abstract In this paper, I reflect on a specific product of interaction design, social networking sites. The goals of this paper are twofold. One is to bring a feminist reflexivity to HCI, drawing on the work of Judith Butler and her concepts of performativity, citationality, and interpellation. Her approach is, I argue, highly relevant to issues of identity and self-representation on social networking sites, and to the co-constitution of the subject and technology. A critical, feminist HCI must ask how social media and other HCI institutions, practices, and discourses are part of the processes by which sociotechnical configurations are constructed. My second goal is to examine the implications of such an approach by applying it to social networking sites (SNSs), drawing on the empirical research literature on SNSs, to show how SNS structures and policies help shape the subject and hide the contingency of subject categories. Abstract Copyright © 2011 British Informatics Society Limited. Published by Elsevier B.V. All rights reserved.
Title: A survey of methods for data fusion and system adaptation using autonomic nervous system responses in physiological computing
Abstract Physiological computing represents a mode of human–computer interaction where the computer monitors, analyzes and responds to the user’s psychophysiological activity in real-time. Within the field, autonomic nervous system responses have been studied extensively since they can be measured quickly and unobtrusively. However, despite a vast body of literature available on the subject, there is still no universally accepted set of rules that would translate physiological data to psychological states. This paper surveys the work performed on data fusion and system adaptation using autonomic nervous system responses in psychophysiology and physiological computing during the last ten years. First, five prerequisites for data fusion are examined: psychological model selection, training set preparation, feature extraction, normalization and dimension reduction. Then, different methods for either classification or estimation of psychological states from the extracted features are presented and compared. Finally, implementations of system adaptation are reviewed: changing the system that the user is interacting with in response to cognitive or affective information inferred from autonomic nervous system responses. The paper is aimed primarily at psychologists and computer scientists who have already recorded autonomic nervous system responses and now need to create algorithms to determine the subject’s psychological state. Abstract Copyright © 2012 British Informatics Society Limited. Published by Elsevier B.V. All rights reserved.
Title: Positive mood induction procedures for virtual environments designed for elderly people
Abstract Positive emotions have a significant influence on mental and physical health. Their role in the elderly’s wellbeing has been established in numerous studies. It is therefore worthwhile to explore ways in which elderly people can increase the number of positive experiences in their daily lives. This paper describes two Virtual Environments (VEs) that were used as mood induction procedures (MIPs) for this population. In addition, the VEs’ efficacy at increasing joy and relaxation in elderly users is analyzed. The VEs contain exercises for generating positive autobiographic memories, mindfulness and slow breathing rhythms. The total sample comprised 18 participants over 55 years old who used the VEs on two occasions. Twelve of them used the joy environment, while 16 used the relaxation environment. Moods before and after each session were assessed using Visual Analogical Scales. After using both VEs, results indicated significant increases in joy and relaxation and significant decreases in sadness and anxiety. The participants also indicated low levels of difficulty of use and high levels of satisfaction and sense of presence. Hence, the VEs demonstrated their usefulness in promoting positive affect and enhancing the wellbeing of elderly people. Abstract Copyright © 2012 British Informatics Society Limited. Published by Elsevier B.V. All rights reserved.
Title: The effects of trust, security and privacy in social networking: A security-based approach to understand the pattern of adoption
Abstract Social network services (SNS) focus on building online communities of people who share interests and/or activities, or who are interested in exploring the interests and activities of others. This study examines security, trust, and privacy concerns with regard to social networking Websites among consumers using both reliable scales and measures. It proposes an SNS acceptance model by integrating cognitive as well as affective attitudes as primary influencing factors, which are driven by underlying beliefs, perceived security, perceived privacy, trust, attitude, and intention. Results from a survey of SNS users validate that the proposed theoretical model explains and predicts user acceptance of SNS substantially well. The model shows excellent measurement properties and establishes perceived privacy and perceived security of SNS as distinct constructs. The finding also reveals that perceived security moderates the effect of perceived privacy on trust. Based on the results of this study, practical implications for marketing strategies in SNS markets and theoretical implications are recommended accordingly. Abstract Copyright © 2010 Elsevier B.V. All rights reserved.
Title: Usability testing: what have we overlooked?
Abstract For more than a decade, the number of usability test participants has been a major theme of debate among usability practitioners and researchers keen to improve usability test performance. This paper provides evidence suggesting that the focus be shifted to task coverage instead. Our data analysis of nine commercial usability test teams participating in the CUE-4 study revealed no significant correlation between the percentage of problems found or of new problems and number of test users, but correlations of both variables and number of user tasks used by each usability team were significant. The role of participant recruitment on usability test performance and future research directions are discussed. Abstract Copyright © 2013 ACM, Inc.
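The sample-size debate engaged by this abstract and the “Sample Size In Usability Studies” abstract above is usually framed in terms of the classic binomial problem-discovery model. A minimal sketch follows, under that model’s strong simplifying assumption that every problem has the same per-participant detection probability p (the very assumption the task-coverage critique pushes back on); the function names are illustrative:

```python
import math

def discovery_rate(p: float, n: int) -> float:
    """Probability that a problem with per-participant detection
    probability p is found at least once by n test users."""
    return 1 - (1 - p) ** n

def users_needed(p: float, target: float) -> int:
    """Smallest number of users n such that discovery_rate(p, n) >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))
```

With Nielsen’s often-cited average p = 0.31, five users already find about 84% of problems, while rarer problems (say p = 0.10) need 16 users for the same 80% coverage; the rule of thumb one arrives at therefore depends heavily on the assumed p, which is why the model’s uniform-detectability assumption matters.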
Title: Predicting online grocery buying intention: a comparison of the theory of reasoned action and the theory of planned behavior
Abstract This paper tests the ability of two consumer theories—the theory of reasoned action and the theory of planned behavior—in predicting consumer online grocery buying intention. In addition, a comparison of the two theories is conducted. Data were collected from two web-based surveys of Danish (n = 1222) and Swedish (n = 1038) consumers using self-administered questionnaires. These results suggest that the theory of planned behavior (with the inclusion of a path from subjective norm to attitude) provides the best fit to the data and explains the highest proportion of variation in online grocery buying intention. Abstract Copyright © 2013 Elsevier B.V. All rights reserved.
Title: Decomposition and crossover effects in the theory of planned behavior: A study of consumer adoption intentions
Abstract The Theory of Planned Behavior, an extension of the well-known Theory of Reasoned Action, is proposed as a model to predict consumer adoption intention. Three variations of the Theory of Planned Behavior are examined and compared to the Theory of Reasoned Action. The appropriateness of each model is assessed with data from a consumer setting. Structural equation modelling using maximum likelihood estimation for the four models revealed that the traditional forms of the Theory of Reasoned Action and the Theory of Planned Behavior fit the data adequately. Decomposing the belief structures and allowing for crossover effects in the Theory of Planned Behavior resulted in improvements in model prediction. The application of each model to theory development and management intervention is explored. Abstract Copyright © 1995 Elsevier B.V. All rights reserved.
Title: Knowledge and the Prediction of Behavior: The Role of Information Accuracy in the Theory of Planned Behavior
Abstract The results of the present research question the common assumption that being well informed is a prerequisite for effective action to produce desired outcomes. In Study 1 (N = 79), environmental knowledge had no effect on energy conservation, and in Study 2 (N = 79), alcohol knowledge was unrelated to drinking behavior. Such disappointing correlations may result from an inappropriate focus on accuracy of information at the expense of its relevance to and support for the behavior. Study 3 (N = 85) obtained a positive correlation between knowledge and pro-Muslim behavior, but Study 4 (N = 89) confirmed the proposition that this correlation arose because responses on the knowledge test reflected underlying attitudes. Study 4 also showed that the correlation could become positive or negative by appropriate selection of questions for the knowledge test. The theory of planned behavior (Ajzen, 1991), with its focus on specific actions, predicted intentions and behavior in all four studies. Abstract Copyright © 2013 Informa plc
Link: http://www.businessinsider.com/ron-johnson-apple-store-j-c-penney-2011-11
People come to the Apple Store for the experience — and they’re willing to pay a premium for that. There are lots of components to that experience, but maybe the most important — and this is something that can translate to any retailer — is that the staff isn’t focused on selling stuff, it’s focused on building relationships and trying to make people’s lives better. Abstract Copyright © 2013 Business Insider, Inc. All rights reserved.
Title: Naturalizing aesthetics: Brain areas for aesthetic appraisal across sensory modalities
Abstract We present here the most comprehensive analysis to date of neuroaesthetic processing by reporting the results of voxel-based meta-analyses of 93 neuroimaging studies of positive-valence aesthetic appraisal across four sensory modalities. The results demonstrate that the most concordant area of activation across all four modalities is the right anterior insula, an area typically associated with visceral perception, especially of negative valence (disgust, pain, etc.). We argue that aesthetic processing is, at its core, the appraisal of the valence of perceived objects. This appraisal is in no way limited to artworks but is instead applicable to all types of perceived objects. Therefore, one way to naturalize aesthetics is to argue that such a system evolved first for the appraisal of objects of survival advantage, such as food sources, and was later co-opted in humans for the experience of artworks for the satisfaction of social needs. Abstract Copyright © 2011 Elsevier Inc. All rights reserved.
Link: http://www.scientificamerican.com/article.cfm?id=the-neuroscience-of-beauty
Studies from neuroscience and evolutionary biology challenge this separation of art from non-art. Human neuroimaging studies have convincingly shown that the brain areas involved in aesthetic responses to artworks overlap with those that mediate the appraisal of objects of evolutionary importance, such as the desirability of foods or the attractiveness of potential mates. Hence, it is unlikely that there are brain systems specific to the appreciation of artworks; instead there are general aesthetic systems that determine how appealing an object is, be that a piece of cake or a piece of music. Abstract © 2013 Scientific American, a Division of Nature America, Inc.
Link: http://blogs.scientificamerican.com/symbiartic/2011/10/03/need-proof-that-were-visual-beings/
This video offers proof that humans are visual beings. Abstract © 2013 Scientific American, a Division of Nature America, Inc.
Link: http://hbr.org/web/slideshows/five-charts-that-changed-business/1-slide
Once in a while, a chart so deftly captures an important strategic insight that it becomes an iconic part of management thinking and a tool that shows up in MBA classrooms and corporate boardrooms for years to come. As HBR prepares for its 90th anniversary, in 2012, their editors have combed the magazine archives and other sources to select five charts that changed the shape of strategy. Abstract Copyright © 2013 Harvard Business School Publishing. All rights reserved.
Link: http://www.strategy-business.com/article/04412
It is a widely accepted and rarely challenged tenet of marketing that companies can sustain competitive advantage only through “new and improved” product differentiation based on unique features and benefits. What a mistake. By paying attention to what consumers really want, companies can attract new customers and create a distinctive brand. Abstract © 2013 Booz & Company Inc. All rights reserved.
Link: http://www.economist.com/node/17723028
If you can have everything in 57 varieties, making decisions becomes hard work. Many of these options have improved life immeasurably in the rich world, and to a lesser extent in poorer parts. They are testimony to human ingenuity and innovation. Free choice is the basis on which markets work, driving competition and generating economic growth. It is the cornerstone of liberal democracy. The 20th century bears the scars of too many failed experiments in which people had no choice. But amid all the dizzying possibilities, a nagging question lurks: is so much extra choice unambiguously a good thing? Abstract Copyright © The Economist Newspaper Limited 2013. All rights reserved.
Link: http://e.businessinsider.com/public/1099804
Mobile apps are becoming more important to people, not less important, according to this chart plucked from a big presentation on the internet. It’s an interesting trend because it shows how mobile behavior is different from traditional desktop computing behavior when it comes to the web. Abstract Copyright © 2013 Business Insider, Inc. All rights reserved.
Link: http://blogs.scientificamerican.com/scicurious-brain/2012/07/30/you-want-that-well-i-want-it-too-the-neuroscience-of-mimetic-desire/
Mimetic desire is more than jealously wanting something because someone else has it. Rather, it’s about valuing something because someone else values it. And it’s pretty easy to transmit the value. Just writing about Person A’s activities and habits and showing it to Person B will make Person B start to think Person A must have seen something good about the Toyota Camry…maybe his next car…
But what is behind this contagion of desires? Abstract © 2013 Scientific American, a Division of Nature America, Inc.
Link: http://www.united-academics.org/magazine/27212/visual-memory-blindness/
A well-known phenomenon in psychology is the ‘inattentional blindness’ principle. In fact, you might know it from experience: it means that people tend to fail to see things in their visual field when they have to focus on a task. Until now, it was thought that a cluttered visual field was required to cause the effect. Recent research shows, though, that the effect is present in many more situations. Abstract Copyright © 2012 United Academics. All rights reserved.
Link: http://www.businessinsider.com/18-24-texting-2011-9
Chart of the Day: According to the Pew Internet project, people in the 18-24 year-old range are sending and receiving 110 texts per day on average. The median number of texts sent/received by that group is 50 per day. Abstract Copyright © 2013 Business Insider, Inc. All rights reserved.
Link: http://www.businessinsider.com/chart-of-the-day-facebook-time-2011-9
Chart of the Day: A new report on social media from Nielsen shows U.S. users spent 53.5 billion minutes on Facebook in May, which is more time than was spent on the next four biggest sites. Abstract Copyright © 2013 Business Insider, Inc. All rights reserved.
Link: http://www.scientificamerican.com/article.cfm?id=your-brain-on-facebook
A recent study showed that certain brain areas expand in people who have greater numbers of friends on Facebook. There was a problem, though. The study, in Proceedings of the Royal Society B, was unable to resolve the question of whether “friending” plumps up the brain areas or whether people with a type of robustness in brain physiology are just natural social butterflies. But with the help of a few monkeys in England, teenagers everywhere may now have more ammunition to use against parents. Abstract © 2013 Scientific American, a Division of Nature America, Inc.
Link: http://iwc.oxfordjournals.org/content/26/3/196.abstract.html?etoc
Although advances in technology now enable people to communicate ‘anytime, anyplace’, it is not clear how citizens can be motivated to actually do so. This paper evaluates the impact of three principles of psychological empowerment, namely perceived self-efficacy, sense of community and causal importance, on public transport passengers’ motivation to report issues and complaints while on the move. A week-long study with 65 participants revealed that self-efficacy and causal importance increased participation in short bursts and increased perceptions of service quality over longer periods. Finally, we discuss the implications of these findings for citizen participation projects and reflect on design opportunities for mobile technologies that motivate citizen participation. Abstract Copyright © 2013 Oxford University Press.
Link: http://iwc.oxfordjournals.org/content/26/3/208.abstract.html?etoc
This review paper argues that users of personal information management systems have three particularly pressing requirements, for which current systems do not fully cater: (i) To combat information overload, as the volume of information increases. (ii) To ease context switching, in particular, for users who face frequent interrupts in their work. (iii) To be supported in information integration, across a variety of applications. To meet these requirements, four broad technological approaches should be adopted in an incremental fashion: (i) The deployment of a unified file system to manage all information objects, including files, emails and webpage URLs. (ii) The use of tags to categorize information; implemented in a way which is backward-compatible with existing hierarchical file systems. (iii) The use of context to aid information retrieval; built upon existing file and tagging systems rather than creating a parallel context management system. (iv) The deployment of semantic technologies, coupled with the harvesting of all useful metadata. Abstract 2013 Oxford University Press.
Link: http://iwc.oxfordjournals.org/content/26/3/238.abstract.html?etoc
Projective techniques are used in psychology and consumer research to provide information about individuals’ motivations, thoughts and feelings. This paper reviews the use of projective techniques in marketing research and user experience (UX) research and discusses their potential role in understanding users, their needs and values, and evaluating UX in practical product development contexts. A projective technique called sentence completion is evaluated through three case studies. Sentence completion produces qualitative data about users’ views in a structured form. The results are less time-consuming to analyze than interview results. Compared with quantitative methods such as AttrakDiff, the results are more time consuming to analyze, but more information is retrieved on negative feelings. The results show that sentence completion is useful in understanding users’ perceptions and that the technique can be used to complement other methods. Sentence completion can also be used online to reach wider user groups. Abstract 2013 Oxford University Press.
Link: http://iwc.oxfordjournals.org/content/26/3/256.abstract.html?etoc
Cognitive load (CL) is experienced during critical tasks, while emotional states are induced either by the task itself or by extraneous experiences. Emotions irrelevant to the working memory representation may interfere with the processing of relevant tasks and can influence task performance and behavior, making the accurate detection of CL from nonverbal information challenging. This paper investigates automatic CL detection from facial features, physiology and task performance under affective interference. Data were collected from participants (n=20) solving mental arithmetic tasks with emotional stimuli in the background, and a combined classifier was used for detecting CL levels. Results indicate that the face modality for CL detection was more accurate under affective interference, whereas physiology and task performance were more accurate without the affective interference. Multimodal fusion improved detection accuracies, but it was less accurate under affective interferences. More specifically, the accuracy decreased with an increasing intensity of emotional arousal. Abstract 2013 Oxford University Press.
Link: http://iwc.oxfordjournals.org/content/26/3/269.abstract.html?etoc
In the field of virtual reality (VR), many efforts have been made to analyze presence, the sense of being in the virtual world. However, it is only recently that functional magnetic resonance imaging (fMRI) has been used to study presence during an automatic navigation through a virtual environment. In the present work, our aim was to use fMRI to study the sense of presence during a VR-free navigation task, in comparison with visualization of photographs and videos (automatic navigations through the same environment). The main goal was to analyze the usefulness of fMRI for this purpose, evaluating whether, in this context, the interaction between the subject and the environment is performed naturally, hiding the role of technology in the experience. We monitored 14 right-handed healthy females aged between 19 and 25 years. Frontal, parietal and occipital regions showed their involvement during free virtual navigation. Moreover, activation in the dorsolateral prefrontal cortex was also shown to be negatively correlated to sense of presence and the postcentral parietal cortex and insula showed a parametric increased activation according to the condition-related sense of presence, which suggests that stimulus attention and self-awareness processes related to the insula may be linked to the sense of presence. Abstract 2013 Oxford University Press.
Link: http://iwc.oxfordjournals.org/content/26/3/285.abstract.html?etoc
Unlike visual stimuli, little attention has been paid to auditory stimuli in terms of emotion prediction with physiological signals. This paper aimed to investigate whether auditory stimuli can be as effective an elicitor as visual stimuli for emotion prediction using physiological channels. For this purpose, a well-controlled experiment was designed, in which standardized visual and auditory stimuli were systematically selected and presented to participants to induce various emotions spontaneously in a laboratory setting. Numerous physiological signals, including facial electromyogram, electroencephalography, skin conductivity and respiration data, were recorded when participants were exposed to the stimulus presentation. Two data mining methods, namely decision rules and k-nearest neighbor based on the rough set technique, were applied to construct emotion prediction models based on the features extracted from the physiological data. Experimental results demonstrated that auditory stimuli were as effective as visual stimuli in eliciting emotions in terms of systematic physiological reactivity. This was evidenced by the best prediction accuracy quantified by the F1 measure (visual: 76.2% vs. auditory: 76.1%) among six emotion categories (excited, happy, neutral, sad, fearful and disgusted). Furthermore, we also constructed culture-specific (Chinese vs. Indian) prediction models. The results showed that model prediction accuracy was not significantly different between culture-specific models. Finally, the implications of affective auditory stimuli in human–computer interaction, limitations of the study and suggestions for further research are discussed. Abstract 2013 Oxford University Press.
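The abstract above reports prediction accuracy as an F1 measure. As a reminder of what that metric computes, here is a minimal sketch; the precision and recall values are illustrative, not taken from the paper.

```python
def f1(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Illustrative values only (not the paper's per-class figures)
print(round(f1(0.80, 0.73), 3))  # → 0.763
```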
Link: http://www.sciencedirect.com/science/article/pii/S0160289614000087
The deliberate practice view has generated a great deal of scientific and popular interest in expert performance. At the same time, empirical evidence now indicates that deliberate practice, while certainly important, is not as important as Ericsson and colleagues have argued it is. In particular, we (Hambrick, Oswald, Altmann, Meinz, Gobet, & Campitelli, 2014) found that individual differences in accumulated amount of deliberate practice accounted for about one-third of the reliable variance in performance in chess and music, leaving the majority of the reliable variance unexplained and potentially explainable by other factors. Ericsson’s (2014) defense of the deliberate practice view, though vigorous, is undercut by contradictions, oversights, and errors in his arguments and criticisms, several of which we describe here. We reiterate that the task now is to develop and rigorously test falsifiable theories of expert performance that take into account as many potentially relevant constructs as possible. Abstract © 2014 Elsevier Inc.
Link: http://techcrunch.com/2013/02/05/amazon-to-launch-virtual-currency-amazon-coins-in-its-appstore-in-may/
Amazon has just announced a new virtual currency for Kindle Fire owners to use on in-app purchases, app purchases, etc. in the Amazon Appstore. Abstract © 2013 AOL Inc. All rights reserved.
Link: http://onlinelibrary.wiley.com/doi/10.1002/smj.2284/abstract
This paper studies how business models can be designed to tap effectively into open innovation labor markets with heterogeneously motivated workers. Using data on open source software, we show that motivations are diverse, and demonstrate how managers can strategically influence the flow of code contributions and their impact on project performance. Unlike previous literature using survey data, we exploit the observed pattern of project membership and code contributions—the “revealed preference” of developers—to infer the motivations driving their decision to contribute. Developers strongly sort along key dimensions of the business model chosen by project managers, especially the degree of openness of the project license. The results indicate an important role for intrinsic motivation, reputation, and labor market signaling, and a more limited role for reciprocity. Abstract 2014 John Wiley & Sons, Ltd.
Link: http://iwc.oxfordjournals.org/content/early/2014/05/09/iwc.iwu016.abstract.html?papetoc
Wizard of Oz (WOZ) is a well-established method for simulating the functionality and user experience of future systems. Using a human wizard to mimic certain operations of a potential system is particularly useful in situations where extensive engineering effort would otherwise be needed to explore the design possibilities offered by such operations. The WOZ method has been widely used in connection with speech and language technologies, but advances in sensor technology and pattern recognition as well as new application areas such as human–robot interaction have made it increasingly relevant to the design of a wider range of interactive systems. In such cases, achieving acceptable performance at the user interface level often hinges on resource-intensive improvements such as domain tuning, which are better done once the overall design is relatively stable. Although WOZ is recognized as a valuable prototyping technique, surprisingly little effort has been put into exploring it from a methodological point of view. Starting from a survey of the literature, this paper presents a systematic investigation and analysis of the design space for WOZ for language technology applications, and proposes a generic architecture for tool support that supports the integration of components for speech recognition and synthesis as well as for machine translation. This architecture is instantiated in WebWOZ—a new web-based open-source WOZ prototyping platform. The viability of generic support is explored empirically through a series of evaluations. Researchers from a variety of backgrounds were able to create experiments, independent of their previous experience with WOZ. The approach was further validated through a number of real experiments, which also helped to identify a number of possibilities for additional support, and flagged potential issues relating to consistency in wizard performance. Abstract 2014 Oxford University Press.
Link: http://www.thinkwithgoogle.com/insights/library/studies/the-new-multi-screen-world-study/
updated on 5/13
Title: Developing elements of user experience for mobile phones and services: survey, interview, and observation approaches
Abstract The term user experience (UX) encompasses the concepts of usability and affective engineering. However, UX has not been defined clearly. In this study, a literature survey, user interview and indirect observation were conducted to develop definitions of UX and its elements. A literature survey investigated 127 articles that were considered to be helpful to define the concept of UX. An in-depth interview targeted 14 hands-on workers in the Korean mobile phone industry. An indirect observation captured daily experiences of eight end-users with mobile phones. This study collected various views on UX from academia, industry, and end-users using these three approaches. As a result, this article proposes definitions of UX and its elements: usability, affect, and user value. These results are expected to help design products or services with greater levels of UX. Abstract Copyright 2011 Wiley Periodicals, Inc.
Title: Why different people prefer different systems for different tasks: An activity perspective on technology adoption in a dynamic user environment
Abstract In a contemporary user environment, there are often multiple information systems available for a certain type of task. Based on the premises of Activity Theory, this study examines how user characteristics, system experiences, and task situations influence an individual’s preferences among different systems in terms of user readiness to interact with each. It hypothesizes that system experiences directly shape specific user readiness at the within-subject level, user characteristics and task situations make differences in general user readiness at the between-subject level, and task situations also affect specific user readiness through the mediation of system experiences. An empirical study was conducted, and the results supported the hypothesized relationships. The findings provide insights on how to enhance technology adoption by tailoring system development and management to various task contexts and different user groups. Abstract Copyright 2011 ASIS&T
Title: A review of factors influencing user satisfaction in information retrieval
Abstract The authors investigate factors influencing user satisfaction in information retrieval. It is evident from this study that user satisfaction is a subjective variable, which can be influenced by several factors such as system effectiveness, user effectiveness, user effort, and user characteristics and expectations. Therefore, information retrieval evaluators should consider all these factors in obtaining user satisfaction and in using it as a criterion of system effectiveness. Previous studies have conflicting conclusions on the relationship between user satisfaction and system effectiveness; this study has substantiated these findings and supports using user satisfaction as a criterion of system effectiveness. Abstract Copyright 2010 ASIS&T
Title: The development and evaluation of a survey to measure user engagement
Abstract Facilitating engaging user experiences is essential in the design of interactive systems. To accomplish this, it is necessary to understand the composition of this construct and how to evaluate it. Building on previous work that posited a theory of engagement and identified a core set of attributes that operationalized this construct, we constructed and evaluated a multidimensional scale to measure user engagement. In this paper we describe the development of the scale, as well as two large-scale studies (N=440 and N=802) that were undertaken to assess its reliability and validity in online shopping environments. In the first we used Reliability Analysis and Exploratory Factor Analysis to identify six attributes of engagement: Perceived Usability, Aesthetics, Focused Attention, Felt Involvement, Novelty, and Endurability. In the second we tested the validity of and relationships among those attributes using Structural Equation Modeling. The result of this research is a multidimensional scale that may be used to test the engagement of software applications. In addition, findings indicate that attributes of engagement are highly intertwined, a complex interplay of user-system interaction variables. Notably, Perceived Usability played a mediating role in the relationship between Endurability and Novelty, Aesthetics, Felt Involvement, and Focused Attention. Abstract Copyright 2009 ASIS&T
Title: Exploring user engagement in online news interactions
Abstract This paper describes a qualitative study of online news reading and browsing. Thirty people participated in a quasi-experimental study in which they were asked to browse a news website and select three stories to discuss at a social gathering. Semi-structured interviews were conducted post-task to understand participants’ perceptions of what makes online news reading and browsing engaging or non-engaging. Findings are presented within the experience-based framework of user engagement and demonstrate the complexity of users’ interactions with information content and systems in online news environments. This study extends the model of user engagement and contributes new insights into users’ experience in casual-leisure settings, such as online news, which has implications for other information domains. Abstract Copyright 2011 by American Society for Information Science and Technology
Abstract This chapter of The Fabric of Mobile Services: Software Paradigms and Business Demands contains sections titled: New Services and User Experience, User-Centered Simplicity and Experience, Methodologies for Simplicity and User Experience, and Case Studies: Simplifying Paradigms. Abstract Copyright 2009 John Wiley & Sons, Inc.
Title: The Right Angle: Visual Portrayal of Products Affects Observers’ Impressions of Owners
Abstract Consumer products have long been known to influence observers’ impressions of product owners. The angle at which products are visually portrayed in advertisements, however, may be an overlooked factor in these effects. We hypothesize and find that portrayals of the same product from different viewpoints can prime different associations that color impressions of product and owner in parallel ways. In Study 1, automobiles were rated higher on status- and power-related traits (e.g., dominant, powerful) when portrayed head-on versus in side profile, an effect found for sport utility vehicles (SUVs)—a category with a reputation for dominance—but not sedans. In Study 2, these portrayal-based associations influenced the impressions formed about the product’s owner: a target person was rated higher on status- and power-related traits when his SUV was portrayed head-on versus in side profile. These results suggest that the influence of visual portrayal extends beyond general evaluations of products to affect more specific impressions of products and owners alike, and highlight that primed traits are likely to influence impressions when compatible with other knowledge about the target. Abstract Copyright 2012 Wiley Periodicals, Inc
Title: The Counterfeit Self: The Deceptive Costs of Faking It
Abstract Although people buy counterfeit products to signal positive traits, we show that wearing counterfeit products makes individuals feel less authentic and increases their likelihood of both behaving dishonestly and judging others as unethical. In four experiments, participants wore purportedly fake or authentically branded sunglasses. Those wearing fake sunglasses cheated more across multiple tasks than did participants wearing authentic sunglasses, both when they believed they had a preference for counterfeits (Experiment 1a) and when they were randomly assigned to wear them (Experiment 1b). Experiment 2 shows that the effects of wearing counterfeit sunglasses extend beyond the self, influencing judgments of other people’s unethical behavior. Experiment 3 demonstrates that the feelings of inauthenticity that wearing fake products engenders—what we term the counterfeit self—mediate the impact of counterfeits on unethical behavior. Finally, we show that people do not predict the impact of counterfeits on ethicality; thus, the costs of counterfeits are deceptive. Abstract Copyright 2010 Francesca Gino, Michael I. Norton, and Dan Ariely
Link: http://iwc.oxfordjournals.org/content/26/5/389.full.html?etoc
Menus are a key mechanism for organizing different commands in graphical user interfaces. Nowadays low-cost devices that allow using different interaction techniques in remote interfaces have become widespread. Nevertheless, their corresponding menus are direct adaptations from traditional ones. As a consequence, they are inaccurate and slow, and also cause fatigue. In this paper, we design, implement and evaluate a menu selection technique for remote interfaces, the Body Menu. This technique permits whole-body interaction and is specifically designed to take advantage of the proprioception sense. The Body Menu attaches virtual menu items to different parts of the body and selects them when the users reach these zones with their hands. We use the Microsoft Kinect to implement this system. Additionally, we compared it with the most representative menus, studied the best number of body parts to be used and analyzed how children interact with it. Abstract © 2013 Oxford University Press.
Link: http://iwc.oxfordjournals.org/content/26/5/403.full.html?etoc
We present the evaluation of an interactive audio map system that enables blind and partially sighted users to explore and navigate city maps from the safety of their home using simulated 3D audio and synthetic speech alone. We begin with a review of existing literature in the areas of spatial knowledge and wayfinding, auditory displays and auditory map systems, before describing how this research builds on and differentiates itself from this body of work. One key requirement was the ability to quantify the effectiveness of the audio map, so we describe the design and implementation of the evaluation, which took the form of a game downloaded by participants to their own computers. The results demonstrate that participants (blind, partially sighted and sighted) have acquired detailed spatial knowledge and also that the availability of positional audio cues significantly improves wayfinding performance. Abstract © 2013 Oxford University Press.
Link: http://iwc.oxfordjournals.org/content/26/5/417.full.html?etoc
Delegation is the practice of sharing authority with another individual to enable them to complete a specific task as a proxy. Practices to permit delegation can range from formal to informal arrangements and can involve spontaneous yet finely balanced notions of trust between people. This paper argues that delegation is a ubiquitous yet unsupported feature of socio-technical computer systems and that this lack of support illustrates a particular neglect of the everyday financial practices of the more vulnerable people in society. Our contribution is to provide a first exploration of the domain of person-to-person delegation in digital payments, a particularly pressing context. We first report qualitative data collected across several studies concerning banking practices of individuals over 80 years of age. We then use analytical techniques centred upon identification of stakeholders, their concerns and interactions, to characterize the delegation practices we observed. We propose a Concerns Matrix as a suitable representation to capture conflicts in the needs of individuals in such complex socio-technical systems, and finally propose a putative design response in the form of a Helper Card. Abstract © 2013 Oxford University Press.
Link: Why We Love Beautiful Things
Great design, the management expert Gary Hamel once said, is like Justice Potter Stewart’s famous definition of pornography — you know it when you see it. You want it, too: brain scan studies reveal that the sight of an attractive product can trigger the part of the motor cerebellum that governs hand movement. Instinctively, we reach out for attractive things; beauty literally moves us. © 2013 New York Times
Link: http://www.bris.ac.uk/news/2013/9478.html
A new study has analysed tens of thousands of articles available to readers of online news and created a model to find out ‘what makes people click’. The aim of the study was to model the reading preferences for the audiences of 14 online news outlets using machine learning techniques. The models, describing the appeal of an article to each audience, were developed by linear functions of word frequencies. The models compared articles that became “most popular” on a given day in a given outlet with articles that did not. The research identified the most attractive keywords, as well as the least attractive ones, and explained the choices readers made. Abstract © 2013 University of Bristol.
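The study above scores article appeal by a linear function of word frequencies. A minimal sketch of that idea follows; the keyword weights and sample texts are invented for illustration and are not the study's learned parameters.

```python
from collections import Counter

# Hypothetical learned weights: positive = attractive keyword, negative = unattractive
keyword_weights = {"video": 0.9, "secret": 0.7, "report": 0.2, "minutes": -0.4}

def appeal(text):
    """Score a text as a linear function of relative word frequencies."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return sum(w * counts[k] / total for k, w in keyword_weights.items())

a = "secret video shows report"          # contains high-weight keywords
b = "minutes minutes of routine report"  # dominated by a negative keyword
print(appeal(a) > appeal(b))  # → True
```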
Title: Pointing and Selecting with Facial Activity
Abstract The aim of this paper was to evaluate the use of three facial actions (i.e. frowning, raising the eyebrows, and smiling) in selecting objects on a computer screen when gaze was used for pointing. Dwell time is the most commonly used selection technique in gaze-based interaction, and thus, a dwell time of 400 ms was used as a reference selection technique. A wireless, head-mounted prototype device that carried out eye tracking and contactless, capacitive measurement of facial actions was used for the interaction task. Participants (N=16) performed point-and-select tasks with three pointing distances (i.e. 60, 120 and 240 mm) and three target sizes (i.e. 25, 30 and 40 mm). Task completion times, pointing errors and throughput values based on Fitts’ law were used to compare the selection techniques. The participants also rated the techniques with subjective rating scales. The results showed that the different techniques performed equally well in many respects. However, throughput values varied from 8.38 bits/s (raising the eyebrows) to 15.33 bits/s (smiling) and were comparable to or, in the case of smiling, better than in earlier research with similar interaction techniques. The dwell time was found to be the least accurate selection technique in terms of the magnitudes of point-and-select errors. The smiling technique was rated as more accurate to use than the frowning or the raising techniques. The results give further support for methods that combine facial behavior with eye tracking when interacting with technology.
Abstract Copyright 2014 Outi Tuisku1, Ville Rantanen, Oleg Špakov, Veikko Surakka and Jukka Lekkala
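The throughput values quoted above (8.38–15.33 bits/s) come from Fitts' law. A minimal sketch of the common Shannon formulation follows; the distance, width, and movement time below are illustrative, not the paper's measurements.

```python
import math

def fitts_throughput(distance_mm, width_mm, movement_time_s):
    """Throughput (bits/s) via the Shannon formulation of Fitts' law:
    index of difficulty ID = log2(D/W + 1), throughput TP = ID / MT."""
    index_of_difficulty = math.log2(distance_mm / width_mm + 1)
    return index_of_difficulty / movement_time_s

# Illustrative trial: 240 mm pointing distance, 40 mm target, 0.3 s selection
tp = fitts_throughput(240, 40, 0.3)
print(round(tp, 2))  # ID = log2(7) ≈ 2.81 bits → ≈ 9.36 bits/s
```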
Title: Modeling Traditional Literacy, Internet Skills and Internet Usage: An Empirical Study
Abstract This paper focuses on the relationships among traditional literacy (reading, writing and understanding text), medium-related Internet skills (consisting of operational and formal skills), content-related Internet skills (consisting of information and strategic skills) and Internet usage types (information- and career-directed Internet use and entertainment use). We conducted a large-scale survey that resulted in a dataset of 1008 respondents. The results reveal the following: (i) traditional literacy has a direct effect on formal and information Internet skills and an indirect effect on strategic Internet skills and (ii) differences in types of Internet usage are indirectly determined by traditional literacy and directly affected by Internet skills, such that higher levels of strategic Internet skills result in more information- and career-directed Internet use. Traditional literacy is a pre-condition for the employment of Internet skills, and Internet skills should not be considered an easy means of disrupting historically grounded inequalities caused by differences in traditional literacy.
Abstract Copyright 2014 A.J.A.M. van Deursen and J.A.G.M. van Dijk
Title: Life Is Too Short to RTFM: How Users Relate to Documentation and Excess Features in Consumer Products
Abstract This paper addresses two common problems that users of various products and interfaces encounter—over-featured interfaces and product documentation. Over-featured interfaces are seen as a problem as they can confuse and over-complicate everyday interactions. Researchers also often claim that users do not read product documentation, although they are often exhorted to ‘RTFM’ (read the field manual). We conducted two sets of studies with users which looked at the issues of both manuals and excess features with common domestic and personal products. The quantitative set was a series of questionnaires administered to 170 people over 7 years. The qualitative set consisted of two 6-month longitudinal studies based on diaries and interviews with a total of 15 participants. We found that manuals are not read by the majority of people, and most do not use all the features of the products that they own and use regularly. Men are more likely to do both than women, and younger people are less likely to use manuals than middle-aged and older ones. More educated people are also less likely to read manuals. Over-featuring and being forced to consult manuals also appears to cause negative emotional experiences. Implications of these findings are discussed.
Abstract Copyright 2014 Alethea L. Blackler, Rafael Gomez, Vesna Popovic and M. Helen Thompson
Title: Effect of Age on Human–Computer Interface Control Via Neck Electromyography
Abstract The purpose of this study was to determine the effect of age on visuomotor tracking using submental and anterior neck surface electromyography (sEMG) to assess feasibility of computer control via neck musculature, which allows people with little remaining motor function to interact with computers. Thirty-two healthy adults participated: 16 younger adults aged 18–29 years and 16 older adults aged 69–85 years. Participants modulated sEMG to achieve targets presented at different amplitudes using real-time visual feedback. Root mean squared (RMS) error was used to quantify tracking performance. RMS error was increased for older adults relative to younger adults. Older adults demonstrated more RMS error than younger adults as a function of increasing target amplitude. The differential effects of age found on static tracking performance in anterior neck musculature suggest more difficult translation of human–computer interfaces controlled using anterior neck musculature for static tasks to older populations.
Abstract Copyright 2014 Gabrielle L. Hands and Cara E. Stepp
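The study above quantifies tracking performance as root mean squared (RMS) error between the target amplitude and the participant's sEMG-controlled response. A minimal sketch follows; the sample values are invented for illustration.

```python
import math

def rms_error(targets, responses):
    """Root mean squared error between target amplitudes and tracked responses."""
    squared = [(t - r) ** 2 for t, r in zip(targets, responses)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical normalized tracking samples (target vs. sEMG-controlled output)
target = [0.2, 0.4, 0.6, 0.8]
tracked = [0.25, 0.35, 0.7, 0.75]
print(round(rms_error(target, tracked), 3))  # → 0.066
```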
Title: Should I Stay or Should I Go? Improving Event Recommendation in the Social Web
Abstract This paper focuses on the recommendation of events in the Social Web, and addresses the problem of finding if, and to which extent, certain features, which are peculiar to events, are relevant in predicting the users’ interests and should thereby be taken into account in recommendation. We consider, in particular, three ‘additional’ features that are usually shown to users within social networking environments: reachability from the user location, the reputation of the event in the community and the participation of the user’s friends. Our study is aimed at evaluating whether adding this information to the description of the event type and topic, and including in the user profile the information on the relevance of these factors, can improve our capability to predict the user’s interest. We approached the problem by carrying out two surveys with users, who were asked to express their interest in a number of events. We then trained, by means of linear regression, a scoring function defined as a linear combination of the different factors, whose goal was to predict the user scores. We repeated this experiment under different hypotheses on the additional factors, in order to assess their relevance by comparing the predictive capabilities of the resulting functions. The compared results of our experiments show that additional factors, if properly weighted, can improve the prediction accuracy with an error reduction of 4.1%. The best results were obtained by combining content-based factors and additional factors in a proportion of ∼10:4.
Abstract Copyright 2014 Federica Cena, Silvia Likavec, Ilaria Lombardi and Claudia Picardi
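The paper above trains a scoring function defined as a linear combination of content-based and "additional" factors. A minimal sketch of evaluating such a function follows; the weights (chosen here so content-based and additional factors sit in roughly the ∼10:4 proportion the abstract mentions) and the event values are illustrative, not the fitted coefficients from the paper.

```python
# Hypothetical weights: "type" and "topic" are content-based factors;
# the other three are the additional social factors from the abstract.
weights = {
    "type": 0.5, "topic": 0.5,                              # content-based (sum 1.0)
    "reachability": 0.15, "reputation": 0.1, "friends": 0.15,  # additional (sum 0.4)
}

def score(event):
    """Predicted interest: weighted linear combination of event factors."""
    return sum(w * event.get(f, 0.0) for f, w in weights.items())

concert = {"type": 1.0, "topic": 0.9, "reachability": 1.0,
           "reputation": 0.8, "friends": 1.0}
print(round(score(concert), 2))  # → 1.33
```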
Title: “I Need to Be Explicit: You’re Wrong”: Impact of Face Threats on Social Evaluations in Online Instructional Communication
Abstract Online instructional communication, as found in ask-an-expert forums, e-learning discussion boards or online help desks, creates situations that threaten the recipient’s face. This study analyzed the evaluation of face-threatening acts with a 1×3 design. An online forum thread confronted a layperson with an expert who either (a) addressed the layperson’s misconceptions directly and frankly, (b) mitigated face threats through explicit hints about the need to be direct or (c) communicated politely and indirectly. College students read these dialogues and assessed the expert communicator’s facework, recipient orientation, credibility and likability. Results showed that polite experts were evaluated most positively; explicit hints did not improve perceptions of face-threatening acts. This implies that users of instructional forums prefer communicators to be polite even when face threats are necessary. We discuss practical implications for different online instruction contexts and make suggestions for further research.
Abstract Copyright 2014 Regina Jucks, Lena Päuler and Benjamin Brummernhenrich
Title: The Potential of a Text-Based Interface as a Design Medium: An Experiment in a Computer Animation Environment
Abstract Since the birth of the concept of direct manipulation, the graphical user interface has been the dominant means of controlling digital objects. In this research, we hypothesize that the benefits of a text-based interface involve multiple tradeoffs, and we explore the potential of text as a medium of design from three perspectives: (i) the perceived level of control of the designed object, (ii) a tool for realizing creative ideas and (iii) an effective form for a highly learnable user interface. Our experiment in a computer animation environment shows that (i) participants did feel a high level of control of characters, (ii) creativity was both restricted and facilitated depending on the task and (iii) natural language expedited the learning of a new interface language. Our research provides experimental proof of the effect of a text-based interface and offers guidelines for the design of future computer-aided design applications.
Abstract Copyright 2014 Sangwon Lee and Jin Yan
Title: Framing a Set: Understanding the Curatorial Character of Personal Digital Bibliographies
Abstract We articulate a model of curatorship that emphasizes framing the character of the curated set as the focus of curatorial activity. This curatorial character is structured through the articulation, via mechanisms of selection, description and arrangement, of coherent classificatory principles. We describe the latest stage of a continuing project to examine the curatorial character of personal digital bibliographies, such as Pinterest boards, Flickr galleries and GoodReads shelves, and to support the design of such curatorially expressive personal collections. In the study reported here, 24 participants created personal bibliographies using either a structured design process, with explicit tasks for selecting, describing and arranging collection items, or an unstructured process that did not separate these activities. Our findings lead to a more complex understanding of personal collections as curatorial, expressive artifacts. We explore the role of cohesion as a quality that facilitates expression of the curatorial frame, and we find that when designers read source materials as a part of a set, they are more likely to write cohesive collections. Our findings also suggest that the curatorial act involves both the definition of abstract classificatory principles and their instantiation in a specific material environment. We describe various framing devices that facilitate these reading and writing activities, and we suggest design directions for supporting curatorial reading and writing tasks.
Abstract Copyright 2014 Melanie Feinberg, Ramona Broussard and Eryn Whitworth
Title: Identifying Problems Associated with Focus and Context Awareness in 3D Modelling Tasks
Abstract Creating complex 3D models is a challenging process. One of the main reasons for this is that 3D models are usually created using software developed for conventional 2D displays which lack true depth perspective, and therefore do not support correct perception of spatial placement and depth-ordering of displayed content. As a result, modellers often have to deal with many overlapping components of 3D models (e.g. vertices, edges, faces, etc.) on a 2D display surface. This in turn causes them to have difficulties in distinguishing distances, maintaining position and orientation awareness, etc. To better understand the nature of these problems, which can collectively be defined as ‘focus and context awareness’ problems, we have conducted a pilot study with a group of novice 3D modellers, and a series of interviews with a group of professional 3D modellers. This article presents these two studies, and their findings, which have resulted in identifying a set of focus and context awareness problems that modellers face in creating 3D models using conventional modelling software. The article also provides a review of potential solutions to these problems in the related literature.
Abstract Copyright 2014 Masood Masoodian, Azmi bin Mohd Yusof and Bill Rogers
Abstract The goal of user experience design in industry is to improve customer satisfaction and loyalty through the utility, ease of use, and pleasure provided in the interaction with a product. So far, user experience studies have mostly focused on short-term evaluations and consequently on aspects relating to the initial adoption of new product designs. Nevertheless, the relationship between the user and the product evolves over long periods of time and the relevance of prolonged use for market success has been recently highlighted. In this paper, we argue for the cost-effective elicitation of longitudinal user experience data. We propose a method called the “UX Curve” which aims at assisting users in retrospectively reporting how and why their experience with a product has changed over time. The usefulness of the UX Curve method was assessed in a qualitative study with 20 mobile phone users. In particular, we investigated how users’ specific memories of their experiences with their mobile phones guide their behavior and their willingness to recommend the product to others. The results suggest that the UX Curve method enables users and researchers to determine the quality of long-term user experience and the influences that improve user experience over time or cause it to deteriorate. The method provided rich qualitative data and we found that an improving trend of perceived attractiveness of mobile phones was related to user satisfaction and willingness to recommend their phone to friends. This highlights that sustaining perceived attractiveness can be a differentiating factor in the user acceptance of personal interactive products such as mobile phones. The study suggests that the proposed method can be used as a straightforward tool for understanding the reasons why user experience improves or worsens in long-term product use and how these reasons relate to customer loyalty.
Abstract Copyright 2011 Sari Kujala, Virpi Roto, Kaisa Väänänen-Vainio-Mattila, Evangelos Karapanos and Arto Sinnelä
Title: Researching Young Children’s Everyday Uses of Technology in the Family Home
Abstract Studies of the everyday uses of technology in family homes have tended to overlook the role of children and, in particular, young children. A study that was framed by an ecocultural approach focusing on children’s play and learning with toys and technologies is used to illustrate some of the methodological challenges of conducting research with young children in the home. This theoretical framework enabled us to identify and develop a range of methods that illuminated the home’s unique mix of inhabitants, learning opportunities and resources and to investigate parents’ ethnotheories, or cultural beliefs, that gave rise to the complex of practices, values and attitudes and their intersections with technology and support for learning in the home. This resulted in a better understanding of the role of technology in the lives of these 3- and 4-year-old children.
Abstract Copyright 2014 Lydia Plowman
Title: Measuring web usability using item response theory: Principles, features and opportunities
Abstract Usability is considered a critical issue on the web that determines either the success or the failure of a company. Thus, the evaluation of usability has gained substantial attention. However, most current tools for usability evaluation have some limitations, such as excessive generality and a lack of reliability and validity. The present work proposes the construction of a tool to measure usability in e-commerce websites using item response theory (IRT). While usability issues have only been considered in theoretical or empirical contexts, in this study, we discuss them from a mathematical point of view using IRT. In particular, we develop a standardised scale to measure usability in e-commerce websites. This study opens a new field of research in the ergonomics of interfaces with respect to the development of scales using IRT.
Abstract Copyright 2011 Rafael Tezza, Antonio Cezar Bornia and Dalton Francisco de Andrade
Title: Everything Science Knows Right Now About Standing Desks
Abstract If it wasn’t already clear through common sense, it’s become painfully clear through science that sitting all day is terrible for your health. What’s especially alarming about this evidence is that extra physical activity doesn’t seem to offset the costs of what researchers call “prolonged sedentary time.” Just as jogging and tomato juice don’t make up for a night of smoking and drinking, a little evening exercise doesn’t erase the physical damage done by a full work day at your desk.
In response some people have turned to active desks—be it a standing workspace or even a treadmill desk—but the research on this recent trend has been too scattered to draw clear conclusions on its benefits (and potential drawbacks). At least until now. A trio of Canada-based researchers has analyzed the strongest 23 active desk studies to draw some conclusions on how standing and treadmill desks impact both physiological health and psychological performance.
Abstract Copyright 2015 Eric Jaffe
Send Us Your Research References: If you have interesting and relevant research references, post the content as a comment below for possible inclusion in next year’s updated list.
Other Content from PulseUX: Here are 2 other references from widely read and quoted long-form posts you may find interesting.
Angry Birds UX: Why Angry Birds is so successful and popular: a cognitive teardown of the user experience (1.5 million page views). https://live-mauro-usability-science.pantheonsite.io/blog/why-angry-birds-is-so-successful-a-cognitive-teardown-of-the-user-experience/
Apple v. Samsung: Impact and Implications for Product Design, User Interface Design (UX), Software Development and the Future of High-Technology Consumer Products https://live-mauro-usability-science.pantheonsite.io/blog/apple-v-samsung-implications-for-product-design-user-interface-ux-design-software-development-and-the-future-of-high-technology-consumer-products/
Charles L Mauro CHFP President / Founder MauroNewMedia
Open Access
Peer-reviewed
Research Article
Roles Data curation, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft
Affiliation School of Psychology and Cognitive Science, East China Normal University, Shanghai, China
Roles Conceptualization, Writing – review & editing
* E-mail: [email protected]
To understand the influence of user interface on task performance and situation awareness, three levels of user interface were designed based on the three-level situation awareness model for the 3-player diner’s dilemma game. The 3-player diner's dilemma is a multiplayer version of the prisoner's dilemma, in which participants play games with two computer players and try to achieve high scores. A total of 117 participants were divided into 3 groups to participate in the experiment. Their task performance (the dining points) was recorded and their situation awareness scores were measured with the Situation Awareness Global Assessment Technique. The results showed that (1) the level-3 interface effectively improved the task performance and situation awareness scores, while the level-2 interface failed to improve them; (2) the practice effect did exist in all three conditions; and (3) the levels of user interface had no effect on the task learning process, implying that the learning rules remained consistent across different conditions.
Citation: Jiang T, Fang H (2020) The influence of user interface design on task performance and situation awareness in a 3-player diner's dilemma game. PLoS ONE 15(3): e0230387. https://doi.org/10.1371/journal.pone.0230387
Editor: Valerio Capraro, Middlesex University, UNITED KINGDOM
Received: August 29, 2019; Accepted: February 28, 2020; Published: March 17, 2020
Copyright: © 2020 Jiang, Fang. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All the relevant data are within the paper and its Supporting Information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
With the development and popularization of network and smart devices, people have become familiar with user interfaces (UIs). In the process of human-computer interaction, the interface plays a vital role [ 1 ]. A productive and stimulating interface helps ensure high-quality interactions. Therefore, it is necessary to explain the influence of UI design on human behavior and to set standards to measure the characteristics of interfaces.
In this study, the 3-player diner’s dilemma game was employed to explore the impact of UI design on task performance and situation awareness (SA). The 3-player diner's dilemma is a multi-player version of the prisoner's dilemma, which has been widely applied in research in the economics, psychology, and political science fields [ 2 ]. Based on the SA theory and the diner's dilemma game, we hoped to explore the role of UI design in task performance and SA.
As the medium of human-computer interaction, UIs provide the information people need to complete a variety of tasks, so information display plays an important role. However, according to Lucas and Nielsen, it is difficult to design a graphical interface because many variables affect the development of a computer-based information system [ 3 ]. What information should be presented on the interface, and how much, has long been an intriguing question.
A naïve approach would be to put as much information as possible into the UI; however, this is untenable and “less is more” has been proved to be true in many cases. In some decision studies, researchers found that presenting more information would make participants feel more confident and lead to lower accuracy in decision making [ 4 ]. Todd et al. noted that sometimes people's decisions rely on limited information [ 5 ]. The same can be found in UI research. For example, Davies et al. studied the influence of menus on task performance during command learning in a word processing application. The results showed that the group without menus performed better than the group with menus [ 6 ]. Xuan et al. used a train simulation driving game to explore the relationship between UI information and task performance, and the results also showed that more information did not mean better performance [ 7 ].
Therefore, the amount of information and how it is presented in the UI should be carefully considered. According to Tufte, “attractive displays of statistical information… display an accessible complexity of detail” [ 8 ]. In other words, a good information display should contain task-relevant information and be presented in a reasonable way that does not overwhelm the users. This sentiment was echoed by human factors psychologists such as Sweller [ 9 , 10 ] and was a component of the ISO standards for interface design.
Even under the guidance of these principles, the results still depend on the circumstances. Laura et al. studied information availability in a simulated command and control environment; the results indicated that increasing the volume of information, even when it was accurate and task-relevant, was not necessarily beneficial to decision-making performance [ 11 ]. Davidsson and Alm's research on driving information showed that drivers’ information needs are very complex: people usually need different information in different contexts [ 12 ]. Therefore, we argue that the design of a UI should reflect human cognitive processes and fully consider their limits and capabilities. There is some evidence for this assumption, such as the study by Dina et al., which found that when users could customize their UIs, errors were reduced and user acceptance improved [ 13 ]. In summary, to achieve a well-designed UI, we should not only study the task itself, but also take a deep look at the cognitive processes of the human operators.
One theory that facilitates the understanding of human cognitive processes, and that has been effectively applied to interface design, is SA theory. Because it offers both explanatory and predictive power, SA theory has been widely applied in interface research to validate the effectiveness of UI designs (e.g., [ 11 , 14 , 15 ]).
The concept of SA originated in research on fighter pilots in the 1990s (e.g., [ 16 ]) to explain the psychological processes of pilots in complex and dynamic environments. Then it was extended to other scenarios [ 17 ]. In such studies, the involvement of SA and the related approaches brought considerable benefits such as high efficiency and error reduction [ 18 ]. Therefore, exploring the characteristics of interfaces based on SA theory is a feasible approach.
There are three distinct perspectives on SA: (i) individual SA, (ii) team SA, and (iii) system SA. Individual SA means “knowing what is going on around you” [ 18 ]. SA is not an entity that can be touched and observed, but a construct inside the cognitive black box; consequently, there is no unified definition or measurement for it. One of the most famous individual SA models is the three-level model proposed by Endsley [ 19 ], who held that SA is the individual’s perception of the elements of the environment, the comprehension of their meaning, and the projection of their future states under specific time and space conditions. More specifically, the three levels are as follows:
Perception (Lv. 1): The simple awareness of task-related elements (objects, events, people, etc.) and their present states (locations, conditions, modes, actions, etc.) in the surrounding environment.
Comprehension (Lv. 2): Integrating elements from Lv. 1 through understanding their past states and how they impact goals or objectives.
Projection (Lv. 3): Integrating Lv. 1 and Lv. 2 information and using it to project future actions and states of the elements in the environment.
The Situation Awareness Global Assessment Technique (SAGAT), which was proposed based on the above definition, is a popular method of assessing individual SA through probe techniques [ 20 ]. Researchers need to compile situation-related questions on three levels: perception, comprehension, and projection. The test is inserted into the task. At this time, the task is suspended and the participants cannot view other information, which means they need to complete the test by memory. After the test is over, the task continues. Its reliability and validity have been confirmed in many experimental studies [ 14 , 21 – 23 ].
In the 3-player diner’s dilemma, three players enter a restaurant. They agree to order a dish for themselves and the final cost is shared. The restaurant offers two types of dish, one is a hot dog at a low price and the other is a lobster at a high price. The more expensive the dish, the higher the quality. However, the hot dog has a higher quality-cost ratio. Here, we call this ratio dining points (DPs). In this game, players need to take part in multiple rounds of the game, with the goal of improving their total DPs. In our experiment, the hot dog had a quality of 200 and a cost of 10, so the DPs were 20, while the lobster had a quality of 400 and a cost of 30, so the DPs were 13.33. Since the final cost was shared by all three players, there were six possible outcomes for each player (see Fig 1 ).
In our study, there were six possible outcomes for each player.
https://doi.org/10.1371/journal.pone.0230387.g001
Looking at Fig 1 , we can see that although the hot dog had higher DPs, if the human player chose lobster and the other two computer players still chose the hot dog, the player could get more DPs from the loss of the other two players (see lines 1 and 2). However, if one of the computer players chose the lobster, the human player’s benefit would disappear or they would make a loss (see lines 3 and 4). In addition, if the other two computer players chose lobster and the human player still chose the hot dog, then their loss would be even greater (see lines 5 and 6).
Therefore, in order to get more DPs, the players need to observe how the other two behave. In the simplest case, if the other two players always chose the hot dog, the lobster would be the best choice (DPs = 24 > 20). Similarly, if the other two players always chose the lobster, the lobster would also be the optimal solution (DPs = 13.3 > 8.6). So, when would the hot dog be the optimal solution? The answer is when the other two players were playing TFT (tit-for-tat): when the human player tried to get more DPs by selecting the lobster, the two computer players immediately chose the lobster to counterattack, so the human player’s total DPs would be reduced.
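As a concrete illustration, the payoff structure above can be sketched in a few lines of Python (a minimal sketch based on the quality and cost values given in the text; the function and dish labels are ours, not the authors’ program, which was written in MATLAB):

```python
# Payoff sketch for the 3-player diner's dilemma, using the quality and
# cost values given in the text (hot dog: quality 200, cost 10;
# lobster: quality 400, cost 30).
QUALITY = {"hot dog": 200, "lobster": 400}
COST = {"hot dog": 10, "lobster": 30}

def dining_points(own_choice, other_choices):
    # Dining points = quality of one's own dish divided by one's equal
    # share (one third) of the table's total cost.
    total_cost = COST[own_choice] + sum(COST[c] for c in other_choices)
    return QUALITY[own_choice] / (total_cost / 3)
```

For example, defecting to the lobster while both computer players stay with the hot dog yields 400 / (50 / 3) = 24 points, matching the 24 > 20 comparison above.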
In summary, the players needed to understand the mathematical rules behind the situation and analyze the selection habits of the other players to develop their own strategies. The strategic depth of the game allowed us to design different interfaces to influence the task performance and SA of the participants. At the perception level (Lv. 1), the interface would present the basic task-related elements and allow the human player to understand their meaning. Specifically, all of the player's choices in the last round, as well as the player’s DPs, would be presented. All of this information was necessary for the player to complete the task. At the comprehension level (Lv. 2), as mentioned above, the tendencies of the other two players were vital to the task performance. Therefore, the information in UI-2 needed to reflect the tendencies of the other two players, especially whether they had been playing TFT. At the projection level (Lv. 3), the player needed to be able to use the information to predict future outcomes. The elements in UI-3 had to help the participants understand the possible outcomes of the different options. Because the three levels of SA should not be isolated, the UI-3 needed to contain all of the elements of the subordinate interfaces. So, we designed three interfaces for the game.
As far as we know, a total of three studies used a similar experimental paradigm to explore the impact of UI on SA and task performance in the 3-player diner’s dilemma [ 23 ]. All of the previous studies were based on the three-level model of SA. Three levels of interface were designed. UI-1 involved the level of perception, UI-2 involved the level of comprehension, and UI-3 involved the level of projection. Participants played the game against two computer players.
Yun et al. [ 23 ] originally conducted the experiment to explore the relationship among SA, trust, and interface types, laying the foundation for subsequent research. The results showed that when using UI-1, the participants had a higher tendency to cooperate, and in the context of encouraging cooperation, the self-reported trust scores and proportion of cooperation responses were positively correlated. However, as a first attempt, this study unavoidably had some limitations. First, SA was not measured directly: the interface types and levels of SA were treated as equivalent, but the interface types did not necessarily reflect SA. Second, the computer strategy was not described in detail. Finally, the interface was relatively simple.
Therefore, in the study by Onal et al. [ 21 ], some important improvements were made. First, the researchers employed the SAGAT to obtain objective SA scores, making the SA and interface types no longer equivalent. Second, they tried to quantify the computer strategy so that the relationship between it and the behavior of the participants could be clearly recorded. Third, the interface design became more sophisticated. As a result, the study showed that the interface did significantly affect task performance and a significant positive correlation between SA and task performance was also detected.
In the latest research, Schaffer et al. [ 22 ] refined the study by increasing the number of computer strategies (from 5 to 12), enlarging the sample (from 95 to 901 participants), and adding advanced statistical methods such as path analysis to explain the results. In general, the conclusions were similar to those of Onal et al. [ 21 ]. The impact of the interface on SA was confirmed, and the SA scores increased along with the interface level. The interface also affected task performance: in the context of encouraging cooperation, performance tended to improve as the interface level increased.
In conclusion, past studies explained the relationship among interface, SA, and cooperation, but they still had some shortcomings. First, they did not introduce the time dimension, so they could not explain whether (and how) task performance and SA change with practice. That is to say, the above studies only considered “novices” in the diner’s dilemma. In fact, the “novice–expert” comparison is an important research issue for individual SA [ 17 ]. Also, Gonzalez et al. [ 14 ] have already shown that SA scores and task performance increased with practice in a water purification plant task. In their study, only the levels of perception and comprehension increased with practice, while the third level, projection, did not; Gonzalez et al. argued that this might be linked to the difficulty of the task. The establishment of a mental model has a positive effect on SA [ 24 ], and the essence of the practice effect might be the establishment of a mental model that distinguishes novices from experts [ 17 ]. It was therefore necessary to probe the practice effect in different tasks to verify this assumption. Second, the computer strategies of past studies left room for refinement. They were set through two types of parameter, distinguishing between cooperation and defection situations, but only one parameter was varied while the other was fixed, which was only a one-dimensional change. Therefore, our study extended the strategy space to a two-dimensional plane, making the computer strategy more similar to human behavior.
Based on the previous studies and literature, we proposed the following hypotheses:
The East China Normal University Ethics Committee approved this study. Our whole research process followed the ethical guidelines for research involving human subjects, and every participant provided written informed consent.
This study adopted a mixed 3 (between-subjects variable) × 4 (within-subjects variable) design to examine the effects of interface and practice on DPs and SA scores. In the experiment, the participants were asked to compete with two computer players that used a preset strategy, and they were encouraged to earn as many DPs as possible. The task was divided into 4 blocks of 50 trials each, for a total of 200 trials. The program settled the DPs once every 50 trials, after which the DPs were reset to zero to start a new block. The experiment contained three levels of UI, and each participant completed the task under only one of them.
In the experiment, the two computer players used the TFT strategy proposed by previous studies [ 21 – 23 ]. Under TFT, the computer made its next decision based on the participant’s current choice: if the participant chose the hot dog, the computer would also choose the hot dog in the next trial, and if the participant chose the lobster, the computer would choose the lobster in the next trial. However, there was a 10% chance that the computer would make a different choice from the participant. There were two reasons for this setting: first, it ensured that the game had a clear optimal strategy, so the participants had the chance to learn it; second, the random deviation made the computer players’ behavior less predictable and more natural, improving the ecological validity of the experiment.
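The noisy tit-for-tat rule described above can be sketched as follows (a hypothetical reimplementation; the function name, arguments, and use of Python’s random module are our assumptions, since the original program was written in MATLAB):

```python
import random

# Noisy tit-for-tat: the computer copies the participant's previous
# choice, but with probability `noise` (10% in the experiment) it plays
# the other dish instead.
def tft_choice(participant_previous, noise=0.10, rng=random):
    if rng.random() < noise:
        return "lobster" if participant_previous == "hot dog" else "hot dog"
    return participant_previous
```

With `noise=0` this reduces to pure tit-for-tat; raising `noise` makes the computer players deviate more often.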
In order to verify the practice effect, the task was divided into 4 blocks, 50 trials per block, for a total of 200 trials. At the end of each block, the DP results (including all previous blocks) were presented to the participants and reset to 0 after confirmation (see Fig 2 ).
In this study, the task was divided into 4 blocks, 50 trials per block. Dining points would be set to zero at the end of each block.
https://doi.org/10.1371/journal.pone.0230387.g002
The effect of the interface level was another focus of the experiment. Three levels were designed based on SA theory (see Fig 3 ).
This figure shows the UIs employed in the study. The solid green line box is the level-1 UI, the blue dashed box is the level-2 UI, and the red line box is the level-3 UI.
https://doi.org/10.1371/journal.pone.0230387.g003
Level-1 UI: The green solid line box shown in Fig 3 was a simplified UI (level-1 UI), in which only the basic buttons, the players’ selection for each trial, the DPs obtained in each trial, and the sum of the DPs were presented. At this level, participants could not directly view the selection tendency of the computer players.
Level-2 UI: The blue dashed box shown in Fig 3 was the level-2 UI. Compared with the level-1 UI, a history panel was added. In the history panel, the participants could view the selection information of all players in each trial, the DPs obtained in each trial, and the frequency count of the computer players mimicking the choices of the participants.
Level-3 UI: The red line frame part of Fig 3 was the level-3 UI. Compared with the level-2 UI, a prediction panel was added. Through the prediction panel, the participants could adjust the parameters and the program would calculate the expected DPs according to the preset formula to help the participants make decisions.
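The paper does not publish the prediction panel’s preset formula. One plausible sketch, assuming each computer player independently chooses the hot dog with an adjustable probability (the function name, the independence assumption, and the parameterization are ours), is:

```python
from itertools import product

QUALITY = {"hot dog": 200, "lobster": 400}
COST = {"hot dog": 10, "lobster": 30}

def expected_dp(own_choice, p_hot_dog):
    # Each computer player is assumed to choose the hot dog independently
    # with probability p_hot_dog; the expectation sums over all four
    # combinations of the two computer players' choices.
    expected = 0.0
    for c1, c2 in product(COST, repeat=2):
        p1 = p_hot_dog if c1 == "hot dog" else 1 - p_hot_dog
        p2 = p_hot_dog if c2 == "hot dog" else 1 - p_hot_dog
        shared_cost = (COST[own_choice] + COST[c1] + COST[c2]) / 3
        expected += p1 * p2 * QUALITY[own_choice] / shared_cost
    return expected
```

At the extremes this reproduces the worked cases in the game description: with both opponents certain to pick the hot dog, the lobster is worth 24 expected DPs versus 20 for the hot dog.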
The SAGAT was employed in this study to measure the SA scores. The test was inserted at the 25th trial of every block; during the test, the participant could not view the main interface, so the questions had to be answered from memory. According to the task situation, eight questions were put forward, of which questions 1–3 corresponded to the level of perception, 4–5 to the level of comprehension, and 6–8 to the level of projection. All were single-choice questions with four options, scored 1 point for a correct choice and 0 points for a wrong choice, so the highest SA score for each test was 8 points. The eight questions are shown in Table 1 .
https://doi.org/10.1371/journal.pone.0230387.t001
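The scoring rule described above is simple to state precisely. The following is a minimal sketch with hypothetical answer codes (the actual eight questions appear in Table 1, not here):

```python
def sagat_score(responses, key):
    """Score one SAGAT probe: 1 point per correct single-choice answer,
    0 otherwise, so eight questions give a maximum of 8 points."""
    return sum(int(r == k) for r, k in zip(responses, key))

def chance_score(n_questions=8, n_options=4):
    """Expected score from pure guessing on four-option items."""
    return n_questions / n_options
```

Note that with four options per item, random guessing alone yields an expected score of 2 out of 8, which sets a floor for the measure (a point the authors return to in the limitations section).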
The program was run on a MacBook Pro 13 with an external 25-inch 2560×1440 display. In the experiment, the participants used an external display and a Bluetooth mouse. Both the GUI and the background code were written in MATLAB 2019a.
Initially, we recruited 90 undergraduates; during the revision of the manuscript, we recruited another 27 to improve the statistical power. In total, 117 undergraduates (82 females and 35 males, aged 18 to 26) were recruited and provided informed consent. The 117 volunteers were randomly assigned to 1 of the 3 interface conditions, with 39 participants per group. Across the three interface conditions, the number of males was 11 (UI-1), 12 (UI-2), and 12 (UI-3).
A repeated measures analysis of variance (ANOVA) was employed to analyze the DPs. The results showed that the main effect of the block was significant: F(3, 342) = 48.924, p < 0.001, ηp² = 0.300. The main effect of the interface condition was also significant: F(2, 114) = 6.183, p = 0.003, ηp² = 0.098. However, the interaction between the block and the interface condition was not significant: F(6, 342) = 0.872, p = 0.515, ηp² = 0.015. Post-hoc multiple comparisons with Bonferroni correction indicated that (1) there were significant differences between block 1 and block 2 (p < 0.001), block 1 and block 3 (p < 0.001), block 1 and block 4 (p < 0.001), block 2 and block 3 (p = 0.01), block 2 and block 4 (p < 0.001), and block 3 and block 4 (p = 0.002); and (2) the DPs in UI-3 were significantly higher than in UI-1 (p = 0.005) and UI-2 (p = 0.016), with no significant difference between UI-1 and UI-2 (p = 1.000) (see Fig 4).
The red box on the left of the figure shows the post-hoc comparison results of the block and the blue box on the right shows the post-hoc comparison results of the UI. (* p < 0.05, ** p < 0.01, *** p < 0.001; the error bars denote 2 SDs).
https://doi.org/10.1371/journal.pone.0230387.g004
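As an illustrative sketch only (the authors' analysis code is not provided, and the actual design also includes the between-subject interface factor), a one-way repeated-measures ANOVA for the within-subject block factor can be computed directly from sums of squares:

```python
import numpy as np

def rm_anova(data):
    """One-way repeated-measures ANOVA on a (subjects x conditions) array.
    Returns F, (df_condition, df_error), and partial eta squared."""
    n, k = data.shape
    grand = data.mean()
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_total = ((data - grand) ** 2).sum()
    ss_err = ss_total - ss_cond - ss_subj          # condition x subject residual
    df_cond, df_err = k - 1, (n - 1) * (k - 1)
    F = (ss_cond / df_cond) / (ss_err / df_err)
    eta_p2 = ss_cond / (ss_cond + ss_err)          # partial eta squared
    return F, (df_cond, df_err), eta_p2
```

With 117 participants and 4 blocks, a pure within-subject analysis would give df (3, 348); the reported df (3, 342) arise because the mixed model additionally removes the between-subject interface factor, leaving (117 − 3) × 3 error degrees of freedom.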
Similarly, a repeated measures ANOVA was used to analyze the SA scores. The results showed that the main effect of the block was significant: F(3, 342) = 16.681, p < 0.001, ηp² = 0.128. The main effect of the interface condition was also significant: F(2, 114) = 3.985, p = 0.021, ηp² = 0.065. However, the interaction between the block and the interface condition was not significant: F(6, 342) = 0.631, p = 0.705, ηp² = 0.011. Post-hoc multiple comparisons with Bonferroni correction indicated that (1) the SA scores differed significantly between block 1 and block 2 (p = 0.027), block 1 and block 3 (p < 0.001), block 1 and block 4 (p < 0.001), and block 2 and block 4 (p = 0.001), with no significant differences between block 2 and block 3 (p = 0.228) or block 3 and block 4 (p = 0.289); and (2) the SA scores of UI-3 were significantly higher than those of UI-2 (p = 0.017), while there were no significant differences between UI-1 and UI-3 (p = 0.664) or UI-1 and UI-2 (p = 0.346) (see Fig 5).
https://doi.org/10.1371/journal.pone.0230387.g005
In theory, both the SA scores and the DPs reflect the participants' understanding of the task, so a positive correlation between the two variables would be expected. The results confirmed a significant positive correlation: Pearson's r = 0.543, p < 0.001 (see Fig 6).
There was a significant positive correlation between SAs and DPs with r = 0.5, indicating that the SA scores could reflect the participants' understanding of the task.
https://doi.org/10.1371/journal.pone.0230387.g006
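The reported coefficient is a standard Pearson correlation; as an illustrative sketch (not the authors' analysis code):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum()))
```

The associated significance test uses the transform t = r·sqrt((n − 2)/(1 − r²)), compared against a t distribution with n − 2 degrees of freedom.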
In our study, we examined the influence of interface design on task performance and SA, taking the practice effect into consideration. There were two dependent variables: the DPs reflected task performance, while the SA scores reflected understanding of the task. As mentioned above, the task in our study was not operationally complicated, so we hypothesized a positive correlation between the two dependent variables (Hypothesis 1). The results supported this hypothesis, as there was a significant positive correlation (r = 0.543) between the DPs and the SA scores. High SA scores might have helped the participants prioritize long-term gains over short-term gains (choosing lobster to gain more DPs in a single round), thus increasing the total DPs. The results also suggested that the SAGAT had good reliability and validity as a measure of the participants' understanding of the task.
The second hypothesis was that as more components became involved in the interface, the task performance would improve. In our interface design, UI-3 was the most complex level, which covered the compositions of the subordinate levels. Therefore, we speculated that UI-3 would lead to the best task performance and SA scores. The results verified this point. It was found that there was a significant positive effect of UI-3 on both the DPs and the SA scores. Therefore, the design of the prediction panel did enhance the participants' comprehension of the task situation. It not only helped the participants understand the rules more clearly, but also facilitated their more accurate and finer future planning. These results were in line with our hypothesis.
Looking deeper, the advantage of UI-3 might have come from its high integration of information. A good interface should not only integrate information but also embrace simplicity [ 25 ]. There is evidence that integrating various sub-UIs into a single UI helps improve SA [ 26 ]. According to Durso et al., both working memory and mental models are relevant to SA [ 17 ]. By providing a graphical depiction of the diners' historical decisions, UI-3 served as a working memory aid, reducing the participants' unnecessary cognitive load. Meanwhile, the prediction panel made good use of the information provided by the history panel, prompting the participants to build a correct mental model of the situation. The participants could understand the task through a single interface, and under this condition they performed better in both DPs and the SA test.
However, UI-2 produced no significant improvement in task performance or SA scores compared with UI-1. So, contrary to the effect of the prediction panel in UI-3, the history panel in UI-2, which corresponded to the comprehension level of SA theory, was not effective: it failed to improve the participants' DPs or SA scores. In the SAGAT, the SA scores of UI-2 were even slightly lower than those of UI-1, and the history panel also failed to help the participants answer the two questions relevant to the comprehension level. This result did not fit our hypothesis, but it might reflect that more information is not always beneficial [ 7 , 11 ]. Looking back at UI-2, the frequency count on the history panel had no intuitive use in the absence of a prediction panel, which might have puzzled the participants. This suggests that simply providing additional information does not necessarily improve SA or task performance.
The third hypothesis was that the practice effect would exist in all three interface conditions. The results showed that it did: not only the DPs but also the SA scores increased significantly with practice. Moreover, the level of UI had no effect on the task learning process, implying that learning proceeded consistently across conditions. According to Endsley, it usually takes a long time to build SA [ 27 ]; in a simple task, however, SA can also change through training in a short time [ 28 ]. The task in this experiment belongs to the latter category, as it was a relatively simple simulated scenario, so the SA scores could change significantly over the four blocks. However, the improvement in the DPs and the SA scores was not uniform: the SA scores tended to stabilize in blocks 2, 3, and 4, while the changes in DPs were significant from block to block. These results implied that the participants had established a stable mental model of the task. In addition, no interaction between block and UI level was found. Given that the assignment of participants was completely random, it can be inferred that the pre-task instructions and practice were the main reasons the UI-3 group performed best. Throughout the rest of the task, this advantage was maintained, neither expanding nor shrinking. This might mean that the UI level had no substantial impact on the efficiency of task learning, and that the design of the UI did not change the underlying learning process.
First, the main purpose of our study was to verify the theoretical basis for interface design, so we adopted the 3-player diner's dilemma, a simulated situation, as the experimental task. Moreover, we considered only situations where the computer played TFT, without more complex computer strategies, such as computer players dynamically adjusting their preferences in response to the participants' choices. Given that, whether our results can be extended to other tasks, or to the real world, remains to be established.
Another limitation is that our interface design did not exclude all irrelevant variables. UI-3 was interactive, while UI-1 and UI-2 were not. Previous studies confirmed that system controllability is beneficial for understanding the situation and improving task performance and satisfaction [ 13 , 29 ]. Thus, the interaction itself might have helped the participants gain better control over the UI and deepened their understanding of the task. To eliminate this possibility, in future work we would change the parameter adjustment in UI-3's prediction panel from manual input to automatic input to test this assumption.
Finally, the SAGAT in our study used only eight questions, which could limit the reliability and validity of the test. In particular, even participants who did not know the correct answers could reach a high score by guessing, and the effect size for SA was correspondingly small. Therefore, in future experiments, we might add more questions and refine the test to improve its reliability.
Three interfaces based on the three-level SA model were designed to explore the role of interface design in task performance, SA scores, and the learning process in a simulated situation. We found that: (1) the level-3 interface effectively improved the task performance and SA scores, while the level-2 interface failed to improve them; (2) the practice effect did exist in all three conditions; and (3) the levels of UI had no effect on the task learning process, which implied that the learning rules remained consistent across the different conditions.
S1 Table. Descriptive statistical results of the main variables of the experiment.
https://doi.org/10.1371/journal.pone.0230387.s001
https://doi.org/10.1371/journal.pone.0230387.s002
https://doi.org/10.1371/journal.pone.0230387.s003
Usability 101: Introduction to Usability
January 3, 2012
This is the article to give to your boss or anyone else who doesn't have much time, but needs to know the basic usability facts.
What — definition of usability, why usability is important, how to improve usability, when to work on usability, where to test.
Usability is a quality attribute that assesses how easy user interfaces are to use. The word "usability" also refers to methods for improving ease-of-use during the design process.
Usability is defined by 5 quality components:
Learnability: How easy is it for users to accomplish basic tasks the first time they encounter the design?
Efficiency: Once users have learned the design, how quickly can they perform tasks?
Memorability: When users return to the design after a period of not using it, how easily can they reestablish proficiency?
Errors: How many errors do users make, how severe are these errors, and how easily can they recover from them?
Satisfaction: How pleasant is it to use the design?
There are many other important quality attributes. A key one is utility , which refers to the design's functionality: Does it do what users need?
Usability and utility are equally important and together determine whether something is useful: It matters little that something is easy if it's not what you want. It's also no good if the system can hypothetically do what you want, but you can't make it happen because the user interface is too difficult. To study a design's utility, you can use the same user research methods that improve usability.
On the Web, usability is a necessary condition for survival. If a website is difficult to use, people leave . If the homepage fails to clearly state what a company offers and what users can do on the site, people leave . If users get lost on a website, they leave . If a website's information is hard to read or doesn't answer users' key questions, they leave . Note a pattern here? There's no such thing as a user reading a website manual or otherwise spending much time trying to figure out an interface. There are plenty of other websites available; leaving is the first line of defense when users encounter a difficulty.
The first law of ecommerce is that if users cannot find the product, they cannot buy it either.
For intranets , usability is a matter of employee productivity . Time users waste being lost on your intranet or pondering difficult instructions is money you waste by paying them to be at work without getting work done.
Current best practices call for spending about 10% of a design project's budget on usability. On average, this will more than double a website's desired quality metrics (yielding an improvement score of 2.6) and slightly less than double an intranet's quality metrics. For software and physical products, the improvements are typically smaller — but still substantial — when you emphasize usability in the design process.
For internal design projects, think of doubling usability as cutting training budgets in half and doubling the number of transactions employees perform per hour. For external designs, think of doubling sales, doubling the number of registered users or customer leads, or doubling whatever other KPI (key performance indicator) motivated your design project.
There are many methods for studying usability, but the most basic and useful is user testing, which has 3 components:
Get hold of some representative users.
Ask the users to perform representative tasks with the design.
Observe what the users do, where they succeed, and where they have difficulties with the user interface.
It's important to test users individually and let them solve any problems on their own. If you help them or direct their attention to any particular part of the screen, you have contaminated the test results.
To identify a design's most important usability problems, testing 5 users is typically enough. Rather than run a big, expensive study, it's a better use of resources to run many small tests and revise the design between each one so you can fix the usability flaws as you identify them. Iterative design is the best way to increase the quality of user experience. The more versions and interface ideas you test with users, the better.
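The "5 users" rule of thumb is usually justified by the Nielsen–Landauer problem-discovery model (an assumption stated here for context, not spelled out in this article): if each user independently uncovers any given problem with probability p, about 0.31 in their data, then n users uncover the share 1 − (1 − p)^n. A sketch:

```python
def problems_found(n_users, p=0.31):
    """Expected share of usability problems uncovered by n_users,
    assuming each user finds any given problem with probability p."""
    return 1 - (1 - p) ** n_users

# Five users already uncover about 84% of the problems, which is why
# several small iterative tests beat one large study of 15 users.
```

The curve flattens quickly: the same budget spent on three rounds of 5 users, with design revisions between rounds, catches both the original problems and those introduced by each fix.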
User testing is different from focus groups , which are a poor way of evaluating design usability. Focus groups have a place in market research, but to evaluate interaction designs you must closely observe individual users as they perform tasks with the user interface. Listening to what people say is misleading; you have to watch what they actually do.
Usability plays a role in each stage of the design process. The resulting need for multiple studies is one reason I recommend making individual studies fast and cheap. Here are the main steps:
Don't defer user testing until you have a fully implemented design. If you do, it will be impossible to fix the vast majority of the critical usability problems that the test uncovers. Many of these problems are likely to be structural, and fixing them would require major rearchitecting.
The only way to a high-quality user experience is to start user testing early in the design process and to keep testing every step of the way.
If you run at least one user study per week , it's worth building a dedicated usability laboratory. For most companies, however, it's fine to conduct tests in a conference room or an office — as long as you can close the door to keep out distractions. What matters is that you get hold of real users and sit with them while they use the design. A notepad is the only equipment you need.
Background: Early detection of dementia and Mild Cognitive Impairment (MCI) is of utmost importance nowadays, and smart conversational agents are becoming more and more capable. DigiMoCA, an Alexa-based voice application for the screening of MCI, was developed and tested. Objective: To evaluate the acceptability and usability of DigiMoCA, considering the perception of end-users and cognitive assessment administrators, through standard evaluation questionnaires. Method: A sample of 46 individuals and 24 evaluators participated in this study. End-users were fairly heterogeneous in demographic and neuro-psychological characteristics. Evaluators were mostly health and social care professionals, relatively well balanced in terms of gender, career background, and years of experience. Results: End-users' acceptability ratings were generally positive (above 3 on a 5-point scale for all dimensions) and improved significantly after the interaction with DigiMoCA. Administrators rated the usability of DigiMoCA with an average score of 5.86/7 and high internal consistency ( \(\alpha \) = 0.95). Conclusion: Although there is still room for improvement in terms of user satisfaction and the voice interface, DigiMoCA is perceived as an acceptable, accessible and usable cognitive screening tool, both by individuals being tested and by test administrators.
In a world with an ever-increasing human lifespan, the quality of life of senior adults is becoming more and more relevant. According to the WHO [ 1 ], the share of the population over the age of 60 will increase by 34% between 2020 and 2030, and with it the prevalence of neuro-psychiatric disorders, particularly dementia, which have an extremely high impact on people's well-being and on their social and economic circumstances.
Mild Cognitive Impairment (MCI) is the transition stage between healthy aging and dementia and is characterized by subtle cognitive deficits that do not meet the criteria for diagnosis of a major neuro-cognitive disorder (DSM-V) [ 2 ]. These difficulties can manifest themselves in areas such as memory, attention, language, orientation or decision making. Thus, detecting MCI in its early stages is beneficial in preventing the progression of the disease and, in certain cases, in slowing down some of its symptoms. However, in most cases cognitive deficits are detected only when the symptoms are already evident and the underlying neurological disorder has been present for some time [ 3 ], meaning the disease has already progressed. The traditional screening method for early detection of cognitive impairment involves the use of clinically-validated gold-standard tests that assess the cognitive state of a person.
The inception of these tests traces back to the second half of the 20th century. One of the first widely used screening tools was the Mini-Mental State Examination (MMSE), published by Folstein [ 4 ] in 1975; it includes items on orientation, concentration, attention, verbal memory, naming and visuospatial skills. In the 1980s, the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) was developed [ 5 ]; it comprises 7 items, namely word recall, naming, commands, constructional praxis, ideational praxis, orientation and word recognition.
One of the limitations of these evaluation instruments is the fact that they are dementia-oriented, particularly Alzheimer’s. Therefore, in later years other screening tools were created, e.g., the Montreal Cognitive Assessment (MoCA) [ 6 ] test, which has a 90% sensitivity for MCI detection (MMSE is not sensitive to MCI). Its telephone version (T-MoCA) [ 7 , 8 ] is also validated and has a strong correlation with MoCA with a Pearson coefficient of 0.74.
The fact that MoCA is oriented at MCI detection makes it suitable as a screening tool for an early diagnosis.
In this context, the use of Information and Communication Technologies (ICT) could be a valuable tool for the early detection of MCI cases in a reliable and efficient way, where smart conversational agents are a disruptive technology with the potential to help detect neuro-psychiatric disorders in early stages [ 9 , 10 ]. Note that the penetration of these technological tools among senior adults is not as high as in other age groups, which makes these tools even more relevant.
Previous research demonstrated that it is possible to implement a voice-based version of a gold-standard test for cognitive assessment using conversational agents [ 11 ]. More specifically, DigiMoCA, an Alexa voice application based on T-MoCA, was developed and tested with actual senior adults using a smart speaker.
DigiMoCA makes use of Alexa’s voice recognition and natural language processing services, and is able to persistently store and retrieve session data in DynamoDB (Amazon’s NoSQL database service). Additionally, DigiMoCA utilizes prosodic annotations to adapt the speech rate to the user, and collects the response time to each item using a statistical estimation of round-trip times. This information is subsequently used to enhance DigiMoCA’s CI screening performance. DigiMoCA was evaluated using the Paradigm for Dialogue System Evaluation (PARADISE), yielding a confusion matrix with a Kappa coefficient \(\kappa = 0.901\) . This means DigiMoCA understands the user approximately 90% of the time, which is equivalent to “almost perfect” [ 12 ] in terms of task completion performance.
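As an illustration of how such a kappa coefficient is obtained, the sketch below computes Cohen’s kappa from a confusion matrix. The counts are invented for illustration; they are not DigiMoCA’s actual evaluation data.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix
    (rows: reference labels, columns: system labels)."""
    n = sum(sum(row) for row in confusion)
    # Observed agreement: proportion of counts on the diagonal.
    p_o = sum(confusion[i][i] for i in range(len(confusion))) / n
    # Chance agreement: products of the marginal proportions, summed over classes.
    p_e = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(len(confusion))
    ) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Illustrative 2x2 matrix (utterance understood / not understood);
# these counts are hypothetical, NOT the study's data.
matrix = [[45, 2],
          [3, 40]]
print(round(cohen_kappa(matrix), 3))  # 0.889
```

A value above 0.81 falls in the “almost perfect” band of the Landis and Koch scale cited in the text.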
The main objective of this work is to analyze the acceptability and usability of DigiMoCA through a user interaction pilot study [ 13 ]. For this, the perception of senior end-users as well as administrators was collected by means of standard evaluation questionnaires, and the outcomes were analyzed using standard statistical procedures.
Thus, the research question posed is:
Is the screening tool DigiMoCA acceptable and usable for the cognitive evaluation of senior adults, both by them and their evaluators?
Section 2 describes the sample of participants, the study design and the data analysis carried out; Section 3 presents and discusses the findings of the study, both from the senior end-users’ and the administrators’ points of view; finally, Section 4 summarizes the results of this research.
This user-interaction study included the participation of 46 senior end-users and 24 sector-related professionals. According to previous relevant works [ 14 , 15 ], the number of participants for a pilot study should take into account: (1) the parameters to be estimated; (2) the involvement of at least 30 participants; and (3) a minimum confidence interval of 80%. The present study fits all three criteria.
Senior end-users participated through two associations: Parque Castrelos Daycare Center (PCDC) and the Association of Relatives of Alzheimer’s Patients (AFAGA), both located in the city of Vigo (Spain). Before the start of each study, applications were submitted to the Research Ethics Committee of Pontevedra-Vigo-Ourense, containing: (1) the objectives of the study, main and secondary; (2) the methodology proposed, i.e. tests and questionnaires to administer, inclusion and exclusion criteria, recruiting procedure within the association, sample size and structure, and detailed schedule; (3) security concerns and how to address them (anonymization and encryption); (4) ethical and legal aspects, particularly regarding data privacy; and finally, (5) a copy of the informed consent to be signed in advance by all participants. Both applications, for AFAGA and PCDC, were approved by the corresponding dictums with registration codes 2021/213 and 2023/115, respectively.
Inclusion criteria for senior participants consisted mainly of being over the age of 65 and not having an advanced state of dementia, any other psychological pathology, or any auditory/vocal disability. Table 1 summarizes the demographic characteristics of the end-user participants, classified by cognitive group. The mean age was 78.61 ± 6.75 years, with 65% of participants being female. We can see that individuals are fairly evenly distributed across groups. For cognitive state classification, we used the Global Deterioration Scale (GDS) [ 16 ], a widely utilized scale that describes the stage of cognitive impairment, with higher GDS scores meaning more deterioration. For additional information, we also show the results of the T-MoCA evaluation (16.25 ± 3.28 for healthy users (HC), 16.25 ± 3.28 for users with MCI and 16.25 ± 3.28 for users with dementia (AD)), as well as the Memory Failures of Everyday (MFE) [ 17 ] questionnaire and the Instrumental Activities of Daily Living (IADL) scale [ 18 ].
Administrator participants, on the other hand, were affiliated with several institutions, namely the Unit of Psychogerontology at the University of Santiago de Compostela, the Galicia Sur Health Research Institute, the Multimedia Technology Group at the University of Vigo, and also AFAGA and PCDC. Table 2 presents information about these participants, who are predominantly from the health field. The sample is 58.33% female, mostly middle-aged, and fairly evenly distributed among different backgrounds. We can also see variety in terms of seniority, ranging from less than 5 years of experience (29.17%) to more than 20 (20.83%).
The study was organized into 3 sessions: during the first, the T-MoCA, MFE and IADL questionnaires were administered; during the second, at least two weeks later, DigiMoCA was administered. Finally, after two or more further weeks, a second administration of DigiMoCA was carried out during the third session.
Before the first and after the second conversation with the agent, participants were asked to answer a Technology Acceptance Model (TAM) [ 20 ] questionnaire, which covers how users come to accept a technological system. To determine the acceptability of the conversational agent, the designed TAM questionnaire addressed 3 dimensions:
Perceived usefulness (PU) . It measures whether a participant finds the smart speaker useful, both as a general concept, and specifically during the cognitive assessment sessions.
Perceived ease-of-use (PEoU) . It measures whether the conversation with the speaker was comfortable and straightforward for the user, purely in terms of communication.
Perceived satisfaction (PS) . It measures whether the user enjoyed the utilization of the speaker, and whether they prefer it to a human counterpart (i.e., another person conducting T-MoCA as an interviewer).
The resulting questionnaire consisted of 6 items rated on a 5-point Likert scale, 2 for each dimension (1 meaning strongly negative/disagree, 5 strongly positive/agree, 3 neutral). For reference, the TAM questionnaire used is available in Section 1 , translated into English.
In addition to studying how end-users interacted with DigiMoCA, another study was conducted to gather the opinions of cognitive evaluation administrators on its usability and user-friendliness. These were individuals either responsible for administering cognitive assessment tools to older adults or with a background of expertise in application development and voice assistants. A questionnaire based on the Post-Study System Usability Questionnaire (PSSUQ) [ 21 ] was used, rated on a 7-point Likert scale (1 meaning strongly disagree, 7 strongly agree, 4 neutral). The English translation of the PSSUQ questionnaire used is available in Section 2 .
The PSSUQ-based questionnaire was designed to evaluate 3 usability dimensions:
System usefulness: measures ease of use and convenience. In the designed version, it includes the average scores of items 1 to 8.
Information quality: measures the usefulness of the information and messages provided by the application. It includes the average scores of items 9 to 14.
Interface quality: measures the friendliness and functionality of the system’s user interface. It includes the average scores of items 15 to 17.
Overall: measures overall usability, computed as the average of the scores of all items (1 to 18 in our case).
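The dimension scores described above are plain averages over item ranges. A minimal sketch, using the item-to-dimension mapping from the text and invented responses:

```python
from statistics import mean

# Item ranges per dimension, as described in the text (18-item version).
DIMENSIONS = {
    "SYSUSE": range(1, 9),       # items 1-8
    "INFOQUAL": range(9, 15),    # items 9-14
    "INTERQUAL": range(15, 18),  # items 15-17
    "OVERALL": range(1, 19),     # items 1-18
}

def pssuq_scores(responses):
    """responses maps item number (1-18) to a 7-point Likert rating."""
    return {dim: mean(responses[i] for i in items)
            for dim, items in DIMENSIONS.items()}

# Invented answers from one hypothetical administrator:
answers = {i: 6 for i in range(1, 19)}
print(pssuq_scores(answers))
```

In the study, these per-respondent scores would then be averaged across the 24 administrators to yield the values reported in Table 6.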
The following statistical instruments were used to assess acceptability:
Fundamental statistics: mean, standard deviation and percentages.
Cronbach’s Alpha ( \(\alpha \) ) [ 22 ] to estimate the reliability, and specifically the internal consistency, of the responses. It is widely used in psychological test construction and interpretation, and it seeks to measure how closely test items are related to one another, and thus whether they measure the same construct. When test items are closely related to each other, Cronbach’s alpha will be closer to 1; if they are not, it will be closer to 0. In this study, we use this metric to evaluate the internal consistency of the responses to the TAM (end-user centered) and PSSUQ (administrator centered) questionnaires. It is computed as follows:
\(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_x^2}\right)\)
where:
k is the number of items/questions included.
\(\sigma _i^2\) is the variance of each item across all responses.
\(\sigma _x^2\) is the total variance, including all items.
According to Gliem [ 23 ], a good interpretation of the value of Cronbach’s alpha regarding internal consistency is: \(\alpha > 0.9\) means “excellent”; \(\alpha > 0.8\) means “good”; \(\alpha > 0.7\) means “acceptable”; \(\alpha > 0.6\) means “questionable”; and anything below 0.6 is considered an indicator of low internal consistency.
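The formula above can be implemented in a few lines. The sketch below uses only the Python standard library (the study itself used libraries such as Pingouin), and the Likert responses are invented for illustration:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists
    (one inner list per item, each of length n_respondents)."""
    k = len(items)
    # Sum of per-item variances across respondents.
    item_vars = sum(variance(it) for it in items)
    # Variance of each respondent's total score.
    totals = [sum(scores) for scores in zip(*items)]
    total_var = variance(totals)
    return k / (k - 1) * (1 - item_vars / total_var)

# Invented 5-point Likert responses (3 items, 6 respondents),
# NOT the study's data:
items = [
    [4, 5, 3, 4, 5, 2],
    [4, 4, 3, 5, 5, 2],
    [5, 5, 2, 4, 4, 3],
]
print(round(cronbach_alpha(items), 2))  # 0.88
```

By Gliem’s scale above, this illustrative value would be interpreted as “good” internal consistency.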
Student’s t-tests [ 24 ] were used to compare the pre-pilot and post-pilot questionnaires, giving insight into the evolution of participants’ acceptability perception over the administration. Statistical significance was measured by means of p-values.
Cohen’s d [ 25 ]: measures the effect size of t-tests, and is computed as the standardized mean difference between two groups (in this case, pre-pilot and post-pilot), i.e., the difference between the means divided by the square root of the average of both variances:
\(d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{(\sigma_1^2 + \sigma_2^2)/2}}\)
Based on Tellez’s analysis [ 26 ], the interpretation of Cohen’s d is as follows: \(d < 0.2\) is a “trivial effect”; \(0.2< d < 0.5\) is a “small effect”; \(0.5< d < 0.8\) is a “medium effect”; and \(d > 0.8\) is a “large effect”.
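The effect size and the percentage change reported later can be sketched as follows. The pre/post ratings below are invented for illustration, and the sketch omits the p-value computation (in the study, p-values came from Student’s t-tests run with standard statistical libraries):

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(pre, post):
    """Effect size as defined in the text: difference of means
    divided by the square root of the average of the two variances."""
    return (mean(post) - mean(pre)) / sqrt((variance(pre) + variance(post)) / 2)

def pct_change(pre, post):
    """Percentage change between the pre and post averages."""
    return 100 * (mean(post) - mean(pre)) / mean(pre)

# Invented pre/post 5-point Likert ratings for one item, NOT the study's data:
pre  = [3, 2, 4, 3, 3, 2, 4, 3]
post = [4, 3, 4, 4, 3, 3, 5, 4]
print(round(cohens_d(pre, post), 2))   # 1.02
print(round(pct_change(pre, post), 1)) # 25.0
```

Under Tellez’s interpretation above, this invented example would be a “large effect”.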
Statistical analysis was performed using the Google Sheets online tool, as well as Google Colab with Jupyter notebooks written in Python. Several commonly-used data analysis libraries were used (e.g., NumPy, Pandas, Pingouin).
This section presents and analyzes the main results obtained regarding the usability and acceptability of DigiMoCA, both from the end-users’ perspective (sample of n = 46) as well as the administrators’ (n = 24).
As explained in Section 2 , users completed the TAM questionnaire before and after the administration of DigiMoCA. The questionnaire included two sections, each with the 3 dimensions and 6 questions: one focused on technology in general, and another focused on DigiMoCA and conversational agents.
Table 3 presents the results of TAM’s 3-dimensional scale for the DigiMoCA section, taken from the post evaluation. The most relevant results are:
Perceived usefulness: a value of 3.87 ± 0.92 was obtained including all groups, with the highest rating within the MCI group (4.11 ± 0.92) and the lowest from the HC group (3.42 ± 0.93). Regarding the internal consistency of the answers, a value of \(\alpha \) = 0.63 was obtained, with HC being the most internally consistent group ( \(\alpha \) = 0.76) and MCI the least ( \(\alpha \) = 0.42).
Perceived ease of use: a value of 3.98 ± 0.96 was obtained including all groups. Once again, the highest mean value was found in the MCI group (4.14 ± 0.99), whereas the lowest rating was again obtained within the HC group (3.83 ± 0.96). In terms of internal consistency, a value of \(\alpha \) = 0.73 was obtained overall, with the HC group being the most internally consistent ( \(\alpha \) = 0.96) and MCI the least ( \(\alpha \) = 0.28).
Perceived satisfaction: including all groups we observe a value of 3.27 ± 1.21, in this case with the best rating coming from the AD group (3.47 ± 1.16), and the worst rating from the HC group (3.00 ± 1.22). Regarding internal consistency, a value of \(\alpha \) = 0.41 was obtained, with HC again the most internally consistent group ( \(\alpha \) = 0.56) and MCI the least ( \(\alpha \) = 0.25).
Overall, we consider these results to be rather positive: none of the ratings drop below 3 (out of 5) on average, either considering the overall sample or any particular group/sub-sample. This means that regardless of the level of cognitive deterioration, the users find DigiMoCA useful, easy to use and satisfactory.
Regarding the internal consistency, however, it is only “acceptable” for one of the dimensions (PEoU), with a worryingly low value for the PS dimension. We believe this inconsistency is caused by the disparity of results for the two PS questions: the first asks whether participants “liked to use DigiMoCA”, and the second whether they would rather “use DigiMoCA instead of T-MoCA”. We observe that the answers to the second question (i.e., after interacting with the agent) are considerably lower than to the first, perhaps due to the comparison between a human-robot interaction and a human-human interaction (which is usually strongly preferred by this demographic group).
Additionally, we can observe a tendency for the MCI group to give the highest ratings but with lowest internal consistency, whereas the HC group usually gives the lowest ratings but with highest internal consistency. One possible explanation for this behavior is that cognitive impairment can interfere with consistent reasoning; it is also likely that users with MCI had more trouble understanding the full implications of the questions posed, giving less consistent answers. Certainly, it is reasonable to believe that healthy users are generally more sensitive to the intrusiveness of these evaluations, hence the slightly lower ratings.
Tables 4 and 5 present the results of the perception variation between pre-administration and post-administration of DigiMoCA. Table 4 contains the results regarding the section about technology in general, while Table 5 contains the results of the section about conversational agents. Again, data is classified by TAM dimensions (rows), including the results for each individual question (“.1" and “.2" for each dimension). We also have the results obtained classified by cognitive group (columns): HC, MCI, AD and the whole sample.
The main objective of this analysis is to determine whether users’ acceptability perception changes significantly after the administration of DigiMoCA. For this, we performed a Student’s t-test on the pre and post questions, and obtained three metrics: the percentage change between the averages, Cohen’s d and the statistical significance p . The following paragraphs address the main findings of this process.
Regarding the technology section, there is a percentage increase in all items of the first two dimensions: +6.17% for PU.1 (d = 0.33), +3.05% for PU.2 (d = 0.11), +5.26% for PEoU.1 (d = 0.17) and +9.00% for PEoU.2 (d = 0.44). However, there is only one item (PEoU.2) that exhibits a significant change (p = 0.010). Both items from the PS dimension remain essentially unchanged. Therefore, generally speaking, we can establish that the administration does not significantly change the acceptability of technology in senior adults, but we do observe a non-significant positive change in both PU and PEoU items. Furthermore, if we look at the sample sub-groups independently, we can also observe a positive non-significant change in the vast majority of items, only one of them being significant (PEoU.2 for AD group with +17.08% change; d = 0.84, p = 0.007).
With respect to the conversational agents section, the acceptability has a more noticeable improvement among most items, three of them being statistically significant, and we also find the first item with a “large effect” size: PU.1 with +59.14% (d = 1.06, p < 0.001), PEoU.2 with +13.71% (d = 0.65, p = 0.005), PS.1 with +12.22% (d = 0.61, p = 0.005). We should also notice that the PS.2 item has a significant decrease of -24.24% (d = 0.95, p < 0.001), but we do not think this particular item is a good representative of the PS dimension, since, as stated previously, the pre and post questions are different, and thus it should be taken with a grain of salt. If we look at sample sub-groups independently, we can notice that none of the significant changes are in the HC group, while most are concentrated in the MCI group: +85.84% (d = 1.29, p < 0.001) for PU.1, +21.61% (d = 1.04, p = 0.007) for PEoU.2, and +16.90% (d = 1.14, p = 0.003) for PS.1. Within the AD group, only PU.1 is statistically significant (+58.59%, d = 1.02, p = 0.013).
In light of the results discussed, it seems reasonable to affirm that the acceptability of conversational agents by senior adults improves significantly after the interaction with DigiMoCA. To support this, we found that at least one item exhibits a statistically significant (p < 0.05) positive change in all 3 dimensions, and, if we discard item PS.2, which as pointed out above is probably not representative, all items show an increase in acceptability across all groups.
In addition to the end-user interaction study, an additional study was carried out to measure the usability perception of DigiMoCA among cognitive assessment administrators and professionals. For this, we employed the PSSUQ questionnaire, with items rated on a 7-point Likert scale, which is widely used to measure users’ perceived satisfaction with a software system. Table 6 summarizes the results, which are also categorized by gender, field of occupation and years of experience:
Overall usability (OVERALL): we obtain a 5.86 ± 1.24 mean value for all participants and all items. The mean rating does not excessively change based on gender or career experience, although the average rating for participants in the technological field was slightly higher (6.26 ± 0.94). The internal consistency obtained was “excellent” ( \(\alpha \) = 0.95) overall, with some slight differences based on gender ( \(\alpha \) being 0.88 for males and 0.97 for females), field of expertise ( \(\alpha \) = 0.96 for the health field, 0.90 for the technological field) and experience ( \(\alpha \) = 0.91 for administrators with 10+ years of experience, 0.97 for those with less than 10).
System usefulness (SYSUSE): including items 1 to 8, we obtain a mean value of 5.96 ± 1.14 for all participants. Again the mean rating is not considerably affected by gender or career experience, but we do obtain a slightly higher value of 6.36 ± 0.94 for participants in the technological field. As for the internal consistency of the answers, we get an “excellent” \(\alpha \) = 0.91 for the whole sample, although it does drop to just “good” for the male group ( \(\alpha \) = 0.85) and the most experienced participants ( \(\alpha \) = 0.88). The lowest internal consistency is found within the technological field, with an “acceptable” \(\alpha \) = 0.76.
Information quality (INFOQUAL): the mean value obtained from items 9 to 14 was 5.74 ± 1.44 overall. Once again, the highest differences found were based on the field of expertise: the technological field group had the highest mean value of 6.17 ± 1.22, while the lowest value was obtained from the health field group (5.63 ± 1.48). The overall internal consistency was \(\alpha \) = 0.90, and we do find differences between the demographic groups: higher consistency for females ( \(\alpha \) = 0.96) than males ( \(\alpha \) = 0.74); higher consistency for the health field group ( \(\alpha \) = 0.91) than the technological group ( \(\alpha \) = 0.79); and higher consistency for the least experienced individuals ( \(\alpha \) = 0.93) than the most experienced ( \(\alpha \) = 0.84).
Interface quality (INTERQUAL): including items 15 to 17, the overall mean rating was 5.81 ± 1.11. For this dimension the mean value for the technological field group was the highest (6.11 ± 0.65), and the mean value for the least experienced group was the lowest (5.71 ± 1.23). As for the internal consistency, this was the dimension with the lowest overall, with an “acceptable” value of \(\alpha \) = 0.77. Again we find considerable differences between demographic groups: higher \(\alpha \) = 0.88 for females than males ( \(\alpha \) = 0.34), higher \(\alpha \) = 0.80 for the health field than the technological field ( \(\alpha \) = 0.27) and higher \(\alpha \) = 0.90 for the less experienced group than for people with 10+ years of experience ( \(\alpha \) = 0.42). This is the only dimension where we see the internal consistency drop below an “acceptable” level, probably due to the small number of items it considers (only three).
In light of the presented results, we observe that the overall usability perception is generally positive, slightly under 6 out of 7 points, and never drops below 5 for any of its dimensions, even if considering specific demographic groups based on gender, career field and experience.
We do observe a pattern across the groups: females give slightly lower ratings than males, but with higher internal consistency. The same happens between the health field group (i.e., slightly lower ratings and higher consistency) and the technological field group, as well as between the most and least experienced groups. The fact that this pattern repeats across groups is expected, probably because the groups overlap: more males than females work in the tech field, and the males happen to be younger on average than the females (34.9 years old vs. 40.14, cf. Table 2 ), hence the difference found between seniority groups. Furthermore, we noticed that participants from the medical field made more comments suggesting areas of improvement than participants from the technical field, particularly regarding the user interface.
As to why this pattern occurs, we believe it is explained by DigiMoCA being an inherently technological and disruptive screening tool. Therefore, it is to be expected that professionals from the technological field are keener to use it, and generally more interested in it and curious about how it works. Conversely, it also makes sense that professionals from the health field are more “skeptical” and less interested, since the health field is generally more stable and less prone to disruptive changes [ 27 ], and certainly more people-oriented than tool-oriented.
Finally, the fact that the information and interface-related items obtain a slightly lower rating across all groups is justified, as one of the main drawbacks of using a voice-only communication channel is the restriction of the user interface, which lacks visual interaction. This probably means that, in this context, the PSSUQ questionnaire should be adapted to new ICT tools based on conversational agents, where questions about the user interface either need reformulation or should simply be excluded.
In this paper, a user-interaction pilot study analyzing the usability and acceptability of DigiMoCA (a digital, Alexa-based cognitive impairment screening tool based on T-MoCA) is discussed, from both end-users’ and administrators’ perspectives.
In the case of end-users, a TAM questionnaire was utilized, administered both before and after DigiMoCA. Overall, the results show that users accept DigiMoCA, giving it a 3+ score in all three of TAM’s dimensions, meaning that they perceive it as useful, easy to use and satisfactory. The perceived ease of use was particularly positive and internally consistent, with a mean score of 3.98. Additionally, the pre vs. post analysis shows that, while the acceptability of technology in general does not change significantly after the administration of DigiMoCA, the perceived acceptability of conversational agents specifically improves significantly: all three dimensions have an item with a statistically significant positive change. Moreover, the vast majority of non-significant changes were also positive.
In the case of test administrators, a PSSUQ questionnaire was used. Its results show that DigiMoCA is considered usable (mean score 5.86) very consistently ( \(\alpha \) = 0.95), with a score of 5+ out of 7 for all dimensions and demographic groups. System usefulness was rated consistently higher than information and interface quality, and the biggest demographic differences were found between the health field group and the technological field group.
The sample size is one of the main limitations of the study. To estimate an ideal sample size, we first obtained an estimate of the prevalence of AD in Spain (10.285%) Footnote 1 . Then, for the required 95% confidence level, we would need n = 142 participants per study group, which is far from the sample size achieved so far.
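As a sketch of this calculation, the standard sample-size formula for estimating a proportion, \(n = z^2 p (1-p)/e^2\), reproduces n = 142 if a 5% margin of error is assumed. The margin of error is not stated in the text, so it is an assumption here:

```python
from math import ceil

# Sample-size estimate for a proportion. The 5% margin of error is an
# ASSUMPTION (not stated in the text), chosen because it reproduces
# the reported n = 142.
z = 1.96          # z-score for a 95% confidence level
p = 0.10285       # estimated AD prevalence in Spain
e = 0.05          # margin of error (assumed)
n = ceil(z ** 2 * p * (1 - p) / e ** 2)
print(n)  # 142
```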
Future lines of work include further characterization of the sample and a study of acceptability and usability as a function of the participants’ technological training, including their relationship with technology throughout their lives. Additionally, it could be worthwhile to analyse more objective metrics, such as participants’ response times, which could enrich the study of DigiMoCA.
Ongoing work addresses improving the perceived satisfaction of using DigiMoCA by making it friendlier, while also improving its interface and the information provided to the user, compensating for the limitations of voice-only interaction. As these aspects improve and user interaction with conversational agents is perceived as increasingly close to interaction with human administrators, the distinctive affordability and accessibility of smart assistant-based tests can establish them as a powerful screening technology.
All data supporting the findings of this study are available within the paper and its Supplementary Files, particularly the user responses to the usability and acceptability questionnaires.
Source: Clinical Practice Guideline on Comprehensive Care for People with Alzheimer’s Disease and other dementias https://portal.guiasalud.es/wp-content/uploads/2018/12/GPC_484_Alzheimer_AIA-QS_resum.pdf
WHO (2023) Un decade of healthy ageing: Plan of action. https://cdn.who.int/media/docs/default-source/decade-of-healthy-ageing/decade-proposal-final-apr2020-en.pdf?sfvrsn=b4b75ebc_28
APA (2013) Diagnostic and statistical manual of mental disorders, 5th Edn. https://doi.org/10.1176/appi.books.9780890425596
Kowalska M, Owecki M, Prendecki M, Wize K, Nowakowska J, Kozubski W, Lianeri M, Dorszewska J (2017) Aging and neurological diseases. In: Senescence, IntechOpen, Ch. 5. https://doi.org/10.5772/intechopen.69499
Gallegos M, Morgan M, Cervigni M, Martino P, Murray J, Calandra M, Razumovskiy A, Caycho-Rodríguez T, Arias Gallegos W (2022) 45 years of the Mini-Mental State Examination (MMSE): A perspective from Ibero-America. Dementia & Neuropsychologia. https://doi.org/10.1590/1980-5764-dn-2021-0097
Kueper J, Speechley M, Montero-Odasso M (2018) The Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog): Modifications and responsiveness in pre-dementia populations. A narrative review. Journal of Alzheimer’s Disease. https://doi.org/10.3233/JAD-170991
Nasreddine Z, Phillips N, Bédirian V, Charbonneau S, Whitehead V, Collin I, Cummings J, Chertkow H (2005) The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. J Am Geriatr Soc 53:695–9. https://doi.org/10.1111/j.1532-5415.2005.53221.x
Katz M, Wang C, Nester C, Derby C, Zimmerman M, Lipton R, Sliwinski M, Rabin L (2021) T-MoCA: A valid phone screen for cognitive impairment in diverse community samples. Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring 13. https://doi.org/10.1002/dad2.12144
Nasreddine ZS (2021) MoCA test: Validation of a five-minute telephone version. Alzheimer’s & Dementia 17. https://doi.org/10.1002/alz.057817
Pacheco-Lorenzo MR, Valladares-Rodríguez SM, Anido-Rifón LE, Fernández-Iglesias MJ (2021) Smart conversational agents for the detection of neuropsychiatric disorders: A systematic review. Journal of Biomedical Informatics 113. https://doi.org/10.1016/j.jbi.2020.103632
Otero-González I, Pacheco-Lorenzo MR, Fernández-Iglesias MJ, Anido-Rifón LE (2024) Conversational agents for depression screening: A systematic review. International Journal of Medical Informatics. https://doi.org/10.1016/j.ijmedinf.2023.105272
Pacheco-Lorenzo M, Fernández-Iglesias MJ, Valladares-Rodriguez S, Anido-Rifón LE (2023) Implementing scripted conversations by means of smart assistants. Software: Practice and Experience 53. https://doi.org/10.1002/spe.3182
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics. https://doi.org/10.2307/2529310
Valladares-Rodriguez S, Fernández-Iglesias MJ, Anido-Rifón L, Facal D, Rivas-Costa C, Pérez-Rodríguez R (2019) Touchscreen games to detect cognitive impairment in senior adults. a user-interaction pilot study, International Journal of Medical Informatics 127. https://doi.org/10.1016/j.ijmedinf.2019.04.012
Lancaster GA, Dodd S, Williamson PR (2004) Design and analysis of pilot studies: recommendations for good practice. Journal of Evaluation in Clinical Practice. https://doi.org/10.1111/j.2002.384.doc.x
Cocks K, Torgerson DJ (2013) Sample size calculations for pilot randomized trials: a confidence interval approach. Journal of Clinical Epidemiology. https://doi.org/10.1016/j.jclinepi.2012.09.002
Reisberg B, Torossian C, Shulman M, Monteiro I, Boksay I, Golomb J, Benarous F, Ulysse A, Oo T, Vedvyas A, Rao J, Marsh K, Kluger A, Sangha J, Hassan M, Alshalabi M, Arain F, Sh N, Buj M, Shao Y (2018) Two year outcomes, cognitive and behavioral markers of decline in healthy, cognitively normal older persons with global deterioration scale stage 2 (subjective cognitive decline with impairment). Journal of Alzheimer’s disease: JAD. https://doi.org/10.3233/JAD-180341
Montejo P, Peña M, Sueiro M (2012) The Memory Failures of Everyday questionnaire (MFE): Internal consistency and reliability. The Spanish Journal of Psychology. https://doi.org/10.5209/rev_SJOP.2012.v15.n2.38888
Graf C (2008) The Lawton Instrumental Activities of Daily Living (IADL) scale. AJN, American Journal of Nursing. https://doi.org/10.1097/01.NAJ.0000314810.46029.74
CSIC (2023) Un perfil de las personas mayores en España 2023. https://envejecimientoenred.csic.es/wp-content/uploads/2023/10/enred-indicadoresbasicos2023.pdf
Abu Rbeian AH, Owda A, Owda M (2022) A technology acceptance model survey of the metaverse prospects. AI. https://doi.org/10.3390/ai3020018
Lewis JR (1992) Psychometric evaluation of the post-study system usability questionnaire: The pssuq. Proceedings of the Human Factors Society Annual Meeting. https://doi.org/10.1177/154193129203601617
Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika. https://doi.org/10.1007/BF02310555
Gliem JA, Gliem RR (2003) Calculating, interpreting, and reporting cronbach’s alpha reliability coefficient for likert-type scales. https://hdl.handle.net/1805/344
Mishra P, Singh U, Pandey CM, Mishra P, Pandey G (2019) Application of student’s t-test, analysis of variance, and covariance. Annals of Cardiac Anaesthesia. https://doi.org/10.4103/aca.ACA_94_19
Thalheimer W, Cook S (2002)How to calculate effect sizes from published research: A simplified methodology. Work-Learning Research. https://api.semanticscholar.org/CorpusID:145490810
Tellez A, Garcia Cadena C, Corral-Verdugo V (2015) Effect size, confidence intervals and statistical power in psychological research. Psychology in Russia: State of the Art. https://doi.org/10.11621/pir.2015.0303
Nadarzynski T, Miles O, Cowie A, Ridge D (2019) Acceptability of artificial intelligence (ai)-led chatbot services in healthcare: A mixed-methods study. Digital Health. https://doi.org/10.1177/2055207619871808
We acknowledge the contributions and support of the authors’ colleague Noelia Lago, as well as the staff at AFAGA (Miriam Fortes and Maxi Rodríguez) and Centro de Día Parque Castrelos (Ángeles Álvarez), and all of the participants in this study, without whom this work would not have been possible.
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Funding for open access charge: CISUG/Universidade de Vigo. This work has been partially funded by Ministerio de Ciencia e Innovación, project SAPIENS- Services and applications for a healthy aging [grant PID2020-115137RB-I00 funded by MCIN/AEI/10.13039/501100011033] and by the Ministry of Science, Innovation and Universities [grant FPU19/01981] (Formación de Profesorado Universitario).
Authors and affiliations.
atlanTTic, University of Vigo, 36310, Vigo, Spain
Moisés R. Pacheco-Lorenzo, Luis E. Anido-Rifón & Manuel J. Fernández-Iglesias
Department of Electronics and Computing, USC, 15782, Santiago de Compostela, Spain
Sonia M. Valladares-Rodríguez
Moisés R. Pacheco-Lorenzo: administration of questionnaires, statistical analysis, and writing.
Sonia M. Valladares-Rodríguez: statistical analysis and writing.
Manuel J. Fernández-Iglesias: supervision, writing, review, and editing.
Luis E. Anido-Rifón: supervision, writing, review, and editing.
Correspondence to Moisés R. Pacheco-Lorenzo.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
Pacheco-Lorenzo, M.R., Anido-Rifón, L.E., Fernández-Iglesias, M.J. et al. Will senior adults accept being cognitively assessed by a conversational agent? a user-interaction pilot study. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05558-z
Accepted: 23 May 2024
Published: 15 June 2024
DOI: https://doi.org/10.1007/s10489-024-05558-z
Research on the Movement Speed of Situational Map Symbols Based on User Dynamic Preference Perception
JNDS and Weber fractions by symbol size (large, medium, small):

| Speed | Large: JNDS (M) | Large: JNDS (SD) | Large: Weber fraction | Medium: JNDS (M) | Medium: JNDS (SD) | Medium: Weber fraction | Small: JNDS (M) | Small: JNDS (SD) | Small: Weber fraction |
|---|---|---|---|---|---|---|---|---|---|
| 0.25 | 0.04 | 0.02 | 0.171 | 0.04 | 0.03 | 0.176 | 0.05 | 0.02 | 0.203 |
| 1 | 0.11 | 0.03 | 0.108 | 0.11 | 0.04 | 0.112 | 0.11 | 0.03 | 0.112 |
| 4 | 0.29 | 0.14 | 0.072 | 0.32 | 0.13 | 0.081 | 0.33 | 0.15 | 0.083 |
| 8 | 0.56 | 0.14 | 0.070 | 0.54 | 0.12 | 0.067 | 0.55 | 0.12 | 0.069 |
| 16 | 1.14 | 0.25 | 0.071 | 1.18 | 0.22 | 0.068 | 1.20 | 0.24 | 0.070 |
| 32 | 2.34 | 0.62 | 0.073 | 2.24 | 0.66 | 0.070 | 2.27 | 0.71 | 0.071 |
| 48 | 3.74 | 0.90 | 0.078 | 4.03 | 1.02 | 0.084 | 3.89 | 0.95 | 0.081 |
| 64 | 5.82 | 1.95 | 0.091 | 6.08 | 1.88 | 0.095 | 6.02 | 2.03 | 0.094 |
| 88 | 9.68 | 3.52 | 0.110 | 10.12 | 3.85 | 0.115 | 10.21 | 3.68 | 0.116 |
| 128 | 18.69 | 8.96 | 0.146 | 19.20 | 7.69 | 0.150 | 20.35 | 7.84 | 0.159 |
| 256 | 49.41 | 16.13 | 0.193 | 52.73 | 17.58 | 0.206 | 58.37 | 18.60 | 0.228 |
Apparent speed rate and accuracy rate by velocity:

| Velocity | Apparent speed rate (M) | Apparent speed rate (SD) | Accuracy rate (M, %) | Accuracy rate (SD, %) |
|---|---|---|---|---|
| 1 | 1.21 | 0.15 | 99.2 | 0.2 |
| 2 | 1.60 | 0.10 | 99.4 | 0.3 |
| 4 | 1.95 | 0.11 | 99.2 | 0.3 |
| 8 | 2.06 | 0.12 | 98.2 | 0.4 |
| 12 | 2.55 | 0.16 | 97.8 | 0.2 |
| 16 | 3.06 | 0.17 | 96.8 | 0.5 |
| 24 | 3.58 | 0.15 | 96.5 | 0.8 |
| 32 | 4.06 | 0.13 | 95.2 | 0.2 |
| 48 | 4.32 | 0.16 | 93.3 | 1.4 |
| 64 | 4.92 | 0.18 | 91.2 | 1.8 |
| Speed | Regression equation | R² | Adjusted R² |
|---|---|---|---|
| 1–64 | Y = 0.921 + 0.169x − 0.003x² + 2.1 × 10⁻⁵x³ | 0.826 | 0.784 |
| Factor | JNDS: F value | JNDS: p value | Apparent speed: F value | Apparent speed: p value | Accuracy rate: F value | Accuracy rate: p value |
|---|---|---|---|---|---|---|
| Speed | 108.561 | <0.05 | 65.174 | <0.05 | 11.628 | <0.05 |
Tong, M.; Chen, S.; Wang, X.; Xue, C. Research on the Movement Speed of Situational Map Symbols Based on User Dynamic Preference Perception. Aerospace 2024 , 11 , 478. https://doi.org/10.3390/aerospace11060478
Published on 18.6.2024 in Vol 26 (2024)
Authors of this article:
1 Department of Pediatrics, University of Alberta, Edmonton, AB, Canada
2 School of Physical & Occupational Therapy, McGill University, Montreal, QC, Canada
Francois Bolduc, MD, PhD
Department of Pediatrics
University of Alberta
11315 87th Avenue
Edmonton, AB, T6G 2E1
Phone: 1 780 492 9713
Email: [email protected]
Families of individuals with neurodevelopmental disabilities or differences (NDDs) often struggle to find reliable health information on the web. NDDs encompass various conditions affecting up to 14% of children in high-income countries, and most individuals present with complex phenotypes and related conditions. It is challenging for their families to develop health literacy solely by searching for information on the internet. While in-person coaching can enhance care, it is only available to a minority of those with NDDs. Chatbots, or computer programs that simulate conversation, have emerged in the commercial sector as useful tools for answering questions, but their use in health care remains limited. To address this challenge, the researchers developed, in collaboration with individuals with lived experience, a chatbot named CAMI (Coaching Assistant for Medical/Health Information) that provides information about trusted resources covering core knowledge and services relevant to families of individuals with NDDs. The developers used the Django framework (Django Software Foundation) and a knowledge graph depicting the key entities in NDDs and their relationships, which allows the chatbot to suggest web resources related to user queries. To identify NDD domain–specific entities from user input, a combination of standard sources (the Unified Medical Language System) and additional entities identified by health professionals and collaborators was used. Although most entities were identified in the text, some were not captured in the system and therefore went undetected. Nonetheless, the chatbot was able to provide resources addressing most user queries related to NDDs.
The researchers found that enriching the vocabulary with synonyms and lay language terms for specific subdomains enhanced entity detection. By using a data set of numerous individuals with NDDs, the researchers developed a knowledge graph that established meaningful connections between entities, allowing the chatbot to present related symptoms, diagnoses, and resources. To the researchers’ knowledge, CAMI is the first chatbot to provide resources related to NDDs. This work highlights the importance of engaging end users to supplement standard generic ontologies with named entities for language recognition. It also demonstrates that complex medical and health-related information can be integrated using knowledge graphs that leverage existing large data sets. This has multiple implications: generalizability to other health domains, as well as reducing the need for experts and optimizing their input while keeping health care professionals in the loop. The work also shows how the health and computer science domains need to collaborate to achieve the granularity needed to make chatbots truly useful and impactful.
Knowledge exchange in the medical domain presents multiple challenges, including accessibility, readability, and accuracy. Chatbots, or computer programs that simulate conversations, can help answer users’ or caregivers’ questions [ 1 , 2 ]. Moreover, a chatbot offers several advantageous features for medical needs: flexibility (service providers available 24/7 and from any location [ 3 ]), speed (rapid delivery of a large number of resources), privacy (confidential access for users), engagement (an appealing user interface that fosters interaction), and trustworthiness (information developed by professionals, ensuring reliability). Chatbots have already been developed for diagnosing heart disease [ 4 , 5 ], providing counseling for mental health [ 6 ], improving patient monitoring and medical services [ 7 ], and preventing eating disorders [ 8 ]. Some chatbots have integrated coaching elements to support youth with weight management and prediabetes symptoms [ 9 , 10 ], young adults with depression and anxiety [ 3 , 11 ], people with obesity and emotional eating issues, adults wishing to improve wellness [ 4 , 5 ], and young adults with a high level of stress [ 6 ]. User trust remains a key challenge faced by medical and health-related chatbots [ 12 ].
Chatbots powered by advances in natural language processing (NLP), such as large language models (LLMs; eg, ChatGPT [ 7 ]), have shown how this technology can revolutionize the way information is shared and accessed.
Nonetheless, developing a chatbot for the medical domain is challenging, especially when targeting complex medical conditions such as neurodevelopmental disorders (NDDs). NDDs represent a diverse group of conditions affecting development, including conditions such as attention-deficit/hyperactivity disorder (ADHD), intellectual disability, autism spectrum disorder, cerebral palsy, and learning difficulty. Together, NDDs affect up to 14% of children [ 8 ] and have major implications not only for the individuals themselves but also for their families and society [ 13 ]. It is increasingly recognized that individuals with 1 NDD diagnosis (eg, autism spectrum disorder) often also present with features of ADHD or learning difficulty. Moreover, several associated conditions (often referred to as comorbidities) [ 14 , 15 ] can also significantly affect the clinical presentation of individuals with NDDs and their needs in terms of health, social participation, and education; for instance, sleep disorders, gastrointestinal symptoms, anxiety, depression, or even seizures are more commonly found in individuals with NDDs than in the general population. NDDs are also increasingly linked to genes. While this has paved the way for personalized medicine, it has also led to silos where parents have access to information only through associations created for their genes of interest, resulting in very limited information in terms of management for many rare conditions. Furthermore, NDDs are chronic conditions with changing manifestations and needs over the course of an individual’s lifespan.
While LLM technology is evolving quickly, here, we discuss key steps to be considered when developing a chatbot for the medical domain and present how our team applied these steps to our use case of developing a chatbot for medical information related to NDDs named CAMI (Coaching Assistant for Medical/Health Information).
It is important to consult directly with individuals with lived experience. While this is now common practice in industry, it remains challenging to reach out to individuals with medical issues.
In our case, we developed a national advisory group that included 9 caregivers of individuals with NDDs (we advertised for the advisory group position through associations and partners involved in NDD research in Canada). The caregivers were predominantly female (8/9, 89%), and their ages ranged from 32 to 51 years. In terms of marital status, of the 9 participants, 7 (78%) were married or had common-law spouses, and 2 (22%) were divorced or separated. They had diverse levels of education: bachelor’s degree (7/9, 78%), master’s degree (1/9, 11%), and PhD (1/9, 11%). Of the 9 participants, 6 (67%) were employed full time, while 3 (33%) worked part time. The participants had various occupations: special education teacher (1/9, 11%), planning and programming officer (1/9, 11%), senior IT consultant (1/9, 11%), community development coordinator (1/9, 11%), academic professor (2/9, 22%), research assistant (1/9, 11%), psychologist (1/9, 11%), and manufacturer (1/9, 11%). Of the 9 individuals, 8 (89%) were White, and 1 (11%) identified as a First Nations person. All participants were biological parents.
The advisory group will not only provide direct feedback on the project but will also, through their personal and professional networks, recruit participants for testing the ideas or chatbot prototype.
It is important to prepare in advance a complete set of questions to be used in semistructured interviews with individuals with lived experience. The advisory group can provide key insights into the development of the consent forms, study protocol, and interview material. For developers from industry, it is key to collaborate with clinicians, association leaders, or other allied health professionals who are not only able to provide feedback but can also help engage participants in the project.
In our case, the project was approved by the research ethics board at the University of Alberta (Pro00081113). Informed consent was obtained from participants in interviews. The participants were compensated for their work as per ethics regulation.
We used semistructured interviews ( Multimedia Appendix 1 ) adapted from user interface evaluation resources [ 12 , 16 - 18 ] to ask advisory group members about their current needs, their use of web-based platforms to gather information, and the current barriers to access to information. All interviews were video recorded after receiving consent from the participants.
Our interviews allowed us to identify overlapping themes using thematic analysis [ 19 ]. We were able to identify the patterns and themes among different user groups and build a plan on how to represent the data. The advisory group members (who were individuals with lived experience) suggested that the chatbot should provide rapid access to information:
I like the way that the resources just pop up in the side window as the user is answering the questions.
You could be asking me questions all day long but I’m not sure how close it can come to identifying the biggest stressors and challenges for our family. According to those responses they could target the info resources accordingly. If the chatbot doesn’t get to the bottom of our challenges fast enough, as a parent, I would likely find myself turning to Google more quickly to find what I’m looking for.
It is important to identify the potential differences in the conceptual framework associated with medical conditions. These differences in mental representations, known as mental models, can lead to blind spots that may prevent the chatbot from being widely useful [ 20 ].
Step 1: gathering information.
Providing trusted, exhaustive, and actionable information is key for a chatbot meant to provide medical information. It is important to partner with individuals with lived experience, associations (patients, parents, and professionals) relevant to the condition, and health authorities to obtain such information. The information about the relevant web pages, books, or other formats needs to be stored centrally and visible to others so that it can be peer reviewed. This can be done using shared sheets or websites.
In our case, we developed a database of NDD resources by leveraging existing databases (Alberta Children’s Hospital, AIDE Canada, and InformAlberta [ 21 , 22 ]) as well as by developing a nationwide consultation with individuals with lived experience from across Canada who submitted 1422 resources. AIDE Canada and InformAlberta shared their resource databases with the research team, which have been incorporated in the main CAMI database. The same data set or a superset of the data is used on the general websites of the aforementioned organizations. The individuals with lived experience used Google Sheets to add the known or found resources with the appropriate data labels. These data included the web page URL, type of resource, language, age group, location, eligibility criteria, and some important keywords. All resources were collected and annotated manually by group members. Submitted resources covered several topics such as health, education, and support programs and were grouped under the categories core knowledge, educational, financial support, and services.
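The data labels described above can be modeled as a simple record type. This is an illustrative sketch only: the field and category names below mirror the description in the text, but the actual Google Sheets columns may have differed.

```python
from dataclasses import dataclass, field

# Categories under which submitted resources were grouped, per the text.
CATEGORIES = {"core knowledge", "educational", "financial support", "services"}


@dataclass
class Resource:
    """One manually annotated resource row (field names are illustrative)."""

    url: str              # web page URL
    resource_type: str    # type of resource (eg, web page, book)
    language: str
    age_group: str
    location: str
    eligibility: str
    keywords: list = field(default_factory=list)
    category: str = "core knowledge"

    def __post_init__(self):
        # Reject categories outside the four used in the CAMI database.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
```

Keeping the schema explicit like this makes it straightforward to validate crowd-submitted rows before they enter the main database.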
A key feature brought up by the families of individuals we consulted was the need for an in-depth, trusted, and diverse set of resources. Therefore, we collaborated with several organizations involved in providing NDD treatment as well as individuals with lived experience. We encountered challenges in obtaining resources for diverse NDD subtypes and covering all regions of Canada at the start because we connected only with organizations that tend to have better coverage in urban areas. We found that forming a network of individuals with lived experience was very helpful and allowed us to achieve broader coverage in identifying resources ( Figure 1 ; Textbox 1 ).
Textbox 1. Province and number of resources.
It is important to annotate the resources included in the data set in a meaningful way so that the information can be retrieved in response to related queries by future chatbot users. Again, discussions with potential users are key in capturing topics of interest and making sure that they are properly covered in the database.
We developed an automated annotation tool [ 23 ] using an NLP pipeline that combines named entity recognition, topic modeling, and a text classification model to annotate the resources. All resource annotations, along with their weights, were stored in a Neo4j graph database (Neo4j, Inc) in the form of a weighted knowledge graph, as shown in figure 2 of the paper authored by Costello et al [ 23 ]. Ordered weighted aggregation operators are used to rank the resources returned in response to a user query.
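As a rough sketch of how ordered weighted aggregation (OWA) can rank resources by their annotation weights: each resource's match scores are sorted in descending order and combined with a fixed weight vector. The weight vector and scores below are hypothetical, not the ones used in CAMI.

```python
def owa(scores, weights):
    """Ordered weighted averaging: sort scores descending, then weighted sum.

    If there are fewer weights than scores, the extra scores are ignored
    (zip truncates to the shorter sequence).
    """
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def rank_resources(resource_scores, weights):
    """Rank (name, scores) pairs by their OWA value, best first."""
    return sorted(resource_scores, key=lambda item: owa(item[1], weights), reverse=True)
```

A front-loaded weight vector (eg, [0.5, 0.3, 0.2]) emphasizes a resource's strongest annotation matches while still rewarding breadth.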
Alternatively, one could use LLMs to extract keywords from the individual web pages. This would include selecting a suitable prompt and providing the web page content extracted from the URL. By leveraging LLMs such as ChatGPT and Google Gemini, among others, the content can be analyzed to identify and filter out important keywords from the web page.
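A minimal sketch of how such an LLM-based extraction might be prompted. The prompt wording, keyword limit, and truncation length are assumptions for illustration, not a prompt used by CAMI; the returned string would be sent to the chosen LLM's API.

```python
def build_keyword_prompt(page_text, max_keywords=10):
    """Build a hypothetical keyword-extraction prompt for a resource web page."""
    return (
        f"Extract up to {max_keywords} keywords describing this "
        "neurodevelopmental-disability resource page. "
        "Return them as a comma-separated list.\n\n"
        # Truncate long pages to stay within a model's context window.
        f"Page content:\n{page_text[:4000]}"
    )
```

The LLM's comma-separated reply could then be split and stored as the resource's keyword annotations, replacing or supplementing the manual labeling step.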
The information is stored in the database in a format that retains the dependencies and relations among different data. The data are divided into multiple tables or classes such that the data are modular and can be used independently.
We used Neo4j to store the database models; the database includes several node types, relationships, and properties. The web page content was initially stored in a MongoDB database (MongoDB, Inc) and was further annotated into multiple properties and relationships, such as is_about, is_associated_with, is_located_in, and occurred_together. When querying the database, these properties and the corresponding relationships are used to filter resources and rank them in accordance with the defined parameters.
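Filtering by these relationships could look roughly like the following Cypher query, built here as a Python string. Only the relationship names (is_about, is_located_in) come from the description above; the node labels (Resource, Term, Location) and the weight property are assumptions for illustration.

```python
def resources_about(term, location=None):
    """Build a hypothetical Cypher query over is_about / is_located_in edges.

    The query uses parameters ($term, $location) so it can be passed to a
    Neo4j driver session together with a parameter dict.
    """
    query = "MATCH (r:Resource)-[a:is_about]->(t:Term {name: $term}) "
    if location:
        # Only constrain by location when one was extracted from the user query.
        query += "MATCH (r)-[:is_located_in]->(:Location {name: $location}) "
    query += "RETURN r ORDER BY a.weight DESC"
    return query
```

Keeping the location clause optional mirrors the behavior described later, where general-knowledge queries skip location filtering entirely.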
A crucial aspect of the chatbot relates to its user interface, which requires it to be functional, easy to use, and attractive so that it appeals to users [ 6 - 8 , 13 ]. Of particular importance is the landing page because it represents the first impression of the application and should connect with the target audience [ 7 ] by reflecting the user’s [ 6 ] as well as the chatbot’s goals [ 14 ]. Important components of the page design include a logo, design, fonts, colors, and layout [ 15 ] ( Multimedia Appendices 2 - 7 ).
To streamline the process of multiple iterations, mock-up representations of the chatbot can be used in interviews with individuals with lived experience. This is important to increase the cost-effectiveness and reduce the time required for coding each version. In our case, the mock-ups, or the design wireframes of the chatbot, were shared with the participants (individuals with lived experience) by research team members on a Zoom call (Zoom Video Communications, Inc), and their feedback was recorded on Google Docs during these calls. We usually included both content experts and computer scientists to uncover gaps. This cycle should be repeated until a consensus is reached among the target user base, and the design is finalized [ 24 ]. In the design process, we used Figma (Figma Inc) [ 25 ], a collaborative design tool that allows users to work on various designs while also allowing others to review them and provide comments. Optimal user interface design uses an agile methodology that includes iterative design, implementation, and testing [ 26 ].
We started with the home page of the chatbot ( Figure 3 ). We found that families of individuals with NDDs favored a simple design with a clear indication of the purpose of the chatbot. We also included a tutorial video about the purpose and intended use of the chatbot. The families also recommended the inclusion of the names of the institutions involved in the development of the chatbot as a mark of trusted information.
The home page was designed using Gestalt principles [ 27 ], and all closely related content was kept on the same page. During the initial interviews, participants indicated that the home page content was not self-explanatory, clear, or trustworthy. This feedback was considered, and the required changes were made to showcase the authenticity of the chatbot.
The conversation (or generic flow), which represents the exchange between the user and the chatbot, is a very important aspect in usability. Most chatbots will include an introductory text explaining the purpose of the chatbot. Subsequently, the conversation can be more or less prestructured.
An introductory video can help the user understand rapidly the aim of the chatbot as well as how best to ask questions. Importantly, the video needs to be short enough for the user to stay engaged with the chatbot.
In our case, the conversation (chat) design went through several iterations based on input from caregivers of individuals with NDDs ( Figure 4 ). In the initial design, the users were confused when the chatbot was not able to understand them. Either they had to start the chat again or follow along with the conversation. In the final design, users have the ability to inform the chatbot if its understanding is incorrect, and they can enter their query again. Users have more flexibility in responding to certain questions by clicking “I am not sure” if they are not comfortable with answering them. In certain cases, we also provided radio buttons based on participant feedback:
Instead of having the user type yes or no every time, could a selection not appear using radio buttons, one for yes and one for no? It would be easier for the user.
We developed a video explaining the aim and scope of the chatbot to avoid users asking questions outside of our domain of expertise. The video is available on the web [ 28 ].
Another major focus was on how to present the results of the query.
In our case, web resources identified by CAMI are presented to the user ( Figure 5 ). This part underwent significant modification to provide information to users that would influence their decision to continue browsing or not. Users have major time constraints, and they would lose interest quickly if they are unclear about the quality of the resource. The resource card evolved from showing a title and rating to showing a static image of the website, the type of resource, tags describing the resource, and some key functions such as sharing or saving the resource (which was identified by users as key in being able to access the chatbot from multiple environments, eg, work, commute, and home). All static images are captured as screenshots and saved in the database. The families reported that in addition to being more appealing, the images would help to build trust and identify the authenticity of a site if they could see its home page. Moreover, users mentioned that they would be able to recognize and remember the site more effectively if it displayed an image. We also took advantage of the tags to allow the user to converse with or direct the chatbot dynamically. Indeed, when the user clicks on a radio button, say, “parent” or “autism,” the chatbot would respond by providing resources related to this tag. This not only allows the user to start with a query but also enables them to navigate to other topics of interest that they may not have initially considered.
Next, we investigated how many resources to display during a single interaction. The families’ feedback showed consensus at 3. Resources are presented in sets of 3 with page numbers at the bottom showing the user that there are more resources for them to look at in the future ( Figure 6 ).
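Presenting results in sets of 3 with page numbers amounts to simple pagination. A minimal helper might look like this (a sketch, not CAMI's implementation):

```python
def paginate(resources, page, per_page=3):
    """Return one page of resources plus the total page count.

    Pages are 1-indexed; per_page defaults to 3, the set size the
    families' feedback converged on.
    """
    start = (page - 1) * per_page
    # Ceiling division without math.ceil; at least 1 page even when empty.
    total_pages = max(1, -(-len(resources) // per_page))
    return resources[start:start + per_page], total_pages
```

The total page count drives the page-number row at the bottom of the results view, signaling to users that more resources are available.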
While websites have used several ways to encourage users to remain on their platform, it is important to consider the vulnerable nature of patients or their families as well as the potential detrimental effect of prolonged use. At the same time, encouraging the user to reconnect with the platform will allow repeated use and coaching.
Health coaching and care coordination [ 29 , 30 ] have been shown to improve health outcomes for individuals with NDDs, but they are not offered to most families because of cost, a lack of specialized health care professionals [ 31 ], access barriers due to geography, or a lack of integration into the health system (eg, insurance and social factors).
In our case, we consulted with health experts in NDDs and identified key coaching aspects that could be provided to users. We also asked users whether they were interested in being sent tips or related resources via SMS text messaging or email ( Figure 7 ).
Understanding user input is an important factor in providing responses to user queries. NLP is a domain of artificial intelligence focused on developing computer programs that are able to read textual data, analyze the content, and extract meaningful information [ 32 ]. Advanced deep network NLP algorithms, such as named entity recognition and relation extraction, have been repurposed to identify diseases, symptoms, and drugs from user input [ 33 - 35 ]. Chatbots use NLP to process user text and respond to the text. While the underlying processes are different, chatbots aim to behave like people who listen and respond to their conversational partners. Day-to-day language is difficult for a computer to understand, but, using NLP, the computer is able to break up user text into a set of attribute-value pairs; for example, if the user types “I want to know if there are any services available for Global Developmental Delay in Edmonton,” the NLP pipeline would break up the text into subparts by extracting keywords: { Services, Global Developmental Delay, Edmonton }.
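A toy illustration of this attribute-value extraction, using a hardcoded lexicon in place of the real pipeline (which relies on named entity recognition and standard vocabularies such as the UMLS). The lexicon entries and attribute names are invented for the example sentence above.

```python
# Hypothetical surface-form lexicon: lowercase phrase -> (attribute, canonical value).
ENTITY_LEXICON = {
    "services": ("resource_type", "Services"),
    "global developmental delay": ("condition", "Global Developmental Delay"),
    "edmonton": ("location", "Edmonton"),
}


def extract_attributes(text):
    """Map free text to attribute-value pairs by simple substring matching."""
    found = {}
    lowered = text.lower()
    for surface, (attr, value) in ENTITY_LEXICON.items():
        if surface in lowered:
            found[attr] = value
    return found
```

A real pipeline replaces the substring match with trained entity recognizers, but the output shape, a set of attribute-value pairs driving the database query, is the same.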
We leveraged an annotation NLP pipeline [ 23 ] to extract structured information from the user free-text query. Every query submitted by the user is divided into multiple medical-related domains such as Human Phenotype Ontology (HPO) for medical terms and Alliance of Information & Referral Systems (now known as Inform USA) for service-related terms, as well as symptom-specific terms (eg, challenging behavior), geographic location, and the age of the individual.
We found that formatting the query into multiple domains helped us identify the resources with more accuracy compared to performing a search using keywords. We experimented with certain queries from the test group and identified the related conditions for common queries. Textbox 2 shows the extraction of a single query to a list of multiple medical database keywords that relate to each other when filtering the resources.
In CAMI, certain keywords are hardcoded so that the recommendations can be domain-specific and adhere closely to user requirements. If keywords such as “knowledge” or “information” are included in the initial user query, the user will not be asked about their geographic location because location is not relevant; for instance, general knowledge regarding autism is likely consistent in Canada, the United States, or Germany, but services or policies will differ based on location. By contrast, if the keyword “services” is detected in the user query, and the location is not extracted from the query itself, the user will be prompted to provide their location (city or province) such that the recommended services can be actionable.
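The hardcoded-keyword rule described above can be sketched as a small decision function; the exact keyword set and parameter names in CAMI may differ:

```python
def needs_location_prompt(keywords, location=None):
    """Decide whether to ask the user for their city/province.
    Sketch of the rule described in the text; keyword set is assumed."""
    # General-knowledge queries are location independent.
    if set(keywords) & {"knowledge", "information"}:
        return False
    # Service queries are only actionable with a location.
    return "services" in keywords and location is None
```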
There are several tools available to create chatbots with diverse functionalities and their use varies with each use case. Wit.ai [ 36 ], Rasa [ 37 ], and Dialogflow [ 38 ] are some of the popular frameworks used to design rule-based chatbots.
We used the Django library in Python to create the backend for CAMI. It follows the model-view-template framework and app-based architecture, in which all classes are called apps and can work independently as well as in conjunction with other apps in the system ecosystem [ 36 ]. Each app includes a model file, a views file, a test file, and an init file. The schema was defined for the database table in the models.py file, where the column name is declared with its definition, which includes data type, constraints, and default values. All changes in the database model generate a migration file, which is an auto-created SQL code that creates or updates the database’s internal schema [ 37 , 38 ]. View files include a set of functions with decorators that define the type of http request served [ 39 ]. Django (Django Software Foundation) was selected over comparable frameworks such as Flask because using Django’s authentication module streamlined the development process for registration and log-in functionalities through auto-encryption, password verification, and authorization [ 34 , 35 ].
Once the information is extracted from the user query, it is sent to the matching engine. The matching engine incorporates the logic to find the similarity between the given input data dictionary and the list of resources from the database. The resources are annotated in the same way as the query to maintain uniformity between the query and the resources considered for matching, after the resources are filtered by the type of resource requested ( Figure 8 ).
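A minimal sketch of such domain-wise similarity scoring is shown below; the Jaccard-overlap formulation and the annotation field names are our illustrative assumptions, not CAMI's actual matching logic:

```python
def match_score(query_ann, resource_ann):
    """Toy similarity between query annotations and resource annotations:
    Jaccard overlap computed per shared domain, then averaged."""
    domains = set(query_ann) & set(resource_ann)
    if not domains:
        return 0.0
    scores = []
    for d in domains:
        q, r = set(query_ann[d]), set(resource_ann[d])
        union = q | r
        scores.append(len(q & r) / len(union) if union else 0.0)
    return sum(scores) / len(scores)

query = {"HPO": ["sleep disturbance"], "type": ["services"]}
resource = {"HPO": ["sleep disturbance"], "type": ["services"]}
```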
We optimized the system for faster throughput so that results are displayed more quickly. This was done by increasing the heap size and page cache in the Neo4j configuration file. The average response time for query analysis is approximately 2 seconds, although this depends on the length of the sentence; shorter sentences return output in less time ( Figure 9 ).
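For reference, these settings live in the Neo4j configuration file (neo4j.conf); the keys below are the Neo4j 4.x names, and the values are illustrative rather than those used in our deployment:

```properties
# neo4j.conf — memory settings (illustrative values)
dbms.memory.heap.initial_size=2g
dbms.memory.heap.max_size=4g
dbms.memory.pagecache.size=2g
```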
Similarly, the time required to obtain the recommendations for the queries ( Figure 9 ) changes with the size of the payload. The engine checks all web pages and ranks them on the basis of the topic, entity, and location ( Figure 10 ).
In developing the chatbot, it is important to follow the CONSORT (Consolidated Standards of Reporting Trials) guidelines [ 39 ]. CONSORT guidelines are usually meant for reporting clinical trials, ensuring transparency and accuracy in the reporting of trial methods and results, but the same principles can be applied to software development as well. The guidelines include the following: (1) mention the names, credentials, and affiliations of the developers, sponsors, and owners; (2) describe the history and development process of the application and previous formative evaluations; (3) report revisions and updates; (4) provide information on quality assurance methods to ensure the accuracy and quality of the information provided; (5) ensure replicability by publishing the source code; (6) provide the URL of the application; however, because the intervention is likely to change or disappear over the years, make sure that the intervention is archived; and (7) describe how participants accessed the application, in what setting or context, and if they had to pay for access.
In our case, we applied the CONSORT guidelines as outlined in Textbox 3 .
Adhering to the CONSORT guidelines
In the case of medical conditions, especially complex ones such as NDDs, patients present with variable or multiple issues that can all be of interest. Therefore, it is important to include knowledge of such associated concepts to provide comprehensive information. In some cases, the core disease, say, autism or advanced stage cancer, may not be treatable, but associated conditions (eg, anxiety or constipation) may have interventions available to patients. Concepts (or entities) and their relationships are often stored in knowledge graphs. A knowledge graph consists of a graphical and structured representation of information: terms or concepts can be stored as nodes that are connected to one another through edges that define different relationships [ 42 ]. Knowledge graphs are already being used to store information about disorders [ 8 , 13 , 14 ] and could therefore be leveraged by chatbots.
Using a knowledge graph in a health assistant or chatbot allows the question-answering system to provide (1) information regarding, for instance, disease symptoms [ 43 , 44 ]; (2) internet-based diagnosis and risk assessment [ 45 ]; (3) personal lifestyle interventions for serious conditions [ 46 - 48 ]; and (4) prediagnosis and triaging using the patient’s symptoms and medical history [ 49 , 50 ].
We developed a knowledge graph representing key entities observed in individuals with NDDs as well as their interrelation by leveraging the largest data set of individuals with NDDs, the Deciphering Developmental Disorders (DDD) database [ 51 ]. The DDD database includes phenotypic and genotypic information along with the ages of 13,424 patients with severe and undiagnosed NDDs [ 30 ]. The DDD database labels phenotypes using the HPO. The knowledge graph contains 4181 HPO phenotypes for which we calculated the co-occurrence counts among all patient profiles and stored them in the knowledge graph using the “is_associated_with” relationship. The knowledge graph contains 357,514 “is_associated_with” relations. As all resources are already annotated with HPO phenotypes, the chatbot suggests resources to the user for other phenotypes associated with the queried phenotype.
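The co-occurrence counting can be sketched as follows; the patient profiles and HPO-style identifiers below are invented for illustration and are not drawn from the DDD database:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(profiles):
    """Count how often each pair of phenotypes co-occurs across patient
    profiles; the counts weight the 'is_associated_with' edges."""
    counts = Counter()
    for phenotypes in profiles:
        # Sort so each unordered pair is counted under one canonical key.
        for pair in combinations(sorted(set(phenotypes)), 2):
            counts[pair] += 1
    return counts

profiles = [
    ["HP:0000750", "HP:0002360"],
    ["HP:0000750", "HP:0002360", "HP:0001250"],
    ["HP:0000750", "HP:0001250"],
]
counts = cooccurrence_counts(profiles)
```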
Leveraging the DDD database allowed us to identify the concepts that co-occur in individuals with NDDs ( Figure 2 ). Using CAMI, we were able to identify the relationships between different nodes (medical entities related to NDDs). This is possible because the large number of individuals in the DDD database makes differences between co-occurrence rates salient.
The tips or suggestions are based currently on established concepts from the coaching literature developed for families or caregivers of individuals with NDDs ( Figure 11 ); for instance, suggesting that the user set up objectives, which is a key step and a common coaching strategy when dealing with individuals with complex conditions such as NDDs. The chatbot will provide a tip, which, if the user wishes, can be sent to them via email or SMS text message.
This allowed CAMI to provide the user with the names of conditions related to their initial query. It then offered the user the opportunity to see resources for these related entities ( Figure 11 ).
The entities extracted from the query will be passed to the knowledge graph by calling the Neo4j database’s object instance, and the resources are matched [ 26 ]; for example, a query such as “My child has sleep issues” will be parsed and subcategorized into multiple key-value pairs. The aforementioned sample will return the result in the following format:
{HPO-DDD: [], UMLS: [Problem], EricTerm: [Sleep], AIRS: [], cb_category: [Sleep issues], location: [], AGE: Child, ngrams: [child, sleep, issue], relatedConditions: [], searchedCategory: [core knowledge]}
The keywords in the query, including “sleep,” “child,” and “issues,” are mapped with other medical data sets and categorized in the format shown.
Step 1: internal testing.
It is important, especially in the medical domain, to assess the output of the chatbot internally with a content expert before conducting further testing. This process, known as red teaming, is carried out to make sure that the chatbot does not provide inappropriate results.
We started by reviewing the chatbot output with 3 caregivers of individuals with NDDs as well as the resources identified by the chatbot to assess their quality. Overall, these individuals with lived experience appreciated the resource quality; most of the suggestions were from trusted websites and did not include any advertisements or false information. The chatbot was also able to recognize, for instance, the location and age and provide different recommendations integrating this information ( Table 1 ).
Although query 1 and query 2 in Table 1 are similar, the top resource recommendation differs. When the age is included in the query, the system goes through a different pipeline and categorizes or ranks the resources on the basis of age as well. More specific resources will be ranked higher, and very general resources will be ranked lower. Similarly, if the location is added, then another ranking filter will be appended to the recommendations, and the list is sorted with 2 filters.
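The tiered ranking described above can be sketched as follows; the field names and score increments are hypothetical, chosen only to show how age- and location-specific resources rise above general ones:

```python
def rank_resources(resources, age=None, location=None):
    """Sort resources so that age- and location-matched entries
    rank above general ones."""
    def score(res):
        s = res.get("base_relevance", 0.0)
        if age and res.get("age_group") == age:
            s += 1.0  # age filter: specific resources rank higher
        if location and res.get("location") == location:
            s += 1.0  # location acts as a second ranking filter
        return s
    return sorted(resources, key=score, reverse=True)

resources = [
    {"title": "General ADHD overview", "base_relevance": 0.9},
    {"title": "ADHD in toddlers", "base_relevance": 0.8, "age_group": "toddler"},
]
ranked = rank_resources(resources, age="toddler")
```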
Query and title of web page | Reference
“Siblings of ADHD Children: Family Dynamics” | [ ]
“ADHD Behavior: Expert Discipline Skills” | [ ]
“Expert Answers to Common Questions About ADHD” | [ ]
“Being Strength-Minded: An Introduction To Growth Mindset - Foothills Academy” | [ ]
“Baby Registry Tips: Baby Clothes | Cando Kiddo” | [ ]
“Alberta Child Health Benefit | Alberta.ca” | [ ]
“Services — Qi Creative Inc.” | [ ]
Next, it is important to conduct a broader assessment of the chatbot output with a larger number of queries and evaluators. This is more challenging because patients or families have limited time. Similarly, pilot testing by clinicians is difficult to carry out.
We collaborated with undergraduate students who received training about NDDs and subsequently were asked to evaluate the resources provided by the chatbot. A total of 17 students signed up for the evaluation, but 3 (18%) dropped out, leaving 14 (82%) to evaluate the recommendations. The 14 students were divided into 3 subgroups, and each subgroup evaluated the recommendations individually for the full testing set. The evaluation sheet was divided into 4 tabs: behavioral concerns , support groups or programs , cognitive development , and other topics . Multiple user queries (from the interviews or feedback) were grouped together into each tab, and up to 50 recommendations were provided for each query. In a single subgroup, each member checked 1 query and all recommendations and answered the questions in the evaluation form ( Multimedia Appendix 8 ). Thus, each query and its recommendations were reviewed by 3 individuals, and a majority decision (2 out of 3) was considered the final label for the evaluation.
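The 2-of-3 majority decision can be sketched as a simple vote count; the label strings are illustrative:

```python
from collections import Counter

def majority_label(labels):
    """Return the final evaluation label when at least 2 of the 3
    reviewers agree; otherwise flag the lack of consensus."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= 2 else "no consensus"
```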
After the evaluation, the queries were subcategorized into NDD topics such as autism, ADHD, sleep concerns, and so on, and were analyzed further ( Figure 12 ). The analysis included the relevance scores for the top 10, top 15, and top 50 recommendations for each topic. The results varied for each topic; for ADHD, the relevance scores decreased from the top 10 recommendations to the top 15 recommendations but subsequently increased when considering the top 50 results. For autism, the relevance score gradually decreased as the considered recommendations increased, which shows that the initial recommendations are considered better and more relevant than the later recommendations. Upon analysis, we also found that the data in the database and the mapping keywords play an important role in the data ranking.
The ranking relies on multiple factors, such as location, category, Unified Medical Language System (UMLS) terms, Education Resources Information Center (ERIC) terms, type of resource required, and so on. Although we had approximately 11,000 resources that include the term ADHD or attention-deficit/hyperactivity disorder , the filters and ranking algorithms need a weighted mechanism to decide which resource to rank first. Regarding autism , 4169 resources in the CAMI database include the term, and CAMI performed better in terms of ranking, which shows that having more resources does not necessarily equate to better recommendations. We kept all weights at the same scale for our research, but varying weights have a better chance of improving the ranking.
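A weighted mechanism of this kind can be sketched as a linear combination of factor scores; the factor names and values below are hypothetical, and a weight of 1.0 for every factor reproduces the equal-scale setting used in our research:

```python
def weighted_score(features, weights):
    """Combine ranking factors (location, category, UMLS/ERIC term
    matches, resource type) with per-factor weights; unlisted factors
    default to a weight of 1.0."""
    return sum(weights.get(name, 1.0) * value
               for name, value in features.items())

features = {"location_match": 1.0, "umls_match": 0.5, "category_match": 1.0}
equal_weights = weighted_score(features, {})               # equal scale
umls_boosted = weighted_score(features, {"umls_match": 2.0})  # tuned weights
```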
While chatbots are emerging as a key aspect in customer service workflow for several businesses, their use in the medical domain remains limited. Although several features of chatbots, such as 24/7 availability, accessibility from remote areas, and privacy, promise a bright future for the use of chatbots in health care, significant challenges remain; for instance, users expect chatbots to be able to process their query. This relies on NLP, and while there are extensive ontologies for general domains, an understanding of the medical domain, especially when specific to a subfield such as NDDs, remains limited. The vocabulary used by users who are lay people requires the development of synonyms for most medical terms used usually by web resources that provide either core knowledge or specialized services. We found that, in addition to using the UMLS, involving medical experts as well as individuals with lived experience could greatly improve on this. This proved to be engaging for families of individuals with NDDs because they felt involved in the development of the chatbot. We would therefore encourage developers to form, as we did, advisory groups that include individuals with lived experience.
Another important aspect to consider when developing a medical domain chatbot is the level of complexity of most medical conditions. This may be related to sex differences, age-dependent differential manifestations, or associated conditions (comorbid conditions) that may influence the clinical presentation and the needs of the user; for example, an individual with an NDD is more at risk of developing seizures. This is important information for the user to be aware of when asking about, for instance, change in behavior. However, this information may not be easily available to computer scientists developing the chatbot. Even general practitioners may not consider this information shared knowledge. Interviewing domain-specific medical experts is challenging due to access or time constraints. Therefore, we developed an alternative approach by using a large data set of information and integrating the data into a knowledge graph. This allowed us to identify coassociated conditions and provide the chatbot with knowledge of the topic.
In addition to these more conceptual aspects, several practical points need to be considered when developing a chatbot in the medical domain. Possibly the most important concerns how to manage a highly multidisciplinary team (medical, social science, and computer science professionals as well as individuals with lived experience) with very limited overlap in background knowledge. First, we noted the importance of involving potential users (caregivers) in identifying the needs of users, the features to be included, and the user interface best suited for efficient and useful mobilization of knowledge. We found that it was very important to bring together individuals who share a common interest in the topic to surmount the challenges related to differences in language and background. Indeed, we noted that an agile-inspired approach was extremely important because caregivers could better identify needs and refine approaches as they were presented with multiple iterations of the chatbot [ 59 ]. The agile approach has gained popularity because it focuses more on the potential users and the results, which enables projects to swiftly adapt to new requirements or changes as they arise.
Second, we identified several technical points that made knowledge mobilization particularly challenging for a chatbot when using automation, as opposed to manual human curation. We observed from all our user experience testing that, to be useful, the chatbot needed a wide array of resources from which to make recommendations (as was pointed out during the initial steps with the advisory group). However, we found it difficult to identify a labeled set of such resources that could be used by the chatbot. Furthermore, we noted that, for the most part, resources were labeled manually by each individual group focused on a specific health-related entity. This made it difficult for the chatbot to identify the relevant resources using a common set of vocabulary. Related to this is the difficulty for the chatbot to provide resources according to a rank most useful to users; for instance, some websites (eg, services) may have more content and therefore might include more keywords of interest than other sites with less content (which includes a list of telephone numbers for a service). At the moment, it is still unclear how simpler sites, which may be more impactful to the user, could be added.
Finally, developing a chatbot that uses a coaching approach to engage with the user and build a long-term relationship proved to be difficult. We incorporated several elements to foster engagement (customization, allowing resources to be shared, and providing expert tips), but recreating the relationship between a caregiver and a human coach remains a complex task that will require further research in mood analysis, personalization based on deeper understanding of a user’s inner mental model (which can vary significantly between users), and an awareness of the course of the disease.
Individuals with NDDs can present with different clinical features based on age and sex. Unfortunately, there is currently no database containing longitudinal data. In the future, user interactions, such as resource ratings, combined with data on individuals with NDDs and cluster analysis approaches, could be used to recommend age-, sex-, or phenotypic profile–specific resources [ 60 - 63 ].
The conversation between chatbot and the user is still scripted to some degree, although the user can decide to branch out by either navigating to the related topics suggested by the chatbot or simply clicking on the tags presented with the resources provided for their query. In the future, we would like to expand on this and allow a question-generation system that will use the knowledge graph of entities related to NDDs to ask users questions to provide more specific recommendations [ 64 ]. Developing a question-generation system could allow parents to access required information without answering too many queries.
From a technical point of view, querying the database imposes significant load, resulting in prolonged wait times for results, which may impact the user experience. Using the correct database or adding indexing to the database queries can expedite data processing, but this will require many changes in database modeling and extraction queries [ 65 ]. In addition, regularly checking for annotation updates on websites is crucial to ensure their continued operation and content consistency. Sometimes, websites do not follow coding standards, and it is very difficult to spot abandoned or corrupted web pages and make sure that they do not show up in the suggestions. We could potentially develop a script that runs automatically on a regular schedule to find and remove pages that no longer exist or have broken links. The website’s status can be confirmed by either receiving the response using the Python request module or using the Linux ping command [ 66 , 67 ]. However, the authenticity of the website needs to be reverified after the data update to ensure that false information is not disseminated.
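Such a sweep could be sketched with the standard library as below; the function names are ours, and CAMI does not currently include this script. Separating the HTTP check from the filtering step lets the filter be tested without network access:

```python
from urllib import error, request

def check_status(url, timeout=5.0):
    """Return the HTTP status code for url, or 0 if unreachable."""
    try:
        with request.urlopen(request.Request(url, method="HEAD"),
                             timeout=timeout) as resp:
            return resp.status
    except (error.URLError, ValueError):
        return 0

def live_urls(urls, status_fn=check_status):
    """Keep only URLs whose status indicates a live page; status_fn is
    injectable so the sweep can be tested offline."""
    return [u for u in urls if 200 <= status_fn(u) < 400]
```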
Finally, in its current version, CAMI is only available in English, but newcomers to the country, immigrants, and people who are comfortable in a different language should be able to access the chatbot so that it can reach, and provide resources to, a larger audience [ 68 , 69 ].
We would like to highlight the importance for our project of creating a large data set of longitudinal data to track patients’ data over time using the chatbot. This will allow the system to monitor patient behavior and provide proactive solutions or resources at an early stage. This type of information will also be crucial for other artificial intelligence tools aiming to predict outcomes and assess intervention impact.
Another key aspect of chatbots operating within the medical domain will be a precise NLP system, necessitating the creation of a lexicon of domain-specific medically relevant terms alongside layperson language that can be recognized by the language models. Along the same lines, a truly dynamic system with automatic question generation will make the chatbot smarter in terms of its conversational manner. In addition to the user asking general questions, the specificity of interactions with the chatbot will be determined by the responses or queries typed by the user. Better questions will lead to better and more specific responses and better recommendations [ 70 - 72 ].
FB conceptualized the study, obtained funding, and managed the project. OZ, AM, and DN assisted with the conceptualization, navigation, and development of the chatbot. MZR and OZ assisted with the technical validation of the project. AS developed and designed the architecture and backend of the chatbot and cowrote the paper. RK developed the user interface of the chatbot and cowrote the paper. MK assisted with the knowledge graph and matching engine of the chatbot and cowrote the paper. MZR assisted with the validation of the knowledge graph. CR assisted with interviews with individuals with lived experience as well as managed the resource identification team, user data validation, and interviews for requirement gathering and analysis. KK assisted with the parent advisory group and provided feedback about the chatbot from the point of view of an individual with lived experience. NR assisted with the development of the text classification model that was used in natural language processing. AM also assisted with feedback on the paper. TAB assisted with the development of the CAMI (Coaching Assistant for Medical/Health Information) logo as well as other visual aspects of the chatbot. TO assisted with the development of the parent advisory group and initial interviews.
None declared.
Semistructured interview consent and questions.
Coaching Assistant for Medical/Health Information home page (top portion).
Coaching Assistant for Medical/Health Information home page (bottom portion).
Coaching Assistant for Medical/Health Information sign-up page.
Coaching Assistant for Medical/Health Information chat interface.
Coaching Assistant for Medical/Health Information resource card.
Coaching Assistant for Medical/Health Information chat radio buttons that enable users to explore related conditions.
Coaching Assistant for Medical/Health Information questions for recommendation evaluation.
ADHD: attention-deficit/hyperactivity disorder
CAMI: Coaching Assistant for Medical/Health Information
CONSORT: Consolidated Standards of Reporting Trials
DDD: Deciphering Developmental Disorders
ERIC: Education Resources Information Center
HPO: Human Phenotype Ontology
LLM: large language model
NDD: neurodevelopmental disability/difference
NLP: natural language processing
UMLS: Unified Medical Language System
Edited by A Mavragani; submitted 22.06.23; peer-reviewed by M Chatzimina; comments to author 21.07.23; revised version received 27.07.23; accepted 19.04.24; published 18.06.24.
©Ashwani Singla, Ritvik Khanna, Manpreet Kaur, Karen Kelm, Osmar Zaiane, Cory Scott Rosenfelt, Truong An Bui, Navid Rezaei, David Nicholas, Marek Z Reformat, Annette Majnemer, Tatiana Ogourtsova, Francois Bolduc. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.06.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Mar 16, 2023 | Jared Spataro - CVP, AI at Work
Humans are hard-wired to dream, to create, to innovate. Each of us seeks to do work that gives us purpose — to write a great novel, to make a discovery, to build strong communities, to care for the sick. The urge to connect to the core of our work lives in all of us. But today, we spend too much time consumed by the drudgery of work on tasks that zap our time, creativity and energy. To reconnect to the soul of our work, we don’t just need a better way of doing the same things. We need a whole new way to work.
Today, we are bringing the power of next-generation AI to work. Introducing Microsoft 365 Copilot — your copilot for work . It combines the power of large language models (LLMs) with your data in the Microsoft Graph and the Microsoft 365 apps to turn your words into the most powerful productivity tool on the planet.
“Today marks the next major step in the evolution of how we interact with computing, which will fundamentally change the way we work and unlock a new wave of productivity growth,” said Satya Nadella, Chairman and CEO, Microsoft. “With our new copilot for work, we’re giving people more agency and making technology more accessible through the most universal interface — natural language.”
Copilot is integrated into Microsoft 365 in two ways. It works alongside you, embedded in the Microsoft 365 apps you use every day — Word, Excel, PowerPoint, Outlook, Teams and more — to unleash creativity, unlock productivity and uplevel skills. Today we’re also announcing an entirely new experience: Business Chat . Business Chat works across the LLM, the Microsoft 365 apps, and your data — your calendar, emails, chats, documents, meetings and contacts — to do things you’ve never been able to do before. You can give it natural language prompts like “Tell my team how we updated the product strategy,” and it will generate a status update based on the morning’s meetings, emails and chat threads.
With Copilot, you’re always in control. You decide what to keep, modify or discard. Now, you can be more creative in Word, more analytical in Excel, more expressive in PowerPoint, more productive in Outlook and more collaborative in Teams.
Microsoft 365 Copilot transforms work in three ways:
Unleash creativity. With Copilot in Word, you can jump-start the creative process so you never start with a blank slate again. Copilot gives you a first draft to edit and iterate on — saving hours in writing, sourcing, and editing time. Sometimes Copilot will be right, other times usefully wrong — but it will always put you further ahead. You’re always in control as the author, driving your unique ideas forward, prompting Copilot to shorten, rewrite or give feedback. Copilot in PowerPoint helps you create beautiful presentations with a simple prompt, adding relevant content from a document you made last week or last year. And with Copilot in Excel, you can analyze trends and create professional-looking data visualizations in seconds.
Unlock productivity. We all want to focus on the 20% of our work that really matters, but 80% of our time is consumed with busywork that bogs us down. Copilot lightens the load. From summarizing long email threads to quickly drafting suggested replies, Copilot in Outlook helps you clear your inbox in minutes, not hours. And every meeting is a productive meeting with Copilot in Teams. It can summarize key discussion points — including who said what and where people are aligned and where they disagree — and suggest action items, all in real time during a meeting. And with Copilot in Power Platform, anyone can automate repetitive tasks, create chatbots and go from idea to working app in minutes.
GitHub data shows that Copilot promises to unlock productivity for everyone. Among developers who use GitHub Copilot, 88% say they are more productive, 74% say that they can focus on more satisfying work, and 77% say it helps them spend less time searching for information or examples.
But Copilot doesn’t just supercharge individual productivity. It creates a new knowledge model for every organization — harnessing the massive reservoir of data and insights that lies largely inaccessible and untapped today. Business Chat works across all your business data and apps to surface the information and insights you need from a sea of data — so knowledge flows freely across the organization, saving you valuable time searching for answers. You will be able to access Business Chat from Microsoft 365.com, from Bing when you’re signed in with your work account, or from Teams.
Uplevel skills. Copilot makes you better at what you’re good at and lets you quickly master what you’ve yet to learn. The average person uses only a handful of commands — such as “animate a slide” or “insert a table” — from the thousands available across Microsoft 365. Now, all that rich functionality is unlocked using just natural language. And this is only the beginning.
Copilot will fundamentally change how people work with AI and how AI works with people. As with any new pattern of work, there’s a learning curve — but those who embrace this new way of working will quickly gain an edge.
The Copilot System: Enterprise-ready AI
Microsoft is uniquely positioned to deliver enterprise-ready AI with the Copilot System . Copilot is more than OpenAI’s ChatGPT embedded into Microsoft 365. It’s a sophisticated processing and orchestration engine working behind the scenes to combine the power of LLMs, including GPT-4, with the Microsoft 365 apps and your business data in the Microsoft Graph — now accessible to everyone through natural language.
Grounded in your business data. AI-powered LLMs are trained on a large but limited corpus of data. The key to unlocking productivity in business lies in connecting LLMs to your business data — in a secure, compliant, privacy-preserving way. Microsoft 365 Copilot has real-time access to both your content and context in the Microsoft Graph. This means it generates answers anchored in your business content — your documents, emails, calendar, chats, meetings, contacts and other business data — and combines them with your working context — the meeting you’re in now, the email exchanges you’ve had on a topic, the chat conversations you had last week — to deliver accurate, relevant, contextual responses.
Built on Microsoft’s comprehensive approach to security, compliance and privacy. Copilot is integrated into Microsoft 365 and automatically inherits all your company’s valuable security, compliance, and privacy policies and processes. Two-factor authentication, compliance boundaries, privacy protections, and more make Copilot the AI solution you can trust.
Architected to protect tenant, group and individual data. We know data leakage is a concern for customers. Copilot LLMs are not trained on your tenant data or your prompts. Within your tenant, our time-tested permissioning model ensures that data won’t leak across user groups. And on an individual level, Copilot presents only data you can access using the same technology that we’ve been using for years to secure customer data.
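Presenting "only data you can access" amounts to security-trimming retrieval results against existing permissions before anything reaches the model. A toy sketch of the idea (the `acl` mapping and `security_trim` helper are hypothetical, not Microsoft's actual permissioning model):

```python
def security_trim(results: list, user: str, acl: dict) -> list:
    """Keep only items the user is already permitted to read.

    `acl` maps item id -> set of principals allowed to read that item.
    Items with no ACL entry are treated as inaccessible (deny by default).
    """
    return [r for r in results if user in acl.get(r["id"], set())]

acl = {"doc1": {"alice", "bob"}, "doc2": {"bob"}}
results = [{"id": "doc1", "title": "Org chart"},
           {"id": "doc2", "title": "Salary bands"}]

visible = security_trim(results, "alice", acl)  # alice can read doc1 only
```

Trimming before prompt construction, rather than after generation, is what keeps restricted data from ever entering the model's context.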
Integrated into the apps millions use every day. Microsoft 365 Copilot is integrated in the productivity apps millions of people use and rely on every day for work and life — Word, Excel, PowerPoint, Outlook, Teams and more. An intuitive and consistent user experience ensures it looks, feels and behaves the same way in Teams as it does in Outlook, with a shared design language for prompts, refinements and commands.
Designed to learn new skills. Microsoft 365 Copilot’s foundational skills are a game changer for productivity: It can already create, summarize, analyze, collaborate and automate using your specific business content and context. But it doesn’t stop there. Copilot knows how to command apps (e.g., “animate this slide”) and work across apps, translating a Word document into a PowerPoint presentation. And Copilot is designed to learn new skills. For example, with Viva Sales, Copilot can learn how to connect to CRM systems of record to pull customer data — like interaction and order histories — into communications. As Copilot learns about new domains and processes, it will be able to perform even more sophisticated tasks and queries.
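One common way a system "learns new skills" is a registry that maps recognized commands to pluggable handlers, so new capabilities can be added without changing the dispatcher. A minimal sketch under that assumption (the decorator, skill names, and handlers are all invented for illustration; Copilot's real extensibility mechanism is not described here):

```python
SKILLS: dict = {}

def skill(name: str):
    """Register a callable under a natural-language skill name."""
    def register(fn):
        SKILLS[name] = fn
        return fn
    return register

@skill("animate slide")
def animate_slide(target: str) -> str:
    return f"animating {target}"

@skill("pull customer data")
def pull_customer_data(target: str) -> str:
    return f"fetching CRM record for {target}"

def dispatch(command: str, target: str) -> str:
    """Route a recognized command to its registered skill.
    A production system would use the model itself to pick the skill."""
    handler = SKILLS.get(command)
    if handler is None:
        raise KeyError(f"unknown skill: {command}")
    return handler(target)
```

Adding a Viva Sales-style CRM skill would then be a matter of registering one more handler, which matches the extensibility the paragraph describes.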
Committed to building responsibly
At Microsoft, we are guided by our AI principles and Responsible AI Standard, and by decades of research on AI, grounding and privacy-preserving machine learning. A multidisciplinary team of researchers, engineers and policy experts reviews our AI systems for potential harms and mitigations — refining training data, filtering to limit harmful content, blocking sensitive queries and results, and applying Microsoft technologies like InterpretML and Fairlearn to help detect and correct data bias. We make it clear how the system makes decisions by noting limitations, linking to sources, and prompting users to review, fact-check and adjust content based on subject-matter expertise.
Moving boldly as we learn
In the months ahead, we’re bringing Copilot to all our productivity apps — Word, Excel, PowerPoint, Outlook, Teams, Viva, Power Platform and more. We’ll share more on pricing and licensing soon. Earlier this month we announced Dynamics 365 Copilot as the world’s first AI Copilot in both CRM and ERP, bringing next-generation AI to every line of business.
Everyone deserves to find purpose and meaning in their work — and Microsoft 365 Copilot can help. To serve the unmet needs of our customers, we must move quickly and responsibly, learning as we go. We’re testing Copilot with a small group of customers to get feedback and improve our models as we scale, and we will expand to more soon.
Learn more on the Microsoft 365 blog and visit WorkLab to get expert insights on how AI will create a brighter future of work for everyone.
And for all the blogs, videos and assets related to today’s announcements, please visit our microsite.
Tags: AI, Microsoft 365, Microsoft 365 Copilot
Data on software engineers at a Fortune 500 company revealed that junior and senior women saw contrasting costs and benefits.
While much has been said about the potential benefits of remote work for women, recent research examines how working from home affects the professional development of female software engineers at a Fortune 500 company, revealing that its impact varies by career stage. Junior women engineers benefit significantly from in-person mentorship, receiving 40% more feedback when sitting near colleagues, while senior women face reduced productivity due to increased mentoring duties. Male engineers also benefit from proximity, but less so. The authors suggest that recognizing and rewarding mentorship efforts could mitigate these disparities, ensuring junior women receive adequate support remotely and senior women are properly compensated for their mentoring contributions.
Since the pandemic began, work from home (WFH) has at times been pitched as a means of supporting women in the workplace. This argument often focuses on WFH’s potential to help women juggle the demands of their jobs with the demands of their families. However, WFH’s impact on women’s professional development may vary over their careers. In our research, we explored how WFH impacts young women as they try to get a foothold in their careers and how it affects the often-invisible mentorship work done by more senior women.