A Definitive Guide to Voice User Interface Design (VUI)

The voice user interface, or VUI, has broken the silence in the interaction between machines and humans.

Given how popular voice-controlled devices and virtual assistants have become, it is worth taking a closer look at VUI design.

Siri, which came first, the chicken or the egg?

Siri: No one knows this, but it was a tie.

A recent article from Google, “How voice assistance is reshaping consumer behavior,” makes the rise of smart speakers and VUIs even more obvious. According to the article, 41% of people who own a voice-activated speaker say it feels like talking to another person.

That makes one wonder: is VUI reshaping the human-device relationship?

The answer is entirely up to you, but before rushing to one, let’s go through what a voice user interface is and how to create a VUI design.

What Is a Voice User Interface (VUI)?

Voice User Interfaces (VUIs) make it possible for users to interact with a device or an app through voice commands. With the increased use of digital devices, screen fatigue has become a widely experienced issue, which has made voice user interfaces all the more attractive to develop and use. VUIs provide hands-free control over devices and apps without requiring users to look at a screen. The world’s leading companies, including all of the “Big Five” tech firms (Google, Amazon, Microsoft, Facebook, and Apple), have developed or are developing voice-enabled AI assistants and voice-controlled devices.

The most well-known voice user interface examples include Apple’s Siri, Google Assistant, and Amazon’s Alexa. Not only AI assistants but also smart devices with VUIs are taking over the market, such as the Amazon Echo, Apple HomePod, and Google Home.

Be it an AI assistant, a voice-enabled mobile app, or a voice-controlled device like a smart speaker, voice interfaces and interactions have become incredibly common.

I can almost hear you asking, “How common, exactly?”

We will talk more about the VUI’s popularity, but you can take a guess from the stats below.

According to a report, 1 in 4 adults in the US owns a smart speaker today, while one-third of the US population uses voice search features.

You need to understand what a voice interface is and how it works to create a voice user interface design that doesn’t frustrate users and provides a smooth user experience. Now that you have the voice user interface definition, let’s dive into the next question: how does a voice interface work?

Buckle up; this will be a tiny pocket encyclopedia, only shorter and simpler.

How Does a Voice Interface Work?

A voice UI is the outcome of combining several Artificial Intelligence (AI) technologies, including Speech Synthesis, Automatic Speech Recognition, and Named Entity Recognition. Voice UIs can be built into devices or added inside applications.

The backend infrastructure and the VUI’s AI-backed speech components are typically hosted in a private or public cloud, where the user’s voice and speech are processed. The AI interprets the user’s intent and returns a response to the device.

Those are the basics of a voice UI. Most companies add a Graphical User Interface (GUI) and sound effects to their VUIs to provide the best user experience. Visuals and sound effects make it easier for the user to know when the device is listening, processing speech, or responding.
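To make that flow concrete, here is a minimal, hypothetical sketch of the round trip in Python. The transcribe, detect_intent, and synthesize functions are stubs standing in for whatever speech-to-text, natural-language-understanding, and text-to-speech services a real product would call.

```python
# Minimal, hypothetical sketch of the VUI round trip described above.
# transcribe() and synthesize() are stubs standing in for real
# speech-to-text and text-to-speech services, so the flow runs end to end.

def transcribe(audio: bytes) -> str:
    return "play some relaxing music"        # pretend ASR output

def detect_intent(text: str):
    if "music" in text:
        mood = "relaxing" if "relaxing" in text else "some"
        return "play_music", {"mood": mood}
    return "unknown", {}

def synthesize(reply: str) -> bytes:
    print(f"[TTS] {reply}")                  # pretend speech synthesis
    return reply.encode()

def handle_voice_request(audio: bytes) -> bytes:
    text = transcribe(audio)                 # Automatic Speech Recognition
    intent, slots = detect_intent(text)      # intent + entity recognition
    if intent == "play_music":
        reply = f"Playing {slots['mood']} music."
    else:
        reply = "Sorry, I can't help with that yet."
    return synthesize(reply)                 # Speech Synthesis back to the user

handle_voice_request(b"<microphone audio>")
```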

VUI Device Types

Today, a wide range of devices can include a VUI, such as:

  • Smartphones
  • Wearables like smart wristwatches
  • Desktop computers, laptops
  • Sound systems, smart TVs
  • Smart speakers
  • Internet of Things (IoT) devices: locks, thermostats, lights

Voice User Interface - Advantages and Disadvantages

Advantages of VUI

  • Faster than typing : Dictating is faster than typing text messages, making it more convenient for users.
  • Ease of use : Not all people can get along well with technological devices. But any user can use voice to request a task from VUI devices or AI assistants.
  • Hands-free : In some cases, such as driving, cooking, or when you are away from your device, speaking is much more practical than typing or tapping.
  • Eyes-free : VUI provides an eyes-free user experience. In cases like driving, you can focus on the road rather than the device. It is practical for screen fatigue issues as well.

Disadvantages of VUI

  • Privacy concerns : Potential privacy violations of a VUI concern some users.
  • Misinterpretation & lack of accuracy : Voice recognition software still has its flaws. The software cannot always understand and interpret the context of language, which causes errors and misinterpretation. Automatic voice dictation may also lead to mistyping, since VUIs may not always differentiate homophones, such as “there” and “their.”
  • Public spaces : It can be hard to give voice commands to devices and AI assistants in public spaces because of privacy and noise concerns.

Why Voice Interface Design Is The Next Big Thing

User interfaces or UI are the bridge that makes interaction possible between machines and humans.

One particular type of UI, the voice user interface, has exploded in popularity in recent years and is beginning to surpass typing.

71% of users prefer voice search over typing a query. Not only that, but the usage of voice-controlled smart speakers is also constantly increasing. More than half of smart speaker owners in the US use their devices on a daily basis.

This expanding popularity of voice interfaces is of particular interest to UX and UI designers. After all, the ultimate goal of both is to answer users’ needs and make the experience easy-flowing for them.

In order to create a successful voice interface design, designers and developers need to understand the intricacies of human communication. As consumers interact with AI assistants and smart devices daily, they expect a certain level of capability, a conversational tone, and fewer misinterpretations.

So, there comes the big question: how do you design a VUI that will bring value to users’ lives?

How to Design a Voice User Interface

At its heart, designing a voice interface is not so different from designing GUI or any other UX project . We can break down the VUI design process into simpler steps.

Step 1: Conduct User Research

Start by mapping the customer journey to understand the interaction between the user persona and the assistant persona at the various stages of engagement.

Focus on observing and understanding the needs, motivations, and behaviors of the user. Include voice as a channel in your customer journey map to identify how and where voice can be used as an interaction method.

If a customer journey map has yet to be created, the designer should highlight where voice interactions could be implemented in the user flow as an opportunity. If a customer journey map already exists, the designer should see whether voice interactions can improve the user flow.

Designers should focus on solving users’ problems.

For example, if your customer support always gets asked the same question, that conversation might be an opportunity to integrate into the voice app.

You might like: User Persona Examples for SaaS Products.

Understand the device persona.

Apart from understanding the user persona, identify the abilities and character of the device (e.g., Alexa).

Step 2: Make a VUI Competitor Analysis

Designers should conduct a VUI competitor analysis to see how competitors are implementing voice interactions.

Find out the use case for their app and the voice commands it uses, and check what their users think in the reviews.

Step 3: Define Requirements

Define users’ pain points, needs, and requirements.

In addition to user research and competitor analysis, you can try interviewing and user testing. Capture different scenarios before turning them into conversation flows. Use flow maps to write down user requirements as user stories. Then, design dialog flows for each of them.

Next, go on to prototyping VUI conversations with dialog flows showing the interaction between the device and the user. 

How to Prototype VUI Conversations with Dialog Flows

Key points to creating successful VUI dialog flows: 

  • Keep the interaction conversational and simple,
  • Create a strong error strategy,
  • Confirm when a task is completed,
  • Have an extra layer of strong security.

VUI designers need to script dialog flows covering the entire conversation between the system and the user. Dialog flows should successfully guide users. A dialog flow script is a deliverable that consists of the following (a minimal sketch follows the list):

  • Keywords that initiate the interaction, aka voice triggers like “Hello, Alexa.”
  • Branches that show where the conversation could lead to
  • Sample dialogs for the users and the AI assistant
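Before reaching for a dedicated tool, a dialog flow script can be sketched as plain data. Here is a minimal, hypothetical sketch for a music-playing skill; the trigger phrases and sample dialogs are made up for illustration.

```python
# Hypothetical dialog flow script for a music-playing skill: trigger phrases,
# the branches a conversation can take, and sample dialog for each branch.
dialog_flow = {
    "triggers": ["hello alexa", "open music helper"],
    "branches": {
        "happy_path": {
            "user": "Play me some relaxing music.",
            "assistant": "Sure, playing relaxing music on Spotify.",
        },
        "missing_detail": {
            "user": "Play me some music.",
            "assistant": "What kind of music would you like?",
        },
        "error": {
            "user": "(unintelligible)",
            "assistant": "Sorry, I didn't catch that. You can ask me to play music.",
        },
    },
}

# Read the script back as a quick review aid for the design team.
for branch, turn in dialog_flow["branches"].items():
    print(f"{branch}:\n  User: {turn['user']}\n  Assistant: {turn['assistant']}")
```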

A dialog flow is like a prototype: a script that covers the back-and-forth conversation. Fortunately, there are prototyping apps that simplify the creation of dialog flows.

Some of the apps for prototyping VUIs are listed below.

  • Voiceflow : Collaboration tool to design, prototype, and build voice apps for Google Assistant and Amazon Alexa
  • Dialogflow : Google-owned platform for designing a conversational user interface into web apps, mobile apps, bots, and devices.
  • Speechly : Spoken language understanding solution to build voice user interfaces.

Amazon has its own Alexa Skill Builder to help designers create new Alexa Skills.

Learn more about product design tools: 16 Product Design Tools You Need in Your Arsenal .

Step 4: Testing

Testing the dialog flows between the system and the user is like a role play.

One person plays the device, and the other plays the user to see if the conversation flows successfully. 

Step 5: Understand the Anatomy of a Voice Command

When designing a VUI, designers constantly need to think about the possible interaction scenarios and each objective (i.e., what is the user trying to achieve in this scenario?)

So, when a user gives a voice command, it consists of three factors at its core: intent, utterance, and slot.

The intent is the objective of the user’s voice command. The intent of a voice interaction can be either a low utility or a high utility interaction.

A high utility interaction refers to a very specific task, such as requesting that a rock song be played on Spotify or that the lights in the living room be turned off.

A low utility interaction, on the other hand, involves vaguer, harder-to-decipher tasks. For example, if a user asks for more information about a topic, the voice UI needs to check whether it is within its service scope and then ask more questions to understand and respond better to the request.

The utterance is the way a user phrases or utters the voice command to trigger the task.

Some phrases for requests can be simple and easy to understand, like “Play me music on Spotify,” but voice UX designers need to consider other variations as well. For example, instead of saying “play…” a user can say “I want to hear music…” or “Could you play…”

The more variations designers consider, the better and easier the AI will understand the request and respond with the right action.

Slots are the required or optional variables that are requested from the user to fulfill the task.

For example, if a user requests “play me calming music,” the variable here is “calming.” Since the AI can also perform the request without the variable, this slot is optional. However, for example, if a user wants to book a restaurant reservation, the slot will be the hour, and it is required.  
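These three parts can be captured in a small data model. The sketch below is hypothetical and only loosely modeled on how platforms such as Alexa structure intents, sample utterances, and required or optional slots.

```python
# Hypothetical data model for a voice command: an intent with sample
# utterances and slots that are either optional or required.
from dataclasses import dataclass, field

@dataclass
class Slot:
    name: str
    required: bool = False

@dataclass
class Intent:
    name: str
    utterances: list[str]                      # phrasing variations to recognize
    slots: list[Slot] = field(default_factory=list)

    def missing_required(self, provided: dict) -> list[str]:
        return [s.name for s in self.slots if s.required and s.name not in provided]

play_music = Intent(
    name="PlayMusic",
    utterances=["play me {mood} music", "i want to hear {mood} music"],
    slots=[Slot("mood")],                      # optional: "calming" in "play me calming music"
)

book_table = Intent(
    name="BookTable",
    utterances=["book a table at {restaurant} for {time}"],
    slots=[Slot("restaurant", required=True), Slot("time", required=True)],
)

print(play_music.missing_required({"mood": "calming"}))      # [] - nothing to re-prompt
print(book_table.missing_required({"restaurant": "Nola"}))   # ['time'] - ask for the hour
```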

Voice User Interface Examples

We have gone over what a voice user interface is and how you can create a voice user interface design. Let’s see some of the top examples of VUIs. Of course, the most used and popular ones are Siri, Alexa, Cortana, and Google Assistant. Which one do you think is the most competent voice assistant in the market?

Siri

Siri is Apple's voice assistant that comes with Apple's operating systems such as iOS, iPadOS, watchOS, macOS, and tvOS.

It was first released on October 4, 2011, and has been active ever since.

Alexa

Released in November 2014, Amazon's Alexa was first used in Amazon's Echo smart speakers.

It has now made its way into most smart device operating systems, such as Android, iOS, and Fire OS.

Cortana

Microsoft's voice assistant Cortana helps you be more productive by using the Bing search engine to do tasks like setting reminders and answering questions for you.

Google Assistant

Available on smart devices and home systems, Google Assistant is a virtual-assistant VUI designed and developed by Google.

According to research carried out by Loup Ventures, Google Assistant is the most competent of these voice assistants.

The key takeaways of this post are:

  • A speech interface, or VUI (Voice User Interface), is an interface that works through voice interaction.
  • It is different from a tangible user interface , which requires interactions with physical gestures, such as tapping or swiping.
  • Designers need to carry out thorough research and observation of the user persona and device persona, and create easy-flowing dialog flows, to achieve a successful voice user interface design.

Frequently Asked Questions

What is the goal of a voice user interface (VUI)?

A Voice User Interface is designed to recreate the feeling of a conversation in device-user interaction and to help users easily complete tasks or search for information without using their hands or eyes.

What do people ask their Voice User Interface?

Voice User Interfaces such as Alexa, Siri, and Google Assistant can perform numerous tasks, so what people ask their VUI-based virtual assistants varies from daily tasks to business-related search queries.

When was Voice User Interface created?

The first Voice User Interfaces were developed through Interactive Voice Response (IVR) systems in 1984, in a collaboration between SpeechWorks and Nuance.

Everything You Want To Know About Creating Voice User Interfaces

By Nick Babich & Gleb Kuznetsov, Feb 14, 2022

Voice is a powerful tool that we use to communicate with each other. Human conversations inspire product designers to create voice user interfaces (VUIs), a next generation of user interfaces that gives users the power to interact with machines using natural language.

For a long time, the idea of controlling a machine by simply talking to it was the stuff of science fiction. Perhaps most famously, in 1968 Stanley Kubrick released a movie called 2001: A Space Odyssey , in which the central antagonist wasn’t a human. HAL 9000 was a sophisticated artificial intelligence controlled by voice.

Since then the progress in natural language processing and machine learning has helped product creators introduce less murderous voice user interfaces in various products — from mobile phones to smart home appliances and automobiles.

A Brief History Of Voice Interfaces

If we go back to the real world and analyze the evolution of VUI, it’s possible to define three generations of VUIs. The first generation of VUI is dated to the 1950s. In 1952, Bell Labs built a system called Audrey. The system derived its name from its ability to decode digits — Automatic Digit Recognition. Due to the tech limitations, the system could only recognize the spoken numbers of “0” through “9”. Yet, Audrey proved that VUIs could be built.

The second generation of VUIs dates to the 1980s and 1990s. It was the era of Interactive Voice Response (IVR). One of the first IVRs was developed in 1984 by SpeechWorks and Nuance, mainly for telephony, and it revolutionized the business. For the first time in history, a digital system could recognize the human voice over calls and perform the tasks given to it. It was possible to get the status of your flight, make a hotel booking, or transfer money between accounts using nothing more than a regular landline phone and your voice.

The third (and current) generation of VUIs started to get traction in the second decade of the 21st century. The critical difference between the 2nd and 3rd generations is that voice is being coupled with AI technology. Smart assistants like Apple Siri, Google Assistant, and Microsoft Cortana can understand what the user is saying and offer suitable options. This generation of VUIs is available in various types of products — from mobile phones to car human-machine interfaces (HMIs). They are fast becoming the norm.

Six Fundamental Properties Of VUI Design

Before we move to specific design recommendations, it’s essential to state the basic principles of good VUI design.

1. Voice-first Design

You need to design hands-free and eyes-free user interfaces. Even when a VUI device has a screen, we should always design for voice-first interactions. While the screen can complement the voice interaction, the user should be able to complete the operation with minimum or no look at the screen.

Of course, some tasks become inefficient or impossible to complete by voice alone. For example, having users listen and browse through search results by voice can be tedious. But you should avoid creating an action that relies on users interacting with a screen alone. If you design one of those tasks, you need to consider an experience where your users start with voice and then switch to a visual or touch interface.

2. Natural Conversation

The interaction with VUI shouldn’t feel like an interaction with a robot. The conversation flow should be user-centric (resembling natural human conversation). The user shouldn’t have to remember specific phrases to get the system to do what they want to do.

It’s important to use everyday language and invite users to say things in the ways they usually do. If you notice that you have to explain commands, it’s a clear indication that something is wrong with your design and you need to go back to the drawing board and redesign it.

3. Personalization

Personalization is more than just saying “Welcome back, %username%”. Personalization is about knowing genuine user needs and wants and adapting information to them. VUI gives product designers a unique opportunity to individualize the user’s entire interaction. The system should be able to recognize new and returning users, create user profiles and store the information the system collects in it. The more the system learns about users, the more personalized experience it should offer. Product designers need to decide what kinds of information to collect from users to personalize the experience.

4. Tone Of Voice

Voice is more than just a medium of interaction. Within a few seconds of listening to another person’s voice, we form an impression of that person: a sense of gender, age, education, intelligence, trustworthiness, and many other characteristics. We do it intuitively, just by listening to a voice. That’s why it’s vital to give your VUI a personality: create the right brand persona that matches your brand values. A good persona is specific enough to evoke a unique voice and personality.

5. Context Of Use

You need to understand where and how the voice-enabled product will be used. Will it be used by one person or shared between many people? In public or private areas? How noisy is the environment? The context of use will impact many product design decisions you will make.

6. Sense Of Trust

Trust is a foundational principle of good user experience — user engagement is built on a foundation of trust. Good interaction with the voice user interface should always lead to the buildup of trust.

Here are a few things product designers can do to achieve this goal:

  • Never share private data with anyone. Be careful about verbalizing sensitive data, such as medical information, because users might not be alone.
  • Avoid offensive content. Adjust what counts as offensive or sensitive content by age and region/country.
  • Avoid purely promotional content. Don’t mention products or brand names out of context, because users may perceive it as promotional content.

Design Recommendations

When it comes to designing VUI, it’s possible to define two major areas:

  • Conversational Design
  • Visual Design

1. Designing The Conversation

At first glance, the significant difference between GUI and VUI is the interaction medium. In a GUI we use a keyboard, mouse, or touch screen, while in a VUI we use voice. However, when we look closer, we see that the fundamental difference between the two types of interfaces is the interaction model. With voice, users can simply ask for what they want instead of learning how to navigate through the app and learning its features. When we design for voice, we design conversational interactions.

Learn About Your Users

Conversations with a computer should not feel awkward. Users should be able to interact with a voice user interface as they would with another person. That’s why the process of conversation design should always start with learning about the users. You need to find answers to the following questions:

  • Who are your users? (Demographics, psychological portrait)
  • How familiar are they with voice-based interactions? Are they currently using voice products? (Level of tech expertise)

Understand Problem Space And Define Key Use Cases

When you know who your users are, you need to develop a deep understanding of user problems. What are their goals? Build empathy maps to identify users’ key pain points. As soon as you understand the problem space, it will be easier for you to anticipate features that users want and define specific use cases. (What can a user do with the voice system?)

Think about both the problem your user is trying to solve and how the voice user interface can help the user solve this problem. Here are a few questions that can help you with that:

  • What are the key user’s tasks? (Learn about user needs/wants.)
  • What situations trigger these tasks? (In what context will users interact with the system?)
  • How are users completing these tasks today? (What is the user journey?)

It’s also vital to ensure that a voice user interface is the right solution for the user problem. For example, voice UI might work well for the task of finding a nearby restaurant while you’re on the road, but it might feel clunky for tasks like browsing restaurant reviews.

Write Dialog Flow

At its core, conversation design is about the flow of the conversation. Dialog flow shouldn’t be an afterthought; instead, it should be the first thing you create because it will impact development.

Here are a few tips for creating a foundation for your dialog flow:

  • Start with a sample dialog that represents the happy path. The happy path is the simplest, easiest path to success a user could follow. Don’t try to make sample dialog perfect at this step.
  • Focus on the spoken conversation. Try to avoid situations when you write dialog differently than people speak it. It usually leads to well-structured but longer and more formal dialogs. When people want to solve a particular task, they are more to the point when they speak.
  • Read a sample dialog aloud to ensure that it sounds natural. Ideally, you should invite people who don’t belong to the design team and collect feedback.

The sample dialog will help you identify the context of the conversation (when, where, and how the user triggers the voice interface) and the common utterances and responses.

After you finish writing sample dialogs, the next thing to do is add various paths (consider how the system will respond in numerous situations, adding turns in conversations, etc.). It doesn’t mean that you need to account for all possible variations in dialogs. Consider the Pareto principle (80% of users will follow the most common 20% of possible paths in a discussion) and define the most likely logical paths a user can take.

It’s also recommended to recruit a conversation designer — a professional who can help you craft natural and intuitive conversations for users.

Design For Human Language

The more an interface leverages human conversation, the less users have to be taught how to use it. Invest in user research and learn the vocabulary of your real or potential users. Try to use the same phrases and sentences in the system’s responses. It will create a more user-friendly conversation.

  • Don’t teach commands. Let users speak in their own words.
  • Avoid technical jargon. Let users interact with the system naturally using the phrases they prefer.

The User Always Starts The Conversation

No matter how sophisticated the voice-based system is, it should never start the conversation. It would be awkward if the system reached out to the user with a topic they don’t want to discuss.

Avoid Long Responses

When you design system responses, always take cognitive load into account. VUI users aren’t reading, they are listening, and the longer you make system responses, the more information they have to retain in their working memory. Some of this information might not even be useful to the user, but there is no way to fast-forward through responses.

Make every word count and design for brief conversations. When you’re scripting out system responses, read them aloud. The length is probably good if you can say the words at a conversational pace in one breath. If you need to take an extra breath, rewrite the responses and reduce the length.

Minimize The Number Of Options In System Prompts

It’s also possible to minimize the cognitive load by reducing the number of options users hear. Ideally, when users ask for a recommendation, the system should offer the best possible option right away. If it’s impossible to do that, try to provide the three best possible options and verbalize the most relevant one first.

Provide Definitive Choices

Avoid open-ended questions in system responses. They can cause users to answer in ways that the system does not expect or support. For example, when you design an introduction prompt, instead of saying “Hello, it’s ACME, what do you want to do?” you should say, “Hello, it’s ACME. You can do [Option A], [Option B], or [Option C].”

Add Pauses Between The Question And Options

Pauses and punctuation mimic actual speech cadence, and they are beneficial for situations when the system asks a question and offers a few options to choose from.

Add a 500-millisecond pause after asking the question. This pause will give users enough time to comprehend the question.
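Platforms that accept SSML (both Alexa and Google Assistant do) let you express that pause with a break tag. Below is a minimal sketch in Python that assembles such a prompt; the question and options are placeholder text.

```python
# Minimal sketch: build an SSML prompt with a 500 ms pause between the
# question and the options. The question and options are placeholder text.
def build_prompt(question: str, options: list[str]) -> str:
    spoken_options = ", ".join(options[:-1]) + f", or {options[-1]}"
    return (
        "<speak>"
        f"{question}"
        '<break time="500ms"/>'
        f" You can say {spoken_options}."
        "</speak>"
    )

print(build_prompt(
    "What would you like to do?",
    ["check my balance", "pay a bill", "hear recent transactions"],
))
```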

Give Users Time To Think

When the system asks the user something, they might need time to think before answering. The default timeout for users to respond to a request is 8-10 seconds. After that timeout, the system should repeat or rephrase the request. For example, suppose a user is booking a table at a restaurant. The sample dialog might sound like this:

User: “Assistant, I want to go to the restaurant.”
System: “Where would you like to go?”
(No response for 8 seconds)
System: “I can book you a table at a restaurant. What restaurant would you like to visit?”
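A minimal sketch of that timeout-and-re-prompt behavior follows. The listen_for_reply stub stands in for the real microphone and speech recognition stack, and the 8-second value follows the guidance above.

```python
# Hypothetical re-prompt loop: if the user stays silent past the timeout,
# rephrase the question instead of failing. listen_for_reply() is a stub
# standing in for the real microphone/ASR stack.
import time

NO_INPUT_TIMEOUT = 8  # seconds, per the guidance above

def listen_for_reply(timeout: float):
    time.sleep(0.1)          # stub: pretend we waited and heard nothing
    return None

def ask(question: str, reprompt: str, max_attempts: int = 2):
    prompts = [question, reprompt]
    for attempt in range(max_attempts):
        print(f"System: {prompts[min(attempt, len(prompts) - 1)]}")
        reply = listen_for_reply(NO_INPUT_TIMEOUT)
        if reply:
            return reply
    return None              # hand off to an error flow after repeated silence

ask("Where would you like to go?",
    "I can book you a table at a restaurant. What restaurant would you like to visit?")
```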

Prompt For More Information When Necessary

It’s pretty common for users to request something but not provide enough details. For example, when users ask the voice assistant to book a trip, they might say something like, “Assistant, book a trip to the sea.” The user assumes that the system knows them and will offer the best possible option. When the system doesn’t have enough information about the user, it should prompt for more information rather than offer an option that might not be relevant.

User: “I’d like to book a trip to the seashore.”
System: “When would you like to go?”

Never Ask Rhetorical Or Open-ended Questions

By asking rhetorical or open-ended questions, you put a high cognitive load on users. Instead, ask direct questions. For example, instead of asking the user “What do you want to do with your invitation?” you should say “You can cancel your invitation or reschedule it. What works for you?”

Don’t Make People Wait In Silence

When people don’t hear or see any feedback from the system, they might think that it’s not working. Sometimes the system needs more time to process the user’s request, but that doesn’t mean users should wait in absolute silence without any visual feedback. At a minimum, you should offer an audio signal and pair it with visual feedback.

Minimize User Data Entry

Try to reduce the number of cases where users have to provide phone numbers, street addresses, or alphanumeric passwords. It can be difficult for users to tell a voice system strings of numbers or detailed information. This is especially true for users with speech impediments. Offer alternative methods for inputting this kind of information, such as a companion mobile app.

Support Repeat

Whether users are using the system in a noisy area or they’re just having issues understanding the question, they should be able to ask the system to repeat the last prompt at any time.

Feature Discoverability

Feature discoverability can be a massive problem in voice-based interfaces. In GUI, you have a screen that you can use to showcase new features, while in voice user interfaces, you don’t have this option.

Here are two techniques you can use to improve discoverability:

  • Solid onboarding. A first-time user requires onboarding into the system to understand its capabilities. Make it practical: let users complete some actions using voice commands.
  • On the first encounter with a particular voice app, you might want to discuss what is possible.

Confirm User Requests

People enjoy a sense of acknowledgment. Thus, let the user know that the system hears and understands them. It’s possible to define two types of confirmation — implicit and explicit confirmation.

Explicit confirmations are required for high-risk tasks such as money transfers. These confirmations require the user’s verbal approval to continue.

User: “Transfer one thousand dollars to Alice.”
System: “You want to transfer one thousand dollars to Alice Young, correct?”

At the same time, not every action requires the user’s confirmation. For example, when a user asks to stop playing music, the system should end the playback without asking, “Do you want to stop the music?”
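A minimal sketch of that policy: intents marked high-risk require a spoken confirmation before the action runs, while everything else is acknowledged implicitly. The intent names and the user_confirms callback are hypothetical.

```python
# Hypothetical confirmation policy: explicit confirmation for high-risk
# intents such as money transfers, implicit acknowledgement for the rest.
HIGH_RISK_INTENTS = {"transfer_money", "delete_account"}

def respond(intent: str, summary: str, user_confirms) -> str:
    if intent in HIGH_RISK_INTENTS:
        if not user_confirms(f"You want to {summary}, correct?"):
            return "Okay, I won't do that."
        return f"Done, I'll {summary}."
    # Low-risk intents: act immediately with a brief, implicit acknowledgement.
    return f"Okay, {summary}."

print(respond("transfer_money", "transfer one thousand dollars to Alice Young",
              user_confirms=lambda question: True))   # pretend the user said "yes"
print(respond("stop_music", "stopping the music",
              user_confirms=lambda question: True))
```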

Handle Error Gracefully

It’s nearly impossible to avoid errors in voice interactions. Loosely handled error states can hurt a user’s impression of the system. No matter what caused the error, it’s important to handle it with grace, meaning that the user should still have a positive experience even when they face an error condition. A minimal sketch follows the list below.

  • Minimize the number of “I don’t understand you” situations. Avoid error messages that only state that the system didn’t understand the user. A well-designed dialog flow should consider all possible dialog branches, including branches with incorrect user input.
  • Introduce a mechanism of contextual repairs. Help the system recover when something unexpected happens while the user is speaking, for example, when voice recognition fails because of loud background noise.
  • Clearly say what the system cannot do. When users face error messages like “I cannot understand you,” they start to wonder whether the system is incapable of doing something or whether they verbalized the request incorrectly. It’s recommended to provide an explicit response when the system cannot do something, for example, “Sorry, I cannot do that. But I can help you with [option].”
  • Accept corrections. Sometimes users make corrections when they know the system got something wrong or when they change their minds. When users want to correct their input, they will say something like “No,” or “I said,” followed by a valid utterance.
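Here is the promised sketch: a hypothetical fallback handler that names what the system can do instead of giving a bare “I don’t understand,” and accepts “No,” / “I said” corrections. The supported task list is made up for illustration.

```python
# Hypothetical error handling: name what the system can do instead of a
# bare "I don't understand", and accept "No," / "I said" corrections.
SUPPORTED_TASKS = ("play music", "check the weather", "set a timer")

def handle(utterance: str) -> str:
    text = utterance.lower().strip()
    if text.startswith(("no,", "i said")):        # the user is correcting us
        text = text.split(",", 1)[-1].replace("i said", "").strip()
    if any(task in text for task in SUPPORTED_TASKS):
        return f"Okay, {text}."
    # Explicitly say what the system cannot do, and what it can.
    return ("Sorry, I can't do that yet. "
            "I can play music, check the weather, or set a timer.")

print(handle("play music please"))                # Okay, play music please.
print(handle("No, I said check the weather"))     # correction accepted
print(handle("order a pizza"))                    # graceful fallback
```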

Test Your Dialogs

The sooner you start testing your conversation flow, the better. Ideally, start testing and iterating on your designs as soon as you have sample dialogs. Collecting feedback during the design process exposes usability issues and allows you to fix the design early.

The best way to test if your dialog works is to act it out. You can use techniques like Wizard of Oz , where one person pretends to be a system and the other is a user. As soon as you start practicing the script, you will notice whether it sounds good or bad when spoken aloud.

Remember that you should prevent people from sharing non-verbal cues. When we interact with other people, we typically use non-verbal language (eye gaze, body language). Non-verbal cues are extremely valuable for conveying information, but unfortunately, VUI systems cannot understand them. When testing your dialogs, seat test participants back to back to avoid eye contact.

The next part of testing is observing real user behavior. Ideally, you should observe users who use your product for the first time. It will help you understand what works and what doesn’t. Testing with 5 participants will help you reveal most of your usability issues.

2. Visual Design

A screen plays a secondary role in voice interactions. Yet, it’s vital to consider a visual aspect of user interaction because high-quality visual experiences create better impressions on users. Plus, visuals are good for some particular tasks such as scanning and comparing search results. The ultimate goal is to design a more delightful and engaging multimodal experience.

Design For Smaller Screens First

When adapting content across screens, start with the smallest screen size first. It will help you prioritize what the most important content is.

When targeting devices with larger screens, don’t just scale the content up. Try to take full advantage of the additional screen real estate. Pay attention to the quality of images and videos: imagery shouldn’t lose quality as it scales up.

Optimize Content For Fast Scanning

As mentioned before, screens are very handy for cases when you need to provide a few options to compare. Among all the content containers you can use, cards work best for fast scanning. When you need to provide a list of options to choose from, you can put each option on a card.

Design With A Specific Viewing Distance In Mind

Design content so it can be viewed from a distance. The viewing range for small-screen voice-enabled devices should be between 1 and 2 meters, while for large screens such as TVs it should be around 3 meters. Ensure that the font size and the size of imagery and UI elements shown on the screen are comfortable for users.

Google recommends using a minimum font size of 32 pt for primary text, like titles, and a minimum of 24 pt for secondary text, like descriptions or paragraphs of text.

Learn User Expectations About Particular Device

Voice-enabled devices can range from in-vehicle to TV devices. Each device mode has its own context of use and set of user expectations. For example, home hubs are typically used for music, communications, and entertainment, while in-car systems are typically used for navigation purposes.

Further Reading : Designing Human-Machine Interfaces For Vehicles Of The Future

Hierarchy Of Information On Screens

When we design website pages, we typically start with the page structure. A similar approach should be followed when designing for VUI: decide where each element should be located. The hierarchy of information should go from most to least important. Try to minimize the information you display on the screen, showing only what helps users do what they want to do.

Keep The Visual And Voice In Sync

There shouldn’t be a significant delay between voice and visual elements. The graphical interface should be truly responsive: right after the user hears the voice prompt, the interface should be refreshed with relevant information.

Motion language plays a significant part in how users comprehend information. It’s essential to avoid hard cuts and use smooth transitions between individual states. When users are speaking, we should also provide visual feedback that acknowledges that the system is listening to the user.

Accessible Design

A well-designed product is inclusive and universally accessible. Users with visual impairments (such as blindness, low vision, or color blindness) shouldn’t have any problems interacting with your product. To make your design accessible, follow the WCAG guidelines.

  • Ensure that text on the screen is legible and has a high enough contrast ratio; text color and contrast should meet AAA ratios.
  • Users who rely on screen readers should understand what is displayed on the screens. Add descriptions to imagery.
  • Don’t design screen elements that flicker, flash, or blink. Generally, anything that flashes more than three times per second can cause problems such as headaches for sensitive users.

Related Reading : How A Screen Reader User Accesses The Web

We are at the dawn of the next digital revolution. The next generation of computers will give users a unique opportunity to interact with voice. But the foundation for this generation is created today. It’s up to designers to develop systems that will be natural for users.

Recommended Related Reading

  • “Alexa Design Guide,” Amazon Developer Documentation
  • “Conversation Design Process,” Google Assistant Docs
  • “Designing Voice User Interfaces: Principles Of Conversational Experiences,” Cathy Pearl (2017)
  • “Applying Built-In Hacks Of Conversation To Your Voice UI,” James Giangola (video)
  • “Creating A Persona: What Does Your Product Sound Like?,” Wally Brill (video)
  • “Voice Principles,” a collection of resources created by Clearleft

A Developer’s Guide To Voice User Interface Design

In this ever-evolving landscape of technology, voice user interfaces (VUIs) have emerged as a powerful and intuitive way for users to interact with applications and devices. The evidence of this has been apparent over the last decade with the increasing popularity and demands for virtual voice assistants, such as Siri, Alexa, and Google Assistant.

From smart appliances to virtual assistants, the demand for these automated and seamless voice-driven experiences continues to grow. As a designer or developer, it is critical to understand the principles and best practices of voice user interface design to create a user-friendly, effective experience for global users. In this guide, we’ll cover the fundamentals of designing a voice user interface.

Why Should You Look Into Voice User Interface Design Right Now?

Introducing a voice-driven interface to your project is a worthwhile step in this day and age. Not only does it help create an accessible user experience, but it also makes the experience more engaging.

In cases where natural, hands-free interaction is critical, offering a voice user interface (VUI) just makes sense. And since voice user interfaces work across various devices, including TVs, phones, and tablets, interacting with the system can be made all the more seamless. For instance, applications designed for multitasking, or situations where users have limited access to screens, such as driving or cooking, benefit significantly from a VUI. In environments where the convenience of voice interaction surpasses traditional point-and-click interactions, integrating a VUI becomes a sensible choice.

Additionally, industries like healthcare, where hands-free operation is crucial for hygiene, find value in incorporating voice interfaces. It’s essential to assess the context and user needs to determine if a VUI aligns with the intended use case, offering a more intuitive and user-friendly experience.

Experts in the application industry have moved away from innovating on desktop systems, since they are no longer the leading area of technology in use. Far more people use their mobile devices daily for just about everything: fitness tracking, navigation, note-taking, and so on. With this shift in mind, it is crucial that UX design includes features that meet user needs at all times.

How To Design Voice User Interfaces (Voice UI)

Crafting an experience around a voice user interface (VUI) can be challenging yet critical. Understanding the fundamental steps below can simplify and streamline your journey to designing an intuitive and effective voice-based user experience.

1. Establish A Persona and Tone

The first step towards designing a user experience is establishing a persona. Giving your voice interface a human touch via a consistent personality will help users interact with it casually.

Learn about the tone of voice that users are already accustomed to and their expectations from the interface to better craft a persona for your interface.

2. Understand User Context

Taking a stab at this design challenge begins with extensive user research. It is crucial to understand the users’ context and the specific use cases for your voice application.

Consider the user’s environment when using the app, goals, and frustrations to create the ideal storyboard for the features. For example, if your application is intended for use in a noisy environment, you may need to incorporate noise cancellation techniques or provide visual feedback along with the voice response.

3. Crafting A Conversation Flow

Designing a conversational flow is at the heart of a voice user interface design. Nailing a logical yet free-flowing dialog flow that mimics how a user would interact with the voice interface can make or break the experience.

An effective approach is to use real-life dialogs as references in use cases to help guide the design process.

4. Integrate Into Platforms Intuitively

When integrating voice-based interactions into any application, web or mobile, developers must find the appropriate platform and integration method. Popular platforms such as Amazon Alexa, Google Assistant, or Siri offer ready-to-deploy development kits that can streamline integration. However, that shouldn’t sway you into taking the easy route.

It is crucial that you take into consideration important factors like platform scalability, accessibility, and user experience.

5. Keeping it Conversational

Keeping your application close to human conversation is one of the main goals of voice-driven applications. VUIs are designed to give users the advantage of a natural, conversational interaction compared to traditional graphic-heavy user experiences.

Design your voice user interface to mimic human conversation by using natural language processing (NLP) techniques and providing responses that sound human-like. Further, ensure that your application is free of industry jargon, making it accessible to a wider audience.

6. Voice First, Always

While it is essential to provide a visual interface for users who prefer a graphical representation, remember that the primary mode of interaction is voice. Design the voice interaction first and then consider how to enhance it with visual elements. Use visuals sparingly and only when necessary to complement the voice interaction.

7. Clear and Concise Prompts

When designing voice commands or prompts, make sure they are clear, concise, and easy to understand. Avoid long and complex sentences that may confuse the user. Break down complex tasks into smaller steps and provide clear instructions for each step. Use simple and familiar language to ensure that users can easily follow along.

8. Personalizing User Experience

Allowing users to interact with the voice-driven system can help you learn more about their preferences. These data points, or insights, should be used to set the course of a conversation.

Leverage user data to provide personalized responses and anticipate user needs. However, prioritize privacy concerns and clearly communicate how user data is utilized.

9. Progressive Disclosure Is Your Friend

Progressive disclosure is a design technique that involves revealing information gradually to prevent overwhelming the user with too much information at once. Apply this principle to voice user interface design by providing information in a step-by-step manner. For instance, instead of reading out a long list of options, narrow down the choices by asking the user specific questions.
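A minimal sketch of progressive disclosure in prompts: instead of reading six venues at once, the system narrows the choice with one short question per turn. The restaurant data is made up for illustration.

```python
# Hypothetical progressive disclosure: narrow a long list of options with
# one short question per turn instead of reading everything out at once.
RESTAURANTS = {
    "Italian": ["Trattoria Roma", "Pasta Bar", "Luigi's"],
    "Japanese": ["Sushi Go", "Ramen House"],
    "Mexican": ["Taco Loco"],
}

def first_prompt() -> str:
    cuisines = list(RESTAURANTS)
    return f"Would you like {', '.join(cuisines[:-1])}, or {cuisines[-1]} food?"

def second_prompt(cuisine: str) -> str:
    top = RESTAURANTS[cuisine][:3]               # never more than three options per turn
    return f"For {cuisine}, I found {', '.join(top)}. Which one sounds good?"

print(first_prompt())            # Would you like Italian, Japanese, or Mexican food?
print(second_prompt("Italian"))  # For Italian, I found Trattoria Roma, Pasta Bar, Luigi's. ...
```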

10. Improve Error Handling

Users are bound to face roadblocks when using your application. Handling errors gracefully is critical in such cases.

Anticipate edge cases that could be potential errors and provide clear and helpful error messages to guide the user in correcting their input. Use an empathetic and conversational tone when providing suggestions or alternatives to help the user recover from errors. Avoid generic error messages that do not provide any specific guidance.

11. Always Provide Confirmation

Informing users that their voice commands have been successfully processed is a reliable way to retain them. Immediate feedback and confirmation are known to drive continuous engagement, so make sure you provide them.

Use audio cues like bells, beeps, or chimes to indicate the successful completion of a task. Additionally, provide confirmation prompts wherever necessary to provide a status update every step of the way.

12. Iteratively Improve The User Experience

Just like any other design process, it is essential that developers consistently test the VUI to learn more about use cases and refine it for a better user experience.

Conduct usability testing with real users, possibly in real-life scenarios, to gather feedback and identify areas for improvement.

13. Accessibility Considerations

It goes without saying—your VUI should be accessible to a diverse audience. Consider accounting for different languages, accents, speech patterns, and imperfections when designing the experience. An ideal solution would be to provide users with options to adjust the speed of responses and consider adding voice commands to assist users with visual impairments.

14. Integrate Innovative Technology

The rapidly evolving industry demands new advancements and capabilities from voice technology across the globe, and new developments roll out routinely to keep up with user and industry trends and needs.

It is essential to incorporate new technologies into your voice user interface design through new features and functionality. Consistent updates will help your users get the best possible experience at all times.

Additionally, plan technology integrations into your VUI with scalability in mind. As your application evolves, the VUI should seamlessly adapt to new features and functionalities. Maintain flexibility in your design to accommodate future enhancements without requiring a complete overhaul.

In envisioning the future of voice user interface design, developers stand at the forefront of a transformative era where intuitive, voice-driven interactions redefine user experiences. As we propel into the future, the principles of conversational design will evolve to mirror even more closely the nuances of natural human dialogue dedicated to creating seamless exchanges between users and technology.

The ongoing refinement of natural language processing (NLP) and multi-modal interactions and extensive customer journey mapping will allow applications to adapt dynamically to users’ preferences and anticipate needs with unprecedented accuracy.

As VUI becomes increasingly integrated into our daily lives, developers hold the key to crafting not just functional applications but immersive, empathetic, and anticipatory voice experiences that elevate user satisfaction and seamlessly merge technology with the rhythm of everyday living.

Stuti Mazumdar

Experience Design Lead at Think Design, Stuti is a postgraduate in Communication Design. She likes to work at the intersection of user experience and communication design to craft digital solutions that advance products and brands.

Vidhi Tiwari

Engineer turned writer, Vidhi is a seasoned UX Writer & Designer with a background in Computer Science. With her keen interest in research, she crafts empathetic content design and strategy to help build meaningful experiences.

Designing a VUI – Voice User Interface

Discover why conversational UIs and voice apps are surging in popularity and learn how to design voice user interfaces (VUIs) for both mobile and smart home speakers.

By Frederik Goossens

Frederik is a certified UX designer and product owner. He is a creative thinker experienced with user research and business analysis.

More and more voice-controlled devices, such as the Apple HomePod, Google Home, and Amazon Echo, are storming the market. Voice user interfaces are helping to improve all kinds of different user experiences, and some believe that voice will power 50% of all searches by 2020.

Voice-enabled AI can take care of almost anything in an instant.

  • “What’s next in my Calendar?”
  • “Book me a taxi to Oxford Street.”
  • “Play me some Jazz on Spotify!”

All five of the “Big Five” tech companies (Microsoft, Google, Amazon, Apple, and Facebook) have developed, or are currently developing, voice-enabled AI assistants. Siri, the AI assistant for Apple iOS and HomePod devices, is helping more than 40 million users per month, and according to ComScore, one in 10 households in the US already owns a smart speaker today.

Whether we’re talking about VUIs (Voice User Interfaces) for mobile apps or for smart home speakers, voice interactions are becoming more common in today’s technology, especially since screen fatigue is a concern.

What Can Users Do with Voice Commands?

Alexa is the AI assistant for voice-enabled Amazon devices like the Echo smart speaker and Kindle Fire tablet—Amazon is currently leading the way with voice technology (in terms of sales).

On the Alexa store, some of the trendiest apps (called “skills”) are focused on entertainment, translation, and news, although users can also perform actions like request a ride via the Uber skill, play some music via the Spotify skill, or even order a pizza via the Domino’s skill.

Another interesting example comes from commercial bank Capital One, which introduced an Alexa skill in 2016 and was the first bank to do so. By adding the Capital One skill via Alexa, customers can check their balance and due dates and even settle their credit card bill. PayPal took the concept a step further by allowing users to make payments via Siri on either iOS or the Apple HomePod, and there’s also an Alexa skill for PayPal that can accomplish this.

But what VUIs can do, and what users are actually using them for, are two different things.

ComScore stated that over half of the users that own a smart speaker use their device for asking general questions, checking the weather, and streaming music, closely followed by managing their alarm, to-do list, and calendar (note that these tasks are fairly basic by nature).

As you can see, a lot of these tasks involve asking a question (i.e., voice search).

What Do Users Search for with Voice Search?

People mostly use voice search when driving, although any situation where the user isn’t able to touch a screen (e.g., when cooking or exercising, or when trying to multitask at work) offers an opportunity for voice interactions. Here’s the full breakdown by HigherVisibility.

Conducting User Research for Voice User Interface Design

While it’s useful to know how users are generally using voice, it’s important for UX designers to conduct their own user research specific to the VUI app that they’re designing.

Customer Journey Mapping

User research is about understanding the needs, behaviors and motivations of the user through observation and feedback. A customer journey map that includes voice as a channel can not only help user experience researchers identify the needs of users at the various stages of engagement, but it can also help them see how and where voice can be a method of interaction.

In the scenario that a customer journey map has yet to be created, the designer should highlight where voice interactions would factor into the user flow (this could be highlighted as an opportunity, a channel, or a touchpoint). If a customer journey map already exists for the business, then designers should see if the user flow can be improved with voice interactions.

For example, if customers are always asking a certain question via social media or live support chat, then maybe that’s a conversation that can be integrated into the voice app.

In short, design should solve problems. What frictions and frustrations do users encounter during a customer journey?

VUI Competitor Analysis

Through competitor analysis , designers should try to find out if and how competitors are implementing voice interactions. The key questions to ask are:

  • What’s the use case for their app?
  • What voice commands do they use?
  • What are customers saying in the app reviews and what can we learn from this?

To design a voice UI for an app, we first need to define the users’ requirements. Aside from creating a customer journey map and conducting competitor analysis (as mentioned above), other research activities such as interviewing and user testing can also be useful.

For voice interface design, these written requirements are all the more important since they will encompass most of the design specs for developers. The first step is to capture the different scenarios before turning them into a conversational dialog flow between the user and the voice assistant.

An example user story for the news application could be:

“As a user, I want the voice assistant to read the latest news articles so that I can be updated about what’s happening without having to look at my screen.”

With this user story in mind, we can then design a dialog flow for it.

issuing a voice command for voice controlled user interface

The Anatomy of a Voice Command

Before a dialog flow can be created, designers first need to understand the anatomy of a voice command. When designing VUIs, designers constantly need to think about the objective of the voice interactions (i.e., What is the user trying to accomplish in this scenario? ).

A user’s voice command consists of three key components: the intent, the utterance, and the slot.

Let’s analyze the following request: “Play some relaxing music on Spotify.”

Intent (the Objective of the Voice Interaction)

The intent represents the broader objective of a user’s voice command, and this can be either a low utility or a high utility interaction.

A high utility interaction is about performing a very specific task, such as requesting that the lights in the sitting room be turned off, or that the shower be a certain temperature. Designing these requests is straightforward since it’s very clear what’s expected from the AI assistant.

Low utility requests are more vague and harder to decipher. For example, if the user wanted to hear more about Amsterdam, we’d first want to check whether or not this fits into the scope of the service and then ask the user more questions to better understand the request.

In the given example, the intent is evident: The user wants to hear music.

Utterance (How the User Phrases a Command)

An utterance reflects how the user phrases their request. In the given example, the user asks to play music on Spotify by saying “Play some…,” but this isn’t the only way a user could make this request. For example, the user could also say, “I want to hear music…”

Designers need to consider every variation of utterance. This will help the AI engine to recognize the request and link it to the right action or response.

Slots (the Required or Optional Variables)

Sometimes an intent alone is not enough and more information is required from the user in order to fulfill the request. Alexa calls this a “slot,” and slots are like traditional form fields in the sense that they can be optional or required, depending on what’s needed to complete the request.

In our case, the slot is “relaxing,” but since the request can still be completed without it, this slot is optional. However, in the case that the user wants to book a taxi, the slot would be the destination, and it would be required. Optional inputs overwrite any default values; for example, a user requesting a taxi to arrive at 4 p.m. would overwrite the default value of “as soon as possible.”
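
To make these three concepts concrete, here is a minimal, hypothetical sketch of how an intent, its utterances, and its slots might be captured as data. The names and structure are illustrative only and do not follow the actual Alexa interaction-model schema.

```javascript
// Hypothetical sketch of an intent with utterances and slots (not the real Alexa schema).
const playMusicIntent = {
  intent: "PlayMusic",                 // the objective of the interaction
  utterances: [                        // different ways users phrase the request
    "play some {mood} music on {service}",
    "I want to hear {mood} music",
    "put on some {mood} music"
  ],
  slots: {
    mood:    { required: false, example: "relaxing" },   // optional slot
    service: { required: false, default: "Spotify" }     // optional, with a default
  }
};
```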

Prototyping VUI Conversations with Dialog Flows

Designers need to think like scriptwriters and design dialog flows for each of these requirements. A dialog flow is a deliverable that outlines the following:

  • Keywords that lead to the interaction
  • Branches that represent where the conversation could lead to
  • Example dialogs for both the user and the assistant

A dialog flow is a script that illustrates the back-and-forth conversation between the user and the voice assistant. In that sense, it works like an early prototype: it can be depicted as an illustration (like in the example below), or it can be created with a dedicated prototyping app.

An illustration of a dialog flow for VUI design
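
Alongside the illustration, it can also help to capture the dialog flow as data before prototyping. The following is a minimal, hypothetical sketch for the news user story above; the keywords, prompts, and branch names are invented for illustration.

```javascript
// Hypothetical dialog flow for the "read the latest news" user story.
// Keywords, prompts, and branch names are invented for illustration.
const readNewsDialog = {
  keywords: ["news", "headlines", "what's happening"],   // phrases that lead into the interaction
  prompt: "Here are today's top headlines. Want me to read the first article?",
  branches: {
    yes:     { action: "readArticle", next: "offerNextArticle" },
    no:      { action: "endSession", response: "Okay, I'll leave it there." },
    unclear: { reprompt: "Sorry, should I read the first article? You can say yes or no." }
  }
};
```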

Apps for Prototyping VUIs

Once you’ve mapped out the dialog flows, you’re ready to prototype the voice interactions using an app. A few prototyping tools have entered the market already; for example, Sayspring makes it easy for designers to create a working prototype for voice-enabled Amazon and Google apps.

Prototyping VUI apps with Sayspring

Amazon also offers its own Alexa Skill Builder, which makes it easy for designers to create new Alexa Skills. Google offers an SDK; however, this is aimed at Google Action developers. Apple has yet to release a comparable design tool, although its SiriKit framework lets developers integrate their apps with Siri.

UX Analytics for Voice Apps

Once you’ve rolled out a “skill” for Alexa (or an “action” for Google), you can track how the app is being used with analytics. Both companies offer a built-in analytics tool; however, you can also integrate a third-party service for more elaborate analytics (such as voicelabs.co for Amazon Alexa, or dashbot.io for Google Assistant). Some of the key metrics to keep an eye out for are:

  • Engagement metrics, such as sessions per user or messages per session
  • Languages used
  • Behavior flows
  • Messages, intents, and utterances
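
As a rough illustration of the engagement metrics above, here is a small sketch that computes sessions per user and messages per session from a hypothetical session log (the log format is an assumption, not what Amazon or Google actually export).

```javascript
// Hypothetical session log and two of the engagement metrics mentioned above.
const sessions = [
  { userId: "u1", messages: 4 },
  { userId: "u1", messages: 2 },
  { userId: "u2", messages: 6 }
];

const sessionsPerUser =
  sessions.length / new Set(sessions.map(s => s.userId)).size;          // 3 sessions / 2 users = 1.5
const messagesPerSession =
  sessions.reduce((sum, s) => sum + s.messages, 0) / sessions.length;   // 12 messages / 3 sessions = 4

console.log({ sessionsPerUser, messagesPerSession });
```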

Practical Tips for VUI Design

Keep the Communication Simple and Conversational

When designing mobile apps and websites, designers have to think about what information is primary, and what information is secondary (i.e., not as important). Users don’t want to feel overloaded, but at the same time, they need enough information to complete their task.

When designing UI for voice commands, designers have to be even more careful because words (and maybe a relatively simple GUI) are all there is to communicate with. This makes it especially difficult to convey complex information and data. Fewer words are better, and designers need to make sure that the app fulfills the user’s objective and stays strictly conversational.

Confirm When a Task Has Been Completed

When designing an eCommerce checkout flow, one of the key screens will be the final confirmation. This lets the customer know that the transaction has been successfully recorded.

The same concept applies to voice assistant UI design. For example, if a user were in the sitting room asking their voice assistant to turn off the lights in the bathroom, without a confirmation they’d need to walk into the bathroom and check, defeating the purpose of a “hands-off” VUI app entirely.

In this scenario, a “Bathroom lights turned off” response will do fine.

Create a Strong Error Strategy

As a VUI designer, it’s important to have a strong error strategy. Always design for the scenario where the assistant doesn’t understand or doesn’t hear anything at all. Analytics can also be used to identify wrong turns and misinterpretations so that the error strategy can be improved.

Some of the key questions to ask when checking for alternate dialogs:

  • Have you identified the objective of the interaction?
  • Can the AI interpret the information spoken by the user?
  • Does the AI require more information from the user in order to fulfill the request?
  • Are we able to deliver what the user has asked for?
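
As a rough browser-based sketch of such an error strategy, the snippet below reprompts the user a limited number of times using the Web Speech API; platform assistants like Alexa have their own equivalents (reprompts and fallback intents), so treat this only as an illustration of the pattern.

```javascript
// Sketch of a simple reprompt-based error strategy using the browser's Web Speech API.
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new Recognition();

let failures = 0;
const reprompt = (text) =>
  window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));

recognition.onerror = () => {
  failures += 1;
  if (failures < 3) {
    // Tell the user what went wrong and what they can say next
    reprompt("Sorry, I didn't catch that. You can ask for the news or the weather.");
  } else {
    // Exit gracefully instead of looping forever
    reprompt("I'm still having trouble. Let's try again later.");
  }
};

recognition.start();
```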

Add an Extra Layer of Security

Google Assistant, Siri, and Alexa can now recognize individual voices. This adds a layer of security similar to Face ID or Touch ID. Voice recognition software is constantly improving, and voices are becoming harder and harder to imitate; however, at this moment in time, voice alone may not be secure enough. When working with sensitive data, designers may need to include an extra authentication step such as a fingerprint, password, or face recognition. This is especially true in the case of personal messaging and payments.

Duer voice assistant with face recognition software

The Dawn of the VUI Revolution

VUIs are here to stay and will be integrated into more and more products in the coming years. Some predict that within 10 years we will no longer use keyboards to interact with computers.

Still, when we think “user experience,” we tend to think about what we can see and touch. As a consequence, voice as a method of interaction is rarely considered. However, voice and visuals are not mutually exclusive when designing user experiences—they both add value.

User research needs to answer the question of whether voice will improve the UX. Considering how quickly the market share for voice-enabled devices is rising, doing this research could be well worth the time and could significantly increase the value and quality of an app.

Further Reading on the Toptal Blog:

  • The World Is Our Interface: The Evolution of UI Design
  • Future UI and the End of Design Sandboxes
  • Wearable Technology: How and Why It Works
  • Design With Precision: An Adobe XD Review
  • Voice of the Customer: How to Leverage User Insights for Better UX
  • Chatbot UX: Design Tips and Considerations
  • Designing for Tomorrow: Addressing UI Challenges in Emerging Interfaces (With Infographic)

Understanding the basics

What is a tangible user interface?

A tangible user interface is one that can be interacted with via taps, swipes and other physical gestures. Tangible user interfaces are commonly seen on touchscreen devices.

What is a speech interface?

A speech interface, better known as a VUI (Voice User Interface), is an invisible interface that requires voice to interact with it. A common device that has voice recognition software is the Amazon Alexa smart speaker.

What does an Echo do?

Amazon’s Echo smart speaker uses voice recognition software to help users perform tasks using voice interactions, even if they’re on the other side of the room. Echo smart speakers are powered by a voice assistant called Alexa and VUI apps called “Skills.”

Voice User Interfaces (VUI)

What are Voice User Interfaces (VUI)?

Voice user interfaces (VUIs) allow the user to interact with a system through voice or speech commands. Virtual assistants, such as Siri, Google Assistant, and Alexa, are examples of VUIs. The primary advantage of a VUI is that it allows for a hands-free, eyes-free way in which users can interact with a product while focusing their attention elsewhere.

Applying the same design guidelines to VUIs as to graphical user interfaces is impossible. In a VUI, there are no visual affordances; so, when looking at a VUI, users have no clear indications of what the interface can do or what their options are. When designing VUI actions, it is thus important that the system clearly state possible interaction options, tell the user what functionality he/she is using, and limit the amount of information it gives out to an amount that users can remember.

Because individuals normally associate voice with interpersonal communication rather than with person-technology interaction, they are sometimes unsure how much complexity the VUI can actually understand. Hence, for a VUI to succeed, it not only requires an excellent ability to understand spoken language but also needs to train users to understand what types of voice commands they can use and what types of interactions they can perform. Because conversing with a VUI is intricate, a designer needs to pay close attention to how easily users might overestimate what the system can do. This is why keeping the product simple, almost featureless, is important: it keeps the user mindful that a full two-way “human” conversation is not feasible. Likewise, as the user builds a communication “rapport” with the VUI and the system becomes increasingly acquainted with the speaker’s voice, it rewards the user with more accurate responses and improved satisfaction.



Rethinking Web Interaction: An In-depth Look at Web VUIs

Today I woke up and felt like talking to websites. Instead of clicking a CTA button or an 'Add to cart' button, I wanted to be able to say it in words just like you would interact with Siri, Google Assistant, Alexa, etc. This technology is called Voice User Interface (VUI). I used the native Web SpeechRecognition API to implement a basic demo app with JavaScript. In this article, we'll delve into VUIs, their benefits, and how they work, culminating with a simple but functional demonstration of a web-based VUI. So, if you've ever wondered about giving your websites a 'voice,’ read on!

Voice User Interfaces, or VUIs for short, are like digital helpers that you talk to. They work by listening to your voice commands and doing what you ask them to do. You can find VUIs in many places these days.

For instance, when you ask your phone for the latest weather update or when you ask your smart speaker to play your favorite song, you're using a VUI. They're also in-car systems where you can control navigation, music, or make calls without taking your hands off the steering wheel.

Some popular examples of VUIs are Amazon's Alexa, Google Assistant, Siri, and Cortana. These are complex VUIs that can do many different things, but VUIs can also be simple, doing just one or two tasks. For example, a voice-controlled light switch in your home is also a VUI.

I believe VUIs can transform the user experience of websites, and these are my reasons:

Accessibility: VUIs make websites more accessible to a wider audience. They enable people with visual impairments or mobility issues to easily navigate and interact with a website. This is a major step forward in inclusive web design.

Convenience: VUIs offer a hands-free way of browsing a website. For example, users can search for a product, read reviews, and even make a purchase on an e-commerce site without ever having to touch the keyboard or mouse.

Efficiency: VUIs can simplify complex tasks on a website. Instead of clicking through various pages and forms, users can accomplish tasks quickly through voice commands.

Natural Interaction: VUIs make interaction with websites more intuitive. Speaking is a natural way of communication, and VUIs bring this naturalness to website interaction.

Improved User Engagement: VUIs can lead to increased user engagement. They can turn a passive browsing experience into an active conversation, making users feel more connected to the website.

Incorporating VUIs into websites can enhance the user experience by making them more accessible, convenient, efficient, and engaging. It is expected that as more users become familiar with voice assistants like Siri and Alexa, the expectation for voice interaction on websites will only grow.

The Components of VUIs

We need to employ a combination of technologies to bring voice interaction to a website. Let's take a closer look at the main components that make Voice User Interfaces (VUIs) possible on the web:

This is the first building block in creating a VUI for a website. The Web Speech Recognition API, native to modern web browsers, is designed to convert spoken language into written text. When a user issues a command such as "search for blue shoes," the Speech Recognition API transcribes this spoken input into written text. It's important to note that while this API is powerful, it isn't perfect and can sometimes struggle with accents, background noise, and complex phrases. However, for simple and clear commands, it serves as a solid starting point for implementing VUIs in a web environment.

This is how it works:

The process starts by initiating a new instance of the SpeechRecognition object. With the new instance ready, its settings are adjusted to match the needs of continuous and interim speech recognition. The continuous property is set to true, ensuring that the speech recognition continues without automatically stopping. Another property, interimResults, is also set to true. This allows the system to provide results even as the user is still speaking, instead of waiting for complete sentences.

Next, an onresult function is declared on the recognition object. This function is triggered when the system successfully transcribes speech into text. Inside this function, the transcribed text, referred to as the transcript, is extracted from the event results.

The transcript is then examined for the presence of a specific command. In this case, the command is 'button'. A related action is triggered if this command is found within the transcript. Although this is a relatively simple example, more sophisticated systems could employ natural language processing to understand the meaning behind the command and use web automation to act on the webpage.

With all the components set, the speech recognition process is kickstarted by calling the start method on the recognition object. This sets the voice user interface into motion, enabling it to start listening for and processing the spoken commands.
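
Putting those steps together, a minimal sketch of the flow described above might look like the following; the button selector and the command word are assumptions for illustration.

```javascript
// A minimal sketch of the flow described above, using the browser's SpeechRecognition API.
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new Recognition();

recognition.continuous = true;      // keep listening without stopping automatically
recognition.interimResults = true;  // emit results while the user is still speaking

recognition.onresult = (event) => {
  // Concatenate the transcribed text from the result list
  const transcript = Array.from(event.results)
    .map(result => result[0].transcript)
    .join(' ')
    .toLowerCase();

  // Trigger an action when the command word is found
  if (transcript.includes('button')) {
    document.querySelector('#voice-button')?.click();  // hypothetical button id
  }
};

recognition.start();  // begin listening for voice commands
```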

After converting spoken words into written text using the Speech Recognition API, the next step in building a web VUI is to make sense of the user's command. This is where Natural Language Processing (NLP) comes in. NLP is a field of Artificial Intelligence that enables computers to understand, interpret, and generate human language in a valuable way. It can be used to parse the transcribed speech, identify key commands or intents, and even handle more complex conversational contexts. For instance, if a user says, "Show me the latest blog posts, and then sort them by popularity", an NLP system can break this down into two separate commands: displaying the latest blog posts and sorting the results.

While full-fledged NLP systems might seem complex, many libraries and APIs can help, such as Google Cloud Natural Language API, Microsoft's Azure Language Understanding (LUIS), or open-source libraries like NLTK and spaCy. These tools can greatly simplify the task of language understanding, providing pre-built models for tasks like entity recognition, sentiment analysis, and more.

Also, you can use complex Large Language Models (LLMs) like OpenAI's GPT to understand, interpret, and generate human language in a valuable way. These models have been trained on vast amounts of text data, enabling them to generate contextually relevant responses based on their input.

In the context of a VUI, LLMs like GPT can be utilized to understand the intent behind the user's command and generate appropriate responses. For example, if a user says, "Show me the latest blog posts and then sort them by popularity," an NLP system powered by an LLM can parse and understand the multiple actions requested in this single command. It can provide relevant responses or ask clarifying questions if the command is ambiguous.

In this basic VUI demo context, we're not employing any advanced NLP. We simply check whether the transcribed text includes the word 'button', as in the sketch shown earlier.

The onresult function examines the transcribed text for a specific command, in this case, 'button'. If the command is present, an action is triggered. In a more complex system, this is where an NLP library would be used to understand the intent behind the user's command.

It's worth noting that implementing NLP can add significant value to a VUI by enabling it to handle complex commands and provide more natural, conversational interactions. However, it also adds a layer of complexity to the system and may only be necessary for some use cases.
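
To illustrate the difference, here is a deliberately naive, hypothetical stand-in for real NLP: splitting a compound command into parts with simple string matching. A production system would use an NLP library or an LLM instead.

```javascript
// A deliberately naive stand-in for NLP: split a compound command into separate parts.
function splitCommand(transcript) {
  return transcript
    .toLowerCase()
    .split(" and then ")        // e.g., "...blog posts and then sort them by popularity"
    .map(part => part.trim())
    .filter(Boolean);
}

console.log(splitCommand("Show me the latest blog posts and then sort them by popularity"));
// -> ["show me the latest blog posts", "sort them by popularity"]
```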

Web Automation is the final step in the process of translating a user's understood command into actionable behavior on a web page. This critical component utilizes tools such as Selenium or Puppeteer to enable programmatic control of web browsers. With the help of web automation, the Voice User Interface (VUI) system can interact with various elements of the web page, replicating human-like behavior. For instance, if the user commands the system to "Click me," web automation tools can be employed to locate the "Click me" button on the web page. Once identified, the system can trigger the button's onClick event, mimicking the action of a user physically clicking the button.

By utilizing web automation, the VUI system can navigate through different web page elements, interact with forms, submit data, scroll, and perform other actions that users typically perform manually. This integrates the VUI system seamlessly with the web application, facilitating a smooth and interactive user experience.

Selenium and Puppeteer are two popular tools offering extensive web automation capabilities. They provide APIs that allow developers to script and control browsers programmatically. These tools enable the execution of complex actions, handling dynamic elements, and asynchronous task performance. Consequently, the VUI system can effectively execute user commands and establish a fluid interaction between voice input and corresponding actions on the web page.
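
As a small, hypothetical sketch of this idea, the snippet below uses Puppeteer to open a page and click the button whose visible text matches a spoken command; the URL and the button label are placeholders.

```javascript
// Hypothetical sketch: map a spoken command to a button click with Puppeteer.
const puppeteer = require('puppeteer');

async function clickButtonByVoiceCommand(url, spokenLabel) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);

  // Find the button whose visible text matches the spoken command and click it
  await page.evaluate((label) => {
    const button = [...document.querySelectorAll('button')]
      .find(b => b.textContent.trim().toLowerCase() === label);
    if (button) button.click();
  }, spokenLabel.toLowerCase());

  await browser.close();
}

clickButtonByVoiceCommand('https://example.com', 'click me');  // placeholder URL and label
```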

By leveraging the power of web automation, VUIs can automate repetitive tasks, enhance user interactivity, and streamline workflows within web applications. Combining speech recognition, natural language processing, and web automation empowers developers to create sophisticated voice-driven interfaces that elevate the overall user experience.

Challenges in Implementing VUIs on Websites

However, there are limitations and challenges that make implementing VUIs on websites difficult. I outline some of the relevant issues in this section.

Web applications often contain dynamic elements that can change in real time or based on user interactions. For example, if a user says, "Click the second button," but the web page dynamically reorders the buttons, the VUI system must adapt to these changes and still perform the intended action accurately. Handling dynamic elements requires robust strategies, such as relying on unique identifiers or DOM traversal techniques to locate elements reliably.

Spoken commands can sometimes be ambiguous or lack the context the system needs to understand the user's intent clearly. For instance, if a user says, "Delete it," without specifying what needs to be deleted, the VUI system needs to ask clarifying questions or make educated guesses based on the context. Resolving ambiguity requires advanced natural language processing techniques, context-aware algorithms, and user prompts to ensure accurate interpretation and execution of commands.

Speech recognition accuracy can be affected by various factors, such as regional accents or background noise. Different accents can pose challenges for speech recognition systems, which may struggle to transcribe words accurately. Additionally, background noise can interfere with speech clarity, leading to misinterpretation of commands. Overcoming these challenges requires robust speech recognition algorithms and techniques, including accent-specific training data and noise reduction algorithms.

There are other challenges, such as user adaptation and integration with existing systems. These challenges can be resolved with robust algorithms, considering user-centric design principles, and leveraging advancements in speech recognition and natural language processing technologies.

I built a simple VUI with speech recognition. Here’s the demo :

This is a simple demonstration of a Voice User Interface (VUI) using speech recognition technology. The application listens to the user's spoken commands and performs actions on the user interface based on those commands.

In this basic example, the application listens for the "Button" command. When this command is recognized, it simulates a click on a button in the interface.

This demonstration showcases several important aspects of VUI design:

Speech recognition : The application listens to the user's spoken commands and accurately transcribes them into text.

Command processing : The application interprets the recognized text and maps it to an action.

Action execution : The application performs the action on the user interface (in this case, clicking a button).

This demo is a good starting point for anyone interested in developing more complex voice-interactive applications. While simple, it captures the fundamental process of turning speech into action.

Here is the code on Github:

GitHub - Umoren/basic-vui-demo: A basic demonstration of a Voice User Interface (VUI) using Speech Recognition technology. Listens for the "Click me" command and performs a simulated button click in response.

The potential of VUIs extends beyond accessibility and convenience. They can provide a more seamless and engaging user experience, allowing users to interact with websites and applications more naturally and conversationally. This can lead to increased productivity, reduced cognitive load, and enhanced accessibility for users with disabilities.

As technology progresses and our understanding of human-computer interaction deepens, the future of VUIs looks promising. It is an exciting time to explore, innovate, and shape the future of web-based voice interfaces.

Voice User Interfaces: Seamless Interactions

In this article, I’ll focus on some technological innovations in voice user interfaces (VUIs). What is a voice user interface, or VUI? VUIs employ speech-recognition technology that enables users to interact with a computer, smartphone, smart speaker, or other device by using voice commands.

Some examples of VUIs include products such as Apple’s Siri, Amazon’s Alexa, Google’s Assistant, and Microsoft’s Cortana. Voice technology is different from any other method of interaction because it relies on the user’s voice commands instead of traditional modes of communication.

How Do Voice User Interfaces Work?

Since VUIs are a relatively recent innovation in user-interface (UI) design, they incorporate many recent technological advancements, including artificial-intelligence (AI) technologies such as speech synthesis and automatic speech recognition. Once the user has adopted a VUI or installed it on a device, it can function successfully within many applications.

Furthermore, when a VUI processes the user’s voice and speech using backend functionality and AI-powered speech components that reside on a private or public cloud, the AI technology can recognize the user’s intent and quickly provide a response to the user’s device.

Benefits of Voice User Interfaces

The advantages or benefits of voice user interfaces are many. Let’s consider some of the benefits of VUIs, as follows:

  • Since speaking is much quicker than typing, VUIs help users save time, regardless of what purpose an application serves.
  • VUIs provide similar results for multiple synonyms, so users can always find what they’re looking for, regardless of what they say.
  • VUIs promote multitasking, so users can use the VUI and keep on performing a task at the same time.
  • VUIs promote inclusiveness and are accessible to people who have a variety of physical impairments.
  • VUIs help people perform operational tasks, making it easier for them to conduct any task effectively.

Security for Voice User Interfaces

The security of users’ data is one of the top priorities and concerns of all technology developers. Because VUI technology can easily differentiate between the voices of different users, developers have been putting their best efforts into making VUIs secure for users.

Nevertheless, VUIs still are not the safest or most secure technologies. Although developers are trying their best to improve the security of VUI technology, it is still best to offer a password or fingerprint identification option for such systems.

Comparative Analysis of VUIs and Other User Interfaces

What is the usefulness of VUIs versus other user interfaces such as graphical user interfaces (GUIs)? Both types of systems can be equally useful, depending on the nature and context of the user’s task and the information the user requires.

Since VUIs are a more recent advancement, they include additional features and offer unique benefits such as being more accessible and convenient for users, anytime and anywhere. GUIs require more time for users to type out what they’re looking for. But people still need to get exposure to and become familiar with VUI systems.

Common Applications of Voice User Interfaces

People sometimes limit their perception of the feasible applications of voice user interfaces to search-engine commands. However, this is not the case in reality. Some common applications of VUIs in today’s world include the following:

  • VUIs facilitate the process of searching for something online.
  • VUIs can transform speech into text.
  • VUIs convey commands to smart-home devices and systems.
  • VUI applications can improve customer service through the use of Interactive Voice Response (IVR) systems and analytics.
  • The use of VUIs can improve biometric systems.

Voice User Interface Platforms and Tools

Developers often become concerned about where to start their initial planning and design phases for VUI systems. The following tools might help beginners kick start their journey:

  • Alexa Skills Kit (ASK)
  • Action Builder for Google Assistant
  • Web Speech API

Challenges in Implementing Voice User Interfaces

Although there are advantages relating to new technological inventions, there are also some challenges. For VUIs, the following challenges exist.

  • challenges with the accuracy of voice-recognition systems
  • VUIs’ use of natural language processing (NLP)
  • ethical considerations in VUI development
  • users’ high demands and expectations
  • functionality issues

Challenges with the Accuracy of Voice-Recognition Systems

The first challenge for a VUI system relates to the accuracy of its ability to identify users’ speech inputs. Although the system’s technology might be fully functional, there are still other factors that affect speech recognition such as a microphone’s quality, background noise and the user’s accent, vocabulary, and conversational context. These problems could result in poor performance and, as a consequence, users’ poor opinion of the VUI system.

VUIs’ Use of Natural Language Processing (NLP)

Next is the challenge of understanding the intent and meaning of the user’s input. Analyzing and interpreting language inputs are key components of natural language processing (NLP), natural-language interpretation (NLI), and natural language understanding (NLU), which are aspects of artificial intelligence (AI) that are necessary to come up with the best possible response to speech input. Given that natural language is sometimes imprecise or insufficiently clear, it can become difficult for NLP, NLI, and NLU systems to comprehend and respond to the user’s commands.

Ethical Considerations in VUI Development

The ethical considerations of using VUIs pose other problems, such as adequately addressing privacy, security, and the system’s biases. However, VUI experts endeavor to meet these challenges by protecting the user’s data and identity, respecting the user’s rights and values, making users aware of the system’s capabilities and limitations, and refraining from causing any sort of harm. VUI systems can help users handle ethical issues by giving them the autonomy to participate and voice their opinions.

Users’ High Demands and Expectations

Another challenge of employing voice user–interface technology is fulfilling the needs and preferences of users and using the technology to provide them with the best possible user experience. Since people all over the world use this technology, it is difficult to manage people’s different cultural backgrounds and individual differences.

Functionality Issues

VUI designers must battle the challenge of VUI systems’ functionality issues. To ensure that users experience a smoothly functioning system, it is necessary to keep tabs on the feedback system and make the VUI discoverable and easy to learn and use for everyone.

Best Practices for Designing Voice User Interfaces

Now, let’s consider some best practices of skilled UX professionals who design VUI systems, as follows:

  • Understand the target audience and the purpose of developing the system. Developers of VUIs should know the answers to what, how, why, who, and when before diving into the design phase.
  • Using simple, understandable language is essential when designing a VUI. It is important to design an effective communication flow between the system and the user.
  • VUI systems should provide vocal or written feedback to users in language similar to what users themselves would use when communicating with a system. This builds rapport and ease of use for the user.
  • Developers must iteratively test and modify the VUI system to provide top-quality performance to users.

Future Trends of Voice User Interfaces

As VUIs have become more useful and popular, experts in the field have been predicting some future trends regarding this technology, as follows:

  • VUI systems will include personalization and customization features to meet the expectations of individual users.
  • Emotional-intelligence technology will become part of VUI systems, helping bots understand the emotions of users through their speech patterns and behavioral cues.
  • VUI technology will become a key part of the Internet of Things (IoT) to facilitate the processing of voice commands.
  • Multipurpose VUIs will be able to handle context and conversational flows across multiple platforms under the guidance of the user.

VUIs for Accessibility

Voice user interfaces can contribute a lot of value to the goal of achieving accessibility. The technology focuses on providing solutions for all users, including those who have some form of physical impairment. A VUI’s user-friendly interface and functional dynamics make it more accessible and beneficial to every user.

Real-World Use Cases for VUIs

The applications of VUI systems are not limited to just searching for products and other services using search systems. The real-world use cases for VUIs apply to domains such as the following:

  • improving customer service in the business sector and industries through the use of voice assistants
  • providing in-car speech assistance in the automotive industry
  • being part of mobile applications and the mobile-app design industry
  • in academia, helping students to improve their pronunciation and vocabulary and helping students who have visual impairments
  • in the healthcare industry, utilizing medical transcription facilities

Multilingual Voice User Interfaces

Multilingualism is a critical issue that developers face when creating VUI systems. Since users all over the world are using this technology, it is necessary to factor in cultural differences in language patterns, vocabulary, regional accents, and ethical dilemmas when constructing voice user interfaces. Both developers and technicians must make sure that they deal with these factors professionally by installing varied data sets into the system and, thus, providing the best possible user experience to each individual.

User Experience Testing for VUI

Similar to visual or graphical user interfaces (GUIs), it is best to assess voice user interfaces through usability testing with participants from the target audience. A usability study helps developers understand the user experience, the problems relating to it, and solutions that would resolve the issues. For VUIs, developers must focus on how well the software responds to vocal input and how well its interpretations of the user’s inputs match the user’s mental models.


Voice User Interface (abbreviated as VUI) refers to interfaces that enable vocal interaction between humans and devices.

A Voice User Interface can be any object, as long as it is capable of recognizing what the person addressing it is saying and consequently responding intelligently.

Even if some aspects still seem a bit strange to the general public, we cannot overlook the fact that more and more companies are launching products based on the Voice User Experience.

The term has been chosen specifically because it is not about creating a simple interaction between a product and a customer, but rather reproducing a true system of experiences that, like any valid User Experience, can evoke emotions in the user.

What is a Voice User Interface?

A Voice User Interface , or Vocal Interface, is therefore a device capable of establishing interaction with a human being.

This interaction unfolds in two distinct phases, both necessary to give the user a complete User Experience (UX): voice recognition and voice synthesis.

What is Voice Recognition?

In this first process, the device must be able to understand what the user is saying and therefore implement real Voice Recognition . This ability is otherwise known as ASR (Automatic Speech Recognition).

Depending on what they can identify verbally, voice recognition systems are classified based on whether they can:

  • recognize individual isolated words or complete meaningful sentences ;
  • recognize the voice of any individual or only of the single person who configured them;
  • understand any type of question or only requests circumscribed to a specific context.

What is Voice Synthesis?

In this second phase of the process, namely Voice Synthesis , the device must be able to respond coherently to the user.

This is commonly known as TTS (Text to Speech), which is the conversion of written text into an artificial voice produced by a computer.

To achieve such a result, the following techniques are used:

  • articulatory synthesis systems: capable of reproducing the functioning of the vocal apparatus;
  • formant synthesis: mathematical filters shape the acoustic parameters of an artificial signal to approximate speech sounds;
  • fragment synthesis: acoustic fragments taken from a natural voice are used to compose messages. Once extracted, these fragments are stored in a database, then selected as needed, and finally reassembled to create a sound that corresponds exactly to the written text.
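
On the web, this second phase is exposed through the browser’s speechSynthesis API. Here is a minimal sketch of converting written text into a synthetic voice; the sentence and settings are arbitrary examples.

```javascript
// Minimal sketch of text-to-speech in the browser via the speechSynthesis API.
const utterance = new SpeechSynthesisUtterance("Your order has been confirmed.");
utterance.lang = "en-US";   // language of the synthetic voice
utterance.rate = 1;         // normal speaking speed
window.speechSynthesis.speak(utterance);
```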

When was the Voice User Interface born?

Despite the phenomenon of the Voice User Interface exploding only in recent years, its origins date back several decades.

More precisely, the first mention dates back to 1952 when, at the famous Bell Labs in New Jersey, “Audrey” came to light: a primitive speech recognition system that could recognize only a handful of isolated words (spoken digits), mostly uttered by a single user, and provided basic outputs.

In the subsequent decades, research progressed in both expanding the recognized vocabulary and, more importantly, shifting from the recognition of individual words to that of “continuous speech,” without having to pause between individual words.

This led to the birth of the first speaker-independent systems in the 1990s. The widespread adoption of the Voice User Interface began in the early 2000s with Interactive Voice Response (IVR) systems.

An IVR is a system capable of providing information to a human caller who interacts via the telephone keypad, giving the customer the necessary information (e.g., opening hours, cost of a service, product specifications) and relieving the workload of telephone operators.

Today, we find ourselves in what could be defined as the second major era of IVRs.

Why use Voice Search?

As part of a complete User Experience from a vocal perspective, a fundamental aspect to analyze, especially in the context of current and future trends, is Voice Search, to which users are becoming increasingly accustomed.

A human asks a question to the relevant voice interface, which is capable of processing the response with phrases that mimic natural language.

The importance of Voice Search is demonstrated by its integration into numerous devices, such as smartphones, cars, and home assistants, which we will specifically discuss below.

Let’s now focus on understanding why using voice search is so convenient and usable, that is, simple and intuitive, and what the drawbacks might be.

What are the advantages of Voice Search?

The advantages of using Voice Search are numerous and can be summarized as follows:

  • Allows for hands-free operation: in a multitasking society like ours, where technology is seamlessly integrated into our daily lives, there is often a need to use our devices while engaged in other activities. For example, while slicing something in the kitchen, how convenient is it to ask Alexa to remind you in half an hour to take the roast out of the oven? Or, while driving, how much safer – and more road-compliant – is it to ask your car to place a call and let someone know when you’ll be home?
  • Speed: in addition to multitasking, ours is a society of extreme speed, and saving a few seconds is considered invaluable. Speaking is much faster than typing, especially for those accustomed to keyboard input. This aspect can be particularly interesting not only in private life but especially in a professional context, especially for users who write numerous texts daily and can thus lighten their workload.
  • Always at hand:  the devices we use for voice search are always with us, so we don’t have to go hunting for them, gaining in terms of time and usability. Just think of your Siri while in front of the TV, the interface set up in your car while driving, and your smartphone practically wherever you are.
  • Intuitiveness:  while not everyone can express what’s in their head in writing, almost everyone can do so verbally, making voice search a very simple and intuitive way to operate for anyone.

What are the disadvantages of Voice Search?

Like any phenomenon, the Voice User Interface not only offers advantages but also disadvantages .

Let’s analyze the most common ones:

  • Insufficient bandwidth: in large cities, accessing a network fast enough to support the internet on smartphones is rarely a problem. However, in more isolated areas, this issue can be more significant.
  • Noisy environments:  if we are not in a perfectly isolated room, the voice system may pick up noises or voices different from ours, making it difficult to understand the request and, consequently, process the response. This can happen both outdoors and indoors, for example, in the increasingly popular co-working spaces.
  • Lack of Privacy:  many times, we entrust to online searches those questions we don’t dare to ask other human beings because they touch on our most sensitive areas, such as sexuality, health, and personal relationships, and we even delete written searches from our history. Imagine asking certain embarrassing questions out loud! Therefore, the context in which certain voice searches, particularly private ones, are made is crucial.
  • Lack of habit and discomfort:  even if all precautions regarding privacy are taken, many users are still not accustomed to addressing a voice interface and may feel uncomfortable, perhaps feeling a bit strange talking to an electronic device rather than another human being.

disadvantages of voice search

How does Voice User Interface Affect Web Marketing?

A phenomenon of such broad and growing scope as the Voice User Interface inevitably has repercussions on the way Web Marketing is conducted and on the digital professions connected to it.

VUI and Copywriting

Imagine performing a voice search in one of the situations described in the previous paragraphs: you will typically be engaged in other activities and need a quick, prompt response to the specific need you are expressing at that particular moment.

In most cases, it will be a practical need , such as finding an open bar nearby in the late hour while you are out, and the voice interface will try to provide you with an equally quick and practical answer, avoiding the need to navigate a website.

Here are some considerations if you are the Copywriter of a website that aims to be the answer to that voice search:

  • Write rich content with detailed information. This way, intelligent agents may make your articles appear in instant answers, thus increasing your page’s CTR.
  • In voice search, users typically use more words and construct longer and less precise sentences compared to written searches. Prepare your content to capture these voice requests.
  • Build content that focuses not so much on the individual keyword but on the entire semantic field through which it is assumed the user will make their voice search.

VUI and Local Search

According to Google Trends, over the last three years there has been an interesting growth in searches for “near me.” People are increasingly using search engines to find local businesses, making them just a click away.

Typically, those conducting such searches are highly interested in quickly finding the service provider because they have an immediate need to fulfill – it’s not just curiosity but a genuine urgent need .

Being easily discoverable is crucial for a local business because the user is in a phase of the sales process very close to the point of purchase.

Therefore, if you manage the digital channels of a local business, it’s essential to pay attention to the voice optimization of a website.

growth in local search

VUI and SEO

Chatmeter has coined a new term: in the realm of the Voice User Interface, it makes more sense to talk not so much about SEO as about the brand-new VEO, meaning Voice Engine Optimization.

This is an innovative activity that optimizes digital assets to increase the chances of capturing traffic from voice search results.

For some, the Voice Search phenomenon surpasses the concept of the traditional SERP (Search Engine Results Page) in favor of the brand-new VERSO (Vocal Engine Result Search Output).

While in a classic search, with a SERP of ten results, we are willing to delve into the topic and browse through all the results on the first page of Google, we have seen that in voice search, we need a quick, concise, and fast response, and most likely, we won’t go beyond listening to the first three results.

Appearing on the podium of this voice search is, therefore, more than fundamental, as being there gives us a greater chance of capturing a customer very close to making a purchase, who in many cases may not have even visited our website.

Internet of Things (IoT)

The Internet of Things (IoT) is an increasingly relevant field that relies on advanced technologies such as big data analysis, machine learning, and Artificial Intelligence.

It involves connecting any physical object to the Internet, with implications that until recently were considered unthinkable.

In particular, IoT refers to any system of physical devices – such as light bulbs, thermostats, shipping labels, and medical devices – that receive and send data over wireless networks, without any manual intervention, thanks to the integration of data processing devices and sensors.

Imagine being a certain distance from home, heading towards it. Without you taking any action, a specially designed thermostat may be able to ensure that you find the optimal temperature inside your home exactly when you arrive.

IoT solutions not only improve existing business systems but also create new ways of interacting with customers within increasingly advanced User Experience .

The main applications of this new frontier mainly concern smart buildings, the biomedical sector, surveillance, smart agrifood, animal husbandry, and especially, in connection with the Voice User Interface theme, the smart home .

Internet of Things and Smart Home

According to the Smart Home research by the Internet of Things Observatory, the value of the smart home market in 2018 was $380,000,000.

A similar trend is driven by voice assistants, which, in addition to generating significant sales volumes, have boosted sales throughout the sector.

At the same time, the level of knowledge and adoption of such devices in people’s homes is also growing: a considerable majority of Americans (69%) are familiar with smart homes, while 46% of households currently own at least one smart home device. Beyond smart speakers, security solutions such as door and window sensors are among the most popular smart home products among American homeowners.

According to this research, Artificial Intelligence (AI) can play three roles, which can be integrated:

  • It can act inside connected objects, improving their functionality and processing data without the need to go through the cloud;
  • It can further enhance the operation and understanding capabilities of voice assistants;
  • It can act as a true steward of our homes.


Which devices integrate the Voice User Interface?

Today, the Voice User Interface technology sees the following major players:

  • Amazon: Alexa is based on the Bing search engine and is activated simply by saying its name. The name was chosen because the “x” sound in it is easier for the device to detect. Alexa is becoming an additional family member in many homes! For this reason, Amazon has decided to incorporate it into other household products, such as clocks and microwaves, for a true User Experience based on the Internet of Things.
  • Apple: Siri relied on Bing for some time, while currently it relies on Google. Thanks to Artificial Intelligence, Siri can act as a true assistant to the person holding the phone. One of the latest developments at Apple is the HomePod, and Apple Music and Siri are now integrated into all these new systems, providing the user with a complete experience both inside and outside the home. Apple’s technology excels at adapting to even the noisiest environments: a simple “Hey Siri” is enough for its six microphones to pick the command out of the surrounding sounds.
  • Google: in this field too, Google has released one of the most popular products among users. Its assistant only needs an “OK Google”, followed by a polite “Good morning”, to start giving you all the information you need to begin your day. After calling you by name, it will tell you the exact local time, provide weather forecasts, list the appointments on your calendar, and wish you a good day by playing the radio news. Your Google Assistant can perform many functions: if you feel like laughing, ask it to tell you a joke or sing you a song! Google has also revealed a new feature for the Assistant on Android phones that turns it into an efficient screen reader. Just say “OK Google, read this page,” and it will do the task. The novelty lies not in the screen reader itself but in the advancement of the technology used: Google states that it has improved the Assistant’s ability to analyze sentences and read them with the tone and rhythm closest to those a human would use when fully immersed in what they are reading. The most remarkable feature is that you can ask the Assistant to read the text in a language different from the one it is written in, choosing from 42 available languages.

It is worth noting how these devices are designed not to feel like simple robots: Alexa sounds more like a person’s name than a device’s.

This is intended to give the user a more human perception of their experience. Consequently, it will be perceived as positive and perfectly integrable into their daily life, leading to an increasingly widespread success of the phenomenon.

Do Voice User Interface devices work?

VUI technology has its challenges. The main flaws include irrelevant responses to questions, a lack of basic general knowledge, and non-context-specific answers.

Research is actively addressing these issues, especially in making the interacting device more human-like.

In this regard, Google has introduced the brand-new chatbot Meena , capable of considering over 2.6 billion parameters and having conversations much closer to those of a human.

While waiting to see it in action, we already notice the Assistant’s greater “humanity” when we point out that it didn’t understand our request.

Voice User Interface and Predictive Analytics

Through the analyzed devices, it’s evident how people, often unknowingly, come into contact with various brands to obtain information.

The Voice User Interface seamlessly intertwines with a topic dear to corporate marketing: predictive analytics.

What are Predictive Analytics?

Predictive analytics is a strategy that leverages known data, analyzes it, and predicts the probability of a specific event occurring.

The analysis of user insights and their future behavior forms the basis for numerous marketing actions.

These analyses enable:

  • Estimating future profits
  • Creating a targeted customer database
  • Planning targeted sales
  • Targeting online advertising
  • Sending targeted marketing campaigns to undecided customers
  • Identifying customer abandonment rates and creating timely recovery strategies

And who can provide the best information about the user, if not the users themselves ?
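To make the idea concrete, here is a minimal sketch of the kind of model that sits behind such predictions: scoring the probability of a specific event (customer churn, in this illustration) from known data. The feature names and weights are hypothetical, as if produced by a previously trained logistic-regression model; they are not from any real dataset.

```typescript
// Illustrative only: a pre-trained logistic model scoring churn probability.
interface CustomerFeatures {
  daysSinceLastPurchase: number;
  ordersLastYear: number;
  supportTickets: number;
}

// Hypothetical weights that a training step would normally produce.
const WEIGHTS = { bias: -1.2, daysSinceLastPurchase: 0.015, ordersLastYear: -0.4, supportTickets: 0.6 };

function churnProbability(c: CustomerFeatures): number {
  const z =
    WEIGHTS.bias +
    WEIGHTS.daysSinceLastPurchase * c.daysSinceLastPurchase +
    WEIGHTS.ordersLastYear * c.ordersLastYear +
    WEIGHTS.supportTickets * c.supportTickets;
  return 1 / (1 + Math.exp(-z)); // logistic function maps the score to a 0–1 probability
}

// Customers above a chosen threshold get a timely recovery campaign.
const atRisk = churnProbability({ daysSinceLastPurchase: 120, ordersLastYear: 1, supportTickets: 3 }) > 0.5;
```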

Why is Voice User Interface crucial for Predictive Analytics?

In this historical moment in general, and specifically for the predictive analytics we’re discussing, digital data represents an invaluable asset.

Devices integrating Voice User Interfaces play a crucial role by providing first-hand information on user experiences and guiding the major brands behind them toward the best marketing campaigns.

It all started with Amazon. In 1998, when Amazon was not an e-commerce giant but a startup selling books online, its Recommendation Algorithm was created to suggest titles relevant to readers’ interests – at the time, still a largely manual process.

Today, we have Alexa, on which Amazon is heavily focusing, encouraging developers to enhance the Artificial Intelligence (AI) system, making it more and more powerful.

The better the system can understand user requests, the more it can provide suitable and precise responses, often resulting in the sale of a product or service.


Voice User Interface and User Experience

Within the evolving discipline of User Experience, which involves designing and anticipating the overall experience a user has with a specific digital product, the Voice User Interface plays, and will continue to play, an increasingly important role.

What does a VUI Designer do?

The UX Designer, or the User Experience Designer , is a crucial figure responsible for the user’s experience in their interaction with machines, from websites to applications.

At the same time, we must consider the phenomenon of conversation architecture, the discipline that analyzes and organizes all the input we hear and produce vocally through the acoustic channel.

Voice interfaces become an element of the user experience for mobile applications, and even more so where the experience centers on the Internet of Things (IoT).

In this context, the role of the UX Designer further specializes, resulting in becoming a VUI Designer , a Voice User Interface Designer—a professional who designs the entire vocal conversation between the device and the user.

A VUI Designer, alone or assisted by specialists, performs the following tasks:

  • Identifies the user’s needs.
  • Researches the target user.
  • Creates maps, drawings, and prototypes.
  • Writes descriptions of the interaction between the user and the device.
  • Identifies the strengths and weaknesses of the technology in play.


Is VUI necessary for every User Experience?

With the increasing importance of Voice Interfaces established, the question arises, is it necessary to incorporate VUI for every device?

The answer is not straightforward. It is essential to consider the phenomenon, but it remains crucial to ask whether VUI, like any other digital phenomenon, makes sense and can bring value to the user within the specific User Experience.

  • For instance, if you are designing a product or service where a VUI is genuinely useful, such as a device to be operated while driving, incorporating a voice interface will undoubtedly be an added value that enhances usability within that specific User Experience.
  • On the other hand, if your design would run into the insurmountable disadvantages of VUI, it is advisable not to include one. For example, if your service will be used in a sensitive setting like a hospital or a public office where everyone can overhear, the emotional side of the User Experience would undoubtedly be compromised.

The latest trend in product and service design revolves around the voice user interface. This technology extends beyond simple voice search or utilizing the services of a voice assistant.

The potential for integrating voice control into both digital and physical designs is vast. UI/UX professionals need to familiarize themselves with emerging trends in voice user interface design to enhance their work significantly.

The preceding discussion provides an excellent starting point. We explored the fundamental concept of VUI design , emphasizing its operational mechanism and the essential components needed for designing voice-enabled devices.

The examples mentioned earlier can also serve as a source of inspiration for aspiring designers, highlighting specific areas where VUI design can propel UI/UX design to the next level.


Examples of Natural Voice User Interfaces

Ottomatias Peura

Feb 08, 2021

Reactive voice user interfaces enable intuitive and efficient experiences that improve key metrics


In this post, we'll go through some use cases and examples for natural voice user interfaces in various domains and user tasks.

Few apps or websites employ a voice user interface yet, largely because of a lack of developer tools for building them. The examples below use Speechly Spoken Language Understanding technology for a natural voice UI, enhancing the touch screen user experience with voice functionalities.

  • Form filling with voice
  • Voice in eCommerce and search filtering with voice
  • Adding items from a big inventory, such as grocery eCommerce
  • Professional applications
  • Information-heavy data input
  • Voice in VR/AR and gaming
  • Web applications with voice user interfaces
  • Speechly’s speech recognition accuracy

Speechly offers a unique tool for building real-time, multimodal voice user interfaces. Our technology can be applied to any industry or domain to enhance current touch user interfaces with voice functionalities.

Speechly offers a Spoken Language Understanding API that returns user intents and entities in real time for user voice input. This approach enables end users to see the result of their voice commands visually as they speak instead of the traditional smart speaker paradigm that is based on turn-based queries.

Real-time visual feedback is the key to efficient and intuitive user interfaces, because it allows users to multitask. Instead of saying something and waiting for the answer, end users can speak in a stream of consciousness fashion, correcting themselves if needed. On the other hand, the visual feedback encourages users to go on with the voice experience.
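To illustrate the idea of streaming results rather than turn-based queries, here is a generic sketch of how an application might consume partial intent and entity updates and reflect them in the UI while the user is still speaking. The types and callback names are hypothetical illustrations, not the actual Speechly SDK API.

```typescript
// Hypothetical shapes for streaming spoken-language-understanding results.
interface Entity { type: string; value: string; }
interface Segment { intent: string; entities: Entity[]; isFinal: boolean; }

function onSegmentUpdate(segment: Segment, render: (s: Segment) => void): void {
  // Re-render on every partial result so the user gets immediate visual feedback
  // and can correct themselves mid-utterance.
  render(segment);
  if (segment.isFinal) {
    console.log(`Completed intent "${segment.intent}" with`, segment.entities);
  }
}
```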

We have collected some examples of user tasks that can be solved more effectively with our voice technology. In short, voice user interface works great if:

  • Your users know what they want to achieve
  • Data quality is important
  • User tasks are repetitive

Let's see our examples.

Form filling by voice

Voice is a great solution for information heavy, repetitive tasks such as form filling . Filling forms on a mobile device can be cumbersome because of difficult typing and common usability issues on different screen sizes and mobile browsers.

In our demo, we enhanced an existing HTML form with voice functionalities. The form can be manipulated by touch and voice simultaneously. The end user can use natural language to fill the form and gets instantaneous feedback.

By seeing the form, the user knows exactly what kind of questions they need to answer, and they can fill the form in any order and with any interaction modality.
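A minimal sketch of the underlying wiring, assuming a spoken-language-understanding layer that emits entities such as `{ type: "name", value: "Jane Doe" }`: each recognized entity is routed to the matching input so touch and voice can be used interchangeably. The entity types and field ids are illustrative assumptions.

```typescript
// Hypothetical mapping from entity types to form-field ids.
const FIELD_FOR_ENTITY: Record<string, string> = {
  name: "full-name",
  email: "email",
  departure: "departure-date",
};

function fillField(entity: { type: string; value: string }): void {
  const fieldId = FIELD_FOR_ENTITY[entity.type];
  if (!fieldId) return; // ignore entities the form doesn't use
  const input = document.getElementById(fieldId) as HTMLInputElement | null;
  if (input) input.value = entity.value; // the user sees the form update as they speak
}
```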

eCommerce search filtering

Search is one of the most important parts of an eCommerce customer experience. Up to 30% of eCommerce visitors use search for navigation, and a user who doesn’t find what they are looking for is a lost customer.

A major share of Google searches are already done by using voice, but very few eCommerce sites offer a similar experience.

Speechly makes it simple to add voice functionalities to eCommerce stores. Again, the user can use natural language to search for products, and unlike traditional category navigation, voice search naturally supports synonyms. Whether the user asks for pants or trousers, they find what they are looking for.
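One simple way to picture why synonyms work here: spoken terms can be normalized to canonical catalog categories before the product list is filtered. The synonym map and category names below are illustrative, not a real catalog.

```typescript
// Hypothetical synonym normalization for voice-driven search filtering.
const CATEGORY_SYNONYMS: Record<string, string> = {
  pants: "trousers",
  trousers: "trousers",
  sneakers: "shoes",
  trainers: "shoes",
};

function filterProducts(spokenTerm: string, products: { name: string; category: string }[]) {
  const category = CATEGORY_SYNONYMS[spokenTerm.toLowerCase()] ?? spokenTerm.toLowerCase();
  return products.filter((p) => p.category === category);
}
```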

It's also important that the user interface updates in real time. This enables users to correct themselves in case of an error and encourages them to continue with the voice experience.

Grocery shopping

Grocery shopping is a special kind of shopping experience, because the user wants to add a lot of familiar products from a large inventory to their shopping cart as easily as possible.

The traditional user experience requires a lot of repeated searches and selections, but voice enables the user to simply say the products they want out loud and see them added to their cart. If they need to change a certain product, for example swapping one brand of milk for another, they can do it easily by tapping the product.

Professional applications

Professional applications are a great use case for voice functionalities, because the language used in these settings is precise and commonly shared by everyone involved.

Speechly can be used to create efficient user experiences for professional applications across many industries and domains. In this example, airline maintenance workers can easily report anomalies and defects in an airplane cabin.

You can also read about our offering for warehouse professionals and logistics .

Voice in VR/AR

Virtual reality environments can offer a very immersive experience that can showcase for instance real estate locations easily and accurately, even amidst pandemic situations.

However, the first-time user experience in these environments suffers from clunky hand controllers that are unintuitive and hard to use. Learning these controllers takes time away from the actual experience.

Voice, on the other hand, is a very intuitive and natural way to interact in a virtual reality environment. Speechly created a virtual reality environment with our partner ZOAN that improves the first time user experience significantly.

Information-heavy data input

Speechly can be used to improve form filling when efficiency and data quality are important.

The following demo showcases a CRM task in which a sales professional inputs sales data by voice. This leads to better data quality and improved data collection.

CRM is a great example of how voice can improve data input: data quality matters, and the input is repetitive. Similar examples include health apps such as meal tracking and fitness tracking, as well as other professional applications.

Web applications with voice UIs

Unlike most other solutions , Speechly is supported by all modern browsers and can be used to create awesome voice experiences for web.

In our demo application, we created a simple photo editing application that is operated solely by voice. It supports natural language, and the user can see the effects applied to the image in near real time.

Speech recognition accuracy

Speechly is not optimized for pure speech recognition. Our models are configured for a certain use case and we use this configuration to bias the speech recognition model. This helps improve the accuracy.

However, our speech recognition accuracy is still on par with general purpose speech recognition software such as Google Cloud Speech.

In this demo video, Speechly and Google Webspeech API are transcribing the Jobs keynote from the first iPhone release event.

You can try out our general accuracy here . Do note that the ASR accuracy improves significantly when the models are configured for your use case.

Conclusions

Voice UIs can be used to improve the user experience in a wide variety of applications and domains. If you want to hear how your application's user experience could be improved with modern voice technologies, submit your email and we'll contact you as soon as possible.

If you are still not convinced, here's what our customers think of working with us. You can also read more about the advantages of voice user interfaces .

About Speechly

Speechly is a YC backed company building tools for speech recognition and natural language understanding. Speechly offers flexible deployment options (cloud, on-premise, and on-device), super accurate custom models for any domain, privacy and scalability for hundreds of thousands of hours of audio.


User Experience Center


Designing a voice user interface is easier than you may think

Published by Darek Bittner

Same fundamental skills

Designing a voice user interface requires the same fundamental skills as designing a visual user interface. The design of any human-computer interaction centers around formative user research, rapid prototyping, and regular testing. A voice user interface is simply a new and exciting way to transmit information. The design process remains largely the same, with a few nuances. This blog post will explore some of the major nuances specific to designing voice user interfaces.

Current state of voice interfaces

Most voice user interfaces are applications that augment the capabilities of a preexisting voice assistant. Large tech companies, such as Amazon, Apple, Google, and Microsoft, have built advanced voice assistants (e.g. Alexa, Siri) that interpret speech using natural language processing technology. The companies who build the voice assistants enable third parties to build custom functionality for them. For example, a travel booking company could create an app for Alexa that lets users book a hotel room. Amazon and Microsoft call these applications “skills”, Apple calls them “intents”, and Google calls them “actions”. The applications serve as an alternative way for users to interact with their favorite products and services. They are typically limited to lookup and form-input tasks, such as checking availability and booking a hotel room ("are there vacancies tonight at the Sleepy Inn?"). Because voice assistants can only present information sequentially, the applications do not lend themselves well to discovery tasks, such as leisurely browsing a shopping website ("help me shop for a pair of shoes").

Elements of voice design

Amazon presents a useful framework for conceptualizing the transfer of information between a human and a voice assistant. According to Amazon, there are four elements to voice design:

  • Utterance:  The words the user speaks
  • Intent:  The task the user wishes to complete
  • Slot:  Information the app needs to complete the task
  • Prompt:  The response from the voice assistant

The voice assistant must first interpret the words the user speaks (utterance) and match those words to a task that the assistant has been programmed to handle (intent). Once the voice assistant identifies the user’s intent, it must identify and obtain any extra information that is required to complete the task (slot) from the user. To do so, the voice assistant replies in a manner designed to keep the conversation moving forward (prompt). For example, the list below shows the elements of a voice user interface for booking a room at a fictitious hotel:

  • Utterance:  I want to stay at the Sleepy Inn
  • Intent : Reserve a hotel room
  • Slots:  Check-in date, length of stay, number of guests, room type
  • Prompt : Ok, when would you like to visit the Sleepy Inn?
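To make the framework concrete, here is a minimal sketch of how those four elements might map onto data structures in the hypothetical hotel-booking example. The names and types are illustrative, not Amazon's actual schema: the matched intent carries slots, and the assistant prompts for whichever slot is still missing.

```typescript
// Hypothetical representation of a matched intent and its slots.
interface BookingIntent {
  name: "ReserveRoom";            // intent: the task the user wants done
  slots: {
    checkInDate?: string;         // slots: information still needed to complete it
    nights?: number;
    guests?: number;
    roomType?: "standard" | "suite";
  };
}

// The assistant keeps the conversation moving by prompting for the next empty slot.
function nextPrompt(intent: BookingIntent): string | null {
  if (!intent.slots.checkInDate) return "Ok, when would you like to visit the Sleepy Inn?";
  if (!intent.slots.nights) return "How many nights will you stay?";
  if (!intent.slots.guests) return "How many guests?";
  if (!intent.slots.roomType) return "Would you like a standard room or a suite?";
  return null; // all slots filled – ready to confirm the booking
}
```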

Understanding words vs understanding meaning

Natural language processing (NLP) technologies have become very good at determining the words a user speaks, but not what the user means to say. Voice designers are responsible for writing a list of all the possible utterances a user might speak. The voice assistant uses NLP to translate the spoken utterance into a string of text, and queries the text against the dataset of possible utterances. The comprehensive list of utterances is built using preliminary user research and refined over time using data generated by the voice assistant. This elaborate process affords a voice assistant the illusion of intelligence; in reality, the assistant is just playing back pre-scripted dialogs written by the design team.

UX design process for voice

According to the Google conversation design guidelines, a voice design project should produce at least two deliverables: a series of sample dialogs and a diagram of the conversation flow. Sample dialogues often take the form of a script or storyboard. Flow charts are useful for documenting the conversation flow.

A successful voice design process will closely mirror any other user experience design process. For example, the designer should focus on a single persona and use case at a time. Additionally, it’s helpful to start at a low fidelity, test frequently, and refine the design over time. In short, good design is good design, and a good designer should not have to change their process much to adapt to voice.

Prototyping a voice user interface

  • Begin by listening and transcribing human-to-human conversations similar to your interface. For example, if designing a voice user interface for booking a hotel room, listen to people converse with travel agents.
  • Identify the scope of your interface’s functionality. Keep the scope simple at first. For example, my hotel app can book a room, but it cannot provide information about special events happening at the hotel.
  • Write the sample dialogs. Begin by writing the shortest possible route to completion, as if the user provided all the necessary information. 
  • Build complexities into the conversation logic, such as dialogs for error handling and dialogs specifically for first-time users.
  • Using a flow chart, document the conversation logic and relationships between each script.

[Figure: example of a conversation design script]

Testing the prototype

Like any other kind of design, it is important to test voice user interfaces early and often. Test your dialogs with real users as much as you possibly can. One helpful tip for testing is to use a “Wizard of Oz” study design , where the user interacts with a fake device, and the researcher sits “behind the curtain” following the script of the voice assistant. For added authenticity, the researcher should use a text to speech application to simulate a computerized voice.
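One lightweight way to produce that computerized voice is the browser's built-in Web Speech API, so the "wizard" can paste the scripted prompt and play it back instantly. A minimal sketch (the prompt text is taken from the example above):

```typescript
// Speak a scripted prompt using the browser's built-in speech synthesis.
function speakPrompt(text: string): void {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 1.0; // keep pacing close to a production assistant
  window.speechSynthesis.speak(utterance);
}

speakPrompt("Ok, when would you like to visit the Sleepy Inn?");
```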

  • Chances are if you’re reading this blog, you already have the skills required to design a voice user interface.
  • A voice assistant is not intelligent and cannot comprehend meaning. Rather, a voice assistant is just a very long list of possible conversation paths coupled with some logic.
  • Using a voice design framework helps build structure into the design process. Always write the optimal script first, and build in complexity later.

Helpful links

  • A thoughtful reflection by the BBC Voice + AI team on designing voice apps for children:  https://www.bbc.co.uk/gel/guidelines/how-to-design-a-voice-experience
  • Observations on the current state of voice UX by Jolina Landicho:  https://www.uxbooth.com/articles/impact-of-voice-in-ux-design/
  • Amazon has published a series of design guides for those who wish to build apps on the Alexa platform. I found this one particularly useful:  https://developer.amazon.com/docs/alexa-design/script.html#write-the-shortest-route-to-completion
  • A guide published by Google on writing sample dialogs:  https://designguidelines.withgoogle.com/conversation/conversation-design-process/write-sample-dialogs.html#

Darek Bittner

Darek is a Research Associate at the User Experience Center. He is currently pursuing a Master of Science degree in Human Factors in Information Design from Bentley University.

darekbittner.com  |  LinkedIn



Investigating the usability and user experiences of voice user interface: a case of Google home smart speaker

University of Turku, Finland


MobileHCI '18: Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct


Recently, commercial Voice User Interfaces (VUIs) have been introduced to the market (e.g. Amazon Echo and Google Home). Although they have drawn much attention from users, little is known about their usability, user experiences, and usefulness. In this study, we conducted a web-based survey to investigate usability, user experiences, and usefulness of the Google Home smart speaker. A total of 114 users, who are active in a social-media based interest group, participated in the study. The findings show that the Google Home is usable and user-friendly for users, and shows the potential for international users. Based on the users' feedback, we identified the challenges encountered by the participants. The findings from this study can be insightful for researchers and developers to take into account for future research in VUI.


Sensors (Basel)

The Investigation of Adoption of Voice-User Interface (VUI) in Smart Home Systems among Chinese Older Adults

1 College of Literature and Journalism, Sichuan University, Chengdu 610064, China; [email protected]

2 Digital Convergence Laboratory of Chinese Cultural Inheritance and Global Communication, Sichuan University, Chengdu 610207, China

3 School of Construction Machinery, Chang’an University, Xi’an 716604, China; yangyanpu@chd.edu.cn

Peiyao Cheng

4 Design Department, School of Social Science and Humanity, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China

Associated Data

The data used in this study are available upon request from the corresponding author.

Driven by advanced voice interaction technology, the voice-user interface (VUI) has gained popularity in recent years. VUI has been integrated into various devices in the context of the smart home system. In comparison with traditional interaction methods, VUI provides multiple benefits. VUI allows for hands-free and eyes-free interaction. It also enables users to perform multiple tasks while interacting. Moreover, as VUI is highly similar to a natural conversation in daily lives, it is intuitive to learn. The advantages provided by VUI are particularly beneficial to older adults, who suffer from decreases in physical and cognitive abilities, which hinder their interaction with electronic devices through traditional methods. However, the factors that influence older adults’ adoption of VUI remain unknown. This study addresses this research gap by proposing a conceptual model. On the basis of the technology adoption model (TAM) and the senior technology adoption model (STAM), this study considers the characteristic of VUI and the characteristic of older adults through incorporating the construct of trust and aging-related characteristics (i.e., perceived physical conditions, mobile self-efficacy, technology anxiety, self-actualization). A survey was designed and conducted. A total of 420 Chinese older adults participated in this survey, and they were current or potential users of VUI. Through structural equation modeling, data were analyzed. Results showed a good fit with the proposed conceptual model. Path analysis revealed that three factors determine Chinese older adults’ adoption of VUI: perceived usefulness, perceived ease of use, and trust. Aging-related characteristics also influence older adults’ adoption of VUI, but they are mediated by perceived usefulness, perceived ease of use, and trust. Specifically, mobile self-efficacy is demonstrated to positively influence trust and perceived ease of use but negatively influence perceived usefulness. Self-actualization exhibits positive influences on perceived usefulness and perceived ease of use. Technology anxiety only exerts influence on perceived ease of use in a marginal way. No significant influences of perceived physical conditions were found. This study extends the TAM and STAM by incorporating additional variables to explain Chinese older adults’ adoption of VUI. These results also provide valuable implications for developing suitable VUI for older adults as well as planning actionable communication strategies for promoting VUI among Chinese older adults.

1. Introduction

Supported by advanced voice interaction technology, interacting with products through speech is no longer a scenario in fiction. It now happens in our daily lives. When planning for the next day, we can simply ask a smart speaker what the weather will be. It will search for weather information automatically and respond to us by speech with the temperature and probability of rain tomorrow. It can even provide recommendations for bringing an umbrella or wearing a coat. VUI has gained its popularity in this decade along with the dramatic improvements of relevant technologies. With the improvements of speech recognition technology, VUI-embedded systems can recognize voice commands accurately. For example, Google has announced that their speech recognition accuracy rate has reached 95% [ 1 ]. Natural language processing (NLP) has also been largely improved and it enables VUI-integrated systems to be capable of interpreting the intended meanings of users. As a result, an increasing number of electronic devices integrate voice-user interfaces (VUI), such as virtual assistant Siri developed by Apple, smart speaker Google Home, and Echo launched by Amazon.

Different from traditional interaction methods that require input and output devices [ 2 ], VUI allows users to interact with electronic devices through speech. Users can command electronic devices by talking to them, similar to a natural conversation in daily life. Such hands-free interaction provides huge benefits. As input devices such as a keyboard and mouse are no longer necessary, users can interact with devices in a shorter time. The interaction speed can be largely improved [ 3 ]. Moreover, as VUI allows users’ eyes and hands to be free while interacting, users can complete multiple tasks [ 4 ]. For instance, users can search for information through VUI while driving a car, which can improve driving safety [ 5 ]. Because of the advantages of voice interaction, VUI has been adopted at a fast rate [ 6 ]. In particular, VUI has become an important modality for smart home systems. Various smart home devices integrate VUI, such as smart speakers, cleaning robots, and smart televisions. The integration of VUI in smart home systems largely improves interaction efficiency. For instance, instead of using a remote controller, users can directly talk to a smart television. In comparison with pressing buttons on a remote controller, voice commands largely reduce interaction time and improve interaction accuracy.

VUI can be particularly beneficial for older adults in smart home contexts. As older adults often suffer from the gradual loss of physical and cognitive capabilities, interacting with devices through traditional graphical user interfaces (GUI) can be difficult [ 7 ]. Older adults may have problems reading texts on the screens. They may also fail to type or click due to shaky hands. Instead, VUI can be a promising solution. As speech is a natural method of interpersonal communication, VUI can be much easier for older adults to learn and operate [ 8 , 9 , 10 ]. However, given the potential benefits brought by VUI, how older adults perceive VUI remains unclear. Fragmented evidence shows that older adults are open to VUI embedded in smart speakers [ 8 , 9 , 11 ] but they also show concerns in adopting them [ 9 ]. Therefore, it is necessary to understand what factors drive older adults’ adoption of VUI.

The size of the older population is increasing worldwide every year. This phenomenon is even more pronounced in China. The number of older adults aged above 60 has reached roughly 264 million, accounting for 18.7% of the general population in China [ 12 ]. This share has increased by 5.44 percentage points in comparison with 2010. These changes will have profound consequences for Chinese society. Older adults are more prone to chronic disease [ 13 ]. Due to the decline of physical and cognitive abilities, older adults may encounter difficulties in daily life and thus become less independent, which burdens families and societies. The adoption of new technology is a promising way to improve older adults’ well-being, such as maintaining independence, improving their safety, and keeping them active in their social networks [ 14 , 15 ].

However, although adopting technologies can assist older adults in their daily lives, they often show more resistance to adopting new technologies than young people do [ 16 , 17 , 18 , 19 , 20 ]. Such resistance can become even stronger with increasing age [ 21 , 22 ], and VUI adoption is no exception. Thus, it is crucial to understand how Chinese older adults perceive VUI and what factors influence their adoption of VUI. Gaining these insights can help companies develop or adapt current VUI in order to fulfill the needs of the senior segment [ 23 ].

This study aims to fill in this gap. Specifically, this study investigates the factors that influence Chinese older adults’ adoption of VUI. Through literature review, this study firstly proposes a conceptual framework with eight variables that determine Chinese older adults’ adoption of VUI. Next, a survey was designed and conducted with 420 valid participants. Data analyses were conducted using structural equation modeling.

2. Literature Review

This research aims to investigate older adults’ adoption intention of VUI in China. To investigate users’ adoption of VUI, the theoretical models related to technology adoption are reviewed. On the basis of current technology adoption models, the characteristic of VUI is specifically considered. As VUI can threaten users’ privacy, users have to trust VUI systems in order to use them effectively. Thus, we include trust as an additional factor and review relevant theories. Furthermore, as we especially target older adults, we integrate aging-related characteristics into the framework. The literature related to aging-related characteristics is reviewed.

2.1. The Theoretical Models Related to Technology Adoption

To understand the driving factors of users’ adoption of technologies, several theoretical frameworks have been proposed. Rogers [ 24 ] proposed the diffusion of innovation model, which posits five factors that influence diffusion: complexity, trialability, observability, compatibility, and relative advantage. Davis [ 25 ] proposed the technology acceptance model (TAM), which suggests that users’ adoption intention of technology is mainly influenced by the perceived usefulness and perceived ease of use of the technology. The features of the technology itself largely determine users’ perceived usefulness and perceived ease of use. Subsequent models have been proposed by incorporating social norms (TAM2) [ 26 ] and enjoyment (TAM3) [ 27 ]. Extending the TAM, Venkatesh et al. [ 28 ] further established the unified theory of acceptance and use of technology (UTAUT), which pointed out that technology adoption is primarily influenced by effort expectancy, performance expectancy, social influence, and facilitating conditions. The diffusion of innovation model has been recommended for use in commercial contexts and for predicting organizational adoption of innovation [ 29 ]. TAM and UTAUT are considered more suitable for explaining individuals’ adoption of technology [ 30 , 31 ].

Although TAM and UTAUT are robust and powerful models to predict users’ adoption of technology, their explanatory power differs across contexts [ 32 , 33 ]. TAM and UTAUT are also found to carry some limitations in explaining users’ adoption of new technologies [ 34 ]. In order to improve explanatory power for users’ adoption of technology in specific contexts, new constructs have been identified and included in TAM and UTAUT [ 35 , 36 , 37 ]. For instance, Wang, Tao, Yu and Qu [ 38 ] extended the UTAUT by including the additional factors of technology characteristics and task characteristics to explain Chinese users’ acceptance of healthcare wearable devices. To understand users’ adoption of digital voice assistants, Fernandes and Oliveira [ 39 ] extended the TAM by considering the influence of trust, social interactivity and social presence. Therefore, although TAM is a robust model to explain users’ adoption of technology, it needs to be adjusted depending on the specific context.

2.2. Trust and Technology Adoption

To understand users’ adoption of information technology-related applications, previous studies pointed out the uncertainty of the IT environment [ 40 ]. Thus, it is necessary to incorporate the construct of trust into the extended versions of TAM and UTAUT [ 41 , 42 , 43 , 44 ]. Trust is a multidimensional concept [ 45 ]. Mayer et al. [ 46 ] proposed the three dimensions of trusts: (1) competence, which indicates the skills and capabilities that allow a system to perform effectively; (2) benevolence, which refers to one’s willingness to believe that another party will not make use of its vulnerability; and (3) integrity, which is defined as one’s subjective evaluation of the appropriateness of another party’s behavior. When used in different contexts, the construct of trust can be interpreted in different ways. In the contexts of users’ adoption of new technology, trust mainly captures the ability dimension and it refers to individuals’ subjective evaluation of the reliability, functionality and helpfulness of the technology [ 47 ]. In e-commerce contexts, where transactions occurred, trust reflects the dimension of benevolence and integrity. Trust is defined as one’s belief that the e-commerce systems will behave responsibly [ 46 , 48 ].

In the context of users’ adoption of VUI, the dimension of benevolence and integrity of trust can be more prominent. Specifically, while using VUI, users have to allow systems to record and track their speech in order to improve VUI system accuracy [ 49 ]. The VUI system records the users’ voice command as well as the background sound in order to provide immediate feedback [ 50 ]. Due to this, users may feel risky or even threatened while using VUI systems to some extent. In this case, users’ trust reflects their perception of VUI systems’ willingness to behave in a socially responsible way: the VUI systems will not leak or misuse their personal information, and their personal information is protected by the VUI systems. Previous research has demonstrated the significant influence of trust on users’ adoption in various contexts, such as e-commerce [ 51 ], 5G technology [ 52 ], Internet banking [ 53 ], digital voice assistants [ 39 ] and young people’s adoption of VUI [ 40 ]. Therefore, to understand older adults’ adoption of VUI, this study includes the construct of trust.

2.3. Older Adults’ Technology Adoption

To explain the technology adoption of a specific user group, previous studies found that the TAM and UTAUT may be insufficient [ 54 ]. Specifically, the models used for young users can be insufficient for older adults because the two groups value different facets of technology. Older adults show resistance to adopting new technologies. Such resistances come from different sources, including physical and psychological factors. A number of studies have demonstrated that older adults can encounter more difficulties while adopting technologies due to the decline of physical capabilities, such as the gradual loss of sensorial capabilities of vision and hearing [ 55 ] and dexterity problems which cause difficulties in typing [ 56 ]. Psychological factors can also cause problems for older adults to adopt new technologies [ 57 , 58 , 59 ]. For instance, in comparison with young people, older adults are found to suffer more from anxiety when adopting technologies.

In order to gain a comprehensive understanding of older adults’ adoption of technology, the senior technology acceptance model (STAM) has been proposed [ 60 ], which highlighted the importance of aging-related characteristics. Specifically, STAM extends TAM through incorporating self-efficacy, technology anxiety and facilitating conditions. Results demonstrated the significant influences of aging-related characteristics on older adults’ adoption. Similarly, to understand the factors that influence older adults’ adoption of mobile health in China, Deng, Mo and Liu [ 61 ] also considered the aging-related characteristic, including perceived physical condition, technology anxiety, self-actualization needs and resistance to change. Therefore, it is necessary to include aging-related characteristics to better understand older adults’ adoption of VUI.

In summary, this study aims to understand the factors that influence Chinese older adults’ adoption of VUI. Although user adoption of VUI has been investigated [ 40 ], it targeted the young generation in western contexts. Limited research attention has been paid to understanding older adults’ adoption of VUI in Chinese contexts. This study aims to fill in this gap. To do so, this study starts from TAM and considers the uniqueness of VUI by including the factor of trust. Next, by referring to the STAM and other studies related to aging characteristics, this study integrates four ageing-related characteristics (i.e., mobile self-efficacy, technology anxiety, self-actualization, physical health condition). The research framework is shown in Figure 1 .

[Figure 1. The conceptual framework of this study.]

3. Hypothesis Development

3.1. VUI and Technology Acceptance Model

According to the TAM [ 25 ], users’ adoption of new technology is predicted by perceived usefulness (PU) and perceived ease of use (PEOU). When encountering a new technological application, users tend to subjectively assess the effort required to use it (PEOU) and the benefits gained from using it (PU). Extensive research demonstrates that users’ intention of adopting new technological applications is positively related to PU and PEOU [ 62 , 63 , 64 , 65 ]. PU largely results from PEOU: when users perceive a technological application as difficult to use, their perception of its usefulness is also largely discounted. TAM considers technology characteristics as external variables that influence PU and PEOU.

In the context of users’ adoption of VUI, PU refers to the utilitarian benefits of using VUI-driven systems, whereas PEOU reflects users’ perceived difficulty of learning to use VUI. Usage intention (UI) directly measures the extent to which users intend to use VUI. VUI allows users to complete various interaction tasks through voice control rather than visual interface controls [ 66 ]. Thus, VUI provides multiple benefits, such as remote interaction and multi-task interaction. Moreover, compared with the traditional user interface, VUI enables users to interact with smart devices in an intuitive way: talking to the smart devices as if talking to a real person. Therefore, considering the benefits brought by VUI, we expect that these utilitarian benefits and this convenience will positively influence users’ usage intention. In addition, as interaction with devices through VUI is highly similar to an interpersonal conversation in daily life, it should be intuitive to learn. Effortless learning can further improve users’ perception of usefulness. Previous studies have demonstrated positive relationships between PU, PEOU and UI [ 67 ] as well as positive links between PEOU and PU [ 68 ]. The following hypotheses are given:

Perceived usefulness positively influences behavioral intention .

Perceived ease of use positively influences behavioral intention .

Perceived ease of use positively influences perceived usefulness .

3.2. Trust and Technology Adoption

As discussed earlier, trust is an essential factor in influencing users’ adoption of technology. VUI has to record users’ voice command and their daily speech to be responsive. In other words, users have to share their speech in order to use VUI effectively and efficiently. Users may feel more risks associated with using VUI than using a traditional user interface. In this case, trust means that users believe that their personal information will be protected during the usage of VUI [ 40 ]. In users’ adoption of VUI, trust helps alleviate users’ concern that their personal information has been shared and might be misused [ 40 ]. Therefore, we expect that trust is positively related to older adults’ adoption intention of VUI.

Trust positively influences behavioral intention .

3.3. Aging-Related Characteristics

To better understand older adults’ adoption of VUI, it is necessary to consider the changes caused by aging [ 60 , 61 ]. Specifically, in this study, four aspects related to aging were considered: perceived physical conditions, mobile self-efficacy, technology anxiety and self-actualization.

3.3.1. Perceived Physical Conditions

Perceived physical conditions (PPC) refer to one’s own belief about one’s capabilities of vision, hearing and motion in daily life [ 69 ]. With the increase in age, older adults suffer from the gradual decline of sensory and motor systems [ 70 , 71 ]. The decline of physical health conditions hinders their effective usage of ICT systems [ 72 ]. Past research has demonstrated relationships between older adults’ perceived health conditions, their perceptions of technology, and their intention of technology adoption. For instance, Li, Ma, Chan and Man [ 73 ] found that PPC negatively relates to older adults’ perception of the usefulness of health monitoring devices, which in turn lowers their usage intention. PPC is also found to be positively related to perceived ease of use of health informatics systems, which further facilitates older adults’ adoption intention [ 74 ]. Ryu, Kim and Lee [ 64 ] found that PPC leads to lower intention to participate in video UGC services.

In order to use VUI effectively, users need to have acceptable health conditions, including visual, auditory, and motion ability. Physical disabilities, such as hearing or speaking problems, can become obstacles for older adults’ effective usage of VUI. The current study targets Chinese older adults who are above 55 years old. These older adults start to experience a decline in physical health conditions, which can possibly influence their perceptions of VUI. Therefore, we expect positive relationships between PPC and PU, PEOU.

Perceived physical conditions positively influence perceived usefulness .

Perceived physical conditions positively influence perceived ease of use .

3.3.2. Mobile Self-Efficacy

Mobile self-efficacy refers to one’s subjective evaluation of his/her capability to use mobile devices [ 75 ]. The UTAUT includes the construct of self-efficacy as a factor influencing PEOU, which further influences adoption intention [ 31 ]. Prior research reported that a lack of capacity is one of the difficulties encountered by older adults when learning to use computers [ 76 ]. Higher self-efficacy indicates that users have more expertise and ability in interacting with mobile devices. Self-efficacy is found to be positively related to technology usage [ 77 , 78 ] and to older adults’ perception and adoption of gerontechnology [ 60 ].

In terms of the influence of self-efficacy on users’ adoption of VUI, higher self-efficacy facilitates adoption among young people [ 40 , 79 ]. However, high self-efficacy could strengthen users’ attachment to traditional interaction methods, leading to resistance to new interaction methods, especially among older adults. In fact, Deng et al. [ 61 ] found that older adults often exhibit resistance to change, which further hinders their adoption intention of health information systems. In terms of Chinese older adults’ adoption of VUI in this study, adopting VUI means that users need to change their habits, invest considerable learning effort and bear certain switching costs. For older adults who have a high level of mobile self-efficacy, this becomes even more difficult because of the higher sunk costs, which makes them more resistant to VUI, leading to lower perceptions and trust. The following hypotheses are posited:

Mobile self-efficacy negatively influences perceived usefulness .

Mobile self-efficacy positively influences perceived ease of use .

Mobile self-efficacy positively influences trust .

3.3.3. Technology Anxiety

Technology anxiety mainly refers to the feeling of discomfort that people experience when using technology [ 80 ]. It captures the negative emotions felt while using technologies. According to UTAUT, technology anxiety hinders users’ adoption intention through PEOU [ 31 ]. Burdened by these negative emotions, users easily perceive technologies negatively and show resistance to adopting new technologies [ 60 , 64 , 81 ]. For instance, in the context of using computers, prior research found that technology anxiety makes users fear using computers and making mistakes, leading to fewer possibilities of using computers [ 82 ].

In the context of older adults’ adoption of VUI investigated in this study, technology anxiety should also have negative influences on their perceptions of and trust in VUI. Specifically, although users in general may experience technology anxiety to some extent, older adults suffer from it more seriously [ 83 , 84 , 85 ]. The negative influences of technology anxiety have been found in various contexts, such as older adults’ PU and PEOU of wearable warming systems [ 86 ], PEOU of gerontechnology [ 60 ], and adoption intention of mobile health services [ 61 ]. Consistent with this line of research, this study hypothesizes similar negative influences of technology anxiety on users’ perceptions and trust of VUI.

Technology anxiety negatively influences perceived usefulness .

Technology anxiety negatively influences perceived ease of use .

Technology anxiety negatively influences trust .

3.3.4. Self-Actualization Need

Maslow [ 87 ] highlights that the need for self-actualization is the highest level of human needs. Self-actualization relates to people's sense of satisfaction, desire for personal growth, and pursuit of actualizing their personal potential [ 72 ]. To pursue self-actualization, people need to be tolerant of new changes, new phenomena, and new technologies. People with higher self-actualization needs tend to be more open-minded. They enjoy new adventures through acquiring new skills and making new changes [ 88 ]. They would consider using new technologies an opportunity to fulfill their self-actualization needs.

In terms of the adoption of VUI among Chinese older adults, self-actualization could serve as a facilitator. The self-actualization need is important not only for younger adults but also for older adults. According to Erikson [ 89 ], a sense of fulfillment is the ultimate purpose a person pursues in the later stages of life. Thus, driven by the intrinsic motivation of self-actualization, older adults could view adopting VUI as a chance for new adventures. Previous studies found that self-actualization positively relates to older adults' adoption of e-government services [ 72 ] and wearable health technology [ 90 ]. Therefore, similar effects were expected in the context of older adults' adoption of VUI.

Self-actualization need positively influences perceived usefulness .

Self-actualization need positively influences perceived ease of use .

4. Research Methods

4.1. Sampling and Procedure

To test the proposed conceptual framework, a survey was designed and conducted. A web-based questionnaire was administered through the professional online platform ePanel ( http://www.research.epanel.cn/ , accessed on 8 December 2021). Although online sampling carries some limitations, it was considered a proper and valid data collection method for this study because a connection to the Internet and experience with smart devices are prerequisites for effective usage of VUI. In other words, users' experience with the Internet and digital devices is a precondition for VUI adoption. In fact, previous studies have widely used online sampling to investigate users' adoption of smart devices, such as healthcare wearable devices [ 91 ], smart speakers [ 34 ], and smartwatches [ 92 ].

Participants were included based on two criteria: age and experience with smart devices, such as smartphones, tablets, smartwatches, and smart speakers. As we target older adults, we recruited participants older than 55 years old, the age at which cognitive and physical capabilities start to decline [ 93 ]. Experience with smart devices was also used as a selection criterion because it is required for effective usage of VUI; participants with no experience with smart devices would have few chances to use VUI.

Participants were first welcomed to the survey and completed the consent form. Subsequently, they were asked to report their age and experience with smart devices; these two questions served as screening questions. Participants were allowed to continue only if they were older than 55 and had experience with at least one smart device. Next, because participants might be unfamiliar with voice interaction technology, they were shown a short introduction video, which briefly presented the benefits, usage procedures, and usage scenarios of VUI. This particular VUI was designed by a professional interaction designer, and the scenarios included using VUI to control smart speakers, smartphones and smart televisions. The video was around 90 seconds long. After watching the video, participants were asked to indicate their experience with VUI and their perceptions of and adoption intention toward VUI, based on a series of statements. Finally, they were asked about their personal information, including demographic information, their perceived physical condition and their psychological characteristics (see Table 1 in the next section).

Table 1. Constructs and measurements.

Construct | Measurement Item | References
Behavior Intention (BI) | BI1: I predict I would use voice interaction in my smartphone to conduct tasks. | [ ]
 | BI2: In the future, I will often use voice interaction to manage my smartphone. |
Perceived Usefulness (PU) | PU1: I think that using a voice interface increases productivity. | [ , ]
 | PU2: I think that a voice interface is useful. |
 | PU3: Using a voice interface would make my life convenient. |
Perceived Ease of Use (PEOU) | PEOU1: It would be easy for me to become skillful at using a voice interface. | [ ]
 | PEOU2: It would be easy for me to use voice interaction in the way I like to. |
 | PEOU3: Learning to use voice interaction is entirely within my capability. |
Trust (TRU) | TRU1: I trust that my personal information will not be used for any other purpose. | [ ]
 | TRU2: I believe that my personal information is protected. |
 | TRU3: I am assured that my information is secure. |
Perceived Physical Conditions (PPC) | PPC1: How is your hearing? | [ ]
 | PPC2: How is your vision? |
 | PPC3: How is your mobility? |
Mobile Self-Efficacy (SE) | SE1: I am fluent in the use of a mobile device. | [ ]
 | SE2: I can figure out almost any mobile application with a minimum of effort. |
 | SE3: I feel I am able to use the mobile internet to browse the world wide web. |
Technology Anxiety (TA) | TA1: Using voice interaction would make me very nervous. | [ ]
 | TA2: Using voice interaction would make me worried. |
 | TA3: Using voice interaction would make me feel uncomfortable. |
 | TA4: Using voice interaction would make me feel uneasy. |
Self-Actualization Needs (SA) | SA1: Learning to use voice interaction increases my feeling of self-fulfillment. | [ ]
 | SA2: Learning to use voice interaction gives me a feeling of accomplishment. |

4.2. Measurement

Based on an extensive review of existing studies, a questionnaire was designed that included three parts: (1) participants' perception of voice interaction technology, (2) aging-related characteristics and (3) participants' demographic information. The measures related to participants' perception of voice interaction and aging characteristics can be found in Table 1 . These measures were based on or adapted from existing validated measures. Participants rated each item on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree). The human protocols used in this work were evaluated and approved by Sichuan University (YJ202203).

4.3. Data Collection

Participants were recruited through an online survey. In total, 420 valid responses were collected (mean age = 59.67, 50% male). Table 2 shows a detailed description of the sample. We aimed to cover both current and potential users of VUI, which is a commonly used and valid way to investigate users' adoption intention of specific technologies [ 91 , 98 ]. Thus, we did not select participants based on their experience with VUI; we selected them based on their experience with smart devices, which is a precondition for VUI adoption. Participants who have experience with smart devices generally have at least some exposure to VUI. In this way, we captured both current and potential users of VUI. Participants' experience with VUI can be found in Figure 2 .

Figure 2. Frequency table of participants' experience with VUI.

Table 2. Descriptive analysis of participants.

Characteristic | Category | Frequency | Percentage (%)
Age | 55–59 | 216 | 51.4%
 | 60–64 | 160 | 38.1%
 | 65–69 | 30 | 7%
 | Above 70 | 14 | 3.5%
Gender | Male | 210 | 50%
 | Female | 210 | 50%
Education | Elementary | 7 | 1.7%
 | Junior high school | 51 | 12.1%
 | High school | 134 | 31.9%
 | College/university | 216 | 51.4%
 | Postgraduate | 12 | 2.9%
Income | Below 50 k RMB | 63 | 15%
 | 50 k–10 k RMB | 117 | 27.9%
 | 10 k–15 k RMB | 88 | 21%
 | 15 k–20 k RMB | 88 | 21%
 | 20 k–30 k RMB | 52 | 12.4%
 | Above 30 k RMB | 12 | 2.9%

Initial descriptive and reliability analyses were conducted using SPSS 25.0. Next, the data were analyzed by examining the measurement model and the structural model, respectively [ 99 ]. AMOS 24.0 was used to conduct the confirmatory factor analysis [ 29 ] to assess the measurement model and to perform the path analysis.
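For readers who want to reproduce this kind of pipeline outside SPSS/AMOS, the sketch below shows how a comparable measurement and structural model could be specified with the open-source semopy package in Python. This is a minimal sketch under assumptions: the CSV file name is hypothetical, the item and construct names mirror Table 1, and the specified paths follow the hypothesized model rather than the authors' actual AMOS setup.

```python
import pandas as pd
import semopy

# Lavaan-style model description: "=~" defines latent constructs (CFA part),
# "~" defines the hypothesized structural regressions.
MODEL_DESC = """
PU   =~ PU1 + PU2 + PU3
PEOU =~ PEOU1 + PEOU2 + PEOU3
TRU  =~ TRU1 + TRU2 + TRU3
BI   =~ BI1 + BI2
PPC  =~ PPC1 + PPC2 + PPC3
SE   =~ SE1 + SE2 + SE3
TA   =~ TA1 + TA2 + TA3 + TA4
SA   =~ SA1 + SA2

PU   ~ PEOU + PPC + SE + TA + SA
PEOU ~ PPC + SE + TA + SA
TRU  ~ SE + TA
BI   ~ PU + PEOU + TRU
"""

df = pd.read_csv("vui_survey_items.csv")   # hypothetical file of item responses

model = semopy.Model(MODEL_DESC)
model.fit(df)                               # estimate loadings and path coefficients

print(model.inspect())                      # estimates, standard errors, p-values
print(semopy.calc_stats(model))             # fit indices such as CFI, GFI, RMSEA
```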

5. Results

5.1. Reliability and Validity

The measurement model shows a good fit: χ²/df = 1.979, GFI = 0.926, SRMR = 0.040, RMSEA = 0.048, CFI = 0.965, NFI = 0.931. Cronbach's alpha was calculated for the reliability tests. Results revealed satisfactory reliability for all measures against a threshold of 0.7, except for perceived physical conditions, which exceeds 0.6 [ 100 ]. The AVE values of all measures are above 0.5, except for PPC. Next, CFA was conducted to assess validity, including unidimensionality, convergent validity and discriminant validity. Results showed that the measures exhibited adequate validity (see Table 3 and Table 4 for details). Specifically, the standardized loadings of all items are above 0.5. Although most average variance extracted (AVE) values are above the threshold of 0.5, the AVE for PPC is slightly lower than 0.5. Considering that the composite reliability for PPC is higher than 0.6, the convergent validity of the construct is still adequate [ 101 ]. As for discriminant validity, the square root of AVE should be higher than the inter-construct correlations in the model; however, some square roots are lower than the correlations (e.g., among BI, PEOU and PU). Thus, it was necessary to further examine whether the model achieved satisfactory discriminant validity. Recent research suggests that HTMT is a powerful criterion for discriminant validity assessment [ 102 ]. The HTMT results (see Table 5 ) showed that all values are below the threshold of 0.9, suggesting acceptable discriminant validity. In addition, the composite reliabilities of all constructs were above 0.7. Taken together, the measures used in this study showed satisfactory validity [ 103 , 104 ].
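The convergent-validity figures reported here can be recomputed directly from the standardized loadings in Table 3. The short sketch below uses the standard definitions of AVE and composite reliability; the PPC loadings are taken from Table 3, and the function names are simply illustrative.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error)

# Standardized loadings of the PPC items (Table 3).
ppc = [0.753, 0.703, 0.521]
print(round(average_variance_extracted(ppc), 3))  # ~0.444, below the 0.5 threshold
print(round(composite_reliability(ppc), 3))       # ~0.700, above the 0.6 threshold
```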

Table 3. Reliability and unidimensionality.

Construct | Variable | Cronbach's Alpha | Standardized Loading | C.R./t-Value | AVE | Composite Reliability
BI | BI1 | 0.786 | 0.813 | - | 0.649 | 0.787
 | BI2 | | 0.798 | 17.376 | |
PU | PU1 | 0.755 | 0.742 | - | 0.509 | 0.756
 | PU2 | | 0.705 | 13.035 | |
 | PU3 | | 0.692 | 12.814 | |
PEOU | PEOU1 | 0.747 | 0.733 | - | 0.547 | 0.783
 | PEOU2 | | 0.747 | 12.648 | |
 | PEOU3 | | 0.739 | 12.549 | |
TRU | TRU1 | 0.875 | 0.875 | - | 0.708 | 0.879
 | TRU2 | | 0.768 | 18.5 | |
 | TRU3 | | 0.877 | 22.132 | |
PPC | PPC1 | 0.641 | 0.753 | - | 0.444 | 0.700
 | PPC2 | | 0.703 | 6.286 | |
 | PPC3 | | 0.521 | 5.637 | |
SE | SE1 | 0.833 | 0.752 | - | 0.633 | 0.838
 | SE2 | | 0.828 | 16.006 | |
 | SE3 | | 0.806 | 15.687 | |
TA | TA1 | 0.950 | 0.926 | - | 0.826 | 0.950
 | TA2 | | 0.921 | 32.861 | |
 | TA3 | | 0.883 | 29.25 | |
 | TA4 | | 0.905 | 31.302 | |
SA | SA1 | 0.763 | 0.733 | - | 0.623 | 0.767
 | SA2 | | 0.842 | 14.013 | |

Table 4. Constructs correlation matrix.

 | PPC | SE | TA | SA | TRU | PU | PEOU | BI
SE | 0.499 | | | | | | |
TA | −0.009 | 0.153 | | | | | |
SA | 0.360 | 0.411 | −0.086 | | | | |
TRU | 0.378 | 0.563 | 0.105 | 0.610 | | | |
PU | 0.340 | 0.347 | −0.177 | 0.734 | 0.439 | | |
PEOU | 0.441 | 0.697 | −0.057 | 0.658 | 0.672 | 0.683 | |
BI | 0.405 | 0.523 | −0.115 | 0.764 | 0.610 | 0.853 | 0.840 |

Table 5. HTMT analysis of discriminant validity.

 | PPC | SE | TA | SA | TRU | PU | PEOU | BI
SE | 0.556 | | | | | | |
TA | 0.015 | 0.153 | | | | | |
SA | 0.405 | 0.403 | 0.081 | | | | |
TRU | 0.415 | 0.575 | 0.096 | 0.644 | | | |
PU | 0.397 | 0.340 | 0.177 | 0.720 | 0.437 | | |
PEOU | 0.520 | 0.726 | 0.058 | 0.693 | 0.729 | 0.712 | |
BI | 0.444 | 0.525 | 0.115 | 0.757 | 0.622 | 0.852 | 0.881 |

5.2. Structural Model Assessment

Structural equation modeling was used to analyze the proposed research model with AMOS 24.0. The results revealed the absolute and incremental fit indices shown in Table 6 . All values meet the recommended thresholds [ 105 ], which indicates that the data fit the proposed model well and are adequate for further path analysis.

Table 6. Goodness-of-fit test.

Category | Measure | Acceptable Values | Value
Absolute fit indices | Chi-square/d.f. | 1–5 | 2.248
 | GFI | 0.90 or above | 0.913
 | SRMR | 0.08 or below [ ] | 0.065
 | RMSEA | 0.08 or below [ ] | 0.055
 | NFI | 0.90 or above | 0.920
Incremental fit indices | IFI | 0.90 or above | 0.954
 | TLI | 0.90 or above | 0.942
 | CFI | 0.90 or above | 0.953

Note: GFI = goodness-of-fit index; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation; NFI = normed fit index; IFI = incremental fit index; TLI = Tucker–Lewis index; CFI = comparative fit index.

5.3. Hypotheses Testing and Path Analysis

Path analysis was conducted through SEM to examine the relationships among the variables. The results of the path analyses can be found in Figure 3 and Table 7 . Ten out of fourteen hypotheses were supported or partially supported. Behavior intention was predicted by perceived usefulness, perceived ease of use and trust, with 64.34% of its variance explained. Perceived usefulness was the most determinant variable, followed by perceived ease of use and trust. Perceived usefulness was predicted by perceived ease of use, mobile self-efficacy and self-actualization, with 51.09% of its variance explained. Perceived ease of use was explained by self-efficacy, technology anxiety and self-actualization (30.07% of variance). Trust was influenced by self-efficacy, with 54.02% of its variance explained.

Figure 3. Results of SEM. Note: * p < 0.1; ** p < 0.05; *** p < 0.01.

Table 7. Results of hypotheses testing.

Hypothesis | Path Direction | Path Coefficient | p-Value | Result
H1-1 | PU → BI | 0.655 | *** | Supported
H1-2 | PEOU → BI | 0.458 | *** | Supported
H1-3 | PEOU → PU | 0.595 | *** | Supported
H2 | TRU → BI | 0.068 | 0.028 ** | Supported
H3-1 | PPC → PEOU | 0.015 | 0.209 | Not supported
H3-2 | PPC → PU | 0.078 | 0.277 | Not supported
H4-1 | SE → PEOU | 0.407 | *** | Supported
H4-2 | SE → PU | −0.216 | 0.005 ** | Supported
H4-3 | SE → TRU | 0.735 | *** | Supported
H5-1 | TA → PEOU | −0.019 | 0.090 * | Partially supported
H5-2 | TA → PU | −0.015 | 0.188 | Not supported
H6-1 | SA → PEOU | 0.367 | *** | Supported
H6-2 | SA → PU | 0.332 | *** | Supported

Note: * p < 0.1; ** p < 0.05; *** p < 0.01.

In terms of the influences of aging characteristics, mobile self-efficacy has significant effects on perceived usefulness, trust, and perceived ease of use. Technology anxiety has a marginally significant negative influence on perceived ease of use. Self-actualization significantly influences perceived usefulness and perceived ease of use.

6. General Discussion

The current study contributes to the prior literature in several ways. To begin with, although previous research has discussed the acceptance of emerging media and technologies, limited attention has been given to VUI [ 3 ], especially in the context of China, which has one of the largest elderly populations in the world [ 12 ]. Despite the popularity of VUI, older adults' adoption intention has been largely overlooked in China. This study addresses the research gap by proposing a model that provides insights into the factors influencing older adults' adoption of VUI in China. In addition, little research has comprehensively considered the characteristics of both VUI and older adults by incorporating the construct of trust together with aging-related characteristics (i.e., perceived physical conditions, mobile self-efficacy, technology anxiety, self-actualization). To address this gap, this study started from TAM and extended the model to gain a more thorough insight into the behavior of the elderly in this digital era. Results revealed that three factors determine Chinese older adults' adoption of VUI: perceived usefulness, perceived ease of use and trust.

Specifically, the results reveal several important findings. Consistent with previous studies on TAM [ 2 ], the findings confirm that perceived usefulness, perceived ease of use, and trust are three important factors explaining Chinese older adults' adoption of VUI. Results further reveal that aging-related characteristics influence older adults' perceptions of ease of use, usefulness and trust. This study finds a positive relationship between trust and the adoption intention of VUI. Trust has been demonstrated to be an important factor in the contexts of e-commerce, e-government and technology adoption [ 48 , 108 ]. In the context of VUI, because VUI systems need to perform monitoring functions all the time, users have to share their daily conversations with the systems. The exposure of personal information can make users feel uncomfortable and vulnerable, which hinders adoption of VUI. In this case, trust becomes a crucial factor: users' belief that their personal information will be protected can largely alleviate these negative feelings and facilitate adoption of VUI. Consistent with prior research that found trust to influence young adults' adoption of VUI in the U.S. [ 40 ], the results of this study show a similar pattern: users with a higher degree of trust have a stronger adoption intention toward VUI.

This study also reveals the influences of aging-related characteristics. Among them, perceived physical conditions did not show any significant influence on perceived usefulness or perceived ease of use. These findings are consistent with previous studies [ 61 ]. One possible explanation is that healthy conditions serve as a precondition for older adults' adoption of VUI, but perceived physical condition itself does not naturally lead to a stronger adoption intention. In other words, relatively healthy physical conditions equip older adults with acceptable physical and cognitive capabilities for using VUI, but better health does not by itself improve their intention to use it. For instance, good hearing enables older adults to use VUI, but better hearing does not increase their intention to use it. More likely, older adults' perceptions of VUI are driven by other factors, such as technology anxiety.

Different from our hypotheses, no significant influence of technology anxiety was found on perceived usefulness or trust. In line with previous studies [ 61 ], technology anxiety lowers older adults' perceived ease of use of VUI, although the effect is only marginally significant ( p < 0.1). One explanation is that the benefits of VUI are already well acknowledged by older adults, so anxious emotions do not significantly affect their perception of usefulness. Unlike other interaction methods (e.g., GUI) that require considerable effort to learn, VUI closely resembles natural speech in daily life. This similarity makes older adults feel that VUI is approachable and easy to acquire, so the anxiety triggered by technology is largely alleviated by the intuitiveness of VUI. Thus, no significant influences of technology anxiety on perceived usefulness or trust were detected.

In terms of mobile self-efficacy, as expected, it positively affects perceived ease of use and trust. Extensive experience with mobile devices gives users a better capability to learn VUI, and thus a more positive perception of ease of use. Similarly, experience with other technological applications, such as e-commerce, translates into higher trust in VUI: through their previous experience, users understand that technology providers have an obligation to protect personal information and that laws and regulations prohibit its misuse. Therefore, older adults with a higher level of mobile self-efficacy form a higher degree of trust in VUI. However, a higher level of self-efficacy does not bring a higher perception of usefulness. Instead, high self-efficacy is found to lower older adults' perception of the usefulness of VUI. This finding indicates that older adults with a higher level of self-efficacy show stronger resistance to VUI. Specifically, older adults who are skillful with traditional interaction methods may feel that those methods already satisfy their needs and that it is unnecessary to switch to VUI. Consequently, they have a more negative perception of the usefulness of VUI.

As for self-actualization, consistent with our hypotheses, it positively relates to perceived usefulness and perceived ease of use. Self-actualization is an intrinsic motivation to make achievements [ 87 ]. In line with previous studies showing that a higher level of self-actualization is associated with older adults' adoption of new technologies [ 61 , 72 , 109 ], this study further confirms this notion by revealing a positive relationship between self-actualization and perceptions of VUI. Chinese older adults view using VUI as a chance for personal development.

6.1. Practical Implications for Facilitating VUI Adoption

Chinese older adults' adoption of smart devices remains relatively low [ 110 ]. Complicated interaction is one of the barriers to older adults' effective usage of smart devices, and using VUI as an interaction method could help them use smart products more effectively. This study finds that older adults' adoption of VUI is predicted by perceived usefulness, perceived ease of use and trust. These factors also serve as mediators of the influences of technology anxiety, mobile self-efficacy and self-actualization on older adults' adoption of VUI. These findings have valuable implications for developers and promoters who want to build better VUIs and plan communication strategies that facilitate adoption by older adults.

Developers should improve the speech recognition quality and language processing quality of VUI, as older adults show a higher adoption intention when they perceive VUI as more useful and easier to use. Both usefulness and ease of use rely on speech recognition accuracy and natural language processing capability: more accurate recognition of users' voice commands and better comprehension of their intended meanings further improve VUI's usefulness and ease of use. For improving perceived usefulness specifically, developers should carefully assess the contexts in which VUIs are used. VUIs can be particularly helpful for complex interaction tasks that require multiple steps, such as searching and navigation, and for tasks that are difficult for older adults due to declining capabilities, such as typing and dialing.

To improve users' perception of ease of use, developers can also make voice interaction simple and intuitive. Incorporating interpersonal communication techniques into VUI can be particularly helpful for older adults. Designers can consider creating a personality for the VUI, which can largely reduce the psychological distance perceived by older adults; they should carefully consider how to craft a desirable personality, including gender, tone, and speaking style. As older adults experience declining cognitive capacity, it is helpful to use short, easy-to-remember phrases, such as 'OK' and 'got it'. When certain information needs to be highlighted, it is also useful to slow down the speech rate and increase the volume of voice prompts.
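As a concrete illustration of the "slow down and raise the volume" advice, most commercial voice platforms expose prosody control through SSML. The snippet below is a minimal sketch assuming an SSML-capable text-to-speech engine (such as the ones behind Alexa or Google Assistant); the function name and the prompt wording are illustrative, not taken from any specific product.

```python
def emphasized_prompt(text: str, rate: str = "slow", volume: str = "loud") -> str:
    """Wrap an important prompt in SSML prosody tags so it is spoken
    more slowly and at a higher volume for older listeners."""
    return f'<speak><prosody rate="{rate}" volume="{volume}">{text}</prosody></speak>'

# Hypothetical confirmation prompt for a reminder task.
print(emphasized_prompt("OK. Your reminder is set for eight in the morning."))
```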

Moreover, it is important to improve trust between older adults and VUI. Developers could explore new technological solutions to improve privacy when using VUI. When promoting VUI, marketers could highlight the sophisticated technologies used to protect privacy as well as the agreements with users for protecting personal information. Policymakers could also explain the legal regulations protecting users' information and the serious consequences of misusing personal information.

This study further shows the influences of mobile self-efficacy, technology anxiety, and self-actualization, which are useful for developers and marketers. Older adults with a higher level of mobile self-efficacy show a higher perception of ease of use and trust, but a lower perception of the usefulness of VUI. This indicates that a higher level of mobile self-efficacy makes older adults more resistant to the benefits of VUI. When promoting VUI, marketers therefore need different communication strategies for older adults with low versus high mobile self-efficacy. It is necessary to highlight the benefits of VUI, especially its relative advantages compared with previous interaction methods, and it may be effective to first target older adults with a low level of mobile self-efficacy. Moreover, VUI appears to be an intuitive interaction method, so the influence of technology anxiety is relatively limited: technology anxiety is only marginally and negatively related to perceived ease of use. Therefore, developers and marketers do not need to invest extensive effort in reducing technology anxiety. In addition, self-actualization is found to positively influence perceived ease of use and perceived usefulness. This finding indicates that marketers should convey the message that using VUI is a channel for personal development, through multiple channels such as short videos on social media and graphic posters in public places. These efforts could facilitate older adults' adoption of VUI.

6.2. Practical Implications for Using VUI in Smart Home Systems

Older adults show resistance to adopting smart home devices, even though they stand to gain substantial benefits from smart home systems. Integrating VUI into smart home systems is a promising way to facilitate older adults' adoption of such systems. The results of this research therefore provide implications not only for older adults' adoption of VUI but also for their adoption of smart home systems.

For developers, when integrating VUI into smart home devices, particular attention should be paid to users' perception of ease of use and usefulness. For some smart products, such as smart speakers, the integration of VUI can largely improve perceived ease of use and usefulness because smart speakers provide various functions that require complex interactions; in this case, VUI greatly reduces older adults' learning burden. In contrast, for products that require only simple interactions, integrating VUI may not be an optimal choice because the improvements in perceived usefulness and ease of use remain limited. For instance, for a cleaning robot whose function is to clean floors autonomously, users interact with it by pressing a start button, which is direct and simple, and upon completion they have to physically empty the dust container. Because of this simple interaction and the need for physical handling, adding VUI to cleaning robots might not substantially improve perceived ease of use or usefulness. As developing and integrating VUI into smart devices is costly, developers should carefully consider where VUI is appropriate.

This study also shows the influences of aging-related characteristics on older adults' adoption of VUI, which could also help explain their adoption of smart home devices. Specifically, mobile self-efficacy may lower users' perception of the usefulness of smart home devices, similar to its effect on perceptions of VUI: users who are very familiar with current mobile devices may feel that these devices already satisfy their needs, making it unnecessary to switch to smart home devices. Therefore, to promote older adults' adoption of smart home devices, it would be useful to highlight the benefits these devices provide and to target users who are less familiar with mobile devices.

In addition, we found a positive relationship between self-actualization and adoption of VUI. It is possible that self-actualization also positively influences older adults’ adoption of smart home devices. When older adults have a higher level of self-actualization, they are more motivated to adopt VUI because they view learning VUI as a chance for personal development. Similarly, for older adults with high self-actualization, learning to use smart home systems could also become an opportunity for them to gain new experiences. Thus, to promote smart home devices, companies should highlight self-actualization messages and target older adults who have a relatively high level of self-actualization.

6.3. Limitations and Future Research

Although this study was carefully prepared, it carries several limitations. First, data were collected online. According to CNNIC, 70% of older adults in China are frequent users of the Internet and mobile Internet [ 110 ], and smartphone adoption exceeds 80%. The high penetration of smartphones and the Internet makes it feasible to collect data online, and because VUI is often integrated into smart products, online sampling is also suitable. However, older adults who are less active online might not be covered in this sample. In other words, whether these results apply to older adults who are not frequent Internet users still requires further validation, which could be an interesting direction for future research. Moreover, this study provides evidence on the potential usage of VUI among the target population. Future studies could use field experiments to validate the current findings; in particular, it would be interesting to recruit elderly participants who have hands-on experience with VUI, which could yield more specific guidelines for developing usable VUI for older adults.

In addition, the average age of participants is 59; such participants are often labeled the 'young old'. This group accounts for a large proportion of China's older population, and thus it is worthwhile to focus on them. However, this group could differ from older adults above 65, so future research could replicate this study with older age groups. Moreover, this study focuses on VUI adoption intention and older adults' general perception of VUI. In other words, although older adults are willing to adopt VUI in their daily lives, their actual and continued usage remains unknown. Actual usage might be influenced by other factors, such as usability and usage scenarios. Future research could conduct user studies to identify usability issues with VUI and generate guidelines for VUI development, which can further facilitate adoption.

7. Conclusions

VUI has gained popularity in this decade. It has been integrated with various smart home devices and developed for many usage scenarios. The benefits of VUI should be available to everyone, including older adults, who account for 25% of the overall population in China. This study investigates the factors that influence older adults' adoption of VUI in China. On the basis of TAM, it proposes a theoretical model to predict older adults' adoption of VUI by incorporating the construct of trust and aging-related characteristics (i.e., perceived physical conditions, mobile self-efficacy, technology anxiety, self-actualization). A survey was conducted with 420 participants who are current or potential users of VUI. Data were analyzed through SEM and showed a good fit with the proposed theoretical model. Results revealed that older adults' adoption is determined by perceived usefulness, perceived ease of use and trust. These factors also mediate the influences of aging-related characteristics on older adults' adoption of VUI. Specifically, mobile self-efficacy has positive influences on trust and perceived ease of use, but a negative influence on perceived usefulness. Self-actualization has positive influences on perceived usefulness and perceived ease of use. Technology anxiety exerts only a marginally significant influence on perceived ease of use. No significant influences of perceived physical conditions were found. These results extend TAM and STAM by incorporating additional variables, and they provide valuable implications for practice.

Author Contributions

Conceptualization, Y.Y. and P.C.; methodology, Y.Y.; software, Y.Y.; validation, Y.Y.; formal analysis, Y.S. and P.C.; investigation, Y.Y. and P.C.; resources, P.C.; data curation, Y.Y.; writing—original draft preparation, Y.Y.; writing—review and editing, Y.S. and P.C.; visualization, Y.Y.; supervision, P.C.; project administration, P.C.; funding acquisition, Y.S. and P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Humanities and Social Science projects of the Ministry of Education in China, grant number 20YJC760009; Shenzhen Science and Technology Innovation Commission under Shenzhen Fundamental Research Program, grant number JCYJ20190806142401703; and the Fundamental Research Funds for the Central Universities, grant number YJ202203.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Voice User Interface in Autonomous Vehicles

Since the invention of computers, human-computer interaction has gone through various stages, from keyboard-and-mouse interaction to the touch screen, to name a few. Meanwhile, the user interface has undergone tremendous changes in recent years. The voice interface is gradually replacing the graphical user interface and quickly becoming a common part of in-vehicle experiences. Products that use voice as the primary interface are growing more popular by the day, and the number of users continues to grow. This case study explores the application of voice interfaces in the automotive field.


Voice input as a form of human-computer interaction is intuitive for users. Users are not limited to specific grammatical rules when they interact with the system and can construct their inputs in different ways, just as they do in conversations with other humans. This case study explores the use of voice interfaces in the context of autonomous driving. Benefits of a Voice User Interface:

  • Fewer screens, hence lower interaction costs
  • Frees up users' hands
  • More emotional and personalized
  • Accessibility: good for people with poor eyesight


At the core of VUI lies people’s language ability.

The key takeaway is that we should not separate speech from UI design. Just like at a live music show, all five of our senses work together. When and where to apply GUI or VUI really depends: some information is easier to process when we see it, while in other cases VUI is more suitable. Here are a few examples:

  • GUI: long lists of options, charts with large amounts of data, product information and product comparisons.
  • VUI: simple commands and user instructions, warnings and notifications.


At the end of the day, designers are tasked with bridging technology and users. We design products to meet users' needs, and this user-centric perspective does not change in VUI design. From a technical standpoint, speech recognition technology converts the user's speech into text. The computer then processes and understands the text through segmentation and parsing, triggers actions, and sends feedback back to the user. Within this technical framework, designers should analyze user intent and design the conversational experience via scripts. The challenge is that the association between users' language and their intent is not always straightforward.
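To make the speech-to-text, parsing and intent-triggering pipeline described above more concrete, here is a deliberately simplified Python sketch of keyword-based intent matching for an in-car assistant. Real systems use statistical natural language understanding rather than keyword rules; the intents, keywords and responses below are illustrative assumptions, not part of the case study.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Intent:
    name: str
    keywords: Tuple[str, ...]   # any matching keyword triggers the intent
    action: Callable[[], str]   # action to run, returning spoken feedback

def open_window() -> str:
    return "Opening the driver-side window."

def report_weather() -> str:
    return "It is 22 degrees and sunny along your route."

INTENTS = [
    Intent("open_window", ("window",), open_window),
    Intent("weather", ("weather", "rain", "sunny"), report_weather),
]

def handle_utterance(transcript: str) -> str:
    """Map a recognized transcript to the first intent with a matching keyword,
    trigger its action, and return the feedback to be spoken to the user."""
    words = transcript.lower().split()
    for intent in INTENTS:
        if any(keyword in words for keyword in intent.keywords):
            return intent.action()
    return "Sorry, I did not catch that."

print(handle_utterance("What is the weather like today?"))
```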

How do we analyze users' intentions in VUI?


Conversational Structure of Natural Language

Users will only believe that computers understand what they mean if they get feedback that meets their expectations. A complete conversational structure in natural language has a 'start module' and an 'end module', with topic nodes in between.

Analyze user intent using replacement: we can derive a variety of user needs and responses by dissecting a fairly general user intent into components and re-combining those components into a series of more complex user needs. Take this autonomous driving case as an example. Suppose users want to ask about the weather while in the car by striking up a conversation. They ask questions about the weather not only to obtain weather information; we should also be able to expand the topic and add dimensions to the conversation with related topics such as safety, travel, health, food, and mood.

Adobe XD is a powerful prototyping tool for voice interfaces. When users articulate a particular word or phrase, the utterance triggers the speech-to-text engine, and the prototype reacts with the words or sentences defined by the designer.


Will senior adults accept being cognitively assessed by a conversational agent? A user-interaction pilot study

Open access | Published: 15 June 2024

  • Moisés R. Pacheco-Lorenzo (ORCID: orcid.org/0000-0002-0424-8850)
  • Luis E. Anido-Rifón
  • Manuel J. Fernández-Iglesias
  • Sonia M. Valladares-Rodríguez

Background: Early detection of dementia and Mild Cognitive Impairment (MCI) is of utmost importance nowadays, and smart conversational agents are becoming more and more capable. DigiMoCA, an Alexa-based voice application for the screening of MCI, was developed and tested. Objective: To evaluate the acceptability and usability of DigiMoCA, considering the perception of end-users and cognitive assessment administrators, through standard evaluation questionnaires. Method: A sample of 46 individuals and 24 evaluators participated in this study. End-users were fairly heterogeneous in demographic and neuro-psychological characteristics. Evaluators were mostly health and social care professionals, relatively well balanced in terms of gender, career background and years of experience. Results: End-users' acceptability ratings were generally positive (above 3 on a 5-point scale for all dimensions) and improved significantly after the interaction with DigiMoCA. Administrators also rated the usability of DigiMoCA, with an average score of 5.86/7 and high internal consistency ( \(\alpha \) = 0.95). Conclusion: Although there is still room for improvement in terms of user satisfaction and the voice interface, DigiMoCA is perceived as an acceptable, accessible and usable cognitive screening tool, both by individuals being tested and by test administrators.


1 Introduction

In a world with an ever-increasing human lifespan, the quality of life of senior adults is becoming more and more relevant. According to WHO [ 1 ], the number of people over the age of 60 will increase by 34% between 2020 and 2030, and with it the prevalence of neuro-psychiatric disorders, particularly dementia, which has an extremely high impact on people's well-being as well as on social and economic aspects.

Mild Cognitive Impairment (MCI) is the transition stage between healthy aging and dementia and is characterized by subtle cognitive deficits that do not meet the criteria for diagnosis of a major neuro-cognitive disorder (DSM-V) [ 2 ]. These difficulties can manifest themselves in areas such as memory, attention, language, orientation or decision making. Detecting MCI in its early stages is therefore beneficial for preventing the progression of the disease and, in certain cases, for slowing down some of its symptoms. However, in most cases cognitive deficits are detected only when the symptoms are already evident and the underlying neurological disorder has been present for some time [ 3 ], meaning the disease has already progressed. The traditional screening method for early detection of cognitive impairment involves clinically-validated gold-standard tests that assess a person's cognitive state.

The inception of these tests traces back to the second half of the 20th century. One of the first widely used screening tools was the Mini-Mental State Examination (MMSE), published by Folstein [ 4 ] in 1975; it includes items on orientation, concentration, attention, verbal memory, naming and visuospatial skills. In the 1980s, the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) was developed [ 5 ]; it includes 7 items, namely word recall, naming, commands, constructional praxis, ideational praxis, orientation and word recognition.

One of the limitations of these evaluation instruments is the fact that they are dementia-oriented, particularly Alzheimer’s. Therefore, in later years other screening tools were created, e.g., the Montreal Cognitive Assessment (MoCA) [ 6 ] test, which has a 90% sensitivity for MCI detection (MMSE is not sensitive to MCI). Its telephone version (T-MoCA) [ 7 , 8 ] is also validated and has a strong correlation with MoCA with a Pearson coefficient of 0.74.

The fact that MoCA is oriented at MCI detection makes it suitable as a screening tool for an early diagnosis.

In this context, the use of Information and Communication Technologies (ICT) could be a valuable tool for the early detection of MCI in a reliable and efficient way, and smart conversational agents are a disruptive technology with the potential to help detect neuro-psychiatric disorders in early stages [ 9 , 10 ]. Note that the penetration of these technological tools among senior adults is not as high as in other age groups, which makes them even more relevant.

Previous research demonstrated that it is possible to implement a voice-based version of a gold standard test for cognitive assessment using conversational agents [ 11 ]. More specifically, DigiMoCA, an Alexa voice application based on T-MoCA, was developed and tested with actual elderly people using a smart speaker.

DigiMoCA makes use of Alexa's voice recognition and natural language processing services, and is able to persistently store and retrieve session data in DynamoDB (Amazon's NoSQL database service). Additionally, DigiMoCA uses prosodic annotations to adapt the speech rate to the user, and collects the response time to each item using a statistical estimation of roundtrip times. This information is subsequently used to enhance DigiMoCA's CI screening performance. DigiMoCA was evaluated using the Paradigm for Dialogue System Evaluation (PARADISE), yielding a confusion matrix with a Kappa coefficient \(\kappa = 0.901\) . This means DigiMoCA understands the user approximately 90% of the time, which is equivalent to “almost perfect” [ 12 ] in terms of task completion performance.
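As a rough illustration of the session persistence mentioned here, an Alexa skill backend can write per-item results to DynamoDB through boto3. The table name, key schema and stored fields below are assumptions made for the sketch; they are not DigiMoCA's actual data model.

```python
import time
import boto3

# Hypothetical table with partition key "user_id" and sort key "item_ts".
table = boto3.resource("dynamodb").Table("DigiMoCASessions")

def save_item_result(user_id: str, item_id: str, answer: str, response_time_ms: int) -> None:
    """Persist one test-item result so the session can be resumed and scored later."""
    table.put_item(Item={
        "user_id": user_id,
        "item_ts": int(time.time()),
        "item_id": item_id,
        "answer": answer,
        "response_time_ms": response_time_ms,
    })

# Example call with hypothetical values.
# save_item_result("user-042", "delayed_recall_1", "face, velvet, church", 4180)
```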

The main objective of this work is to analyze the acceptability and usability of DigiMoCA through a user interaction pilot study [ 13 ]. For this, the perception of senior end-users as well as administrators was collected by means of standard evaluation questionnaires, and the outcomes were analyzed using standard statistical procedures.

Thus, the research question posed is:

Is the screening tool DigiMoCA acceptable and usable for the cognitive evaluation of senior adults, both by them and their evaluators?

Section 2 describes the sample of participants, the study design and the data analysis carried out; Section 3 presents and discusses the findings of the study, from the senior end-users' as well as the administrators' points of view; finally, Section 4 summarizes the results of this research.

2 Material and methods

This user-interaction study included the participation of 46 senior end-users and 24 sector-related professionals. According to previous relevant works [ 14 , 15 ], in order to calculate the number of participants for a pilot study we need to take into account: (1) the parameters to be estimated; (2) that at least 30 participants are involved; (3) a minimum confidence interval of 80% is required. The present study fits all three criteria.

Senior end-users participated through two associations: Parque Castrelos Daycare Center (PCDC) and the Association of Relatives of Alzheimer's Patients (AFAGA), both located in the city of Vigo (Spain). Before the start of each study, applications were submitted to the Research Ethics Committee of Pontevedra-Vigo-Ourense, containing: (1) the objectives of the study, main and secondary; (2) the methodology proposed, i.e., the tests and questionnaires to administer, inclusion and exclusion criteria, recruiting procedure within the association, sample size and structure, and detailed schedule; (3) security concerns and how to address them (anonymization and encryption); (4) ethical and legal aspects, particularly regarding data privacy; and finally, (5) a copy of the informed consent to be signed in advance by all participants. The applications for AFAGA and PCDC were approved by the corresponding dictums with registration codes 2021/213 and 2023/115, respectively.

Inclusion criteria for senior participants consisted mainly of being over the age of 65 and not having an advanced state of dementia, any other psychological pathology, or any auditory/vocal disability. Table 1 collects the demographic characteristics of the end-user participants, classified by cognitive group. The mean age was 78.61 ± 6.75, with 65% of them being female, and the number of individuals is fairly evenly distributed across groups. For cognitive state classification, we used the Global Deterioration Scale (GDS) [ 16 ], a widely utilized scale that describes the stage of cognitive impairment, with higher GDS scores meaning more deterioration. For additional information, we also report the T-MoCA scores of the healthy (HC), MCI and dementia (AD) groups, as well as the Memory Failures of Everyday (MFE) [ 17 ] questionnaire and the Instrumental Activities of Daily Living (IADL) scale [ 18 ].

Administrator participants, on the other hand, were affiliated with several institutions, namely the Unit of Psychogerontology at the University of Santiago de Compostela, the Galicia Sur Health Research Institute, the Multimedia Technology Group at the University of Vigo, and also AFAGA and PCDC. Table 2 presents information about these participants, who are predominantly from the health field. The sample has a 58.33% female composition, mostly middle-aged, and is fairly evenly distributed among different backgrounds. There is also variety in terms of seniority, ranging from less than 5 years of experience (29.17%) to more than 20 (20.83%).

2.1 Study design

The study was organized along 3 different sessions: during the first one, T-MoCA, MFE and IADL questionnaires were administered; during the second, and after at least two weeks in between, DigiMoCA administration took place. Finally, again after two or more weeks, a second administration of DigiMoCA was carried out during the third session.

Before the first and after the second conversation with the agent, participants were asked to answer a Technology Acceptance Model (TAM) [ 20 ] questionnaire, which covers how users come to accept a technological system. To determine the acceptability of the conversational agent, the designed TAM questionnaire addressed 3 dimensions:

Perceived usefulness (PU) . It measures whether a participant finds the smart speaker useful, both as a general concept, and specifically during the cognitive assessment sessions.

Perceived ease-of-use (PEoU) . It measures whether the conversation with the speaker was comfortable and straightforward for the user, purely in terms of communication.

Perceived satisfaction (PS) . It measures whether the user enjoyed the utilization of the speaker, and whether they prefer it to a human counterpart (i.e., another person conducting T-MoCA as an interviewer).

The resulting questionnaire consisted of a 5-point Likert rating scale composed of 6 items, 2 for each main dimension (1 meaning strongly negative/disagree, 5 strongly positive/agree, 3 neutral). For reference, the TAM questionnaire used is available in Section 1 , translated to English.

In addition to studying how end-users interacted with DigiMoCA, another study was conducted to gather the opinions of cognitive evaluation administrators on its usability and user-friendliness. These were individuals who were either responsible for administering cognitive assessment tools to older adults or had expertise related to application development and voice assistants. A 7-point Likert scale questionnaire based on the Post-Study System Usability Questionnaire (PSSUQ) [ 21 ] was used (1 meaning strongly disagree, 7 strongly agree, 4 neutral). The English translation of the PSSUQ questionnaire used is available in Section 2 .

The PSSUQ-based questionnaire was designed in order to evaluate 3 usability dimensions:

System usefulness: measures the ease of use and convenience. In the designed version, it includes the average scores of items 1 to 8.

Information quality: measures the usefulness of the information and messages provided by the application. Includes average scores of questions 9 to 14.

Interface quality: measures the friendliness and functionality of the user interface of the system. Includes average scores of items 15 to 17 of the questionnaire.

Overall: measures overall usability, computed as the average of the scores of all items (1 to 18 in our case).
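A small sketch of how these subscale scores can be computed from the raw responses, assuming a pandas DataFrame with one row per administrator and hypothetical column names q1 to q18; the item groupings follow the dimensions listed above.

```python
import pandas as pd

# Hypothetical raw data: one row per administrator, columns q1..q18 rated 1-7.
responses = pd.read_csv("pssuq_responses.csv")

DIMENSIONS = {
    "system_usefulness":   [f"q{i}" for i in range(1, 9)],    # items 1-8
    "information_quality": [f"q{i}" for i in range(9, 15)],   # items 9-14
    "interface_quality":   [f"q{i}" for i in range(15, 18)],  # items 15-17
    "overall":             [f"q{i}" for i in range(1, 19)],   # items 1-18
}

scores = pd.DataFrame(
    {name: responses[cols].mean(axis=1) for name, cols in DIMENSIONS.items()}
)
print(scores.agg(["mean", "std"]))   # mean and SD per dimension across administrators
```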

2.2 Data analysis

The following statistical instruments were used to assess acceptability:

Fundamental statistics: mean, standard deviation and percentages.

Cronbach's Alpha ( \(\alpha \) ) [ 22 ] to estimate the reliability, and specifically the internal consistency, of the responses. It is widely used in psychological test construction and interpretation, and it measures how closely test items are related to one another, i.e., whether they measure the same construct. When test items are closely related to each other, Cronbach's alpha will be closer to 1; if they are not, it will be closer to 0. In this study, we use this metric to evaluate the internal consistency of the responses to the TAM (end-user centered) and PSSUQ (administrator centered) questionnaires. It is computed as \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_x^2}\right)\) , where:

k is the number of items/questions included.

\(\sigma _i^2\) is the variance of each item across all responses.

\(\sigma _x^2\) is the total variance, including all items.

According to Gliem [ 23 ], a good interpretation of the value of Cronbach's alpha regarding internal consistency is: \(\alpha > 0.9\) means “excellent”; \(\alpha > 0.8\) means “good”; \(\alpha > 0.7\) means “acceptable”; \(\alpha > 0.6\) means “questionable”; and anything below 0.6 is considered an indicator of low internal consistency.
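A direct translation of the formula above into Python; a minimal sketch assuming `items` is a pandas DataFrame with one column per question and one row per respondent (sample variances, ddof = 1, are used for both terms).

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

# Hypothetical usage: alpha over the two perceived-satisfaction items of the TAM questionnaire.
# alpha_ps = cronbach_alpha(tam_answers[["PS1", "PS2"]])
```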

Student T-tests [ 24 ] were used for comparison of pre-pilot and post-pilot questionnaires, giving insight on the evolution of the acceptability perception of the participants during the administration. Statistical significance was measured by means of p-values.

Cohen's d [ 25 ]: measures the effect size of the T-tests, computed as the standardized mean difference between two groups (in this case, pre-pilot and post-pilot), i.e., the difference between the means divided by the square root of the average of both variances: \(d = \frac{\bar{x}_{post} - \bar{x}_{pre}}{\sqrt{(\sigma_{pre}^2 + \sigma_{post}^2)/2}}\)

Based on Tellez's analysis [ 26 ], the interpretation of Cohen's d is as follows: \(d < 0.2\) is a “trivial effect”; \(0.2< d < 0.5\) is a “small effect”; \(0.5< d < 0.8\) is a “medium effect”; and \(d > 0.8\) is a “large effect”.
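The pre/post comparisons can be reproduced with a paired Student's t-test plus the effect size defined above. A minimal sketch using SciPy; `pre` and `post` hold hypothetical ratings of one questionnaire item before and after the administration.

```python
import numpy as np
from scipy import stats

def cohens_d(pre: np.ndarray, post: np.ndarray) -> float:
    """Difference of the means divided by the square root of the average of both variances."""
    pooled = np.sqrt((pre.var(ddof=1) + post.var(ddof=1)) / 2)
    return (post.mean() - pre.mean()) / pooled

pre = np.array([3, 2, 4, 3, 3, 2, 4])    # hypothetical pre-pilot ratings
post = np.array([4, 3, 4, 4, 3, 3, 5])   # hypothetical post-pilot ratings

t_stat, p_value = stats.ttest_rel(pre, post)   # paired Student's t-test
print(f"d = {cohens_d(pre, post):.2f}, p = {p_value:.3f}")
```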

Statistical analysis was performed using the Google Sheets online tool, as well as Google Colab with Jupyter notebooks written in Python. Several commonly-used data analysis libraries were used (e.g., NumPy, Pandas, Pingouin).

3 Results and discussion

This section presents and analyzes the main results obtained regarding the usability and acceptability of DigiMoCA, both from the end-users’ perspective (sample of n = 46) as well as the administrators’ (n = 24).

3.1 User interaction from senior end-users

As explained in Section 2 , users completed the TAM questionnaire before and after the administration of DigiMoCA. The questionnaire included two sections, each with the 3 dimensions and 6 questions: one focused on technology in general, and another focused on DigiMoCA and conversational agents.

Table 3 presents the results of TAM's 3-dimensional scale, taken from the post-administration evaluation of the DigiMoCA section. The most relevant results are:

Perceived usefulness: a value of 3.87 ± 0.92 was obtained including all groups, with the highest rating within the MCI group (4.11 ± 0.92) and the lowest from the HC group (3.42 ± 0.93). Regarding the internal consistency of the answers, a value of \(\alpha \) = 0.63 was obtained, with the most internally consistent group being HC ( \(\alpha \) = 0.76) and the lowest MCI ( \(\alpha \) = 0.42).

Perceived ease of use: a value of 3.98 ± 0.96 was obtained including all groups. Once again, the highest mean value was found in the MCI group (4.14 ± 0.99), whereas the lowest rating was again obtained within the HC group (3.83 ± 0.96). In terms of internal consistency, a value of \(\alpha \) = 0.73 was obtained overall, with the HC group being the most internally consistent ( \(\alpha \) = 0.96) and MCI the least ( \(\alpha \) = 0.28).

Perceived satisfaction: including all groups we observe a value of 3.27 ± 1.21, in this case with the best rating coming from the AD group (3.47 ± 1.16) and the worst from the HC group (3.00 ± 1.22). Regarding the internal consistency, a value of \(\alpha \) = 0.41 was obtained, with the most internally consistent group again being HC ( \(\alpha \) = 0.56) and the least consistent being MCI ( \(\alpha \) = 0.25).

Overall, we consider these results to be rather positive: none of the ratings drop below 3 (out of 5) on average, either considering the overall sample or any particular group/sub-sample. This means that regardless of the level of cognitive deterioration, the users find DigiMoCA useful, easy to use and satisfactory.

Regarding the internal consistency, however, it is only “acceptable” for one of the dimensions (PEoU), with a worryingly low value for the PS dimension. We believe this inconsistency is caused by the disparity of results obtained from the two questions regarding PS: the first asks whether participants “liked to use DigiMoCA”, and the second whether they would rather “use DigiMoCA instead of T-MoCA”. We observe that the answers to the second question (i.e., after interacting with the agent) are considerably lower than to the first, perhaps due to the comparison between a human-robot interaction and a human-human interaction (which is usually strongly preferred by this demographic group).

Additionally, we can observe a tendency for the MCI group to give the highest ratings but with lowest internal consistency, whereas the HC group usually gives the lowest ratings but with highest internal consistency. One possible explanation for this behavior is that cognitive impairment can interfere with consistent reasoning; it is also likely that users with MCI had more trouble understanding the full implications of the questions posed, giving less consistent answers. Certainly, it is reasonable to believe that healthy users are generally more sensitive to the intrusiveness of these evaluations, hence the slightly lower ratings.

Tables 4 and 5 present the variation in perception between the pre-administration and post-administration of DigiMoCA. Table 4 contains the results regarding the section about technology in general, while Table 5 contains the results of the section about conversational agents. Again, data are classified by TAM dimensions (rows), including the results for each individual question (“.1” and “.2” for each dimension), and by cognitive group (columns): HC, MCI, AD and the whole sample.

The main objective of this analysis is to determine whether the users' perception of acceptability changes significantly after the administration of DigiMoCA. For this, we performed a Student's t-test on the pre and post questions and obtained three metrics: the percentage change between the averages, Cohen's d, and the statistical significance p. The following paragraphs address the main findings of this process.
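For illustration, the sketch below computes the three metrics with SciPy. The ratings are hypothetical rather than the study data, and a paired test is assumed, on the reading that each participant answered both the pre and post questionnaires.

```python
# Minimal sketch of the pre/post comparison metrics reported in Tables 4 and 5:
# percentage change of the mean, Cohen's d, and the t-test p-value.
# The pre/post vectors below are hypothetical illustrative ratings.
import numpy as np
from scipy import stats

pre  = np.array([2, 3, 2, 4, 3, 2, 3, 3], dtype=float)   # ratings before DigiMoCA
post = np.array([3, 4, 3, 4, 4, 3, 4, 3], dtype=float)   # ratings after DigiMoCA

pct_change = 100 * (post.mean() - pre.mean()) / pre.mean()

# Cohen's d using the pooled standard deviation of the two sets of ratings.
pooled_sd = np.sqrt((pre.var(ddof=1) + post.var(ddof=1)) / 2)
cohens_d = (post.mean() - pre.mean()) / pooled_sd

# Paired Student's t-test (assumes each participant answered both questionnaires).
t_stat, p_value = stats.ttest_rel(post, pre)

print(f"change = {pct_change:+.2f}%, d = {cohens_d:.2f}, p = {p_value:.3f}")
```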

Regarding the technology section, there is a percentage increase in all items of the first two dimensions: +6.17% for PU.1 (d = 0.33), +3.05% for PU.2 (d = 0.11), +5.26% for PEoU.1 (d = 0.17) and +9.00% for PEoU.2 (d = 0.44). However, there is only one item (PEoU.2) that exhibits a significant change (p = 0.010). Both items from the PS dimension remain essentially unchanged. Therefore, generally speaking, we can establish that the administration does not significantly change the acceptability of technology in senior adults, but we do observe a non-significant positive change in both PU and PEoU items. Furthermore, if we look at the sample sub-groups independently, we can also observe a positive non-significant change in the vast majority of items, only one of them being significant (PEoU.2 for AD group with +17.08% change; d = 0.84, p = 0.007).

With respect to the conversational agents section, the acceptability shows a more noticeable improvement in most items, three of them being statistically significant, and we also find the first item with a "large effect" size: PU.1 with +59.14% (d = 1.06, p < 0.001), PEoU.2 with +13.71% (d = 0.65, p = 0.005), and PS.1 with +12.22% (d = 0.61, p = 0.005). We should also note that the PS.2 item has a significant decrease of -24.24% (d = 0.95, p < 0.001), but we do not think this particular item is a good representative of the PS dimension, since, as stated previously, the pre and post questions differ, so it should be taken with a grain of salt. If we look at the sample sub-groups independently, none of the significant changes are in the HC group, while most are concentrated in the MCI group: +85.84% (d = 1.29, p < 0.001) for PU.1, +21.61% (d = 1.04, p = 0.007) for PEoU.2, and +16.90% (d = 1.14, p = 0.003) for PS.1. Within the AD group, only PU.1 is statistically significant (+58.59%, d = 1.02, p = 0.013).

In light of the results discussed, it seems reasonable to affirm that the acceptability of conversational agents by senior adults improves significantly after the interaction with DigiMoCA. To support this, we found that at least one item exhibits a statistically significant (p < 0.05) positive change in all 3 dimensions, and if we discard item PS.2, which as pointed out above is probably not representative, all items show an increase in acceptability across all groups.

3.2 Usability perception of DigiMoCA from administrators

In addition to the end-user interaction study, a further study was carried out to measure the usability perception of DigiMoCA among cognitive assessment administrators and professionals. For this, we employed the PSSUQ questionnaire, with items rated on a 7-point Likert scale, which is widely used to measure users' perceived satisfaction with a software system. Table 6 summarizes the results, which are also categorized by gender, field of occupation and years of experience:

Overall usability (OVERALL): we obtain a mean value of 5.86 ± 1.24 for all participants and all items. The mean rating does not change much based on gender or career experience, although the average rating for participants in the technological field was slightly higher (6.26 ± 0.94). The internal consistency obtained was "excellent" ( \(\alpha \) = 0.95) overall, with some slight differences based on gender ( \(\alpha \) being 0.88 for males and 0.97 for females), field of expertise ( \(\alpha \) = 0.96 for the health field, 0.90 for the technological field) and experience ( \(\alpha \) = 0.91 for administrators with 10+ years of experience, 0.97 for those with fewer than 10).

System usefulness (SYSUSE): including items 1 to 8, we obtain a mean value of 5.96 ± 1.14 for all participants. Again, the mean rating is not considerably affected by gender or career experience, but we do obtain a slightly higher value of 6.36 ± 0.94 for participants in the technological field. As for the internal consistency of the answers, we get an "excellent" \(\alpha \) = 0.91 for the whole sample, although it does drop to just "good" for the male group ( \(\alpha \) = 0.85) and the most experienced participants ( \(\alpha \) = 0.88). The lowest internal consistency is found within the technological field, with an "acceptable" \(\alpha \) = 0.76.

Information quality (INFOQUAL): the mean value obtained from items 9 to 14 was 5.74 ± 1.44 overall. Once again, the largest differences found were based on the field of expertise: the technological field group had the highest mean value of 6.17 ± 1.22, while the lowest value was obtained from the health field group (5.63 ± 1.48). The overall internal consistency was \(\alpha \) = 0.90, and we do find differences between the demographic groups: higher consistency for females ( \(\alpha \) = 0.96) than males ( \(\alpha \) = 0.74); higher consistency for the health field group ( \(\alpha \) = 0.91) than the technological group ( \(\alpha \) = 0.79); and higher consistency for the least experienced individuals ( \(\alpha \) = 0.93) than the most experienced ( \(\alpha \) = 0.84).

Interface quality (INTERQUAL): including items 15 to 17, the overall mean rating was 5.81 ± 1.11. For this dimension the mean value for the technological field group was the highest (6.11 ± 0.65), and the mean value for the least experienced group was the lowest (5.71 ± 1.23). As for the internal consistency, this was the dimension with the lowest overall value, an "acceptable" \(\alpha \) = 0.77. Again we find considerable differences between demographic groups: higher \(\alpha \) = 0.88 for females than males ( \(\alpha \) = 0.34), higher \(\alpha \) = 0.80 for the health field than the technological field ( \(\alpha \) = 0.27) and higher \(\alpha \) = 0.90 for the less experienced group than for people with 10+ years of experience ( \(\alpha \) = 0.42). This is the only dimension where the internal consistency drops below an "acceptable" level, probably due to the small number of items it considers (only three).

In light of the presented results, we observe that the overall usability perception is generally positive, slightly under 6 out of 7 points, and never drops below 5 for any of its dimensions, even if considering specific demographic groups based on gender, career field and experience.

We do observe a pattern between the groups: females provide slightly lower ratings than males, but with higher internal consistency. The same happens between the health field group (slightly lower ratings, higher consistency) and the technological field group, as well as between the most experienced group and the least experienced one. The fact that this pattern repeats across groups is expected, probably because the groups overlap: more males than females work in the tech field, and the males happen to be younger on average than the females (34.9 years old vs. 40.14, cf. Table  2 ), hence the difference found between seniority groups. Furthermore, we noticed that participants from the medical field made more comments suggesting improvement areas than participants from the technical field, particularly regarding the user interface.

As to why this pattern occurs, we believe it is reasonable given that DigiMoCA is an inherently technological and disruptive screening tool. It is therefore to be expected that professionals from the technological field are keener on using it, and generally more interested in it and curious about how it works. Conversely, it also makes sense that professionals from the health field are more "skeptical" and less interested, since the health field is generally more stable and less prone to disruptive changes [ 27 ], and certainly more people-oriented than tool-oriented.

Finally, the fact that the information and interface-related items obtain a slightly lower rating across all groups is understandable, as one of the main drawbacks of a voice-only communication channel is the restricted user interface, which lacks visual interaction. This probably means that, for new ICT tools based on conversational agents, the PSSUQ questionnaire should be adapted so that questions about the user interface are either reformulated or simply excluded.

4 Conclusion

In this paper, a user-interaction pilot study analyzing the usability and acceptability of DigiMoCA (a digital, Alexa-based cognitive impairment screening tool derived from T-MoCA) is discussed, both from end-users' and administrators' perspectives.

In the case of end-users, a TAM questionnaire was utilized, administered both before and after DigiMoCA. Overall, the results show that users accept DigiMoCA, giving it a score above 3 in all three TAM dimensions, meaning that they perceive it as useful, easy to use and satisfactory. The perceived ease of use was particularly positive and internally consistent, with a mean score of 3.98. Additionally, the pre vs. post analysis shows that, while the acceptability of technology does not change significantly after the administration of DigiMoCA, the perceived acceptability of conversational agents specifically improves significantly: all three dimensions have an item with a statistically significant positive change, and the vast majority of non-significant changes were also positive.

In the case of test administrators, a PSSUQ questionnaire was used. Its results show that DigiMoCA is considered usable (mean score 5.86) very consistently ( \(\alpha \) = 0.95), with a score of 5+ out of 7 for all the dimensions and demographic groups. System usefulness was rated consistently higher than information and interface quality, and we find the biggest demographic differences between the health field group and the technological field group.

The sample size is one of the main limitations of the study. To estimate an ideal sample size, we first obtained an estimate of the prevalence of AD in Spain (10.285%) Footnote 1 . Then, for a 95% confidence level, we would need n = 142 participants per study group, which is far from the sample size achieved so far.
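For reference, the figure of n = 142 follows from the standard sample-size formula for estimating a proportion. The sketch below reproduces it assuming a 5% margin of error, which is an assumption on our part since the margin is not stated explicitly above.

```python
# Sketch of the standard sample-size calculation for estimating a proportion.
# Prevalence (10.285%) and confidence level (95%) come from the text;
# the 5% margin of error is an assumption.
from math import ceil
from scipy.stats import norm

p = 0.10285          # estimated AD prevalence in Spain
confidence = 0.95
margin = 0.05        # assumed margin of error

z = norm.ppf(1 - (1 - confidence) / 2)   # ~1.96 for 95% confidence
n = ceil(z**2 * p * (1 - p) / margin**2)
print(n)  # 142
```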

Future lines of work include further characterizing the sample and studying acceptability and usability as a function of the participants' technological training, including their relationship with technology throughout their lives. Additionally, it could be worthwhile to analyze more objective metrics, such as participants' response times, which could enrich the study of DigiMoCA.

Ongoing work addresses improving the perceived satisfaction of using DigiMoCA by making it friendlier, and improving its interface and the information provided to the user to compensate for the limitations of voice-only interaction. As these aspects improve and interaction with conversational agents comes to be perceived as ever closer to interaction with human administrators, the distinctive affordability and accessibility of smart assistant-based tests can establish them as a powerful screening technology.

Data Availability

All data supporting the findings of this study are available within the paper and its Supplementary Files, particularly the user responses to the usability and acceptability questionnaires.

Source: Clinical Practice Guideline on Comprehensive Care for People with Alzheimer's Disease and other dementias https://portal.guiasalud.es/wp-content/uploads/2018/12/GPC_484_Alzheimer_AIA-QS_resum.pdf

WHO (2023) Un decade of healthy ageing: Plan of action. https://cdn.who.int/media/docs/default-source/decade-of-healthy-ageing/decade-proposal-final-apr2020-en.pdf?sfvrsn=b4b75ebc_28

APA (2013) Diagnostic and statistical manual of mental disorders, 5th Edn. https://doi.org/10.1176/appi.books.9780890425596

Kowalska M, Owecki M, Prendecki M, Wize K, Nowakowska J, Kozubski W, Lianeri M, Dorszewska J (2017) Aging and neurological diseases. In: Senescence, IntechOpen, Ch. 5. https://doi.org/10.5772/intechopen.69499

Gallegos M, Morgan M, Cervigni M, Martino P, Murray J, Calandra M, Razumovskiy A, Caycho-Rodríguez T, Arias Gallegos W (2022) 45 years of the Mini-Mental State Examination (MMSE): A perspective from Ibero-America. Dementia & Neuropsychologia. https://doi.org/10.1590/1980-5764-dn-2021-0097

Kueper J, Speechley M, Montero-Odasso M (2018) The Alzheimer's Disease Assessment Scale-Cognitive subscale (ADAS-Cog): Modifications and responsiveness in pre-dementia populations. A narrative review. Journal of Alzheimer's Disease. https://doi.org/10.3233/JAD-170991

Nasreddine Z, Phillips N, Bédirian V, Charbonneau S, Whitehead V, Collin I, Cummings J, Chertkow H (2005) The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. J Am Geriatr Soc 53:695–9. https://doi.org/10.1111/j.1532-5415.2005.53221.x

Katz M, Wang C, Nester C, Derby C, Zimmerman M, Lipton R, Sliwinski M, Rabin L (2021) T-MoCA: A valid phone screen for cognitive impairment in diverse community samples. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring 13. https://doi.org/10.1002/dad2.12144

Nasreddine ZS (2021) MoCA test: Validation of a five-minute telephone version. Alzheimer's & Dementia 17. https://doi.org/10.1002/alz.057817

Pacheco-Lorenzo MR, Valladares-Rodríguez SM, Anido-Rifón LE, Fernández-Iglesias MJ (2021) Smart conversational agents for the detection of neuropsychiatric disorders: A systematic review. Journal of Biomedical Informatics 113. https://doi.org/10.1016/j.jbi.2020.103632

Otero-González I, Pacheco-Lorenzo MR, Fernández-Iglesias MJ, Anido-Rifón LE (2024) Conversational agents for depression screening: A systematic review. International Journal of Medical Informatics. https://doi.org/10.1016/j.ijmedinf.2023.105272

Pacheco-Lorenzo M, Fernández-Iglesias MJ, Valladares-Rodriguez S, Anido-Rifón LE (2023) Implementing scripted conversations by means of smart assistants. Software: Practice and Experience 53. https://doi.org/10.1002/spe.3182

Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics. https://doi.org/10.2307/2529310

Valladares-Rodriguez S, Fernández-Iglesias MJ, Anido-Rifón L, Facal D, Rivas-Costa C, Pérez-Rodríguez R (2019) Touchscreen games to detect cognitive impairment in senior adults. A user-interaction pilot study. International Journal of Medical Informatics 127. https://doi.org/10.1016/j.ijmedinf.2019.04.012

Lancaster GA, Dodd S, Williamson PR (2004) Design and analysis of pilot studies: recommendations for good practice. Journal of Evaluation in Clinical Practice. https://doi.org/10.1111/j.2002.384.doc.x

Cocks K, Torgerson DJ (2013) Sample size calculations for pilot randomized trials: a confidence interval approach. Journal of Clinical Epidemiology. https://doi.org/10.1016/j.jclinepi.2012.09.002

Reisberg B, Torossian C, Shulman M, Monteiro I, Boksay I, Golomb J, Benarous F, Ulysse A, Oo T, Vedvyas A, Rao J, Marsh K, Kluger A, Sangha J, Hassan M, Alshalabi M, Arain F, Sh N, Buj M, Shao Y (2018) Two year outcomes, cognitive and behavioral markers of decline in healthy, cognitively normal older persons with global deterioration scale stage 2 (subjective cognitive decline with impairment). Journal of Alzheimer’s disease: JAD. https://doi.org/10.3233/JAD-180341

Montejo P, Peña M, Sueiro M (2012) The Memory Failures of Everyday questionnaire (MFE): Internal consistency and reliability. The Spanish Journal of Psychology. https://doi.org/10.5209/rev_SJOP.2012.v15.n2.38888

Graf C (2008) The Lawton Instrumental Activities of Daily Living (IADL) scale. AJN, American Journal of Nursing. https://doi.org/10.1097/01.NAJ.0000314810.46029.74

CSIC (2023) Un perfil de las personas mayores en España 2023. https://envejecimientoenred.csic.es/wp-content/uploads/2023/10/enred-indicadoresbasicos2023.pdf

Abu Rbeian AH, Owda A, Owda M (2022) A technology acceptance model survey of the metaverse prospects. AI. https://doi.org/10.3390/ai3020018

Lewis JR (1992) Psychometric evaluation of the Post-Study System Usability Questionnaire: The PSSUQ. Proceedings of the Human Factors Society Annual Meeting. https://doi.org/10.1177/154193129203601617

Cronbach LJ (1951) Coefficient alpha and the internal structure of tests. Psychometrika. https://doi.org/10.1007/BF02310555

Gliem JA, Gliem RR (2003) Calculating, interpreting, and reporting Cronbach's alpha reliability coefficient for Likert-type scales. https://hdl.handle.net/1805/344

Mishra P, Singh U, Pandey CM, Mishra P, Pandey G (2019) Application of Student's t-test, analysis of variance, and covariance. Annals of Cardiac Anaesthesia. https://doi.org/10.4103/aca.ACA_94_19

Thalheimer W, Cook S (2002) How to calculate effect sizes from published research: A simplified methodology. Work-Learning Research. https://api.semanticscholar.org/CorpusID:145490810

Tellez A, Garcia Cadena C, Corral-Verdugo V (2015) Effect size, confidence intervals and statistical power in psychological research. Psychology in Russia: State of the Art. https://doi.org/10.11621/pir.2015.0303

Nadarzynski T, Miles O, Cowie A, Ridge D (2019) Acceptability of artificial intelligence (AI)-led chatbot services in healthcare: A mixed-methods study. Digital Health. https://doi.org/10.1177/2055207619871808

Acknowledgements

We acknowledge the contributions and support of author’s colleague Noelia Lago, as well as the staff at AFAGA (Miriam Fortes and Maxi Rodríguez) and Centro de Día Parque Castrelos (Ángeles Álvarez), and all of the participants of this study, without whom this work would not be possible.

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Funding for open access charge: CISUG/Universidade de Vigo. This work has been partially funded by Ministerio de Ciencia e Innovación, project SAPIENS- Services and applications for a healthy aging [grant PID2020-115137RB-I00 funded by MCIN/AEI/10.13039/501100011033] and by the Ministry of Science, Innovation and Universities [grant FPU19/01981] (Formación de Profesorado Universitario).

Author information

Authors and Affiliations

atlanTTic, University of Vigo, 36310, Vigo, Spain

Moisés R. Pacheco-Lorenzo, Luis E. Anido-Rifón & Manuel J. Fernández-Iglesias

Department of Electronics and Computing, USC, 15782, Santiago de Compostela, Spain

Sonia M. Valladares-Rodríguez

Contributions

Moisés R. Pacheco-Lorenzo : administration of questionnaires, statistical analysis and writing.

Sonia Valladares-Rodriguez : statistical analysis and writing.

Manuel J. Fernández-Iglesias : supervision, writing, review and editing.

Luis E. Anido-Rifón : supervision, writing, review and editing.

Corresponding author

Correspondence to Moisés R. Pacheco-Lorenzo .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: TAM questionnaire

Appendix B: PSSUQ questionnaire

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Pacheco-Lorenzo, M.R., Anido-Rifón, L.E., Fernández-Iglesias, M.J. et al. Will senior adults accept being cognitively assessed by a conversational agent? a user-interaction pilot study. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05558-z

Accepted : 23 May 2024

Published : 15 June 2024

DOI : https://doi.org/10.1007/s10489-024-05558-z

  • User interaction
  • Acceptability
  • Cognitive impairment
  • Cognitive assessment
  • Voice application
  • Smart speaker

Published on 18.6.2024 in Vol 26 (2024)

Monitoring Adverse Drug Events in Web Forums: Evaluation of a Pipeline and Use Case Study

Authors of this article:

Original Paper

  • Pierre Karapetiantz 1, PhD;
  • Bissan Audeh 1, PhD;
  • Akram Redjdal 1, PhD;
  • Théophile Tiffet 2, 3, MD;
  • Cédric Bousquet 1, 2, PhD, PharmD;
  • Marie-Christine Jaulent 1, PhD

1 Inserm, Sorbonne Université, université Paris 13, Laboratoire d’informatique médicale et d’ingénierie des connaissances en e-santé, LIMICS, F-75006, Paris, France

2 Service de santé publique et information médicale, CHU de Saint Etienne, 42000 Saint-Etienne, France

3 Institut National de la Santé et de la Recherche Médicale, Université Jean Monnet, SAnté INgéniérie BIOlogie St-Etienne, SAINBIOSE, 42270 Saint-Priest-en-Jarez, France

Corresponding Author:

Marie-Christine Jaulent, PhD

Sorbonne Université

université Paris 13, Laboratoire d’informatique médicale et d’ingénierie des connaissances en e-santé, LIMICS, F-75006

15 rue de l'école de Médecine

Paris, 75006

Phone: 33 144279108

Email: [email protected]

Background: To mitigate safety concerns, regulatory agencies must make informed decisions regarding drug usage and adverse drug events (ADEs). The primary pharmacovigilance data stem from spontaneous reports by health care professionals. However, underreporting poses a notable challenge within the current system. Explorations into alternative sources, including electronic patient records and social media, have been undertaken. Nevertheless, social media’s potential remains largely untapped in real-world scenarios.

Objective: The challenge faced by regulatory agencies in using social media is primarily attributed to the absence of suitable tools to support decision makers. An effective tool should enable access to information via a graphical user interface, presenting data in a user-friendly manner rather than in their raw form. This interface should offer various visualization options, empowering users to choose representations that best convey the data and facilitate informed decision-making. Thus, this study aims to assess the potential of integrating social media into pharmacovigilance and enhancing decision-making with this novel data source. To achieve this, our objective was to develop and assess a pipeline that processes data from the extraction of web forum posts to the generation of indicators and alerts within a visual and interactive environment. The goal was to create a user-friendly tool that enables regulatory authorities to make better-informed decisions effectively.

Methods: To enhance pharmacovigilance efforts, we have devised a pipeline comprising 4 distinct modules, each independently editable, aimed at efficiently analyzing health-related French web forums. These modules were (1) web forums’ posts extraction, (2) web forums’ posts annotation, (3) statistics and signal detection algorithm, and (4) a graphical user interface (GUI). We showcase the efficacy of the GUI through an illustrative case study involving the introduction of the new formula of Levothyrox in France. This event led to a surge in reports to the French regulatory authority.

Results: Between January 1, 2017, and February 28, 2021, a total of 2,081,296 posts were extracted from 23 French web forums. These posts contained 437,192 normalized drug-ADE couples, annotated with the Anatomical Therapeutic Chemical (ATC) Classification and Medical Dictionary for Regulatory Activities (MedDRA). The analysis of the Levothyrox new formula revealed a notable pattern. In August 2017, there was a sharp increase in posts related to this medication on social media platforms, which coincided with a substantial uptick in reports submitted by patients to the national regulatory authority during the same period.

Conclusions: We demonstrated that conducting quantitative analysis using the GUI is straightforward and requires no coding. The results aligned with prior research and also offered potential insights into drug-related matters. Our hypothesis received partial confirmation because the final users were not involved in the evaluation process. Further studies, concentrating on ergonomics and the impact on professionals within regulatory agencies, are imperative for future research endeavors. We emphasized the versatility of our approach and the seamless interoperability between different modules over the performance of individual modules. Specifically, the annotation module was integrated early in the development process and could undergo substantial enhancement by leveraging contemporary techniques rooted in the Transformers architecture. Our pipeline holds potential applications in health surveillance by regulatory agencies or pharmaceutical companies, aiding in the identification of safety concerns. Moreover, it could be used by research teams for retrospective analysis of events.

Introduction

Social Media as a Complementary Data Source for Pharmacovigilance

One primary mission of regulatory agencies such as the FDA (Food and Drug Administration) or the EMA (European Medicines Agency) is to monitor drug usage and adverse drug events (ADEs) to mitigate the risks associated with drugs within the population. This task entails analyzing diverse data sources, including clinical trials, postmarketing surveillance, spontaneous reporting systems, and published scientific literature. Despite the wealth of available data, some ADEs are not always detected promptly, largely because of underreporting. In France, for instance, underreporting was estimated to range between 78% and 99% from 1997 to 2002 [ 1 ]. To tackle this challenge, several countries have implemented systems allowing patients to report ADEs.

Additional sources for detecting ADEs have been under exploration, such as electronic patient records [ 2 - 4 ] and social media platforms [ 5 - 9 ]. While some argue that social media alone cannot serve as a primary source for signal detection [ 10 ], it can be viewed as a valuable secondary source for monitoring emerging adverse drug reactions or reinforcing signals previously identified through spontaneous reports stored in traditional pharmacovigilance databases [ 11 ]. In a prior study by the authors, patient profiles and reported ADEs found in web forums were compared with those in the French Pharmacovigilance Database (FPVD). The forums tended to represent younger patients, more women, less severe cases, and a higher incidence of psychiatric disorder–related ADEs compared with the FPVD [ 12 ]. Moreover, forums reported a greater number of unexpected ADEs. Over the past decade, several tools for evaluating social media posts have been described in the literature [ 13 ]. Specifically, effective ADE detection in social media necessitates both quantitative and qualitative analyses of data [ 14 ].

Qualitative Approach for Individual Assessment of Posts

Qualitative assessment entails evaluating whether users’ messages contain pertinent information for an assessment akin to a pharmacovigilance case report. This includes details such as the patient’s age and gender, the severity of the case, the expectedness and timeline of the adverse event, time-to-onset, dechallenge (outcome upon drug withdrawal), and rechallenge (outcome upon drug reintroduction). For instance, GlaxoSmithKline Inc. implemented the qualitative approach Insight Explorer, which facilitates the collection of extensive data for causality and quality assessment. Users can input data including personal information (eg, age range, gender) and product details (eg, name, route of administration, duration of use, dosage). This approach was adapted for the WEB-RADR (Recognizing Adverse Drug Reactions) project to manually construct a gold standard of curated patient-authored text [ 15 ].

Quantitative Approach for Monitoring Adverse Drug Events on Social Media

Quantitative evaluation involves analyzing extracted data using descriptive and analytical statistics, such as signal detection and change-point analysis. Numerous projects have been undertaken to monitor ADEs on social media. One of the earliest projects is the PREDOSE (Prescription Drug Abuse Online Surveillance and Epidemiology) project [ 5 ], which investigates the illicit use of pharmaceutical opioids reported in web forums. While the PREDOSE project showcased the potential of leveraging social media for opioid monitoring, notable limitations are the lack of deidentification and signal detection methods. MedWatcher Social, a monitoring platform for health-related web forums, Twitter, and Facebook, represents a prototype application developed in 2014 [ 16 ]. Yeleswarapu et al [ 6 ] outlined a semiautomatic pipeline that applies natural language processing (NLP) tasks to extract ADEs from MEDLINE abstracts and user comments from health-related websites. However, this pipeline was not intended for routine use.

The Domino’s interface [ 17 ], developed in 2018 by the University of Bordeaux in France and funded by the French Medicines Agency (Agence nationale de sécurité du médicament et des produits de santé [ANSM]), was designed to analyze drug misuses in health-related web forums using NLP methods and the summary of product characteristics. Initially tailored for antidepressant drugs, this tool does not primarily focus on ADE surveillance.

Another pipeline, described by Nikfarjam et al in 2019 [ 7 ], used a neural network–based named entity recognition system specifically designed for user-generated content in social media. This platform is dedicated to identifying the association of cutaneous ADEs with cancer therapy drugs. The study focused on a selection of drugs and only examined 8 ADEs.

Magge et al [ 8 ] described a pipeline aimed at the extraction and normalization of adverse drug mentions on Twitter. Their pipeline consisted of an ADE classifier designed to identify tweets mentioning an ADE, which were then mapped to a MedDRA (Medical Dictionary for Regulatory Activities Terminology) code. However, the normalization process was confined to the ADEs present in the training set. Neither Nikfarjam’s nor Magge’s pipeline provides a graphical user interface.

Some private companies also offer tools for analyzing social media for pharmacovigilance purposes. For instance, the DETECT platform was developed as part of a collaborative project in France by Kappa Santé [ 18 ]. This system enabled the labeling of posts with known controlled vocabulary concepts, and signal detection was conducted [ 19 ]. Within the scope of this project, Expert System Company implemented BIOPHARMA Navigator to extract web forum posts, while the Luxid Annotation Server provided web services for the automatic annotation of posts.

An important finding from the studies of the last decade is that, while regulatory agencies have begun using data sources beyond spontaneous reports, social media has yet to be fully leveraged in real-world settings due to the immaturity of available solutions. These solutions are essentially proofs of concept that lack scalability and are challenging for experts to evaluate routinely, primarily due to the absence of a graphical user interface to present information.

Our aim was to assess the potential of integrating social media into pharmacovigilance and enhancing decision-making with this novel data source. To achieve this, our objective was to develop and assess a pipeline that processes data from the extraction of web forum posts to the generation of indicators and alerts within a visual and interactive environment. The goal was to create a user-friendly tool that enables regulatory authorities to make better-informed decisions effectively.

This article presents the design and implementation of our pipeline dedicated to harnessing posts from social media. In addition, we showcase the use of the pipeline through a specific use case, emphasizing the importance of monitoring drugs in social media to better address patients’ expectations.

The PHARES project (Pharmacovigilance in Social Networks), funded from 2017 to 2019 by the French ANSM, aimed to develop a software suite (a pipeline) enabling pharmacovigilance users to analyze social networks, particularly messages posted on forums. The objective of the pipeline is to facilitate routine use through continuous post extraction and quantitative data analysis from web forums, specifically tailored for the French language.

The pipeline is made up of 4 modules, each referring to its own methods ( Figure 1 ):

The Scraper module, which extracts posts from forums using a previously developed tool, Vigi4Med (V4M) scraper [ 9 ], and produces a comma-separated values (CSV) file filled with the texts extracted.

The Annotation module, which extracts elements of interest from the posts and registers annotations in CSV files, with each line representing an annotation of an ADE or a drug. When a causality relationship is identified, both an ADE and a drug are annotated on the same line.

The Statistical module, which performs quantitative analysis on the annotated posts, generating numerical data, tables, or figures.

The Interface module, which supports query definition and visualization of results.

The methodology used to evaluate the PHARES pipeline involved comparing its performance with existing platforms mentioned above, in accordance with a set of criteria established with prospective PHARES users. The criteria, specific to each module, are as follows:

  • General level: focus on ADEs, designed for routine usage.
  • Scraper: collects all posts of a selected website, performs deidentification, supports extraction of posts from web forums, and is open source.
  • Statistics: the temporal evolution of posts or annotations is displayed, and a change-point analysis (detecting breakpoints) is possible.
  • Signal detection: applies at least one signal detection method, displays the temporal evolution of the proportional reporting ratio (PRR), and supports a logistic regression–based signal detection method.
  • Graphical user interface: has an interface for users.

Scraper Module

V4M Scraper is an open-source tool designed for data extraction from web forums [ 9 ]. Its primary functions are optimizing scraping time, filtering out posts primarily focused on advertisements, and structuring the extracted data semantically. The module takes as input a configuration file containing the URL of the targeted forum. The algorithm navigates through forum pages and generates resource description framework (RDF) triplets for each extracted element, allowing for potential alignment with external semantic resources. A caching mechanism has been integrated into this tool to maintain a local copy of previously visited pages, thereby avoiding redundant requests to websites for already scraped pages, for example in cases of errors or testing.

The V4M Scraper was customized for the PHARES project, as indicated by the red elements in Figure S1 in Multimedia Appendix 1 . The database format (Figure S2 in Multimedia Appendix 1 ) was implemented to enhance interaction with the interface. Specifically, the main scraping script was adjusted to produce a simplified tabular format (CSV) of the extracted data and to store these data in a database, facilitating input to the subsequent module of the pipeline (annotation). The scraper was also customized to enable a continuous scraping routine, wherein data extracted from web forums are automatically and regularly annotated and registered. A log file was integrated into the scraper structure to keep a record of the last scraped element, so that the daily routine scraping always begins from the last scraped point. An automation tool (crontab) schedules the execution of the pipeline for each forum daily at a specific time.
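The caching behaviour described above can be illustrated with a minimal sketch; the cache directory and the example URL are hypothetical, and this is not the V4M code itself.

```python
# Minimal sketch of the caching idea: keep a local copy of every page already
# visited so re-runs do not hit the website again. Paths and URL are made up.
import hashlib
import pathlib
import requests

CACHE_DIR = pathlib.Path("scraper_cache")
CACHE_DIR.mkdir(exist_ok=True)

def fetch(url: str) -> str:
    """Return the page HTML, from the local cache if it was already scraped."""
    key = hashlib.sha1(url.encode("utf-8")).hexdigest()
    cached = CACHE_DIR / f"{key}.html"
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    html = requests.get(url, timeout=30).text
    cached.write_text(html, encoding="utf-8")
    return html

# Example (hypothetical forum URL):
# html = fetch("https://example-forum.fr/thread/1234")
```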

A total of 23 public French health-related web forums were selected through a combination of Google searches and from a list of certified health websites provided by the HON Foundation, in collaboration with the French National Health Authority (HAS). The selection criteria included the requirement for websites to be hosted in France, feature a discussion board or space for sharing experiences, and have more than 10 patient contributions. Furthermore, Twitter posts are collected and analyzed by the pipeline. This is achieved using the Twitter API for data collection, followed by employing the same modules used for processing web forum posts.

Annotation Module

Entities corresponding to drugs and pathological conditions in social media posts were identified and annotated using an NLP pipeline [ 20 ]. First, conditional random fields were used to account for global dependencies [ 21 ]: the model considers the entire sequence when making predictions for individual tokens, which is advantageous for entity extraction, as the presence of an entity in one part of the text can influence the likelihood of other entities in the vicinity. Second, a support vector machine was used to predict the causality relationship between an entity identified as a drug and another entity identified as an ADE. The annotation method used in this module was implemented at an early stage of the pipeline's design; the named entity recognition task is currently being revised to incorporate more recent advancements in NLP algorithms [ 22 - 26 ].
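As an illustration of the second step, the sketch below trains a toy SVM to decide whether a drug mention and an ADE mention in the same post express a causal relationship. The features and the two short French training posts are placeholders, not the project's actual feature set or corpus.

```python
# Toy sketch of the drug-ADE causality classifier (second annotation step).
# Features and training examples are illustrative placeholders only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def pair_features(post: str, drug: str, ade: str) -> dict:
    tokens = post.lower().split()
    return {
        "token_distance": abs(tokens.index(drug.lower()) - tokens.index(ade.lower())),
        "drug_before_ade": tokens.index(drug.lower()) < tokens.index(ade.lower()),
        "mentions_depuis": "depuis" in tokens,   # "since", a common causality cue
    }

train_posts = [
    # "headache since I started taking Levothyrox" -> causal pair
    ("migraine depuis que je prends levothyrox", "levothyrox", "migraine", 1),
    # "I take Levothyrox ; my headache is an old problem" -> not causal
    ("je prends levothyrox ; ma migraine est ancienne", "levothyrox", "migraine", 0),
]
X = [pair_features(p, d, a) for p, d, a, _ in train_posts]
y = [label for _, _, _, label in train_posts]

clf = make_pipeline(DictVectorizer(), LinearSVC())
clf.fit(X, y)
print(clf.predict([pair_features("fatigue depuis levothyrox", "levothyrox", "fatigue")]))
```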

In a third step, the detected annotations were normalized using codes from the MedDRA and the Anatomical Therapeutic Classification (ATC) to ensure they were suitable for signal detection purposes.

MedDRA is an international medical hierarchical terminology comprising 5 levels used to code potential ADEs in pharmacovigilance. The highest level is the system organ class, which is further divided into high-level group terms, then into high-level terms, preferred terms (PTs), and finally lowest level terms. Typically, the PT level is used in pharmacovigilance signal detection.

The ATC classification system is a drug classification used in France for pharmacovigilance purposes. It categorizes the active ingredients of drugs based on the organ system they primarily affect. The classification comprises 5 levels: the anatomical main group (consisting of 14 main groups), the therapeutic subgroup, the therapeutic/pharmacological subgroup, the chemical/therapeutic/pharmacological subgroup, and the chemical substance. Typically, the fifth level (chemical substance) is used in pharmacovigilance signal detection.

The outputs of the annotation module are CSV files with the following variables:

  • Concerning the post: forum name, post ID, and date
  • Concerning the ADE: verbatim, normalized term, unified medical language system’s concept unique identifier, and MedDRA code
  • Concerning the drug: verbatim, normalized term, active ingredient, and ATC code

In these CSV files, each line can consist of either an adverse drug event (ADE) annotation, a drug annotation, or both when a causality relationship has been identified between the drug and the ADE. Table 1 provides a sample of the database.

In a prior study, we selected posts where at least one ADE associated with 6 drugs (agomelatine, baclofen, duloxetine, exenatide, strontium ranelate, and tetrazepam) had been detected by this algorithm. A manual review revealed that among 5149 posts, 1284 (24.94%) were validated as pharmacovigilance cases [ 12 ]. The fundamental metrics used to assess the performance of the annotation module were precision (P), recall (R), and their harmonic mean, the F1-score. To calculate these metrics, it is necessary to count false negatives (relevant terms not recognized), false positives (irrelevant recognitions), and true positives (correct recognitions). Precision, recall, and F1-score are defined as follows:

Precision = (true positives)/(true positives + false positives); recall = (true positives)/(true positives + false negatives); F1-score = (2 × precision × recall)/(precision + recall) (1)
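Equation 1 translates directly into code; the counts below are illustrative only.

```python
# Direct transcription of Equation 1, using counts of true/false positives and
# false negatives obtained from a manual review of annotated posts.
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(prf(tp=90, fp=10, fn=20))  # illustrative counts -> (0.90, 0.818..., 0.857...)
```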

In the “Results” section, we present a comparison of the performance of the annotation module with the performance of state-of-the-art methods [ 8 , 22 , 25 , 26 ].

Forum name | Post ID | Date | Time | ADE (a) verbatim | ADE normalized | Concept unique identifier | Drug verbatim | Drug normalized | Active ingredient | MedDRA (b) code | ATC (c) code
Atoute | 7354 | October 8, 2018 | 21:37:00 | Maux de tête | Céphalée | C0018681 | Lévothyrox | LEVOTHYROX | Levothyroxine sodique | - | H03AA01
Atoute | 7354 | October 8, 2018 | 21:37:00 | Maux de tête | Céphalée | C0018681 | Calcium | - | - | - | -
Atoute | 7354 | October 8, 2018 | 21:37:00 | Nodules cancereux | - | - | Lévothyrox | LEVOTHYROX | Levothyroxine sodique | - | H03AA01
Atoute | 7354 | October 8, 2018 | 21:37:00 | Nodules cancereux | - | - | Calcium | - | - | - | -
Atoute | 7354 | October 8, 2018 | 21:37:00 | Fatigue | Fatigue | C0015672 | Lévothyrox | LEVOTHYROX | Levothyroxine sodique | 10016256 | H03AA01
Atoute | 7354 | October 8, 2018 | 21:37:00 | fatigue | Fatigue | C0015672 | Calcium | - | - | 10016256 | -
Atoute | 7354 | October 8, 2018 | 21:37:00 | Perte de poids | Poids diminué | C0043096 | Lévothyrox | LEVOTHYROX | Levothyroxine sodique | 10048061 | H03AA01
Atoute | 7354 | October 8, 2018 | 21:37:00 | Perte de poids | Poids diminué | C0043096 | Calcium | - | - | 10048061 | -

a ADE: adverse drug event.

b MedDRA: Medical Dictionary for Regulatory Activities Terminology.

c ATC: Anatomical Therapeutic Classification.

d "-": no data are available for this slot.

Statistical Module

This module generates general statistics and diagrams for web forums or Twitter. It provides data such as the number of annotated posts (related to the drug, the ADE, or both), the count of drug-ADE pairs identified, and the distribution of ADEs’ MedDRA-PTs. In addition, a change-point analysis method was used to detect significant changes over time in the mean number of posts mentioning the drug and ADE [ 27 ].

In addition, several statistical signal detection methods were implemented to generate potential signals. Safety signals, which provide information on adverse events that may potentially be caused by a medicine, were further evaluated by pharmacovigilance experts to determine the causal relationship between the medicine and the reported adverse event.

The statistical module implements 3 signal detection methods, including 2 well-known and frequently used disproportionality signal detection methods: the PRR [ 28 ] and the reporting odds ratio (ROR) [ 29 ]. In addition, a complementary method, a logistic regression–based signal detection method known as the class imbalanced subsampling lasso [ 30 ], was used.

PRR and ROR are akin to a relative risk and an odds ratio, respectively. However, they differ in their denominators: as the number of exposed patients is typically unknown in pharmacovigilance databases, the denominator in PRR and ROR calculations is the number of cases reported in the pharmacovigilance database.

PRR and ROR are specific to each drug-ADE pair and can be directly computed from the contingency table ( Table 2 ).

 | Adverse drug event of interest | Other adverse drug events
Drug of interest | a | b
Other drugs | c | d

The PRR compares the proportion of an ADE among all the ADEs reported for a specific drug with the same proportion for all other drugs in the database (Equation 2). A PRR significantly greater than 1 suggests that the ADE is more frequently reported for patients taking the drug of interest, while a PRR equal to 1 suggests independence between the 2 variables.

PRR = [a/(a + b)]/[c/(c + d)] (2)

The ROR quantifies the strength of the association between drug administration and the occurrence of the ADE. It represents the ratio of the odds of drug administration when the ADE is present to the odds of drug administration when the ADE is absent (Equation 3). When the 2 events are independent, the ROR equals 1. An ROR significantly greater than 1 suggests that drug administration is associated with the presence of the ADE.

ROR = ad / bc (3)
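Both statistics can be computed directly from the four cells of the contingency table; the counts in the sketch below are illustrative placeholders, not values from the study.

```python
# Sketch of the disproportionality statistics from the 2x2 contingency table
# (Table 2): a, b, c, d are event counts; the values below are illustrative.
def prr(a: int, b: int, c: int, d: int) -> float:
    return (a / (a + b)) / (c / (c + d))        # Equation 2

def ror(a: int, b: int, c: int, d: int) -> float:
    return (a * d) / (b * c)                    # Equation 3

a, b, c, d = 40, 160, 500, 9300                 # hypothetical drug-ADE counts
print(f"PRR = {prr(a, b, c, d):.2f}, ROR = {ror(a, b, c, d):.2f}")
```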

We considered events over posts for the calculation of disproportionality statistics. If the same drug-ADE pair was identified multiple times within a post, the pair was counted as many times as it occurred in the calculation.

Disproportionality analysis has certain limitations, including the confounding effect resulting from coreported drugs and the masking effect, where the background relative reporting rate of an ADE is distorted by extensive reporting of the ADE with a specific drug or drug group. Caster et al [ 31 ] demonstrated through 2 real case examples how multivariate regression–based approaches can address these issues. Harpaz et al also suggested that logistic regression could be used for safety surveillance [ 32 ]. Although these approaches were initially designed for pharmacovigilance case reports, we hypothesize that they may also be applicable to posts.

The logistic regression model specifically focuses on a particular ADE or a group of ADEs. It involves creating a vector that represents the presence (1) or absence (0) of the ADE of interest in the pharmacovigilance case (in our case, in the post). Additionally, a matrix is generated to represent the administration or nonadministration of all drugs in the database by the patient (1 for administration and 0 for nonadministration). Figure S3 in Multimedia Appendix 1 illustrates an example of using logistic regression. In our case, we assumed that if a drug was annotated in the post, it was taken by the patient. The logistic regression aims to predict the probability of the presence of the ADE of interest (ADE = 1) based on the presence of all (Nm) drugs in the database (Equation 4), where X represents the distribution of the presence/absence of the drugs. The adjusted factors included only concomitant medications, as patient-related factors are often missing in web forums' posts. Therefore, we did not need to address the impact of missing data, which should be evaluated when necessary.

ln[P(ADE = 1 | X) / P(ADE = 0 | X)] = a + b1 × Drug1 + ... + bi × Drugi + ... + bNm × DrugNm (4)

The selection of the drugs depends on the parameter bi: if bi < 0, drug i decreases the risk of the ADE, and if bi > 0, drug i increases it.

Then, 2 sets are defined:

  • S1: set of n1 posts with an annotation of the ADEs of interest.
  • S0: set of n0 posts without an annotation of the ADEs of interest.

In our case n0 >> n1, indicating a significant imbalance toward posts lacking annotations of the ADEs of interest. To address this issue, we took a subsample with a more favorable ratio of posts with annotated ADEs versus those without. Additionally, to enhance result stability, we conducted multiple draws instead of just one.

In practice, we generated B subsamples. Each subsample was constructed by randomly drawing, with replacement, n1 posts from S1 and R posts from S0, where R = max(4·n1, 4·Nm). The choice of 4·n1 was inspired by case-control studies, while 4·Nm was included to ensure an adequate number of observations considering the multitude of predictors.
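A minimal sketch of this subsampling scheme is shown below, using randomly generated placeholder data and scikit-learn's L1-penalized logistic regression as a stand-in for the class imbalanced subsampling lasso; it is an illustration of the idea rather than the pipeline's implementation.

```python
# Sketch: B balanced subsamples are drawn with replacement, an L1-penalized
# logistic regression is fitted on each, and a drug is flagged if its
# coefficient is positive in most subsamples. Data are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_posts, n_drugs, B = 5000, 50, 20
X = rng.integers(0, 2, size=(n_posts, n_drugs))       # drug present/absent per post
y = rng.integers(0, 2, size=n_posts) * (rng.random(n_posts) < 0.05)  # rare ADE

pos = np.flatnonzero(y == 1)            # S1: posts mentioning the ADE of interest
neg = np.flatnonzero(y == 0)            # S0: all other posts
R = max(4 * len(pos), 4 * n_drugs)      # negatives drawn per subsample

coef_positive = np.zeros(n_drugs)
for _ in range(B):
    idx = np.concatenate([rng.choice(pos, size=len(pos), replace=True),
                          rng.choice(neg, size=R, replace=True)])
    model = LogisticRegression(penalty="l1", solver="liblinear",
                               max_iter=1000).fit(X[idx], y[idx])
    coef_positive += (model.coef_[0] > 0)

print("drugs flagged in most subsamples:", np.flatnonzero(coef_positive > B / 2))
```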

We implemented a change-point analysis method described in [ 27 ] to detect whether there was a change in the evolution over time of a chosen statistic, such as the number of a specific drug-ADE pair, the number of ADEs associated with a specific drug, or the number of drugs associated with a specific ADE. The method uses the Cumulative Sum (CUSUM) algorithm to analyze the evolution of statistics over time, comparing current values with the period mean. It identifies breakpoints by calculating the highest difference in statistical values and comparing it with random samples. The process repeats for periods before and after detected breakpoints until no more are found.
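A minimal CUSUM-style sketch in the spirit of this method (not the cited implementation) is shown below, applied to a hypothetical monthly count series with a single jump.

```python
# CUSUM-style breakpoint check on a monthly count series: the largest excursion
# of the cumulative sum around the mean is compared with the excursions obtained
# on random permutations of the series. Counts below are hypothetical.
import numpy as np

def cusum_breakpoint(counts, n_permutations=1000, seed=0):
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)

    def max_excursion(x):
        s = np.cumsum(x - x.mean())
        return s.max() - s.min()

    observed = max_excursion(counts)
    exceed = sum(max_excursion(rng.permutation(counts)) >= observed
                 for _ in range(n_permutations))
    confidence = 1 - exceed / n_permutations
    breakpoint_idx = int(np.argmax(np.abs(np.cumsum(counts - counts.mean()))))
    return breakpoint_idx, confidence

# Hypothetical monthly counts of drug-ADE posts, with a jump mid-series.
series = [12, 15, 11, 14, 13, 80, 95, 90, 88, 85]
print(cusum_breakpoint(series))  # -> (index just before the jump, confidence near 1.0)
```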

User Interface Module

The user interface module facilitates user interaction with the pipeline in a user-friendly manner. The interface comprises a dashboard divided into 2 main parts. The left dark column ( Figure 2 ) serves as a control sidebar, where users can select parameters to filter the data, including the forum, period, drug(s) according to the ATC classification, and ADE(s) according to a level in the MedDRA hierarchy. On the right side of the interface, various visualizations are available, organized into several tabs such as “Forum Statistics” and “Consultation of Posts,” with additional tabs for statistics that become active upon querying.

Before applying a specific query, the interface provides general information about the currently available data ( Figure 2 ), including the total annotated posts since 2017 (n=2,081,296) and total annotations since 2017 (n=2,454,310). In addition, a “Consultation of Tweets” tab (not visible in the figure) displays the total annotated tweets since March 2020 (n=46,153).

Furthermore, several tabs corresponding to different types of statistics, including "Forums Statistics" and "Twitter Statistics," provide general statistics and diagrams for web forums and Twitter. Examples are pie charts showing forum distribution, line charts depicting the evolution of drug and ADE mentions, histograms displaying ADE distribution by system organ class, and line charts illustrating the temporal trend of posts containing the drug and an ADE, as shown in Figures 3 and 4 . The remaining tabs are:

  • "Annotations Plot": displays annotations of drugs and adverse effects selected by the user, along with forum information, PTs, high-level terms, high-level group terms, dates, and hours.
  • "Logistic Regression": allows users to choose parameters for applying logistic regression.
  • "Disproportionality": lets users choose between the PRR and ROR methods, with the time evolution of the chosen method displayed.
  • "Change-Point": enables analysis of temporal evolution, with identified breakpoints indicated.
  • "Consultation of Posts" and "Consultation of Tweets": provide details on annotated posts/tweets, including downloadable tables.

The statistical module performs calculations based on user queries, updating the interface accordingly. If multiple drugs or adverse events are selected, they are treated as new entities for analysis.

The interface was implemented using the R language and environment (R Foundation) for statistical computing and graphics [ 33 ], leveraging the Shiny package [ 34 ] for development.

Ethical Considerations

A statement by an Institutional Review Board was not required because the study used only publicly available data.

This study complied with the European General Data Protection Regulation (GDPR), which has been in force since 2018 in Europe [ 35 ]. The GDPR enhances the protection of individuals by introducing the right to be informed about the processing of personal data. However, informing each user individually may be impractical. Therefore, the GDPR introduces 2 legal conditions where informed consent is not mandatory, which can be interpreted as supporting the processing of web forum posts for pharmacovigilance (Article 9): “(e) processing relates to personal data which are manifestly made public by the data subject; [. . .] (i) processing is necessary for reasons of public interest in the area of public health, such as [. . .] ensuring high standards of quality and safety of health care and of medicinal products . . ..” The GDPR also requires data processing to “not permit or no longer permits the identification of data subjects” (Article 89). Deidentification was conducted during the extraction of posts from web forums to ensure privacy [ 9 ]. User identifiers in the main RDF file were encrypted using the SHA1 algorithm [ 36 ]. The correspondence between these encrypted identifiers and the original keys is presented in RDF triplets in a separate file, referred to as the “keys file.” Therefore, the only way to retrieve the original authors’ identities is by concatenating the main RDF containing the encrypted data with the keys file, which is kept in a secured location. Moreover, all our data processing was carried out on a secured server with restricted access.
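The pseudonymization step can be sketched as follows; the identifiers are made up, and the snippet only illustrates SHA-1 hashing with the key mapping written to a separate file, as described above.

```python
# Minimal sketch of the pseudonymization step: user identifiers are replaced by
# their SHA-1 digest in the main data, and the correspondence is stored apart
# in a "keys file" kept in a secured location. Identifiers here are made up.
import hashlib
import json

def pseudonymize(user_id: str) -> str:
    return hashlib.sha1(user_id.encode("utf-8")).hexdigest()

users = ["forum_user_42", "marie1975"]            # hypothetical identifiers
keys = {pseudonymize(u): u for u in users}        # hashed id -> original id

with open("keys_file.json", "w", encoding="utf-8") as f:   # stored separately
    json.dump(keys, f)

print([pseudonymize(u) for u in users])           # what the main data file contains
```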

General Results About the Pipeline

The primary outcome of this study is the operational PHARES pipeline itself. Daily extraction and annotation of posts are initiated and imported into the database linked to the user interface. In this paper, the platform’s use will be demonstrated through a specific use case on the analysis of Levothyrox ADE mentions in forums (discussed later). In addition, we conducted a comparative analysis of the PHARES pipeline with the existing platforms mentioned in the “Introduction” section, based on the criteria listed in the “Methods” section.

Of the 10 identified pipelines, half were public and half were private. While 8 out of 10 focused on ADEs, only 4 were designed for routine usage. Five scrapers were open source, and all posts from considered websites were extracted by only 6 of the scrapers (with others extracting posts under certain conditions). Six scraped web forum posts, but only 3 performed deidentification. Additionally, 4 pipelines focused on the French language. A total of 6 pipelines displayed the temporal evolution of the number of posts, but only 1 conducted a change-point analysis. Signal detection methods were performed by only 4 of them, with none displaying the temporal evolution of the PRR nor a logistic regression–based method. Finally, 6 of them had an interface ( Table 3 ).

Pipeline | General | Scraper | Annotation | Statistics | Signal detection

Criteria: Focus on ADEs; Routine usage; Public/private; All posts; Deidentification; Web forums; Open source; French language; Temporal evolution; Change-point analysis; Signal detection; PRR temporal evolution; Logistic regression; Interface
PREDOSE XPublicXXXXXX
Insight ExplorerXPrivateXXXXXXXXX
MedWatcher SocialPublicXXXXXX
Yeleswarapu et al [ ]XPrivateXXXXXXXXXX
DominoXPublicXXXXX
Nikfarjam et al [ ]XPublic and PrivateXXXXXXXXXXX
Magge et al [ ]XPublicXXXXXXXX
ADR-PRISM XPublic and PrivateXXXX
Kappa SantéPrivateXXX
Expert SystemXPrivateXXXXXX

a PHARES: Pharmacovigilance in Social Networks.

b The X symbol means that the characteristic is missing and the symbol ✓ means the characteristic is fulfilled.

c ADE: adverse drug event.

d PRR: proportional reporting ratio.

e PREDOSE: Prescription Drug Abuse Online Surveillance and Epidemiology.

f ADR-PRISM: Adverse Drug Reaction from Patient Reports in Social Media.

Annotation Module’s Comparison With Up-to-Date State-of-the-Art Methods

We also compared the performance of our annotation process with that of current state-of-the-art methods ( Table 4 ).

While the annotation module demonstrated good performance for named entity recognition (F1-score = 0.886), it remains slightly below the state of the art. Presently, in medical texts, the best performances are achieved by Hussain et al [ 25 ] and Ding et al [ 26 ] for the named entity recognition task, and by Xia [ 22 ] for the relationship extraction task. On Twitter, known for its notably more complex data, Hussain et al [ 25 ] achieved slightly better results than our annotator, while Ding et al [ 26 ] achieved slightly worse results.

Annotator | Language | Data | Natural language processing method | Named entity recognition (precision; recall; F1-score) | Relationship extraction (precision; recall; F1-score)
PHARES (b) | French | Patient's web drug review | Conditional random fields and support vector machines | 0.926; 0.845; 0.886 | 0.683; 0.956; 0.797
Magge et al [ ] | English | Twitter | BERT (c) neural networks | 0.82; 0.76; 0.78 | (d)
Xia [ ] | English | Medical texts | HAMLE (e) model | (d) | 0.929; 0.914; 0.921
Hussain et al [ ] | English | Medical texts (PubMed) and Twitter | BERT | 0.982; 0.964; 0.976 (PubMed) and 0.840; 0.861; 0.896 (X/Twitter) | (d)
Ding et al [ ] | English | Medical texts (PubMed) and Twitter | BGRU (f) + char LSTM (g) attention + auxiliary classifier | 0.867; 0.948; 0.906 (PubMed) and 0.785; 0.914; 0.844 (Twitter) | (d)

a The 2 categories are entity recognition, which is the detection of a drug or ADE mention, and relationship extraction, which is the detection of a relation between a drug and an ADE.

b PHARES: Pharmacovigilance in Social Networks.

c BERT: Bidirectional Encoder Representations from Transformer.

d Not available.

e HAMLE: Historical Awareness Multi-Level Embedding.

f BGRU: Bidirectional Gated Recurrent Unit.

g LSTM: Long Short-Term Memory.
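As a reminder of how the F1-scores in Table 4 relate to the precision and recall columns, the F1-score is the harmonic mean of the two. A minimal check in R (note that the table reports rounded precision and recall, so a recomputed F1-score can differ in the third decimal from the published value):

```r
# F1-score as the harmonic mean of precision and recall
f1 <- function(precision, recall) {
  2 * precision * recall / (precision + recall)
}

f1(0.926, 0.845)  # ~0.884: PHARES named entity recognition (reported as 0.886)
f1(0.683, 0.956)  # ~0.797: PHARES relationship extraction
```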

Summary of the Results

From January 1, 2017, to February 28, 2021, a total of 2,081,296 posts were extracted from 23 French web forums ( Table 5 ). We obtained 713,057 normalized annotations of drugs, 1,527,004 normalized annotations of ADEs, and 437,192 annotations of normalized drug-ADE couples. The number of posts annotated with at least one normalized drug-ADE couple was 125,279 (6.02%). Table 5 lists, for each forum, the number of posts extracted, the publication dates of the first and last extracted posts, and a short description. For 1 forum, the publication dates were not available. A total of 9 were generalist health forums, 3 were specialized for parents of a young baby, 2 for families, 3 for mothers, 2 specialized in thyroid issues, 1 for pregnant women, 1 for women, 1 for parents of a teenager or for teenagers, 1 for sportspersons, and 1 specialized in rare diseases.

Table 5. Description of the French web forums and number of posts extracted per forum.

Forum | Extracted posts, n | Publication date of the first extracted post | Publication date of the last extracted post | Description
thyroideNEW | 451,253 | February 15, 2001 | February 25, 2021 | Specialized in thyroid issues
doctissimoSante | 248,691 | March 19, 2003 | January 16, 2021 | Generalist health forum
doctissimoNutrition | 183,730 | December 30, 2002 | January 16, 2021 | Specialized in nutrition
infoBebe | 127,341 | November 30, 2000 | March 08, 2019 | Specialized for parents of a young baby
atoute | 118,415 | February 05, 2005 | February 28, 2021 | Generalist health forum
notreFamille | 97,098 | March 16, 2000 | October 26, 2017 | Specialized for families
magicMaman | 96,713 | June 14, 1999 | February 22, 2021 | Specialized for mothers
doctissimoMed | 95,531 | August 05, 2002 | January 15, 2021 | Generalist health forum
doctissimoGrossesse | 93,449 | November 09, 2006 | January 15, 2021 | Specialized for pregnant women
thyroide | 73,376 | September 25, 2001 | January 07, 2019 | Specialized in thyroid issues
aufeminin | 72,732 | April 05, 2001 | January 09, 2020 | Specialized for women
mamanVie | 69,167 | June 07, 2006 | April 10, 2019 | Specialized for mothers
onmeda | 61,428 | July 25, 2001 | February 24, 2021 | Generalist health forum
ados | 58,181 | June 20, 2006 | March 08, 2019 | Specialized for parents of a teenager or for teenagers
carenity | 52,659 | May 16, 2011 | August 29, 2020 | Generalist health forum
famili | 51,844 | November 06, 2000 | November 17, 2019 | Specialized for families
babyFrance | 43,806 | January 20, 2003 | April 30, 2018 | Specialized for parents of young baby
bebeMaman | 38,450 | Not available | Not available | Specialized for mothers of young baby
alloDocteurs | 15,833 | June 15, 2009 | February 09, 2021 | Generalist health forum
reboot | 9383 | May 04, 2016 | February 25, 2021 | Generalist health forum
futura | 6765 | May 12, 2003 | February 22, 2021 | Generalist health forum
sportSante | 6350 | May 10, 2011 | January 14, 2020 | Specialized for sportsperson
maladieRares | 4827 | October 09, 2012 | May 14, 2020 | Specialized in rare diseases
queChoisir | 4250 | June 16, 2003 | February 11, 2021 | Generalist health forum

a Not available.

Use Case: Analysis of Levothyrox ADE Mentions in Forums

To demonstrate the usage of the pipeline, we chose to focus on Levothyrox as a case study. Levothyrox is a drug prescribed in France since 1980 for hypothyroidism and for circumstances in which it is necessary to limit thyroid-stimulating hormone levels. In 2017, a new formula of Levothyrox, differing from the 30-year-old formulation only in its excipients (lactose being replaced by mannitol and citric acid), was marketed, with widespread media coverage. In parallel, an unexpected increase in notifications of ADEs for this drug was detected. Viard et al [ 37 ] were unable to find any pharmacological rationale to explain that signal. Approximately 32,000 adverse effects were reported by patients in France in 2017, representing 42% of all the ADEs collected yearly [ 38 ]. Most of these notifications concerned the new formulation of Levothyrox and led to the “French Levothyrox crisis.” In 2017, 1664 notifications of ADEs were spontaneously reported by patients to the Pharmacovigilance Center of Nice. Among the 1544 reviewed notifications, 1372 concerned Levothyrox while only 172 concerned other drugs [ 37 ].

In this use case, the study period was from January 1, 2017, to February 28, 2021, and the drugs included were 2 drugs from the “H03AA Thyroid hormones” ATC class: “Levothyroxine sodium” and “associations of levothyroxine and liothyronine.” A total of 17 forums were selected because they included at least one post with information about these drugs. Posts from these forums were extracted, annotated, and analyzed through the pipeline ( Table 6 ). Signal detection methods were applied to an ADE chosen because it frequently appeared with Levothyrox in our data: “tiredness.” A signal is detected when the lower bound of the 95% CI of the logarithm of the PRR is greater than 0; for the logistic regression–based method, we applied the tenth quantile. A total of 11,340 posts contained an annotation concerning the drugs of interest. Figure S4 in Multimedia Appendix 1 illustrates the source and evolution over time of these posts. The 50,127 annotations of Levothyrox principally originated from the Vivre sans thyroïde forum and were mostly posted in mid-2017 ( Figure 4 , Table 6 ). The results of the statistical analysis were displayed in the user interface.
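For readers unfamiliar with the PRR-based criterion, the sketch below shows one way to compute it in R from a 2×2 contingency table of post counts. It is an illustration only: the function name is invented, the first two counts in the example call are taken from the figures quoted in the text, the last two are made up, and it is not the statistics module's actual code.

```r
# Sketch of the PRR-based criterion: a signal is raised when the lower bound
# of the 95% CI of log(PRR) is greater than 0.
prr_signal <- function(n11, n10, n01, n00) {
  # n11: posts mentioning the drug and the ADE
  # n10: posts mentioning the drug without the ADE
  # n01: posts mentioning the ADE with other drugs only
  # n00: posts mentioning neither
  prr    <- (n11 / (n11 + n10)) / (n01 / (n01 + n00))
  se_log <- sqrt(1 / n11 - 1 / (n11 + n10) + 1 / n01 - 1 / (n01 + n00))
  ci_log <- log(prr) + c(-1, 1) * 1.96 * se_log
  list(prr = prr, log_prr_ci95 = ci_log, signal = ci_log[1] > 0)
}

# 1841 Levothyrox-tiredness couples and 11,340 posts with the drugs of interest
# are quoted in the text; the last two counts are invented for the example.
prr_signal(n11 = 1841, n10 = 11340 - 1841, n01 = 75000, n00 = 2000000)
```

Computing the same quantity on the posts published within (or up to) each month yields the temporal evolution of the PRR displayed by the interface (Figure 5).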

ADEs annotated with Levothyrox mainly belonged to the following system organ classes: general disorders and administration site conditions (29.6%), metabolism and nutrition disorders (11.6%), and endocrine disorders (11.4%). The PTs most frequently found in association with Levothyrox are listed in Table 7 . All this information is accessible in the interface module (Figure S5 in Multimedia Appendix 1 ).

We chose the PT “tiredness” for the signal detection analysis. A total of 85,976 posts were annotated with either one of the drugs of interest or the ADE tiredness. Among them, 1841 Levothyrox-tiredness couples were found, mostly in 2017 ( Table 7 ).

Figure 5 illustrates the time evolution of the PRR for the Levothyrox-tiredness couple. Figure S6 in Multimedia Appendix 1 displays the source and evolution over time of French web forums’ posts for this couple. A signal is consistently generated throughout the period as the logarithm of the PRR is always greater than 0.


Table 6. Number of Levothyrox annotations per forum and cumulative frequency (N=50,127).

Forum | Value, n | Cumulative frequency, %
Vivre sans thyroïde | 41,211 | 82.21
Doctissimo Santé | 4230 | 90.65
Doctissimo Grossesse | 1476 | 93.60
Doctissimo Nutrition | 1177 | 95.94
Carenity | 863 | 97.67
Allo docteurs | 502 | 98.67
Atoute | 170 | 99.01
Doctissimo medicaments | 166 | 99.34
Que choisir | 85 | 99.51
Maladie rares | 76 | 99.66
Au feminin | 58 | 99.77
Sport santé | 50 | 99.87
Onmeda | 48 | 99.97
Famili | 7 | 99.98
Futura | 5 | 99.99
Maman vie | 2 | 100.00
Magic maman | 1 | 100.00
Table 7. Preferred terms most frequently annotated with Levothyrox.

Preferred term | Value, n
Pain | 1882
Tiredness | 1841
Faintness | 1267
Hypothyroidism | 1110
Dizziness | 912
Insomnia | 627
Palpitations | 571
Hyperthyroidism | 568
Malignant tumor | 560
Anxiety | 498
Overdose | 490
Nervous tension | 484
Myalgia | 409
Nausea | 388
Stress | 380
Diarrhea | 354
Tachycardia | 322
Muscle spasms | 321
Convulsions | 302
Arthralgia | 276


A total of 11 drugs were found to be associated with tiredness using logistic regression: paclitaxel, pegfilgrastim, Levothyrox, glatiramer acetate, escitalopram ferrous sulfate, the combination of Levothyrox and liothyronine, secukinumab, methotrexate, bismuth potassium, tetracycline, and metronidazole.
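As a rough sketch of the logistic regression–based approach, each post can be treated as one observation, the outcome being whether the ADE of interest is mentioned and each drug mention being a binary covariate. The R example below uses simulated data and a plain glm(); the pipeline's actual method is an adaptation of a regression-based pharmacovigilance approach (see the "Methods" section), so this only illustrates the underlying model, not the implemented algorithm.

```r
# Simulated illustration: posts as rows, drug mentions as binary covariates,
# and the ADE of interest ("tiredness") as the outcome of a logistic model.
set.seed(42)
n <- 5000                                    # hypothetical number of posts
posts <- data.frame(
  levothyrox = rbinom(n, 1, 0.05),
  paclitaxel = rbinom(n, 1, 0.01),
  ibuprofen  = rbinom(n, 1, 0.10)
)
p_tired <- plogis(-3 + 1.2 * posts$levothyrox + 0.8 * posts$paclitaxel)
posts$tiredness <- rbinom(n, 1, p_tired)

fit <- glm(tiredness ~ levothyrox + paclitaxel + ibuprofen,
           family = binomial(), data = posts)
summary(fit)$coefficients  # drugs with clearly positive coefficients are flagged
```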

Change-point analysis was conducted on the monthly evolution of the number of Levothyrox-ADE couples detected in web forums. Six breakpoints were identified ( Figure 6 ), and 3 of them correlated with an increase in the number of ADEs found with Levothyrox on web forums. These increases occurred in August 2017 and in September and December 2018.
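A change-point analysis of this kind can be sketched in R as follows. The 'changepoint' package and its PELT method are used here purely for illustration, and the monthly counts are invented; the exact algorithm used by the statistics module is described in the "Methods" section and may differ from this sketch.

```r
# Hypothetical monthly counts of Levothyrox-ADE couples (invented values),
# segmented with the PELT change-point method from the 'changepoint' package.
library(changepoint)

monthly_counts <- c(40, 38, 42, 39, 41, 37, 45, 160, 210, 180, 150, 120,
                    100, 90, 85, 80, 75, 130, 120, 70, 68, 65, 110, 105)

fit <- cpt.mean(monthly_counts, method = "PELT")
cpts(fit)   # indices of the detected breakpoints in the monthly series
plot(fit)   # monthly counts with the fitted mean of each segment
```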

This use case demonstrates that the results obtained through the pipeline, particularly in the context of Levothyrox, align with findings in the literature derived from more traditional data sources such as case reports in pharmacovigilance (see the “Discussion” section). It underscores the potential of leveraging such a pipeline to monitor a drug, not only retrospectively but also in real time using social media. Consequently, PHARES has the capability to potentially uncover new signals in pharmacovigilance.


Principal Findings

In line with our objective, we implemented and evaluated a pipeline that processes data from the extraction of web forum posts to the generation of indicators and alerts within a visual and interactive environment. Through this pipeline, we demonstrated that quantitative analysis can be conducted through the interface without requiring the user to code. We also showed that it is feasible to retrieve information consistent with the literature on a drug’s ADEs, as well as unexpected ADEs and the dates of significant events related to a drug. This underscores the relevance and utility of such a pipeline.

A conceptual contribution of this research was the proposal of a methodology for designing a pipeline that facilitates pharmacovigilance studies on web forums, by describing 4 independent modules and outlining their interactions. Another contribution was the adaptation of existing pharmacovigilance analysis methods to data extracted from web forum posts. The logistic regression–based method presented in this article was originally designed for pharmacovigilance case reports, to take drug co-prescriptions into account; we adapted it to the analysis of pharmacovigilance data extracted from web forum posts.

Comparison With Prior Work

The PHARES pipeline offers added value compared with previous pipelines in terms of the set of criteria, which reflects an analysis of experts’ needs for routine monitoring of ADEs in social media. Unlike previous approaches, PHARES routinely performs deidentification during scraping, and it adds change-point analysis, the evolution of the PRR over time, and a logistic regression–based signal detection method, none of which were previously available. The temporal evolution of the number of posts and signal detection methods are also seldom supported by other pipelines. PHARES is designed for routine usage and focused on ADEs, and all posts from the selected web forums are scraped and deidentified using an open-source scraper.

Our Levothyrox findings can also be compared with the retrospective analysis of social media by Audeh et al [ 38 ]. The period and selected web forums differed between the 2 studies: Audeh et al [ 38 ] covered the period from January 2015 to December 2017, while our study spanned from January 2017 to February 2021. Additionally, Audeh et al [ 38 ] included only 1 web forum specialized in thyroid issues, whereas we incorporated this specific forum along with 16 others. The main ADEs associated with Levothyrox in our study align with those found by Audeh et al [ 38 ] on similar data, albeit without using the interface. In our study, the 10 most frequent symptoms were pain, tiredness, faintness, hypothyroidism, dizziness, insomnia, palpitations, hyperthyroidism, malignant tumor, and anxiety. By contrast, Audeh et al [ 38 ] reported tiredness, weight gain, pain, ganglions, hot flush, chills, inflammation, faintness, weight loss, and discomfort.

Furthermore, the PHARES pipeline surpasses previous efforts on several criteria. Regarding annotation, only 4 of the identified pipelines used a French annotation tool. In terms of statistics, only 1 pipeline met both criteria we identified. Regarding signal detection, among the 3 criteria identified, 5 pipelines matched only 1, while the remaining 5 matched none.

In the use case, a notable increase in the number of ADEs associated with Levothyrox was detected with the change-point analysis method a few months after the introduction of the new formula in March 2017, specifically in August 2017. This surge coincided with the initial declarations to the pharmacovigilance network and a petition initiated by patients to reintroduce the former formula in June 2017. We compared these findings with the results of a pharmacovigilance study based on spontaneous reporting. Out of 1554 notifications spontaneously addressed by patients to the Pharmacovigilance Center of Nice from January 1, 2017, to December 31, 2017, 1372 were related to the new formula of Levothyrox, representing 7342 ADEs. Comparing our findings with these data put them into perspective: the 10 most frequently reported ADEs in these notifications, namely asthenia, headache, dizziness, hair loss, insomnia, cramps, weight gain, nausea, muscle pain, and irritability, closely resembled our own results [ 37 ]. Our results are therefore coherent with the existing literature. This use case also illustrates the feasibility of identifying the date of significant events related to a drug, although the detection of such events is not necessarily faster through social media than through the traditional pharmacovigilance system.

Limitations

The method used in our annotation process was integrated at an early stage of the pipeline’s design. For the identification of drugs and symptoms, our annotation process exhibited the following performances: precision=0.926, recall=0.845, and F1-score=0.886 [ 20 ]. Similarly, for detecting the relationship between a drug and an ADE, the performances were precision=0.683, recall=0.956, and F1-score=0.797 [ 20 ]. This study was the first publication on using NLP methods to identify ADEs in French-language web forums, and the annotation process was thus developed with what were state-of-the-art methodologies at the time. It would now benefit from the integration of more recent NLP algorithms for named entity recognition [ 8 , 23 , 24 ], which offer comparable performances while handling more complex data more effectively. Because we emphasized the genericity of the approach and the interoperability between the different modules rather than the performance of each individual module, we opted not to use these algorithms yet. Current state-of-the-art methods for annotating ADEs from social media posts include convolutional neural networks trained on top of pretrained word vectors for sentence-level classification [ 24 ] and transformers using the bidirectional encoder representations from transformers (BERT) language model [ 39 ]. Hussain et al [ 25 ] introduced a multitask neural network based on BERT with hyperparameter optimization, capable of sentence classification and named entity recognition; it achieved precision=0.840, recall=0.861, and F1-score=0.896 on the Twitter (X)-TwiMed data set. Additionally, Magge et al [ 8 ] presented a pipeline consisting of 3 BERT neural networks designed to classify sentences, extract named entities, and normalize those entities to their respective MedDRA concepts; its performances were precision=0.82, recall=0.76, and F1-score=0.78 on the SMM4H-2020 data set (Twitter/X). Thanks to our modular design, it will be straightforward to substitute an enhanced model for our current annotation process in the future.

Several limitations should be acknowledged for future work. First, the scraper relies on the HTML structure of web forums, necessitating updates to its configuration files if a forum alters its page design. Additionally, our interface lacks the capability to incorporate alternate identifiers for drugs or ADEs. For instance, patients may commonly refer to the drug “baclofen” as “baclo” on social media platforms. Consequently, the number of posts pertaining to a drug or ADE could potentially be underestimated.
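To make the first of these limitations concrete, the extraction step can be pictured as selector-based scraping, as in the hypothetical rvest sketch below (the URL and CSS selectors are invented; Vigi4Med Scraper's real configuration files are documented in the project repository [ 41 ]).

```r
# Illustrative sketch (not Vigi4Med Scraper's actual code) of selector-based
# extraction: posts are located with CSS selectors, so a page redesign breaks
# the selectors and the scraper's configuration must be updated.
library(rvest)

page <- read_html("https://example.org/forum/thread-123")   # hypothetical URL

posts <- data.frame(
  author = page |> html_elements(".post .author")  |> html_text2(),
  date   = page |> html_elements(".post .date")    |> html_text2(),
  body   = page |> html_elements(".post .message") |> html_text2()
)
# If the forum renames ".message" to ".post-body", these selectors silently
# return nothing, which is why the configuration files must track each
# forum's layout.
```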

Forums must be selected before query execution to limit computation time. However, selecting forums based on the presence of information related to a particular drug or ADE can introduce bias into signal detection methods, particularly in disproportionality analysis, where the drug-ADE pair may be overrepresented. Another limitation for the qualitative analysis of posts is that users cannot edit annotations or record typical pharmacovigilance qualitative data.

The assumption that all drugs mentioned in a post were consumed simultaneously by the user, as applied in the logistic regression–based method, introduces an evident bias.

One limitation associated with the use of social media data pertains to fraudulent posts. The pseudonymity inherent in these platforms provides malevolent individuals with the opportunity to disseminate false rumors. Additionally, patients might post identical or similar messages across multiple discussion boards, or even multiple times on the same board. Thus, it is crucial to consider these factors to mitigate biases in signal detection.

Perspectives

In the short to medium term, our objectives are updating the annotation module to enhance accuracy, improving the qualitative analysis by enabling users to edit and correct annotations, and expanding the range of signal detection methods available in the statistics module.

This method could indeed be beneficial for identifying potential drug misuse and unknown ADEs [ 40 ]. By categorizing pathological terms found in web forums based on their presence in the summary of product characteristics, we can distinguish between indications, known ADEs, and potential instances of drug misuse or unexpected ADEs. However, it is important to note that considering all pathological terms found in the summary of product characteristics as indications might obscure cases of drug inefficiency. Therefore, a nuanced approach is necessary to ensure comprehensive and accurate analysis.
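A minimal sketch of this categorization, assuming the summary of product characteristics has already been reduced to two term lists (the function name and the example terms are hypothetical):

```r
# Hedged sketch: pathological terms found with a drug in forum posts are split
# according to whether the drug's summary of product characteristics (SmPC)
# lists them as an indication or a known ADE; the rest are candidate
# unexpected ADEs or possible misuse.
classify_term <- function(term, smpc_indications, smpc_known_ades) {
  if (term %in% smpc_indications) "indication (or possible inefficacy)"
  else if (term %in% smpc_known_ades) "known ADE"
  else "unexpected ADE or possible misuse"
}

# Hypothetical SmPC content for levothyroxine
indications <- c("hypothyroidism")
known_ades  <- c("tachycardia", "insomnia", "headache")

vapply(c("hypothyroidism", "insomnia", "weight gain"),
       classify_term, character(1),
       smpc_indications = indications, smpc_known_ades = known_ades)
```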

We then tested our pipeline from the perspective of end users; however, our hypothesis was only partially confirmed, indicating the need for further studies, which should include evaluations based on ergonomic criteria.

In the long term, our vision is to expand this tool to encompass other languages and themes beyond pharmacovigilance. This includes areas such as drug misuse, the consumption of food supplements, and the use of illegal drugs. French web forums dedicated to recreational drug use already exist, providing a valuable source of data for such endeavors.

Conclusions

Our hypothesis was that regulatory agencies struggle to use social media primarily because of the lack of appropriate decision-making tools. To tackle this challenge, we devised a pipeline consisting of 4 editable modules aimed at effectively analyzing health-related French web forums for pharmacovigilance purposes. Using this pipeline and its user-friendly interface, we demonstrated the feasibility of conducting quantitative analyses without the need for coding. This approach yielded results coherent with the literature and holds the potential to reveal new insights about drugs.

A practical implication of our pipeline is its potential application in health surveillance by regulatory agencies such as the ANSM or pharmaceutical companies. It can be instrumental in detecting issues related to drug safety and efficacy in real time. Furthermore, research teams can leverage this tool to retrospectively analyze events and gain valuable insights into pharmacovigilance trends.

Acknowledgments

The annotation module was developed by François Morlane-Hondère, Cyril Grouin, Pierre Zweigenbaum, and Leonardo Campillos-Llanos from the Computer Science Laboratory for Mechanics and Engineering Sciences (LIMSI). Code review for the graphical user interface in R language was performed by Stevenn Volant under a contract with the Stat4Decision company. Stat4Decision was not involved in designing the study and writing this article. This work was funded by the Agence nationale de sécurité du médicament et des produits de santé (ANSM) through Convention No. 2016S076 and was supported by a PhD contract with Sorbonne Université.

Data Availability

Our data were extracted from web forums that do not allow data sharing. Thus, as we are not the owners of the data, we cannot make them available. The scraper we developed to extract these data is open source and can be used to extract data from web forum posts. The tool, as well as full documentation (in English and French) of the code and configuration files, is available online [ 41 ].

Conflicts of Interest

None declared.

Multimedia Appendix 1: Vigi4Med Scraper structure, PHARES database structure, example of data representation, and source and evolution over time of web forum posts. PHARES: Pharmacovigilance in Social Networks.

  • Hazell L, Shakir SAW. Under-reporting of adverse drug reactions: a systematic review. Drug Saf. 2006;29(5):385-396. [ CrossRef ] [ Medline ]
  • Liu F, Jagannatha A, Yu H. Towards drug safety surveillance and pharmacovigilance: current progress in detecting medication and adverse drug events from electronic health records. Drug Saf. Jan 2019;42(1):95-97. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Warrer P, Hansen EH, Juhl-Jensen L, Aagaard L. Using text-mining techniques in electronic patient records to identify ADRs from medicine use. Br J Clin Pharmacol. May 2012;73(5):674-684. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Black C, Tagiyeva‐Milne N, Helms P, Moir D. Pharmacovigilance in children: detecting adverse drug reactions in routine electronic healthcare records. A systematic review. Brit J Clinical Pharma. May 28, 2015;80(4):844-854. [ CrossRef ] [ Medline ]
  • Cameron D, Smith GA, Daniulaityte R, Sheth AP, Dave D, Chen L, et al. PREDOSE: a semantic web platform for drug abuse epidemiology using social media. Journal of Biomedical Informatics. Dec 2013;46(6):985-997. [ CrossRef ] [ Medline ]
  • Yeleswarapu S, Rao A, Joseph T, Saipradeep VG, Srinivasan R. A pipeline to extract drug-adverse event pairs from multiple data sources. BMC Med Inform Decis Mak. Feb 24, 2014;14(1):1-16. [ CrossRef ]
  • Nikfarjam A, Ransohoff JD, Callahan A, Jones E, Loew B, Kwong BY, et al. Early detection of adverse drug reactions in social health networks: a natural language processing pipeline for signal detection. JMIR Public Health Surveill. Jun 03, 2019;5(2):e11264. [ CrossRef ] [ Medline ]
  • Magge A, Tutubalina E, Miftahutdinov Z, Alimova I, Dirkson A, Verberne S. DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter. J Am Med Inform Assoc. Sep 18, 2021;28(10):2184-2192. [ CrossRef ] [ Medline ]
  • Audeh B, Beigbeder M, Zimmermann A, Jaillon P, Bousquet C. Vigi4Med scraper: a framework for web forum structured data extraction and semantic representation. PLoS One. Jan 25, 2017;12(1):e0169658. [ CrossRef ] [ Medline ]
  • Caster O, Dietrich J, Kürzinger ML, Lerch M, Maskell S, Norén GN, et al. Assessment of the utility of social media for broad-ranging statistical signal detection in pharmacovigilance: results from the WEB-RADR project. Drug Saf. Dec 2018;41(12):1355-1369. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Bousquet C, Audeh B, Bellet F, Lillo-Le Louët A. Comment on "Assessment of the utility of social media for broad-ranging statistical signal detection in pharmacovigilance: results from the WEB-RADR project". Drug Saf. Dec 19, 2018;41(12):1371-1373. [ CrossRef ] [ Medline ]
  • Karapetiantz P, Bellet F, Audeh B, Lardon J, Leprovost D, Aboukhamis R, et al. Descriptions of adverse drug reactions are less informative in forums than in the French pharmacovigilance database but provide more unexpected reactions. Front Pharmacol. May 1, 2018;9:439-411. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Lardon J, Abdellaoui R, Bellet F, Asfari H, Souvignet J, Texier N, et al. Adverse drug reaction identification and extraction in social media: a scoping review. J Med Internet Res. Jul 10, 2015;17(7):e171. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Karapetiantz P, Audeh B, Faille J, Lillo-Le Louët A, Bousquet C. Qualitative and quantitative analysis of web forums for adverse events detection: "strontium ranelate" case study. Stud Health Technol Inform. Aug 21, 2019;264:964-968. [ CrossRef ] [ Medline ]
  • Casperson T, Painter J, Dietrich J. Strategies for distributed curation of social media data for safety and pharmacovigilance. 2016. Presented at: International Conference on Data Science (ICDATA); October 1, 2016:118-124; Barcelona, Spain.
  • Freifeld CC. Digital pharmacovigilance: The medwatcher system for monitoring adverse events through automated processing of internet social media and crowdsourcing. OpenBU Libraries. Boston University. OpenBU; 2014. URL: https://open.bu.edu/handle/2144/10995
  • Cossin S, Lebrun L, Lobre G, Loustau R, Jouhet V, Griffier R, et al. Romedi: an open data source about French drugs on the semantic web. Stud Health Technol Inform. Aug 21, 2019;264:79-82. [ CrossRef ] [ Medline ]
  • Abdellaoui R, Schück S, Texier N, Burgun A. Filtering entities to optimize identification of adverse drug reaction from social media: how can the number of words between entities in the messages help? JMIR Public Health Surveill. Jun 22, 2017;3(2):e36. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Bousquet C, Dahamna B, Guillemin-Lanne S, Darmoni SJ, Faviez C, Huot C, et al. The adverse drug reactions from patient reports in social media project: five major challenges to overcome to operationalize analysis and efficiently support pharmacovigilance process. JMIR Res Protoc. Sep 21, 2017;6(9):e179. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Morlane-Hondère F, Grouin C, Zweigenbaum P. Identification of drug-related medical conditions in social media. 2016. Presented at: The Tenth International Conference on Language Resources and Evaluation (LREC'16); May 2, 2016:2022-2028; Portoroz, Slovenia.
  • Lafferty J, McCallum A, Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data. San Francisco, CA. Morgan Kaufmann Publishers; 2001. Presented at: Eighteenth International Conference on Machine Learning (ICML 2001); June 28, 2001 to July 1, 2001:282-289; Williamstown, MA.
  • Xia L. Historical profile will tell? A deep learning-based multi-level embedding framework for adverse drug event detection and extraction. Decision Support Systems. Sep 2022;160:113832. [ CrossRef ]
  • Yu D, Vydiswaran VGV. An assessment of mentions of adverse drug events on social media with natural language processing: model development and analysis. JMIR Med Inform. Sep 28, 2022;10(9):e38140. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Rezaei Z, Ebrahimpour-Komleh H, Eslami B, Chavoshinejad R, Totonchi M. Adverse drug reaction detection in social media by deep learning methods. Cell J. Oct 2020;22(3):319-324. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Hussain S, Afzal H, Saeed R, Iltaf N, Umair MY. Pharmacovigilance with transformers: a framework to detect adverse drug reactions using BERT fine-tuned with farm. Comput Math Methods Med. 2021;2021:5589829. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ding P, Zhou X, Zhang X, Wang J, Lei Z. An attentive neural sequence labeling model for adverse drug reactions mentions extraction. IEEE Access. 2018;6:73305-73315. [ CrossRef ]
  • Xu Z, Kass-Hout T, Anderson-Smits C, Gray G. Signal detection using change point analysis in postmarket surveillance. Pharmacoepidemiol Drug Saf. Jun 22, 2015;24(6):663-668. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Evans SJW, Waller PC, Davis S. Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol Drug Saf. Dec 10, 2001;10(6):483-486. [ CrossRef ] [ Medline ]
  • van Puijenbroek EP, Bate A, Leufkens HGM, Lindquist M, Orre R, Egberts ACG. A comparison of measures of disproportionality for signal detection in spontaneous reporting systems for adverse drug reactions. Pharmacoepidemiol Drug Saf. Feb 06, 2002;11(1):3-10. [ CrossRef ] [ Medline ]
  • Ahmed I, Pariente A, Tubert-Bitter P. Class-imbalanced subsampling lasso algorithm for discovering adverse drug reactions. Stat Methods Med Res. Mar 25, 2018;27(3):785-797. [ CrossRef ] [ Medline ]
  • Caster O, Norén GN, Madigan D, Bate A. Large‐scale regression‐based pattern discovery: the example of screening the WHO global drug safety database. Statistical Analysis. Jul 20, 2010;3(4):197-208. [ CrossRef ]
  • Harpaz R, DuMouchel W, LePendu P, Bauer-Mehren A, Ryan P, Shah NH. Performance of pharmacovigilance signal-detection algorithms for the FDA adverse event reporting system. Clin Pharmacol Ther. Jun 11, 2013;93(6):539-546. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • R Core Team. The R Project for Statistical Computing. R Foundation. URL: http://www.R-project.org/ [accessed 2024-04-26]
  • Chang W, Cheng J, Allaire J, Sievert C, Schloerke B, Xie Y. shiny: web application framework for R. Comprehensive R Archive Network. URL: https://CRAN.R-project.org/package=shiny [accessed 2023-01-30]
  • Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance). EUR-Lex. URL: https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX:32016R0679 [accessed 2024-04-26]
  • SHA-1. Wikipedia. 2023. URL: https://en.wikipedia.org/w/index.php?title=SHA-1&oldid=1135933131 [accessed 2023-01-30]
  • Viard D, Parassol-Girard N, Romani S, Van Obberghen E, Rocher F, Berriri S, et al. Spontaneous adverse event notifications by patients subsequent to the marketing of a new formulation of Levothyrox amidst a drug media crisis: atypical profile as compared with other drugs. Fundam Clin Pharmacol. Aug 07, 2019;33(4):463-470. [ CrossRef ] [ Medline ]
  • Audeh B, Grouin C, Zweigenbaum P, Bousquet C, Jaulent M, Benkhebil M. French Levothyrox® crisis: retrospective analysis of social media. Bogota, Colombia. Springer International Publishing; 2019. Presented at: Conference ISOP - International Society of Pharmacovigilance; October 1, 2019; Bogota, Colombia. URL: https://hal.archives-ouvertes.fr/hal-02411632
  • Devlin J, Chang M, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT 2019. 2019. Presented at: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; June 2-7, 2019:4171-4186; Minneapolis, MN. URL: https://aclanthology.org/N19-1423.pdf
  • Campillos-Llanos L, Grouin C, Lillo-Le Louët A, Zweigenbaum P. Initial experiments for pharmacovigilance analysis in social media using summaries of product characteristics. Stud Health Technol Inform. Aug 21, 2019;264:60-64. [ CrossRef ] [ Medline ]
  • Vigi4Med Scraper. GitHub. URL: https://github.com/bissana/Vigi4Med-Scraper [accessed 2024-04-22]

Abbreviations

ADE: adverse drug event
ANSM: Agence nationale de sécurité du médicament et des produits de santé
ATC: Anatomical Therapeutic Classification
BERT: Bidirectional Encoder Representations from Transformer
CSV: comma-separated values
CUSUM: Cumulative Sum
EMA: European Medicines Agency
FDA: Food and Drug Administration
FPVD: French Pharmacovigilance Database
GDPR: General Data Protection Regulation
HAS: French National Health Authority
MedDRA: Medical Dictionary for Regulatory Activities Terminology
NLP: natural language processing
PHARES: Pharmacovigilance in Social Networks
PREDOSE: Prescription Drug Abuse Online Surveillance and Epidemiology
PRR: proportional reporting ratio
PT: preferred term
RDF: resource description framework
ROR: reporting odds ratio
WEB-RADR: Recognizing Adverse Drug Reactions

Edited by A Mavragani; submitted 01.02.23; peer-reviewed by S Matsuda, L Shang; comments to author 06.07.23; revised version received 20.10.23; accepted 12.03.24; published 18.06.24.

©Pierre Karapetiantz, Bissan Audeh, Akram Redjdal, Théophile Tiffet, Cédric Bousquet, Marie-Christine Jaulent. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.06.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
