Why humanity is needed to drive conversational AI



Conversational AI is a subset of artificial intelligence (AI) that allows consumers to interact with computer applications as if they were interacting with another human. According to Deloitte, the global conversational AI market is set to grow by 22% between 2022 and 2025, reaching an estimated $14 billion by 2025.

Practical applications span financial services, hospital wards and conferences, often taking the form of a translation app or a chatbot that offers enhanced language adaptations to cater to a large and diverse group of hyperlocal audiences. According to Gartner, 70% of white-collar workers reportedly interact regularly with conversational platforms, but this is just a drop in the ocean of what could unfold this decade.

Despite the exciting potential within the AI space, there is a significant obstacle: the data used to train conversational AI models does not adequately account for the subtleties of dialect, language, speech patterns and inflection.

When using a translation app, for example, a person speaks in their source language, and the AI processes that speech and converts it into the target language. When the source speaker deviates from a standardized learned accent, for example by speaking with a regional accent or using regional slang, the accuracy of the direct translation drops. Not only does this create an uneven experience, but it also inhibits users’ ability to interact in real time, whether with friends and family or within a business.


The need for humanity in AI

To avoid a drop in effectiveness, AI must draw on a diverse data set. This could include, for example, an accurate representation of speakers across the UK, both regionally and nationally, to provide better live translation and speed up interaction between speakers of different languages and dialects.

The idea of using training data in machine learning (ML) programs is a simple concept, but it is also fundamental to the way these technologies work. Training data sits within a basic reinforcement learning framework and helps a program use technologies like neural networks to learn and produce sophisticated results. The wider the group of people who interact with this technology on the back end, such as speakers with speech difficulties or who stutter, the better the resulting translation experience.

In the area of translation especially, focusing on how a user speaks, rather than what they talk about, is the key to enhancing the end-user experience. The darker side of reinforcement learning was illustrated recently by Meta, which came under fire after a chatbot spewed insensitive comments it had learned from public interaction. Training data should therefore always have a human-in-the-loop (HITL), a person who can verify that the overall algorithm is accurate and fit for purpose.
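The HITL idea described above can be sketched in a few lines: route any model output that falls below a confidence threshold to a human reviewer instead of sending it straight to the user. This is a minimal illustration, not a real API; the function names, threshold value and queue mechanism are all assumptions.

```python
# Minimal human-in-the-loop (HITL) sketch: low-confidence model outputs are
# held for human review rather than shown to the user automatically.
# REVIEW_THRESHOLD and all names below are illustrative assumptions.

from typing import Optional

REVIEW_THRESHOLD = 0.85  # assumed cut-off; tune per application


def route_response(candidate: str, confidence: float, review_queue: list) -> Optional[str]:
    """Return the response if confident enough; otherwise queue it for a human."""
    if confidence >= REVIEW_THRESHOLD:
        return candidate  # safe to send automatically
    review_queue.append(candidate)  # a human vets it before it ever reaches a user
    return None
```

In practice the queue would feed a review dashboard, and reviewer corrections would flow back into the training data, which is exactly where the human keeps the algorithm accurate and suitable for purpose.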

Accounting for the active nature of human conversation

Of course, human interaction is incredibly nuanced, and building a conversational design that can navigate that complexity is a perpetual challenge. Once achieved, however, well-structured, fully realized conversational design can ease the burden on customer service teams and translation apps, and improve customer experiences. Beyond regional dialects and slang, training data must also take into account active conversation between two or more speakers interacting with each other. The bot must learn from their speech patterns, the time it takes to articulate an interjection, and the pause between speakers before a response.

Prioritizing balance is also a great way to ensure that conversations remain an active experience for the user, and one way to do that is to eliminate dead-end responses. Think of this as similar to being in an improv setting, where “yes, and” statements are fundamental. In other words, you’re meant to accept your partner’s world-building while bringing a new element to the table. The most effective bots work in the same way, formulating open-ended responses that encourage further inquiries. Providing alternatives and additional relevant choices can help ensure that all end users’ needs are met.
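The “yes, and” principle above can be illustrated with a toy post-processing step: when a reply would end the exchange, append a follow-up that offers the user somewhere to go next. The marker list and follow-up phrasing are purely hypothetical examples.

```python
# Illustrative sketch of eliminating dead-end responses: replies that would
# close the conversation get a follow-up appended, improv-style ("yes, and").
# DEAD_END_MARKERS and the follow-up wording are assumptions for illustration.

DEAD_END_MARKERS = ("no.", "i don't know.", "that's not possible.")


def open_ended(reply: str, follow_up: str = "Is there something related I can help with?") -> str:
    """Append a follow-up prompt when a reply would dead-end the conversation."""
    if reply.strip().lower() in DEAD_END_MARKERS:
        return f"{reply} {follow_up}"
    return reply
```

A production system would generate the follow-up from context (alternatives, related choices) rather than from a fixed string, but the design choice is the same: never leave the user without a next move.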

Many people have trouble remembering long trains of thought, or simply take a little longer to process them. Because of this, translation apps would do well to give users enough time to gather their thoughts before treating a pause as the end of an interjection. Teaching a bot to recognize filler words (“so,” “um,” “well,” or their equivalents, in English for example) and to associate a longer lead time with those words is a good way to let users engage in a more realistic conversation in real time. Offering targeted “barge-in” programming (chances for users to interrupt the bot) is another way to more accurately simulate the active nature of the conversation.
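The filler-word lead time described above amounts to a simple endpointing rule: if the last thing the user said was a filler, wait longer before assuming they have finished. A minimal sketch, assuming English fillers and illustrative timeout values:

```python
# Hypothetical endpointing sketch: extend the silence timeout when the
# running transcript ends in a filler word, so the bot does not cut the
# speaker off mid-thought. The word list and timeouts are assumed values.

FILLERS = {"so", "um", "uh", "well"}
BASE_TIMEOUT_S = 0.7      # normal wait after silence before the bot replies
EXTENDED_TIMEOUT_S = 2.0  # longer wait when the user is clearly still thinking


def silence_timeout(transcript: str) -> float:
    """Choose how long to wait for more speech, given the transcript so far."""
    words = transcript.lower().split()
    if words and words[-1].strip(".,!?") in FILLERS:
        return EXTENDED_TIMEOUT_S
    return BASE_TIMEOUT_S
```

Real speech platforms expose similar knobs as configurable speech-end timeouts; the point here is only that the wait should be a function of what was said, not a fixed constant.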

Future innovations in conversational AI

Conversational AI still has a long way to go before all users feel accurately represented. Accounting for subtleties of dialect, the time it takes for speakers to think, as well as the active nature of a conversation will be critical to driving this technology forward. Especially in translation apps, taking into account pauses and words associated with thinking will improve the experience for everyone involved and simulate a more natural, active conversation.

Having models draw from a wider data set in the back-end process, for example learning from both English RP and Geordie inflections, will prevent translation quality from dropping because of accent-related processing issues. These innovations offer exciting potential, and it is time for translation apps and bots to take linguistic subtleties and speech patterns into account.

Martin Curtis is CEO of Palaver

