Nowadays, voice assistants are becoming ever more popular around the world. Besides famous ones like Siri, Alexa, and Google Assistant, thousands of voice bots are helping customers in many types of business. The reason is obvious: Voice bots can drastically improve contact center performance. Why not take advantage of this?
That’s why our client partnered with us to create a voice bot.
The main goal of this voice assistant is to help people by making insurance buying as easy as possible. The client, an international insurance company, wanted to create a proof of concept — a basic implementation of a project that proves the idea’s feasibility. They needed a bilingual voice assistant (English/German) that would help the client’s customers get answers to questions about their insurance plans.
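Due to the company's emphasis on security, there was one important condition that we had to follow: no Google products. So we started looking for alternatives. After thorough research, our team settled on a stack built around Twilio (voice calls, speech recognition, and SMS), Houndify Custom Commands (intent detection), Amazon Lex (conversation state and slot filling), and Amazon Translate (English-German translation).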
It took three people and one month to develop a proof of concept for the voice bot. It works like this:
When an incoming call is received, Twilio triggers the backend service. The customer’s language is detected, their speech is automatically converted into text via the Twilio Speech Recognition API, and the necessary slots are filled in with the user’s information (first name, last name, insurance number, etc.). Then the bot checks whether the phone number is already in the database and greets the user (1).
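To illustrate the speech-collection step, here is a minimal sketch of the kind of TwiML response a backend could return to Twilio so that the caller's spoken answer is transcribed in the right language. The locale mapping, the function name, and the `/voice/answer` callback URL are assumptions made for this example and are not taken from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the bot's two languages to speech locales.
LOCALES = {"en": "en-US", "de": "de-DE"}


def build_gather_twiml(language: str, prompt: str, action_url: str) -> str:
    """Return TwiML that speaks a prompt and collects the caller's spoken reply."""
    locale = LOCALES[language]
    response = ET.Element("Response")
    # <Gather input="speech"> asks Twilio to transcribe the caller's answer
    # and send the transcript to action_url.
    gather = ET.SubElement(
        response,
        "Gather",
        {"input": "speech", "language": locale, "action": action_url, "method": "POST"},
    )
    say = ET.SubElement(gather, "Say", {"language": locale})
    say.text = prompt
    return ET.tostring(response, encoding="unicode")


# The URL below is a placeholder, not a real endpoint.
print(build_gather_twiml("de", "Wie lautet Ihre Versicherungsnummer?", "/voice/answer"))
```

Twilio would then post the transcript to the callback URL, where the backend can store it in the matching slot.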
Once the language of the conversation is determined, the voice bot tries to recognize the intent type and select the corresponding Amazon Lex bot by calling Houndify Custom Commands with the information provided by the user.
After the bot recognizes the intent type, the customer is redirected to the corresponding bot for further conversation, according to the defined intent. There are three general types of intents: insurance intent, real conversation intent, and undefined question intent.
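The routing between the three intent types can be sketched in a few lines. This is only an illustration: the keyword lists stand in for the real intent recognizer, and the handler names are hypothetical.

```python
from enum import Enum


class Intent(Enum):
    INSURANCE = "insurance"
    REAL_CONVERSATION = "real_conversation"
    UNDEFINED_QUESTION = "undefined_question"


# Hypothetical keyword lists standing in for the real intent recognizer.
KEYWORDS = {
    Intent.INSURANCE: ("insurance", "policy", "versicherung"),
    Intent.REAL_CONVERSATION: ("agent", "representative", "mitarbeiter"),
}

# Hypothetical names for the component that takes over each intent.
HANDLERS = {
    Intent.INSURANCE: "insurance_bot",
    Intent.REAL_CONVERSATION: "agent_handoff",
    Intent.UNDEFINED_QUESTION: "question_recorder",
}


def classify(utterance: str) -> Intent:
    """Pick the first intent whose keywords appear; fall back to undefined."""
    text = utterance.lower()
    for intent, words in KEYWORDS.items():
        if any(word in text for word in words):
            return intent
    return Intent.UNDEFINED_QUESTION


def route(utterance: str) -> str:
    """Return the name of the handler that should continue the conversation."""
    return HANDLERS[classify(utterance)]


print(route("I want to extend my policy"))
print(route("Can I talk to a representative?"))
print(route("What is the weather like?"))
```

Anything that matches neither of the first two intents falls through to the undefined question intent, which mirrors the fallback behavior described in this article.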
With the insurance intent, there are two options: either the user wants to buy new insurance or they want to do something with an existing policy.
In the first case (2.1) the bot recognizes the reason for the call and gathers the customer data needed to sell new insurance. Next, the information is put into the database and the bot sends an SMS to the user with details about the insurance and a request for approval.
The second case (2.2) works almost like the first, but here the bot identifies the customer with the help of the existing database and collects the required information to extend the current insurance policy.
Amazon Lex controls the state of the conversation and fills in slots with the user information.
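Slot filling itself is a simple idea: keep asking until every required field has a value. The sketch below shows that idea in plain Python and is not the Amazon Lex API. The slot names come from the examples mentioned earlier (first name, last name, insurance number), while the class and method names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Slots the bot has to collect before it can proceed.
REQUIRED_SLOTS = ("first_name", "last_name", "insurance_number")


@dataclass
class Conversation:
    slots: dict = field(default_factory=dict)

    def next_missing_slot(self) -> Optional[str]:
        """Return the first required slot without a value, or None if all are filled."""
        for name in REQUIRED_SLOTS:
            if name not in self.slots:
                return name
        return None

    def record_answer(self, answer: str) -> Optional[str]:
        """Store the answer in the first empty slot and return the next slot to ask for."""
        slot = self.next_missing_slot()
        if slot is not None:
            self.slots[slot] = answer.strip()
        return self.next_missing_slot()


conversation = Conversation()
for reply in ("Anna", "Schmidt", "12345"):
    next_slot = conversation.record_answer(reply)
print(conversation.slots, next_slot)
```

Once `next_missing_slot()` returns `None`, the conversation has everything it needs and can move on to the confirmation message.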
Once everything is done, Twilio Programmable SMS sends the customer a message with the key information.
If the user asks a question that Houndify Custom Commands does not recognize as one of the first two intents, the bot records the question and passes it to a company specialist. Then the bot informs the user about this and asks whether they have any other questions.
If a customer wants to talk to a real person, they can ask the bot to redirect them to a company representative. Our client is an international company, but since their main office is located in Germany, most of the contact center employees speak German. So we enabled bilingual English-German translation.
Here we have two options:
→ The user chooses German as their language → Twilio connects them to a company representative with no need for translation.
→ The user chooses English → Their English speech is translated into German via Amazon Translate, allowing the company representative to understand the user and vice versa.
As the conversation ends, the voice feedback bot asks the customer to rate the service quality by answering a short voice or SMS survey. It then puts the answers into the web database (5).
During the development of the voice bot, we faced the challenge of real-time translation of a conversation between an English-speaking customer and a German-speaking call center specialist.
There are several options for implementing this function. The first solution looks like this: The bot records the entire audio stream of the first speaker, translates it, then replays the translation to the second speaker. It then does the same for the second speaker’s answer. However, latency would be high using this method, which would likely make the conversation awkward for both sides. Moreover, the voice assistant is multithreaded and handles multiple calls at the same time, so recording and replaying full audio streams for every call would make the process much more complex.
Another variant would be near-simultaneous translation: The customer starts speaking, the audio immediately arrives at the server through the open socket and translation begins. But we had to reject this method because it doesn’t take into account the whole context of the conversation, potentially causing mistakes and misunderstandings.
The best way to implement translation was to use Speech-to-Text services. That’s how we did it for this voice bot. The English audio stream is converted into text with Twilio Speech Recognition. This text is then translated into German with Amazon Translate and converted back into audio using the same Twilio service.
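The text-level part of this pipeline can be sketched as follows. The speech-to-text and text-to-speech steps are left out, and the translator is passed in as a function so the example runs without any cloud credentials. In a real deployment that function could wrap a call to Amazon Translate. The names `relay_text` and `demo_translate` are assumptions made for this illustration.

```python
from typing import Callable

# (text, source_language, target_language) -> translated text
Translator = Callable[[str, str, str], str]


def relay_text(text: str, speaker_lang: str, listener_lang: str, translate: Translator) -> str:
    """Return the text the listener should hear, translating only when needed."""
    if speaker_lang == listener_lang:
        # German caller and German agent: no translation step.
        return text
    return translate(text, speaker_lang, listener_lang)


def demo_translate(text: str, source: str, target: str) -> str:
    """Hypothetical stand-in for a real translation service, used only for this demo."""
    phrasebook = {("en", "de", "Good morning"): "Guten Morgen"}
    return phrasebook.get((source, target, text), text)


print(relay_text("Good morning", "en", "de", demo_translate))
print(relay_text("Guten Morgen", "de", "de", demo_translate))
```

Keeping the translator behind a plain function signature also makes it easy to swap providers or to test the call flow offline.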
This may look complicated, but we added one feature that made it easier: webhooks. Now the whole process works like this: The customer says something. The voice bot sends a webhook (a request) to the server and waits for a response to play the recording. We deliberately hold this request in a blocking queue on the server until the translation is done, then send back the answer. As soon as the answer comes from the other side of the conversation, the two parties switch places: one is blocked while the other listens to the translated audio.
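The blocking-queue idea can be shown with Python's standard library. In this sketch each side of the call has its own queue: a webhook handler waits on the queue until the other side's translated reply is posted. The class name, the party labels, and the timeout value are assumptions, not details from the project.

```python
import queue
import threading
from typing import Optional


class TranslationMailbox:
    """Holds one party's pending request until the other party's translated reply arrives."""

    def __init__(self) -> None:
        self._inbox = {"customer": queue.Queue(), "agent": queue.Queue()}

    def post(self, recipient: str, translated_text: str) -> None:
        """Deliver a translated reply to the party waiting for it."""
        self._inbox[recipient].put(translated_text)

    def wait_for(self, recipient: str, timeout: float = 10.0) -> Optional[str]:
        """Block until a reply for this party arrives, or return None on timeout."""
        try:
            return self._inbox[recipient].get(timeout=timeout)
        except queue.Empty:
            return None


mailbox = TranslationMailbox()
heard = []

# The agent's webhook request blocks here until the customer's words are translated.
listener = threading.Thread(target=lambda: heard.append(mailbox.wait_for("agent", timeout=2.0)))
listener.start()

# Meanwhile the customer's translated utterance is posted for the agent.
mailbox.post("agent", "Guten Morgen")
listener.join()
print(heard)
```

The timeout is a safeguard we added for the example so that a held request is eventually released even if the other party never answers.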
Of the options we considered, using Speech-to-Text services together with webhooks gave us the fastest and most accurate translation, so this was the obvious choice for the voice bot.
Our expertise and solid background in related fields like chatbots helped our team to build a proof of concept that totally satisfied the client. We’re going to work with the client in the future to upgrade the proof of concept and build a fast and multifunctional voice bot.
So, based on the bot’s performance, we can tell you that if you want to raise your company’s contact center to the next level, a voice bot can be the perfect solution. A cheap (compared to a live agent), easy-to-scale, automatic voice assistant will speed up the work and make your contact center’s performance as smooth and accurate as possible.