Building Voice Controlled Mobile Apps
In our last blog post, part of a series written by our tech team exploring our proof of concept (POC) work at Studio Graphene, we talked about a POC platform we developed for a client: a cost-effective sound detection and alerting system designed to be installed in remote areas such as a dense forest, safari park or nature reserve.
It was great working on a concept like this and exciting to see how a prototype could be developed to enable biodiversity organisations to access innovative technologies at a lower cost to help protect our natural environment.
In our second post in the series we look at building a fully voice-controlled mobile application for settings where an individual wants to view instructions for a task on their screen or have instructions read out to them without having to physically interact with their mobile device.
Goals and Aims
When COVID hit the world, minimal touch or no-touch became a significant factor in every aspect of our lives, even when it came to mobile applications.
According to a recent study, your mobile phone is 10 times dirtier than your toilet seat, so in this climate it should come as no surprise that no-touch applications are quickly gaining popularity.
The aim of this POC was to build a hands-free, voice-controlled application for various DIY tasks at home. The user should be able to talk to the app and ask for instructions for any DIY task. The app should then be able to read out the requested task, the time needed, step-by-step directions and so on, all through simple voice commands and a true no-touch experience.
Challenges
Over several brainstorming sessions, we identified quite a few challenges that stood in the way of product development:
- Always-listening mode: The mobile application needed to be listening at all times for it to be truly no-touch. In that case, how do we filter out unrelated voices or unintentional noise?
- Deep linking: How do we deep link within the application based on voice commands?
- Accent identification: Since the application is aimed at a global audience, it needed to cope with languages, pronunciations and accents from around the world
- Library choice: We needed to find a lightweight Speech-To-Text (STT) and Text-To-Speech (TTS) library that is scalable and reliable across mobile platforms
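To make the deep-linking question concrete, one way to frame it is as a mapping from a recognised phrase to a screen and its parameters. The TypeScript sketch below is illustrative only and is not the code from the POC: the `Route` shape, the `resolveRoute` and `toDeepLink` helpers, the phrases it recognises and the `bravo://` URL scheme are all assumptions made for this example.

```typescript
// Hypothetical route shape: screen names and parameters are illustrative only.
type Route =
  | { screen: "TaskList" }
  | { screen: "TaskStep"; taskId: string; step: number };

/** Maps a recognised phrase to the next route, given the current one. */
function resolveRoute(command: string, current: Route): Route {
  const text = command.toLowerCase();
  if (current.screen === "TaskStep") {
    if (/\bnext\b/.test(text)) {
      // Clamping to the last step is left out to keep the sketch short.
      return { ...current, step: current.step + 1 };
    }
    if (/\b(back|previous)\b/.test(text)) {
      return { ...current, step: Math.max(1, current.step - 1) };
    }
  }
  if (/\b(home|all tasks)\b/.test(text)) {
    return { screen: "TaskList" };
  }
  // Unrecognised phrases leave the user where they are.
  return current;
}

/** Serialises a route as a deep link, using an assumed bravo:// scheme. */
function toDeepLink(route: Route): string {
  if (route.screen === "TaskStep") {
    return `bravo://tasks/${route.taskId}/steps/${route.step}`;
  }
  return "bravo://tasks";
}
```

Keeping the phrase-to-route logic in a pure function like this means it can be tested without a microphone or a speech service, whichever recogniser ends up supplying the text.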
Back to Basics - Solution
During the initial POC, our team tried building on existing intelligent assistants such as Siri and Alexa, but these attempts failed at different stages. We moved ahead with Alexa, which supported our app development, but its inability to deep link based on voice responses hindered our progress.
We then tried a custom open source speech recognition library built for React Native to check its feasibility. The library worked for us initially, but it had several limitations when it came to scaling the application and in terms of reliability. These are typical problems with custom open source libraries.
With the other approaches having failed, we chose to go back to basics and integrate our app with Google Speech-To-Text. Google Speech-To-Text is generally used to customise speech recognition, transcribe domain-specific terms and accurately convert the user’s speech into understandable and actionable words.
The Development
Since the open source React Native library did not meet our needs, we decided to create our own library that could fulfil the objective of our hands-free DIY assistant. React Native was chosen as the framework for application development, with Objective-C used to make it compatible with iOS devices.
To make the app more personalised and user-friendly, we decided to name the application ‘Bravo’. Smart assistants such as Siri and Alexa force you to address them by their own name, whereas our application could be activated by saying ‘Hello Bravo’ rather than ‘Hey Alexa’.
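As an illustration of how a wake phrase can gate an always-listening app, the TypeScript sketch below assumes the speech recogniser hands back plain-text transcripts. It is not the code from the POC: the `extractCommand` and `normalise` helpers and the `WAKE_PHRASE` constant are hypothetical names chosen for this example, and the normalisation assumes English commands.

```typescript
// Hypothetical names throughout: this is a sketch, not the POC's implementation.
const WAKE_PHRASE = "hello bravo";

/** Lower-cases a transcript and strips punctuation so comparisons are forgiving. */
function normalise(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ") // assumes English: other characters are dropped
    .replace(/\s+/g, " ")
    .trim();
}

/**
 * Returns the command that follows the wake phrase, or null when the wake
 * phrase is absent, so that background speech is never processed further.
 * An empty string means the wake phrase was heard with nothing after it.
 */
function extractCommand(
  transcript: string,
  wakePhrase: string = WAKE_PHRASE
): string | null {
  // Pad with spaces so the phrase only matches on whole-word boundaries.
  const padded = ` ${normalise(transcript)} `;
  const wake = ` ${normalise(wakePhrase)} `;
  const index = padded.indexOf(wake);
  if (index === -1) {
    return null;
  }
  return padded.slice(index + wake.length).trim();
}
```

With this helper, `extractCommand("Hello Bravo, show me how to fix a leaking tap")` returns `"show me how to fix a leaking tap"`, while a transcript that never mentions the wake phrase returns `null` and can be discarded.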
Finally, we were able to achieve the following in this POC:
- Build an iOS app with preloaded recipes, images, lists and steps
- Keep the app in an always-listening mode that activates only when the wake phrase, ‘Hello Bravo’, is spoken. This ensures that all other noise is not processed
- Understand what the user said and display the matched task
- Read out the recipe ingredients and the steps to perform the activity based on the user’s voice commands, making the experience truly no-touch
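The matching and read-out steps in the list above could be sketched roughly as follows. This is a simplified illustration rather than the POC's implementation: the `Task` shape, the word-overlap scoring in `matchTask` and the wording produced by `toSpokenLines` are all assumptions, and a real app would pass each line to a text-to-speech engine.

```typescript
// Hypothetical data shape for a preloaded task: field names are assumptions.
interface Task {
  id: string;
  title: string;
  minutes: number;
  steps: string[];
}

/** Splits text into lower-case words, dropping very short ones such as "a" or "to". */
function tokenise(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 2);
}

/** Picks the task whose title shares the most words with the spoken command. */
function matchTask(command: string, tasks: Task[]): Task | null {
  const spoken = tokenise(command);
  let best: Task | null = null;
  let bestScore = 0;
  for (const task of tasks) {
    const score = tokenise(task.title).filter(
      (word) => spoken.indexOf(word) !== -1
    ).length;
    if (score > bestScore) {
      best = task;
      bestScore = score;
    }
  }
  return best; // null when no title word was heard at all
}

/** Builds the lines a text-to-speech engine would read out for a task. */
function toSpokenLines(task: Task): string[] {
  const intro = `${task.title}. This takes about ${task.minutes} minutes.`;
  const steps = task.steps.map((step, i) => `Step ${i + 1}. ${step}`);
  return [intro].concat(steps);
}
```

Simple word overlap is enough to show the idea, though it would struggle with synonyms or misheard words, which is where a more forgiving matcher would be needed.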
Learnings and Limitations
During the POC, we flagged quite a few limitations, including:
- To make the app hands-free, we wanted to develop something similar to Siri and Alexa. The React Native libraries we tried were not compatible with this kind of AI assistant development, resulting in app crashes and performance issues
- Personal assistants like Siri, Alexa and Google Assistant work best with their own operating systems. As a result, the assistant features we wanted to incorporate into the app were effectively unusable
Conclusion
Overall, we were happy with the POC. Based on a detailed evaluation of the Voice-Controlled Home DIY Activity Assistant, we believe it to be a handy mobile application for all DIY tasks at home. The significant factors considered in this evaluation were minimal or no touch, easy voice-driven DIY task support and cost-effectiveness. While the POC was for a specific DIY assistant use case, the concept of no-touch, voice-activated apps can easily be extended to many more use cases.