within voice range with no need to use a keyboard, mouse, buttons, menus, or other interfaces to input commands (Figure 1).
Figure 1: VUI technology has been widely adopted in homes and smart buildings because it is convenient and flexible. Image source: Renesas
The downside of a VUI is its complexity. Conventional technology is based on the lengthy training of a model with specific words or phrases. But natural language processing is word-order independent, which demands considerable development work and significant computing power to run in real time. This has slowed the broader adoption of VUIs.
Now, a new technique simplifies VUI software to the extent that it can run on small, efficient microcontrollers (MCUs) such as Arm Cortex-M devices. This technique relies on the fact that all words in each spoken language are made up of linguistic sounds called phonemes. There are far fewer phonemes than words; English has 44, Italian has 32, and the traditional Hawaiian language
on phonemes that significantly reduces the processing requirements. The result is highly accurate and efficient VUI software that can run on familiar 32-bit microcontrollers (MCUs) and is supported by easy-to-use design tools. This article describes VUI challenges and use cases. It then introduces commercial, easy-to-use MCU application software and local phoneme-based VUI software for connected home applications. The article concludes by showing developers how to get started on VUI projects using Renesas MCUs, VUI software, and evaluation kits.
The challenges of building a VUI
A VUI is speech recognition technology that enables interaction
with a computer, smartphone, home automation system, or other device using voice commands. After early engineering challenges, the technology has matured into a reliable control interface and is now widely used in smart speakers and other smart home devices. The key benefit of a VUI is its convenience: instant control from anywhere
How to implement a voice user interface on resource-constrained MCUs
Smart speakers and other connected hubs form the heart of the smart home, allowing users to control devices and access the Internet. Two trends are apparent as these devices proliferate: users prefer voice control over button presses or complicated menu systems, and there is increasing discomfort with continuous cloud connectivity because of privacy concerns. However, a robust and secure voice user interface (VUI) typically
demands powerful hardware and complex software for voice recognition. Anything less will likely result in poor performance and unsatisfactory user experiences. Also, many smart speakers and hubs are battery powered, so a VUI must be achieved within a tight power budget. Such an ambitious project can be daunting for a developer lacking experience with voice interfaces. Chip makers are responding by introducing a technique based
Written by DigiKey’s North American Editors
Figure 2: Representing words using phonemes demands fewer microcontroller resources. Image source: Renesas
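To make the phoneme idea concrete, the following is a minimal host-side sketch in Python, not the vendor's algorithm. It assumes a hypothetical four-word lexicon written with ARPAbet-style phoneme symbols, a hypothetical two-entry command list, and a simple edit-distance match with a hypothetical `max_errors` tolerance. A real VUI would also need an acoustic front end that turns audio into phonemes, which is outside the scope of this sketch.

```python
# Hypothetical lexicon: each word is stored as a short sequence of
# ARPAbet-style phoneme symbols rather than as a trained whole-word model.
LEXICON = {
    "turn": ["T", "ER", "N"],
    "on": ["AA", "N"],
    "off": ["AO", "F"],
    "lights": ["L", "AY", "T", "S"],
}

# Hypothetical command set for a connected-home device.
COMMANDS = ["turn on lights", "turn off lights"]


def to_phonemes(phrase):
    """Expand a phrase into one flat phoneme sequence using the lexicon."""
    phonemes = []
    for word in phrase.split():
        phonemes.extend(LEXICON[word])
    return phonemes


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def match_command(heard, max_errors=1):
    """Return the closest command, or None if nothing is close enough."""
    best = min(COMMANDS, key=lambda c: edit_distance(heard, to_phonemes(c)))
    if edit_distance(heard, to_phonemes(best)) <= max_errors:
        return best
    return None
```

In this sketch, adding a new command means adding entries to the lexicon and the command list rather than training a new model, which illustrates why a phoneme representation keeps memory and processing needs small. The `max_errors` threshold is an illustrative choice that trades tolerance for misheard phonemes against false triggers.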
we get technical