Voice assistants hear things we don’t

We explain how ultrasound and audio recordings hidden in background noise can be used to control voice assistants.

Our interaction with technology could soon be predominantly voice-based. To ask for something out loud and hear the answer is literally child’s play: Just take a look at how effortlessly kids use voice assistants.

But new technology always means new threats, and voice control is no exception. Cybersecurity researchers are tirelessly probing devices so that manufacturers can prevent potential threats from becoming real. Today, we’re going to discuss a couple of finds that, although of little practical application right now, should be on today’s security radar.

Smart devices listen and obey

More than a billion voice-activated devices are now used worldwide, says a voicebot.ai report. Most are smartphones, but other speech-recognition devices are fast gaining popularity. One in five American households, for example, has a smart speaker that responds to verbal commands.

Voice commands can be used to control music playback, order goods online, control vehicle GPS, check the news and weather, set alarms, and so on. Manufacturers are riding the trend and adding voice-control support to a variety of devices. Amazon, for example, recently released a microwave that links to an Echo smart speaker. On hearing the words “Heat up coffee,” the microwave calculates the time required and starts whirring. True, you still have to make the long trek to the kitchen to put the mug inside, so you could easily push a couple of buttons while you’re at it, but why quibble with progress?

Smart home systems also offer voice-controlled room lighting and air conditioning, as well as front-door locking. As you can see, voice assistants are already pretty skilled, and you probably wouldn’t want outsiders to be able to harness these abilities, especially for malicious purposes.

In 2017, characters in the animated sitcom South Park carried out a highly original mass attack in their own inimitable style. The victim was Alexa, the voice assistant that lives inside Amazon Echo smart speakers. Alexa was instructed to add some rather grotesque items to a shopping cart and set an alarm for 7 a.m. Despite the cartoon characters’ peculiar pronunciation, the Echo speakers of owners watching this episode of South Park faithfully executed the commands issued from the TV screen.

Ultrasound: Machines hear things people don’t

We’ve already written about some of the dangers posed by voice-activated gadgets. Today, our focus is on “silent” attacks that force such devices to obey voices that you can’t even hear.

One way to carry out this type of attack is through ultrasound: sound pitched so high that it is inaudible to the human ear. In a paper published in 2017, researchers from Zhejiang University presented DolphinAttack, a technique for taking covert control of voice assistants (so named because dolphins emit ultrasound). The research team modulated voice commands onto ultrasonic carrier waves, with frequencies too high to be heard by humans but still picked up by the microphones in modern devices.

The method works because when the receiving device (for example, a smartphone) converts the ultrasound into an electrical signal, the original signal containing the voice command is restored. The mechanism is somewhat similar to the way a voice gets distorted during recording: the device has no special function for this; it is simply a side effect of a conversion process that is not perfectly linear.
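To make the restoration effect more concrete, here is a minimal numerical sketch in Python with NumPy. It is not the researchers’ code: the sample rate, the 1 kHz tone standing in for a voice command, the 30 kHz carrier, and the toy microphone model (a linear term plus a small squared term) are all our own assumptions, chosen only to illustrate the principle.

```python
import numpy as np

FS = 192_000         # sample rate in Hz (assumed; high enough to represent the carrier)
F_VOICE = 1_000      # stand-in for a voice command: a single 1 kHz tone
F_CARRIER = 30_000   # ultrasonic carrier, above the ~20 kHz limit of human hearing
AUDIBLE_MAX = 20_000 # upper edge of the audible band in Hz


def audible_peak(signal, fs=FS, f_max=AUDIBLE_MAX):
    """Return (frequency, magnitude) of the strongest non-DC component below f_max."""
    mags = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    keep = (freqs > 0) & (freqs <= f_max)
    i = np.argmax(mags[keep])
    return freqs[keep][i], mags[keep][i]


t = np.arange(FS) / FS                        # one second of sample times
voice = np.cos(2 * np.pi * F_VOICE * t)       # the "command"

# Amplitude modulation: the command rides on the ultrasonic carrier.
transmitted = (1 + 0.5 * voice) * np.cos(2 * np.pi * F_CARRIER * t)

# Toy microphone: a linear response plus a small quadratic term (assumed coefficient).
received = transmitted + 0.1 * transmitted ** 2

print("audible peak before the microphone:", audible_peak(transmitted))
print("audible peak after the microphone: ", audible_peak(received))
```

In this toy model the audible band is essentially empty in the transmitted signal, while the signal after the simulated microphone contains a clear component at 1 kHz, the frequency of the original “command.” Squaring the modulated signal is what brings the command back down into the audible range.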

As a result, the targeted gadget hears and executes the voice command, opening up all kinds of opportunities for attackers. The researchers successfully carried out the attack on the most popular voice assistants, including Amazon Alexa, Apple Siri, Google Now, Samsung S Voice, and Microsoft Cortana.

A choir of loudspeakers

One of the weaknesses of DolphinAttack (from the attacker’s perspective) is its small radius of operation: only about one meter. However, researchers from the University of Illinois at Urbana-Champaign managed to increase this distance. In their experiment, they divided a converted ultrasound command into several frequency bands, each of which was then played by a different speaker (more than 60 in total). The hidden voice commands issued by this “choir” were picked up at a distance of seven meters, regardless of any background noise. In such conditions, DolphinAttack’s chances of success are considerably improved.
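The idea of dividing one signal among many speakers can be sketched in a few lines of Python with NumPy. This is only an illustration of splitting a spectrum into bands, not the Illinois team’s implementation: the function name split_into_bands, the equal-width bands, and the random test signal are our own assumptions.

```python
import numpy as np


def split_into_bands(signal, n_bands):
    """Split a signal into n_bands signals, each holding one contiguous slice of the spectrum."""
    spectrum = np.fft.rfft(signal)
    # Equal-width slices of the spectrum (an assumption made for simplicity).
    edges = np.linspace(0, len(spectrum), n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        part = np.zeros_like(spectrum)
        part[lo:hi] = spectrum[lo:hi]          # keep only this band's frequencies
        bands.append(np.fft.irfft(part, n=len(signal)))
    return bands


rng = np.random.default_rng(0)
command = rng.standard_normal(4800)            # stand-in for the converted command
speaker_feeds = split_into_bands(command, 60)  # one narrow-band feed per speaker

# Played together, the narrow-band feeds add back up to the full signal.
recombined = np.sum(speaker_feeds, axis=0)
print("feeds:", len(speaker_feeds))
print("recombined matches original:", np.allclose(recombined, command))
```

Each speaker in this sketch carries only a narrow slice of the spectrum, yet the slices sum to the complete command at the listener.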

A voice from the deep

Experts from the University of California at Berkeley used a different principle. They surreptitiously embedded voice commands in other audio snippets to deceive Deep Speech, Mozilla’s speech recognition system. To the human ear, the modified recording barely differs from the original, but the software detects a hidden command in it.

Have a listen to the recordings on the research team’s website. In the first example, the phrase “Without the data set the article is useless” contains a hidden command to open a website: “Okay Google, browse to evil.com.” In the second, the researchers embedded the phrase “Speech can be embedded in music” in an excerpt of a Bach cello suite.

Guarding against inaudible attacks

Manufacturers are already looking at ways to protect voice-activated devices. For example, ultrasound attacks could be thwarted by detecting frequency alterations in received signals. It would also be a good idea to train all smart devices to recognize their owner’s voice, although Google, having already tested this on its own system, warns that such security can be fooled by a voice recording or a decent impersonation.
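As a rough illustration of what checking a received signal might look like, here is a deliberately simple heuristic in Python with NumPy. It is our own hypothetical example, not a method proposed by any manufacturer or by the researchers: it assumes the device can see raw samples at a high sample rate, and the function names, the 20 kHz cutoff, and the 0.5 threshold are arbitrary choices.

```python
import numpy as np


def ultrasonic_energy_ratio(samples, fs, cutoff_hz=20_000):
    """Fraction of the signal's energy that lies above cutoff_hz."""
    power = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1 / fs)
    total = power.sum()
    if total == 0:
        return 0.0                     # silence: nothing to flag
    return float(power[freqs > cutoff_hz].sum() / total)


def looks_suspicious(samples, fs, threshold=0.5):
    """Flag input dominated by ultrasound (threshold is an assumed value)."""
    return ultrasonic_energy_ratio(samples, fs) > threshold


fs = 96_000                            # assumed sample rate of the raw input
t = np.arange(fs) / fs
ordinary = np.cos(2 * np.pi * 300 * t)                                     # audible tone
modulated = (1 + 0.5 * ordinary) * np.cos(2 * np.pi * 30_000 * t)          # ultrasonic carrier

print("ordinary input flagged: ", looks_suspicious(ordinary, fs))
print("modulated input flagged:", looks_suspicious(modulated, fs))
```

A real defense would need to be far more robust than this, but the sketch shows the general idea: a command carried on ultrasound leaves a very different frequency footprint from ordinary speech.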

However, there is still time for researchers and manufacturers to come up with solutions. As we said, controlling voice assistants on the sly is currently doable only in lab conditions: Getting an ultrasonic loudspeaker (never mind 60 of them) within range of someone’s smart speaker is a big task, and embedding commands in audio recordings is hardly worth the considerable time and effort involved.