Voice Recognition Software Finally Beats Humans At Typing, Study Finds | KERA News

Aug 24, 2016
Originally published on August 24, 2016 5:23 pm

Computers have already beaten us at chess, Jeopardy and Go, the ancient board game from Asia. And now, in the raging war with machines, human beings have lost yet another battle — over typing.

Turns out voice recognition software has improved to the point where it is significantly faster and more accurate at producing text on a mobile device than we are at typing on its keyboard. That's according to a new study by Stanford University, the University of Washington and Baidu, the Chinese Internet giant. The study ran tests in English and Mandarin Chinese.

Baidu chief scientist Andrew Ng says this should not feel like defeat. "Humanity was never designed to communicate by using our fingers to poke at a tiny little keyboard on a mobile phone. Speech has always been a much more natural way for humans to communicate with each other," he says.

Researchers set up a competition, pitting a Baidu program called Deep Speech 2 against 32 humans, ages 19 to 32. The humans took turns saying and then typing short phrases into an iPhone — like "buckle up for safety" and "wear a crown with many jewels" and "this person is a disaster." They found the voice recognition software was three times faster.

Stanford computer scientist James Landay did not expect that. "The surprise for me was that it was that much better: three times faster! You would think everyone would be flocking to use it if they knew how much better it actually was."

Voice recognition still gets a bad rap. That could be because of how people use it. Apple's Siri, the beloved and befuddled personal assistant, has a hard time answering basic questions.

The Stanford University-University of Washington-Baidu team didn't test query skills. They zoomed in on voice recognition software's ability to transcribe spoken words. In English, they found the software's error rate was 20.4 percent lower than that of humans typing on a keyboard; in Mandarin Chinese, it was 63.4 percent lower.
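A note on the arithmetic: "percent lower" here most naturally reads as a relative reduction in error rate, not a difference in percentage points. The short sketch below shows how such a figure is computed. The two error rates in it are hypothetical, chosen only so the result lands near the English figure; the article does not report the underlying rates, and the function name is illustrative.

```python
def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Return how much lower new_rate is than baseline_rate,
    expressed as a percentage of baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate * 100.0


# Hypothetical error rates for illustration only (not from the study):
# 5 errors per 100 words typed versus 4 errors per 100 words spoken.
keyboard_error_rate = 0.050
speech_error_rate = 0.040

reduction = relative_reduction(keyboard_error_rate, speech_error_rate)
print(f"Speech error rate is {reduction:.1f} percent lower")
```

With these made-up inputs the gap is one percentage point, yet the relative reduction is about 20 percent, which is why the two ways of stating a difference should not be confused.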

Landay hopes these findings encourage people to revisit the idea of talking to their phone.

"People probably play with Siri and find oh, it didn't give them the right answer. So they don't think to use speech as a way to do their text messaging or their email or what not," he says. "Using speech for those things is now working really well."

Back in the 1990s, researchers found voice recognition tools were far less accurate than keyboard typing. Slang and ambient noise in a room tripped up the software.

In the last few years, that's changed for a few reasons: Just like smartphone cameras with more megapixels can see us better, the built-in microphones can hear us better. Supercomputers are churning through data more effectively in a process called "deep learning."

And there's more training data to vacuum in and learn from. For example, Ng says, Baidu has five years' worth of audio — unique recordings of people speaking that can play nonstop from now until 2021.

Last year, 65 percent of smartphone owners in the U.S. used voice assistants, according to the 2016 Internet Trends Report, a popular annual overview by tech investor Mary Meeker.

Many tech companies are betting that now is the inflection point and are hiring experts in the field of "natural language processing." Google and Amazon are inviting developers to work on voice-driven products.

It's easy to see how talking at your device would be far better than typing, say when you're driving.

Baidu's Ng imagines another scenario. He does not have children yet. But, he says, he looks forward to the day when his future grandchild comes home and asks, "Is it really true that when you were young, if you came home and you said something to your microwave oven — did it really just sit there and ignore you? That's just so rude of the microwave."

His co-author Landay reins him in and notes there are many moments, such as in a meeting or in bed with your partner sleeping, when typing still makes more sense than talking to one's devices.

Copyright 2017 NPR. To see more, visit http://www.npr.org/.

KELLY MCEVERS, HOST:

Computers have already beaten us at chess, "Jeopardy!" and Go, and humans have now lost another battle over texting. A new study shows that software is significantly better than we are at typing text messages. Here's NPR's Aarti Shahani.

AARTI SHAHANI, BYLINE: The study is by Stanford University, the University of Washington and Baidu, the Chinese internet giant. Baidu chief scientist Andrew Ng says this should not feel like defeat.

ANDREW NG: Humanity was never designed to communicate by using our fingers to poke at a tiny little keyboard on a mobile phone.

SHAHANI: And testing shows there is a better alternative - talking. Researchers set up a competition, pitting a cutting-edge Baidu program called Deep Speech 2 against 32 humans ages 19 to 32. The humans would say and then type short phrases into an iPhone, like "wear a crown with many jewels" and "this person is a disaster." They found the voice recognition software was three times faster, which Stanford computer scientist James Landay did not expect.

JAMES LANDAY: The surprise for me was that it was that much better - three times faster. You would think everyone'd be flocking to use it if they knew how much better it actually was.

SHAHANI: Just like smartphone cameras have more megapixels to see us clearly, the built-in microphone can hear us more clearly. Also, supercomputers have more voice recordings to vacuum in and analyze. Still, voice recognition gets a bad rap. That could be because of how people use it. Many people ask Apple's Siri a basic question and too often get a wacky response in turn.

LANDAY: People probably play with Siri and find, oh, it didn't give them the right answer so they don't think to use speech as a way to do their text messaging or to do their email or whatnot.

SHAHANI: The researchers didn't test query skills. They zoomed in on the ability to spit back the right words in two languages. In English, they found the software's error rate was 20 percent lower than humans typing on a keyboard. And in Mandarin Chinese, it was 63 percent lower. Landay hopes these findings encourage people to talk to their phones more in order to transcribe and text.

LANDAY: Using speech for those things is now working really well.

SHAHANI: It's easy to see how talking at your device would be far better than typing - say, when you're driving. Baidu's Ng imagines another scenario. He does not have children yet, but he says he looks forward to the day when his future grandchild asks him...

NG: Is it really true that when you were young, if you came home and you said something to your microwave oven, would it really just sit there and ignore you? That's just so rude of the microwave.

SHAHANI: His co-author Landay reins him back and notes there are many moments - in a meeting, in bed with your partner sleeping - when typing still makes more sense than talking to one's device. Aarti Shahani, NPR News, San Francisco. Transcript provided by NPR, Copyright NPR.