Hey Siri, Bone Apple Teeth!

Image credit: iStockphoto/SIphotography

Musician and writer David Pogue is an early adopter of voice recognition technology. Pogue developed tendinitis, which meant he needed a text-input method that would relieve the strain of typing.

“The pain [Pogue] felt after typing became so severe that he couldn’t button his shirt without help,” said a 2006 Macworld article. “When the New York Times columnist and Broadway pianist was diagnosed with tendinitis, a type of repetitive strain injury (RSI) in which the tendons become inflamed, he turned to speech-recognition software — a solution that he says helped save his career.”

Pogue catalogs amusing errors produced by voice recognition and sometimes uses them in his weekly blog posts — like “Autocorrect Follies.” Anyone using voice recognition software like Apple's Siri assistant, which celebrates its tenth anniversary this month, has gotten some wacky autocorrected text on their screen. Say “bon appetit,” and you might get “bone apple teeth” in reply.

Languages other than English are problematic. So too are accents, ambient noise and other factors. It's astonishing that voice recognition software works at all, but it does.

Should we shout at our phones? Is this an efficient way to compute? Or are Siri and its equivalents glorified bots?

Suspicious digital “assistants”

We've seen virtual assistants like Amazon's Alexa and Microsoft's Cortana enter the fray with varying degrees of success. The unusual names are no accident: Amazon has already encountered flak from people named “Alexa,” but fortunately, the name is relatively rare.

The “assistant” tag evokes images of an executive in bygone days dictating tasks to a human assistant. Voice recognition does transform speech to text, albeit clumsily. It's great for documenting a fleeting thought on the fly, but editing after the fact is essential.

“For decades, technologists have teased us with this dream that you’re going to be able to talk to technology, and it’ll do things for us,” said Apple exec Phil Schiller in 2011, taking the stage at the launch of the iPhone 4s and Siri. “Haven’t we seen this before, over and over? But it never comes true.”

“What we really want to do is just talk to our device,” said Schiller at the event, “and your device — in this case, your phone — will figure out what you mean and help you get what you want done.” Described by Schiller as a “humble personal assistant,” Siri gave 2011’s iPhone 4s a novel cachet, which was quickly eclipsed by firms such as Google rolling out similar products.

Novelty and expectations

Siri often fails because of Apple's “walled garden” approach to its software. Users operate in heterogeneous environments, and access to third-party options should be a given. Yet, according to MacRumors, Apple's latest operating systems will scrap much of this functionality.

“Starting with iOS 15, iPadOS 15, macOS Monterey, and watchOS 8, Apple will cut back on integration between Siri and third-party apps, drastically reducing the type and number of commands users will be able to invoke through the virtual assistant for third-party apps,” wrote MacRumors.

Adaptability and adherence to standards aren't the goals of these cleverly named software bots. Sinking R&D funds into enhanced functionality won't spur Alexa adherents to shift to Cortana.

This doesn't mean virtual assistants can't enhance productivity within an organization. Now more than ever, chief digital officers must remain aware that their workforce is fluid and can shift allegiance with greater ease. Knowledge workers need a degree of latitude in their digital workspaces, although CDOs must also weigh the inherent security issues.

Who's listening in?

A 2019 article reported that Amazon employs teams who transcribe recordings pulled from its Echo devices. “The recordings are transcribed, annotated, and then fed back into the software as part of an effort to eliminate gaps in Alexa’s understanding of human speech and help it better respond to commands,” said the article.

Amazon says: “No, Alexa is not always recording.” The firm also offers a privacy guide for its Echo smart hub devices. Yet the 2019 article indicates that the devices are recording at least some of the time.

Anyone who's used machine translation knows that human oversight is essential to accurate results. But the overarching question of data privacy comes into play given Amazon's massive datasets on its customers. And as with everything tech, oddness sometimes manifests.

“In 2018, users reported that their Echo speakers began spontaneously laughing, while a family in Portland said their device recorded and sent conversations to a colleague without their knowledge,” wrote The Verge. “For these instances, Amazon claims that the devices were likely triggered by false positive commands.”

Future uses

The spectrum of uses for voice recognition is sure to expand. Watch this space.

Real-time translation via Google is now possible, provided expectations are kept realistic. Long a province of science fiction, voice translation on the fly can be achieved with a suitable computing device and an online connection.

The service isn't flawless. Real-time conversations with an elderly relative who only speaks Hokkien Chinese, for example, won't be as accurate as those with a clear interlocutor who speaks Deutsche Welle-level German. But it's better than hand gestures and stick-figure drawings.

Still, it frustrates

As of now, Siri and its ilk retain the ability to frustrate and alienate. “I stopped using Siri when I asked it to find me a Citibank location,” said a San Francisco-based computer consultant. “It wanted to send me to a branch many kilometers away. Turns out there was one on the next block.”

“Which one do I swear at the least?” asked Jennifer Pattison Tuohy in The Verge. “Apple’s Siri, which turns ten today, wins hands down. This is slightly to do with the fact it rarely spews useless information at me when it didn’t understand what I was asking (*cough* Google) and definitely because it never asks if I want to buy something (you know who you are).”

Stefan Hammond is a contributing editor to CDOTrends. Best practices, the IoT, payment gateways, robotics, and the ongoing battle against cyberpirates pique his interest. You can reach him at [email protected].
