With the Amazon Echo and Google Home, voice control is emerging as a key UX channel in the smart home. Each of these devices has an associated "Artificial Intelligence" (AI) agent--Amazon's is named Alexa, and Google's is named Assistant--and both are showing notable advantages over the current default of using a native app on a phone or tablet to control devices in the home. It's a new UX paradigm that looks like it could transform how people interact with and control information and applications, both inside and outside the home (and, in the future, the car).
Both Echo and Home are speakers that can work as a smart home control center and an AI assistant for the family. You can use them to play back entertainment, manage everyday tasks, query external information sources, and control other smart devices in the home.
Apple's Siri brought voice-controlled AI into the mainstream, first appearing in iOS 5. But Siri and Google Now (the precursor to Google Assistant) were tied to the phone, and they were manually initiated by pressing a button on that phone. Plus, most information responses were displayed on the phone's screen rather than delivered by voice.
Amazon Echo and Google Home bring the power of voice-controlled AI to a more usable and intuitive experience--one that's not constrained to a phone or tablet as its channel.
Both the Echo and Home currently have limited identity capabilities. For instance, neither one can currently distinguish between different family members by differences in voice patterns. Siri, Echo and Home are all examples of voice recognition, but not voice biometrics. In other words, they can't yet identify who a particular human is.
The Echo does allow different members of a household to be distinguished via separate profiles. The profiles can be swapped in and out via a 'Switch to Bob' voice command, and a PIN can be spoken aloud as an authentication mechanism for certain purchase operations. At least currently, the Home doesn't support multiple accounts, which significantly diminishes the value of a 'Hey Google, what do I have going on today?' query. Whose day exactly...mine or my wife's?
But it seems likely that both Echo and Home will evolve their capabilities to eventually add some level of biometric authentication (i.e., being able to reliably differentiate between different known members of a family based on voice). This capability will open up a whole new range of possibilities for personalization, including:
Filtering search results when performed by a young child (perhaps recognizing only that the speaker is young from the timbre of their voice, and not a particular child).
Tailoring query results to a particular requester (I need my calendar, not my wife's).
Better respecting COPPA by only storing voice data of kids under 13 in the cloud once parental permission is obtained.
Adjusting temperature and lighting based on individual preferences of those in the room.
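Once speaker identification works, the personalization scenarios above largely reduce to a lookup keyed on the recognized identity. A minimal sketch, assuming a hypothetical `identify_speaker` biometric match (all names and the preference data are illustrative, not any vendor's API):

```python
# Illustrative preference store keyed on recognized speaker identity.
PREFERENCES = {
    "paul":  {"temperature_f": 68, "lighting": "dim"},
    "child": {"search_filter": "strict", "lighting": "bright"},
}

def identify_speaker(voice_sample: bytes) -> str:
    """Stand-in for a voice-biometric match; a real system would compare
    the sample against enrolled voice templates for each family member."""
    return "paul" if voice_sample == b"pauls-voiceprint" else "child"

def apply_preferences(voice_sample: bytes) -> dict:
    """Identify who is speaking, then return that person's settings."""
    speaker = identify_speaker(voice_sample)
    return PREFERENCES[speaker]
```

The interesting part is not the lookup but the `identify_speaker` step--that's the voice-biometrics capability neither Echo nor Home has today.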
Beyond personalization, once the Echo and Home can differentiate between members of a family via voice biometrics, they'll open up new authentication and authorization channels. If an Echo in my living room can pass my voice up to the Amazon cloud for analysis, then Amazon can authenticate me based on my voice, and then assert that fact accordingly to applications. For example, if I attempt to order a pizza from the comfort of my couch, my voice command of 'Buy large combo' could implicitly and passively authenticate me into Amazon, and then ultimately to the pizzeria. This kind of authentication, and its connection to my credit card for payment, wouldn't work for my 18-year-old son (who should buy pizza with his own money).
In an alternative authentication flow, a Home speaker could receive a request for an additional voice check required by some online provider I was trying to authenticate to. The Home speaker would call out 'Paul, Best Buy needs voice authentication' to which I would respond 'The rain in Spain falls mainly on the plain' (or whatever phrase allows me to be authenticated).
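The explicit challenge flow above could be sketched as follows--a one-time phrase issued by the provider, relayed aloud by the speaker, and matched against the user's spoken reply. This is an assumption-laden sketch (the function names and flow are illustrative, not any real provider's API), and a real deployment would also run a voice-biometric match on the audio itself, not just compare transcripts:

```python
import secrets

def issue_challenge(phrases):
    """Provider side: pick a one-time phrase and remember whether it's been used."""
    return {"phrase": secrets.choice(phrases), "used": False}

def verify_response(challenge, spoken_text):
    """Provider side: accept the reply only once, and only if the transcribed
    speech matches the issued phrase. (A real system would additionally verify
    the speaker's voiceprint against the audio.)"""
    if challenge["used"]:
        return False  # reject replays of a previously used challenge
    challenge["used"] = True
    return spoken_text.strip().lower() == challenge["phrase"].lower()
```

Marking the challenge as used makes the phrase single-shot, so a recording of my response can't simply be replayed later.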
Compare the above with today's mobile-based authentication, where users are prompted to interact with their phone (either apply a fingerprint, or a swipe, or an OK) when signing on to a web site.
An authenticated voice channel would be useful when bringing new devices into the smart home. The initial registration, provisioning and giving permissions is a highly sensitive operation (you don't want just anybody to be able to add a device to the home). Requiring that explicit approval or consent be given over a voice channel would be both secure and easy (compared to today's manner of installing and learning a new native application).
Because both Echo and Home analyze voices in the cloud, the biometric check would also be performed in the cloud--not ideal from a privacy or security perspective. If voice data is going to be used as a biometric, any templates (or samples from which a template could be constructed) must be protected against breach and compromise. Compare this with the FIDO Alliance model, where the biometric verification is performed locally (i.e., on the speaker itself); a successful local verification unlocks a private key that is used to sign a challenge from the server. In the FIDO model, the voice templates never leave the speaker, so they're never stored in the large cloud databases that are so attractive to hackers.
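The FIDO-style flow can be sketched as below. This is a simplified illustration, not a FIDO implementation: to keep it self-contained, an HMAC over a device-held secret stands in for the asymmetric signature a real FIDO authenticator would produce (which also means the "server" here shares the key, unlike real FIDO, where the server holds only the public key). The essential property survives the simplification: the voice template and the key never leave the device, and only a signed challenge crosses the network.

```python
import hashlib
import hmac
import secrets

DEVICE_KEY = secrets.token_bytes(32)          # held only on the speaker
ENROLLED_TEMPLATE = b"local-voice-template"   # stored only on the device

def local_verify_and_sign(voice_sample, challenge):
    """On-device: a successful local biometric match (a stand-in equality
    check here) unlocks the key used to sign the server's challenge."""
    if voice_sample != ENROLLED_TEMPLATE:
        return None  # no match, nothing is signed or sent
    return hmac.new(DEVICE_KEY, challenge, hashlib.sha256).digest()

def server_check(challenge, signature):
    """Server side: verifies the signature on its challenge; it never
    receives voice data or the template itself."""
    expected = hmac.new(DEVICE_KEY, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)
```

Note what the server learns: only that *someone who passed the on-device check* signed its challenge--no voice sample ever reaches the cloud.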
There are broader privacy concerns with 'always listening' Echo and Home speakers. When do the speakers transfer voice data up to the cloud? How long is that data stored, and what security protections exist for it in the cloud? Both Amazon and Google give homeowners a measure of control over the privacy aspects of the speakers, such as requiring a wake phrase before the speaker actively responds, providing a mute feature that stops the speaker from listening, and allowing voice recordings to be deleted. But ultimately, homeowners need to feel assured that their privacy isn't compromised by inappropriate analysis and/or storage of voice data. If they start to hear and see ads that aren't based on explicit searches (whether voice-initiated or not), that trust may disappear quickly.