How many people actually want to talk to their phones?
In a world of communicating with our devices, voice control seems to have gotten a lot of attention. But is that really what we want?
The voice focus in products
Looking at Siri in iOS, Voice Actions for Android, Google’s Project Glass and much more, there’s an amazingly strong focus on conversing with your phone/device. There are, or soon will be, voice commands for any action you can think of, and more are added as we speak (sorry, not that funny).
Use cases
A new feature usually follows this approach:
- It’s new! Use it! For everything!
- It will increase our stock rating.
- Still new-ish – pack it with features.
- Eh… Is it adding any actual value to the user?
- This one is relative, depending on the answer to step four:
  - Yes, it actually does
  - F**k it, it’s new!
  - Who cares?
With voice control, I get the obvious use cases: driving a car and getting directions, having the phone read out important information, or adding events, sending text messages and similar.
And if you work with your hands, e.g. as a carpenter, sure, I can see some use there.
But otherwise, how often are most people, or most phone use cases, in such situations? And it’s not like the ads for voice control seem to target those professions anyway – it’s usually some hipster busy with his hair or making his jeans even tighter…
Silence is golden
I’m probably going to sound like a grumpy old man now, but I do believe silence can be golden at times. I’m picturing people everywhere talking to their devices, adding things, sending mail and so on, all through voice commands.
If you already cringe when someone on the bus or subway talks on the phone about what groceries to buy, or about their relationship issues, just wait and see what the brave new world has in store for you.
And until the technology is refined enough, just picture the number of people screaming at their devices when the devices don’t understand them. The level of device-directed profanity in public places will be massive.
Integrity
In the end, I think it also comes down to integrity. The first part has to do with security: people can learn a lot about you just by listening to you give your phone commands.
The second part is that, for your own sake, keeping things to yourself and adding them to your phone through touch/keyboard input is much better. You don’t have to share everything with the world, you know.
What about you?
Maybe I’m off here, and everyone else sees this as an amazing revolution, the next vital step in evolution. Or maybe it has been hyped up way too much.
I’d be very interested in hearing what you think of how to interact with your device!
Absolutely agree – beyond the use cases you mention, and maybe some future uses around the home when Siri et al get a bit more intelligent (waiting for the day when I can tell my phone “pay my gas bill”), I think the novelty will wear off eventually no matter how accurate speech recognition gets.
I’m glad someone finally addressed this issue. I’m with you on this one, I’ve had 2 Android devices so far, and never found it necessary to use voice commands. I haven’t even tested how good or bad it is, I just don’t care.
I think most people use it because it’s new, and eventually it will be relegated to the specific use cases you mentioned, where it could really come in handy. As with most things, I think this feature has been overhyped by Apple’s marketing, just like they do with all of their products/features.
Regarding silence, some people broadcast lots of information by talking loudly on their phones anywhere: on the bus, in the street, etc. They have no idea of the security risk of everyone on a bus knowing where they’ll be for dinner and how they’ll get there. I agree that people in general should keep this kind of stuff between themselves and their phones.
I think you’re really onto something. While I do think that there’s a better way, (especially a better way than on-screen keyboards), I think that better way is mind control – or your phone, (or laptop, if they still exist by then), knowing what you’re thinking and what you want, and that tech is still a long way out.
On security & privacy, check out Alistair Croll’s (slightly older) post: http://solveforinteresting.com/loose-ears-end-careers/
Good points, all of them, but I see an immediate need for a better user interface, where good voice recognition would solve several problems. Touch UIs are pretty slow when you have to touch, swipe, open applications and scroll around to find and edit information and issue commands. Text input is really terrible. Voice recognition may be the solution.
The first thing I think holds back the adoption is that voice recognition just doesn’t work that well yet. You don’t want the device to misinterpret what you’re saying, and if it does, you have to be able to correct it quickly, just like when you’re talking to a human.
The other thing is the privacy problem, but it’s possible to do speech recognition without requiring the user to speak out loud, for example using subvocal recognition, where the user only has to move his/her mouth. Lip reading may also be an option in some cases.
So I think the problems with spoken control of computers are mainly technical and can be solved. But we’re not there yet.
Thinking back to the days when I was doing computational linguistics and we built natural language processing systems that let you order cinema tickets or program a VTR, there’s one thing I remember most of all: people don’t want to talk to machines.
And neither do I, still. No matter how good speech recognition becomes (and it is still hard to do), it will always feel awkward and weird. Even if it’s just issuing simple commands, I’d feel a little embarrassed if anyone caught me doing so in public.
I’d always prefer my hands as the way to interact with a device – but, I don’t know, maybe that’s just because I learned that handling a device with my hands, as opposed to my voice, works very well and is very precise.
I use the phone only for phone calls and seldom for SMS.
When I want to be online I use a PC.
The last thing I need is to get online while I am “mobile”, given that there is a world out there and I prefer to interact with it instead.
I think this hype for smartphones and “mobile” devices is nonsense.
I agree, some use cases are very context-specific, like calling or messaging someone – with the current technology, it often goes horribly wrong and the only time it’s worth attempting is when your hands are tied, so to speak.
One area where talking to your phone can be much easier than typing is search – movies, words, places and other mainstream keywords work rather well. Typing “imdb the world is not enough” on a small device takes a lot more effort than just saying it.
The reason it doesn’t work well for anything other than very simple searches is that voice recognition is a very hard problem. Our software is not smart enough. No one can argue that Siri does not suck. But imagine a piece of software that really understands what you’re communicating – not just what you’re saying, but how you’re saying it and all the other contextual hints that help us understand each other. Imagine near-AI that can learn who you are and how you think, so that when you say “Recommend a movie for tonight”, it can infer that you’re stressed about something and want to see some sort of light comedy, preferably starring Will Ferrell (but not the latest one, because you booked tickets for that a few weeks ago)… That would take a lot of typing.
Even when we have functioning speech-to-meaning, we must remember that some problems are already solved: using T9-style combined dialling/contact search is still the best solution for that particular problem. The problem isn’t so much the drive to make voice interfaces work, it’s the tendency to apply any new technology to all problems, including those already solved.
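To make “T9-style combined dialling/contact search” concrete, here is a minimal Python sketch. It assumes the standard phone keypad letter layout (2 = ABC … 9 = WXYZ); the function names and sample contacts are hypothetical, and this is just one simple way to do it, not how any particular phone implements it. The digits typed so far match a contact if the number starts with them, or if any word of the name maps to a key sequence that starts with them.

```python
# Standard phone keypad letter layout (2 = ABC ... 9 = WXYZ).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
# Invert the layout to look up the key for each letter.
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def to_digits(word: str) -> str:
    """Return the key sequence you would press to type `word` (non-letters are dropped)."""
    return "".join(LETTER_TO_DIGIT.get(ch, "") for ch in word.lower())


def t9_search(typed: str, contacts: dict[str, str]) -> list[str]:
    """Hypothetical combined search: match on number prefix OR on a name word's key prefix."""
    matches = []
    for name, number in contacts.items():
        if number.startswith(typed) or any(
            to_digits(word).startswith(typed) for word in name.split()
        ):
            matches.append(name)
    return matches


contacts = {
    "Anna Berg": "0701234567",
    "Bob Smith": "0739876543",
    "Carl Andersson": "0705550000",
}
print(t9_search("266", contacts))  # "266" is the start of A-N-N-A
print(t9_search("070", contacts))  # number prefix
```

The same keypresses serve both dialling and looking up a name, which is why a single short digit sequence is often enough to reach a contact without any voice input at all.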
The thing I always say about voice UI is that not everything from Star Trek is a blueprint for user interaction.
The only reason they talk to the computer on Star Trek and other scifi shows is because it’s a good dramatic device (ie. it’s boring to watch someone type), not because it’s a good interface.
I also don’t want my personal computing devices to be characters with which I interact. I want them to be extensions of *me* – and even though I talk to myself sometimes, I generally don’t expect to have to do that to make my arms and fingers do things.
As someone already said: typing and touch are slow. I can form a sentence in my brain and say it in a fraction of the time it takes to type it in, let alone navigate to the place where I should type it.
The one place I use voice commands regularly is in the “Dragon Go!” app on my iPhone. I can tell it “directions to XYZ, MyTown, MyState” and it:
* googles the most likely address I’m looking for
* maps the address
* starts navigating to there from my current location
To do all this using my fingers would take a ridiculous amount of tapping in comparison.
I have a shoulder injury and a tremor in my right hand that make it difficult to type. Voice helps; I just wish Siri didn’t do such a poor job. So, yes, I want to talk to my phone.
I use Siri, and sometimes dictation (when I feel lazy). I love saying “play song X” or “remind me to pick up something at someone’s house”. Something that normally takes a while to do (like setting up an appointment or reminder) takes a couple of seconds, and that’s it (provided it understands you).
The problem is that I can only do it when no one else is around, unless I want to show off when I’m with close friends.
Hey, after all we got used to people talking on the phone EVERYWHERE, so we’ll get used to this as well. Just… just don’t be loud, that’s the general rule.
I hate talking to my phone, especially when other people are present. My most common problem is that a conversation makes me want to add something to my calendar. When I tell my phone to add it, the other person is always confused, because I’ve suddenly added a third participant to the conversation and they can’t tell who I’m talking to.
On the other hand, it’s an order of magnitude faster and less disruptive for many tasks, which is enough for me to keep doing it. Well, that and the in-car use, though the voice interface is horrible enough that I can’t reliably use it there.
It’s not that voice input is so great; it isn’t. It’s because touch input is so awful.
Think of it this way: the input bandwidth of typing on a phone is maybe 2 characters per second, with something like a 20% error rate. This is *awful*. Alternatively, you can select from a displayed set of options, which is comfortable for about 10 options with low error rate, or something like triple that with much higher error rate (which is less painful if you’re selecting A-Z and getting close is still helpful.)
So that’s like 10 bits/sec with 20% error, and 3-4 bits/sec with low error or 5 bits/sec with high error. Compare that to voice, which is (totally guessing) more like 100 bits/sec with moderately low error (and moderately high latency).
That’s enough to enable comfortable search instead of selection, and we all know which one won that fight on the Web.
I actually tend to use Graffiti-ish interfaces a fair amount on mobile devices. They give something like 5 bits/sec with a very low error rate. They’re much nicer for e.g. looking up a contact or an app name.
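The bits-per-second figures in the comment above follow from a rule of thumb: choosing one of N equally likely options conveys log2(N) bits. Here is a minimal Python sketch reproducing the estimates, assuming a keyboard alphabet of about 32 symbols and one menu selection per second (both are assumptions, not numbers stated in the comment). It ignores the error rates entirely.

```python
import math


def bits_per_second(symbols_per_second: float, alphabet_size: int) -> float:
    """Information rate when picking uniformly among `alphabet_size` options,
    `symbols_per_second` times per second. Errors are ignored."""
    return symbols_per_second * math.log2(alphabet_size)


# Assumption: ~32 distinguishable keys (26 letters plus a few symbols).
typing = bits_per_second(2, 32)    # 2 chars/s * 5 bits/char
# Assumption: one menu selection per second.
menu_10 = bits_per_second(1, 10)   # comfortable ~10-option menu
menu_30 = bits_per_second(1, 30)   # "triple that", with more mis-taps

print(f"typing:       {typing:.1f} bits/s")
print(f"10-item menu: {menu_10:.1f} bits/s")
print(f"30-item menu: {menu_30:.1f} bits/s")
```

It prints roughly 10, 3.3 and 4.9 bits/s, in line with the “10 bits/sec”, “3-4 bits/sec” and “5 bits/sec” figures. The 100 bits/sec for voice is the commenter’s own guess and is not derived here.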
rob hammond,
Thanks! It probably will, unless it improves in some way we can’t foresee right now.
Fernando,
Glad you liked it!
And yes, I don’t think everyone should be quiet, but just more aware of what they share with the world.
John,
Thanks!
And agreed on on-screen keyboards, that can definitely be better.
Really appreciated reading Alistair’s post too, thanks for the tip!
Martin,
I agree touch UIs could be better, but they work fairly well at the moment. It depends on the complexity of the input, of course. I think most people agree that text input is indeed terrible overall. On Android phones, solutions like SwiftKey make it better, but there’s a long way to go.
I like the idea about Subvocal recognition! It could work, and would be something better adapted to society.
The security risks would still be there, though, with lip readers seeing your input and such.
Jens,
I feel the same way. But, like you say, a lot of this could be based on how we learned it and where we come from – for the next generation, maybe it’s just natural.
Ciccio,
I like the idea of actually interacting with the world. 🙂
I covered this topic recently in Mobile vs. Social.
Niklas Bergius,
Definitely agree that search is faster, and there isn’t any risk of misentering some crucial information.
> it’s the tendency to apply any new technology to all problems, including those already solved.
I like that. 🙂
And there’s definitely a belief that THE LATEST technology will solve everything.
However, I also do like that people look into options and improving some things.
Les,
Yes, dear ol’ Star Trek and its influence on user interfaces. 🙂
Interesting about the distinction between a character and an extension of you. This attitude is probably core when we talk about how future services are being developed.
Milo,
Typing is definitely slow. And when it comes to navigation, I agree and had that as one of the examples in the post.
However, security plays in here too. If you’re in the street and someone overhears where you are going, they can have a plan for intercepting you or similar.
Tanny,
I definitely overlooked accessibility, since it’s such an obviously good use case. Continue to talk to it!
Fedrico,
With appointments, I would have a really hard time trusting it. For me, it’s too important and I wouldn’t want to take that risk.
But you might be right. We might just get used to it, and eventually it will be the common norm.
Steve,
Yes, talking with someone and wanting to add an appointment is a very common use case, that becomes slightly weird.
You’re definitely nailing it with “It’s not that voice input is so great; it isn’t. It’s because touch input is so awful.”
Also like your efficiency calculation. 🙂
I’ve wondered if the use case for voice interaction is more of a social one, i.e. when two people are using a machine at the same time, touch, mouse and keyboard input kinda suck. But when you’re working together, being able to just say “Maybe if we move that box over there” might (and I mean MIGHT) be a useful way to collaborate.
DigDug2k,
Interesting use case! And yes, it might be good for that, just like touch input vastly improved collaboration compared to keyboards.
It has always troubled me when I see someone who appears to be talking to themselves… in the car, walking in the park, everywhere, Bluetoothed to high heaven! My kids don’t even leave their rooms to talk to someone in the same house. All this, and the fact that you are a nobody if you don’t use an iPhone or BlackBerry, is for the birds.
Mike,
Yes, all these people that seem to be talking to themselves can be a bit worrying at times. 🙂
I completely agree with Ciccio…
“There is a world out there”
Roland,
There is!
[…] amazing revolution, the next vital part of evolving. Or, it’s has been hyped up way too much. This article by Robert Nyman originally appeared on Robert’s Talk and is republished with […]
There are two things:
1. Maybe more nuance is needed in the interface, i.e. voice being enabled or disabled depending on the situation, e.g. enabled in the office and disabled outside (with a manual override if urgent or essential). That means making it easy for people to define their own use cases and location boundaries (geofences?).
2. More nuance in how we use devices, being aware. Even in the past, loudmouths would talk loudly about everything in public, so each person gets to define their own boundaries.
But which new social behaviors will evolve as new technology integrates with life is not truly predictable. For example, I have always wondered whether we wave our hands about more in conversation since the advent of the car, where holding your hands in front of you (steering) is normal. Maybe not, but it could be.
Ravi,
1. Yes, but that should be as easy as the press of a button or similar.
2. Indeed. 🙂 I’m just afraid that more people will unknowingly share more personal information in public – not necessarily by being loud.
Yeah, I agree with you.
I also hate talking on the phone for hours, and some people talk on the phone almost the whole day, which is irritating. 🙂
I also agree with many here, and of course with the blog post. I always thought it weird that voice actions were pushed so much, when I imagine most voice actions (like adding a calendar event, as another commenter mentioned) are somewhat private. Even if it’s not a big deal for someone else to hear you tell your phone to remind you to do the laundry for once, or clean the litter box, if most people are like me (weird question), they don’t want to bother others.
I think this is the only scenario in which I would rather have a chip in my brain, so I could give thought commands instead of voice commands :-/ Voices are meant for person-to-person communication.
Ankur,
Well, especially if everyone’s doing it all around you!
Jason,
Yes, some things should be private. Personally, I wouldn’t want a chip in my brain, though. 🙂