Thursday, 10 May
Google Duplex and the Struggle of the Moth
There is one episode of the TV series LOST that I can’t seem to get out of my mind. It’s episode 7 of season 1, titled “The Moth”. I believe it’s one of the best hours of TV I have watched, and its metaphor comes to my mind quite often. One of the main characters (Locke) teaches another (Charlie, who is struggling with heroin withdrawal) that nature’s way of finding the best is through struggle. Locke demonstrates this by showing him a moth in its cocoon shortly before it emerges. He says that he could help the moth: he has a knife, and he could slit the cocoon open and let it out. But the moth is not ready if it can’t break its own cocoon. The episode is worth watching, as its two larger storylines also expand on the metaphor.
Yesterday I watched parts of the annual Google I/O event, where the tech giant presents what it has been doing for the past year and where it is heading in the next one. This year they revealed a lot of things around machine learning and the many ways it is going to interact with us. One of the most talked-about is called Google Duplex.
The idea of Google Duplex is that you tell your Google Assistant “OK Google, make a reservation at that restaurant” and Google calls the restaurant in the background and makes the reservation for you. And the caller won’t be a person but a simulated voice. The technology shown in the demo is incredible: they even went so far as to put “uhmms” and “aha-s” in the right places in the voice simulation to make it sound more natural. If you haven’t heard it yet, have a listen on their recent blog post. I think for the first time I would give a machine a pass on the Turing test. I heard the voice samples before reading the post and assumed it was some natural-voice thing, but I could not tell who was the robot and who was the person.
While I am absolutely amazed at the technological advances (the voice synthesis, the natural responses, the real-time answers, the handling of complex situations demonstrated), I have also been thinking a lot about the technology from an ethical standpoint. There is probably no point in asking “Do we want that?”, as technology will move forward no matter our preferences: if it wasn’t Google this year, it would be Microsoft next year or a green startup three years from now. In the big picture it doesn’t really matter who came up with the technology; it was going to happen anyway. But can we at least start talking about regulating it somehow? I am shaky in my opinion on whether AI will become smarter than us and not allow us to regulate it. There are good arguments on both sides, and I guess we will have to wait and see. But for now, we can regulate it.
Why this one though? We have had natural-sounding Siris, Alexas and Cortanas for almost a decade now, with vast improvements in the last 5 years. What is different about Google Duplex? Again, if you haven’t heard it, you must. It does not sound like the voice of your assistant. The feeling that I, at least, got is… well, let’s say there are some people I know who talk far more mechanically than the AI on the phone. The little nuances of the “mhm”s go a long way.
Many questions popped into my mind while reading news articles and comments on HN and reddit. Here are a few, listicle style:

* Wouldn’t agents mind receiving calls from bots? As far as I understand, many will actually prefer calls from bots. At least it will be a straightforward, to-the-point conversation, without having to deal with weird creepy people.
* Will they know it’s a bot? They might learn some patterns. But now the other side learns too; it’s not a preset algorithm as in good old programming. The bots are learning their responses, so a call this month might be different from a call last month. And the voice covers this up. But maybe people are still smarter and they figure it out, and then…
* Would I be bothered that the other side is a robot? If I know the other side is something whose feelings I can’t hurt, because it doesn’t have feelings, I may be much harsher and skip the social politeness we have built into our society. And if I try to tell bots from people but sometimes get it wrong, with false positives or false negatives, would that degrade our human conversations?
* The bot now has a personality? It no longer gives just straight factual answers. Embedding “uhms” and politeness cues like “Thank you” and “Goodbye” gives the bot a personality. It could be overly polite or not polite enough. Is the bot going to learn the English way of excusing itself all the time or the harsher American way of getting straight to the point? What are the cultural implications? Because Google is an American company, are all cultures slowly going to become just one?
* What if the agent is also a bot? Google didn’t say or confirm whether they will provide the same service on the other side, with a bot as the agent. They hinted at it though, and there is no reason they won’t provide the service next year. Will two bots then be talking in this highly inefficient human way? Will they send silent cues that they are bots and negotiate a more efficient protocol midway?
All these questions and ponderings came to my mind just from this slight improvement in the voice output. But the one I am thinking about most these days is the problem of increasing social isolation due to technology. Bear with me.
Social isolation (?)
Google creates this Duplex service. How much time does it actually save? It doesn’t take more than 2–3 minutes to make a call or an online reservation. Sure, it also lifts some of the decision fatigue from the day and helps people with social anxiety. But does it really help them?
Let’s get back to the cocoon. We are getting more and more help from technology. At the same time, our real-life interactions with people are becoming fewer and fewer. We IM and email more, we meet less. We swipe, like and look passively at photos, sending little yellow faces of emotions, and we genuinely smile less. A phone call sits somewhere between what I would call a real-life interaction and a fake interaction like chat. It does have a real voice and it is instantaneous: unlike chat or email, you can’t spend much time rethinking your response. It has to be almost immediate, thought up on the spot. You don’t see the person face to face, so you also have to guess some of the social cues from the voice alone. This kind of interaction maybe helps us a little bit in keeping our humanity.
Why is keeping our humanity important? Well, because we see what happens when we avoid or deny our human nature, thinking that technology solves it all. We isolate ourselves in our own bubbles, we go back to our cocoons and we stop fighting, stop struggling with real life. This makes us passive and goal-less, and makes life meaningless, lonely and empty.
It’s not just Google Duplex that does this, far from it. But it’s one more step towards fake interactions. The thing on the other side doesn’t feel offended, it is not hurt by my words, and there is no risk of making a human being feel happy or sad. And when I talk to bots and they don’t feel anything… I lose empathy. I lose the training of understanding social cues and feeling what the other person is feeling. I lose the struggle of figuring out what went wrong and trying to fix it. I stop evolving.
Of course this is an exaggeration. Yes, I am jumping to conclusions. Maybe I am wrong and this actually helps us get better with other people, using bots to train us out of human suffering. But the more I learn about technology and people, and the more I experience how technology makes me lazy and complacent, the more cynical I become about the future. And I believe it’s healthy to have these sorts of conversations and questions, because I think many people in technology are sometimes so driven to achieve the goal of progress that they forget about the ethical implications. I am one of them.