This Thursday marks part two of our six-part odyssey into the murky and mercurial world of Artificial Intelligence. Having previously discussed the ethical implications of man vs. machine, we’ll be turning our attention this week to voice-activated AI, a topic that, with the rise of Amazon’s Alexa, Microsoft’s Cortana and Apple’s Siri, has the whole industry talking (pun very much intended).
One of the many exciting developments in the field has come courtesy of Affectiva, an emotion measurement technology company that grew out of MIT’s Media Lab. While they had previously focused on getting machines to identify emotion in images by observing the ways faces change when expressing different emotions, their latest technology builds on that premise to detect emotion in speech. Using deep learning, it observes changes in tone, volume, speed and voice quality to recognise emotions in recorded speech. Ultimately this could have useful real-world applications, whether it’s voice assistants understanding frustration and appreciation from their users and adjusting their approach accordingly, or even groundbreaking safety applications such as voice assistants picking up on road rage or domestic violence in the home.
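Affectiva hasn’t published the details of its model, but the basic idea of reading emotional cues from tone, volume and speed can be sketched in a few lines of Python. The sketch below is purely illustrative, not Affectiva’s method: it extracts three simple prosodic features from a waveform using only NumPy; a real system would feed features like these (and many more) into a trained deep learning classifier.

```python
import numpy as np

def acoustic_features(signal, sample_rate):
    """Extract simple prosodic cues of the kind an emotion model might use.

    Returns volume (RMS energy), a pitch estimate in Hz (via autocorrelation,
    a rough stand-in for "tone"), and the zero-crossing rate (one crude proxy
    for voice quality). Feature names here are illustrative assumptions.
    """
    # Volume: root-mean-square energy of the waveform
    rms = float(np.sqrt(np.mean(signal ** 2)))

    # Pitch: peak of the autocorrelation within a plausible vocal range.
    # Slice so that corr[lag] is the autocorrelation at that lag.
    corr = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = sample_rate // 400, sample_rate // 50   # search 50-400 Hz
    pitch = sample_rate / (lo + int(np.argmax(corr[lo:hi])))

    # Zero-crossing rate: sign changes per second
    zcr = float(np.mean(np.abs(np.diff(np.sign(signal))) > 0)) * sample_rate

    return {"rms": rms, "pitch_hz": pitch, "zcr": zcr}

# Example: a pure 220 Hz tone stands in for a snippet of recorded speech;
# the pitch estimate should land near 220 Hz.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
features = acoustic_features(np.sin(2 * np.pi * 220 * t), sr)
```

A classifier trained on labelled speech would then map trajectories of such features over time to emotion labels; the heavy lifting in a production system is in that learned mapping, not in the feature extraction shown here.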
Voice technology is already having a huge impact in the East, with over 500 million people in China using the app iFlytek Input. This app records voice commands and sends them as text messages, after first translating them into the language of the receiver. For example, a Mandarin speaker can dictate a message into the phone microphone and send it to two colleagues, one an English speaker and one a Cantonese speaker, who will each receive the message in their native tongue. They can then reply in English or Cantonese, and the Mandarin speaker will receive the replies in Mandarin.
Source: MIT Technology Review
iFlytek Input is also used in Chinese courts to transcribe lengthy proceedings, to send messages while driving, and to direct ‘Didi’ drivers (a Chinese equivalent of Uber). Plans are in place to move the app from being a passive request enabler to an active entertainment provider – suggesting new music to listen to or highlighting breaking news that might be of interest to the user.
However, while voice technology clearly has many clever applications, it still isn’t without its flaws. As we’ve mentioned in previous blogs, brands are already playing around in this space, with Burger King’s hijacking of Google Home being a standout example. As ever, this technological frailty has been taken one step further by the writers of South Park, who in September aired an episode in which Cartman becomes obsessed with an Amazon Alexa and adds a number of disgusting items to its shopping basket. The twist was that many real-world Alexas around America picked up on what he was asking his Alexa to do, and started adding the very same things to the shopping baskets of their blissfully unaware owners. While this is undeniably a fantastic prank, it does provide stark evidence of the state of the technology at the moment – smart in what it does, but not necessarily smart in how it does it.
Source: SF Gate, Sep 17
Undoubtedly the potential applications of voice AI will continue to evolve in the coming years, adding another dimension to our daily lives and giving brands yet another channel in which to flex their storytelling capabilities. If you’d like to learn more, please join us this Thursday (Oct 12th) when our friends at Google will be giving us the inside track on how they’re currently leveraging the technology and how they see the next few years unfolding. We’ll also be following up with our five main takeaways from the session – so stay tuned!
Until then, stay curious.
Cover Image Credit: Movieclips, Jan 2014