Apple’s Siri personal assistant on the new iPhone appears to have jump-started a long-suffering market in technology – voice automation. What are the challenges ahead for developers, and what does it mean for Apple’s key rivals in 2012?
The next few years will see a huge boom in voice-automated technology, led in the consumer space by Apple’s Siri and, most recently, by voice control on Microsoft’s Xbox 360. How will app developers react to this new form of programming?
Apple’s Siri may be the key breakthrough in voice-automated technology that drives adoption across the industry, says Dr Ahmed Bouzid, senior director of product management at Consumer Experience Management (CEM) firm Angel.
Dr Bouzid has over 15 years of experience in speech automation, has written extensively on voice automation and voice user interface design and is a co-inventor on several patents within the space.
He believes that after more than a decade of speech automation technology being ‘just around the corner’, Apple’s Siri may finally be the product that cracks the mass market.
Siri was introduced on the new iPhone 4S as a ‘personal assistant’: the user holds a brief dialogue with a rudimentary yet personable AI that can schedule meetings, send texts and emails, and search the internet for facts. It has quickly developed the kind of cult-like following typical of Apple products.
Dr Bouzid believes that while Apple’s technology is not revolutionary in and of itself, its integration into the iPhone and the portability of the device have led to a mainstream acceptance that will be the catalyst for pushing voice automation onto a range of devices, from microwaves to TVs.
There are two key elements to the technology: the actual voice recognition (which has been around for decades) and the ‘natural language processing’, which is the ability to reason and put the information in context. Siri is the first proper integration of both in the mainstream, he believes.
Microsoft recently announced gesture- and voice-activated TV through the Kinect device for its Xbox 360 video game console.
Dr Bouzid agrees that this kind of technology is the future, but he believes there are two very different philosophies at work. While Siri involves a back-and-forth conversation, the Xbox’s system is more of a tool for directing an interface or menu. As Dr Bouzid puts it: ‘you talk at the Xbox, you talk with Siri’.
In this sense, Apple’s Siri is dominant. It is also a cloud-based system, which means it is constantly being fed new information from its users. That not only boosts Siri’s knowledge base (providing even better answers) but also gives Apple a huge insight into what its customer base is doing. Siri is effectively learning.
"Siri, while very good, is still flawed. In a year’s time, after a year of learning effectively, I believe it will be amazing," he said.
More worrying for Google is that internet users can now bypass its search engine by asking Siri to find information directly. Siri is effectively a new, non-visual search engine interface. Given that Google’s entire business model relies on click-throughs and ads on its search pages, this is potentially a game-changer.
Dr Bouzid believes that the way developers produce software and apps will also fundamentally change: they will build around voice and potentially integrate Siri, or produce Siri-like programmes that bypass Google’s influence.
He predicts that Apple will release a Siri Software Development Kit within the next 12-18 months. He also believes that the large built-in audience of developers and consumers on Apple’s App Store, combined with Apple’s head start on Siri, will push the company’s software back into dominance.
Google’s discussion of Siri has been mostly dismissive, but interestingly Dr Bouzid says that Google has long been sitting on its own voice application programming interface (API).
"If you think about it, Google actually has a question and answer model already existing in its search engine. For example, you can ask Google ‘Who is the tallest man on earth?’ and it will give you a pretty accurate answer – same as Siri, but without the voice or OS integration."
So why hasn’t Google acted in this space, or integrated a Siri-style assistant into its Android software?
"They were too reluctant to look beyond their model where a user had to click on something. I have been watching this market for two years now, and I thought Google was going to be the leader in this [voice automation]. They had all the components ready to go. This is an interesting cautionary tale in how your business model, where you make a lot of money, can blind you to the next step."
"Right now it feels like they are scrambling to find a way not to be completely sidelined by this new way of interacting with information. They cannot survive if they cannot figure out a way to [keep] their technology in the loop here."
Given that Google’s Android already has a pretty good voice recognition command system, will Google be looking at rolling out a more conversational Siri-style application on Android?
"I don’t think they have a choice at this point. I think they need to look at a way to grow back into their model. Actually they probably had the best speech recognition software until Siri came out. The key is that we are no longer looking at simple voice commands. You can actually hold a conversation with Siri."
Dr Bouzid doesn’t reckon the race is over, however. He thinks Microsoft’s API is similarly excellent and that, if the company can integrate its research and software coherently across the entire Microsoft platform, from Windows Phone 7 through to the Xbox, it will compete with Apple.
Again, much will be determined by the power of the app stores. As CBR has noted before, the growth of a product ecosystem (such as apps) and usability are turning out to be far more important than a system’s raw power, and this integration across platforms is key. Siri is also rumoured to be in the new iPad 3, which would give it great market saturation.
So where does Dr Bouzid see voice-automated apps in the future? He believes the next 12 months will see explosive growth in apps that perform simple 2-3 step conversations – much as Siri does now.
In three years, he expects such apps to handle conversations of 8-11 steps or more, and enterprises to be using the technology widely. For example, Siri-esque systems will help you organise a holiday or order a pizza. They could also cut call centre costs. It looks like we’re going to be talking to more and more Siri-style devices and applications, and once again Apple is driving the adoption.