The new speech recognition system makes more or less the same mistakes as a professional transcriptionist.
Microsoft has claimed a breakthrough in speech recognition with a technology that can recognise words and their context in a conversation, at a level it says is equal to that of humans.
A team from Microsoft Artificial Intelligence and Research said that the speech recognition system makes the same number of mistakes as a professional transcriber, or fewer.
The system achieved a word error rate (WER) of 5.9%, down from the 6.3% reported last month.
With the breakthrough, a computer can now recognise words in a conversation about as well as a person can, the company said.
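For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The article does not describe Microsoft's scoring setup, so the following is only a minimal sketch of that standard definition; the function name `word_error_rate` and the sample sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (not Microsoft's scoring code). Text is split on
    whitespace only; real evaluations also normalise case and punctuation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over
    # six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

On this definition, a WER of 5.9% corresponds to roughly six word errors per 100 reference words.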
Microsoft Artificial Intelligence and Research group executive vice president Harry Shum said: “Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible.”
The milestone could open up a whole new world of opportunities for businesses in the tech industry. Technology products could be significantly upgraded with speech recognition, and many other products could feature this technology.
Products and services such as Xbox, speech-to-text transcription applications and personal digital assistants like Microsoft’s Cortana could see significant upgrades.
According to the team, this was achieved by systematically applying the latest neural network technology in all aspects of the system.
The breakthrough came when the researchers used neural language models, which represent words as continuous vectors in space.
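One practical consequence of treating words as vectors is that words with similar meanings can sit close together, so similarity becomes something a model can measure. The sketch below illustrates that idea with cosine similarity; the three-dimensional vectors are made-up numbers chosen for illustration and are not taken from Microsoft's system, which the article does not detail.

```python
import math

# Made-up toy vectors, purely illustrative. Real models learn vectors
# with hundreds of dimensions from large amounts of text.
TOY_VECTORS = {
    "fast": [0.9, 0.1, 0.3],
    "quick": [0.8, 0.2, 0.3],
    "banana": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    for other in ("quick", "banana"):
        score = cosine_similarity(TOY_VECTORS["fast"], TOY_VECTORS[other])
        print(f"fast vs {other}: {score:.2f}")
```

With these toy numbers, "fast" scores much closer to "quick" than to "banana", which is the kind of relationship a language model can exploit when deciding which word is likely to come next.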
A Microsoft-developed deep learning system, the Computational Network Toolkit, was also used to achieve the breakthrough. The company has made this toolkit available to developers on GitHub under an open source license.
Apart from the software, the computers made intensive use of graphics processing units (GPUs) to process the inputs, which significantly sped up operations.
The news follows another recent achievement by a Microsoft team, this one in computer vision. The team won first place in the COCO image segmentation challenge, which judges how well a computer can determine where objects are in an image.
The team says that the technology needs to be tested extensively to ensure that it works under real-world conditions.
According to the researchers, the technology must be able to recognise speech in a variety of conditions, such as while driving, at a party or anywhere with a lot of background noise, and must also cope with a range of accents.