Deep reinforcement learning methods put you to shame
Cambridge-based startup Wayve, founded by a team from Cambridge University’s Engineering Department, has developed a neural network sophisticated enough to learn how to drive a car in 20 minutes using only a computer and a single camera.
“This is the first example where an autonomous car has learnt online, getting better with every trial,” the company said in a public post last week.
Using a heavily modified Renault Twizy with a single camera on the front, the Wayve team hooked the vehicle up to an in-car GPU capable of analysing the camera feed in real time, and ran a learning program based on experimentation and optimisation.
Every time the Renault left its lane, the team stopped it and corrected it. The algorithm “penalised” the car for making mistakes, and “rewarded” it based on how far it travelled without human intervention. It rapidly learned to follow a lane.
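As a rough illustration of that reward scheme, the sketch below scores an episode by the distance covered before the first human intervention. The `Step` record, its field names and the use of metres are assumptions made for this example, not details published by Wayve; the "penalty" here is simply that an intervention ends the episode and cuts the reward short.

```python
from dataclasses import dataclass


@dataclass
class Step:
    # Hypothetical per-time-step record; field names are illustrative only.
    distance_m: float   # distance covered during this step, in metres
    intervened: bool    # True if the safety driver took over at this step


def episode_reward(steps):
    """Sum the distance travelled until the first human intervention."""
    total = 0.0
    for step in steps:
        if step.intervened:
            break  # the episode ends as soon as a human corrects the car
        total += step.distance_m
    return total


# An early trial that is interrupted quickly, and a later uninterrupted one.
early_trial = [Step(1.0, False), Step(1.0, False), Step(0.5, True), Step(1.0, False)]
later_trial = [Step(1.0, False)] * 10

print(episode_reward(early_trial))  # 2.0
print(episode_reward(later_trial))  # 10.0
```

Under this scoring, a policy that stays in lane for longer collects a larger reward, which is the signal the learning algorithm tries to maximise.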
Wayve said in a blog post: “We adapted a popular model-free deep reinforcement learning algorithm (deep deterministic policy gradients, DDPG) to solve the lane following task. Our model input was a single monocular camera image. Our system iterated through 3 processes: exploration, optimisation and evaluation.”
“Our network architecture was a deep network with four convolutional layers and 3 fully connected layers with a total of just under 10k parameters. For comparison, state of the art image classification architectures have 10s of millions of parameters.”
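To give a feel for how small such a network is, the sketch below counts the weights and biases of a network with four convolutional and three fully connected layers. Every layer width, kernel size and the flattened feature size are invented for this example, as is the two-value output; Wayve's post gives only the number of layers and the approximate total, so this does not reproduce the actual model.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# Hypothetical layer sizes: (input channels, output channels, kernel size).
conv_layers = [(3, 8, 3), (8, 8, 3), (8, 8, 3), (8, 8, 3)]
# Hypothetical sizes: (inputs, outputs); 64 is an assumed flattened feature size.
dense_layers = [(64, 32), (32, 16), (16, 2)]

total = sum(conv_params(*layer) for layer in conv_layers)
total += sum(dense_params(*layer) for layer in dense_layers)

print(total)  # 4618
```

Even with these made-up sizes the total stays in the thousands, several orders of magnitude below the tens of millions of parameters in the image classifiers mentioned in the quote.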
The company admittedly taught the car only to stay in a single lane, but the pace at which it learned was blistering, with the test vehicle consistently learning to follow a lane in fewer than 20 trials. This came after extensive simulation, however.
Wayve said: “We used simulated tests to try out different neural network architectures and hyperparameters until we found settings which consistently solved the task of lane following in very few training episodes i.e. with little data. For example, one of our findings was that training the convolutional layers using an auto-encoder reconstruction loss significantly improved stability and data-efficiency of training.”
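An auto-encoder reconstruction loss measures how closely an image rebuilt from the network's compressed features matches the original camera frame. The minimal sketch below uses mean squared error over flattened pixel values; the choice of mean squared error and the function name are assumptions for illustration, since the post does not say which error measure was used or how the loss was combined with the reinforcement learning objective.

```python
def reconstruction_loss(image, reconstruction):
    """Mean squared error between two equally sized lists of pixel values."""
    if len(image) != len(reconstruction):
        raise ValueError("image and reconstruction must have the same length")
    squared_errors = [(a - b) ** 2 for a, b in zip(image, reconstruction)]
    return sum(squared_errors) / len(image)


# A toy four-pixel "image" with values in [0, 1].
frame = [0.0, 0.5, 1.0, 0.5]
perfect = [0.0, 0.5, 1.0, 0.5]
blurry = [0.0, 0.5, 0.5, 0.5]

print(reconstruction_loss(frame, perfect))  # 0.0
print(reconstruction_loss(frame, blurry))   # 0.0625
```

Minimising a loss of this kind pushes the convolutional layers to keep enough information about the scene to redraw it, which gives them a training signal on every frame rather than only when a reward arrives.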
Wayve, founded by Cambridge AI experts Amar Shah and Alex Kendall, said: “Here, we have provided evidence for the first viable framework to quickly improving driving algorithms from being mediocre to being roadworthy. The ability to quickly learn to solve tasks through clever trial and error is what has made humans incredibly versatile machines capable of evolution and survival. We learn through a mixture of imitation, and lots of trial and error for everything from riding a bicycle, to learning how to cook.”
They added: “DeepMind have shown us that deep reinforcement learning methods can lead to super-human performance in many games including Go, Chess and computer games, almost always outperforming any rule based system. We here show that a similar philosophy is also possible in the real world, and in particular, in autonomous vehicles. A crucial point to note is that DeepMind’s Atari playing algorithms required millions of trials to solve a task. It is remarkable that we consistently learnt to lane-follow in under 20 trials.”
Mark Bridger, SVP of OpenText UK, said: “The results of this research highlight that we’re very much in an era of transition for automotive vehicles. The mix of confusion, fear, optimism and inevitability in the minds of UK citizens shows that whereas some AI-enabled technologies have moved seamlessly into our lives, more game-changing offerings like autonomous vehicles will take time to be embraced.”
“AI will enable automakers to analyse, adapt, and suggest solutions based on data. As autonomous vehicles become more common, the data they produce will become a new, powerful asset for organisations.”
“Yet car companies need to ensure they are doing more than delivering the most innovative connected technology. Addressing consumer concerns and loss of confidence will be critical for success and take up too. They need to ensure the technology is safe and reliable in order to instil the level of trust needed for mass adoption.”