New performance milestone reached for deep learning at scale.
The deep learning process could be about to change dramatically thanks to work being carried out by Cray, Microsoft and the Swiss National Supercomputing Centre (CSCS).
On conventional systems, deep learning requires a slow training process that can take months, leading to significantly higher costs and delays in making scientific discoveries.
Cray believes that its work with Microsoft and CSCS could solve this problem by applying supercomputing architectures to accelerate the training process.
The three worked together to scale the Microsoft Cognitive Toolkit on a Cray XC50 supercomputer at CSCS nicknamed “Piz Daint”.
According to the supercomputer manufacturer, deep learning problems share algorithmic similarities with applications traditionally run on massively parallel supercomputers. By optimising inter-node communication using the Cray XC Aries network and a high-performance MPI library, each training job can leverage more compute resources and therefore reduce the time required to train a model.
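The communication pattern described above is essentially data-parallel training: each rank computes a gradient on its own data shard, an allreduce collective averages the gradients over the interconnect, and every rank applies the identical update. The sketch below simulates that pattern in plain Python; the names (`local_gradient`, `allreduce_mean`, `train_step`) are illustrative assumptions, not part of any Cray or Microsoft API, and the in-process loop stands in for what a real MPI job would do with `MPI_Allreduce` across nodes.

```python
def local_gradient(w, shard):
    # Toy gradient for a 1-D least-squares model y = w*x:
    # d/dw of mean((w*x - y)^2) over this rank's data shard.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    # Stand-in for an MPI allreduce (sum, then divide by rank count):
    # every worker ends up with the same averaged gradient.
    return sum(values) / len(values)

def train_step(w, shards, lr=0.01):
    grads = [local_gradient(w, s) for s in shards]  # computed in parallel on a real system
    avg = allreduce_mean(grads)                     # one collective over the interconnect
    return w - lr * avg                             # identical update applied on every rank

# Four simulated "ranks", each holding one shard of points on the line y = 3x.
shards = [[(x, 3.0 * x)] for x in (1.0, 2.0, 3.0, 4.0)]
w = 0.0
for _ in range(200):
    w = train_step(w, shards)
print(round(w, 2))  # converges to the true slope, 3.0
```

Because every rank sees the same averaged gradient, the replicas stay in lockstep; the faster the allreduce (here, the Aries network), the more ranks a single job can usefully employ.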
Prof. Dr. Thomas C. Schulthess, director of the Swiss National Supercomputing Centre (CSCS), said: “What is most exciting is that our researchers and scientists will now be able to use our existing Cray XC supercomputer to take on a new class of deep learning problems that were previously infeasible.”
As part of their work together, the companies scaled the Microsoft Cognitive Toolkit to more than 1,000 NVIDIA Tesla P100 GPU accelerators on the Cray XC50 supercomputer. This result is said to open the door for researchers to run larger, more complex, multi-layered deep learning workloads at scale.
The hope is that this kind of development will help to increase the number of deep learning workloads on supercomputers. With this in mind, Cray is providing deep learning toolkits such as the Microsoft Cognitive Toolkit.