A team of researchers at the Allen School and AWS has released a new open-source compiler for deploying models from deep learning frameworks across a variety of platforms and devices. The NNVM compiler simplifies support for new front-end frameworks and back-end hardware by compiling front-end workloads directly to hardware back-ends. The new tool builds on the TVM stack, previously developed by the same Allen School researchers, to bridge the gap between deep learning systems optimized for productivity and the programming, performance, and efficiency constraints imposed by different types of hardware.
“While deep learning is becoming indispensable for a range of platforms — from mobile phones and datacenter GPUs, to the Internet of Things and specialized accelerators — considerable engineering challenges remain in the deployment of those frameworks,” noted Allen School Ph.D. student Tianqi Chen. “Our TVM framework made it possible for developers to quickly and easily deploy deep learning on a range of systems. With NNVM, we offer a solution that works across all frameworks, including MXNet and model exchange formats such as ONNX and CoreML, with significant performance improvements.”
Built on the TVM stack, the NNVM compiler represents and optimizes common deep learning workloads as standardized computation graphs. It then transforms these high-level graphs for different hardware back-ends, optimizing data layout, reducing memory utilization, and fusing computation patterns. The result is an end-to-end compilation pipeline from the front-end frameworks to bare-metal hardware.
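As a concrete illustration of this pipeline, here is a minimal sketch of the compilation step using the NNVM Python API as described at release; the model choice, input shape, and target string are placeholders, and exact signatures may have varied across versions.

```python
import mxnet as mx
import nnvm
import nnvm.compiler
import nnvm.frontend

# Load a pretrained model from a front-end framework (MXNet here;
# the ONNX and CoreML front-ends follow the same from_* pattern).
block = mx.gluon.model_zoo.vision.get_model("resnet18_v1", pretrained=True)
sym, params = nnvm.frontend.from_mxnet(block)

# Compile the graph for a chosen hardware back-end: "llvm" targets a
# local CPU, while "cuda" would target an Nvidia GPU.
input_shape = {"data": (1, 3, 224, 224)}  # batch, channels, height, width
graph, lib, params = nnvm.compiler.build(
    sym, target="llvm", shape=input_shape, params=params)
```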
“Existing deep learning frameworks package the graph optimization with the deployment runtime,” noted Allen School professor Carlos Guestrin. “NNVM follows the conventional wisdom of compilers, separating the optimization from the deployment runtime. Using this approach, we get substantial optimization while keeping the runtime lightweight.”
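To make that separation concrete, here is a hedged sketch of the deployment side, continuing from the compilation sketch above: the compiled artifacts are saved to disk and later loaded by TVM's lightweight graph runtime, which executes the precompiled graph with no optimizer attached. The file names and exact runtime entry points are illustrative, based on the TVM API of that era.

```python
import numpy as np
import tvm
import nnvm.compiler
from tvm.contrib import graph_runtime

# Persist the artifacts produced by nnvm.compiler.build above.
lib.export_library("deploy.so")            # compiled operator code
with open("deploy.json", "w") as f:        # serialized computation graph
    f.write(graph.json())
with open("deploy.params", "wb") as f:     # trained weights
    f.write(nnvm.compiler.save_param_dict(params))

# Deployment needs only the lightweight runtime: load the precompiled
# module and run inference; no graph optimization happens here.
loaded_lib = tvm.module.load("deploy.so")
loaded_graph = open("deploy.json").read()
module = graph_runtime.create(loaded_graph, loaded_lib, tvm.cpu(0))
module.load_params(bytearray(open("deploy.params", "rb").read()))

data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")
module.run(data=data)
output = module.get_output(0)
```

Because only the compiled operators and this minimal runtime ship to the device, the deployed footprint stays small even on constrained targets like the Raspberry Pi.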
While NNVM is still under development, early indications are that the approach is a step forward compared to the current state of the art. The team benchmarked the new compiler against the MXNet framework on two popular hardware configurations: an Nvidia GPU on AWS and an ARM CPU on a Raspberry Pi. On both, the NNVM compiler produced faster code; on the Raspberry Pi, the generated code was two times faster for ResNet18 and 11 times faster for MobileNet. With the NNVM compiler, developers will be able to deliver consistent results from multiple frameworks to users across a variety of platforms in less time and with significantly less engineering effort.
Like TVM, the NNVM compiler is the product of a collaboration among researchers in machine learning, systems, and computer architecture. In addition to Chen and Guestrin, Allen School Ph.D. students Thierry Moreau and Haichen Shen, and professors Luis Ceze and Arvind Krishnamurthy worked with the AWS AI team to build the new tool.
Learn more in the detailed overview here, and read the AWS announcement here.