The GP5 is a co-processor designed to accelerate discrete belief propagation on factor graphs and other large-scale tensor product operations for machine learning. It is related to the Google Tensor Processing Unit, which it anticipated by a number of years.
It is designed to run as a co-processor alongside a host controller, such as an x86 CPU or an ARM, MIPS, or Tensilica core. It was developed as the culmination of DARPA's Analog Logic program.[1]
The GP5 has a fairly exotic architecture, resembling neither a GPU nor a DSP, and exploits both fine-grained and coarse-grained parallelism. It is deeply pipelined, and the different algorithmic tasks involved in performing belief propagation updates are handled by independent, heterogeneous compute units. The performance of the chip is governed by the structure of the machine learning workload being evaluated. In typical cases, the GP5 is roughly 100 times faster and 100 times more energy efficient than a single core of a modern Intel Core i7 performing a comparable task; roughly 10 times faster and 1000 times more energy efficient than a state-of-the-art GPU; and roughly 1000 times faster and 10 times more energy efficient than a state-of-the-art ARM processor. It was benchmarked on typical machine learning and inference workloads, including protein side-chain folding, turbo error correction decoding, stereo vision, and signal noise reduction.
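The kind of computation the GP5 accelerates can be illustrated with a sum-product message update on a discrete factor graph, which is essentially a small tensor contraction repeated many times over the graph. The sketch below is illustrative only and is not GP5 code or its programming model; the function name, table sizes, and messages are hypothetical.

```python
import numpy as np

def factor_to_variable_message(factor, incoming, target_axis):
    """Sum-product message from a discrete factor to one neighbouring variable.

    factor      : ndarray factor table, one axis per connected variable
    incoming    : list of 1-D variable-to-factor messages, one per axis
                  (the entry for target_axis is ignored)
    target_axis : axis of `factor` corresponding to the receiving variable
    """
    msg = factor.copy()
    for axis, m in enumerate(incoming):
        if axis == target_axis:
            continue
        # Multiply each incoming message onto the matching axis of the table.
        shape = [1] * factor.ndim
        shape[axis] = m.size
        msg = msg * m.reshape(shape)
    # Marginalise out every axis except the target variable's, then normalise.
    sum_axes = tuple(a for a in range(factor.ndim) if a != target_axis)
    msg = msg.sum(axis=sum_axes)
    return msg / msg.sum()

# Example: a factor over two ternary variables.
factor = np.random.rand(3, 3)
incoming = [np.ones(3) / 3, np.array([0.2, 0.5, 0.3])]
print(factor_to_variable_message(factor, incoming, target_axis=0))
```

Each such update is a multiply-then-marginalise operation over a small tensor; belief propagation on a large graph consists of very many independent updates of this form, which is the source of the parallelism the chip exploits.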
Analog Devices, Inc. acquired the intellectual property for the GP5 through its acquisition of Lyric Semiconductor, Inc. in 2011.
References
1. DARPA FA8750-07-C-0231