Abstract
This paper proposes a scalable DNN processor that can be flexibly reconfigured to maximize inference efficiency across a wide range of DNN models. The processor consists of 18 computing Nodes, each supporting multiple precision modes. To improve computation throughput, we propose a sub-image parallelization strategy, in which the original input image is divided into multiple sub-images that are computed on multiple Nodes in parallel. In addition, a cross-layer pipeline is implemented to improve resource utilization. The proposed processor is implemented in 28-nm CMOS technology and achieves a peak performance of 4.17 TOPS and an energy efficiency of 2.08 TOPS/W.
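To illustrate the idea behind the sub-image parallelization strategy, the following is a minimal software sketch: the input image is tiled into a grid of sub-images, each sub-image is processed independently (standing in for one computing Node), and the results are reassembled. The helper names (`split_into_subimages`, `merge_subimages`, `node_compute`) are hypothetical and the per-Node work is a placeholder elementwise operation, not the processor's actual datapath; a real implementation would also need halo regions at tile borders for convolutions.

```python
import numpy as np

def split_into_subimages(image, grid=(2, 2)):
    # Hypothetical helper: tile an H x W image into grid[0] * grid[1] blocks.
    rows = np.array_split(image, grid[0], axis=0)
    return [block for row in rows for block in np.array_split(row, grid[1], axis=1)]

def merge_subimages(blocks, grid=(2, 2)):
    # Reassemble the blocks produced by split_into_subimages.
    rows = [np.concatenate(blocks[r * grid[1]:(r + 1) * grid[1]], axis=1)
            for r in range(grid[0])]
    return np.concatenate(rows, axis=0)

def node_compute(sub):
    # Stand-in for one Node's per-sub-image work; an elementwise op keeps
    # the sketch self-contained (no tile-border halo handling needed).
    return sub * 2

image = np.arange(16).reshape(4, 4)
subs = split_into_subimages(image, grid=(2, 2))          # 4 sub-images -> 4 Nodes
result = merge_subimages([node_compute(s) for s in subs], grid=(2, 2))
assert np.array_equal(result, image * 2)
```

In hardware, the list comprehension over sub-images corresponds to the Nodes running concurrently, which is where the throughput gain comes from.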