Video s3
    Details
    Authors
    Xiaobai Chen, Nanjing University of Posts and Telecommunications
    Qiurun Hu, Nanjing University of Posts and Telecommunications
    Fu Xiao, Nanjing University of Posts and Telecommunications
    Jieming Yin, Nanjing University of Posts and Telecommunications
    Abstract

    This paper proposes a scalable DNN processor that can be flexibly reconfigured to maximize inference efficiency across a wide range of DNN models. The processor consists of 18 computing Nodes that support multiple precision modes. To improve computation throughput, we propose a sub-image parallelization strategy, in which the original input image is divided into multiple sub-images that are computed on multiple Nodes in parallel. In addition, cross-layer pipelining is implemented to improve resource utilization. The proposed processor is implemented in 28-nm CMOS technology and achieves a peak performance of 4.17 TOPS and an energy efficiency of 2.08 TOPS/W.
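    To illustrate the sub-image parallelization idea, the sketch below partitions an input image into a grid of overlapping tiles that could then be dispatched to the computing Nodes in parallel. This is a minimal illustration under assumptions, not the processor's actual dataflow: the grid size, halo width, and the split_into_sub_images helper are hypothetical choices for the example. The halo border gives each tile the extra pixels a convolution needs so that the result computed on a tile alone matches what the full image would produce at that tile's interior.

    import numpy as np

    def split_into_sub_images(image, grid=(3, 3), halo=1):
        # Partition an image into a rows x cols grid of sub-images,
        # extending each tile by `halo` pixels on every interior edge
        # so per-tile convolutions match the full-image result.
        h, w = image.shape[:2]
        rows, cols = grid
        tiles = []
        for r in range(rows):
            for c in range(cols):
                top = max(r * h // rows - halo, 0)
                bottom = min((r + 1) * h // rows + halo, h)
                left = max(c * w // cols - halo, 0)
                right = min((c + 1) * w // cols + halo, w)
                tiles.append(image[top:bottom, left:right])
        return tiles

    # Hypothetical usage: one tile per computing Node, processed in parallel.
    image = np.random.rand(224, 224, 3)
    tiles = split_into_sub_images(image, grid=(3, 3), halo=1)
    # for node, tile in zip(nodes, tiles):
    #     node.run_inference(tile)   # `nodes` and `run_inference` are assumed names

    A 3x3 grid is used here only because it maps naturally onto a multi-Node layout; the paper does not specify how its 18 Nodes are assigned to tiles.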