    Details
    Presenter(s)
    Display Name
    Juinn-Dar Huang
    Affiliation
    National Chiao Tung University
    Abstract

    Convolutional neural networks (CNNs) play an important role in many applications, such as computer vision. Because CNN computations require numerous multiply-accumulate (MAC) operations, performing them efficiently is a crucial issue for CNN hardware accelerators. In this paper, we propose a high-speed, power-efficient convolver architecture for CNN acceleration. A 3×3 convolver must produce one output every cycle, which is commonly accomplished by summing the results of nine parallel multiplications and requires ten carry-propagation adders (CPAs) in total. The proposed coarse-grained convolver instead removes the boundaries between the multipliers and reduces all partial products globally. Consequently, it requires only one CPA to generate the final result. It also features a globally delay-optimized partial product reduction tree and a depth-first compression scheme that minimize both area and power. The proposed convolver has been implemented in TSMC 40 nm technology. Compared to a conventional 3×3 convolver baseline design, our design reduces area by 15.8% and power by 26.5% at a clock rate of 1 GHz.
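The core idea of reducing all partial products in one shared tree and finishing with a single CPA can be illustrated with a small word-level Python model. This is only a sketch under stated assumptions: the operands are taken to be unsigned 8-bit values, the tree is a plain layer of 3:2 carry-save compressors, and the function names (`partial_products`, `csa`, `convolve_3x3`) are hypothetical. It does not reproduce the delay-optimized tree or the depth-first compression scheme described in the abstract.

```python
from typing import List, Sequence, Tuple


def partial_products(a: int, b: int, width: int = 8) -> List[int]:
    """Unsigned AND-array rows of a*b: one shifted copy of a per set bit of b.

    Assumption: operands are unsigned and fit in `width` bits.
    """
    return [a << i for i in range(width) if (b >> i) & 1]


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 carry-save compressor: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def convolve_3x3(pixels: Sequence[int], weights: Sequence[int]) -> int:
    """Sum of nine products using one shared reduction and one final addition."""
    # Pool the partial products of all nine multiplications together,
    # ignoring the boundaries between individual multipliers.
    rows: List[int] = []
    for p, w in zip(pixels, weights):
        rows.extend(partial_products(p, w))

    # Compress three rows into two until at most two rows remain.
    while len(rows) > 2:
        full = len(rows) // 3 * 3
        nxt: List[int] = []
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])  # leftover rows pass through unchanged
        rows = nxt

    # The only carry-propagating addition in the whole computation.
    return sum(rows)


pixels = [1, 2, 3, 4, 5, 6, 7, 8, 9]
weights = [9, 8, 7, 6, 5, 4, 3, 2, 1]
result = convolve_3x3(pixels, weights)
reference = sum(p * w for p, w in zip(pixels, weights))
print(result == reference)
```

In a conventional design, each of the nine multipliers would finish with its own CPA before a further addition stage combines the products. In this model, every intermediate step is carry-save, so carries ripple only in the last line of `convolve_3x3`.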
