Efficient Quantization and Multi-Precision Design of Arithmetic Components for Deep Learning
    Details
    Presenter(s)
    Yu-Che Yen
    Affiliation
    National Sun Yat-sen University
    Abstract

    We present a quantization algorithm that finds the per-layer bit-widths of deep neural network (DNN) models, and we then propose multi-precision designs for two fundamental DNN arithmetic operations: multiplication and non-linear activation function computation. The multi-precision multiplier design with truncated partial-product bits supports four precision modes (4-bit, 8-bit, 12-bit, and 16-bit) on shared hardware resources, so that power consumption in the low-precision modes can be reduced by turning off the unneeded circuits. For the evaluation of non-linear activation functions, we compare three different approaches under various precision requirements and observe that the best design method depends on the required bit-accuracy.
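
    The following minimal Python sketch illustrates the general idea of a multi-precision multiply with truncated partial-product bits. The function name, the operand-range checks, and the simple LSB-zeroing rule are assumptions made for illustration; they are not the presented hardware design.

        def truncated_multiply(a: int, b: int, precision: int, trunc_bits: int) -> int:
            """Signed multiply in one of the four precision modes, dropping the
            lowest `trunc_bits` result bits to model truncated partial products.
            (Illustrative sketch only, not the presented multiplier circuit.)"""
            assert precision in (4, 8, 12, 16), "supported precision modes"
            lo, hi = -(1 << (precision - 1)), (1 << (precision - 1)) - 1
            assert lo <= a <= hi and lo <= b <= hi, "operands must fit the mode"
            product = a * b
            # Zeroing the low-order result bits approximates removing the low
            # partial-product columns; a real design would typically add a
            # compensation term to reduce the truncation bias.
            return (product >> trunc_bits) << trunc_bits

        # Example: 8-bit mode with 4 truncated LSBs.
        print(truncated_multiply(93, -57, precision=8, trunc_bits=4))  # -5312 (exact: -5301)

    In hardware, the corresponding saving comes from sharing one multiplier array across all four modes and turning off the partial-product rows and columns that a low-precision mode does not use, as the abstract describes.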

    Slides
    • Efficient Quantization and Multi-Precision Design of Arithmetic Components for Deep Learning (application/pdf)