Details

Presenter(s)
Wei-Chen Lin
Affiliation: National Yang Ming Chiao Tung University
Country: Taiwan
    Abstract

    The multilayer perceptron (MLP) is one of the most popular neural network architectures used today for classification, regression, and recommendation systems. In this paper, we propose an efficient, low-power MLP accelerator for edge computing. The accelerator has three key features. First, it aligns with a novel structured weight pruning algorithm that requires only minimal hardware support. Second, it exploits activation sparsity to minimize power. Third, it supports asymmetric quantization on both weights and activations to improve model accuracy, especially when those values are in low-precision formats. Furthermore, the number of PEs is determined by the available external memory bandwidth to ensure high PE utilization, avoiding wasted area and energy. Experimental results show that the proposed MLP accelerator, with only 8 MACs, operates at 1.6 GHz in the TSMC 40 nm technology, delivers 899 GOPS of equivalent computing power after structured weight pruning on a well-known image classification model, and achieves an equivalent energy efficiency of 9.7 TOPS/W, while the model accuracy loss is less than 0.3% in the presence of asymmetric quantization.
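    The asymmetric quantization mentioned in the abstract can be illustrated with a minimal sketch. This is a generic scale/zero-point scheme, not necessarily the paper's exact method: a zero-point shifts the integer range so that skewed value distributions (e.g. non-negative ReLU activations) use the full low-precision range, rather than wasting half of it as symmetric quantization would.

    ```python
    import numpy as np

    def asymmetric_quantize(x, num_bits=8):
        """Map floats onto [0, 2^bits - 1] using a scale and zero-point.

        Generic illustration of asymmetric quantization; the accelerator's
        actual quantizer may differ in rounding and clipping details.
        """
        qmin, qmax = 0, 2 ** num_bits - 1
        x_min, x_max = float(x.min()), float(x.max())
        # Scale stretches the observed float range over the integer range.
        scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
        # Zero-point is the integer that represents the float value 0.0.
        zero_point = int(round(qmin - x_min / scale))
        zero_point = max(qmin, min(qmax, zero_point))
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
        return q, scale, zero_point

    def dequantize(q, scale, zero_point):
        return scale * (q.astype(np.float32) - zero_point)

    # ReLU-style activations: all non-negative, so the distribution is skewed.
    acts = np.array([0.0, 0.1, 0.5, 1.2, 3.0], dtype=np.float32)
    q, s, zp = asymmetric_quantize(acts)
    recovered = dequantize(q, s, zp)
    ```

    With a symmetric signed scheme, half of the 8-bit range would encode negative values that ReLU outputs never take; the zero-point shift recovers that lost precision, which is why the paper applies it to both weights and activations.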

    Slides
    • An Efficient and Low-Power MLP Accelerator Architecture Supporting Structured Pruning, Sparse Activations and Asymmetric Quantization for Edge Computing (application/pdf)