Details

Presenter(s)
Wei-Chen Lin
Affiliation: National Yang Ming Chiao Tung University
Country: Taiwan
    Abstract

    The multilayer perceptron (MLP) is one of the most popular neural network architectures used today for classification, regression, and recommendation systems. In this paper, we propose an efficient, low-power MLP accelerator for edge computing. The accelerator has three key features. First, it aligns with a novel structured weight pruning algorithm that requires only minimal hardware support. Second, it exploits activation sparsity to minimize power. Third, it supports asymmetric quantization on both weights and activations to improve model accuracy, especially when those values are in low-precision formats. Furthermore, the number of PEs is determined by the available external memory bandwidth to ensure high PE utilization, avoiding wasted area and energy. Experimental results show that the proposed MLP accelerator, with only 8 MACs, operates at 1.6 GHz in the TSMC 40 nm technology, delivers 899 GOPS of equivalent computing power after structured weight pruning on a well-known image classification model, and achieves an equivalent energy efficiency of 9.7 TOPS/W, while the model accuracy loss is less than 0.3% in the presence of asymmetric quantization.
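    The asymmetric quantization mentioned in the abstract can be illustrated with a minimal sketch. This is a generic scale/zero-point scheme, not necessarily the paper's exact method: a zero-point shifts the integer range so that skewed value distributions (e.g. non-negative ReLU activations) use the full low-precision range, rather than wasting half of it as symmetric quantization would.

    ```python
    import numpy as np

    def asymmetric_quantize(x, num_bits=8):
        """Map floats onto [0, 2^bits - 1] using a scale and zero-point.

        Generic illustration of asymmetric quantization; the accelerator's
        actual quantizer may differ in rounding and clipping details.
        """
        qmin, qmax = 0, 2 ** num_bits - 1
        x_min, x_max = float(x.min()), float(x.max())
        # Scale stretches the observed float range over the integer range.
        scale = (x_max - x_min) / (qmax - qmin) if x_max > x_min else 1.0
        # Zero-point is the integer that represents the float value 0.0.
        zero_point = int(round(qmin - x_min / scale))
        zero_point = max(qmin, min(qmax, zero_point))
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
        return q, scale, zero_point

    def dequantize(q, scale, zero_point):
        return scale * (q.astype(np.float32) - zero_point)

    # ReLU-style activations: all non-negative, so the distribution is skewed.
    acts = np.array([0.0, 0.1, 0.5, 1.2, 3.0], dtype=np.float32)
    q, s, zp = asymmetric_quantize(acts)
    recovered = dequantize(q, s, zp)
    ```

    With a symmetric signed scheme, half of the 8-bit range would encode negative values that ReLU outputs never take; the zero-point shift recovers that lost precision, which is why the paper applies it to both weights and activations.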

    Slides
    • An Efficient and Low-Power MLP Accelerator Architecture Supporting Structured Pruning, Sparse Activations and Asymmetric Quantization for Edge Computing (application/pdf)