Details
- Affiliation: Khalifa University
Large deep neural network (DNN) models are computation- and memory-intensive, which limits their deployment, especially on edge devices with limited resources. This paper introduces the Scaling-Weight-based Convolution (SWC) technique to reduce DNN model size and the cost of arithmetic operations. This is achieved by using a small set of high-precision weights (the maximum absolute weight, "MAW") and a large set of low-precision weights (scaling weights, "SWs"), which decreases the model size with minimal loss in accuracy compared to simply reducing precision. Moreover, a scaling and quantized network-acceleration processor (SQNAP) based on the SWC method is proposed to achieve high speed and low power with reduced memory accesses. The proposed SWC eliminates more than 90% of the multiplications in the network. A full analysis on the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets is presented for image recognition, using different DNN models including LeNet, ResNet, AlexNet, and VGG-16.
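The MAW/SW decomposition described above can be illustrated with a minimal sketch. This is only our reading of the abstract, not the paper's implementation: we assume each weight tensor keeps one high-precision maximum absolute weight (MAW) and encodes every other weight as a low-precision signed integer scaling weight (SW) relative to it. The bit width and the symmetric rounding scheme here are illustrative assumptions.

```python
import numpy as np

def swc_quantize(weights, bits=4):
    """Decompose a weight tensor into one high-precision MAW and
    low-precision integer scaling weights (SWs).

    NOTE: bit width and symmetric rounding are assumptions for
    illustration; the paper may use a different scheme.
    """
    maw = float(np.max(np.abs(weights)))   # high-precision scalar kept per tensor
    levels = 2 ** (bits - 1) - 1           # e.g. 7 for 4-bit signed SWs
    sws = np.round(weights / maw * levels).astype(np.int8)
    return maw, sws

def swc_dequantize(maw, sws, bits=4):
    """Reconstruct approximate weights from the MAW and the SWs."""
    levels = 2 ** (bits - 1) - 1
    return sws.astype(np.float32) * (maw / levels)

rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3)).astype(np.float32)
maw, sws = swc_quantize(w)
w_hat = swc_dequantize(maw, sws)
err = np.max(np.abs(w - w_hat))  # bounded by half an SW quantization step
```

In this sketch, the storage savings come from keeping only one full-precision value (the MAW) while all other weights shrink to `bits`-wide integers; a hardware design such as the proposed SQNAP could then replace most full-precision multiplications with narrow integer operations.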