    Details
    Presenter(s)
    Jongsun Park
    Affiliation
    Korea University
    Abstract

    Deep neural networks (DNNs) have shown impressive performance on a variety of tasks, including image classification, object detection, and speech recognition. As recent DNNs adopt deeper and larger network architectures for improved accuracy, greater compute resources and memory capacity are needed for both inference and training. Although low power deep learning accelerator design has long been of interest in cloud computing, it is becoming even more critical as many applications attempt to run large DNNs on resource-constrained edge devices. This lecture covers various low power design techniques for DNN accelerators, focusing on convolutional neural network (CNN) inference/training accelerator design. It first introduces the basic operations of CNN inference and training. Then, an error-resilient technique for CNN inference that enables aggressive voltage scaling is presented. Aggressive voltage scaling in a CNN accelerator is made possible by exploiting the asymmetric error resilience (sensitivity) of CNN layers, filters, and channels. The last part of the lecture introduces an input-dependent approximation of the weight gradient for improving the energy efficiency of CNN training. Since the network's output prediction confidence changes with the training input, the relationship between confidence and the magnitude of the weight gradients is exploited to skip gradient computations without accuracy loss, especially for high-confidence inputs.
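
    The confidence-based gradient skipping described above can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the lecture's actual design: the function name train_step, the confidence threshold of 0.95, and the model/optimizer objects are all hypothetical. The idea shown is only the selection criterion: inputs whose maximum softmax probability already exceeds the threshold are excluded from the loss, so they contribute no weight gradient.

        # Illustrative sketch of confidence-based gradient skipping.
        # Hypothetical names and threshold; not the lecture's implementation.
        import torch
        import torch.nn.functional as F

        def train_step(model, optimizer, images, labels, conf_threshold=0.95):
            # Forward pass for the whole batch.
            logits = model(images)
            # Prediction confidence: maximum softmax probability per input.
            confidence = F.softmax(logits, dim=1).max(dim=1).values
            # High-confidence inputs tend to produce small weight gradients,
            # so their loss terms are dropped here.
            keep = confidence < conf_threshold
            if keep.any():
                loss = F.cross_entropy(logits[keep], labels[keep])
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            return keep.sum().item()  # number of inputs actually trained on

    In hardware terms, dropping an input's loss term removes its contribution to the weight-gradient computation, which is the source of the energy savings the abstract refers to.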

    Slides
    • Low Power Convolutional Neural Network (CNN) Accelerator Design Techniques for both Inference and Training (application/pdf)
    Chair(s)
    Youngjoo Lee
    Affiliation
    Pohang University of Science and Technology