    Details

    Presenter(s)
    Vasilis Sakellariou, Khalifa University

    Author(s)
    Vasilis Sakellariou, Khalifa University
    University of Patras
    Ioannis Kouretas, University of Patras
    Hani Saleh, Khalifa University
    Thanos Stouraitis, Khalifa University
    Abstract

    In this paper, a method to reduce the number of multiplications in convolutional layers by exploiting the properties of the Residue Number System (RNS) is proposed. RNS decomposes the elementary computations into a number of small bit-width, independent channels, which can be processed in parallel. Due to the small dynamic range of each RNS channel, the number of common factors inside the weight kernels of a convolution naturally increases. By identifying these common factors and rearranging the order of computations so that the input feature-map terms corresponding to the same factor are added first, the number of multiplications can be reduced by up to 97% for state-of-the-art CNN models. The remaining multiplications are also simplified, as they are implemented through shift-add operations or fixed-operand multipliers. ASIC implementations of the proposed Processing Element (PE) architecture show a speedup of up to 2.67x and 1.64x compared to the binary and conventional RNS counterparts, respectively. Compared to a conventional RNS PE implementation, the proposed method also leads to a 20% reduction in area and a 16% reduction in power consumption.
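    The sum-first rearrangement described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation; the function and variable names (rns_channel_dot, partial_sums) are illustrative, and it shows only a single dot product within one RNS channel of modulus m.

    # Illustrative sketch (not the paper's implementation): a dot product
    # within one RNS channel of modulus m. Input terms that share the same
    # weight residue are accumulated first, so each distinct residue is
    # multiplied only once.
    def rns_channel_dot(inputs, weights, m):
        partial_sums = {}
        for x, w in zip(inputs, weights):
            r = w % m                                            # weight residue in this channel
            partial_sums[r] = (partial_sums.get(r, 0) + x) % m   # sum-first: add inputs sharing r
        acc = 0
        for r, s in partial_sums.items():                        # at most m multiplications
            acc = (acc + r * s) % m
        return acc

    # Example: with modulus 7 there are at most 7 distinct weight residues,
    # so a kernel of any size needs at most 7 multiplications per output.
    inputs  = [3, 1, 4, 1, 5, 9, 2, 6, 5]
    weights = [12, 5, 19, 5, 12, 3, 19, 5, 3]
    print(rns_channel_dot(inputs, weights, 7))   # 9 terms, only 2 multiplications

    Because the weight residues here take only two distinct values (5 and 3) modulo 7, the nine multiply-accumulates collapse into nine additions and two multiplications, which is the effect the proposed method exploits at scale.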

    Slides
    • On Reducing the Number of Multiplications in RNS-Based CNN Accelerators (PDF)