    Details

    Presenter(s)
    Vasilis Sakellariou, Khalifa University

    Author(s)
    Vasilis Sakellariou, Khalifa University
    University of Patras
    Ioannis Kouretas, University of Patras
    Hani Saleh, Khalifa University
    Thanos Stouraitis, Khalifa University
    Abstract

    In this paper, a method to reduce the number of multiplications in convolutional layers by exploiting the properties of the Residue Number System (RNS) is proposed. RNS decomposes the elementary computations into a number of small bit-width, independent channels, which can be processed in parallel. Due to the small dynamic range of each RNS channel, the number of common factors inside the weight kernels of a convolution naturally increases. By identifying these common factors and rearranging the order of computations so that the input feature-map terms corresponding to the same factor are added first, the number of multiplications can be reduced by up to 97% for state-of-the-art CNN models. The remaining multiplications are also simplified, as they are implemented through shift-add operations or fixed-operand multipliers. ASIC implementations of the proposed Processing Element (PE) architecture show a speedup of up to 2.67x and 1.64x compared to the binary and conventional RNS counterparts, respectively. Compared to a conventional RNS PE implementation, the proposed method also leads to a 20% reduction in area and a 16% reduction in power consumption.
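    The sum-first rearrangement described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation; the function and variable names (rns_channel_dot, partial_sums) are illustrative, and it shows only a single dot product within one RNS channel of modulus m.

    # Illustrative sketch (not the paper's implementation): a dot product
    # within one RNS channel of modulus m. Input terms that share the same
    # weight residue are accumulated first, so each distinct residue is
    # multiplied only once.
    def rns_channel_dot(inputs, weights, m):
        partial_sums = {}
        for x, w in zip(inputs, weights):
            r = w % m                                            # weight residue in this channel
            partial_sums[r] = (partial_sums.get(r, 0) + x) % m   # sum-first: add inputs sharing r
        acc = 0
        for r, s in partial_sums.items():                        # at most m multiplications
            acc = (acc + r * s) % m
        return acc

    # Example: with modulus 7 there are at most 7 distinct weight residues,
    # so a kernel of any size needs at most 7 multiplications per output.
    inputs  = [3, 1, 4, 1, 5, 9, 2, 6, 5]
    weights = [12, 5, 19, 5, 12, 3, 19, 5, 3]
    print(rns_channel_dot(inputs, weights, 7))   # 9 terms, only 2 multiplications

    Because the weight residues here take only two distinct values (5 and 3) modulo 7, the nine multiply-accumulates collapse into nine additions and two multiplications, which is the effect the proposed method exploits at scale.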

    Slides
    • On Reducing the Number of Multiplications in RNS-Based CNN Accelerators (PDF)