    Presenter(s)
    Bradley McDanel (Franklin and Marshall College)

    Abstract

    The proposed saturation RRAM for in-memory inference of pre-trained Convolutional Neural Networks (CNNs) imposes a limit on the maximum analog value output from each bitline in order to reduce analog-to-digital (A/D) conversion costs. The scheme uses term quantization (TQ) to enable flexible bit annihilation at any position of a value, in the context of a group of weight values stored in RRAM. This enables a drastic reduction in the required ADC resolution while maintaining CNN model accuracy. Specifically, we show that the A/D conversion errors after TQ have minimal impact on the classification accuracy of the inference task. For instance, for a 64x64 RRAM, reducing the ADC resolution from 6 bits to 4 bits enables a 1.58x reduction in total system power without a significant impact on classification accuracy.
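    As a rough illustration of the bit-annihilation idea described above, the sketch below applies a group-wide term budget to integer weights: each weight is decomposed into signed power-of-two terms, and only the highest-order terms across the group are kept. The function name, group size, term budget, and integer weight representation are illustrative assumptions for this sketch; the actual TQ procedure is defined in the paper and slides.

```python
import numpy as np

def term_quantize_group(weights, term_budget):
    """Sketch of group-based term quantization: decompose each integer
    weight into signed power-of-two terms, keep only the term_budget
    largest terms across the whole group, and annihilate the rest."""
    terms = []  # (bit position, weight index, signed term value)
    for i, w in enumerate(weights):
        sign = 1 if w >= 0 else -1
        mag = abs(int(w))
        for bit in range(mag.bit_length()):
            if (mag >> bit) & 1:
                terms.append((bit, i, sign * (1 << bit)))
    # Keep the highest-order terms in the group; drop (annihilate) the rest.
    terms.sort(key=lambda t: t[0], reverse=True)
    out = np.zeros(len(weights), dtype=np.int64)
    for _, i, val in terms[:term_budget]:
        out[i] += val
    return out

# A hypothetical group of 4 weights with 9 nonzero bits in total; a
# budget of 5 terms drops low-order bits while preserving large values.
group = np.array([23, -6, 9, 1])
print(term_quantize_group(group, term_budget=5))  # -> [22 -4  8  0]
```

    Capping the number of surviving terms per group bounds how many cells can conduct on a bitline at once, which is what allows the analog bitline output, and hence the ADC resolution, to be limited without large accuracy loss.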

    Slides
    • Saturation RRAM Leveraging Bit-Level Sparsity Resulting from Term Quantization (application/pdf)