Video s3
    Details
    Presenter(s)
    Shimeng Yu
    Affiliation
    Georgia Institute of Technology
    Country
    United States
    Abstract

    Compute-in-memory (CIM) is a new computing paradigm that addresses the memory-wall problem in deep learning hardware accelerators. SRAM and resistive random access memory (RRAM) have been identified as two promising embedded memories for storing the weights of deep neural network (DNN) models. In this lecture, we will first review recent progress on SRAM- and RRAM-based CIM macros integrated with peripheral analog-to-digital converters (ADCs). The bit-cell variants (e.g., 6T SRAM, 8T SRAM, 1T1R, 2T2R) and array architectures that enable parallel weighted-sum computation are discussed. State-of-the-art silicon prototypes are surveyed with normalized metrics such as energy efficiency (TOPS/W) and compute efficiency (TOPS/mm²). Second, we will discuss array-level characterization of non-ideal RRAM device characteristics, e.g., the variability and reliability of multilevel states, which may negatively affect inference accuracy. Third, we will discuss the general challenges in CIM chip design with regard to imperfect device properties, ADC overhead, and chip-to-chip variations. Finally, we will discuss future research directions, including monolithic 3D integration of a memory tier on top of the peripheral logic tier to fully unleash the potential of CIM with RRAM technologies.
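    To make the interplay between device variability and ADC resolution concrete, the following minimal Python sketch simulates one analog weighted-sum (partial sum) on a single CIM column and compares it with the ideal digital dot product. All parameters (array rows, 2-bit multilevel weights, conductance variability, ADC bit width) are illustrative assumptions, not values from the lecture.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical parameters for illustration only (not from the lecture):
    ROWS = 128       # inputs summed in parallel on one bitline
    LEVELS = 4       # 2-bit multilevel RRAM weight per cell
    SIGMA = 0.08     # relative conductance variability (sigma/mu)
    ADC_BITS = 5     # resolution of the column ADC

    # Binary wordline activations and multilevel integer weights for one column.
    x = rng.integers(0, 2, ROWS)          # activations (0/1)
    w = rng.integers(0, LEVELS, ROWS)     # programmed conductance levels

    # Ideal digital dot product (what a software DNN layer would compute).
    ideal = int(np.dot(x, w))

    # Analog weighted sum: each cell's conductance deviates from its target
    # level by a multiplicative error, modeling device-to-device variability.
    g = w * (1.0 + SIGMA * rng.standard_normal(ROWS))
    analog_sum = np.dot(x, g)

    # The column ADC quantizes the bitline readout to ADC_BITS levels over
    # the array's full-scale range (ROWS x maximum weight level).
    full_scale = ROWS * (LEVELS - 1)
    step = full_scale / (2**ADC_BITS - 1)
    quantized = round(np.clip(analog_sum, 0, full_scale) / step) * step

    print(f"ideal partial sum : {ideal}")
    print(f"analog partial sum: {analog_sum:.1f}")
    print(f"after {ADC_BITS}-bit ADC : {quantized:.1f}")

    Sweeping SIGMA or ADC_BITS in this sketch shows how the quantized partial sum drifts from the ideal value, which is the mechanism by which multilevel-state variability and ADC overhead translate into inference accuracy loss at the network level.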

    Slides
    • Recent Progresses of Compute-in-Memory for Deep Learning Inference Engine (PDF)
    Chair(s)
    Dongsuk Jeon
    Affiliation
    Seoul National University
    Country
    South Korea