This work presents an energy-efficient digital computing-in-memory (CIM) processor for floating-point (FP) deep neural network (DNN) acceleration. Previous FP-CIM processors have two limitations: post-alignment designs show low throughput due to serial operation, while pre-alignment designs incur truncation error. To resolve these problems, we exploit the statistical observation that, in pre-alignment-based FP operation, outliers exist in the distribution of shift amounts. Because these outliers degrade energy efficiency through long operation cycles, they need to be processed separately. The proposed Hetero-FP-CIM integrates CIM arrays with a shared NPU, which compute the dense inliers and the sparse outliers, respectively. It also includes an efficient weight-caching system that avoids copying the entire weights into the shared NPU. The proposed Hetero-FP-CIM is simulated in 28 nm CMOS technology and occupies 2.7 mm². As a result, it achieves 5.99 TOPS/W on ImageNet (ResNet50) with bfloat16 representation.
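To see why pre-alignment produces truncation error for shift-amount outliers, the following is a minimal Python sketch (not the paper's circuit; the mantissa width `MANT_BITS` and the function names are illustrative assumptions). It sums FP values by right-shifting fixed-width mantissas to the largest exponent, as a pre-alignment-based scheme would, so an operand whose required shift exceeds the mantissa width loses all of its bits:

```python
import math

MANT_BITS = 8  # illustrative aligned mantissa width (assumption, not the paper's)

def pre_align_sum(values):
    """Sum FP values by pre-aligning mantissas to the largest exponent.

    Each mantissa is quantized to MANT_BITS bits and right-shifted by the
    exponent difference; operands needing a shift larger than MANT_BITS
    (the shift-amount "outliers") are truncated to zero.
    """
    exps = [math.frexp(v)[1] for v in values]
    max_e = max(exps)
    acc = 0
    for v, e in zip(values, exps):
        m = math.frexp(v)[0]          # mantissa in [0.5, 1)
        shift = max_e - e             # alignment shift amount
        q = int(m * (1 << MANT_BITS)) >> shift  # truncation happens here
        acc += q
    return acc / (1 << MANT_BITS) * 2.0 ** max_e

exact = sum([1.0, 1.0, 2**-12])        # 2.000244...
approx = pre_align_sum([1.0, 1.0, 2**-12])
# the 2**-12 operand needs a shift of 12 > MANT_BITS, so it vanishes: approx == 2.0
```

In this sketch the small operand is silently dropped, which is the truncation error the abstract attributes to pre-alignment; routing such outlier operands to a separate full-precision unit (the shared NPU in Hetero-FP-CIM) avoids the loss without forcing long serial operation cycles on the dense inlier path.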