This work presents an energy-efficient digital computing-in-memory (CIM) processor for floating-point (FP) deep neural network (DNN) acceleration. Previous FP-CIM processors have two limitations: post-alignment designs show low throughput due to serial operation, while pre-alignment designs incur truncation error. To resolve these problems, we exploit the statistical observation that, in pre-alignment-based FP operation, outliers exist in the distribution of shift amounts. Because these outliers degrade energy efficiency through long operation cycles, they need to be processed separately. The proposed Hetero-FP-CIM integrates CIM arrays with a shared NPU, which compute the dense inliers and the sparse outliers, respectively. It also includes an efficient weight-caching system that avoids copying the entire weights into the shared NPU. The proposed Hetero-FP-CIM is simulated in 28 nm CMOS technology and occupies 2.7 mm². As a result, it achieves 5.99 TOPS/W on ImageNet (ResNet50) with bfloat16 representation.
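To see why pre-alignment produces truncation error for shift-amount outliers, the following is a minimal Python sketch (not the paper's circuit; the mantissa width `MANT_BITS` and the function names are illustrative assumptions). It sums FP values by right-shifting fixed-width mantissas to the largest exponent, as a pre-alignment-based scheme would, so an operand whose required shift exceeds the mantissa width loses all of its bits:

```python
import math

MANT_BITS = 8  # illustrative aligned mantissa width (assumption, not the paper's)

def pre_align_sum(values):
    """Sum FP values by pre-aligning mantissas to the largest exponent.

    Each mantissa is quantized to MANT_BITS bits and right-shifted by the
    exponent difference; operands needing a shift larger than MANT_BITS
    (the shift-amount "outliers") are truncated to zero.
    """
    exps = [math.frexp(v)[1] for v in values]
    max_e = max(exps)
    acc = 0
    for v, e in zip(values, exps):
        m = math.frexp(v)[0]          # mantissa in [0.5, 1)
        shift = max_e - e             # alignment shift amount
        q = int(m * (1 << MANT_BITS)) >> shift  # truncation happens here
        acc += q
    return acc / (1 << MANT_BITS) * 2.0 ** max_e

exact = sum([1.0, 1.0, 2**-12])        # 2.000244...
approx = pre_align_sum([1.0, 1.0, 2**-12])
# the 2**-12 operand needs a shift of 12 > MANT_BITS, so it vanishes: approx == 2.0
```

In this sketch the small operand is silently dropped, which is the truncation error the abstract attributes to pre-alignment; routing such outlier operands to a separate full-precision unit (the shared NPU in Hetero-FP-CIM) avoids the loss without forcing long serial operation cycles on the dense inlier path.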