A 3.6 TOPS/W Hybrid FP-FXP Deep Learning Processor with Outlier Compensation for Image-to-Image Application

Abstract

A hybrid floating-point (FP) and fixed-point (FXP) deep learning processor with an outlier-aware channel splitting algorithm is proposed for image-to-image applications on mobile devices. The proposed algorithm reduces 16-bit FP data to 8-bit FXP data and computes only a few outliers (< 10%) in 16-bit FP, maintaining image reconstruction quality while reducing external memory access (EMA) by 45.5%. Moreover, the hierarchical processor accelerates both the dense 8-bit FXP data and the sparse 16-bit FP data, and the functional L2 memory aggregates their convolution outputs in a pipelined manner, which reduces latency by 98%. The proposed system is simulated in 28 nm CMOS technology and occupies 4.16 mm². The hierarchical processor successfully demonstrates ×4 scale Full-HD super-resolution generation at 76 frames per second (fps) with 133.3 mW power consumption at a 0.9 V supply, achieving an energy efficiency of 3.6 TOPS/W, which is 3.27× higher than the previous 16-bit FXP processor.
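
To illustrate the general idea of separating dense low-precision data from sparse high-precision outliers, the following minimal Python sketch splits a list of values into 8-bit FXP codes and a small set of 16-bit FP outliers. It is a generic magnitude-based outlier split written under our own assumptions, not the channel splitting algorithm of this work: the function names `split_outliers` and `reconstruct`, the per-tensor symmetric scale, and the rule of choosing the largest-magnitude values as outliers are all hypothetical choices made for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    # The 'e' format is half precision; it raises OverflowError above 65504.
    return struct.unpack("e", struct.pack("e", x))[0]


def split_outliers(values, outlier_fraction=0.1):
    """Split values into dense int8 codes plus sparse FP16 outliers.

    Assumption (illustrative only): outliers are the largest-magnitude
    entries, capped at `outlier_fraction` of all entries.
    Returns (dense_codes, scale, sparse_outliers).
    """
    n = len(values)
    if n == 0:
        return [], 1.0, {}
    n_out = int(n * outlier_fraction)  # floor keeps the outlier share within the cap
    # Indices ordered by magnitude, largest first.
    order = sorted(range(n), key=lambda i: abs(values[i]), reverse=True)
    outlier_idx = set(order[:n_out])
    # Scale the 8-bit range to the largest remaining (inlier) magnitude.
    inlier_max = max((abs(values[i]) for i in order[n_out:]), default=0.0)
    scale = inlier_max / 127 if inlier_max > 0 else 1.0

    dense, sparse = [], {}
    for i, v in enumerate(values):
        if i in outlier_idx:
            dense.append(0)          # outlier position is zero in the dense path
            sparse[i] = to_fp16(v)   # kept in 16-bit FP on the sparse path
        else:
            q = round(v / scale)
            dense.append(max(-127, min(127, q)))
    return dense, scale, sparse


def reconstruct(dense, scale, sparse):
    """Merge the dequantized dense path with the sparse FP16 outliers."""
    out = [q * scale for q in dense]
    for i, v in sparse.items():
        out[i] = v
    return out


if __name__ == "__main__":
    data = [0.1, -0.2, 0.05, 0.3, -0.15, 0.25, -0.05, 0.12, 0.08, 50.0]
    dense, scale, sparse = split_outliers(data)
    print("dense int8 codes:", dense)
    print("sparse FP16 outliers:", sparse)
    print("reconstructed:", reconstruct(dense, scale, sparse))
```

In this sketch, removing the single large value from the dense path lets the 8-bit scale follow the small-magnitude majority, so the inliers keep a fine quantization step while the outlier is carried separately at higher precision.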