Video s3
    Details
    Presenter(s)
    Xinang Xie
    Affiliation
    Southern University of Science and Technology
    Abstract

    For convolutional neural networks (CNNs) and high-performance computing, dot-product units (DPUs) have been used to improve the hardware efficiency of convolution computation. In addition, multiple-precision floating-point (FP) support is essential for meeting the accuracy requirements of various applications. In this work, a multiple-precision FP many-term DPU is designed with a single-instruction-multiple-data (SIMD) structure. To speed up the summation process, a carry-select adder (CSLA) is designed with excellent area-delay-product (ADP) and power-delay-product (PDP) performance. The proposed design is implemented in a UMC 55-nm process and evaluated experimentally. Compared with state-of-the-art multiple-precision work, the proposed design achieves up to a 3.76x power-efficiency improvement for FP16 operations. Compared with previous CSLA designs, the proposed work improves ADP and PDP by 4.7% and 3.91%, respectively.
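    The abstract names a carry-select adder as the structure that speeds up the summation. A minimal sketch of the general carry-select idea is below; the bit width and block size are illustrative assumptions, not parameters from the paper, and the paper's actual CSLA optimizations are not reproduced here. Each block's sum is precomputed for both possible carry-in values, and the real carry then selects the correct result via a multiplexer, shortening the carry-propagation critical path.

    ```python
    def carry_select_add(a, b, width=16, block=4):
        """Generic carry-select adder (CSLA) model.
        width/block are illustrative choices, not from the paper."""
        mask = (1 << block) - 1
        result, carry = 0, 0
        for i in range(0, width, block):
            a_blk = (a >> i) & mask
            b_blk = (b >> i) & mask
            # Both candidate sums are computed in parallel in hardware:
            sum0 = a_blk + b_blk       # assuming carry-in = 0
            sum1 = a_blk + b_blk + 1   # assuming carry-in = 1
            chosen = sum1 if carry else sum0  # multiplexer select
            result |= (chosen & mask) << i
            carry = (chosen >> block) & 1     # carry-out of this block
        return result & ((1 << width) - 1), carry
    ```

    In hardware, the two per-block sums exist as parallel adder copies rather than sequential computations, so the block delay is one add plus one multiplexer instead of a full ripple chain; the ADP/PDP gains reported in the abstract come from refining this trade-off.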

    Slides
    • Multiple-Precision Floating-Point Dot Product Unit for Efficient Convolution Computation (PDF)