We present a training tool flow for deep neural networks (DNNs) that is optimized for a hardware-efficient FPGA implementation based on reconfigurable constant-coefficient multipliers (RCCMs). RCCMs replace costly generic multipliers with shift-and-add operations. Previous work showed that RCCMs are a better way to save FPGA area than low-precision arithmetic. This work proposes an improved tool flow that enables layer-wise weight quantization, enlarges the search space with additional RCCM coefficient sets, and optimizes retraining. As a result, it achieves higher accuracy than the previous method. In addition, hardware requirements are lower because only 1 to 3 adders are used per multiplication. This reduces both the overall complexity and the required memory bandwidth. We evaluate our tool flow on multiple ResNet networks using the ImageNet data set.
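The shift-and-add principle and the adder budget can be illustrated with a small Python sketch. It is not the RCCM architecture or the training tool flow described here. Instead, it makes three assumptions. First, a coefficient counts as usable if it is a sum of at most four signed powers of two, because n terms need n - 1 adders, so at most 3 adders are used. Second, a hypothetical per-layer scale maps the largest weight magnitude of a layer to 127. Third, each weight is snapped to the nearest usable coefficient. All function names are illustrative.

```python
from itertools import combinations_with_replacement

# A term (sign, shift) stands for sign * 2**shift, i.e. one hard-wired shift.
Term = tuple[int, int]


def build_coefficient_table(max_adders: int = 3, max_shift: int = 7) -> dict[int, tuple[Term, ...]]:
    """Assumed coefficient set: every integer reachable with at most
    max_adders + 1 signed power-of-two terms, mapped to a shortest decomposition."""
    terms = [(sign, s) for s in range(max_shift + 1) for sign in (1, -1)]
    table: dict[int, tuple[Term, ...]] = {0: ()}
    # Fewer terms first, so setdefault keeps the cheapest decomposition.
    for n in range(1, max_adders + 2):
        for combo in combinations_with_replacement(terms, n):
            value = sum(sign * (1 << s) for sign, s in combo)
            table.setdefault(value, combo)
    return table


def adders_needed(combo: tuple[Term, ...]) -> int:
    """n shifted terms are combined with n - 1 adders/subtractors."""
    return max(len(combo) - 1, 0)


def shift_add_multiply(x: int, combo: tuple[Term, ...]) -> int:
    """Compute coefficient * x using only shifts, additions and subtractions.
    The first term only initialises acc; a leading negative term would need
    a negation in hardware, which this sketch ignores."""
    acc = 0
    for sign, s in combo:
        shifted = x << s  # a pure wire shift in hardware, no logic needed
        acc = acc + shifted if sign > 0 else acc - shifted
    return acc


def quantize_layer(weights: list[float], table: dict[int, tuple[Term, ...]], levels: int = 127):
    """Hypothetical layer-wise quantization: one scale per layer, then each
    weight snaps to the nearest representable coefficient (ties -> smaller |c|)."""
    scale = max(abs(w) for w in weights) / levels or 1.0
    coeffs = [min(table, key=lambda c: (abs(c - w / scale), abs(c))) for w in weights]
    return scale, coeffs


if __name__ == "__main__":
    table = build_coefficient_table()
    scale, coeffs = quantize_layer([0.3, -0.1, 0.0], table)
    for c in coeffs:
        combo = table[c]
        print(c, combo, adders_needed(combo), shift_add_multiply(5, combo))
```

For the weights 0.3, -0.1 and 0.0 in this sketch, the coefficients become 127 (128 - 1, one adder), -42 (-(32 + 8 + 2), two adders) and 0. A real RCCM selects among a fixed, reconfigurable set of coefficients, so its coefficient sets would replace the brute-force table assumed here.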