Accelerating 3D Convolutional Neural Networks Using 3D Fast Fourier Transform

Abstract

Three-dimensional convolutional neural networks (3D CNNs) have attracted great attention in many complex computer vision tasks. However, their high algorithmic complexity makes 3D CNNs difficult to deploy in practical applications, creating an urgent need for dedicated accelerators. In this paper, we propose F3D, a fast algorithm for 3D CNNs based on the 3D Fast Fourier Transform (FFT), which achieves a significant algorithmic strength reduction. We then propose an F3D-based hardware architecture featuring a flexible FFT module and an efficient partial-sum aggregation module. Furthermore, we design a dataflow for efficient mapping of 3D CNNs that significantly reduces memory access. To demonstrate the efficiency of these techniques, we implement the widely used 3D CNN model C3D as our benchmark on the Xilinx VC709 platform. Experimental results show that, compared with the state-of-the-art accelerator, our work achieves a throughput of up to 864.1 GOPs, along with 1.68x and 2.00x improvements in energy efficiency and DSP efficiency, respectively.
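To make the idea of FFT-based strength reduction concrete, the following minimal sketch shows the general convolution-theorem principle that frequency-domain 3D convolution relies on. It is not the F3D algorithm itself, whose tiling and hardware mapping are not described in this abstract. The function names `conv3d_direct` and `conv3d_fft` are hypothetical, and the sketch assumes a single channel, unit stride, and no padding, using NumPy.

```python
import numpy as np

def conv3d_direct(x, w):
    """Valid-mode 3D cross-correlation (the CNN 'convolution') via a sliding window."""
    kd, kh, kw = w.shape
    out_shape = (x.shape[0] - kd + 1, x.shape[1] - kh + 1, x.shape[2] - kw + 1)
    out = np.zeros(out_shape)
    for d in range(out_shape[0]):
        for h in range(out_shape[1]):
            for c in range(out_shape[2]):
                # One multiply-accumulate per kernel element per output point.
                out[d, h, c] = np.sum(x[d:d + kd, h:h + kh, c:c + kw] * w)
    return out

def conv3d_fft(x, w):
    """Same result computed as an elementwise product in the frequency domain."""
    kd, kh, kw = w.shape
    axes = (0, 1, 2)
    # Cross-correlation equals convolution with the kernel flipped on every axis.
    w_flipped = w[::-1, ::-1, ::-1]
    # Zero-pad the kernel to the input size; the product then yields a circular convolution.
    x_freq = np.fft.rfftn(x, axes=axes)
    w_freq = np.fft.rfftn(w_flipped, s=x.shape, axes=axes)
    y = np.fft.irfftn(x_freq * w_freq, s=x.shape, axes=axes)
    # Entries at index >= k-1 along each axis are free of wrap-around and
    # coincide with the valid-mode output.
    return y[kd - 1:, kh - 1:, kw - 1:]

# Illustrative sizes only (assumed, not taken from C3D).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 10, 12))
w = rng.standard_normal((3, 3, 3))
assert np.allclose(conv3d_direct(x, w), conv3d_fft(x, w))
```

In the direct form, every output point costs one multiplication per kernel element. In the FFT form, the spatial multiplications collapse into a single elementwise product per frequency point, at the price of the forward and inverse transforms.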