A Fast Compressed Hardware Architecture for Deep Neural Networks

Hardware acceleration of deep neural networks is critical to many edge applications. The acceleration solutions available today typically target GPU, CPU, FPGA, and ASIC platforms. Single Partial Product 2-D convolution (SPP2D) is a hardware architecture for fast 2-D convolution that can be used to implement a convolutional neural network (CNN). SPP2D avoids re-fetching input weights when calculating partial weights, and it computes the output for any input size and kernel size with lower latency and higher throughput than some other popular techniques. In this paper, we use SPP2D for a full hardware implementation of the VGG-16 CNN and of a compressed version of the network that is pruned and quantized. The compressed network requires less on-chip memory, which reduces the most power-consuming task: moving data from off-chip to on-chip memory.
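To illustrate the general idea of computing a 2-D convolution without re-fetching data, here is a minimal software sketch written under our own assumptions; it is not the SPP2D hardware dataflow described in the paper. Each input pixel is read exactly once, and its partial products with every kernel weight are accumulated into the output positions that pixel contributes to. The function names `conv2d_single_fetch` and `conv2d_direct` are hypothetical, and the "convolution" here is the valid-padding, stride-1 cross-correlation commonly used in CNN layers.

```python
from typing import List

Matrix = List[List[int]]


def conv2d_single_fetch(x: Matrix, k: Matrix) -> Matrix:
    """Hypothetical input-stationary 2-D convolution (valid padding, stride 1).

    Each input pixel x[i][j] is fetched once; its product with every kernel
    weight k[r][s] is added to the output position that uses that pair.
    """
    H, W = len(x), len(x[0])
    R, S = len(k), len(k[0])
    out_h, out_w = H - R + 1, W - S + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(H):
        for j in range(W):
            v = x[i][j]  # the single fetch of this input pixel
            for r in range(R):
                for s in range(S):
                    # out[oi][oj] = sum over r, s of x[oi + r][oj + s] * k[r][s],
                    # so x[i][j] pairs with k[r][s] at output (i - r, j - s).
                    oi, oj = i - r, j - s
                    if 0 <= oi < out_h and 0 <= oj < out_w:
                        out[oi][oj] += v * k[r][s]  # accumulate one partial product
    return out


def conv2d_direct(x: Matrix, k: Matrix) -> Matrix:
    """Reference sliding-window version, which re-reads overlapping inputs."""
    R, S = len(k), len(k[0])
    out_h, out_w = len(x) - R + 1, len(x[0]) - S + 1
    return [
        [sum(x[a + r][b + s] * k[r][s] for r in range(R) for s in range(S))
         for b in range(out_w)]
        for a in range(out_h)
    ]


if __name__ == "__main__":
    x = [[4 * a + b for b in range(4)] for a in range(4)]  # 4x4 input
    k = [[1, 0], [0, -1]]                                   # 2x2 kernel
    print(conv2d_single_fetch(x, k))  # [[-5, -5, -5], [-5, -5, -5], [-5, -5, -5]]
    print(conv2d_single_fetch(x, k) == conv2d_direct(x, k))  # True
```

Both functions produce the same result; they differ only in access order. The sliding-window version reads each interior input pixel once per overlapping window (up to R x S times), while the single-fetch version reads it once and scatters its partial products. Reducing repeated reads is the kind of saving that matters in hardware, where data movement dominates power.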

Slides

A Fast Compressed Hardware Architecture for Deep Neural Networks (PDF)
