Details
- Affiliation: University of Louisiana at Lafayette
This paper proposes an energy-efficient hardware accelerator dedicated to BERT-based architectures, with reconfigurable functionality that improves circuit reusability and reduces hardware resource utilization. It presents a holistic design and implementation of a reconfigurable hardware accelerator for BERT-based deep neural network language models. The proposed design leverages Fast Fourier Transform-based multiplication on block-circulant matrices to accelerate BERT weight-matrix multiplications. A cross-platform comparative analysis shows that the proposed accelerator achieves state-of-the-art performance, outperforming both CPU and GPU baselines. The design is suitable for efficient NLP on resource-constrained platforms where low latency and high throughput are critical.
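The core idea behind FFT-based multiplication on block-circulant matrices can be illustrated in software. A circulant matrix is fully determined by its first column, and its matrix-vector product equals a circular convolution, which the FFT computes in O(n log n) instead of O(n²). The sketch below is a minimal NumPy illustration of this principle (not the paper's hardware implementation); the function names `circulant_matvec_fft` and `circulant_dense` are illustrative, not taken from the paper.

```python
import numpy as np

def circulant_matvec_fft(c, x):
    """Multiply the circulant matrix whose first column is c by vector x
    using the FFT convolution theorem: C @ x = IFFT(FFT(c) * FFT(x)).
    Runs in O(n log n) rather than the O(n^2) dense matvec."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def circulant_dense(c):
    """Reference dense circulant matrix, C[i, j] = c[(i - j) mod n],
    used here only to verify the FFT-based result."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

c = np.array([1.0, 2.0, 3.0, 4.0])   # first column of the circulant block
x = np.array([0.5, -1.0, 2.0, 0.0])  # input vector
assert np.allclose(circulant_matvec_fft(c, x), circulant_dense(c) @ x)
```

In a block-circulant weight matrix, each block is circulant and stored by its first column alone, so the same FFT trick applies block-wise, which is what makes the approach attractive for compressing and accelerating BERT's large weight matrices in hardware.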