Details
- Affiliation: University of Louisiana at Lafayette
This paper proposes an energy-efficient hardware accelerator dedicated to BERT-based architectures, with reconfigurable functionality that improves circuit reusability and reduces hardware resource utilization. It presents a holistic design and implementation of a reconfigurable hardware accelerator for BERT-based deep neural network language models. The proposed design leverages Fast Fourier Transform-based multiplication on block-circulant matrices to accelerate BERT weight-matrix multiplications. A cross-platform comparative analysis shows that the proposed accelerator achieves state-of-the-art performance, outperforming both CPU and GPU baselines. The design is suitable for efficient NLP on resource-constrained platforms where low latency and high throughput are critical.
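The core idea behind FFT-based multiplication on block-circulant matrices can be illustrated in software. A circulant matrix is fully determined by its first column, and its matrix-vector product equals a circular convolution, which the FFT computes in O(n log n) instead of O(n²). The sketch below is a minimal NumPy illustration of this principle (not the paper's hardware implementation); the function names `circulant_matvec_fft` and `circulant_dense` are illustrative, not taken from the paper.

```python
import numpy as np

def circulant_matvec_fft(c, x):
    """Multiply the circulant matrix whose first column is c by vector x
    using the FFT convolution theorem: C @ x = IFFT(FFT(c) * FFT(x)).
    Runs in O(n log n) rather than the O(n^2) dense matvec."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def circulant_dense(c):
    """Reference dense circulant matrix, C[i, j] = c[(i - j) mod n],
    used here only to verify the FFT-based result."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

c = np.array([1.0, 2.0, 3.0, 4.0])   # first column of the circulant block
x = np.array([0.5, -1.0, 2.0, 0.0])  # input vector
assert np.allclose(circulant_matvec_fft(c, x), circulant_dense(c) @ x)
```

In a block-circulant weight matrix, each block is circulant and stored by its first column alone, so the same FFT trick applies block-wise, which is what makes the approach attractive for compressing and accelerating BERT's large weight matrices in hardware.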