Video s3
    Details

    Presenter(s)
        Vasilis Sakellariou (Khalifa University)

    Author(s)
        Vasilis Sakellariou (Khalifa University)
        University of Patras
        Ioannis Kouretas (University of Patras)
        Hani Saleh (Khalifa University)
        Thanos Stouraitis (Khalifa University)
    Abstract

    The Residue Number System (RNS) has been proposed as an alternative to conventional binary representations for AI hardware accelerators. While it has been successfully applied to Convolutional Neural Networks (CNNs), its use in other network models, such as Recurrent Neural Networks (RNNs), has been hindered by the difficulty of implementing more complex activation functions, such as tanh and sigmoid, in the RNS domain. In this paper, we seek to extend its use to such models, and in particular to LSTM networks, by providing efficient RNS implementations of these activation functions. To this end, we derive improved-accuracy piecewise-linear approximations of the tanh and sigmoid functions using the minimax approach and propose a fully RNS-based hardware realization. We show that our approximations effectively mitigate the accuracy degradation that naive approximations cause in LSTM networks, while the RNS LSTM block can be up to 40% more efficient in terms of performance per unit area than a binary counterpart when used in accelerators targeting high performance.
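
    As a rough illustration of the minimax piecewise-linear idea (not code from the paper, and in floating point rather than RNS arithmetic): on any interval where tanh is convex or concave, the best linear minimax fit shares its slope with the chord, and its intercept lies midway between the chord and the parallel tangent, so the error equioscillates. A minimal sketch below, assuming hypothetical breakpoints on [0, 4] and odd symmetry for negative inputs; the paper's segmentation and fixed-point/RNS mapping may differ.

    ```python
    import numpy as np

    def minimax_linear_segment(f, dfinv, a, b):
        """Chebyshev (minimax) linear fit of a convex/concave f on [a, b].

        The optimal line has the slope of the chord through (a, f(a)) and
        (b, f(b)); its intercept averages the chord intercept and the
        intercept of the parallel tangent, so the error equioscillates at
        a, the tangent point, and b.
        """
        m = (f(b) - f(a)) / (b - a)      # chord slope
        xt = dfinv(m)                    # where the tangent is parallel to the chord
        c = 0.5 * ((f(a) - m * a) + (f(xt) - m * xt))
        return m, c

    # tanh'(x) = 1 - tanh(x)^2, so its inverse on x >= 0 is arctanh(sqrt(1 - s)).
    dtanh_inv = lambda s: np.arctanh(np.sqrt(1.0 - s))

    # Hypothetical breakpoints (illustrative only).
    breaks = [0.0, 0.5, 1.0, 2.0, 4.0]
    segments = [(a, b, *minimax_linear_segment(np.tanh, dtanh_inv, a, b))
                for a, b in zip(breaks, breaks[1:])]

    def tanh_pwl(x):
        """Piecewise-linear tanh: odd symmetry for x < 0, saturation to +/-1 for |x| >= 4."""
        x = np.asarray(x, dtype=float)
        y = np.where(np.abs(x) >= breaks[-1], 1.0, 0.0)
        for a, b, m, c in segments:
            mask = (np.abs(x) >= a) & (np.abs(x) < b)
            y = np.where(mask, m * np.abs(x) + c, y)
        return np.sign(x) * y

    xs = np.linspace(-4, 4, 2001)
    print("max |tanh - pwl| =", np.max(np.abs(np.tanh(xs) - tanh_pwl(xs))))
    ```

    The same construction applies to the sigmoid via sigmoid(x) = 0.5 * (1 + tanh(x / 2)); a sketch of the sigmoid in these terms keeps the two approximations consistent.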

    Slides
    • A High-Performance RNS LSTM Block (PDF)