    Details

    Presenter
    Soliman, Taha

    Affiliation
    Robert Bosch GmbH

    Abstract

    For optimized deployment of deep neural networks on embedded devices, hardware approximations offer acceptable trade-offs between computational resources, power consumption, and network accuracy. In this paper, we propose a novel approximation technique with a variable error range, targeting architectures that adopt bit-decomposition of the Multiply-and-Accumulate (MAC) operation, especially emerging in-memory-computing-based architectures. Through experiments with state-of-the-art neural networks for image classification on CIFAR-10 and ImageNet, we demonstrate that this approximation technique achieves a 2x speedup over the base architecture with an accuracy loss of less than 3%.
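    The core idea the abstract relies on, bit-decomposition of the MAC operation, can be illustrated with a minimal sketch. The code below is not the paper's exact scheme; it assumes unsigned 8-bit weights and shows how a dot product can be computed bit-serially over weight bit-planes, with the lowest planes optionally skipped as one simple way to trade accuracy for fewer partial-sum operations (the `bit_decomposed_mac` function and its parameters are hypothetical names for illustration).

    ```python
    import numpy as np

    def bit_decomposed_mac(weights, activations, skip_low_bits=0, bits=8):
        """Bit-serial MAC: split unsigned `bits`-bit weights into bit-planes
        and accumulate shifted partial sums. Skipping the lowest bit-planes
        (skip_low_bits > 0) approximates the result with a bounded error,
        saving one partial-sum pass per skipped plane."""
        acc = 0
        for b in range(skip_low_bits, bits):
            plane = (weights >> b) & 1            # bit-plane b of each weight (0/1)
            acc += int(plane @ activations) << b  # partial sum weighted by 2^b
        return acc

    w = np.array([200, 55, 130], dtype=np.int64)
    a = np.array([3, 7, 2], dtype=np.int64)

    exact = bit_decomposed_mac(w, a)                     # identical to int(w @ a)
    approx = bit_decomposed_mac(w, a, skip_low_bits=2)   # 6 planes instead of 8
    ```

    With all planes accumulated the reconstruction is exact; skipping the lowest k planes under-approximates the result by at most (2^k - 1) times the sum of the activations, which is one way a variable error range can be exposed as a tuning knob.
    
    
    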

    Slides
    • Adaptable Approximation Based on Bit Decomposition for Deep Neural Network Accelerators (PDF)