Details
![Cecilia Eugenia De la Parra Headshot](https://confcats-catavault.s3.amazonaws.com/CATAVault/ieeecass/master/files/styles/cc_user_photo/s3/user-pictures/12571.jpg?h=3bbcbd73&itok=curBdXDt)
- Affiliation: Robert Bosch GmbH
Efficient low-power accelerators for Convolutional Neural Networks (CNNs) benefit substantially from quantization and approximation, which are typically applied layer-wise for efficient hardware implementation. In this work, we present a novel strategy for combining these concepts at a finer granularity: per channel or per kernel. We first apply layer-wise, low bit-width, linear quantization and truncation-based approximate multipliers to the CNN computation. Then, guided by a state-of-the-art resiliency analysis, we apply a kernel-wise approximation and quantization scheme with negligible accuracy loss and without further retraining. Our proposed strategy is implemented in a specialized framework for fast design space exploration. This optimization yields estimated power savings of up to 34% in residual CNN architectures for image classification, compared to the base quantized architecture.
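The two building blocks named in the abstract can be illustrated with a minimal sketch: symmetric linear quantization to a low bit-width, and a truncation-based approximate multiplier that drops least-significant bits of the operands before multiplying. This is a simplified illustration of the general techniques, not the authors' framework; the function names and the per-kernel bit-width loop are hypothetical.

```python
import numpy as np

def quantize_linear(x, bits=8):
    """Symmetric linear quantization of a float tensor to signed integers."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    q = np.round(x / scale).astype(np.int32)
    return q, scale

def truncated_multiply(a, b, trunc_bits=2):
    """Truncation-based approximate multiplier: zero out the `trunc_bits`
    least-significant bits of each integer operand before multiplying,
    which shrinks the required multiplier width in hardware."""
    a_t = (a >> trunc_bits) << trunc_bits
    b_t = (b >> trunc_bits) << trunc_bits
    return a_t * b_t

# Kernel-wise configuration (hypothetical): each kernel of a conv layer
# gets its own bit-width/truncation setting based on how resilient it is.
rng = np.random.default_rng(0)
kernels = rng.standard_normal((4, 3, 3))          # 4 kernels, 3x3 each
activations = rng.standard_normal((3, 3))
per_kernel_bits = [8, 6, 6, 4]                    # from a resiliency analysis

q_act, s_act = quantize_linear(activations, bits=8)
for k, bits in zip(kernels, per_kernel_bits):
    q_k, s_k = quantize_linear(k, bits=bits)
    # Approximate dot product in the integer domain, then rescale.
    acc = np.sum(truncated_multiply(q_k, q_act, trunc_bits=2))
    approx = acc * s_k * s_act
    exact = np.sum(k * activations)
    print(f"{bits}-bit kernel: approx={approx:+.4f}  exact={exact:+.4f}")
```

The error introduced per kernel grows as bits shrink and truncation deepens, which is why a resiliency analysis is needed to decide which kernels tolerate aggressive settings.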