Integer-Only Approximated MFCC for Ultra-Low Power Audio NN Processing on Multi-Core MCUs

Abstract

Given recent advances in the design of efficient Deep Neural Networks (DNNs) for tiny edge devices, the feature-extraction frontend has become a computational bottleneck for audio processing on low-end MicroController Units (MCUs). To address this challenge, this work presents novel hardware-aware integer quantization schemes for the Mel-Frequency Cepstral Coefficients (MFCC) feature extractor. Our high-precision, integer-only, 32-bit approximated flow causes no accuracy degradation with respect to a full-precision implementation when feeding multiple DNN models for audio Keyword Spotting applications. In contrast, a second, low-precision 16-bit approximated MFCC algorithm is 0.6% less accurate but runs 3x faster. Additionally, by leveraging the 8-core GAP8 MCU, our solution runs 9.8x faster than the full-precision MFCC deployed on an FPU-equipped MCU. When integrated into an optimized end-to-end Keyword Spotting system, a GAP8-based smart audio device has an overall power consumption as low as 3.4 mW, achieving up to 35 days of lifetime on a single AA battery.
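To make "integer-only approximated MFCC" more concrete, the sketch below shows what a few integer stages of such a frontend could look like. It is an illustration under stated assumptions, not the quantization scheme proposed in this work. The Q15 input format, the function names `bin_power_q30`, `mel_band_energy` and `ilog2_q16`, and the piecewise-linear log2 approximation are all hypothetical choices made here. The sketch computes the power of each FFT bin from Q15 real and imaginary parts, applies Q15 mel-filter weights with a 64-bit accumulator so that no overflow can occur, and replaces the floating-point logarithm with an integer log2 that returns a Q16 fixed-point result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stage 1: power of one FFT bin.
 * re and im are assumed to be Q15; each product is Q30 and at most 2^30,
 * so the sum (at most 2^31) always fits in an unsigned 32-bit integer. */
uint32_t bin_power_q30(int16_t re, int16_t im) {
    return (uint32_t)((int32_t)re * re) + (uint32_t)((int32_t)im * im);
}

/* Hypothetical stage 2: energy of one mel band.
 * weight_q15 holds triangular-filter weights in Q15 (32768 == 1.0).
 * Each term is at most 2^31 * 2^15 = 2^46, so a uint64_t accumulator
 * cannot overflow for any realistic FFT size. The >> 15 brings the
 * result back to the Q30 scale of the input powers. */
uint64_t mel_band_energy(const uint32_t *power, const uint16_t *weight_q15,
                         size_t n) {
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (uint64_t)power[i] * weight_q15[i];
    }
    return acc >> 15;
}

/* Hypothetical stage 3: integer-only log2 approximation in Q16.
 * With k = floor(log2(x)), it uses log2(x) ~= k + (x - 2^k) / 2^k,
 * which is exact at powers of two and underestimates in between.
 * Returns INT32_MIN as a sentinel for x == 0. To get log2 of the real
 * value of a Q30 number, subtract (30 << 16) from the result. */
int32_t ilog2_q16(uint64_t x) {
    if (x == 0) {
        return INT32_MIN;
    }
    uint32_t k = 0;
    while ((x >> (k + 1)) != 0) { /* position of the most significant bit */
        k++;
    }
    uint64_t rem = x - (1ULL << k); /* 0 <= rem < 2^k */
    uint32_t frac = (k >= 16) ? (uint32_t)(rem >> (k - 16))
                              : (uint32_t)(rem << (16 - k));
    return (int32_t)((k << 16) + frac);
}
```

Usage example: with bin powers {25, 100} and weights {32768, 16384} (that is, 1.0 and 0.5 in Q15), `mel_band_energy` returns 25 + 0.5 * 100 = 75, and `ilog2_q16(12)` returns 229376, which is 3.5 in Q16 (the exact value is about 3.585). Choosing a narrower, 16-bit datapath instead would shrink the arithmetic but also the dynamic range, which is the kind of accuracy-versus-speed trade-off the abstract reports between its 32-bit and 16-bit flows.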