Details
Presenter(s)
Display Name
Prachi Kashikar
- Affiliation
Indian Institute of Technology Goa
- Country
Abstract
Computing resources such as FPGAs, microcontrollers, and microprocessors in IoT, embedded, and mobile applications have compact memory budgets. To deploy machine learning and deep learning applications on them, model sizes must be reduced. A few existing techniques do so by trading off accuracy. We propose a novel technique that reduces model sizes without compromising any accuracy, so models can be exported to the edge without any fine-tuning. Our method is implemented in the N2D2 framework as an optimization pass. The compression achieved is nearly 10% and is traded off for an execution-time overhead on microcontrollers.