A 4-Bit Integer-Only Neural Network Quantization Method Based on Shift Batch Normalization
    Details
    Presenter(s)
    Qingyu Guo, Peking University
    Author(s)
    Qingyu Guo, Peking University
    Xiaoxin Cui, Peking University
    Jian Zhang, Beijing Zhicun WITIN Technology Corporation Limited
    Aifei Zhang, Beijing Zhicun WITIN Technology Corporation Limited
    Xinjie Guo, Beijing Zhicun WITIN Technology Corporation Limited
    Yuan Wang, Peking University
    Abstract

    We propose an integer-only quantization method. Because it requires no division and no wide-integer multiplication, the method is well suited to deployment on co-designed hardware platforms. We applied 4-bit quantization to several classical networks: on MNIST, CIFAR10, and CIFAR100, the quantized networks perform even better than the original networks, and on SpeechCommands the accuracy loss from quantization is 0.16%. We also deployed the quantized networks on a flash-based in-memory-computing chip to verify the method's feasibility.
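
    The title points to shift batch normalization as the mechanism that keeps inference free of division and wide multiplication. As a rough illustration of that idea only (this is not the authors' code; the function names, the rounding choices, the eps value, and the assumption that the BN scale gamma is positive are all assumptions), the sketch below folds the batch-norm multiplier into the nearest power of two, so that normalization reduces to an integer bit shift:

```python
import numpy as np

def quantize_4bit(x, scale):
    """Uniform symmetric 4-bit quantization: round x/scale into [-8, 7]."""
    return np.clip(np.round(x / scale), -8, 7).astype(np.int32)

def fold_bn_to_shift(gamma, var, eps=1e-5):
    """Approximate the BN multiplier gamma / sqrt(var + eps) by the
    nearest power of two; return the per-channel shift amount.
    Assumes gamma > 0 so log2 is defined."""
    multiplier = gamma / np.sqrt(var + eps)
    return np.round(np.log2(multiplier)).astype(np.int32)

def shift_batch_norm(acc, mean_int, shift, beta_int):
    """Integer-only BN: center, scale by a bit shift, add the bias.
    Positive shifts multiply by 2**shift; negative shifts divide.
    Only adders and shifters are needed, no division."""
    centered = acc - mean_int
    scaled = np.where(shift >= 0,
                      centered << np.maximum(shift, 0),
                      centered >> np.maximum(-shift, 0))
    return scaled + beta_int

# Example: per-channel shift BN on int32 accumulators from a conv layer.
acc = np.random.randint(-512, 512, size=(4, 8)).astype(np.int32)  # (batch, channels)
gamma = np.random.uniform(0.5, 2.0, size=8)
var = np.random.uniform(0.5, 2.0, size=8)
shift = fold_bn_to_shift(gamma, var)
out = shift_batch_norm(acc,
                       mean_int=np.zeros(8, dtype=np.int32),
                       shift=shift,
                       beta_int=np.zeros(8, dtype=np.int32))
```

    Rounding the multiplier to a power of two trades a small scaling error for hardware that needs only adders and shifters, which is consistent with the in-memory-computing deployment described in the abstract.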

    Slides
    • A 4-Bit Integer-Only Neural Network Quantization Method Based on Shift Batch Normalization (PDF)