    Details
    Presenter: Kengo Nakata
    Affiliation: Kioxia Corporation
    Country: Japan
    Abstract

    Quantization is a typical approach to reducing the inference time of convolutional neural networks (CNNs). The key to reducing inference time without a drastic loss in accuracy is allocating an optimal bit width to each layer or filter. In this paper, we propose a regularization method using a computational-complexity metric that is correlated with the inference time of quantized CNN models. The proposed method obtains optimal bit allocations that achieve better recognition accuracy under specified computational-complexity targets. At similar recognition accuracy on the optimized ResNet-18 model, the proposed method reduces inference time by 21.0% compared with the conventional method.
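    The idea of complexity-aware regularization can be sketched as follows. This is a minimal illustrative example, not the paper's exact formulation: the complexity metric, the penalty form, the function names, and the coefficient `lam` are all assumptions. It assumes a per-layer bit width and MAC count, uses MACs-times-bits as a proxy for inference time, and penalizes allocations that exceed a specified complexity target.

    ```python
    # Hypothetical sketch of a computational-complexity-aware regularizer.
    # All names and the metric itself are illustrative assumptions, not the
    # paper's actual method.

    def complexity_metric(bit_widths, macs):
        # Proxy for inference time: each layer contributes its MAC count
        # scaled by its assigned bit width.
        return sum(m * b for m, b in zip(macs, bit_widths))

    def regularized_loss(task_loss, bit_widths, macs, target, lam=0.01):
        # Add a penalty only when the model's complexity exceeds the target,
        # steering bit allocations toward the budget during training.
        excess = max(0.0, complexity_metric(bit_widths, macs) - target)
        return task_loss + lam * excess
    ```

    Under this sketch, a two-layer model with bit widths [8, 4] and MAC counts [100, 200] has a complexity of 1600; with a target of 2000 the penalty is zero, while a tighter target of 1000 adds a penalty proportional to the 600-unit excess.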

    Slides
    • Adaptive Quantization Method for CNN with Computational-Complexity-Aware Regularization (application/pdf)