Details
Presenter(s)
![Kengo Nakata Headshot](https://confcats-catavault.s3.amazonaws.com/CATAVault/ieeecass/master/files/styles/cc_user_photo/s3/user-pictures/11981.jpg?h=933b3b01&itok=0N0pSY6X)
- Display Name: Kengo Nakata
- Affiliation: Kioxia Corporation
- Country: Japan
Abstract
Quantization is a common approach to reducing the inference time of convolutional neural networks (CNNs). The key to reducing inference time without a drastic drop in accuracy is allocating optimal bit widths to each layer or filter. In this paper, we propose a regularization method based on a computational-complexity metric that correlates with the inference time of quantized CNN models. The proposed method obtains optimal bit allocations that achieve better recognition accuracy under specified computational-complexity targets. At similar recognition accuracy on the optimized ResNet-18 model, the proposed method achieves 21.0% less inference time than the conventional method.
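The abstract does not give the exact formulation, but the general idea of regularizing training with a bit-width-dependent complexity metric can be sketched as follows. Everything here is an assumption for illustration: the metric (per-layer MACs weighted by bit width), the hinge-style penalty on the excess over a target budget, and the weight `lam` are not taken from the paper.

```python
# Illustrative sketch only, NOT the authors' exact method: a training loss
# augmented with a penalty on a complexity metric that depends on each
# layer's bit width. Metric, penalty form, and lam are assumptions.

def complexity(bit_widths, macs):
    # Hypothetical metric: each layer's MAC count scaled by its bit width,
    # so lowering a layer's bits lowers the model's complexity score.
    return sum(b * m for b, m in zip(bit_widths, macs))

def regularized_loss(task_loss, bit_widths, macs, target, lam=1e-9):
    # Penalize only the excess over the complexity target, leaving the
    # optimizer free to allocate bits anywhere under the budget.
    excess = max(0.0, complexity(bit_widths, macs) - target)
    return task_loss + lam * excess

# Usage: three layers with per-layer bit widths and fixed MAC counts.
bits = [8, 4, 2]
macs = [1.8e8, 9.0e7, 4.5e7]
loss = regularized_loss(0.52, bits, macs, target=1.5e9)
```

In an actual mixed-precision training setup the bit widths would be continuous, learnable parameters (later rounded to integers) so that the penalty term can push them down by gradient descent; this sketch only shows how the budget constraint enters the loss.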