    Details
    Presenter: Kengo Nakata
    Affiliation: Kioxia Corporation
    Country: Japan
    Abstract

    Quantization is a typical approach to reducing the inference time of convolutional neural networks (CNNs). The key to reducing inference time without a drastic loss in accuracy is allocating an optimal bit width to each layer or filter. In this paper, we propose a regularization method using a computational-complexity metric that is correlated with the inference time of quantized CNN models. The proposed method obtains optimal bit allocations that achieve better recognition accuracy under specified computational-complexity targets. At similar recognition accuracy on the optimized ResNet-18 model, the proposed method reduces inference time by 21.0% compared with the conventional method.
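    The idea of complexity-aware regularization can be sketched as follows. This is a minimal illustrative example, not the paper's exact formulation: the complexity metric, the penalty form, the function names, and the coefficient `lam` are all assumptions. It assumes a per-layer bit width and MAC count, uses MACs-times-bits as a proxy for inference time, and penalizes allocations that exceed a specified complexity target.

    ```python
    # Hypothetical sketch of a computational-complexity-aware regularizer.
    # All names and the metric itself are illustrative assumptions, not the
    # paper's actual method.

    def complexity_metric(bit_widths, macs):
        # Proxy for inference time: each layer contributes its MAC count
        # scaled by its assigned bit width.
        return sum(m * b for m, b in zip(macs, bit_widths))

    def regularized_loss(task_loss, bit_widths, macs, target, lam=0.01):
        # Add a penalty only when the model's complexity exceeds the target,
        # steering bit allocations toward the budget during training.
        excess = max(0.0, complexity_metric(bit_widths, macs) - target)
        return task_loss + lam * excess
    ```

    Under this sketch, a two-layer model with bit widths [8, 4] and MAC counts [100, 200] has a complexity of 1600; with a target of 2000 the penalty is zero, while a tighter target of 1000 adds a penalty proportional to the 600-unit excess.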

    Slides
    • Adaptive Quantization Method for CNN with Computational-Complexity-Aware Regularization (application/pdf)