    Details
    Presenter(s)
    Lizeth Gonzalez-Carabarin
    Affiliation: Eindhoven University of Technology
    Country: Netherlands
    Author(s)
    Affiliation: Eindhoven University of Technology
    Alexandre Schmid
    Affiliation: École Polytechnique Fédérale de Lausanne
    Affiliation: Eindhoven University of Technology
    Abstract

    Model compression techniques have led to reductions in the size and number of computations of deep learning models. However, techniques such as pruning mostly lack real co-optimization with hardware platforms. For instance, implementing unstructured pruning in dedicated hardware is not straightforward: it increases memory overhead and reduces effective bandwidth usage. Moreover, pruning algorithms should be adapted to hardware requirements such as tiling. Therefore, in this work, we leverage Gumbel-Softmax relaxation sampling to structurally prune tiles, which benefits subsequent hardware implementation and additionally allows joint optimization with quantization. We further show that combining different pruning scenarios leads to greater sparsity. Finally, we demonstrate the benefit of applying structured pruning to fine-grained elements (weights) in an FPGA design.
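    The abstract does not give implementation details, but the core idea of tile-level pruning via Gumbel-Softmax relaxation can be sketched as follows: a weight matrix is split into tiles, each tile gets a learnable pair of [prune, keep] logits, and a differentiable Gumbel-Softmax sample over those logits yields a keep/prune decision per tile. The function names, tile size, and logit shape below are illustrative assumptions, not the authors' actual code.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def gumbel_softmax(logits, tau=1.0):
        # Add Gumbel noise and apply a temperature-scaled softmax
        # (the continuous relaxation of sampling a discrete category).
        u = rng.uniform(1e-9, 1.0 - 1e-9, size=logits.shape)
        g = -np.log(-np.log(u))
        y = (logits + g) / tau
        e = np.exp(y - y.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def tile_mask(keep_logits, tau=0.5):
        # keep_logits: (n_tiles, 2) learnable [prune, keep] scores per tile.
        probs = gumbel_softmax(keep_logits, tau)
        # Hard (straight-through style) decision: 1 = keep tile, 0 = prune it.
        return probs.argmax(axis=-1)

    # Toy example: an 8x8 weight matrix split into four 4x4 tiles.
    W = rng.standard_normal((8, 8))
    tiles = W.reshape(2, 4, 2, 4).transpose(0, 2, 1, 3).reshape(4, 4, 4)
    logits = rng.standard_normal((4, 2))   # hypothetical learnable parameters
    mask = tile_mask(logits)
    pruned = tiles * mask[:, None, None]   # pruned tiles become all-zero blocks
    ```

    Because pruning decisions act on whole tiles rather than individual weights, the resulting zero blocks map directly onto tiled memory layouts on an FPGA, which is the hardware benefit the abstract refers to.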

    Slides
    • Structured and Tiled-Based Pruning of Deep Learning Models Targeting FPGA Implementations (application/pdf)