    Details
    Presenter(s)
    Lizeth Gonzalez-Carabarin
    Affiliation: Eindhoven University of Technology
    Country: Netherlands
    Author(s)
    Affiliation: Eindhoven University of Technology
    Alexandre Schmid
    Affiliation: École Polytechnique Fédérale de Lausanne
    Affiliation: Eindhoven University of Technology
    Abstract

    Model compression techniques have led to reductions in the size and number of computations of deep learning models. However, techniques such as pruning mostly lack real co-optimization with hardware platforms. For instance, implementing unstructured pruning in dedicated hardware is not straightforward: it increases memory overhead and reduces effective bandwidth usage. Moreover, pruning algorithms should be adapted to hardware requirements such as tiling. Therefore, in this work, we leverage Gumbel-Softmax relaxation sampling to structurally prune tiles, which benefits subsequent hardware implementation and additionally allows joint optimization with quantization. We further show that combining different pruning scenarios leads to greater sparsity. Finally, we demonstrate the benefit of applying structured pruning to fine-grained elements (weights) in an FPGA design.
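    The abstract does not give implementation details, but the core idea of tile-level pruning via Gumbel-Softmax relaxation can be sketched as follows: a weight matrix is split into tiles, each tile gets a learnable pair of [prune, keep] logits, and a differentiable Gumbel-Softmax sample over those logits yields a keep/prune decision per tile. The function names, tile size, and logit shape below are illustrative assumptions, not the authors' actual code.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def gumbel_softmax(logits, tau=1.0):
        # Add Gumbel noise and apply a temperature-scaled softmax
        # (the continuous relaxation of sampling a discrete category).
        u = rng.uniform(1e-9, 1.0 - 1e-9, size=logits.shape)
        g = -np.log(-np.log(u))
        y = (logits + g) / tau
        e = np.exp(y - y.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def tile_mask(keep_logits, tau=0.5):
        # keep_logits: (n_tiles, 2) learnable [prune, keep] scores per tile.
        probs = gumbel_softmax(keep_logits, tau)
        # Hard (straight-through style) decision: 1 = keep tile, 0 = prune it.
        return probs.argmax(axis=-1)

    # Toy example: an 8x8 weight matrix split into four 4x4 tiles.
    W = rng.standard_normal((8, 8))
    tiles = W.reshape(2, 4, 2, 4).transpose(0, 2, 1, 3).reshape(4, 4, 4)
    logits = rng.standard_normal((4, 2))   # hypothetical learnable parameters
    mask = tile_mask(logits)
    pruned = tiles * mask[:, None, None]   # pruned tiles become all-zero blocks
    ```

    Because pruning decisions act on whole tiles rather than individual weights, the resulting zero blocks map directly onto tiled memory layouts on an FPGA, which is the hardware benefit the abstract refers to.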

    Slides
    • Structured and Tiled-Based Pruning of Deep Learning Models Targeting FPGA Implementations (application/pdf)