Video s3
    Details
    Presenter(s)
    Chen-Chien Kao (National Taiwan University, Taiwan)
    Author(s)
    Chen-Chien Kao (National Taiwan University)
    Yi-Yen Hsieh (National Taiwan University)
    Chao-Hung Chen (Industrial Technology Research Institute)
    Chia-Hsiang Yang (National Taiwan University)
    Abstract

    This work presents an accelerator that implements randomized CPD (canonical polyadic decomposition) of large-scale tensors for neural network compression. A mixing method that combines the Walsh-Hadamard transform and the discrete cosine transform is proposed to replace the fast Fourier transform, yielding faster convergence. The mixing reduces the transform computation by 83% and the computation for solving the required least-squares problems by 75%. The proposed accelerator flexibly supports tensor decomposition for tensor sizes up to 512×512×9×9. Compared with prior work, this design supports larger tensors and achieves 112× lower latency under the same conditions.
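    The sketch below is a minimal, generic illustration of the family of sketched least-squares solves that such mixing transforms enable: random sign flips followed by a Walsh-Hadamard mix spread the energy of the rows so that a small uniform row sample approximately preserves the least-squares solution. This is not the paper's accelerator algorithm; the function name, the sketch size, and the use of an explicit Hadamard matrix (rather than a fast in-place WHT, or the paper's combined WHT/DCT mixing) are assumptions made only for demonstration.

    ```python
    # Generic SRHT-style sketched least squares (illustrative only; not the
    # paper's hardware algorithm). Assumptions: explicit Hadamard matrix
    # instead of a fast transform, WHT-only mixing, arbitrary sketch size.
    import numpy as np
    from scipy.linalg import hadamard

    def sketched_least_squares(A, b, sketch_size, rng=None):
        """Approximately solve min_x ||Ax - b|| via a mixing-transform sketch."""
        rng = np.random.default_rng(rng)
        m, n = A.shape
        # Pad rows to the next power of two so the Hadamard matrix is defined.
        m_pad = 1 << (m - 1).bit_length()
        A_pad = np.vstack([A, np.zeros((m_pad - m, n))])
        b_pad = np.concatenate([b, np.zeros(m_pad - m)])
        # Mixing step: random sign flips, then an orthonormal Walsh-Hadamard transform.
        signs = rng.choice([-1.0, 1.0], size=m_pad)
        H = hadamard(m_pad) / np.sqrt(m_pad)
        A_mixed = H @ (signs[:, None] * A_pad)
        b_mixed = H @ (signs * b_pad)
        # Subsample a few mixed rows; mixing spreads row energy, so a small
        # uniform sample retains the least-squares geometry with high probability.
        rows = rng.choice(m_pad, size=sketch_size, replace=False)
        x, *_ = np.linalg.lstsq(A_mixed[rows], b_mixed[rows], rcond=None)
        return x

    # Example: a 512-row overdetermined system reduced to a 64-row sketch.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((512, 16))
    x_true = rng.standard_normal(16)
    b = A @ x_true + 0.01 * rng.standard_normal(512)
    x_hat = sketched_least_squares(A, b, sketch_size=64, rng=1)
    print(np.linalg.norm(x_hat - x_true))
    ```

    The design point this illustrates is the one the abstract exploits: because the mixed rows are nearly exchangeable, the expensive dense least-squares solve can be replaced by a much smaller one, which is where the reported reductions in transform and solver computation come from.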

    Slides
    • [SHORT] Hardware Acceleration in Large-Scale Tensor Decomposition for Neural Network Compression (PDF)