Multi-Level Latent Fusion in Learning-Based Image Coding

Popular learning-based image coding approaches are built on variational autoencoders that use convolutional neural networks (CNNs) and are trained end-to-end. In these architectures, the receptive field of the latents increases with the down-sampling ratio and the kernel size of each convolution layer. This paper proposes new methods to adaptively fuse and code the latents from different layers, enabling a novel multi-level, receptive-field-based latent coding architecture that achieves better coding performance for a diverse set of content. Additionally, multi-mixture distribution based entropy modeling of the latents and encoder-side, content-adaptive latent refinement are proposed to bring further coding gains.
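To make the dependence of the receptive field on kernel size and down-sampling concrete, the following is a minimal sketch of the standard receptive-field recurrence for a stack of convolutions. The function name `receptive_fields` and the example layer configuration (four 5x5 convolutions with stride 2) are assumptions chosen for illustration; they are not taken from the paper and do not describe its actual architecture.

```python
def receptive_fields(layers):
    """Return the receptive field size (in input pixels) after each layer.

    layers: list of (kernel_size, stride) pairs, one per convolution layer.
    Padding and dilation are ignored; they do not change the field size here.
    """
    rf = 1    # a single input sample covers one pixel
    jump = 1  # distance, in input pixels, between adjacent samples
    sizes = []
    for kernel_size, stride in layers:
        # Each extra kernel tap widens the field by the current sample spacing.
        rf += (kernel_size - 1) * jump
        # Striding multiplies the spacing seen by all later layers.
        jump *= stride
        sizes.append(rf)
    return sizes


# Hypothetical analysis transform: four 5x5 convolutions, each with stride 2.
example = receptive_fields([(5, 2)] * 4)
print(example)  # [5, 13, 29, 61]
```

Under these assumed settings, the latents taken after each successive layer cover progressively larger image regions, which is the property a multi-level fusion scheme can draw on when combining latents from different depths.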
