Hierarchical Coding for Talking-Head Video

Talking-head video is widely used in video conferencing and social media, where the camera captures the movement of the user’s head and changes in facial expression. In this paper, we propose a hierarchical coding scheme for compressing talking-head video. The proposed method forms three data layers as the input to the encoder: a base layer, an enhancement layer, and a feature layer. The base layer is generated by spatially sub-sampling the source video, the enhancement layer consists of selected key frames, and the feature layer is derived from extracted facial landmarks. The layers are compressed separately and then fused at the decoder to reconstruct the video signal. To achieve high-quality reconstruction, we design a multi-feature fusion network in which the feature layer guides the fusion of the base and enhancement layers. Experimental results demonstrate that the proposed method performs well for coding talking-head video.
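As a rough illustration of how the three layers described above might be formed before encoding, the following sketch is offered under several assumptions that are not taken from the paper: frames are plain grayscale arrays (lists of rows), sub-sampling simply drops pixels without filtering, key frames are picked at a fixed interval, and `brightest_pixel` is a hypothetical stand-in for a real facial landmark detector. The names `form_layers`, `factor`, and `key_interval` are likewise illustrative only.

```python
from typing import Callable, Dict, List, Tuple

Frame = List[List[int]]             # grayscale frame as rows of pixel values
Landmarks = List[Tuple[int, int]]   # list of (x, y) points


def subsample(frame: Frame, factor: int) -> Frame:
    """Keep every `factor`-th pixel in both dimensions (no anti-alias filter)."""
    return [row[::factor] for row in frame[::factor]]


def brightest_pixel(frame: Frame) -> Landmarks:
    """Hypothetical placeholder for a facial landmark detector.

    Returns the location of the brightest pixel as a single (x, y) point.
    """
    _, x, y = max(
        (value, x, y)
        for y, row in enumerate(frame)
        for x, value in enumerate(row)
    )
    return [(x, y)]


def form_layers(
    frames: List[Frame],
    factor: int = 2,
    key_interval: int = 10,
    landmark_fn: Callable[[Frame], Landmarks] = brightest_pixel,
) -> Dict[str, object]:
    """Split a frame sequence into base, enhancement, and feature layers."""
    if factor < 1 or key_interval < 1:
        raise ValueError("factor and key_interval must be positive")

    # Base layer: every frame, spatially sub-sampled.
    base = [subsample(frame, factor) for frame in frames]

    # Enhancement layer: full-resolution key frames, indexed by frame number.
    # Fixed-interval selection is an assumption of this sketch.
    enhancement = {
        index: frame
        for index, frame in enumerate(frames)
        if index % key_interval == 0
    }

    # Feature layer: one set of landmarks per frame.
    features = [landmark_fn(frame) for frame in frames]

    return {"base": base, "enhancement": enhancement, "features": features}
```

In such a setup, each of the three returned layers would be passed to its own encoder, and the decoder-side fusion network would consume all three; neither the codecs nor the fusion network is modelled here.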
