    Details
    Presenter(s)
    Display Name
    Yu Liu
    Affiliation
    Clarkson University
    Author(s)
    Display Name
    Yu Liu
    Affiliation
    Clarkson University
    Display Name
    Shibo Li
    Affiliation
    University of Electronic Science and Technology of China
    Display Name
    Shuyuan Zhu
    Affiliation
    University of Electronic Science and Technology of China
    Display Name
    Siu-Kei Au Yeung
    Affiliation
    Hong Kong Metropolitan University
    Display Name
    Xing Wen
    Affiliation
    Kuaishou Technology
    Display Name
    Bing Zeng
    Affiliation
    University of Electronic Science and Technology of China
    Abstract

    Talking-head video is widely used in video conferencing and social media, where the camera captures the movement of the user's head and changes in facial expression. In this paper, we propose a hierarchical coding scheme for compressing talking-head video. The proposed method forms three data layers as the encoder input: a base layer, an enhancement layer, and a feature layer. Specifically, the base layer is generated by spatially sub-sampling the source video, the enhancement layer is composed of specific key frames, and the feature layer is produced from extracted facial landmarks. These layers are compressed separately and fused at the decoder side to reconstruct the video signal. To achieve high-quality reconstruction, we design a multi-feature fusion network in which the feature layer guides the fusion of the base layer and the enhancement layer. Experimental results demonstrate the good performance of the proposed method for coding talking-head video.
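
As a rough illustration of how the three layers described in the abstract might be formed before encoding, the sketch below splits a video array into a base layer, an enhancement layer, and a feature layer. It is not the authors' implementation: the sub-sampling factor, the fixed key-frame interval, the 68-point landmark layout, and the names `form_layers` and `placeholder_landmarks` are all assumptions made for this example. The abstract does not state how key frames are chosen or which landmark detector is used.

```python
import numpy as np


def placeholder_landmarks(frame):
    """Hypothetical stand-in for a facial landmark detector.

    Returns an all-zero (68, 2) array; 68 points is an assumed layout,
    not something stated in the abstract.
    """
    return np.zeros((68, 2), dtype=np.float32)


def form_layers(video, scale=2, key_interval=30, landmark_fn=placeholder_landmarks):
    """Split a (T, H, W, C) video array into three data layers.

    scale and key_interval are illustrative assumptions.
    """
    # Base layer: spatial sub-sampling by plain decimation
    # (every `scale`-th pixel in each dimension, no anti-alias filter).
    base = video[:, ::scale, ::scale]

    # Enhancement layer: full-resolution key frames, here taken at a
    # fixed interval (assumed selection rule).
    key_idx = np.arange(0, video.shape[0], key_interval)
    enhancement = video[key_idx]

    # Feature layer: one set of landmark coordinates per frame.
    features = np.stack([landmark_fn(frame) for frame in video])

    return base, key_idx, enhancement, features


# Example: 60 frames of 64x48 RGB video.
video = np.zeros((60, 64, 48, 3), dtype=np.uint8)
base, key_idx, enhancement, features = form_layers(video)
```

In the scheme the abstract describes, each layer would then be compressed separately, and a fusion network at the decoder would combine them. That fusion step is not sketched here because the abstract gives no architectural detail.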

    Slides
    • Hierarchical Coding for Talking-Head Video (application/pdf)