    Details
    Presenter(s)
    Display Name
    Quanhui Cao
    Affiliation
    Tongji University
    Author(s)
    Display Name
    Quanhui Cao
    Affiliation
    Tongji University
    Display Name
    Pengjie Tang
    Affiliation
    Jinggangshan University
    Display Name
    Hanli Wang
    Affiliation
    Tongji University
    Abstract

    Video captioning is a sequence-to-sequence task that automatically generates natural language descriptions for given videos. In this paper, we propose a novel Spatio-Temporal Super-Resolution (STSR) network that jointly trains the video captioning and video super-resolution tasks in an end-to-end fashion. Experiments on two benchmarks demonstrate that the proposed STSR significantly boosts video captioning performance and outperforms most state-of-the-art approaches.

    Slides
    • Spatio-Temporal Super-Resolution Network: Enhance Visual Representations for Video Captioning (PDF)