Spatio-Temporal Super-Resolution Network: Enhance Visual Representations for Video Captioning

Details

Presenter(s)

Quanhui Cao

Affiliation

Tongji University

Author(s)

Quanhui Cao

Affiliation

Tongji University

Pengjie Tang

Affiliation

Jinggangshan University

Hanli Wang

Affiliation

Tongji University

Abstract

Video captioning is a sequence-to-sequence task that automatically generates natural language descriptions for given videos. In this paper, we propose a novel Spatio-Temporal Super-Resolution (STSR) network that jointly trains the video captioning and video super-resolution tasks in an end-to-end fashion. Experiments on two benchmarks demonstrate that the proposed STSR significantly improves video captioning performance and outperforms most state-of-the-art approaches.

Slides

Spatio-Temporal Super-Resolution Network: Enhance Visual Representations for Video Captioning (application/pdf)
