Details
Presenter(s)
Display Name
Quanhui Cao
- Affiliation
Tongji University
- Country
Abstract
Video captioning is a sequence-to-sequence task that automatically generates natural-language descriptions of given videos. In this paper, we propose a novel Spatio-Temporal Super-Resolution (STSR) network that jointly trains the video captioning and video super-resolution tasks in an end-to-end fashion. Experiments on two benchmarks demonstrate that the proposed STSR significantly improves video captioning performance and outperforms most state-of-the-art approaches.