Authors
Yimao Xiong, Xiangling Ding, Qing Gu (Hunan University of Science and Technology)
    Abstract

    Deep video inpainting can be exploited to remove specific target objects from a video. When such inpainted videos spread on social media, they can easily foster misleading public perceptions, so it is necessary to localize the regions altered by deep video inpainting. This paper addresses the problem by enhancing the traces that inpainting leaves behind. Concretely, consecutive RGB frames and their error-level analysis (ELA) frames are first fed into the encoder in parallel to extract richer trace features of the inpainted regions, and multi-modal features are generated at different scales through channel-level feature fusion. A cascade of eight ConvGRUs is then embedded in the decoder to capture temporal anomalies between video frames. In particular, an eight-direction local attention module is introduced at the last level of the encoder; it attends to each pixel's neighborhood along eight directions and captures the inconsistency among pixels in the inpainted regions. As a result, the proposed method recovers more tampering details than state-of-the-art methods on the constructed datasets.
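The ELA input mentioned in the abstract is a standard forensic transform: re-save a frame as JPEG at a fixed quality and take the absolute difference with the original, since tampered regions tend to recompress differently. A minimal sketch with NumPy and Pillow is below; the function name and the quality setting of 90 are illustrative assumptions, as the paper's abstract does not specify them.

```python
import io

import numpy as np
from PIL import Image


def ela_frame(frame: np.ndarray, quality: int = 90) -> np.ndarray:
    """Error-level analysis of an RGB uint8 frame.

    Re-saves the frame as JPEG at the given quality (90 is an assumed
    setting) and returns the contrast-stretched absolute difference
    with the original, which tends to highlight inpainted regions.
    """
    buf = io.BytesIO()
    Image.fromarray(frame).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    recompressed = np.asarray(Image.open(buf), dtype=np.int16)
    diff = np.abs(frame.astype(np.int16) - recompressed)
    # Stretch to the full 0-255 range so faint traces become visible.
    scale = 255.0 / max(int(diff.max()), 1)
    return np.clip(diff * scale, 0, 255).astype(np.uint8)


# Example on a synthetic frame (a real pipeline would use video frames).
frame = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
ela = ela_frame(frame)
```

The ELA frame has the same shape as the input, so it can be fed to the encoder in parallel with the RGB frame as the abstract describes.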
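The eight-direction local attention module compares each pixel's feature with its neighbors along the eight compass directions. The NumPy sketch below shows only the neighborhood-gathering idea, with a mean absolute difference as a crude stand-in for the learned attention weights; the function names and zero-padding choice are assumptions, not the paper's implementation.

```python
import numpy as np

# Offsets for the eight compass directions around a pixel (dy, dx).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]


def eight_direction_neighbors(feat: np.ndarray) -> np.ndarray:
    """Return an (8, H, W, C) stack of the (H, W, C) feature map
    shifted along each of the eight directions, zero-padded at the
    border, so entry d holds each pixel's neighbor in direction d."""
    h, w, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))
    shifted = [padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
               for dy, dx in OFFSETS]
    return np.stack(shifted, axis=0)


def local_inconsistency(feat: np.ndarray) -> np.ndarray:
    """Mean absolute difference between each pixel and its eight
    neighbors: a simplified proxy for the inconsistency the
    attention module is designed to capture."""
    neighbors = eight_direction_neighbors(feat)
    return np.abs(neighbors - feat[None]).mean(axis=(0, 3))  # (H, W)


# An isolated feature spike scores higher than its smooth surroundings.
feat = np.zeros((5, 5, 2), dtype=np.float32)
feat[2, 2] = 1.0
score = local_inconsistency(feat)
```

In the actual model these differences would feed learned attention weights over encoder features rather than a raw mean.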