Details
![Yang Liu Headshot](https://confcats-catavault.s3.amazonaws.com/CATAVault/ieeecass/master/files/styles/cc_user_photo/s3/user-pictures/11991_0_0.jpg?h=8f391919&itok=DvHZO5kb)
- Affiliation: Fudan University
- Country: China
The key to video anomaly detection is understanding how normal and abnormal events differ in both appearance and motion. However, previous works either considered appearance and motion in isolation or treated them without distinction, so the model fails to exploit the unique characteristics of each. In this paper, we propose an appearance-motion united auto-encoder framework that jointly learns the prototypical spatial and temporal patterns of normal events. The method comprises a spatial auto-encoder that learns appearance normality, a temporal auto-encoder that learns motion normality, and a channel attention-based spatial-temporal decoder that fuses the spatial and temporal features. Experimental results on standard benchmarks demonstrate the effectiveness of the united normality learning: our method outperforms state-of-the-art methods, achieving AUCs of 97.4% and 73.6% on the UCSD Ped2 and ShanghaiTech datasets, respectively.
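To illustrate the kind of fusion the decoder performs, below is a minimal NumPy sketch of squeeze-and-excitation-style channel attention over concatenated spatial and temporal feature maps. This is a hypothetical illustration, not the authors' implementation: the function name, shapes, and gating details are assumptions for the sake of the example.

```python
import numpy as np

def channel_attention_fuse(spatial_feat, temporal_feat):
    """Hypothetical channel-attention fusion of spatial and temporal features.

    Sketch only (not the paper's code): concatenate the two feature maps
    along the channel axis, derive one weight per channel via global
    average pooling followed by a sigmoid gate, then reweight channels.
    Inputs: arrays of shape (C, H, W); output: (2C, H, W).
    """
    fused = np.concatenate([spatial_feat, temporal_feat], axis=0)  # (2C, H, W)
    pooled = fused.mean(axis=(1, 2))             # global average pool -> (2C,)
    weights = 1.0 / (1.0 + np.exp(-pooled))      # sigmoid gate per channel
    return fused * weights[:, None, None]        # broadcast reweighting
```

In a real model the gate would typically be a small learned bottleneck (two fully connected layers) rather than a bare sigmoid of the pooled values; the sketch keeps only the channel-reweighting idea.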