    Details
    Presenter(s)
        Savath Saypadith, Osaka University
    Author(s)
        Savath Saypadith, Osaka University
        Takao Onoye, Osaka University
    Abstract

    Video anomaly detection in unconstrained environments is challenging due to varied background scenes, illumination, and occlusions. Recent studies show that deep learning approaches can achieve remarkable performance on video anomaly detection. In this paper, we propose a joint representation learning structure for video anomaly detection. The proposed architecture extracts object appearance features and their associated motion features via separate encoders based on the ResNet network architecture. Our network is designed to combine spatial and temporal features, which share the same decoder. Through joint representation learning, the proposed architecture effectively learns both appearance and motion features to detect anomalies across diverse scene scenarios. Experiments on three benchmark datasets demonstrate detection accuracy that is remarkable with respect to existing state-of-the-art methods, achieving 96.5%, 86.9%, and 73.4% on the UCSD Pedestrian, CUHK Avenue, and ShanghaiTech datasets, respectively.
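
    The architecture sketched in the abstract pairs an appearance encoder and a motion encoder, both built from ResNet-style blocks, and fuses their features through a single shared decoder. Below is a minimal, illustrative PyTorch sketch of that two-stream idea; the layer sizes, the 2-channel optical-flow input for the motion stream, the concatenation-based fusion, and the reconstruction-error anomaly score are all assumptions for illustration, not the authors' exact design.

```python
# Illustrative sketch of a two-stream encoder / shared-decoder autoencoder,
# loosely following the abstract. Layer widths, the flow input format, the
# fusion step, and the scoring rule are assumptions, not the paper's design.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic ResNet-style block used by both encoders."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection on the skip path when shape changes.
        self.skip = (nn.Identity() if stride == 1 and in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, 1, stride, bias=False))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.skip(x))

def make_encoder(in_ch):
    # Three downsampling stages; depths are placeholders.
    return nn.Sequential(
        ResidualBlock(in_ch, 32, stride=2),
        ResidualBlock(32, 64, stride=2),
        ResidualBlock(64, 128, stride=2),
    )

class JointAnomalyAE(nn.Module):
    """Appearance encoder (RGB frame) + motion encoder (optical flow),
    fused by channel concatenation and decoded by one shared decoder."""
    def __init__(self):
        super().__init__()
        self.appearance_enc = make_encoder(3)  # RGB frame
        self.motion_enc = make_encoder(2)      # 2-channel flow field (assumed)
        self.decoder = nn.Sequential(          # shared decoder
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1),  # reconstruct the RGB frame
        )

    def forward(self, frame, flow):
        z = torch.cat([self.appearance_enc(frame),
                       self.motion_enc(flow)], dim=1)
        return self.decoder(z)

# Usage sketch: score a frame by its reconstruction error; a higher error
# suggests the frame is anomalous.
model = JointAnomalyAE().eval()
frame = torch.randn(1, 3, 128, 128)
flow = torch.randn(1, 2, 128, 128)
with torch.no_grad():
    recon = model(frame, flow)
score = torch.mean((recon - frame) ** 2).item()
```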

    Slides
    • Joint Representation Learning for Anomaly Detection in Surveillance Videos (PDF)