    Details
    Author(s)
    Yuqi Zuo, King Abdullah University of Science and Technology
    Aymen Hamrouni, King Abdullah University of Science and Technology
    Hakim Ghazzai, King Abdullah University of Science and Technology
    Yehia Massoud, King Abdullah University of Science and Technology
    Abstract

    Crowd behavior monitoring and situation assessment remain challenging problems. Two difficulties stand out. First, the interactions among individuals, and the way individual actions merge into group behavior, are complex to assess and analyze. Second, these actions must be classified, which can help identify danger and avoid undesired consequences. In this paper, we propose V3Trans-Crowd, a transformer-based crowd management and monitoring framework that captures information from video data and extracts meaningful output to categorize crowd behavior. We provide an improved hierarchical transformer for multi-modal tasks. Inspired by 3D visual transformers, our proposed 3D visual model, V3Trans-Crowd, is shown to achieve strong accuracy relative to state-of-the-art methods, all evaluated on the standard Crowd-11 dataset.