Video s3
    Details
    Presenter(s)
    Display Name
    Xiaodong Chen
    Affiliation
    University of Science and Technology of China
    Author(s)
    Display Name
    Xiaodong Chen
    Affiliation
    University of Science and Technology of China
    Display Name
    Xinchen Liu
    Affiliation
    AI Research of JD.com
    Display Name
    Wu Liu
    Affiliation
    JD AI Research
    Display Name
    Kun Liu
    Affiliation
    JD.com, Inc.
    Display Name
    Dong Wu
    Affiliation
    JD.com, Inc.
    Display Name
    Yongdong Zhang
    Affiliation
    University of Science and Technology of China
    Display Name
    Tao Mei
    Affiliation
    AI Research of JD.com
    Abstract

    Existing action recognition methods usually treat an input video as a whole and learn models from video-level labels, so they cannot capture fine-grained cues for human actions. Researchers have therefore started to focus on Part-level Action Parsing, which predicts both the video-level action and the frame-level fine-grained actions of body parts. We propose a coarse-to-fine framework for this task that first predicts the video-level action, then localizes body parts and predicts the part-level actions. To balance accuracy and computation, we recognize the part-level actions from segment-level features. Furthermore, we propose a pose-guided positional embedding method to localize body parts accurately.
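
The coarse-to-fine flow described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: every name here (`segment_features`, `pose_positional_embedding`, `classify`, `parse_video`) is hypothetical, the linear scorer stands in for learned networks, the sinusoidal encoding of a keypoint centre is only one plausible reading of "pose-guided positional embedding", and selecting the part classifier by the predicted video action is an assumption about how the coarse stage might inform the fine stage.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def segment_features(frame_feats: Sequence[Vector], segment_len: int) -> List[Vector]:
    """Average every `segment_len` consecutive frame features into one segment feature."""
    return [mean_vector(frame_feats[i:i + segment_len])
            for i in range(0, len(frame_feats), segment_len)]


def pose_positional_embedding(keypoints: Sequence[Tuple[float, float]], dim: int) -> Vector:
    """Assumed stand-in: sinusoidal code of a part's keypoint centre (dim divisible by 4)."""
    cx = sum(x for x, _ in keypoints) / len(keypoints)
    cy = sum(y for _, y in keypoints) / len(keypoints)
    emb: Vector = []
    for i in range(dim // 4):
        freq = 1.0 / (10000 ** (4 * i / dim))
        emb += [math.sin(cx * freq), math.cos(cx * freq),
                math.sin(cy * freq), math.cos(cy * freq)]
    return emb


def classify(feature: Vector, weights: Dict[str, Vector]) -> str:
    """Hypothetical linear scorer: return the label with the largest dot product."""
    return max(weights,
               key=lambda label: sum(f * w for f, w in zip(feature, weights[label])))


def parse_video(frame_feats, part_keypoints, video_weights, part_weights, segment_len):
    """Coarse stage: one video-level action. Fine stage: part actions per segment."""
    segs = segment_features(frame_feats, segment_len)
    video_action = classify(mean_vector(segs), video_weights)
    part_actions = []
    for seg, parts in zip(segs, part_keypoints):
        per_part = {}
        for part, kps in parts.items():
            pos = pose_positional_embedding(kps, len(seg))
            fused = [a + b for a, b in zip(seg, pos)]  # add position code to segment feature
            # Assumption: the part classifier is chosen by the coarse prediction.
            per_part[part] = classify(fused, part_weights[video_action])
        part_actions.append(per_part)
    return video_action, part_actions
```

Working on segment-level rather than frame-level features means the part classifier runs once per segment and body part instead of once per frame, which is the accuracy/computation trade-off the abstract refers to.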

    Slides
    • Part-Level Action Parsing via a Pose-Guided Coarse-to-Fine Framework (application/pdf)