Details
Presenter(s)
![Xiaodong Chen Headshot](https://confcats-catavault.s3.amazonaws.com/CATAVault/ieeecass/master/files/styles/cc_user_photo/s3/user-pictures/17161_1.jpg?h=2c4e73f8&itok=ev9BADQz)
Display Name
Xiaodong Chen
- Affiliation: University of Science and Technology of China
- Country
Abstract
Existing action recognition methods usually treat an input video as a whole and learn models from video-level labels, so they cannot learn fine-grained cues for human action. Researchers have therefore begun to focus on Part-level Action Parsing, which predicts both the video-level action and the frame-level fine-grained actions of body parts. We propose a coarse-to-fine framework for this task: it first predicts the video-level action, then localizes body parts and predicts the part-level actions. To balance accuracy and computation, we recognize the part-level actions from segment-level features. We also propose a pose-guided positional embedding method to localize body parts accurately.
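
The accuracy/computation trade-off behind segment-level features can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how segments are formed or pooled, so the equal-length split, the mean pooling, and the helper names `segment_bounds`, `segment_features`, and `broadcast_to_frames` are all assumptions made for illustration. The idea shown is that a classifier runs once per segment rather than once per frame, and each frame then inherits its segment's label.

```python
from typing import List, Sequence


def segment_bounds(num_frames: int, num_segments: int) -> List[range]:
    """Split frame indices into contiguous, near-equal segments (assumed scheme)."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    num_segments = min(num_segments, num_frames)
    base, extra = divmod(num_frames, num_segments)
    bounds, start = [], 0
    for s in range(num_segments):
        # The first `extra` segments get one additional frame.
        length = base + (1 if s < extra else 0)
        bounds.append(range(start, start + length))
        start += length
    return bounds


def segment_features(
    frame_feats: Sequence[Sequence[float]], num_segments: int
) -> List[List[float]]:
    """Mean-pool per-frame feature vectors inside each segment (assumed pooling)."""
    dim = len(frame_feats[0])
    pooled = []
    for seg in segment_bounds(len(frame_feats), num_segments):
        pooled.append(
            [sum(frame_feats[t][d] for t in seg) / len(seg) for d in range(dim)]
        )
    return pooled


def broadcast_to_frames(segment_labels: Sequence[str], num_frames: int) -> List[str]:
    """Give every frame the label predicted for the segment that contains it."""
    frame_labels: List[str] = []
    bounds = segment_bounds(num_frames, len(segment_labels))
    for seg, label in zip(bounds, segment_labels):
        frame_labels.extend([label] * len(seg))
    return frame_labels
```

With 5 frames and 2 segments, for example, a part-level classifier would be evaluated twice instead of five times, and `broadcast_to_frames` would map the two predictions back to frame-level labels.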