    Details
    Presenter(s)
    Yanan Song
    Affiliation
    Purdue University Northwest
    Abstract

    We propose a video-audio emotion recognition system designed to improve the classification rate. Features are extracted from audio frames using Mel-frequency cepstral coefficients (MFCCs), while features are extracted from video frames using VGG16 with weights pre-trained on the ImageNet dataset. Recurrent neural networks (RNNs) are then applied to each stream to model the sequence information. The outputs of both RNNs are fused in a concatenation layer, and the final classification result is computed by a softmax operation. The proposed system achieves 90% accuracy on the RAVDESS dataset across eight emotion classes.
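    A minimal Keras sketch of the fusion architecture described above is given below. The sequence lengths, feature dimensions, layer sizes, and the choice of LSTM as the RNN variant are assumptions for illustration; the abstract only specifies MFCC audio features, VGG16 video features, one RNN per stream, a concatenation layer, and a softmax over eight classes.

    # Sketch of the two-stream fusion model; all sizes are assumed, not
    # taken from the paper.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    NUM_CLASSES = 8                  # eight RAVDESS emotion classes
    AUDIO_STEPS, N_MFCC = 100, 40    # assumed: 100 audio frames x 40 MFCCs
    VIDEO_STEPS = 30                 # assumed: 30 video frames per clip

    # Audio branch: MFCC frame sequence -> RNN
    audio_in = layers.Input(shape=(AUDIO_STEPS, N_MFCC), name="mfcc_seq")
    audio_rnn = layers.LSTM(128, name="audio_rnn")(audio_in)

    # Video branch: frames -> frozen VGG16 features -> RNN
    vgg = tf.keras.applications.VGG16(include_top=False,
                                      weights="imagenet",
                                      pooling="avg")
    vgg.trainable = False  # ImageNet weights used as a fixed extractor
    video_in = layers.Input(shape=(VIDEO_STEPS, 224, 224, 3),
                            name="frame_seq")
    frame_feats = layers.TimeDistributed(vgg)(video_in)  # (steps, 512)
    video_rnn = layers.LSTM(128, name="video_rnn")(frame_feats)

    # Fusion: concatenate both RNN outputs, then softmax classification
    fused = layers.Concatenate()([audio_rnn, video_rnn])
    out = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

    model = models.Model([audio_in, video_in], out)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

    Freezing the VGG16 backbone and wrapping it in TimeDistributed lets the same pre-trained feature extractor run on every video frame before the per-frame features reach the RNN; whether the authors fine-tuned the backbone is not stated in the abstract.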

    Slides
    • Video-Audio Emotion Recognition Based on Feature Fusion Deep Learning Method (application/pdf)