Details
Presenter(s)
Display Name: Yanan Song
Affiliation: Purdue University Northwest
Abstract
We propose a video-audio emotion recognition system designed to improve classification accuracy. Features from audio frames are extracted using Mel-frequency cepstral coefficients (MFCCs), while features from video frames are extracted with VGG16 using weights pre-trained on the ImageNet dataset. Recurrent neural networks (RNNs) are then applied to model the temporal information in each stream. The outputs of both RNNs are fused in a concatenation layer, and the final classification is produced by a softmax operation. Our proposed system achieves 90% accuracy on the RAVDESS dataset across eight emotion classes.
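The fusion architecture described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' actual model: all layer sizes, the choice of GRU cells, the 40-coefficient MFCC dimension, and the 512-d VGG16 feature dimension are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class AVEmotionNet(nn.Module):
    """Sketch of the audio-video fusion network from the abstract.
    Hyperparameters here are illustrative assumptions."""

    def __init__(self, audio_dim=40, video_dim=512, hidden=128, n_classes=8):
        super().__init__()
        # Audio branch: RNN over per-frame MFCC vectors.
        self.audio_rnn = nn.GRU(audio_dim, hidden, batch_first=True)
        # Video branch: RNN over per-frame VGG16 features
        # (assumed pooled to 512-d; VGG16 itself is omitted here).
        self.video_rnn = nn.GRU(video_dim, hidden, batch_first=True)
        # Fusion: concatenate both final hidden states, then classify.
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, mfcc_seq, vgg_seq):
        _, h_a = self.audio_rnn(mfcc_seq)   # h_a: (1, batch, hidden)
        _, h_v = self.video_rnn(vgg_seq)    # h_v: (1, batch, hidden)
        fused = torch.cat([h_a[-1], h_v[-1]], dim=1)
        # Softmax over the eight emotion classes.
        return torch.softmax(self.classifier(fused), dim=1)

model = AVEmotionNet()
audio = torch.randn(4, 100, 40)   # batch of 4, 100 audio frames, 40 MFCCs each
video = torch.randn(4, 30, 512)   # 30 video frames, 512-d features each
probs = model(audio, video)       # (4, 8) per-class probabilities
```

In this sketch each branch's final hidden state summarizes its modality, and fusion happens only once at the concatenation layer, matching the late-fusion design the abstract describes.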