Details
Presenter(s)
Display Name: Yanan Song
Affiliation: Purdue University Northwest
Abstract
We propose a video-audio emotion recognition system designed to improve classification accuracy. Features from audio frames are extracted using Mel-frequency cepstral coefficients (MFCCs), while features from video frames are extracted with VGG16 using weights pre-trained on the ImageNet dataset. Recurrent neural networks (RNNs) are then applied to model the temporal information in each stream. The outputs of both RNNs are fused in a concatenation layer, and the final classification is produced by a softmax operation. Our proposed system achieves 90% accuracy on the RAVDESS dataset across eight emotion classes.
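The fusion architecture described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' actual model: all layer sizes, the choice of GRU cells, the 40-coefficient MFCC dimension, and the 512-d VGG16 feature dimension are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class AVEmotionNet(nn.Module):
    """Sketch of the audio-video fusion network from the abstract.
    Hyperparameters here are illustrative assumptions."""

    def __init__(self, audio_dim=40, video_dim=512, hidden=128, n_classes=8):
        super().__init__()
        # Audio branch: RNN over per-frame MFCC vectors.
        self.audio_rnn = nn.GRU(audio_dim, hidden, batch_first=True)
        # Video branch: RNN over per-frame VGG16 features
        # (assumed pooled to 512-d; VGG16 itself is omitted here).
        self.video_rnn = nn.GRU(video_dim, hidden, batch_first=True)
        # Fusion: concatenate both final hidden states, then classify.
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, mfcc_seq, vgg_seq):
        _, h_a = self.audio_rnn(mfcc_seq)   # h_a: (1, batch, hidden)
        _, h_v = self.video_rnn(vgg_seq)    # h_v: (1, batch, hidden)
        fused = torch.cat([h_a[-1], h_v[-1]], dim=1)
        # Softmax over the eight emotion classes.
        return torch.softmax(self.classifier(fused), dim=1)

model = AVEmotionNet()
audio = torch.randn(4, 100, 40)   # batch of 4, 100 audio frames, 40 MFCCs each
video = torch.randn(4, 30, 512)   # 30 video frames, 512-d features each
probs = model(audio, video)       # (4, 8) per-class probabilities
```

In this sketch each branch's final hidden state summarizes its modality, and fusion happens only once at the concatenation layer, matching the late-fusion design the abstract describes.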