Computer Science ›› 2014, Vol. 41 ›› Issue (5): 266-269.doi: 10.11896/j.issn.1002-137X.2014.05.056


Emotion Recognition Based on Unsupervised Extraction of Facial Expression Spatio-temporal Features

WANG Jin-wei,MA Xi-rong and SUN Ji-zhou   

  • Online:2018-11-14 Published:2018-11-14

Abstract: Emotion recognition is key to addressing the lack of emotional communication in intelligent tutoring systems. To extract facial-expression spatio-temporal features from video effectively for emotion recognition, a recognition method based on unsupervised feature extraction with a stacked convolutional independent subspace analysis (ISA) model was proposed to recognize the three emotions that appear most often in learning: puzzlement, delight and boredom. The method first detects and normalizes the face in each video, then uses the stacked convolutional ISA model to learn facial-expression spatio-temporal features from video blocks without supervision, and finally recognizes the emotions with a linear SVM classifier. Experimental results indicate that this method extracts spatio-temporal expression features more effectively than hand-designed features, achieves a higher recognition rate, and meets real-time requirements.
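The per-block ISA activation described in the abstract (square-root pooling of squared linear filter responses within subspaces, followed by linear SVM classification) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the filter matrix W and subspace-grouping matrix V would come from the unsupervised training stage, and here they are random stand-ins with illustrative dimensions.

```python
import numpy as np

def isa_features(block, W, V):
    """ISA activation for one flattened video block.

    block: (d,) flattened spatio-temporal patch
    W:     (k, d) learned linear filters (first ISA layer)
    V:     (m, k) binary matrix grouping the k filter units into m subspaces
    Returns (m,) non-negative pooled features.
    """
    responses = W @ block               # linear filter responses, (k,)
    return np.sqrt(V @ responses ** 2)  # square-root pooling per subspace, (m,)

rng = np.random.default_rng(0)

# Illustrative sizes: a 16x16x10 video block, 300 filters, 100 subspaces of 3 units
d, k, m = 16 * 16 * 10, 300, 100
W = rng.standard_normal((k, d)) * 0.01  # stand-in for filters learned without supervision
V = np.zeros((m, k))
for i in range(m):
    V[i, 3 * i:3 * (i + 1)] = 1.0       # each subspace pools 3 consecutive units

block = rng.standard_normal(d)
feats = isa_features(block, W, V)
print(feats.shape)  # (100,)
```

In the stacked convolutional setting, this activation would be applied convolutionally over overlapping video blocks and the outputs fed to a second ISA layer; the resulting feature vectors are then classified with a linear SVM (the paper uses LIBLINEAR [10]).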

Key words: Emotion recognition, Unsupervised learning, Independent subspace analysis, Spatio-temporal feature, Facial expression

[1] McDaniel B T,D’Mello S K,King B G,et al.Facial features for affective state detection in learning environments[C]∥Procee-dings of the 29th Annual Cognitive Science Society Conference,2007.Nashville,TX,USA,Cognitive Science Society,2007:467-472
[2] Dahmane M,Meunier J.Emotion recognition using dynamicgrid-based hog features[C]∥Proceedings of IEEE International Conference and Workshop on Automatic Face and Gesture Reco-gnition,2011.IEEE,Santa Barbara,CA,USA,2011:884-888
[3] Song Y,Morency L P,Davis R.Learning a sparse codebook of facial and body microexpressions for emotion recognition[C]∥Proceedings of the 15th ACM on International conference on multimodal interaction,2013.Sydney,Australia,ACM,2013:237-244
[4] Hayat M,Bennamoun M,El-Sallam A.Evaluation of spatiotemporal detectors and descriptors for facial expression recognition[C]∥Proceedings of IEEE 5th International Conference on Human System Interactions,2012.IEEE,Perth,West Australia,2012:43-47
[5] Schmidt E M,Kim Y E.Learning emotion-based acoustic features with deep belief networks[C]∥Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics,2011.IEEE,New Paltz,NY,USA,2011:65-68
[6] Vincent P,Larochelle H,Lajoie I,et al.Stacked denoising autoencoders:Learning useful representations in a deep network with a local denoising criterion[J].The Journal of Machine Learning Research,2010,11:3371-3408
[7] Le Q V,Zou W Y,Yeung S Y,et al.Learning hierarchical invar-iant spatio-temporal features for action recognition with independent subspace analysis[C]∥Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,2011.IEEE,Colorado Springs,CO,USA,2011:3361-3368
[8] O’Toole A J,Harms J,Snow S L,et al.A video database of moving faces and people [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2005,27(5):812-816
[9] Lucey P,Cohn J F,Kanade T,et al.The Extended Cohn-Kanade Dataset (CK+):A complete dataset for action unit and emotion-specified expression[C]∥Proceedings of IEEE Workshops on Computer Vision and Pattern Recognition,2010.IEEE,San Francisco,CA,USA,2010:94-101
[10] Fan R E,Chang K W,Hsieh C J,et al.LIBLINEAR:A library for large linear classification [J].The Journal of Machine Lear-ning Research,2008,9:1871-1874
