Speech Emotion Recognition Based on Acoustic Features

doi:10.11896/j.issn.1002-137X.2015.09.005

Abstract

Emotion recognition from speech is a challenging research area with wide applications. This paper explored one of the key aspects of building an emotion recognition system: generating a suitable feature representation. We extracted features from four angles: (1) low-level acoustic features such as intensity, F0, jitter, shimmer, and spectral contours, together with statistical functions over these features; (2) a set of features derived from segmental cepstral-based features scored against emotion-dependent Gaussian mixture models (GMMs); (3) a set of features derived from a set of low-level acoustic codewords; and (4) GMM supervectors constructed by stacking the means, covariances, or weights of the adapted mixture components for each utterance. We applied these features to emotion recognition both independently and jointly and compared their performance on this task, using a support vector machine (SVM) classifier built on these features. We tested the different features on several public emotion recognition corpora, including the IEMOCAP corpus in English, the CASIA corpus in Mandarin, and the Berlin EMO-DB corpus in German. On the IEMOCAP database, our system reached a four-class emotion recognition accuracy of 71.9%, which outperforms the previously reported best results on this dataset.
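As a rough illustration of feature types (1) and (3), the sketch below computes utterance-level statistics over a single low-level contour and a normalized histogram of nearest acoustic codewords, then concatenates them into one feature vector. The abstract does not specify which statistical functions, codebook, or distance measure the authors used, so the function names `functionals` and `codeword_histogram`, the chosen statistics, the Euclidean distance, and all numeric values are assumptions made purely for illustration.

```python
import math
from statistics import mean, pstdev


def functionals(contour):
    """Summarize one frame-level contour (e.g. F0) with utterance-level statistics.

    The choice of statistics here is an assumption, not the paper's list.
    """
    lowest, highest = min(contour), max(contour)
    return [mean(contour), pstdev(contour), lowest, highest, highest - lowest]


def codeword_histogram(frames, codebook):
    """Assign each frame to its nearest codeword and return normalized counts.

    Euclidean distance is assumed; the paper's codebook is not reproduced here.
    """
    counts = [0] * len(codebook)
    for frame in frames:
        distances = [math.dist(frame, codeword) for codeword in codebook]
        counts[distances.index(min(distances))] += 1
    return [count / len(frames) for count in counts]


# Toy, made-up values used only to exercise the functions.
f0 = [110.0, 120.0, 130.0, 120.0]
frames = [(0.0, 0.1), (0.9, 1.0), (1.0, 1.1), (0.1, 0.0)]
codebook = [(0.0, 0.0), (1.0, 1.0)]

# Joint use of feature sets, sketched here as simple concatenation.
utterance_vector = functionals(f0) + codeword_histogram(frames, codebook)
print(utterance_vector)
```

A fixed-length vector of this kind is what a classifier such as an SVM would consume, one vector per utterance. Concatenation is only one way to combine feature sets and is not necessarily the fusion scheme used in the paper.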

Key words: Speech emotion recognition, Acoustic features, Feature fusion

JIN Qin, CHEN Shi-zhe, LI Xi-rong, YANG Gang and XU Jie-ping. Speech Emotion Recognition Based on Acoustic Features[J]. Computer Science, 2015, 42(9): 24-28.

