Computer Science ›› 2015, Vol. 42 ›› Issue (9): 24-28. doi: 10.11896/j.issn.1002-137X.2015.09.005

• The 10th Joint Conference on Harmonious Human-Machine Environment •

Speech Emotion Recognition Based on Acoustic Features

JIN Qin, CHEN Shi-zhe, LI Xi-rong, YANG Gang and XU Jie-ping

  1. Key Laboratory of Data Engineering and Knowledge Engineering (Ministry of Education), Renmin University of China, Beijing 100872; School of Information, Renmin University of China, Beijing 100872
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    the Beijing Natural Science Foundation (4142029) and the Research Funds of Renmin University of China (Fundamental Research Funds for the Central Universities) (14XNLQ01)

Abstract: Emotion recognition from speech is a challenging research area with wide applications. This paper explores one of the key aspects of building an emotion recognition system: generating an effective feature representation. We extract emotion features from the speech signal from four angles: (1) low-level acoustic features related to intensity (energy), F0, voice quality (jitter, shimmer) and spectral contours, together with statistical functions over these features; (2) features derived from segmental cepstral-based features scored against emotion-dependent Gaussian mixture models (GMMs); (3) features derived from a set of low-level acoustic codewords; (4) GMM supervectors constructed by stacking the means, covariances or weights of the adapted mixture components for each utterance. We apply these features to emotion recognition both independently and jointly, using a support vector machine (SVM) classifier, and compare their performance. We also compare the features on several public emotion corpora in different languages: the IEMOCAP corpus in English, the CASIA corpus in Mandarin and the Berlin EMO-DB corpus in German. On the IEMOCAP database, the four-class emotion recognition accuracy of our system is 71.9%, which outperforms the best previously reported result on this dataset.

Key words: Speech emotion recognition, Acoustic features, Feature fusion
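
To make feature types (1) and (3) from the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small set of statistical functionals (mean, standard deviation, minimum, maximum, range), a pre-built codebook, and Euclidean nearest-codeword assignment; none of these choices is specified in the abstract. All function names and toy values are hypothetical, and joining the two feature sets by concatenation is only one simple way of using them jointly.

```python
import math
from statistics import mean, pstdev


def functionals(contour):
    """Summarise one frame-level contour (e.g. energy or F0) with
    utterance-level statistics. The choice of statistics is an assumption."""
    lo, hi = min(contour), max(contour)
    return {
        "mean": mean(contour),
        "std": pstdev(contour),
        "min": lo,
        "max": hi,
        "range": hi - lo,
    }


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to `frame` (Euclidean distance)."""
    dists = [math.dist(frame, codeword) for codeword in codebook]
    return dists.index(min(dists))


def codeword_histogram(frames, codebook):
    """Normalised histogram of codeword assignments over an utterance's frames."""
    if not frames:
        raise ValueError("an utterance needs at least one frame")
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    return [count / len(frames) for count in counts]


# Toy data: hypothetical values, not taken from the paper.
energy = [0.2, 0.4, 0.9, 0.5]                            # one low-level contour
frames = [(0.0, 0.1), (0.9, 1.0), (1.1, 0.9), (0.1, 0.0)]  # frame-level vectors
codebook = [(0.0, 0.0), (1.0, 1.0)]                      # assumed pre-trained codewords

stats = functionals(energy)
hist = codeword_histogram(frames, codebook)

# Assumed joint representation: plain concatenation of both feature sets.
utterance_vector = list(stats.values()) + hist
```

Each utterance, whatever its length, is mapped to a fixed-length vector, which is what a classifier such as an SVM requires as input.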

