Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 220800211-7. doi: 10.11896/jsjkx.220800211
崔琳1,2, 崔晨露1, 刘政伟1, 薛凯1
CUI Lin1,2, CUI Chenlu1, LIU Zhengwei1, XUE Kai1
Abstract: Traditional MFCC features not only ignore the influence of the pitch frequency in voiced speech but also fail to capture the dynamic characteristics of the signal. We therefore apply a moving-average filter to remove the pitch frequency from the voiced signal and, after extracting the static MFCC features, compute their first-order and second-order differences to obtain dynamic features. The resulting features are fed into a model for training. To build a more effective speech emotion recognition model, we construct a parallel hybrid model that incorporates a multi-head attention mechanism. Multi-head attention not only helps prevent vanishing gradients, allowing deeper networks to be built, but also lets each attention head perform a different task to improve accuracy. Finally, for emotion classification, the conventional softmax may allow the intra-class distance to grow, degrading the model's confidence, so a center loss function is introduced and combined with softmax for classification. Experimental results show that the proposed method achieves accuracies of 98.15% on the RAVDESS dataset and 96.26% on the EMO-DB dataset.
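The feature front-end described in the abstract (moving-average smoothing of the voiced signal, then static MFCCs augmented with first- and second-order differences) can be sketched as follows. This is a minimal illustration, not the paper's implementation: a random matrix stands in for the extracted static MFCCs, and the window widths `win` and `width` are assumed hyperparameters.

```python
import numpy as np

def moving_average(signal, win=5):
    """Flat-window FIR smoothing; applied to a voiced waveform this
    attenuates the periodic component at the pitch frequency."""
    kernel = np.ones(win) / win
    return np.convolve(signal, kernel, mode="same")

def deltas(features, width=2):
    """Regression-style delta coefficients along the time axis
    (features: frames x coefficients), edge-padded at the boundaries."""
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(w * w for w in range(1, width + 1))
    return np.stack([
        sum(w * (padded[t + width + w] - padded[t + width - w])
            for w in range(1, width + 1)) / denom
        for t in range(features.shape[0])
    ])

# Placeholder static MFCCs: 100 frames x 13 coefficients.
mfcc = np.random.randn(100, 13)
d1 = deltas(mfcc)                  # first-order difference (velocity)
d2 = deltas(d1)                    # second-order difference (acceleration)
combined = np.concatenate([mfcc, d1, d2], axis=1)   # 100 x 39 feature matrix
```

The stacked static + delta + delta-delta matrix is what would then be fed to the parallel hybrid network for training.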
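The joint objective mentioned at the end of the abstract (softmax cross-entropy combined with center loss, so that classes are pushed apart while intra-class distance shrinks) can be written down directly. The sketch below is a generic NumPy formulation under assumed names; `lam`, the weight on the center term, is an illustrative hyperparameter rather than the paper's setting.

```python
import numpy as np

def softmax_center_loss(logits, embeddings, labels, centers, lam=0.5):
    """Joint loss: cross-entropy separates the classes, while the
    center term penalizes each embedding's squared distance to its
    class center, limiting intra-class spread."""
    # Numerically stable log-softmax cross-entropy.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    # Center loss: mean half squared distance to the own-class center.
    diff = embeddings - centers[labels]
    center = 0.5 * (diff ** 2).sum(axis=1).mean()
    return ce + lam * center
```

When the embeddings sit exactly on their class centers the center term vanishes and only the cross-entropy remains, which is the behaviour the combined criterion is designed to encourage.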