Computer Science, 2024, Vol. 51, Issue (4): 262-269. doi: 10.11896/jsjkx.230200063
ZHANG Jiahao, ZHANG Zhaohui, YAN Qi, WANG Pengwei
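Abstract: Speech emotion recognition has important application prospects in fields such as financial anti-fraud, but further improving its accuracy has become increasingly difficult. Existing approaches, such as spectrogram-based speech emotion recognition, struggle to capture differences in speech rhythm, which limits recognition performance. Building on the differences in speech rhythm features, this paper proposes a speech emotion recognition method based on time-frequency fusion of energy frames. The key idea is to screen the spectrum over the high-energy regions of speech and to use the distribution of high-energy speech frames, together with their time-frequency variations, to represent individual differences in speech rhythm. On this basis, an emotion recognition model combining a convolutional neural network (CNN) and a recurrent neural network (RNN) is built to extract and fuse the time-domain and frequency-domain variation features of the spectrum. Experiments on the public IEMOCAP dataset show that, compared with spectrogram-based methods, the proposed rhythm-difference-based method improves weighted accuracy (WA) and unweighted accuracy (UA) by an average of 1.05% and 1.9%, respectively. The results also indicate that individual differences in speech rhythm play an important role in improving speech emotion recognition.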
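The abstract does not state the exact screening criterion or the frame parameters. The Python sketch below is therefore only a rough illustration under stated assumptions. It ranks fixed-length frames by short-time energy and keeps the most energetic fraction. It returns the indices of those frames in time order, so their distribution over the utterance stays visible. Ranking by time-domain energy is a reasonable stand-in because, by Parseval's theorem, a frame's time-domain energy equals its spectral energy up to a constant factor. The sketch also computes WA and UA as the speech emotion recognition literature commonly defines them: WA is overall accuracy, and UA is the mean of the per-class recalls. All function names and the parameters `frame_len`, `hop`, and `keep_ratio` are hypothetical and do not come from the paper. The CNN+RNN fusion model itself is not reproduced.

```python
"""Hypothetical sketch: high-energy frame screening plus WA/UA metrics.

Not the authors' implementation: the energy-ranking criterion and all
frame parameters are assumptions chosen for illustration.
"""


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing partial frame dropped)."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def short_time_energy(frame):
    """Sum of squared amplitudes of one frame."""
    return sum(x * x for x in frame)


def select_high_energy_frames(samples, frame_len=400, hop=160, keep_ratio=0.5):
    """Return indices (in time order) of the most energetic frames.

    Assumed defaults: 400/160 samples correspond to 25 ms / 10 ms at 16 kHz;
    keep_ratio is a hypothetical fraction of frames to retain.
    """
    frames = frame_signal(samples, frame_len, hop)
    if not frames:
        return []
    energies = [short_time_energy(f) for f in frames]
    k = max(1, round(len(frames) * keep_ratio))
    # Rank frames by energy, keep the top k, then restore temporal order so
    # the positions of high-energy frames (a proxy for rhythm) are preserved.
    top = sorted(range(len(frames)), key=lambda i: energies[i], reverse=True)[:k]
    return sorted(top)


def weighted_accuracy(y_true, y_pred):
    """WA: fraction of all utterances classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """UA: mean of per-class recalls, so every emotion class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy signal: 1200 quiet samples followed by 1200 loud samples.
    signal = [0.01] * 1200 + [1.0] * 1200
    print(select_high_energy_frames(signal, frame_len=400, hop=400))  # [3, 4, 5]
    y_true = ["ang", "ang", "ang", "neu"]
    y_pred = ["ang", "ang", "ang", "ang"]
    print(weighted_accuracy(y_true, y_pred), unweighted_accuracy(y_true, y_pred))  # 0.75 0.5
```

On the toy signal, only the last three frames (the loud half) survive screening. The toy labels show why UA is reported alongside WA on an imbalanced corpus such as IEMOCAP: predicting the majority class everywhere scores 0.75 WA but only 0.5 UA.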