Computer Science ›› 2024, Vol. 51 ›› Issue (4): 262-269. doi: 10.11896/jsjkx.230200063

• Computer Graphics & Multimedia •

Speech Emotion Recognition Based on Voice Rhythm Differences

ZHANG Jiahao, ZHANG Zhaohui, YAN Qi, WANG Pengwei   

  1. School of Computer Science and Technology, Donghua University, Shanghai 201620, China
  • Received: 2023-02-09  Revised: 2023-04-26  Online: 2024-04-15  Published: 2024-04-10
  • Supported by:
    Shanghai Science and Technology Innovation Action High-tech Field Project (22511100700).

Abstract: Speech emotion recognition has important application prospects in financial anti-fraud and other fields, but further improving its accuracy has become increasingly difficult. Existing spectrogram-based speech emotion recognition methods struggle to capture rhythm-difference features, which limits recognition performance. Starting from differences in speech rhythm, this paper proposes a speech emotion recognition method based on energy frames and time-frequency fusion. The key idea is to screen high-energy regions of the spectrum and to characterize individual voice rhythm differences through the distribution of high-energy speech frames and their time-frequency changes. On this basis, an emotion recognition model combining a convolutional neural network (CNN) and a recurrent neural network (RNN) is established to extract and fuse the temporal and frequency changes of the spectrum. Experiments on the public IEMOCAP dataset show that, compared with spectrogram-based methods, the proposed rhythm-difference-based method improves weighted accuracy (WA) and unweighted accuracy (UA) by 1.05% and 1.9% on average, respectively. The results also show that individual voice rhythm differences play an important role in improving speech emotion recognition.
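The abstract gives no implementation details, so the following is only an illustrative sketch of the high-energy frame screening it describes: a minimal NumPy routine that frames the waveform, computes per-frame power spectra, and keeps the frames whose energy exceeds a percentile threshold. The frame length, hop size, and threshold percentile are assumptions, not values from the paper.

```python
# Minimal sketch (not the authors' code) of high-energy frame screening.
# Assumes a mono waveform `y` at sample rate `sr`, at least one frame long.
import numpy as np

def high_energy_frames(y, sr, frame_ms=25, hop_ms=10, pct=70):
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(y) - frame_len) // hop
    win = np.hamming(frame_len)
    spec = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = y[i * hop : i * hop + frame_len] * win
        spec[i] = np.abs(np.fft.rfft(frame)) ** 2   # power spectrum of frame i
    energy = spec.sum(axis=1)                        # per-frame energy
    thr = np.percentile(energy, pct)                 # screen the high-energy region
    mask = energy >= thr
    # The retained frame indices describe how high-energy speech is distributed
    # over time (rhythm); spec[mask] carries its spectral content.
    return np.flatnonzero(mask), spec[mask]
```

Likewise, a hedged PyTorch sketch of a CNN + RNN time-frequency fusion classifier in the spirit of the abstract; all layer sizes, the GRU choice, and the concatenation-based fusion are illustrative assumptions rather than the authors' architecture.

```python
# Illustrative CNN + RNN time-frequency fusion classifier (not the paper's model).
import torch
import torch.nn as nn

class TimeFreqFusion(nn.Module):
    def __init__(self, n_bins=201, n_classes=4):
        super().__init__()
        # CNN branch: local time-frequency patterns on the (1, T, F) spectrogram.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        # RNN branch: frame-by-frame temporal dynamics of the retained frames.
        self.rnn = nn.GRU(n_bins, 64, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(32 + 128, n_classes)     # fuse the two branches

    def forward(self, spec):                          # spec: (B, T, F)
        c = self.cnn(spec.unsqueeze(1)).flatten(1)    # (B, 32)
        _, h = self.rnn(spec)                         # h: (2, B, 64)
        r = h.permute(1, 0, 2).reshape(spec.size(0), -1)  # (B, 128)
        return self.fc(torch.cat([c, r], dim=1))      # emotion class logits
```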

Key words: Speech emotion recognition, Energy frames, Spectrum, Time-frequency fusion, Voice rhythm difference

CLC Number: TP301