Computer Science, 2026, 53(1): 180-186. DOI: 10.11896/jsjkx.241200006
• Computer Graphics & Multimedia •
YAO Jia, LI Dongdong, WANG Zhe