Computer Science, 2026, Vol. 53, Issue (6A): 250600112-8. doi: 10.11896/jsjkx.250600112
• Image Processing & Multimedia Technology •
SHEN Yingchun1, FENG Xiaohan2, LI Qian3