Computer Science, 2026, Vol. 53, Issue (3): 307-320. doi: 10.11896/jsjkx.250300125
• Artificial Intelligence •
XU Cheng1,4,5, LIU Yuxuan1,5, WANG Xin2, ZHANG Cheng1,5, YAO Dengfeng1,4, YUAN Jiazheng3