Computer Science ›› 2026, Vol. 53 ›› Issue (3): 41-51. doi: 10.11896/jsjkx.250600034
陈涵1, 徐泽锋1, 蒋究1, 樊凡2,3, 章军建3, 何楚1, 王文伟1
CHEN Han1, XU Zefeng1, JIANG Jiu1, FAN Fan2,3, ZHANG Junjian3, HE Chu1, WANG Wenwei1
Abstract: Cognitive assessment scales are among the key tools for rapid screening of cognitive impairment. Traditional assessment relies on physicians' experience and judgment, making it difficult to guarantee objective and accurate diagnoses. Advances in deep network techniques and the rise of large language models have driven progress in intelligent medical diagnosis assistance, so research on automated auxiliary diagnosis for medical cognitive assessment scales is of considerable significance. Focusing on a widely used scale, the Montreal Cognitive Assessment (MoCA), this paper proposes a framework for automatic MoCA diagnosis composed of a large language model and a deep-network-based image classification model, and selects models under this framework. To strengthen the base models' ability to handle scale items, it proposes the CSWin-FLA Transformer (Cross-Shaped Window with Focused Linear Attention Transformer), which integrates linear attention, and AGPoFS (Automatic Generation of Prompts based on Fewer Samples), a few-shot automatic prompt generation method, and designs a MoCA diagnosis workflow. Since no public MoCA dataset exists, scale data provided by Zhongnan Hospital of Wuhan University were collected and organized into a dataset. Experiments were conducted on each individual method and on the overall system; the results show that the system achieves the best application performance on the proposed dataset, demonstrating the effectiveness of the improvements and of the overall system.
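To make the "focused linear attention" component of the proposed CSWin-FLA concrete, the following is a minimal numpy sketch of the focused linear attention idea (after Han et al., "Flatten Transformer"): queries and keys are passed through a focusing kernel that sharpens features with an element-wise power while preserving each token's norm, and attention is then computed in linear form. This is an illustrative single-head sketch only; the paper's actual CSWin-FLA applies it inside cross-shaped window partitions with additional components (e.g. a depthwise convolution) that are omitted here, and the function and parameter names are hypothetical.

```python
import numpy as np

def focused_kernel(x, p=3, eps=1e-6):
    # Focusing function: raise non-negative features to the power p,
    # then rescale so each token keeps its original norm.
    x = np.maximum(x, 0) + eps                     # keep features non-negative
    x_p = x ** p
    scale = np.linalg.norm(x, axis=-1, keepdims=True) / (
        np.linalg.norm(x_p, axis=-1, keepdims=True) + eps)
    return x_p * scale

def focused_linear_attention(Q, K, V, p=3):
    # Linear attention: O = phi(Q) (phi(K)^T V) / (phi(Q) sum_j phi(k_j)),
    # costing O(N d^2) instead of softmax attention's O(N^2 d).
    q, k = focused_kernel(Q, p), focused_kernel(K, p)
    kv = k.T @ V                                   # (d, d_v) key-value summary
    z = q @ k.sum(axis=0, keepdims=True).T         # (N, 1) normalizer
    return (q @ kv) / (z + 1e-6)

# Toy example: 8 tokens with 16-dimensional features
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
out = focused_linear_attention(Q, K, V)
print(out.shape)  # (8, 16)
```

Because the kernel outputs are non-negative, each output row is a convex combination of the value rows, mirroring the normalization property of softmax attention while avoiding the quadratic token-pair computation.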
CLC Number: