Computer Science ›› 2026, Vol. 53 ›› Issue (3): 41-51. doi: 10.11896/jsjkx.250600034

• Intelligent Information Systems Based on AGI Technology •

Automatic Diagnosis of Cognitive Assessment Scales Based on Large Language Models and Deep Networks

CHEN Han1, XU Zefeng1, JIANG Jiu1, FAN Fan2,3, ZHANG Junjian3, HE Chu1, WANG Wenwei1

  1 Electronic Information School, Wuhan University, Wuhan 430072, China
    2 Neurology Department, Huanggang Central Hospital, Huanggang, Hubei 438000, China
    3 Neurology Department, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
  • Received: 2025-06-06  Revised: 2025-09-16  Published online: 2026-03-12
  • Corresponding author: WANG Wenwei (wangww@whu.edu.cn)
  • About author: (2019302110373@whu.edu.cn)
  • Supported by:
    National Natural Science Foundation of China (41371342, 82571371) and National Key Research and Development Program of China (2016YFC0803000).

Large Language Model and Deep Network Based Cognitive Assessment Automatic Diagnosis

CHEN Han1, XU Zefeng1, JIANG Jiu1, FAN Fan2,3, ZHANG Junjian3, HE Chu1, WANG Wenwei1   

  1 Electronic Information School, Wuhan University, Wuhan 430072, China
    2 Neurology Department, Huanggang Central Hospital, Huanggang, Hubei 438000, China
    3 Neurology Department, Zhongnan Hospital of Wuhan University, Wuhan 430071, China
  • Received:2025-06-06 Revised:2025-09-16 Online:2026-03-12
  • About author: CHEN Han, born in 2001, postgraduate. His main research interests include large language models and image processing.
    WANG Wenwei, born in 1966, Ph.D, associate professor, master's supervisor. His main research interests include computer software and computer applications, wireless electronics, and computer hardware technology.
  • Supported by:
    National Natural Science Foundation of China(41371342,82571371) and National Key Research and Development Program of China(2016YFC0803000).

Abstract: Cognitive assessment scales are among the most important tools for rapid screening of cognitive impairment. Traditional scoring relies on physicians' experience and judgment, making it difficult to guarantee objective and accurate diagnosis results. Advances in deep network technology and the rise of large language models have driven progress in intelligent medical auxiliary diagnosis, so research on the automated auxiliary diagnosis of medical cognitive assessment scales is of considerable significance. Focusing on a widely used scale, the Montreal Cognitive Assessment (MoCA), this paper proposes an automatic MoCA diagnosis framework composed of a large language model and a deep-network-based image classification model, and selects base models under this framework. To strengthen the base models' ability to handle the scale's test items, it proposes the CSWin-FLA Transformer (Cross-Shaped Window with Focused Linear Attention Transformer) and AGPoFS (Automatic Generation of Prompts based on Fewer Samples), a few-shot prompt-generation method, and designs a MoCA diagnosis workflow. Since no public MoCA dataset exists, scale data provided by Zhongnan Hospital of Wuhan University are collected and organized into a dataset. Experiments on each individual method and on the overall system show that the system achieves the best application performance on the proposed dataset, demonstrating the effectiveness of both the improvements and the overall system.
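For readers unfamiliar with the focused linear attention that the proposed CSWin-FLA Transformer incorporates, the following is a minimal NumPy sketch of the underlying idea: a kernel feature map φ replaces softmax so attention costs grow linearly in sequence length. The `focus` function here is a simplified form of the norm-preserving power mapping from FLatten Transformer [11]; the paper's actual module (including the depth-wise convolution branch of [11]) may differ.

```python
import numpy as np

def focus(x, p=3):
    # Simplified focusing map (after [11]): sharpen feature directions
    # with an element-wise power while preserving each row's norm.
    # ReLU keeps the features non-negative, as linear attention requires.
    x = np.maximum(x, 0.0) + 1e-6
    xp = x ** p
    return xp * (np.linalg.norm(x, axis=-1, keepdims=True)
                 / np.linalg.norm(xp, axis=-1, keepdims=True))

def linear_attention(Q, K, V, phi=focus):
    # Softmax-free attention: precompute phi(K)^T V (a d x d_v matrix),
    # giving O(N d^2) cost instead of the O(N^2 d) of softmax attention.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                      # (d, d_v)
    z = Qp @ Kp.sum(axis=0)            # per-query normalizer, shape (N,)
    return (Qp @ kv) / (z[:, None] + 1e-6)
```

The implicit attention weights are still row-normalized, so each output token remains a weighted average of the value vectors.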

Key words: Cognitive assessment scale, Deep network, Image classification, Attention, Transformer, Large language model, Natural language processing

Abstract: Cognitive assessment scales are among the important tools for rapid screening of cognitive impairment. Traditional methods rely on the experience and judgment of doctors, which makes it difficult to ensure objective and accurate diagnosis results. The development of deep network technology and the rise of large language models have promoted progress in intelligent medical auxiliary diagnosis, so research on the automatic auxiliary diagnosis of medical cognitive assessment scales is of great significance. This paper focuses on a commonly used scale, the Montreal Cognitive Assessment (MoCA), proposes a framework for automatic MoCA diagnosis consisting of a large language model and a deep-network-based image classification model, and selects base models under this framework. To improve the base models' ability to evaluate the assessment questions, it proposes the CSWin-FLA Transformer (Cross-Shaped Window with Focused Linear Attention Transformer) and AGPoFS (Automatic Generation of Prompts based on Fewer Samples), and designs a MoCA diagnosis process. Since there is no public MoCA dataset, assessment data provided by Zhongnan Hospital of Wuhan University are collected to form the datasets. Experiments are carried out on each individual method and on the overall system; the best application performance is achieved on the proposed datasets, which demonstrates the effectiveness of the improvements and of the system as a whole.
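The framework described above routes each MoCA item to the model suited to it: drawing items (e.g. clock drawing) go to an image classifier, while verbal items go to a large language model. The following is a hypothetical sketch of such a two-branch scorer; all function and field names are illustrative, not the paper's actual implementation, and only the 26-point MoCA cutoff is a standard clinical convention.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class MoCAItem:
    name: str        # e.g. "clock drawing", "naming", "delayed recall"
    kind: str        # "drawing" -> image classifier, "verbal" -> LLM
    payload: Any     # scanned drawing image, or transcribed answer text
    max_score: int

def diagnose_moca(items: List[MoCAItem],
                  classify_image: Callable[[str, Any], int],
                  ask_llm: Callable[[str], int]) -> Dict[str, Any]:
    """Route each item to the matching scorer and aggregate the results."""
    scores: Dict[str, Any] = {}
    for item in items:
        if item.kind == "drawing":
            # e.g. a vision classifier fine-tuned per drawing item
            scores[item.name] = classify_image(item.name, item.payload)
        else:
            # e.g. an LLM given automatically generated few-shot prompts
            prompt = (f"Score this MoCA '{item.name}' answer from 0 to "
                      f"{item.max_score}: {item.payload}")
            scores[item.name] = ask_llm(prompt)
    total = sum(scores.values())
    scores["total"] = total
    scores["impairment_suspected"] = total < 26   # standard MoCA cutoff
    return scores
```

In this sketch the per-item scorers are injected as callables, so the same routing logic works whether the branches are stub functions in a test or trained models in deployment.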

Key words: Cognitive assessment, Deep network, Image classification, Attention, Transformer, Large language model, Natural language processing

CLC number: TP391
[1]REN R J,YIN P,WANG Z H,et al.China Alzheimer's disease report 2021[J].Journal of Diagnostics Concepts & Practice,2021,20:317-337.
[2]MANGIALASCHE F,SOLOMON A,WINBLAD B,et al.Alzheimer’s disease:clinical trials and drug development[J].The Lancet Neurology,2010,9(7):702-716.
[3]ALBERT M S,DEKOSKY S T,DICKSON D,et al.The diagnosis of mild cognitive impairment due to Alzheimer's disease:recommendations from the National Institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease[J].Focus,2013,11(1):96-106.
[4]CHEN X C,GUO Q H.Expert consensus on neuropsychological assessment for mild cognitive impairment(2025 edition)[J].National Medical Journal of China,2025,105(3):204-218.
[5]NI X S,WU F,SONG J,et al.Chinese expert consensus on assessment of cognitive impairment in the elderly(2022)[J].Chinese Journal of Geriatrics,2022,41(12):1430-1440.
[6]WEI J,TAY Y,BOMMASANI R,et al.Emergent abilities of large language models[J].arXiv:2206.07682,2022.
[7]SINDI S,CALOV E,FOKKENS J,et al.The CAIDE dementia risk score app:the development of an evidence-based mobile application to predict the risk of dementia[J].Alzheimer’s & Dementia:Diagnosis,Assessment & Disease Monitoring,2015,1(3):328-333.
[8]JASON B,CAMPBELL S,BURRELL L E,et al.Internet-based screening for dementia risk[J].PLoS One,2013,8(2):e57476.
[9]DONG X,BAO J,CHEN D,et al.CSWin transformer:A general vision transformer backbone with cross-shaped windows[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:12124-12134.
[10]TOUVRON H,MARTIN L,STONE K,et al.Llama 2:Open foundation and fine-tuned chat models[J].arXiv:2307.09288,2023.
[11]HAN D C,PAN X R,HAN Y Z,et al.Flatten transformer:Vision transformer using focused linear attention[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2023:5961-5971.
[12]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.2017:6000-6010.
[13]DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16x16 words:Transformers for image recognition at scale[C]//International Conference on Learning Representations.2021.
[14]GRAHAM B,EL-NOUBY A,TOUVRON H,et al.Levit:a vision transformer in convnet’s clothing for faster inference[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:12259-12269.
[15]WU H P,XIAO B,CODELLA N,et al.Cvt:Introducing convolutions to vision transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:22-31.
[16]TOUVRON H,CORD M,DOUZE M,et al.Training data-efficient image transformers & distillation through attention[C]//International Conference on Machine Learning.PMLR,2021:10347-10357.
[17]GULATI A,QIN J,CHIU C C,et al.Conformer:Convolution-augmented transformer for speech recognition[J].arXiv:2005.08100,2020.
[18]YUAN L,CHEN Y P,WANG T,et al.Tokens-to-token vit:Training vision transformers from scratch on imagenet[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:558-567.
[19]LIU Z,LIN Y T,CAO Y,et al.Swin transformer:Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision(ICCV).2021:10012-10022.
[20]WU G J,ZHENG W S,LU Y T,et al.Pslt:a light-weight vision transformer with ladder self-attention and progressive shift[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2023,45(9):11120-11135.
[21]DEVLIN J,CHANG M W,LEE K,et al.Bert:Pre-training of deep bidirectional transformers for language understanding[C]//Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.ACL,2019.
[22]RADFORD A,NARASIMHAN K,SALIMANS T,et al.Improving language understanding by generative pre-training[M].OpenAI,2018.
[23]RADFORD A,WU J,CHILD R,et al.Language models are unsupervised multitask learners[J].OpenAI Blog,2019,1(8):9.
[24]BROWN T B,MANN B,RYDER N,et al.Language models are few-shot learners[J].Advances in Neural Information Processing Systems,2020,33:1877-1901.
[25]XU Y M,HU L,ZHAO J Y,et al.Technology application prospect and risk challenge of large language model[J].Journal of Computer Applications,2024,44(6):1655-1662.
[26]KNOX W B,STONE P.Augmenting reinforcement learning with human feedback[C]//ICML 2011 Workshop on New Developments in Imitation Learning.2011.
[27]OUYANG L,WU J,JIANG X,et al.Training language models to follow instructions with human feedback[J].Advances in Neural Information Processing Systems,2022,35:27730-27744.
[28]ACHIAM J,ADLER S,AGARWAL S,et al.Gpt-4 technical report[EB/OL].https://openai.com/index/gpt-4-research.
[29]THOPPILAN R,FREITAS D D,HALL J,et al.Lamda:Language models for dialog applications[J].arXiv:2201.08239,2022.
[30]CHOWDHERY A,NARANG S,DEVLIN J,et al.Palm:Scaling language modeling with pathways[J].Journal of Machine Learning Research,2023,24(240):1-113.
[31]TOUVRON H,LAVRIL T,IZACARD G,et al.Llama:Open and efficient foundation language models[J].arXiv:2302.13971,2023.
[32]GUO D Y,YANG D J,ZHANG H W,et al.DeepSeek-R1:Incentivizing Reasoning Capability in LLMs via Reinforcement Learning[J].arXiv:2501.12948,2025.
[33]FATIMA M,PASHA M.Survey of machine learning algorithms for disease diagnostic[J].Journal of Intelligent Learning Systems and Applications,2017,9(1):1-16.
[34]LIU Y,JAIN A,ENG C,et al.A deep learning system for differential diagnosis of skin diseases[J].Nature Medicine,2020,26(6):900-908.
[35]WANG G Y,YANG G X,DU Z X,et al.ClinicalGPT:Large language models finetuned with diverse medical data and comprehensive evaluation[J].arXiv:2306.09968,2023.
[36]ZHANG H B,CHEN J Y,JIANG F,et al.HuatuoGPT,towards taming language model to be a doctor[J].arXiv:2305.15075,2023.
[37]HU E J,SHEN Y L,WALLIS P,et al.Lora:Low-rank adaptation of large language models[C]//Proceedings of the International Conference on Learning Representations(ICLR).2022.
[38]LIU Z L,LI Y W,SHU P,et al.Radiology-llama2:Best-in-class large language model for radiology[J].arXiv:2309.06419,2023.
[39]SINGHAL K,AZIZI S,TU T,et al.Large language models encode clinical knowledge[J].Nature,2023,620(7972):172-180.
[40]ZHOU S,LIN M Q,DING S R,et al.Interpretable differential diagnosis with dual-inference large language models[J].arXiv:2407.07330,2024.
[41]WEI J,WANG X Z,SCHUURMANS D,et al.Chain-of-thought prompting elicits reasoning in large language models[J].Advances in Neural Information Processing Systems,2022,35:24824-24837.
[42]WADA A,AKASHI T,SHIH G,et al.Optimizing gpt-4 turbo diagnostic accuracy in neuroradiology through prompt engineering and confidence threshold[J].Diagnostics,2024,14(14):1541.
[43]KRESEVIC S,GIUFFRE M,AJCEVIC M,et al.Optimization of hepatological clinical guidelines interpretation by large language models:a retrieval augmented generation-based framework[J].NPJ Digital Medicine,2024,7(1):102.
[44]WANG W H,XIE E Z,LI X,et al.Pyramid vision transformer:A versatile backbone for dense prediction without convolutions[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:568-578.
[45]ZHANG H Y,CISSE M,DAUPHIN Y N,et al.mixup:Beyond empirical risk minimization[J].arXiv:1710.09412,2017.
[46]PAN S J,TSANG I W,KWOK J T,et al.Domain adaptation via transfer component analysis[J].IEEE Transactions on Neural Networks,2010,22(2):199-210.
[47]TAN M X,LE Q V.Efficientnetv2:Smaller models and faster training[C]//International Conference on Machine Learning.PMLR,2021:10096-10106.
[48]WOO S,DEBNATH S,HU R H,et al.Convnext v2:Co-designing and scaling convnets with masked autoencoders[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:16133-16142.
[49]LIU Z,HU H,LIN Y T,et al.Swin Transformer V2:Scaling Up Capacity and Resolution[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:11999-12009.
[50]PRESS O,ZHANG M R,MIN S,et al.Measuring and Narrowing the Compositionality Gap in Language Models[J].arXiv:2210.03350,2022.
[51]ZHOU D,SCHAERLI N,HOU L,et al.Least-to-Most Prompting Enables Complex Reasoning in Large Language Models[C]//Proceedings of the International Conference on Learning Representations(ICLR).2023.