Computer Science ›› 2025, Vol. 52 ›› Issue (9): 294-302. doi: 10.11896/jsjkx.241000114

• Artificial Intelligence •

Clinical Note Summarization Method Based on Collaboration of Large and Small Models with an Iterative Reflection Framework

ZHONG Boyang, RUAN Tong, ZHANG Weiyan, LIU Jingping

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2024-10-21 Revised: 2025-01-24 Online: 2025-09-15 Published: 2025-09-11
  • Corresponding author: LIU Jingping (jingpingliu@ecust.edu.cn)
  • About author: (a1561418501@163.com)

Collaboration of Large and Small Language Models with Iterative Reflection Framework for Clinical Note Summarization

ZHONG Boyang, RUAN Tong, ZHANG Weiyan, LIU Jingping   

  1. School of Information Science and Engineering,East China University of Science and Technology,Shanghai 200237,China
  • Received:2024-10-21 Revised:2025-01-24 Online:2025-09-15 Published:2025-09-11
  • About author:ZHONG Boyang,born in 2000,postgraduate.His main research interests include natural language processing and vertical domain large language model.
    LIU Jingping,born in 1991,lecturer,master supervisor.His main research interests include natural language processing and vertical domain large language model.

Abstract: Automatically generating electronic medical records (EMRs) from doctor-patient dialogues is a core task in medical artificial intelligence. Mainstream methods rely on large language models (LLMs) with few-shot demonstrations, but they often fail to incorporate deep medical expertise, so the generated EMRs fall short in professionalism. To address this challenge, a novel iterative reflection framework is proposed that combines Error2Correct example learning with domain-model supervision to improve EMR summarization quality. Specifically, an LLM integrating the Error2Correct example-learning mechanism is first designed for initial EMR generation and continuous refinement, injecting medical domain knowledge in the pre-generation stage. A fine-tuned small medical pre-trained language model then further evaluates and refines the initially generated EMR, deepening the integration of domain knowledge in the post-generation stage. Finally, an iterative scheduler is introduced to efficiently guide the model through continuous reflection and refinement. Experimental results show that the proposed method achieves state-of-the-art performance on two public EMR datasets. In particular, on IMCS-V2-MRG and ACI-BENCH, it improves overall performance by 3.66 and 7.75 percentage points, respectively, over fine-tuned large language models.

Keywords: Large language model, Medical pre-trained model, Summarization generation, Large model reflection, Collaboration of large and small models

Abstract: Generating clinical notes from doctor-patient dialogues is a critical task in medical artificial intelligence. Existing methods typically rely on large language models (LLMs) with few-shot demonstrations but often struggle to integrate sufficient domain-specific knowledge, leading to suboptimal and less professional outputs. To address this problem, a novel iterative reflection framework is proposed, which integrates Error2Correct example learning and domain-model supervision, aiming to improve the summary quality of EMRs. Specifically, a large-scale language model integrating the Error2Correct example-learning mechanism is designed for the initial generation and continuous optimization of EMRs, and medical domain knowledge is integrated in the pre-generation stage. Then, a lightweight medical pre-trained language model, fine-tuned with domain data, evaluates and refines the generated content, integrating domain knowledge in the post-generation stage. Finally, an iterative scheduler is introduced, which effectively guides the model to optimize through continuous reflection and improvement. Experimental results on two public datasets demonstrate that the proposed method achieves state-of-the-art performance. Compared with fine-tuned large language models, it improves overall performance by 3.66 and 7.75 percentage points on the IMCS-V2-MRG and ACI-BENCH datasets, respectively.

Key words: Large language model, Medical pre-trained model, Summarization generation, Large model reflection, Collaboration of large and small models

CLC Number:

  • TP391
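The generate-evaluate-reflect loop described in the abstract can be sketched as follows. This is a minimal illustration only: `generate_note`, `score_note`, and `reflect` are hypothetical stand-ins for the LLM with Error2Correct demonstrations, the fine-tuned small medical evaluator, and its feedback, not the paper's actual models.

```python
def generate_note(dialogue, feedback=None):
    """Stand-in for the LLM with Error2Correct demonstrations.

    Produces an initial note, or a revised one when evaluator
    feedback from a previous round is supplied.
    """
    note = f"summary of: {dialogue}"
    if feedback:
        note += f" [revised per: {feedback}]"
    return note


def score_note(note):
    """Stand-in for the fine-tuned small medical model's quality score."""
    return 0.9 if "revised" in note else 0.5


def reflect(note):
    """Stand-in for the feedback the evaluator returns on a weak note."""
    return "add missing diagnosis details"


def iterative_scheduler(dialogue, threshold=0.8, max_rounds=3):
    """Generate, evaluate, and refine until the score passes the
    threshold or the round budget is exhausted."""
    feedback = None
    note = None
    for _ in range(max_rounds):
        note = generate_note(dialogue, feedback)
        if score_note(note) >= threshold:
            return note
        feedback = reflect(note)
    return note


result = iterative_scheduler("patient reports cough for 3 days")
```

The `max_rounds` cap mirrors the scheduler's role in the paper: it bounds the cost of reflection while still allowing several refinement passes when the evaluator's score stays below the acceptance threshold.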