Computer Science ›› 2025, Vol. 52 ›› Issue (9): 294-302.doi: 10.11896/jsjkx.241000114

• Artificial Intelligence •

Collaboration of Large and Small Language Models with Iterative Reflection Framework for Clinical Note Summarization

ZHONG Boyang, RUAN Tong, ZHANG Weiyan, LIU Jingping   

  1. School of Information Science and Engineering,East China University of Science and Technology,Shanghai 200237,China
  • Received:2024-10-21 Revised:2025-01-24 Online:2025-09-15 Published:2025-09-11
  • About author:ZHONG Boyang,born in 2000,postgraduate.His main research interests include natural language processing and vertical-domain large language models.
    LIU Jingping,born in 1991,lecturer,master supervisor.His main research interests include natural language processing and vertical-domain large language models.

Abstract: Generating clinical notes from doctor-patient dialogues is a critical task in medical artificial intelligence.Existing methods typically rely on large language models(LLMs) with few-shot demonstrations but often struggle to integrate sufficient domain-specific knowledge,leading to suboptimal and less professional outputs.To address this problem,a novel iterative reflection framework is proposed,which integrates Error2Correct example learning and domain-model supervision to improve the quality of electronic medical record(EMR) summaries.Specifically,a large-scale language model integrating the Error2Correct example learning mechanism is designed for the initial generation and continuous optimization of EMRs,injecting medical domain knowledge in the pre-generation stage.Then,a lightweight medical pre-trained language model,fine-tuned with domain data,is used to evaluate the refined content,integrating domain knowledge in the post-generation stage.Finally,an iterative scheduler is introduced,which effectively guides the model through successive rounds of reflection and refinement.Experimental results on two public datasets demonstrate that the proposed method achieves state-of-the-art performance.Compared with fine-tuned large language models,it improves overall performance by 3.68% and 7.75% on the IMCS-V2-MRG and ACI-BENCH datasets,respectively.
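To make the workflow concrete, the following Python sketch shows one plausible shape of the generate-evaluate-refine loop described in the abstract. It is illustrative only: the callables generate and evaluate, the greedy acceptance rule, and the stopping threshold are assumptions of this sketch, not the authors' implementation.

    from typing import Callable, List, Optional, Tuple

    Demo = Tuple[str, str]  # an (erroneous summary, corrected summary) pair

    def iterative_reflection(
        dialogue: str,
        generate: Callable[[str, List[Demo], Optional[str]], str],  # LLM-backed generator
        evaluate: Callable[[str, str], float],                      # small medical PLM scorer
        demos: List[Demo],
        max_rounds: int = 3,
        threshold: float = 0.9,
    ) -> str:
        # Pre-generation: Error2Correct demonstrations carry domain
        # knowledge into the LLM prompt before the first draft is produced.
        note = generate(dialogue, demos, None)
        # Post-generation: a fine-tuned lightweight medical PLM scores the draft.
        score = evaluate(dialogue, note)
        for _ in range(max_rounds):
            if score >= threshold:  # scheduler: stop once quality suffices
                break
            # Reflection: the previous draft is fed back so the LLM can revise it.
            candidate = generate(dialogue, demos, note)
            cand_score = evaluate(dialogue, candidate)
            if cand_score > score:  # keep only revisions the evaluator prefers
                note, score = candidate, cand_score
        return note

Under this reading, the iteration budget and acceptance threshold are what the iterative scheduler controls, while the two callables correspond to the pre-generation (LLM with Error2Correct examples) and post-generation (domain-model supervision) stages.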

Key words: Large language model, Medical pre-trained model, Summarization generation, Large model reflection, Collaboration of large and small models

CLC Number: TP391
[1]LIU Z J,WANG X L,CHEN Q C,et al.Temporal indexing of medical entity in Chinese clinical notes[J].BMC Medical Informatics and Decision Making,2019,19:1-11.
[2]YU H Y,ZUO X L,TANG J T,et al.Identifying causal effects of the clinical sentiment of patients’ nursing notes on anticipated fall risk stratification[J].Information Processing & Management,2023,60(6):103481.
[3]LU X T,SUN L P,LING C,et al.Named Entity Recognition of Chinese Electronic Health Records Incorporating Phonetic and Part-of-speech Features[J].Journal of Chinese Computer Systems,2025,46(2):330-338.
[4]LIU S S,NIE W J,GAO D F,et al.Clinical quantitative information recognition and entity-quantity association from Chinese electronic medical records[J].International Journal of Machine Learning and Cybernetics,2021,12:117-130.
[5]LEWIS M.Bart:Denoising sequence-to-sequence pre-training for natural language generation,translation,and comprehension[J].arXiv:1910.13461,2019.
[6]RAFFEL C,SHAZEER N,ROBERTS A,et al.Exploring the limits of transfer learning with a unified text-to-text transformer[J].Journal of Machine Learning Research,2020,21(140):1-67.
[7]NI H Q,LIU D,SHI M Y.Semantic-aware Chinese Short Text Summarization Model[J].Computer Science,2020,47(6):74-78.
[8]XI T J,DUAN Z T,CAO J R,et al.Hybrid Summarization Method for Legal-Related Long Texts in Public Opinion Information[J].Journal of Chinese Information Processing,2024,38(7):63-72.
[9]ZHANG L,NEGRINHO R,GHOSH A,et al.Leveraging pretrained models for automatic summarization of doctor-patient conversations[J].arXiv:2109.12174,2021.
[10]KRISHNA K,KHOSLA S,BIGHAM J P,et al.Generating SOAP notes from doctor-patient conversations using modular summarization techniques[J].arXiv:2005.01795,2020.
[11]JOSHI A,KATARIYA N,AMATRIAIN X,et al.Dr.summarize:Global summarization of medical dialogue by exploiting local structures[J].arXiv:2009.08666,2020.
[12]MICHALOPOULOS G,WILLIAMS K,SINGH G,et al.MedicalSum:A guided clinical abstractive summarization model for generating medical reports from patient-doctor conversations[C]//Findings of the Association for Computational Linguistics:EMNLP 2022.2022:4741-4749.
[13]LU G L,JU X L,CHEN X,et al.GRACE:Empowering LLM-based software vulnerability detection with graph structure and in-context learning[J].Journal of Systems and Software,2024,212:112031.
[14]WANG L F,ZHAO M,JI H R,et al.Dialogue summarization enhanced response generation for multi-domain task-oriented dialogue systems[J].Information Processing & Management,2024,61(3):103668.
[15]DU Z X,QIAN Y J,LIU X,et al.Glm:General language model pretraining with autoregressive blank infilling[J].arXiv:2103.10360,2021.
[16]GIORGI J,TOMA A,XIE R,et al.Clinical note generation from doctor-patient conversations using large language models:Insights from mediqa-chat[J].arXiv:2305.02220,2023.
[17]ZHOU W,WANG Z Y,WEI B.Generative Automatic Summarization Model for Legal Judgments[J].Computer Science,2021,48(12):331-336.
[18]KONG Y L,WANG Z Q,WANG H L.Research on Comment Summarization Combined with Evaluation Object Information[J/OL].Computer Science,1-8[2024-10-16].http://kns.cnki.net/kcms/detail/50.1075.TP.20241012.0929.010.html.
[19]GAO Y J,MILLER T,XU D F,et al.Summarizing patients’ problems from hospital progress notes using pre-trained sequence-to-sequence models[C]//Proceedings of the 29th International Conference on Computational Linguistics(COLING).2022:2979.
[20]ENARVI S,AMOIA M,TEBA M D A,et al.Generating medical reports from patient-doctor conversations using sequence-to-sequence models[C]//Proceedings of the First Workshop on Natural Language Processing for Medical Conversations.2020:22-30.
[21]SONG Y,TIAN Y H,WANG N,et al.Summarizing medical conversations via identifying important utterances[C]//Proceedings of the 28th International Conference on Computational Linguistics.2020:717-729.
[22]MICHALOPOULOS G,WILLIAMS K,SINGH G,et al.MedicalSum:A guided clinical abstractive summarization model for generating medical reports from patient-doctor conversations[C]//Findings of the Association for Computational Linguistics:EMNLP 2022.2022:4741-4749.
[23]CAI P S,LIU F,BAJRACHARYA A,et al.Generation of patient after-visit summaries to support physicians[C]//Proceedings of the 29th International Conference on Computational Linguistics(COLING).2022:6234-6247.
[24]WU R S,WANG H L,WANG Z Q,et al.Short Text Summarization Method Based on Global Self-matching Mechanism[J].Journal of Software,2019,30(9):2705-2717.
[25]HUANG Y X,YU Z T,GUO J J,et al.Case Topic Summarization Based on Topic Interaction Graph[J].Journal of Software,2023,34(4):1796-1810.
[26]KRISHNA K,KHOSLA S,BIGHAM J P,et al.Generating SOAP notes from doctor-patient conversations using modular summarization techniques[J].arXiv:2005.01795,2020.
[27]TANG X R,TRAN A,TAN J,et al.Gersteinlab at mediqa-chat 2023:Clinical note summarization from doctor-patient conversations through fine-tuning and in-context learning[J].arXiv:2305.05001,2023.
[28]LONGPRE S,HOU L,VU T,et al.The flan collection:Designing data and methods for effective instruction tuning[C]//International Conference on Machine Learning.PMLR,2023:22631-22648.
[29]NAIR V,SCHUMACHER E,KANNAN A.Generating medically-accurate summaries of patient-provider dialogue:A multi-stage approach using large language models[J].arXiv:2305.05982,2023.
[30]VAN VEEN D,VAN UDEN C,BLANKEMEIER L,et al.Clinical text summarization:adapting large language models can outperform human experts[J].Research Square,2023,30(4):1134-1142.
[31]DETTMERS T,PAGNONI A,HOLTZMAN A,et al.QLoRA:Efficient finetuning of quantized LLMs[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems.Red Hook,NY:Curran Associates Inc.,2023:10088-10115.
[32]LYU X,MIN S,BELTAGY I,et al.Z-icl:Zero-shot in-context learning with pseudo-demonstrations[J].arXiv:2212.09865,2022.
[33]OUYANG L,WU J,JIANG X,et al.Training language models to follow instructions with human feedback[J].Advances in Neural Information Processing Systems,2022,35:27730-27744.
[34]CHEN W,LI Z W,FANG H Y,et al.A benchmark for automatic medical consultation system:frameworks,tasks and datasets[J].Bioinformatics,2023,39(1):btac817.
[35]YIM W,FU Y,BEN ABACHA A,et al.Aci-bench:a novel ambient clinical intelligence dataset for benchmarking automatic visit note generation[J].Scientific Data,2023,10(1):586.
[36]WANG Q,DAI S T,XU B F,et al.Building Chinese biomedical language models via multi-level text discrimination[J].arXiv:2110.07244,2021.
[37]ZHANG J X,GAN R,WANG J J,et al.Fengshenbang 1.0:Being the foundation of Chinese cognitive intelligence[J].arXiv:2209.02970,2022.
[38]WANG Y,ZHANG Z,WANG R.Element-aware summarization with large language models:Expert-aligned evaluation and chain-of-thought method[J].arXiv:2305.13412,2023.
[39]YUAN H M,YUAN Z S,GAN R,et al.BioBART:Pretraining and evaluation of a biomedical generative language model[J].arXiv:2204.03905,2022.
[40]COHAN A,DERNONCOURT F,KIM D S,et al.A discourse-aware attention model for abstractive summarization of long documents[J].arXiv:1804.05685,2018.
[41]GLIWA B,MOCHOL I,BIESEK M,et al.SAMSum corpus:A human-annotated dialogue dataset for abstractive summarization[J].arXiv:1911.12237,2019.
[42]ZHENG L M,CHIANG W L,SHENG Y,et al.Judging llm-as-a-judge with mt-bench and chatbot arena[J].Advances in Neural Information Processing Systems,2023,36:46595-46623.
[43]RIBEIRO L F R,BANSAL M,DREYER M.Generating summaries with controllable readability levels[J].arXiv:2310.10623,2023.