Computer Science ›› 2024, Vol. 51 ›› Issue (6): 317-324. doi: 10.11896/jsjkx.230900076

• Artificial Intelligence •

  • Corresponding author: ZHANG Meihui (meihui_zhang@bit.edu.cn)
  • First author: SHI Jiyun (shijiyun@bit.edu.cn)

Generation of Structured Medical Reports Based on Knowledge Assistance

SHI Jiyun1, ZHANG Chi1, WANG Yuqiao1, LUO Zhaojing2, ZHANG Meihui1   

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
    2. School of Computing, National University of Singapore, Singapore 117417, Singapore
  • Received: 2023-09-13  Revised: 2024-03-11  Online: 2024-06-15  Published: 2024-06-05
  • About author: SHI Jiyun, born in 1991, Ph.D, is a member of CCF (No.J2410M). Her main research interests include big data and artificial intelligence.
    ZHANG Meihui, born in 1985, professor, Ph.D supervisor, is a member of CCF (No.92466M). Her main research interests include big data, blockchain and artificial intelligence.


Abstract: Automatic generation of medical reports is an important application of text summarization technology. Because medical consultation data differ markedly from general-domain data, traditional text summarization methods cannot fully understand and utilize the highly complex medical terminology in medical text, so the key knowledge contained in medical consultations is not fully exploited. In addition, most traditional summarization methods generate summaries directly and lack the ability to automatically select and filter key information and produce structured text according to the structural characteristics of medical reports. To address these problems, this paper proposes a knowledge-assisted structured medical report generation method. The method combines entity-guided prior domain knowledge with a structure-guided task decoupling mechanism, making full use of both the key knowledge in medical consultation data and the structured characteristics of medical reports. Experiments on the IMCS21 dataset verify the effectiveness of the method: the ROUGE scores of its generated summaries are 2% to 3% higher than those of baseline methods, yielding more accurate medical reports.
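The evaluation metric named in the abstract can be made concrete with a minimal sketch. ROUGE-1 F1 scores a generated summary by its clipped unigram overlap with a reference summary; the function below is a simplified illustration for whitespace-tokenized text, not the official ROUGE package used in the paper's experiments.

```python
from collections import Counter

def rouge_1_f(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    # Counter intersection clips each unigram's match count at the minimum of the two.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

print(rouge_1_f("the cat sat", "the cat sat on the mat"))  # 0.6666666666666666
```

Here precision is 3/3 and recall is 3/6, so F1 is 2/3; a 2%-3% ROUGE improvement, as reported in the abstract, means a corresponding absolute or relative gain in such overlap scores against reference reports.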

Key words: Medical report generation, Pre-training model, Generative summarization, Domain knowledge prior, Task decoupling mechanism
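As a loose illustration of how the two mechanisms named in the keywords could fit together, the sketch below decouples report generation into one sub-task per report section and conditions each sub-task on extracted entity knowledge. Everything here is a hypothetical stand-in: `SECTIONS`, the toy entity vocabulary, and `generate` are illustrative only, not the paper's actual interface or the IMCS21 report schema.

```python
# Hypothetical sketch of structure-guided task decoupling: instead of emitting one
# flat summary, each section of the structured report is generated by its own
# sub-task, conditioned on the dialogue plus extracted entity knowledge.

SECTIONS = ["chief complaint", "history of present illness", "diagnosis", "suggestions"]

def extract_entities(dialogue: str) -> list:
    # Placeholder for a medical entity recognizer (e.g. a BERT-CRF tagger);
    # here a tiny illustrative vocabulary stands in for real domain knowledge.
    vocab = {"cough", "fever", "bronchitis"}
    return sorted({w.strip(".,") for w in dialogue.lower().split()} & vocab)

def generate(prompt: str) -> str:
    # Stand-in for the decoder; a real system would call a pretrained
    # sequence-to-sequence model here instead of returning a stub.
    return "<generated text for: " + prompt[:40] + "...>"

def structured_report(dialogue: str) -> dict:
    entities = extract_entities(dialogue)          # entity-guided prior knowledge
    prior = "; ".join(entities)
    # One decoupled generation pass per report section.
    return {s: generate("[" + s + "] entities: " + prior + " | dialogue: " + dialogue)
            for s in SECTIONS}
```

The design point of such a decomposition is that each section sees only a section-specific prompt, so irrelevant dialogue content can be filtered per sub-task rather than forcing one model pass to cover the whole structured report.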

CLC number: TP301
[1] ZHOU Q, YANG N, WEI F, et al. Neural document summarization by jointly learning to score and select sentences[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers). Melbourne, VIC, Australia, 2018: 654-663.
[2] RUSH A M, CHOPRA S, WESTON J. A neural attention model for abstractive sentence summarization[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2015: 379-389.
[3] LIU Y. Fine-tune BERT for extractive summarization[J]. arXiv:1903.10318, 2019.
[4] ZHONG M, LIU P, CHEN Y, et al. Extractive summarization as text matching[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Seattle, Washington, USA: ACL, 2020: 6197-6208.
[5] SEE A, LIU P J, MANNING C D. Get to the point: Summarization with pointer-generator networks[J]. arXiv:1704.04368, 2017.
[6] PAULUS R, XIONG C, SOCHER R. A deep reinforced model for abstractive summarization[J]. arXiv:1705.04304, 2017.
[7] LI W, XIAO X, LYU Y, et al. Improving neural abstractive document summarization with structural regularization[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 4078-4087.
[8] LUO Z, YEUNG S H, ZHANG M, et al. MLCask: Efficient management of component evolution in collaborative data analytics pipelines[C]//2021 IEEE International Conference on Data Engineering (ICDE). IEEE, 2021: 1655-1666.
[9] LUO Z, CAI S, GAO J, et al. Adaptive lightweight regularization tool for complex analytics[C]//2018 IEEE International Conference on Data Engineering (ICDE). IEEE, 2018: 485-496.
[10] LUO Z, CAI S, WANG Y, et al. Regularized pairwise relationship based analytics for structured data[C]//Proceedings of the ACM on Management of Data. 2023: 1-27.
[11] SONG Y, TIAN Y, WANG N, et al. Summarizing medical conversations via identifying important utterances[C]//Proceedings of the 28th International Conference on Computational Linguistics. 2020: 717-729.
[12] ZHANG Y Z, JIANG Z T, ZHANG T, et al. MIE: A medical information extractor towards medical dialogues[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2020: 6460-6469.
[13] ENARVI S, AMOIA M, TEBA M A, et al. Generating medical reports from patient-doctor conversations using sequence-to-sequence models[C]//Proceedings of the First Workshop on Natural Language Processing for Medical Conversations. 2020: 22-30.
[14] CHINTAGUNTA B, KATARIYA N, AMATRIAIN X, et al. Medically aware GPT-3 as a data generator for medical dialogue summarization[C]//Machine Learning for Healthcare Conference. PMLR, 2021: 354-372.
[15] KRISHNA K, KHOSLA S, BIGHAM J, et al. Generating SOAP notes from doctor-patient conversations using modular summarization techniques[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Virtual, Online, 2021: 4958-4972.
[16] LEWIS M, LIU Y, GOYAL N, et al. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension[J]. arXiv:1910.13461, 2019.
[17] SOUZA F, NOGUEIRA R, LOTUFO R. Portuguese named entity recognition using BERT-CRF[J]. arXiv:1909.10649, 2019.
[18] LIU P, YUAN W, FU J, et al. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing[J]. ACM Computing Surveys, 2023, 55(9): 1-35.
[19] REBUFFI S A, BILEN H, VEDALDI A. Learning multiple visual domains with residual adapters[J]. arXiv:1705.08045, 2017.
[20] CHEN W, LI Z, FANG H, et al. A benchmark for automatic medical consultation system: frameworks, tasks and datasets[J]. Bioinformatics, 2023, 39(1): 817.
[21] ZHANG N, CHEN M, BI Z, et al. CBLUE: A Chinese biomedical language understanding evaluation benchmark[J]. arXiv:2106.08087, 2021.
[22] LIN C Y. ROUGE: A package for automatic evaluation of summaries[C]//Text Summarization Branches Out. 2004: 74-81.
[23] QI W, GONG Y, YAN Y, et al. ProphetNet-X: Large-scale pre-training models for English, Chinese, multi-lingual, dialog, and code generation[J]. arXiv:2104.08006, 2021.
[24] CHEN X, YE J, ZU C, et al. How robust is GPT-3.5 to predecessors? A comprehensive study on language understanding tasks[J]. arXiv:2303.00293, 2023.