Computer Science ›› 2024, Vol. 51 ›› Issue (8): 200-208. doi: 10.11896/jsjkx.230600018

• Computer Graphics & Multimedia •

Diversified Label Matrix Based Medical Image Report Generation

ZHANG Junsan, CHENG Ming, SHEN Xiuxuan, LIU Yuxue, WANG Leiquan

  1. Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
  • Received: 2023-06-01 Revised: 2023-10-14 Online: 2024-08-15 Published: 2024-08-13
  • Corresponding author: ZHANG Junsan (zhangjunsan@upc.edu.cn)
  • About author: ZHANG Junsan, born in 1978, Ph.D., associate professor, is a member of CCF (No. 74487M). His main research interests include information retrieval and recommender systems.
  • Supported by:
    Natural Science Foundation of Shandong Province, China (ZR2020MF006, ZR2022LZH015).

Abstract: Medical images play a vital role in medical diagnosis, and accurately written text reports are essential for understanding the images and for subsequent disease diagnosis. In recent years, generating standardized text reports with pattern-based methods has become a research hotspot in medical imaging report generation. However, the large gap between the numbers of positive and negative samples introduces a data bias, so generated reports tend to describe normal findings and struggle to capture abnormal information accurately. To address this issue, this paper proposes a medical report generation method based on a diversified label matrix, which learns different diseases in a differentiated manner and generates diverse medical reports. A text-matrix feature loss function is designed to optimize the diversified label matrix. In addition, a feature intersection module is added to the Transformer network to strengthen the mapping between images and text and to improve the accuracy of disease descriptions. Experiments on the IU-X-Ray and MIMIC-CXR datasets show that, compared with current mainstream methods, the proposed method achieves the best results on multiple metrics, including BLEU and METEOR.

Key words: Deep learning, Medical image report generation, Attention mechanism, Image-text generation, Multi-modal

CLC Number:

  • TP391