Computer Science ›› 2024, Vol. 51 ›› Issue (8): 200-208. doi: 10.11896/jsjkx.230600018

• Computer Graphics & Multimedia •

Diversified Label Matrix Based Medical Image Report Generation

ZHANG Junsan, CHENG Ming, SHEN Xiuxuan, LIU Yuxue, WANG Leiquan   

  1. Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
  • Received: 2023-06-01 Revised: 2023-10-14 Online: 2024-08-15 Published: 2024-08-13
  • About author: ZHANG Junsan, born in 1978, Ph.D., associate professor, is a member of CCF (No. 74487M). His main research interests include information retrieval and recommender systems.
  • Supported by:
    Natural Science Foundation of Shandong Province, China (ZR2020MF006, ZR2022LZH015).

Abstract: Medical images play a vital role in medical diagnosis. Accurate text reports are essential for understanding images and for subsequent disease diagnosis. In recent years, model-based generation of standardized reports has become a research hotspot in the field of medical imaging report generation. However, because of the data bias caused by the large gap between the numbers of positive and negative samples, generated reports generally tend to describe the normal situation, which makes it difficult to capture abnormal information accurately. To address this issue, this paper proposes a novel approach to medical report generation based on a diversified label matrix. The method uses the diversified label matrix to perform differential learning on different diseases and to generate diverse medical reports. Additionally, a text-matrix feature loss function is designed to optimize the diversified label matrix and enhance its effectiveness. Furthermore, the Transformer network is enhanced with a feature intersection module, which strengthens the mapping between images and text and improves the accuracy of disease descriptions. Experimental results on the IU-X-Ray and MIMIC-CXR datasets show that the proposed method achieves the best results on multiple metrics, such as BLEU and METEOR, compared with current mainstream methods.
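The abstract reports results in terms of BLEU, which scores a generated report by its n-gram overlap with a reference report. The following is a minimal sketch of that idea, assuming a single reference and whitespace tokenization. It is not the authors' evaluation code, and the function names (ngrams, clipped_precision, bleu) are hypothetical names chosen for illustration.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference.

    Each n-gram is credited at most as many times as it occurs in the
    reference, so repeating a matching phrase does not inflate the score.
    """
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty.
    No smoothing is applied, so any zero precision gives a score of 0."""
    precisions = [clipped_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * math.exp(log_mean)


# Illustrative sentences only; they are not taken from either dataset.
reference = "the heart size is normal".split()
candidate = "the heart is normal".split()
score = bleu(candidate, reference, max_n=2)
```

Published evaluations normally rely on an established scoring toolkit with corpus-level statistics, so this sketch is meant only to show what the metric measures.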

Key words: Deep learning, Medical report generation, Attention mechanism, Image-Text generation, Multi-modal

CLC Number: 

  • TP391