Computer Science ›› 2024, Vol. 51 ›› Issue (2): 135-141. doi: 10.11896/jsjkx.221100260

• Computer Graphics & Multimedia •

Medical Image Segmentation Algorithm Based on Self-attention and Multi-scale Input-Output

DING Tianshu, CHEN Yuanyuan

  1. College of Computer Science, Sichuan University, Chengdu 610065, China
  • Received: 2022-11-29 Revised: 2023-04-06 Online: 2024-02-15 Published: 2024-02-22
  • Corresponding author: CHEN Yuanyuan (chenyuanyuan@scu.edu.cn)
  • Author e-mail: 2020223045177@stu.scu.edu.cn
  • About author: DING Tianshu, born in 1998, postgraduate. His main research interests include artificial intelligence and medical image analysis. CHEN Yuanyuan, born in 1983, Ph.D., associate professor, master's supervisor. Her main research interests include artificial intelligence, the theory and applications of neural networks, and medical image analysis.

Abstract: More refined segmentation of diabetic retinopathy fundus images can better assist doctors in diagnosis. The emergence of large-scale, high-resolution segmentation datasets provides favorable conditions for such refinement. Mainstream segmentation networks based on U-Net predict pixels with convolutions, which are local operations and therefore cannot fully exploit global information; their single-input, single-output structure also makes it difficult to obtain multi-scale feature information. To make full use of the existing large-scale, high-resolution fundus lesion segmentation datasets and achieve more refined segmentation, better segmentation methods are needed. This paper modifies U-Net with a self-attention mechanism and a multi-scale input-output structure and proposes a new segmentation network, SAM-Net. Self-attention modules replace the traditional convolutional modules to strengthen the network's ability to capture global information, and multi-scale input and multi-scale output structures are introduced to make multi-scale feature information easier to obtain. An image slicing method is used to reduce the input size of the model, so that training does not become harder because the input images are too large. Experiments on the IDRiD and FGADR datasets show that SAM-Net achieves better performance than other methods.

Key words: U-Net, Self-attention, Diabetic retinopathy, Segmentation, Neural network
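
The abstract says that image slicing is used to shrink the model input but does not describe the procedure. The following is only a minimal sketch of one common way to do it, assuming fixed-size grid tiling of a single-channel image stored as nested lists. The function name `slice_into_patches` and the parameters `patch` and `stride` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Image = List[List[int]]  # hypothetical grayscale image: a list of pixel rows


def slice_into_patches(image: Image, patch: int, stride: int) -> List[Tuple[int, int, Image]]:
    """Cut an H x W image into patch x patch tiles.

    Returns (top, left, tile) triples. The last row and column of tiles are
    shifted inward so that every tile stays fully inside the image.
    This is an illustrative assumption, not the slicing rule used by SAM-Net.
    """
    height, width = len(image), len(image[0])
    if patch > height or patch > width:
        raise ValueError("patch must not exceed the image size")

    def starts(size: int) -> List[int]:
        # Regular grid positions, plus one extra position flush with the border.
        pos = list(range(0, size - patch + 1, stride))
        if pos[-1] != size - patch:
            pos.append(size - patch)
        return pos

    tiles = []
    for top in starts(height):
        for left in starts(width):
            tile = [row[left:left + patch] for row in image[top:top + patch]]
            tiles.append((top, left, tile))
    return tiles


# Toy 6 x 8 "image" whose pixel value encodes its own position.
image = [[r * 8 + c for c in range(8)] for r in range(6)]
tiles = slice_into_patches(image, patch=4, stride=4)
print(len(tiles), [(top, left) for top, left, _ in tiles])
```

Recording each tile's `(top, left)` offset makes it possible to stitch per-tile predictions back into a full-resolution mask. How overlapping regions should be merged is left open here, since the abstract does not specify it.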

CLC Number:

  • TP389.1