Computer Science, 2025, Vol. 52, Issue (6A): 240700183-6. DOI: 10.11896/jsjkx.240700183
LONG Xiao¹, HUANG Wei², HU Kai¹
Abstract: In recent years, self-attention mechanisms based on local windows have performed strongly in visual classification tasks. However, their limited receptive field and weak modeling capacity make them less effective on complex data. The features in lung CT images are complex and diverse, including nodule shape, size, and density, which makes it challenging to mine the deep-level features in the data. To address these problems, this paper proposes a new bi-directional multi-level interaction network model, the Bi-directional Multi-level Interaction Vision Transformer (Bi-MI ViT). Through a bi-directional multi-level interaction mechanism, the network effectively fuses spatial and channel information, significantly improving the accuracy and comprehensiveness of feature extraction. In the Transformer branch, an efficient cascaded group attention mechanism is introduced to enrich the diversity of attention-head features and strengthen the model's ability to capture key information. In the convolutional neural network (CNN) branch, a DP block is designed that uses point-wise convolution (PW) and depth-wise convolution (DW) to mine local information in depth and improve the model's representational capacity. In addition, a deep feature extraction (DFE) module strengthens feature propagation and reuse, improving data-utilization efficiency and delivering substantial performance gains. Experimental results on the public COVID19-CT dataset and a private LUAD-CT dataset show that the proposed algorithm outperforms eight comparison methods and achieves accurate classification.
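The cascaded group attention in the Transformer branch is described only at a high level; the mechanism originates in EfficientViT, where the channel dimension is split across attention heads and each head's output is added to the next head's input so that heads learn more diverse features. The PyTorch sketch below is a minimal illustration under those assumptions; the class name, per-head projections, and dimensions are hypothetical, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class CascadedGroupAttention(nn.Module):
    """Minimal sketch of cascaded group attention in the spirit of
    EfficientViT: channels are split across heads, and each head's output
    is fed into the next head's input to diversify head features.
    All projections and dimensions here are illustrative assumptions."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        # One qkv projection per head, each acting on its own channel split.
        self.qkvs = nn.ModuleList(
            nn.Linear(self.head_dim, self.head_dim * 3) for _ in range(num_heads)
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); split channels across heads.
        chunks = x.chunk(self.num_heads, dim=-1)
        outs, carry = [], 0
        for qkv, chunk in zip(self.qkvs, chunks):
            # Cascade: add the previous head's output to this head's input.
            h = chunk + carry
            q, k, v = qkv(h).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) * self.scale
            out = attn.softmax(dim=-1) @ v
            outs.append(out)
            carry = out
        return self.proj(torch.cat(outs, dim=-1))
```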
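The abstract specifies only that the DP block combines point-wise (PW) and depth-wise (DW) convolution to mine local information. A minimal sketch of such a block follows; the layer order, normalization, and activation are assumptions, since they are not given here.

```python
import torch
import torch.nn as nn

class DPBlock(nn.Module):
    """Hypothetical sketch of a DP block: a point-wise (1x1) convolution
    mixes channels, then a depth-wise convolution captures local spatial
    detail cheaply. Ordering, norm, and activation are assumptions."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # Point-wise (1x1) convolution: channel mixing, no spatial extent.
        self.pw = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Depth-wise convolution: one filter per channel (groups=out_channels),
        # mining local spatial information at low cost.
        self.dw = nn.Conv2d(out_channels, out_channels, kernel_size,
                            padding=kernel_size // 2, groups=out_channels)
        self.norm = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.norm(self.dw(self.pw(x))))
```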
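The deep feature extraction (DFE) module is characterized only by enhanced feature propagation and reuse, which suggests dense connectivity in the spirit of DenseNet. The sketch below is one plausible reading under that assumption; the depth, growth rate, and layer composition are illustrative, not the authors' design.

```python
import torch
import torch.nn as nn

class DeepFeatureExtraction(nn.Module):
    """Hypothetical DFE sketch: each layer receives the concatenation of
    all earlier feature maps (DenseNet-style), so features are propagated
    forward and reused rather than recomputed."""
    def __init__(self, in_channels: int, growth: int = 32, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, growth, kernel_size=3, padding=1),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
            channels += growth  # concatenated input grows with each layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            # Reuse: every layer sees all features produced so far.
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)
```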