Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240700183-6. doi: 10.11896/jsjkx.240700183

• Intelligent Medical Engineering •

Bi-MI ViT: Bi-directional Multi-level Interaction Vision Transformer for Lung CT Image Classification

LONG Xiao1, HUANG Wei2, HU Kai1   

  1. School of Computer Science & School of Cyberspace Science, Xiangtan University, Xiangtan, Hunan 411105, China
    2. Computer Medical Image Processing Research Center, Department of Radiology, The First Hospital of Changsha, Changsha 410005, China
  • Online: 2025-06-16 Published: 2025-06-12
  • About author: LONG Xiao, born in 2003, undergraduate. Her main research interests include deep learning and medical image processing.
    HU Kai, born in 1984, Ph.D., professor, is a senior member of CCF. His main research interests include machine learning, pattern recognition, bioinformatics, and medical image processing.
  • Supported by:
    National Natural Science Foundation of China (62272404), Natural Science Foundation of Hunan Province of China (2022JJ30571), Science and Technology Department of Hunan Province (2021SK53105), Project of Education Department of Hunan Province (23A0146), Innovation and Entrepreneurship Training Program for Hunan University Students (S202310530178) and Project of Undergraduate Teaching Reform Research of Hunan Province (202401000574).

Abstract: In recent years, the local-window-based self-attention mechanism has gained prominence in vision tasks. However, its limited receptive field and weak modeling ability make it ineffective on complex data. The features in lung CT images are complex and diverse, including the shape, size, and density of nodules, which makes mining the deep features in the data challenging. To address these issues, this paper proposes a bi-directional multi-level interaction vision Transformer (Bi-MI ViT) backbone network that effectively integrates spatial and channel information through an innovative bi-directional multi-level interaction mechanism, significantly improving the accuracy and comprehensiveness of feature extraction. Within the Transformer branch, we introduce an efficient cascaded group attention (CGA) strategy to enrich the diversity of attention-head features and enhance the model's ability to capture key information. In the convolutional neural network (CNN) branch, we employ a depth-wise and point-wise (DP) block that combines point-wise convolution (PW) and depth-wise convolution (DW) to deeply mine local information and strengthen the model's representation ability. Additionally, a deep feature extraction (DFE) module enhances feature propagation and reuse while improving data-utilization efficiency, leading to substantial performance gains. Experimental results on both the public COVID-CT dataset and a private LUAD-CT dataset demonstrate that the proposed method outperforms eight comparison methods in classification accuracy.
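To make the CGA idea concrete: in cascaded group attention, in the spirit of EfficientViT [15], each head attends over its own channel slice of the input, and each head's output is fed into the next head's input, so successive heads see progressively refined features and stay diverse. The PyTorch sketch below is illustrative only, not the authors' implementation; the class name, the residual wiring, and all shapes are assumptions.

```python
import torch
import torch.nn as nn

class CascadedGroupAttention(nn.Module):
    """Hedged sketch of cascaded group attention (after EfficientViT [15])."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        # Each head projects only its own channel slice to q/k/v,
        # so different heads operate on different feature subsets.
        self.qkvs = nn.ModuleList(
            [nn.Linear(self.head_dim, self.head_dim * 3) for _ in range(num_heads)]
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        chunks = x.chunk(self.num_heads, dim=-1)  # num_heads slices of (B, N, head_dim)
        outs, carry = [], 0
        for i, qkv in enumerate(self.qkvs):
            # Cascade: add the previous head's output to this head's input.
            q, k, v = qkv(chunks[i] + carry).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, N, N)
            carry = attn.softmax(dim=-1) @ v               # (B, N, head_dim)
            outs.append(carry)
        return self.proj(torch.cat(outs, dim=-1))

# usage: CascadedGroupAttention(dim=64, num_heads=4)(torch.randn(2, 196, 64))
```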
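Likewise, the DP block in the CNN branch factorizes a standard convolution into a depth-wise convolution (per-channel spatial filtering) followed by a point-wise 1×1 convolution (cross-channel mixing), the pattern behind depthwise-separable designs [18,19]. The sketch below is a minimal guess at such a block; the ordering, normalization, and activation are assumptions, not Bi-MI ViT's exact structure.

```python
import torch
import torch.nn as nn

class DPBlock(nn.Module):
    """Hedged sketch of a depth-wise + point-wise (DP) convolution block."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # Depth-wise: one filter per channel (groups=in_ch) mines local
        # spatial patterns at a fraction of a dense convolution's cost.
        self.dw = nn.Conv2d(in_ch, in_ch, kernel_size,
                            padding=kernel_size // 2, groups=in_ch, bias=False)
        # Point-wise: 1x1 convolution mixes information across channels.
        self.pw = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        return self.act(self.bn(self.pw(self.dw(x))))

# usage: DPBlock(in_ch=32, out_ch=64)(torch.randn(1, 32, 56, 56))
```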

Key words: Lung CT images, Bi-directional multi-level interaction, Convolutional neural network, Transformer, Classification

CLC Number: TP391
[1]MOBINY A,VAN NGUYEN H.Fast CapsNet for lung cancer screening[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention.Cham:Springer International Publishing,2018:741-749.
[2]DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16x16 words:Transformers for image recognition at scale[J].arXiv:2010.11929,2020.
[3]TOUVRON H,CORD M,DOUZE M,et al.Training data-efficient image transformers & distillation through attention[C]//International Conference on Machine Learning.PMLR,2021:10347-10357.
[4]YANG C,XU J,DE MELLO S,et al.Gpvit:a high resolution non-hierarchical vision transformer with group propagation[J].arXiv:2212.06795,2022.
[5]SHI B,DARRELL T,WANG X.Top-down visual attention from analysis by synthesis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:2102-2112.
[6]DING M,SHEN Y,FAN L,et al.Visual dependency transformers:Dependency tree emerges from reversed attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:14528-14539.
[7]LEE Y,KIM J,WILLETTE J,et al.Mpvit:Multi-path vision transformer for dense prediction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:7287-7296.
[8]REN P,LI C,WANG G,et al.Beyond fixation:Dynamic window visual transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:11987-11997.
[9]LIU Z,LIN Y,CAO Y,et al.Swin transformer:Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:10012-10022.
[10]CHU X,TIAN Z,WANG Y,et al.Twins:Revisiting the design of spatial attention in vision transformers[J].Advances in Neural Information Processing Systems,2021,34:9355-9366.
[11]HUANG Z,BEN Y,LUO G,et al.Shuffle transformer:Rethinking spatial shuffle for vision transformer[J].arXiv:2106.03650,2021.
[12]SUN W,CHEN X,ZHANG X,et al.A Multi-Feature Learning Model with Enhanced Local Attention for Vehicle Re-Identification[J].Computers,Materials & Continua,2021,69(3).
[13]ZHOU J,WANG P,WANG F,et al.Elsa:Enhanced local self-attention for vision transformer[J].arXiv:2112.12786,2021.
[14]ARAR M,SHAMIR A,BERMANO A H.Learned queries for efficient local attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:10841-10852.
[15]LIU X,PENG H,ZHENG N,et al.Efficientvit:Memory efficient vision transformer with cascaded group attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:14420-14430.
[16]LIU C,DING W,CHEN P,et al.RB-Net:Training highly accurate and efficient binary neural networks with reshaped point-wise convolution and balanced activation[J].IEEE Transactions on Circuits and Systems for Video Technology,2022,32(9):6414-6424.
[17]HUA B S,TRAN M K,YEUNG S K.Pointwise convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:984-993.
[18]CHOLLET F.Xception:Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:1251-1258.
[19]HOWARD A G,ZHU M,CHEN B,et al.Mobilenets:Efficient convolutional neural networks for mobile vision applications[J].arXiv:1704.04861,2017.
[20]HUANG G,LIU Z,VAN DER MAATEN L,et al.Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:4700-4708.
[21]ZEILER M D,TAYLOR G W,FERGUS R.Adaptive deconvolutional networks for mid and high level feature learning[C]//2011 International Conference on Computer Vision.IEEE,2011:2018-2025.
[22]YANG X,HE X,ZHAO J,et al.Covid-ct-dataset:a ct scan dataset about covid-19[J].arXiv:2003.13865,2020.
[23]CHEN C F R,FAN Q,PANDA R.Crossvit:Cross-attention multi-scale vision transformer for image classification[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:357-366.
[24]GUO J,HAN K,WU H,et al.Cmt:Convolutional neural networks meet vision transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:12175-12185.
[25]CHEN Q,WU Q,WANG J,et al.Mixformer:Mixing features across windows and dimensions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:5249-5259.
[26]MEHTA S,RASTEGARI M.Mobilevit:light-weight,general-purpose,and mobile-friendly vision transformer[J].arXiv:2110.02178,2021.
[27]LI Y,YUAN G,WEN Y,et al.Efficientformer:vision transformers at mobilenet speed[J].Advances in Neural Information Processing Systems,2022,35:12934-12949.
[28]LIU Z,SHEN L.CECT:Controllable ensemble CNN and transformer for COVID-19 image classification[J].Computers in Biology and Medicine,2024,173:108388.