Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240700183-6. doi: 10.11896/jsjkx.240700183

• Intelligent Medical Engineering •

Bi-MI ViT: Bi-directional Multi-level Interaction Vision Transformer for Lung CT Image Classification

LONG Xiao1, HUANG Wei2, HU Kai1   

  1. School of Computer Science & School of Cyberspace Science, Xiangtan University, Xiangtan, Hunan 411105, China
    2. Computer Medical Image Processing Research Center, Department of Radiology, The First Hospital of Changsha, Changsha 410005, China
  • Online: 2025-06-16 Published: 2025-06-12
  • About author: LONG Xiao, born in 2003, undergraduate. Her main research interests include deep learning and medical image processing.
    HU Kai, born in 1984, Ph.D., professor, is a senior member of CCF. His main research interests include machine learning, pattern recognition, bioinformatics, and medical image processing.
  • Supported by:
    National Natural Science Foundation of China (62272404), Natural Science Foundation of Hunan Province of China (2022JJ30571), Science and Technology Department of Hunan Province (2021SK53105), Project of Education Department of Hunan Province (23A0146), Innovation and Entrepreneurship Training Program for Hunan University Students (S202310530178) and Project of Undergraduate Teaching Reform Research of Hunan Province (202401000574).

Abstract: In recent years, the local-window-based self-attention mechanism has gained prominence in vision tasks. However, its limited receptive field and weak modeling ability make it ineffective on complex data. The features in lung CT images are complex and diverse, including the shape, size, and density of nodules, which makes mining the deep features in the data challenging. To address these issues, this paper proposes a bi-directional multi-level interaction vision Transformer (Bi-MI ViT) backbone network that effectively integrates spatial and channel information through an innovative bi-directional multi-level interaction mechanism, significantly improving the accuracy and comprehensiveness of feature extraction. Within the Transformer branch, we introduce an efficient cascaded group attention (CGA) strategy to enrich the diversity of attention-head features and enhance the model's ability to capture key information. In the convolutional neural network (CNN) branch, we employ a depth-wise and point-wise (DP) block that combines point-wise convolution (PW) and depth-wise convolution (DW) to deeply mine local information and strengthen the model's representation ability. Additionally, a deep feature extraction (DFE) module enhances feature propagation and reuse while improving data-utilization efficiency, leading to substantial performance gains. Experimental results on both the public COVID-CT dataset and a private LUAD-CT dataset demonstrate that the proposed method outperforms eight comparison methods in classification accuracy.
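To make the CGA idea concrete: in cascaded group attention, in the spirit of EfficientViT [15], each head attends over its own channel slice of the input, and each head's output is fed into the next head's input, so successive heads see progressively refined features and stay diverse. The PyTorch sketch below is illustrative only, not the authors' implementation; the class name, the residual wiring, and all shapes are assumptions.

```python
import torch
import torch.nn as nn

class CascadedGroupAttention(nn.Module):
    """Hedged sketch of cascaded group attention (after EfficientViT [15])."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        # Each head projects only its own channel slice to q/k/v,
        # so different heads operate on different feature subsets.
        self.qkvs = nn.ModuleList(
            [nn.Linear(self.head_dim, self.head_dim * 3) for _ in range(num_heads)]
        )
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        chunks = x.chunk(self.num_heads, dim=-1)  # num_heads slices of (B, N, head_dim)
        outs, carry = [], 0
        for i, qkv in enumerate(self.qkvs):
            # Cascade: add the previous head's output to this head's input.
            q, k, v = qkv(chunks[i] + carry).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, N, N)
            carry = attn.softmax(dim=-1) @ v               # (B, N, head_dim)
            outs.append(carry)
        return self.proj(torch.cat(outs, dim=-1))

# usage: CascadedGroupAttention(dim=64, num_heads=4)(torch.randn(2, 196, 64))
```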
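Likewise, the DP block in the CNN branch factorizes a standard convolution into a depth-wise convolution (per-channel spatial filtering) followed by a point-wise 1×1 convolution (cross-channel mixing), the pattern behind depthwise-separable designs [18,19]. The sketch below is a minimal guess at such a block; the ordering, normalization, and activation are assumptions, not Bi-MI ViT's exact structure.

```python
import torch
import torch.nn as nn

class DPBlock(nn.Module):
    """Hedged sketch of a depth-wise + point-wise (DP) convolution block."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # Depth-wise: one filter per channel (groups=in_ch) mines local
        # spatial patterns at a fraction of a dense convolution's cost.
        self.dw = nn.Conv2d(in_ch, in_ch, kernel_size,
                            padding=kernel_size // 2, groups=in_ch, bias=False)
        # Point-wise: 1x1 convolution mixes information across channels.
        self.pw = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        return self.act(self.bn(self.pw(self.dw(x))))

# usage: DPBlock(in_ch=32, out_ch=64)(torch.randn(1, 32, 56, 56))
```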

Key words: Lung CT images, Bi-directional multi-level interaction, Convolutional neural network, Transformer, Classification

CLC Number: TP391
[1]MOBINY A,VAN NGUYEN H.Fast CapsNet for lung cancer screening[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention.Cham:Springer International Publishing,2018:741-749.
[2]DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16x16 words:Transformers for image recognition at scale[J].arXiv:2010.11929,2020.
[3]TOUVRON H,CORD M,DOUZE M,et al.Training data-efficient image transformers & distillation through attention[C]//International Conference on Machine Learning.PMLR,2021:10347-10357.
[4]YANG C,XU J,DE MELLO S,et al.Gpvit:a high resolution non-hierarchical vision transformer with group propagation[J].arXiv:2212.06795,2022.
[5]SHI B,DARRELL T,WANG X.Top-down visual attention from analysis by synthesis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:2102-2112.
[6]DING M,SHEN Y,FAN L,et al.Visual dependency transformers:Dependency tree emerges from reversed attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:14528-14539.
[7]LEE Y,KIM J,WILLETTE J,et al.Mpvit:Multi-path vision transformer for dense prediction[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:7287-7296.
[8]REN P,LI C,WANG G,et al.Beyond fixation:Dynamic window visual transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:11987-11997.
[9]LIU Z,LIN Y,CAO Y,et al.Swin transformer:Hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:10012-10022.
[10]CHU X,TIAN Z,WANG Y,et al.Twins:Revisiting the design of spatial attention in vision transformers[J].Advances in Neural Information Processing Systems,2021,34:9355-9366.
[11]HUANG Z,BEN Y,LUO G,et al.Shuffle transformer:Rethinking spatial shuffle for vision transformer[J].arXiv:2106.03650,2021.
[12]SUN W,CHEN X,ZHANG X,et al.A Multi-Feature Learning Model with Enhanced Local Attention for Vehicle Re-Identification[J].Computers,Materials & Continua,2021,69(3).
[13]ZHOU J,WANG P,WANG F,et al.Elsa:Enhanced local self-attention for vision transformer[J].arXiv:2112.12786,2021.
[14]ARAR M,SHAMIR A,BERMANO A H.Learned queries for efficient local attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:10841-10852.
[15]LIU X,PENG H,ZHENG N,et al.Efficientvit:Memory efficient vision transformer with cascaded group attention[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:14420-14430.
[16]LIU C,DING W,CHEN P,et al.RB-Net:Training highly accurate and efficient binary neural networks with reshaped point-wise convolution and balanced activation[J].IEEE Transactions on Circuits and Systems for Video Technology,2022,32(9):6414-6424.
[17]HUA B S,TRAN M K,YEUNG S K.Pointwise convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:984-993.
[18]CHOLLET F.Xception:Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:1251-1258.
[19]HOWARD A G,ZHU M,CHEN B,et al.Mobilenets:Efficient convolutional neural networks for mobile vision applications[J].arXiv:1704.04861,2017.
[20]HUANG G,LIU Z,VAN DER MAATEN L,et al.Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2017:4700-4708.
[21]ZEILER M D,TAYLOR G W,FERGUS R.Adaptive deconvolutional networks for mid and high level feature learning[C]//2011 International Conference on Computer Vision.IEEE,2011:2018-2025.
[22]YANG X,HE X,ZHAO J,et al.Covid-ct-dataset:a ct scan dataset about covid-19[J].arXiv:2003.13865,2020.
[23]CHEN C F R,FAN Q,PANDA R.Crossvit:Cross-attention multi-scale vision transformer for image classification[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2021:357-366.
[24]GUO J,HAN K,WU H,et al.Cmt:Convolutional neural networks meet vision transformers[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:12175-12185.
[25]CHEN Q,WU Q,WANG J,et al.Mixformer:Mixing features across windows and dimensions[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2022:5249-5259.
[26]MEHTA S,RASTEGARI M.Mobilevit:light-weight,general-purpose,and mobile-friendly vision transformer[J].arXiv:2110.02178,2021.
[27]LI Y,YUAN G,WEN Y,et al.Efficientformer:vision transformers at mobilenet speed[J].Advances in Neural Information Processing Systems,2022,35:12934-12949.
[28]LIU Z,SHEN L.CECT:Controllable ensemble CNN and transformer for COVID-19 image classification[J].Computers in Biology and Medicine,2024,173:108388.