Computer Science ›› 2024, Vol. 51 ›› Issue (2): 189-195. doi: 10.11896/jsjkx.221100218

• Computer Graphics & Multimedia •

LNG-Transformer: An Image Classification Network Based on Multi-scale Information Interaction

WANG Wenjie, YANG Yan, JING Lili, WANG Jie, LIU Yan   

  1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China
  • Received: 2022-11-25  Revised: 2023-03-27  Online: 2024-02-15  Published: 2024-02-22
  • About author: WANG Wenjie, born in 1996, postgraduate. His main research interests include image processing and object detection. YANG Yan, born in 1964, professor, Ph.D. supervisor, is a member of CCF (No.06877D). Her main research interests include multi-view learning, cluster analysis and so on.
  • Supported by:
    National Natural Science Foundation of China (61976247).

Abstract: Owing to the superior representation capability of the Transformer's Self-Attention mechanism, several researchers have developed Self-Attention-based image processing models and achieved great success. However, traditional Self-Attention networks for image classification cannot balance access to global information against computational complexity, which limits the wide application of Self-Attention. This paper proposes an efficient and scalable attention module, Local Neighbor Global Self-Attention (LNG-SA), that can interact with local, neighbor, and global information at any stage. By cascading LNG-SA modules, a new network called LNG-Transformer is created. LNG-Transformer adopts a hierarchical structure that provides excellent flexibility, and its computational complexity is linear in image resolution. These properties of LNG-SA allow LNG-Transformer to interact with local, neighbor, and global information even in the early, high-resolution stages, resulting in increased efficiency and enhanced learning capacity. Experimental results show that LNG-Transformer performs well at image classification.
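The abstract's central trade-off, that attention restricted to local windows keeps cost linear in resolution while a coarse global branch restores long-range interaction, can be illustrated with a short sketch. The PyTorch code below is a minimal illustration of that general idea only: the class name, the pooled global branch, and all sizes are assumptions made for exposition, not the authors' LNG-SA design, and the neighbor branch is omitted for brevity.

    import torch
    import torch.nn as nn

    class LocalGlobalAttention(nn.Module):
        """Illustrative sketch only: window-local attention plus attention over a
        pooled global summary. Not the paper's LNG-SA; neighbor branch omitted."""
        def __init__(self, dim, window=7, heads=4, pooled=7):
            super().__init__()
            self.window = window          # side length of each local window
            self.pooled = pooled          # side length of the pooled global map
            self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):             # x: (B, H, W, C); H, W divisible by window
            B, H, W, C = x.shape
            w = self.window
            # Local branch: self-attention inside each w x w window.
            # Cost is O(H*W * w^2 * C), i.e. linear in the number of pixels.
            win = x.reshape(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
            win = win.reshape(-1, w * w, C)
            local, _ = self.local_attn(win, win, win)
            local = local.reshape(B, H // w, W // w, w, w, C)
            local = local.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
            # Global branch: every token attends to a fixed-size pooled map,
            # so this term is also linear in resolution.
            g = nn.functional.adaptive_avg_pool2d(x.permute(0, 3, 1, 2), self.pooled)
            g = g.flatten(2).transpose(1, 2)            # (B, pooled^2, C)
            q = x.reshape(B, H * W, C)
            glob, _ = self.global_attn(q, g, g)
            return local + glob.reshape(B, H, W, C)

    # Example: an early high-resolution stage with 56 x 56 tokens of width 96.
    m = LocalGlobalAttention(dim=96)
    y = m(torch.randn(2, 56, 56, 96))     # y.shape == (2, 56, 56, 96)

Because both branches scale linearly with H*W, whereas full global Self-Attention scales as (H*W)^2, a module of this shape remains affordable even at the early high-resolution stages the abstract highlights.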

Key words: Image classification, Self-Attention, Multiple scales, Transformer

CLC Number: TP301