Computer Science ›› 2024, Vol. 51 ›› Issue (2): 189-195. doi: 10.11896/jsjkx.221100218

• Computer Graphics & Multimedia •

LNG-Transformer: An Image Classification Network Based on Multi-scale Information Interaction

WANG Wenjie, YANG Yan, JING Lili, WANG Jie, LIU Yan   

  1. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China; Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China
  • Received:2022-11-25 Revised:2023-03-27 Online:2024-02-15 Published:2024-02-22
  • Corresponding author: YANG Yan (yyang@swjtu.edu.cn)
  • About author: WANG Wenjie, born in 1996, postgraduate. His main research interests include image processing and object detection. YANG Yan, born in 1964, professor, Ph.D supervisor, is a member of CCF (No.06877D). Her main research interests include multi-view learning and cluster analysis.
  • Supported by:
    National Natural Science Foundation of China (61976247).

Abstract: Owing to the strong representation capability of the Transformer's Self-Attention mechanism, many researchers have built image processing models on Self-Attention and achieved great success. However, traditional Self-Attention-based networks for image classification cannot balance global information against computational complexity, which limits the wider application of Self-Attention. This paper proposes an efficient and scalable attention module, Local Neighbor Global Self-Attention (LNG-SA), which can exchange local, neighbor, and global information at any stage. By repeatedly cascading LNG-SA modules, a new network called LNG-Transformer is constructed. The network adopts a hierarchical structure, offers excellent flexibility, and has computational complexity that grows linearly with image resolution. The properties of LNG-SA allow LNG-Transformer to exchange local, neighbor, and global information even in the early, high-resolution stages, resulting in higher efficiency and stronger learning capacity. Experimental results show that LNG-Transformer performs well on image classification tasks.
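
To make the abstract's description more concrete, the sketch below (Python/PyTorch, requiring PyTorch 2.0+ for scaled_dot_product_attention) shows one plausible reading of a window-based attention block that mixes local, neighbor, and global information. It is a minimal illustration under assumptions, not the authors' LNG-SA implementation: the class name LNGAttentionSketch, the 7x7 local window, the 3x3 neighbor-window pooling, the 7x7 global token grid, the single attention head, and the sum-based fusion are all illustrative choices not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LNGAttentionSketch(nn.Module):
    """Toy single-head attention mixing local, neighbor, and global context."""

    def __init__(self, dim, window=7, global_grid=7):
        super().__init__()
        self.window = window            # side length of a local window (assumed value)
        self.global_grid = global_grid  # side length of the pooled global token grid (assumed)
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):               # x: (B, H, W, C), H and W divisible by `window`
        B, H, W, C = x.shape
        w, nH, nW = self.window, H // self.window, W // self.window
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Local branch: ordinary self-attention inside each non-overlapping w x w window.
        def to_windows(t):               # (B, H, W, C) -> (B*nH*nW, w*w, C)
            t = t.view(B, nH, w, nW, w, C).permute(0, 1, 3, 2, 4, 5)
            return t.reshape(-1, w * w, C)
        ql, kl, vl = map(to_windows, (q, k, v))
        local = F.scaled_dot_product_attention(ql, kl, vl)

        # Neighbor branch: every window also attends to the mean token of each of the
        # 3 x 3 surrounding windows (zero padding at the border of the window grid).
        def neighbor_tokens(t):          # (B, H, W, C) -> (B*nH*nW, 9, C)
            pooled = F.adaptive_avg_pool2d(t.permute(0, 3, 1, 2), (nH, nW))  # (B, C, nH, nW)
            cols = F.unfold(pooled, kernel_size=3, padding=1)                # (B, C*9, nH*nW)
            return cols.view(B, C, 9, nH * nW).permute(0, 3, 2, 1).reshape(-1, 9, C)
        neighbor = F.scaled_dot_product_attention(ql, neighbor_tokens(k), neighbor_tokens(v))

        # Global branch: every window attends to a small grid of globally pooled tokens,
        # so the number of extra keys per window stays constant as resolution grows.
        def global_tokens(t):            # (B, H, W, C) -> (B*nH*nW, g*g, C)
            g = self.global_grid
            pooled = F.adaptive_avg_pool2d(t.permute(0, 3, 1, 2), (g, g))    # (B, C, g, g)
            pooled = pooled.flatten(2).transpose(1, 2)                       # (B, g*g, C)
            return pooled.repeat_interleave(nH * nW, dim=0)
        glob = F.scaled_dot_product_attention(ql, global_tokens(k), global_tokens(v))

        # Fuse the three scales (a simple sum here) and undo the window partition.
        out = (local + neighbor + glob).view(B, nH, nW, w, w, C)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return self.proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 56, 56, 96)                      # e.g. a first-stage feature map
    y = LNGAttentionSketch(dim=96)(x)
    print(y.shape)                                      # torch.Size([2, 56, 56, 96])

Because every window in this sketch attends to a constant number of tokens (49 local, 9 neighbor-pooled, 49 global-pooled), the total cost grows linearly with the number of windows, i.e., with image resolution, which is the property the abstract emphasizes.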

Key words: Image classification, Self-Attention, Multiple scales, Transformer

CLC Number: TP301