Computer Science, 2021, Vol. 48, Issue (1): 197-203. doi: 10.11896/jsjkx.191000135

• Computer Graphics & Multimedia •

Fine-grained Image Recognition Method Combining with Non-local and Multi-region Attention Mechanism

LIU Yang, JIN Zhong

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
    Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2019-10-21 Revised: 2020-04-05 Online: 2021-01-15 Published: 2021-01-15
  • Corresponding author: JIN Zhong (zhongjin@njust.edu.cn)
  • Author e-mail: 959590988@qq.com
  • About author: LIU Yang, born in 1995, postgraduate. His main research interests include fine-grained image recognition and object detection.
    JIN Zhong, born in 1961, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include pattern recognition and face recognition.
  • Supported by:
    National Natural Science Foundation of China (61872188, U1713208).

Abstract: Fine-grained image recognition aims to classify object subclasses at a fine-grained level. Because the differences between subclasses are very subtle, the task is highly challenging. The main difficulties for current algorithms are how to locate the discriminative parts of a fine-grained object and how to extract subtle, fine-grained features. To this end, a fine-grained recognition method combining non-local and multi-region attention mechanisms is proposed. Navigator uses only image labels to locate some discriminative regions, and it achieves good classification results by fusing global features with the features of those discriminative regions. However, Navigator still has shortcomings. First, Navigator does not consider the relationships between different positions, so the proposed algorithm combines a non-local module with Navigator to strengthen the model's perception of global information. Second, because the non-local module does not model the relationships between feature channels, a feature extraction network based on a channel attention mechanism is constructed so that the network attends to the more important feature channels. Finally, the proposed algorithm achieves recognition accuracies of 88.1%, 94.3% and 91.8% (92.0% in the Chinese abstract) on three public fine-grained image datasets, CUB-200-2011, Stanford Cars and FGVC Aircraft, respectively, a clear improvement in accuracy over Navigator.

Key words: Fine-grained image recognition, Attention mechanism, Non-local, Region localization, Feature extraction
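
The two ideas named in the abstract, relating every position to every other position (non-local) and reweighting feature channels (channel attention), can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the toy feature map and the weights are hypothetical. The learned embeddings of a real non-local block are omitted, and the channel gate is written in a squeeze-and-excitation style as an assumed stand-in for whatever channel attention module the paper actually uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local(features):
    """Simplified non-local operation on N positions x C channels.

    Each output position is its input plus an attention-weighted sum over
    ALL positions (dot-product similarity, softmax-normalised). The learned
    embeddings of a real non-local block are omitted (assumption).
    """
    out = []
    for x_i in features:
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        weights = softmax(scores)
        y_i = [sum(w * x_j[c] for w, x_j in zip(weights, features))
               for c in range(len(x_i))]
        out.append([a + b for a, b in zip(x_i, y_i)])  # residual connection
    return out


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def channel_attention(features, w_reduce, w_expand):
    """SE-style channel gating (assumed stand-in for the paper's module).

    Squeeze: average each channel over all positions.
    Excite: two small linear layers (ReLU, then sigmoid) give one gate in
    (0, 1) per channel, which rescales that channel at every position.
    """
    n = len(features)
    channels = len(features[0])
    squeezed = [sum(x[c] for x in features) / n for c in range(channels)]
    hidden = [max(0.0, h) for h in matvec(w_reduce, squeezed)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(w_expand, hidden)]
    scaled = [[g * v for g, v in zip(gates, x)] for x in features]
    return scaled, gates


# Toy feature map: 3 spatial positions, 2 channels (hypothetical values).
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_reduce = [[0.5, 0.5]]      # 2 channels -> 1 hidden unit (hypothetical)
w_expand = [[1.0], [-1.0]]   # 1 hidden unit -> 2 channel gates (hypothetical)
mixed = non_local(feats)
attended, gates = channel_attention(mixed, w_reduce, w_expand)
```

In this sketch the non-local step lets each position aggregate information from the whole feature map, while the channel gate emphasises or suppresses whole channels. In a trained network both sets of weights would be learned rather than fixed by hand.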

CLC Number:

  • TP301.6