Computer Science (计算机科学) ›› 2021, Vol. 48 ›› Issue (1): 197-203. doi: 10.11896/jsjkx.191000135

• Computer Graphics & Multimedia •

Fine-grained Image Recognition Method Combining with Non-local and Multi-region Attention Mechanism

LIU Yang, JIN Zhong

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
    Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2019-10-21 Revised: 2020-04-05 Online: 2021-01-15 Published: 2021-01-15
  • Corresponding author: JIN Zhong (zhongjin@njust.edu.cn)
  • Author e-mail: 959590988@qq.com
  • About author: LIU Yang, born in 1995, postgraduate. His main research interests include fine-grained image recognition and object detection.
    JIN Zhong, born in 1961, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. His main research interests include pattern recognition and face recognition.
  • Supported by:
    National Natural Science Foundation of China (61872188, U1713208).

Abstract: Fine-grained image recognition aims to classify subclasses of objects at a fine-grained level. Because the differences between subclasses are very subtle, the task is highly challenging. The main difficulties are how to locate the discriminative parts of a fine-grained target and how to extract subtle, fine-grained features. To this end, a fine-grained recognition method combining non-local and multi-region attention mechanisms is proposed. Navigator uses only image labels to locate discriminative regions, and achieves good classification results by fusing global features with discriminative regional features. However, Navigator still has shortcomings. First, it does not consider the relationships between different positions, so the proposed algorithm combines a non-local module with Navigator to strengthen the model's perception of global information. Second, because the non-local module does not model the relationships between feature channels, a feature extraction network based on a channel attention mechanism is constructed so that the network pays more attention to the more important feature channels. The proposed algorithm achieves recognition accuracies of 88.1%, 94.3%, and 91.8% on three public fine-grained image datasets, CUB-200-2011, Stanford Cars, and FGVC Aircraft, respectively (the Chinese-language abstract reports 92.0% for FGVC Aircraft), a clear improvement over Navigator.

Key words: Attention mechanism, Feature extraction, Fine-grained image recognition, Non-local, Regional location

CLC Number: 

  • TP301.6
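
The two mechanisms named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simplified non-local operation in which the learned embeddings are replaced by the identity (so the affinity between positions is a plain dot product), and it assumes a squeeze-and-excitation style channel attention, which the abstract does not specify. The function names and the weight matrices `w_reduce` and `w_expand` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local(features):
    """Simplified non-local operation with a residual connection.

    features: list of N positions, each a list of C channel values.
    Assumption: learned embeddings are replaced by the identity, so the
    pairwise affinity is a plain dot product between positions.
    """
    output = []
    for x_i in features:
        # Affinity of position i to every position j, normalized over j.
        scores = [sum(a * b for a, b in zip(x_i, x_j)) for x_j in features]
        weights = softmax(scores)
        # Weighted sum over all positions, then added back to the input.
        y_i = [sum(w * x_j[c] for w, x_j in zip(weights, features))
               for c in range(len(x_i))]
        output.append([a + b for a, b in zip(x_i, y_i)])
    return output


def channel_attention(features, w_reduce, w_expand):
    """Squeeze-and-excitation style channel reweighting (assumed variant).

    w_reduce (R x C) and w_expand (C x R) are hypothetical weight matrices.
    """
    n, c = len(features), len(features[0])
    # Squeeze: global average of each channel over all positions.
    squeezed = [sum(x[k] for x in features) / n for k in range(c)]
    # Excitation: bottleneck with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every position's channels by the channel gates.
    scaled = [[x[k] * gates[k] for k in range(c)] for x in features]
    return scaled, gates


# Toy input: 3 spatial positions, 2 channels.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = non_local(toy)
reweighted, gates = channel_attention(mixed, [[0.5, 0.5]], [[1.0], [-1.0]])
```

In this sketch every output position of `non_local` depends on all input positions, which is the sense in which the module adds global context, while `channel_attention` produces one gate per channel and rescales that channel at every position.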