Computer Science, 2022, Vol. 49, Issue (5): 105-112. doi: 10.11896/jsjkx.210100108

• Computer Graphics & Multimedia

Fine-grained Image Classification Based on Multi-branch Attention-augmentation

ZHANG Wen-xuan, WU Qin

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
    Jiangsu Provincial Engineering Laboratory for Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2021-01-14 Revised: 2021-04-21 Published: 2022-05-15 Online: 2022-05-06
  • Corresponding author: WU Qin (qinwu@jiangnan.edu.cn)
  • About the authors (6181914045@stu.jiangnan.edu.cn):
    ZHANG Wen-xuan, born in 1997, master candidate, is a member of China Computer Federation. His main research interests include computer vision and machine learning.
    WU Qin, born in 1978, Ph.D., associate professor, is a member of China Computer Federation. Her main research interests include computer vision and pattern recognition.
  • Supported by:
    National Natural Science Foundation of China (61972180).

Abstract: Fine-grained images show small inter-class differences and large intra-class differences. To address this, the paper proposes a multi-branch attention-augmented convolutional neural network that performs fine-grained image classification in a weakly supervised manner. A pre-trained Inception-V3 network extracts the basic image features, from which multiple local response regions are obtained and fused. On this basis, an attention mechanism applies self-constrained attention-wise cropping and self-constrained attention-wise erasing to the key regions of the image. This keeps the network from extracting features from only a single part of the object, encourages it to attend to the discriminative details of different parts, and improves the localization accuracy of the target region. In addition, a central regularization loss function is proposed to constrain the attention regions obtained during training, which further improves localization accuracy and enlarges the gap between the features of different classes. Experiments on three public benchmark datasets show that the proposed method outperforms current state-of-the-art methods.

Key words: Central regularization loss, Convolutional neural network, Fine-grained image classification, Multi-branch attention-augmentation, Weakly supervised learning

CLC number:

  • TP391
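
The attention-wise cropping and erasing mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `attention_crop_and_erase`, the `threshold` parameter, and the assumption that the attention map has the same spatial size as the image are all hypothetical choices made for illustration, and the paper's self-constraint is not modelled here.

```python
import numpy as np


def attention_crop_and_erase(image, attention, threshold=0.5):
    """Return (cropped, erased) views of `image` guided by `attention`.

    image:     array of shape (H, W, C)
    attention: array of shape (H, W); assumed to match the image size
    threshold: hypothetical cut-off on the min-max normalised attention
    """
    a = attention.astype(np.float64)
    span = a.max() - a.min()
    # Min-max normalise to [0, 1]; a flat map yields all zeros.
    norm = (a - a.min()) / span if span > 0 else np.zeros_like(a)
    mask = norm >= threshold

    # With no salient pixel, fall back to the unchanged image.
    if not mask.any():
        return image.copy(), image.copy()

    # Cropping: keep the tight bounding box around the salient pixels.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1
    cropped = image[top:bottom, left:right].copy()

    # Erasing: zero out the salient pixels so other parts must be used.
    erased = image.copy()
    erased[mask] = 0
    return cropped, erased
```

In a training loop of this kind, the cropped view would typically be resized back to the network input size and fed through the same backbone as an extra branch, while the erased view pushes the network to find evidence in other parts of the object.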