Computer Science ›› 2023, Vol. 50 ›› Issue (10): 119-125. doi: 10.11896/jsjkx.220900196
郑世杰, 王高才
ZHENG Shijie, WANG Gaocai
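Abstract: To address the high intra-class variance and low inter-class variance that make fine-grained image classification difficult, this paper proposes a multi-branch fine-grained image classification method that uses the ConvNeXt network as its backbone and applies GradCAM heatmaps for cropping and attention erasing. The method uses GradCAM to obtain the network's attention heatmap through gradient backpropagation, locates the region that contains discriminative features, and then crops and enlarges that region so that the network attends to deeper local features. Supervised contrastive learning is also introduced to enlarge inter-class differences and reduce intra-class differences. Finally, a heatmap-based attention erasing operation is applied so that the network, while still attending to the most discriminative features, also attends to other regions that are useful for classification. The proposed method achieves classification accuracies of 91.8%, 94.9%, 94.0%, and 94.4% on the CUB-200-2011, Stanford Cars, FGVC Aircraft, and Stanford Dogs datasets, respectively, outperforming many mainstream fine-grained image classification methods, and it achieves top-3 and top-1 classification accuracy on the CUB-200-2011 and Stanford Dogs datasets, respectively.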
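The cropping and erasing steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a GradCAM heatmap has already been computed and upsampled to the image size, and the function names (`heatmap_bbox`, `crop_and_zoom`, `erase_attention`), the 0.5 threshold, and the nearest-neighbour resizing are all hypothetical choices made for illustration.

```python
import numpy as np


def _normalize(heatmap):
    """Scale a heatmap to [0, 1]; a constant heatmap becomes all zeros."""
    h = heatmap.astype(np.float64)
    span = h.max() - h.min()
    if span == 0:
        return np.zeros_like(h)
    return (h - h.min()) / span


def heatmap_bbox(heatmap, threshold=0.5):
    """Bounding box (top, bottom, left, right) of the high-attention region.

    bottom and right are exclusive. Falls back to the full image when no
    pixel reaches the threshold (assumed behaviour, not from the paper).
    """
    mask = _normalize(heatmap) >= threshold
    if not mask.any():
        return 0, heatmap.shape[0], 0, heatmap.shape[1]
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


def crop_and_zoom(image, heatmap, threshold=0.5):
    """Crop the high-attention box and enlarge it back to the input size."""
    top, bottom, left, right = heatmap_bbox(heatmap, threshold)
    crop = image[top:bottom, left:right]
    height, width = image.shape[:2]
    # Nearest-neighbour upsampling via integer index maps.
    row_idx = np.arange(height) * crop.shape[0] // height
    col_idx = np.arange(width) * crop.shape[1] // width
    return crop[row_idx][:, col_idx]


def erase_attention(image, heatmap, threshold=0.5):
    """Zero out the most attended pixels so other regions must be used."""
    mask = _normalize(heatmap) >= threshold
    erased = image.copy()
    erased[mask] = 0  # a 2-D mask clears every channel of the masked pixels
    return erased


# Toy example: an 8x8 RGB image whose attention peaks in a 2x2 patch.
image = np.arange(8 * 8 * 3).reshape(8, 8, 3)
heatmap = np.zeros((8, 8))
heatmap[2:4, 4:6] = 1.0

zoomed = crop_and_zoom(image, heatmap)
erased = erase_attention(image, heatmap)
```

In a multi-branch setup of the kind the abstract describes, the zoomed image and the erased image would each be fed to the network as additional inputs alongside the original; how the branches are combined and weighted is not specified in the abstract and is left out here.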