Computer Science, 2023, Vol. 50, Issue (10): 119-125. doi: 10.11896/jsjkx.220900196

• Computer Graphics & Multimedia •

Study on Fine-grained Image Classification Based on ConvNeXt Heatmap Localization and Contrastive Learning

ZHENG Shijie, WANG Gaocai

  1. School of Computer and Electronic Information, Guangxi University, Nanning 530004, China
  • Received: 2022-09-20 Revised: 2022-12-06 Online: 2023-10-10 Published: 2023-10-10
  • Corresponding author: WANG Gaocai (wanggcgx@163.com)
  • About author: (2371363651@qq.com) ZHENG Shijie, born in 1999, postgraduate candidate. His main research interests include fine-grained image classification and image segmentation. WANG Gaocai, born in 1976, Ph.D., professor, Ph.D. supervisor, is a senior member of China Computer Federation. His main research interests include computer networks, performance evaluation and network security.
  • Supported by:
    National Natural Science Foundation of China (62062007).

Abstract: To address the challenges of high intra-class differences and low inter-class differences in fine-grained image classification, a multi-branch fine-grained image classification method is proposed that uses the ConvNeXt network as its backbone and GradCAM heatmaps for cropping and attention erasure. The method uses GradCAM to obtain the network's attention heatmap through gradient backpropagation, locates the region containing discriminative features, and crops and enlarges that region so that the network focuses on deeper local features. At the same time, supervised contrastive learning is introduced to enlarge inter-class differences and reduce intra-class differences. Finally, a heatmap attention erasure operation is performed so that the network, while still attending to the most discriminative features, can also attend to other regions useful for classification. The proposed method achieves classification accuracies of 91.8%, 94.9%, 94.0%, and 94.4% on the CUB-200-2011, Stanford Cars, FGVC Aircraft, and Stanford Dogs datasets, respectively, outperforming many mainstream fine-grained image classification methods, and it achieves top-3 and top-1 classification accuracy on the CUB-200-2011 and Stanford Dogs datasets, respectively.
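The cropping and erasure steps described in the abstract can be illustrated with a minimal sketch. It assumes the attention heatmap has already been computed (for example by GradCAM) and is given as a small 2D grid of scores. The function names, the 0.5 threshold, and the rule of thresholding a min-max normalized heatmap are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple

Heatmap = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def normalize(heatmap: Heatmap) -> Heatmap:
    """Min-max normalize heatmap scores to [0, 1]."""
    lo = min(min(row) for row in heatmap)
    hi = max(max(row) for row in heatmap)
    span = hi - lo
    if span == 0:
        # A flat heatmap carries no localization signal.
        return [[0.0 for _ in row] for row in heatmap]
    return [[(v - lo) / span for v in row] for row in heatmap]


def crop_box(heatmap: Heatmap, threshold: float = 0.5) -> Box:
    """Bounding box around all cells whose normalized score >= threshold.

    The threshold value is an assumption made for this sketch.
    """
    norm = normalize(heatmap)
    rows = [r for r, row in enumerate(norm) if any(v >= threshold for v in row)]
    cols = [c for row in norm for c, v in enumerate(row) if v >= threshold]
    if not rows:
        # Nothing salient: fall back to the whole map.
        return (0, 0, len(heatmap), len(heatmap[0]))
    return (min(rows), min(cols), max(rows) + 1, max(cols) + 1)


def scale_box(box: Box, map_size: Tuple[int, int], image_size: Tuple[int, int]) -> Box:
    """Map a heatmap-grid box to image pixel coordinates (floor division)."""
    (h, w), (big_h, big_w) = map_size, image_size
    top, left, bottom, right = box
    return (top * big_h // h, left * big_w // w, bottom * big_h // h, right * big_w // w)


def erase_mask(heatmap: Heatmap, threshold: float = 0.5) -> List[List[int]]:
    """Binary mask: 0 where attention is high (to be erased), 1 elsewhere."""
    norm = normalize(heatmap)
    return [[0 if v >= threshold else 1 for v in row] for row in norm]


if __name__ == "__main__":
    demo = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 2.0, 0.0],
        [0.0, 3.0, 4.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    box = crop_box(demo)
    print(box, scale_box(box, (4, 4), (448, 448)))
    print(erase_mask(demo))
```

In a pipeline of this kind, the scaled box would select the image region that is cropped and enlarged for the local branch, while the mask, upsampled to image size and multiplied with the input, would suppress the most salient region so that another branch has to rely on the remaining evidence.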

Key words: Fine-grained image classification, Attention, Supervised contrastive learning, Heatmap, Multi-branch
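For the supervised contrastive learning keyword, the sketch below shows one common form of the loss: each anchor is pulled toward the other samples that share its label and pushed away from the rest of the batch. It is a plain-Python illustration of the widely used formulation by Khosla et al. The abstract does not specify the exact loss, so the temperature of 0.1, the skipping of anchors without positives, and the function names are all assumptions.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def supcon_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """Supervised contrastive loss averaged over anchors that have positives."""
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without same-class samples contributes nothing
        # Scaled cosine similarity to every other sample in the batch.
        logits = {a: dot(z[i], z[a]) / temperature for a in range(n) if a != i}
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supcon_loss(feats, [0, 0, 1, 1]))  # labels agree with the clusters
    print(supcon_loss(feats, [0, 1, 0, 1]))  # labels cut across the clusters
```

The loss is small when samples of the same class lie close together in embedding space and large when same-class samples are far apart, which matches the stated aim of enlarging inter-class differences while reducing intra-class differences.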

CLC Number:

  • TP391