Study on Fine-grained Image Classification Based on ConvNeXt Heatmap Localization and Contrastive Learning

doi:10.11896/jsjkx.220900196

Abstract

Fine-grained image classification is difficult because intra-class variation is large and inter-class variation is small. To address this, a multi-branch fine-grained image classification method is proposed that builds on the ConvNeXt network and uses Grad-CAM heatmaps for cropping and attention erasure. The method uses Grad-CAM to obtain the network's attention heatmap through gradient backpropagation, locates the region containing discriminative features, and crops and enlarges that region so that the network focuses on deeper local features. At the same time, supervised contrastive learning is introduced to enlarge inter-class differences and reduce intra-class differences. Finally, a heatmap-based attention erasure operation lets the network attend to other regions useful for classification in addition to the most discriminative ones. The proposed method achieves classification accuracies of 91.8%, 94.9%, 94.0%, and 94.4% on the CUB-200-2011, Stanford Cars, FGVC Aircraft, and Stanford Dogs datasets, respectively, outperforming many mainstream fine-grained image classification methods. Its accuracy ranks in the top 3 on CUB-200-2011 and top 1 on Stanford Dogs.
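The cropping and erasure steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the heatmap has already been computed and resized to the image resolution, it represents images as plain nested lists, and the function names (`hot_box`, `crop_and_enlarge`, `erase`), the threshold, and the nearest-neighbour resizing are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]  # a single-channel image or heatmap as nested lists


def normalize(heatmap: Grid) -> Grid:
    """Min-max scale a heatmap to the range [0, 1]."""
    lo = min(min(row) for row in heatmap)
    hi = max(max(row) for row in heatmap)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(v - lo) / span for v in row] for row in heatmap]


def hot_box(heatmap: Grid, thresh: float) -> Tuple[int, int, int, int]:
    """Inclusive bounding box (top, left, bottom, right) of cells >= thresh.

    The threshold is an assumed hyperparameter, not a value from the paper.
    Falls back to the whole map if no cell reaches the threshold.
    """
    norm = normalize(heatmap)
    n_cols = len(norm[0])
    rows = [r for r, row in enumerate(norm) if any(v >= thresh for v in row)]
    cols = [c for c in range(n_cols) if any(row[c] >= thresh for row in norm)]
    if not rows:
        return 0, 0, len(norm) - 1, n_cols - 1
    return rows[0], cols[0], rows[-1], cols[-1]


def crop_and_enlarge(image: Grid, box: Tuple[int, int, int, int],
                     out_h: int, out_w: int) -> Grid:
    """Crop the box and resize it to (out_h, out_w) by nearest-neighbour sampling."""
    top, left, bottom, right = box
    h, w = bottom - top + 1, right - left + 1
    return [
        [image[top + (i * h) // out_h][left + (j * w) // out_w]
         for j in range(out_w)]
        for i in range(out_h)
    ]


def erase(image: Grid, heatmap: Grid, thresh: float, fill: float = 0.0) -> Grid:
    """Replace pixels whose normalized heat is >= thresh with a fill value."""
    norm = normalize(heatmap)
    return [
        [fill if norm[r][c] >= thresh else image[r][c]
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]
```

In this sketch the cropped and enlarged region would feed a local branch, while the erased image would feed a branch that has to rely on the remaining regions. How the branches are combined, and how the supervised contrastive loss is applied, is not specified in the abstract and is left out here.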

Key words: Fine-grained image classification, Attention, Supervised contrastive learning, Heatmap, Multi-branch

CLC Number: TP391

ZHENG Shijie, WANG Gaocai. Study on Fine-grained Image Classification Based on ConvNeXt Heatmap Localization and Contrastive Learning[J]. Computer Science, 2023, 50(10): 119-125.

