Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230500196-7. doi: 10.11896/jsjkx.230500196

• Image Processing & Multimedia Technology •

ConvNeXt Feature Extraction Study for Image Data

YANG Pengyue, WANG Feng, WEI Wei   

  1. School of Computer & Information Technology, Shanxi University, Taiyuan 030006, China
  • Published: 2024-06-06
  • About author: YANG Pengyue, born in 1999, postgraduate. His main research interests include image processing, data mining, and machine learning.
    WANG Feng, born in 1984, Ph.D., is a member of CCF (No.36494M). Her main research interests include data mining, machine learning, and granular computing.
  • Supported by:
    National Natural Science Foundation of China (62276158) and Research Project Supported by Shanxi Scholarship Council of China (2021-007).

Abstract: Convolutional neural networks have achieved strong results in computer vision tasks such as object detection and segmentation, both of which depend on the quality of the extracted feature information. Problems such as ambiguous data and varying object shapes pose great challenges for feature extraction. A traditional convolutional structure can only learn contextual information from neighboring spatial locations of the feature map and cannot extract global information, while models based on the self-attention mechanism, although they enjoy a larger receptive field and establish global dependencies, suffer from high computational complexity and require large amounts of data. Therefore, this paper proposes a model combining CNN and LSTM, which enlarges the local receptive field while better integrating the global information of image data. It uses ConvNeXt-T as the backbone network, addresses the problem of varying object shapes by splicing convolutional kernels of different sizes to fuse multi-scale features, and aggregates bidirectional long short-term memory (BiLSTM) networks in both the horizontal and vertical directions, focusing on the interaction between global and local information. Experiments are conducted on the publicly available CIFAR-10, CIFAR-100, and Tiny ImageNet datasets for image classification, and the accuracy of the proposed network improves by 3.18%, 2.91%, and 1.03% on the three datasets, respectively, compared with the base model ConvNeXt-T. The experiments demonstrate that, relative to the base model, the improved ConvNeXt-T network is substantially better in terms of both parameter count and accuracy, and can extract more effective feature information.
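To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of its two key ideas: splicing convolutions of different kernel sizes to fuse multi-scale features, and aggregating bidirectional LSTMs over the horizontal and vertical directions of a feature map to inject global context. This is not the authors' released code; the module names (MultiScaleConv, BiLSTM2D) and all hyperparameters are illustrative assumptions.

```python
# A minimal sketch, NOT the paper's implementation: module names and
# hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Fuse multi-scale features by splicing (concatenating) the outputs
    of convolutions with different kernel sizes."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7)):
        super().__init__()
        branch_ch = out_ch // len(kernel_sizes)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2)
            for k in kernel_sizes
        ])

    def forward(self, x):
        # Each branch preserves the spatial size; concatenation along the
        # channel axis restores the requested output width.
        return torch.cat([branch(x) for branch in self.branches], dim=1)

class BiLSTM2D(nn.Module):
    """Aggregate bidirectional LSTMs over the horizontal (row) and
    vertical (column) directions of a feature map for global context."""
    def __init__(self, channels, hidden):
        super().__init__()
        self.row_lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.col_lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        # A 1x1 conv projects the concatenated states back to `channels`.
        self.proj = nn.Conv2d(4 * hidden, channels, kernel_size=1)

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)   # each row is a sequence
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)   # each column is a sequence
        r, _ = self.row_lstm(rows)                          # (B*H, W, 2*hidden)
        v, _ = self.col_lstm(cols)                          # (B*W, H, 2*hidden)
        r = r.reshape(b, h, w, -1).permute(0, 3, 1, 2)      # (B, 2*hidden, H, W)
        v = v.reshape(b, w, h, -1).permute(0, 3, 2, 1)      # (B, 2*hidden, H, W)
        return self.proj(torch.cat([r, v], dim=1))          # (B, C, H, W)

# Quick shape check on a hypothetical ConvNeXt-T stage output.
x = torch.randn(2, 96, 28, 28)
y = BiLSTM2D(96, hidden=48)(MultiScaleConv(96, 96)(x))
print(y.shape)  # torch.Size([2, 96, 28, 28])
```

In this sketch, every row of the feature map is scanned left-to-right and right-to-left and every column top-to-bottom and bottom-to-top, so each position receives information from its entire row and column, which is one plausible reading of the abstract's "aggregating BiLSTMs from both horizontal and vertical directions."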

Key words: Feature extraction, Local receptive field, ConvNeXt-T, Multi-scale features, Bidirectional long short-term memory network

CLC Number: TP391