Computer Science ›› 2024, Vol. 51 ›› Issue (4): 254-261. doi: 10.11896/jsjkx.230200140

• Computer Graphics & Multimedia •

Global Covariance Pooling Based on Fast Maximum Singular Value Power Normalization

ZENG Ruiren, XIE Jiangtao, LI Peihua   

  1. School of Information and Communication Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Received: 2023-02-20 Revised: 2023-05-28 Online: 2024-04-15 Published: 2024-04-10
  • Supported by:
    National Natural Science Foundation of China (61971086).

Abstract: Recent research shows that matrix normalization plays a key role in global covariance pooling: it helps generate more discriminative representations and thus improves performance on image recognition tasks. Among normalization methods, matrix structure-wise normalization makes full use of the geometric structure of the covariance matrix and therefore achieves better performance. However, structure-wise normalization generally depends on singular value decomposition (SVD) or eigenvalue decomposition (EIG), which has a high computational cost and limits the parallel computing ability of GPUs, making it a computational bottleneck. Iterative matrix square root normalization (iSQRT) normalizes the covariance matrix with Newton-Schulz iteration and is faster than SVD- and EIG-based methods. However, its time and memory costs increase significantly as the number of iterations and the matrix dimension grow, and it cannot perform normalization with a general power, which limits its scope of application. To solve these problems, a covariance matrix normalization method based on the maximum singular value power is proposed: the covariance matrix is divided by a power of its maximum singular value, which is estimated using only the iterative power method. Detailed ablation experiments show that, compared with iSQRT, the proposed method is faster and uses less memory, has lower time and space complexity, and achieves comparable or better performance. The proposed method achieves state-of-the-art performance on a large-scale image classification dataset and on fine-grained visual recognition datasets, including Aircraft, Cars and Indoor67, where its accuracy is 90.7%, 93.3% and 83.9%, respectively. These results demonstrate the robustness and generalization ability of the proposed method.
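The normalization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the exponent `alpha=0.5`, the iteration count, the fixed start vector and the division by N in the covariance are all assumptions made for illustration, and plain Python lists are used only to keep the example self-contained (a practical version would run on batched GPU tensors).

```python
import math


def covariance(features):
    """Covariance of N feature vectors (rows) with d channels each.

    Assumption: normalizes by N rather than N - 1.
    """
    n, d = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in features]
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]


def max_singular_value(matrix, num_iters=50):
    """Estimate the largest singular value of a symmetric PSD matrix.

    Uses the iterative power method. Assumption: the fixed start vector
    is not orthogonal to the dominant eigenvector.
    """
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d  # unit-length start vector
    sigma = 0.0
    for _ in range(num_iters):
        # One power step: w = A v, then renormalize.
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        sigma = math.sqrt(sum(x * x for x in w))  # ||A v|| with ||v|| = 1
        if sigma == 0.0:
            return 0.0
        v = [x / sigma for x in w]
    return sigma


def msv_power_normalize(cov, alpha=0.5, num_iters=50, eps=1e-12):
    """Divide cov by the alpha-th power of its estimated largest singular value.

    alpha and eps are hypothetical defaults, not values taken from the paper.
    """
    sigma = max_singular_value(cov, num_iters)
    scale = max(sigma, eps) ** alpha  # eps guards against division by zero
    return [[x / scale for x in row] for row in cov]


if __name__ == "__main__":
    cov = covariance([[1.0, 2.0], [3.0, 6.0]])
    print(msv_power_normalize(cov, alpha=0.5))
```

Each power step costs a single matrix-vector product, whereas Newton-Schulz iteration needs matrix-matrix products; this is consistent with the lower time and memory cost claimed above. Because the exponent is applied to a scalar rather than to the matrix, any power can be used.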

Key words: Image classification, Global covariance pooling, Matrix power normalization, Maximum singular value power normalization

CLC Number: 

  • TP391