Computer Science (计算机科学), 2024, Vol. 51, Issue (4): 254-261. doi: 10.11896/jsjkx.230200140

• Computer Graphics & Multimedia •

Global Covariance Pooling Based on Fast Maximum Singular Value Power Normalization

ZENG Ruiren, XIE Jiangtao, LI Peihua

  1. School of Information and Communication Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
  • Received: 2023-02-20  Revised: 2023-05-28  Online: 2024-04-15  Published: 2024-04-10
  • Corresponding author: LI Peihua (peihuali@dlut.edu.cn)
  • About author: ZENG Ruiren (coke990921@mail.dlut.edu.cn)
  • Supported by:
    National Natural Science Foundation of China (61971086).


Abstract: Recent research shows that matrix normalization plays a key role in global covariance pooling, helping to generate more discriminative representations and thus improving performance on image recognition tasks. Among the different normalization methods, structure-wise matrix normalization makes full use of the geometric structure of the covariance matrix and therefore achieves better performance. However, structure-wise normalization generally depends on singular value decomposition (SVD) or eigenvalue decomposition (EIG), whose high computational cost cannot fully exploit the parallel computing ability of GPUs and thus becomes a computational bottleneck. Iterative matrix square root normalization (iSQRT) uses the Newton-Schulz iteration to normalize the covariance matrix and is faster than SVD- and EIG-based methods. However, as the number of iterations and the dimensionality grow, the time and memory costs of iSQRT increase significantly; moreover, the method cannot perform normalization with a general power, which limits its scope of application. To address these problems, a covariance matrix normalization method based on the power of the maximum singular value is proposed: the covariance matrix is divided by a power of its maximum singular value, and the computation only requires the iterative power method to estimate that maximum singular value. Detailed ablation experiments show that, compared with iSQRT, the proposed method is faster and occupies less memory, outperforming iSQRT in both time and space complexity while achieving comparable or better accuracy. The proposed method achieves state-of-the-art performance on a large-scale image classification dataset and on fine-grained visual recognition datasets, with accuracies of 90.7%, 93.3% and 83.9% on Aircraft, Cars and Indoor67, respectively, which fully demonstrates its robustness and generalization ability.
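To illustrate the idea described in the abstract, the sketch below estimates the maximum singular value of a covariance matrix by power iteration and divides the matrix by a power of that value. It is a minimal NumPy reconstruction under stated assumptions, not the authors' implementation: the function names `max_singular_value` and `msv_power_normalize`, the exponent `alpha=0.5`, and the iteration count are illustrative choices.

```python
import numpy as np

def max_singular_value(A, num_iters=100):
    """Estimate the largest singular value of a symmetric PSD matrix A.

    For a covariance matrix (symmetric positive semi-definite), the largest
    singular value equals the largest eigenvalue, so repeatedly applying
    v <- A v / ||A v|| converges to the dominant eigenvector; the Rayleigh
    quotient v^T A v then approximates sigma_max.
    """
    n = A.shape[1]
    v = np.ones(n) / np.sqrt(n)          # deterministic starting vector
    for _ in range(num_iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return float(v @ A @ v)              # Rayleigh quotient ~ sigma_max

def msv_power_normalize(A, alpha=0.5, num_iters=100):
    """Divide A by a power of its maximum singular value (illustrative alpha)."""
    sigma_max = max_singular_value(A, num_iters)
    return A / (sigma_max ** alpha)

# Toy example: a small covariance matrix built from random features.
X = np.random.RandomState(0).randn(64, 8)
cov = X.T @ X / X.shape[0]
normed = msv_power_normalize(cov, alpha=0.5)
```

Only matrix-vector products are needed, which is why this style of normalization avoids the SVD/EIG bottleneck noted above; with `alpha=1.0` the normalized matrix has spectral norm approximately 1.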

Key words: Image classification, Global covariance pooling, Matrix power normalization, Maximum singular value power normalization

CLC number: TP391