Computer Science ›› 2024, Vol. 51 ›› Issue (4): 254-261. doi: 10.11896/jsjkx.230200140
曾睿仁, 谢江涛, 李培华
ZENG Ruiren, XIE Jiangtao, LI Peihua
Abstract: Recent studies have shown that matrix normalization plays a key role in global covariance pooling, helping to produce more discriminative representations and thereby improving performance on image recognition tasks. Among the various matrix normalization methods, structured matrix normalization can fully exploit the geometric structure of the covariance matrix and therefore achieves better performance. However, structured normalization generally relies on computationally expensive singular value decomposition (SVD) or eigenvalue decomposition (EIG), which cannot fully utilize the parallel computing power of GPUs and thus becomes a computational bottleneck. Iterative matrix square root normalization (iSQRT) normalizes the covariance matrix via Newton-Schulz iterations and is faster than SVD- and EIG-based methods. However, as the number of iterations and the matrix dimension grow, the time and memory overhead of iSQRT increase significantly; moreover, the method cannot perform normalization with arbitrary power exponents, which limits its range of application. To remedy these shortcomings of iSQRT, this paper proposes a covariance matrix normalization method based on the power of the maximum singular value. The method divides the covariance matrix by a power of its maximum singular value, and the computation only requires the power iteration method to obtain the matrix's maximum singular value. Detailed ablation experiments show that, compared with iSQRT, the proposed method is faster and consumes less GPU memory, outperforming iSQRT in both time and space complexity while delivering comparable or better accuracy. The proposed method achieves leading performance on large-scale image classification and fine-grained recognition benchmarks, reaching 90.7%, 93.3%, and 83.9% on Aircraft, Cars, and Indoor67 respectively, which fully validates its robustness and generalization ability.
CLC number:
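The normalization described in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration (function names, the iteration count, and the exponent `alpha` are illustrative assumptions, not the paper's reference implementation): power iteration estimates the maximum singular value of the covariance matrix, which for a symmetric positive semi-definite matrix coincides with its largest eigenvalue, and the matrix is then divided by a power of that value.

```python
import numpy as np

def max_singular_value(C, num_iters=100, seed=0):
    """Estimate the largest singular value of a symmetric PSD matrix
    via power iteration (for such matrices it equals the largest
    eigenvalue, so no SVD/EIG is needed)."""
    v = np.random.default_rng(seed).standard_normal(C.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = C @ v
        v /= np.linalg.norm(v)
    # Rayleigh quotient gives the eigenvalue estimate for the
    # converged dominant eigenvector.
    return float(v @ C @ v)

def normalize_by_max_sv_power(C, alpha=0.5, num_iters=100):
    """Divide the covariance matrix by a power of its maximum
    singular value; alpha is a free exponent (alpha = 0.5 mimics
    square-root-style scaling)."""
    sigma_max = max_singular_value(C, num_iters)
    return C / (sigma_max ** alpha)
```

Unlike Newton-Schulz iterations, each step here is a single matrix-vector product, so the cost per iteration is quadratic rather than cubic in the matrix dimension, which is consistent with the time and memory advantage over iSQRT claimed in the abstract.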