Computer Science, 2023, Vol. 50, Issue (6A): 220300092-6. doi: 10.11896/jsjkx.220300092
ZHANG Shunyao1,2,3, LI Huawang1,2,3, ZHANG Yonghe1,3, WANG Xinyu1,3, DING Guopeng1,3
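Abstract: In recent years, deep learning methods have come to dominate content-based image retrieval. To improve the features extracted by the backbone network, so that the network produces more discriminative image descriptors, this paper proposes ICSA (Independent Channel-wise and Spatial Attention), an attention module that is independent of its input features. The main difference from other attention mechanisms is that ICSA's attention weights remain fixed regardless of the input features, whereas conventional attention modules derive their attention by processing the input features. ICSA's model is therefore much leaner. Its parameters occupy only 6.7 kB, which is 5.2% of the size of SENet and 2.6% of that of CBAM. Its running time is essentially the same as SENet's and 14.9% of CBAM's. ICSA's attention consists of two parts, channel attention and spatial attention, which store weights along the channel and spatial dimensions of the input feature, respectively. Experiments on the Pittsburgh dataset show that, across different backbone networks, adding the ICSA module improves Recall@1 by 0.1% to 2.4%.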
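The abstract describes ICSA only at a high level. The NumPy sketch that follows illustrates the one idea it states explicitly. The channel and spatial attention weights are stored parameters rather than values computed from the incoming feature map, so every input is reweighted in the same way. Everything beyond that idea is an assumption made for illustration: the class name `InputIndependentAttention`, the sigmoid gating, the multiplicative channel-then-spatial reweighting, and the fixed H × W spatial map, which ties the module to one feature-map size. Only the forward pass is shown. In a real network these weights would be trained together with the backbone, for example as parameters of a deep learning framework module.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function that squashes raw weights into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


class InputIndependentAttention:
    """Hypothetical sketch of an ICSA-style module (forward pass only).

    Assumptions (not taken from the paper): channel weights form a length-C
    vector, spatial weights form an H x W map, both pass through a sigmoid,
    and they rescale the feature map multiplicatively.
    """

    def __init__(self, channels: int, height: int, width: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Stored parameters: they do NOT depend on any input feature map.
        self.channel_logits = rng.normal(size=channels).astype(np.float32)
        self.spatial_logits = rng.normal(size=(height, width)).astype(np.float32)

    def num_parameters(self) -> int:
        # C channel weights plus H * W spatial weights.
        return self.channel_logits.size + self.spatial_logits.size

    def attention_weights(self):
        # The same weights are returned for every input, unlike SE/CBAM,
        # which compute attention from the incoming features.
        channel = sigmoid(self.channel_logits)[:, None, None]  # shape (C, 1, 1)
        spatial = sigmoid(self.spatial_logits)[None, :, :]     # shape (1, H, W)
        return channel, spatial

    def __call__(self, features: np.ndarray) -> np.ndarray:
        """Reweight an (N, C, H, W) feature map along channels and space."""
        channel, spatial = self.attention_weights()
        # Broadcasting applies the same weights to every sample in the batch.
        return features * channel * spatial


# Example: a hypothetical 512-channel feature map with a 30 x 40 spatial grid.
module = InputIndependentAttention(channels=512, height=30, width=40)
print(module.num_parameters())             # 1712 (= 512 + 30 * 40)
print(module.num_parameters() * 4 / 1024)  # 6.6875 (KiB when stored as float32)
```

The example shape is hypothetical. It is roughly what the last convolutional layer of VGG16 (stride 16) produces for a 480 × 640 input, but the abstract does not state ICSA's actual configuration. The example only shows that a module of this kind needs C + H·W parameters and no layers that process the input. This is why its footprint can be much smaller than that of SENet or CBAM, which contain learned layers for computing attention from the features they receive.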