Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 220300092-6. doi: 10.11896/jsjkx.220300092

• Image Processing & Multimedia Technology •

Image Retrieval Based on Independent Attention Mechanism

ZHANG Shunyao1,2,3, LI Huawang1,2,3, ZHANG Yonghe1,3, WANG Xinyu1,3, DING Guopeng1,3   

  1 Innovation Academy for Microsatellites of Chinese Academy of Sciences,Shanghai 201210,China;
    2 ShanghaiTech University,Shanghai 201210,China;
    3 University of Chinese Academy of Sciences,Beijing 100094,China
  • Online:2023-06-10 Published:2023-06-12
  • About author:ZHANG Shunyao,born in 1996,postgraduate.His main research interests include content-based image retrieval and pose estimation. LI Huawang,born in 1973,Ph.D,professor,Ph.D supervisor.His main research interests include digital signal processing and computer science.

Abstract: In recent years,deep learning methods have taken a dominant position in the field of content-based image retrieval.To improve the features extracted by off-the-shelf backbones and enable the network to produce more discriminative image descriptors,an attention module called ICSA(independent channel-wise and spatial attention),whose attention weights are independent of the features input into the module,is proposed.The attention weights of ICSA remain the same when the input features change,whereas other attention mechanisms usually compute their weights from the input features;this is the main difference between ICSA and other attention modules.This property also allows the module to be quite small(only 6.7 kB,5.2% of the size of SENet and 2.6% of the size of CBAM) and relatively fast(similar to SENet in speed and 14.9% of the runtime of CBAM).The attention of ICSA is divided into two parts,channel-wise attention and spatial attention,whose weights are stored along orthogonal directions.Experiments on the Pittsburgh dataset show that ICSA improves Recall@1 by 0.1% to 2.4% with different backbones.
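As a rough illustration of the idea above(a minimal sketch only,not the paper's implementation),the following PyTorch code shows what an input-independent channel-wise and spatial attention module can look like.The class name,parameter shapes and sigmoid gating are assumptions made for this example;the key point is that the attention weights are learnable parameters of shape (C,) and (H,W) rather than the output of a sub-network conditioned on the input,which is why such a module can remain only a few kilobytes in size.

import torch
import torch.nn as nn

class IndependentChannelSpatialAttention(nn.Module):
    """Input-independent attention: the weights are plain learnable parameters."""
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # Channel weights (C,) and spatial weights (H,W) are stored along
        # orthogonal directions, so the parameter count is only C + H*W.
        self.channel_weight = nn.Parameter(torch.zeros(channels))
        self.spatial_weight = nn.Parameter(torch.zeros(height, width))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B,C,H,W). Neither attention map depends on x, so the weights
        # stay fixed at inference time regardless of the input image.
        c_att = torch.sigmoid(self.channel_weight).view(1, -1, 1, 1)
        s_att = torch.sigmoid(self.spatial_weight).view(1, 1, *self.spatial_weight.shape)
        return x * c_att * s_att

# Usage: re-weight backbone features before aggregating them into an image descriptor.
feat = torch.randn(2, 512, 30, 40)          # e.g. conv5 features of a VGG-16 backbone
icsa = IndependentChannelSpatialAttention(512, 30, 40)
enhanced = icsa(feat)                       # same shape as feat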

Key words: Content-based image retrieval, Attention mechanism, Feature enhancement

CLC Number: TP391

References:
[1]LEW M S,SEBE N,DJERABA C,et al.Content-based multimedia information retrieval[J].ACM Transactions on Multimedia Computing,Communications,and Applications,2006,2(1):1-19.
[2]SMEULDERS A W M,WORRING M,SANTINI S,et al.Content-based image retrieval at the end of the early years[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(12):1349-1380.
[3]CHANG S K,HSU A.Image information systems:where do we go from here?[J].IEEE Transactions on Knowledge and Data Engineering,1992,4(5):431-442.
[4]SIVIC J,ZISSERMAN A.Video Google:A text retrieval approach to object matching in videos[C]//IEEE International Conference on Computer Vision.IEEE Computer Society,2003:1470-1477.
[5]FEI-FEI L,PERONA P.A Bayesian hierarchical model for learning natural scene categories[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition(CVPR’05).IEEE,2005:524-531.
[6]LOWE D G.Distinctive image features from scale-invariant keypoints[J].International Journal of Computer Vision,2004,60(2):91-110.
[7]JÉGOU H,DOUZE M,SCHMID C,et al.Aggregating local descriptors into a compact image representation[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.IEEE,2010:3304-3311.
[8]PERRONNIN F,SÁNCHEZ J,MENSINK T.Improving the Fisher kernel for large-scale image classification[C]//European Conference on Computer Vision.Springer.2010:143-156.
[9]KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks[J].Advances in Neural Information Processing Systems,2012,60(6):84-90.
[10]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[11]HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[12]DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16x16 words:Transformers for image recognition at scale[J].arXiv:2010.11929,2020.
[13]BABENKO A,SLESAREV A,CHIGORIN A,et al.Neural codes for image retrieval[C]//European Conference on Computer Vision.Springer.2014:584-599.
[14]LAI H,PAN Y,LIU Y,et al.Simultaneous feature learning and hash coding with deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:3270-3278.
[15]NOROUZI M,FLEET D J,SALAKHUTDINOV R R.Hamming distance metric learning[J].Advances in Neural Information Processing Systems,2012,25:1061-1069.
[16]ZHANG R,LIN L,ZHANG R,et al.Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification[J].IEEE Transactions on Image Processing,2015,24(12):4766-4779.
[17]ARANDJELOVIC R,GRONAT P,TORII A,et al.NetVLAD:CNN architecture for weakly supervised place recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:5297-5307.
[18]ONG E J,HUSAIN S,BOBER M.Siamese network of deep fisher-vector descriptors for image retrieval[J].arXiv:1702.00338,2017.
[19]RADENOVIĆ F,TOLIAS G,CHUM O.CNN image retrieval learns from BoW:Unsupervised fine-tuning with hard examples[C]//European Conference on Computer Vision.Springer.2016:3-20.
[20]BROWN A,XIE W,KALOGEITON V,et al.Smooth-ap:Smoothing the path towards large-scale image retrieval[C]//European Conference on Computer Vision.Springer,2020:677-694.
[21]BABENKO A,LEMPITSKY V.Aggregating local deep features for image retrieval[C]//Proceedings of the IEEE International Conference on Computer Vision.2015:1269-1277.
[22]KALANTIDIS Y,MELLINA C,OSINDERO S.Cross-dimensional weighting for aggregated deep convolutional features[C]//European Conference on Computer Vision.Springer,2016:685-701.
[23]ITTI L,KOCH C,NIEBUR E.A model of saliency-based visual attention for rapid scene analysis[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1998,20(11):1254-1259.
[24]MNIH V,HEESS N,GRAVES A.Recurrent models of visual attention[J].Advances in Neural Information Processing Systems,2014,27:2204-2212.
[25]JADERBERG M,SIMONYAN K,ZISSERMAN A.Spatial transformer networks[J].Advances in Neural Information Processing Systems,2015,28.
[26]HU J,SHEN L,SUN G.Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:7132-7141.
[27]WOO S,PARK J,LEE J Y,et al.CBAM:Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision(ECCV).2018:3-19.
[28]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[J].Advances in Neural Information Processing Systems,2017,2:6000-6010.
[29]WANG X,GIRSHICK R,GUPTA A,et al.Non-local neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:7794-7803.
[30]GUO M H,XU T X,LIU J J,et al.Attention Mechanisms in Computer Vision:A Survey[J].arXiv:2111.07624,2021.
[31]BALNTAS V,RIBA E,PONSA D,et al.Learning local feature descriptors with triplets and shallow convolutional neural networks[C]//BMVC.2016.
[32]SUTSKEVER I,MARTENS J,DAHL G,et al.On the importance of initialization and momentum in deep learning[C]//International Conference on Machine Learning.PMLR.2013:1139-1147.
[33]VAN DER MAATEN L,HINTON G.Visualizing data using t-SNE[J].Journal of Machine Learning Research,2008,9(11):2579-2605.