Computer Science ›› 2023, Vol. 50 ›› Issue (6A): 220300092-6. doi: 10.11896/jsjkx.220300092

• Image Processing & Multimedia Technology •

Image Retrieval Based on Independent Attention Mechanism

ZHANG Shunyao1,2,3, LI Huawang1,2,3, ZHANG Yonghe1,3, WANG Xinyu1,3, DING Guopeng1,3   

  1 Innovation Academy for Microsatellites of Chinese Academy of Sciences,Shanghai 201210,China;
    2 ShanghaiTech University,Shanghai 201210,China;
    3 University of Chinese Academy of Sciences,Beijing 100094,China
  • Online:2023-06-10 Published:2023-06-12
  • About author:ZHANG Shunyao,born in 1996,postgraduate.His main research interests include content-based image retrieval and pose estimation. LI Huawang,born in 1973,Ph.D,professor,Ph.D supervisor.His main research interests include digital signal processing and computer science.

Abstract: In recent years,deep learning methods have taken a dominant position in the field of content-based image retrieval.To improve the features extracted by off-the-shelf backbones and enable the network to produce more discriminative image descriptors,an attention module called ICSA(independent channel-wise and spatial attention),whose attention weights are independent of the features input into the module,is proposed.The attention weights of ICSA remain the same when the input features change,whereas other attention mechanisms usually compute their weights from the input features;this is the main difference between ICSA and other attention modules.This property also allows the module to be quite small(only 6.7 kB,5.2% of the size of SENet and 2.6% of the size of CBAM) and relatively fast(similar to SENet in speed and 14.9% of the runtime of CBAM).The attention of ICSA is divided into two parts,channel-wise attention and spatial attention,whose weights are stored along orthogonal directions.Experiments on the Pittsburgh dataset show that ICSA improves Recall@1 by 0.1% to 2.4% with different backbones.
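As a rough illustration of the idea above(a minimal sketch only,not the paper's implementation),the following PyTorch code shows what an input-independent channel-wise and spatial attention module can look like.The class name,parameter shapes and sigmoid gating are assumptions made for this example;the key point is that the attention weights are learnable parameters of shape (C,) and (H,W) rather than the output of a sub-network conditioned on the input,which is why such a module can remain only a few kilobytes in size.

import torch
import torch.nn as nn

class IndependentChannelSpatialAttention(nn.Module):
    """Input-independent attention: the weights are plain learnable parameters."""
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # Channel weights (C,) and spatial weights (H,W) are stored along
        # orthogonal directions, so the parameter count is only C + H*W.
        self.channel_weight = nn.Parameter(torch.zeros(channels))
        self.spatial_weight = nn.Parameter(torch.zeros(height, width))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B,C,H,W). Neither attention map depends on x, so the weights
        # stay fixed at inference time regardless of the input image.
        c_att = torch.sigmoid(self.channel_weight).view(1, -1, 1, 1)
        s_att = torch.sigmoid(self.spatial_weight).view(1, 1, *self.spatial_weight.shape)
        return x * c_att * s_att

# Usage: re-weight backbone features before aggregating them into an image descriptor.
feat = torch.randn(2, 512, 30, 40)          # e.g. conv5 features of a VGG-16 backbone
icsa = IndependentChannelSpatialAttention(512, 30, 40)
enhanced = icsa(feat)                       # same shape as feat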

Key words: Content-based image retrieval, Attention mechanism, Feature enhancement

CLC Number: TP391

References:
[1]LEW M S,SEBE N,DJERABA C,et al.Content-based multimedia information retrieval[J].ACM Transactions on Multimedia Computing,Communications,and Applications,2006,2(1):1-19.
[2]SMEULDERS A W M,WORRING M,SANTINI S,et al.Content-based image retrieval at the end of the early years[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(12):1349-1380.
[3]CHANG S K,HSU A.Image information systems:where do we go from here?[J].IEEE Transactions on Knowledge and Data Engineering,1992,4(5):431-442.
[4]SIVIC J,ZISSERMAN A.Video Google:A text retrieval approach to object matching in videos[C]//IEEE International Conference on Computer Vision.IEEE Computer Society,2003:1470-1477.
[5]FEI-FEI L,PERONA P.A Bayesian hierarchical model for learning natural scene categories[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition(CVPR’05).IEEE,2005:524-531.
[6]LOWE D G.Distinctive image features from scale-invariant keypoints[J].International Journal of Computer Vision,2004,60(2):91-110.
[7]JÉGOU H,DOUZE M,SCHMID C,et al.Aggregating local descriptors into a compact image representation[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.IEEE,2010:3304-3311.
[8]PERRONNIN F,SÁNCHEZ J,MENSINK T.Improving the Fisher kernel for large-scale image classification[C]//European Conference on Computer Vision.Springer.2010:143-156.
[9]KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks[J].Advances in Neural Information Processing Systems,2012,60(6):84-90.
[10]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[11]HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[12]DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16x16 words:Transformers for image recognition at scale[J].arXiv:2010.11929,2020.
[13]BABENKO A,SLESAREV A,CHIGORIN A,et al.Neural codes for image retrieval[C]//European Conference on Computer Vision.Springer.2014:584-599.
[14]LAI H,PAN Y,LIU Y,et al.Simultaneous feature learning and hash coding with deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2015:3270-3278.
[15]NOROUZI M,FLEET D J,SALAKHUTDINOV R R.Hamming distance metric learning[J].Advances in Neural Information Processing Systems,2012,25:1061-1069.
[16]ZHANG R,LIN L,ZHANG R,et al.Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification[J].IEEE Transactions on Image Processing,2015,24(12):4766-4779.
[17]ARANDJELOVIC R,GRONAT P,TORII A,et al.NetVLAD:CNN architecture for weakly supervised place recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:5297-5307.
[18]ONG E J,HUSAIN S,BOBER M.Siamese network of deep fisher-vector descriptors for image retrieval[J].arXiv:1702.00338,2017.
[19]RADENOVIĆ F,TOLIAS G,CHUM O.CNN image retrieval learns from BoW:Unsupervised fine-tuning with hard examples[C]//European Conference on Computer Vision.Springer.2016:3-20.
[20]BROWN A,XIE W,KALOGEITON V,et al.Smooth-ap:Smoothing the path towards large-scale image retrieval[C]//European Conference on Computer Vision.Springer,2020:677-694.
[21]BABENKO A,LEMPITSKY V.Aggregating local deep features for image retrieval[C]//Proceedings of the IEEE International Conference on Computer Vision.2015:1269-1277.
[22]KALANTIDIS Y,MELLINA C,OSINDERO S.Cross-dimensional weighting for aggregated deep convolutional features[C]//European Conference on Computer Vision.Springer,2016:685-701.
[23]ITTI L,KOCH C,NIEBUR E.A model of saliency-based visual attention for rapid scene analysis[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1998,20(11):1254-1259.
[24]MNIH V,HEESS N,GRAVES A.Recurrent models of visual attention[J].Advances in Neural Information Processing Systems,2014,27:2204-2212.
[25]JADERBERG M,SIMONYAN K,ZISSERMAN A.Spatial transformer networks[J].Advances in Neural Information Processing Systems,2015,28.
[26]HU J,SHEN L,SUN G.Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:7132-7141.
[27]WOO S,PARK J,LEE J Y,et al.CBAM:Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision(ECCV).2018:3-19.
[28]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[J].Advances in Neural Information Processing Systems,2017,2:6000-6010.
[29]WANG X,GIRSHICK R,GUPTA A,et al.Non-local neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2018:7794-7803.
[30]GUO M H,XU T X,LIU J J,et al.Attention Mechanisms in Computer Vision:A Survey[J].arXiv:2111.07624,2021.
[31]BALNTAS V,RIBA E,PONSA D,et al.Learning local feature descriptors with triplets and shallow convolutional neural networks[C]//BMVC.2016.
[32]SUTSKEVER I,MARTENS J,DAHL G,et al.On the importance of initialization and momentum in deep learning[C]//International Conference on Machine Learning.PMLR.2013:1139-1147.
[33]VAN DER MAATEN L,HINTON G.Visualizing data using t-SNE[J].Journal of Machine Learning Research,2008,9(11):2579-2605.