Hash Image Retrieval Based on Mixed Attention and Polarization Asymmetric Loss

doi:10.11896/jsjkx.240600057

Abstract

With the continuous development of the Internet, massive amounts of complex image data are created every day, and today's mainstream social media platforms are full of images and other complex media data. Processing this image data effectively not only increases its utilization but also improves the user experience. How to retrieve images quickly and accurately has therefore become a meaningful and urgent problem. Current mainstream hash image retrieval models are based on convolutional neural networks (CNNs). However, the convolution operation of a CNN captures only local features and cannot process global information, and its receptive field is fixed, so it cannot adapt to input images of different scales. This paper therefore builds on the Swin Transformer, a Transformer model, to achieve effective image retrieval. Through its self-attention mechanism and positional encoding, the Transformer effectively overcomes these limitations of CNNs. However, when extracting image features, the window attention module of existing Swin Transformer hashing image retrieval models assigns the same weight to every channel of the image. This ignores the differences and dependencies among the feature information of different channels, which reduces the usefulness of the extracted features and wastes computing resources. To solve these problems, this paper proposes a hash image retrieval model based on mixed attention and polarization asymmetric loss. The model design is based on the Swin Transformer feature extraction module. A channel attention block is added to the window self-attention module in HFST, yielding a hash feature extraction module based on mixed attention that lets the model assign different weights to the features of different channels of the input image. This increases the diversity of the extracted features and makes the fullest use of computing resources. At the same time, to minimize the intra-class Hamming distance, maximize the inter-class Hamming distance, make full use of the supervision information in the data, and improve retrieval accuracy, this paper proposes a polarization asymmetric loss function, which combines the polarization loss and the asymmetric loss with a certain weight ratio and thereby effectively improves image retrieval precision. The experimental results show the validity and rationality of the proposed method. For example, with 16-bit hash codes, the proposed model reaches a maximum mean average precision (mAP) of 98.73% on the single-label CIFAR-10 dataset, 1.51% higher than the VTS16-CSQ model, and a maximum mAP of 90.65% on the multi-label NUS-WIDE dataset, 18.02% higher than TransHash and 5.92% higher than VTS16-CSQ.
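The abstract says that a channel attention block inside the window self-attention module lets the model give each feature channel its own weight, but it does not give the block's exact form. The following is a minimal sketch assuming a squeeze-and-excitation style channel attention of the kind used in RCAN [30]: each channel is average-pooled to one descriptor, passed through a two-layer bottleneck with ReLU and sigmoid, and the resulting weights in (0, 1) rescale the channels. The function name `channel_attention`, the weight matrices `w_down` and `w_up`, and the list-based tensor layout are hypothetical and for illustration only; the window self-attention that would produce `features` in the model is not shown.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features: Matrix, w_down: Matrix, w_up: Matrix) -> Tuple[Matrix, List[float]]:
    """SE-style channel attention (illustrative sketch, not the paper's code).

    features: C channels, each a flat list of N token values.
    w_down:   (C // r) x C hypothetical weights that squeeze the channel descriptor.
    w_up:     C x (C // r) hypothetical weights that expand it back to C channels.
    Returns the rescaled features and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w_down]
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_up]
    # Scale: every value in channel c is multiplied by weights[c].
    scaled = [[weights[c] * v for v in ch] for c, ch in enumerate(features)]
    return scaled, weights


# Two channels with three tokens each; bottleneck of size 1 (reduction r = 2).
feats = [[1.0, 3.0, 2.0], [0.0, 0.5, -0.5]]
scaled, weights = channel_attention(feats, w_down=[[1.0, 0.0]], w_up=[[1.0], [-1.0]])
print(weights)  # approximately [0.881, 0.119]
```

If `w_down` is all zeros, every weight becomes sigmoid(0) = 0.5, so all channels receive the same weight, which corresponds to the uniform channel treatment the abstract criticizes. Learned weights instead let channels with different statistics receive different emphasis.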
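The abstract motivates the polarization asymmetric loss through Hamming distance. For codes b_i, b_j in {-1, +1}^K, the Hamming distance is (K - b_i·b_j) / 2, so raising the inner product of similar pairs and lowering it for dissimilar pairs shrinks intra-class distance and enlarges inter-class distance. The sketch below assumes that the polarization loss takes the per-bit hinge form of Deep Polarized Network [32] with a target code per class, and that the asymmetric loss follows ADSH [3], comparing continuous query outputs with binary database codes. The abstract says only that the two are combined "with a certain weight allocation ratio"; the weight `lam`, the `margin`, and all function names here are hypothetical and are not the paper's values.

```python
from typing import List, Sequence

Codes = List[Sequence[int]]


def hamming_distance(b1: Sequence[int], b2: Sequence[int]) -> int:
    """Differing bits between two {-1, +1} codes of length K: (K - <b1, b2>) / 2."""
    return (len(b1) - sum(x * y for x, y in zip(b1, b2))) // 2


def polarization_loss(u: Sequence[float], target: Sequence[int], margin: float = 1.0) -> float:
    """DPN-style hinge: push each output u_k beyond the margin on the side of target bit t_k."""
    return sum(max(0.0, margin - uk * tk) for uk, tk in zip(u, target))


def asymmetric_loss(queries: List[Sequence[float]], db_codes: Codes, sim: List[List[int]]) -> float:
    """ADSH-style loss: <u_i, v_j> of a continuous query output and a binary
    database code should approach K * S_ij (S_ij = +1 similar, -1 dissimilar)."""
    k = len(db_codes[0])
    total = 0.0
    for i, u in enumerate(queries):
        for j, v in enumerate(db_codes):
            inner = sum(a * b for a, b in zip(u, v))
            total += (inner - k * sim[i][j]) ** 2
    return total


def polarization_asymmetric_loss(queries, targets, db_codes, sim, lam=0.5, margin=1.0):
    """Hypothetical weighted combination lam * L_pol + (1 - lam) * L_asym."""
    l_pol = sum(polarization_loss(u, t, margin) for u, t in zip(queries, targets))
    return lam * l_pol + (1.0 - lam) * asymmetric_loss(queries, db_codes, sim)


# One query whose class target code is [1, -1, 1, -1]; two database codes.
query = [[0.9, -0.8, 1.2, -1.1]]
db = [[1, -1, 1, -1], [-1, 1, -1, 1]]
sim = [[1, -1]]  # similar to db[0], dissimilar to db[1]
print(hamming_distance(db[0], db[1]))  # 4: the codes differ in every bit
print(polarization_asymmetric_loss(query, [db[0]], db, sim))  # approximately 0.15
```

Because the asymmetric term grows with K squared, a real implementation would probably normalize the two terms before weighting them; the abstract does not say how the paper handles this.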

Key words: Hash search, Spatial attention, Swin-Transformer, Mixed attention, Polarization loss, Asymmetric loss

CLC Number: TP391.41

LIU Huayong, XU Minghui. Hash Image Retrieval Based on Mixed Attention and Polarization Asymmetric Loss[J].Computer Science, 2025, 52(8): 204-213.

References

[1]ZHANG X Y,ZOU J H,HE K,et al.Accelerating very deep convolutional networks for classification and detection[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2016,38(10):1943-1955.
[2]LIU F X,ZHAO W B,WANG Z W,et al.IM3A:boosting deep neural network efficiency via in-memory addressing-assisted acceleration[C]//Proceedings of the 2021 on Great Lakes Symposium on VLSI.New York:ACM,2021:253-258.
[3]JIANG Q Y,LI W J.Asymmetric deep supervised hashing[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence.Menlo Park:AAAI,2018:3342-3349.
[4]SU S P,ZHANG C,HAN K,et al.Greedy hash:towards fast optimization for accurate hash coding in CNN[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems.Red Hook,NY:Curran Associates Inc.,2018:806-815.
[5]CAO Y,LONG M S,LIU B,et al.Deep cauchy hashing for hamming space retrieval[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition.2018:1229-1237.
[6]DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16x16 words:transformers for image recognition at scale[J].arXiv:2010.11929,2020.
[7]CHEN Y B,ZHANG S,LIU F X,et al.TransHash:transformer-based hamming hashing for efficient image retrieval[C]//Proceedings of the 2022 International Conference on Multimedia Retrieval.New York:ACM,2022:127-136.
[8]LIU Z,LIN Y T,CAO Y,et al.Swin Transformer:hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE International Conference on Computer Vision.Piscataway,NJ:IEEE Computer Society,2021:10012-10022.
[9]MIAO Z,ZHAO X X,LI Y,et al.Deep supervised hash image retrieval method based on Swin Transformer[J].Journal of Hunan University(Natural Science Edition),2023,50(8):62-71.
[10]KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks[J].Advances in Neural Information Processing Systems,2012,60(6):84-90.
[11]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[12]HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[13]WANG W,YANG Y,WANG X,et al.Development of convolutional neural network and its application in image classification:A survey[J].Optical Engineering,2019,58(4):1.
[14]GKOUNTAKOS K,SEMERTZIDIS T,PAPADOPOULOS G T,et al.A reliability object layer for deep hashing-based visual indexing[C]//International Conference on MultiMedia Modeling.Cham:Springer,2019:132-143.
[15]LIONG V E,LU J W,WANG G,et al.Deep hashing for compact binary codes learning[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition(CVPR).IEEE,2015:2475-2483.
[16]ZHU H,LONG M S,WANG J M,et al.Deep hashing network for efficient similarity retrieval[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2016:2415-2421.
[17]LIU H M,WANG R P,SHAN S G,et al.Deep supervised hashing for fast image retrieval[J].International Journal of Computer Vision,2019,127(9):1217-1234.
[18]CHENG S L,LAI H C,WANG L J,et al.A novel deep hashing method for fast image retrieval[J].The Visual Computer,2019,35(9):1255-1266.
[19]FENG X J,CHENG Y W.Image retrieval based on deep convolutional neural networks and hash[J].Computer Engineering and Design,2020,41(3):670-675.
[20]SHI L Q,WANG Y M.RAN and deep hashing for image retrieval[J].Electronic Design Engineering,2021,29(6):99-103,110.
[21]ZHANG C Y,ZHU L,ZHANG S C,et al.TDHPPIR:an efficient deep hashing based privacy-preserving image retrieval method[J].Neurocomputing,2020,406:386-398.
[22]ZHANG W Q,WU D Y,ZHOU Y,et al.Binary neural network hashing for image retrieval[C]//Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.ACM,2021:1318-1327.
[23]YANG W J,WANG L J,CHENG S L,et al.Deep hash with improved dual attention for image retrieval[J].Information,2021,12(7):285.
[24]WANG X Y.Research on image retrieval method based on deep hash[D].Taiyuan:North University of China,2023.
[25]WANG W H,XIE E Z,LI X,et al.Pyramid vision transformer:a versatile backbone for dense prediction without convolutions[C]//Proceedings of the IEEE & CVF International Conference on Computer Vision.Piscataway,NJ:IEEE,2021:568-578.
[26]LI T,ZHANG Z,PEI L S,et al.HashFormer:vision transformer-based deep hashing for image retrieval[J].IEEE Signal Processing Letters,2022,29:827-831.
[27]DUBEY S R,SINGH S K,CHU W T.Vision transformer hashing for image retrieval[C]//2022 IEEE International Conference on Multimedia and Expo(ICME).IEEE,2022:1-6.
[28]HE C,WEI H X.Image retrieval based on transformer and asymmetric learning strategy[J].Journal of Image and Graphics,2023,28(2):535-544.
[29]LI K C,WANG Y L,ZHANG J H,et al.Uniformer:unifying convolution and self-attention for visual recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2023,45(10):12581-12600.
[30]ZHANG Y L,LI K P,LI K,et al.Image super-resolution using very deep residual channel attention networks[C]//Proceedings of the European Conference on Computer Vision.2018:286-301.
[31]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[J].Advances in Neural Information Processing Systems,2017,2:6000-6010.
[32]FAN L X,NG K W,JU C,et al.Deep polarized network for supervised learning of accurate binary hashing codes[C]//Proceedings of the 2020 International Joint Conference on Artificial Intelligence(IJCAI).2020:825-831.
[33]KRIZHEVSKY A,HINTON G.Learning multiple layers of features from tiny images[D].Toronto:University of Toronto,2009.
[34]CHUA T S,TANG J,HONG R,et al.NUS-WIDE:a real-world web image database from National University of Singapore[C]//Proceedings of the ACM International Conference on Image and Video Retrieval.New York:ACM,2009:1-9.
[35]CAO Z,LONG M,WANG J,et al.HashNet:deep learning to hash by continuation[C]//Proceedings of the IEEE International Conference on Computer Vision.IEEE,2017:5609-5618.
[36]XIE Y Z,WEI R K,SONG J K,et al.Label-affinity self-adaptive central similarity hashing for image retrieval[J].IEEE Transactions on Multimedia,2023,25:9161-9174.
