Computer Science (计算机科学), 2025, Vol. 52, Issue (8): 204-213. DOI: 10.11896/jsjkx.240600057
刘华咏, 徐明慧
LIU Huayong, XU Minghui
Abstract: With the continuous development of the Internet, vast amounts of complex image data are produced every day, and mainstream social media are now saturated with images and other media; retrieving images quickly and accurately has therefore become a meaningful and pressing problem. Convolutional neural network (CNN) models are currently the mainstream hashing-based image retrieval models. However, the convolution operation captures only local features and cannot model global information, and its fixed receptive field cannot adapt to input images of different scales. To address this, effective image retrieval is realized on the basis of the Swin Transformer, a member of the Transformer family; by means of self-attention and positional encoding, the Transformer effectively overcomes these shortcomings of CNNs. However, the window attention module of existing Swin Transformer hashing retrieval models assigns the same weight to every image channel during feature extraction, ignoring the differences and dependencies among channel-wise feature information; this lowers the usability of the extracted features and wastes computational resources. To tackle these problems, a hashing image retrieval model based on hybrid attention and polarized asymmetric loss (HRMPA) is proposed. The model builds a Swin Transformer based hash feature extraction module (HFST) and adds a channel attention block (CAB) to the (S)W-MSA module of HFST, yielding a hybrid-attention hash feature extraction module (HFMA); the model thus assigns different weights to the features of different channels of the input image, which increases the diversity of the extracted features and makes full use of computational resources. Meanwhile, to minimize the intra-class Hamming distance, maximize the inter-class Hamming distance, fully exploit the supervision information of the data, and improve retrieval accuracy, a polarized asymmetric loss (PA) is proposed, which combines the polarization loss and the asymmetric loss with a certain weight ratio and thereby effectively improves retrieval precision. Experiments show that with a hash code length of 16 bits, the proposed model achieves a highest mean average precision of 98.73% on the single-label CIFAR-10 dataset, 1.51% higher than the VTS16-CSQ model, and a highest mean average precision of 90.65% on the multi-label NUS-WIDE dataset, 18.02% higher than TransHash and 5.92% higher than VTS16-CSQ.
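The abstract does not spell out the exact form of the channel attention block or of the PA loss, so the following PyTorch sketch is only illustrative: it assumes a squeeze-and-excitation style CAB re-weighting the token output of a (S)W-MSA block, a DPN-style polarization hinge, and an ADSH-style asymmetric pairwise term combined with a weight ratio. Names such as ChannelAttention, pa_loss, margin and lam are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention over token features (assumed CAB)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) tokens produced by a (S)W-MSA window-attention block.
        w = self.fc(x.mean(dim=1))      # global average over tokens -> (B, C) channel weights
        return x * w.unsqueeze(1)       # re-weight every channel of every token


def polarized_loss(h: torch.Tensor, target_bits: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    # h: (B, K) real-valued hash outputs; target_bits: (B, K) in {-1, +1}.
    # Hinge that pushes every bit at least `margin` towards its target sign (DPN-style).
    return torch.clamp(margin - h * target_bits, min=0).mean()


def asymmetric_loss(h: torch.Tensor, db_codes: torch.Tensor, sim: torch.Tensor) -> torch.Tensor:
    # h: (B, K) query outputs; db_codes: (N, K) binary database codes in {-1, +1};
    # sim: (B, N) pairwise similarity, +1 for same-label pairs and -1 otherwise (ADSH-style).
    k = h.size(1)
    return ((h @ db_codes.t() - k * sim) ** 2).mean()


def pa_loss(h, target_bits, db_codes, sim, lam: float = 0.5) -> torch.Tensor:
    # Weighted combination of the two terms; `lam` stands in for the paper's weight ratio.
    return lam * polarized_loss(h, target_bits) + (1 - lam) * asymmetric_loss(h, db_codes, sim)
```

In such a hybrid block, the CAB output would re-weight the window-attention features before they enter the MLP of each Swin block; the actual HRMPA integration point and the PA weight ratio are defined in the paper itself.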
[1]ZHANG X Y,ZOU J H,HE K,et al.Accelerating very deep convolutional networks for classification and detection[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2016,38(10):1943-1955.
[2]LIU F X,ZHAO W B,WANG Z W,et al.IM3A:boosting deep neural network efficiency via in-memory addressing-assisted acceleration[C]//Proceedings of the 2021 on Great Lakes Symposium on VLSI.New York:ACM,2021:253-258.
[3]JIANG Q Y,LI W J.Asymmetric deep supervised hashing[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence.Menlo Park:AAAI,2018:3342-3349.
[4]SU S P,ZHANG C,HAN K,et al.Greedy hash:towards fast optimization for accurate hash coding in CNN[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems.Red Hook,NY:Curran Associates Inc.,2018:806-815.
[5]CAO Y,LONG M S,LIU B,et al.Deep Cauchy hashing for Hamming space retrieval[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition.2018:1229-1237.
[6]DOSOVITSKIY A,BEYER L,KOLESNIKOV A,et al.An image is worth 16x16 words:Transformers for image recognition at scale[J].arXiv:2010.11929,2020.
[7]CHEN Y B,ZHANG S,LIU F X,et al.TransHash:transformer-based Hamming hashing for efficient image retrieval[C]//Proceedings of the 2022 International Conference on Multimedia Retrieval.New York:ACM,2022:127-136.
[8]LIU Z,LIN Y T,CAO Y,et al.Swin Transformer:hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE International Conference on Computer Vision.Piscataway,NJ:IEEE Computer Society,2021:10012-10022.
[9]MIAO Z,ZHAO X X,LI Y,et al.Deep supervised hash image retrieval method based on Swin Transformer[J].Journal of Hunan University(Natural Science Edition),2023,50(8):62-71.
[10]KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks[J].Advances in Neural Information Processing Systems,2012,60(6):84-90.
[11]SIMONYAN K,ZISSERMAN A.Very deep convolutional networks for large-scale image recognition[J].arXiv:1409.1556,2014.
[12]HE K,ZHANG X,REN S,et al.Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.2016:770-778.
[13]WANG W,YANG Y,WANG X,et al.Development of convolutional neural network and its application in image classification:a survey[J].Optical Engineering,2019,58(4):1.
[14]GKOUNTAKOS K,SEMERTZIDIS T,PAPADOPOULOS G T,et al.A reliability object layer for deep hashing-based visual indexing[C]//International Conference on MultiMedia Modeling.Cham:Springer,2019:132-143.
[15]LIONG V E,LU J W,WANG G,et al.Deep hashing for compact binary codes learning[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition(CVPR).IEEE,2015:2475-2483.
[16]ZHU H,LONG M S,WANG J M,et al.Deep hashing network for efficient similarity retrieval[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2016:2415-2421.
[17]LIU H M,WANG R P,SHAN S G,et al.Deep supervised hashing for fast image retrieval[J].International Journal of Computer Vision,2019,127(9):1217-1234.
[18]CHENG S L,LAI H C,WANG L J,et al.A novel deep hashing method for fast image retrieval[J].The Visual Computer,2019,35(9):1255-1266.
[19]FENG X J,CHENG Y W.Image retrieval based on deep convolutional neural networks and hash[J].Computer Engineering and Design,2020,41(3):670-675.
[20]SHI L Q,WANG Y M.RAN and deep hashing for image retrieval[J].Electronic Design Engineering,2021,29(6):99-103,110.
[21]ZHANG C Y,ZHU L,ZHANG S C,et al.TDHPPIR:an efficient deep hashing based privacy-preserving image retrieval method[J].Neurocomputing,2020,406:386-398.
[22]ZHANG W Q,WU D Y,ZHOU Y,et al.Binary neural network hashing for image retrieval[C]//Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval.ACM,2021:1318-1327.
[23]YANG W J,WANG L J,CHENG S L,et al.Deep hash with improved dual attention for image retrieval[J].Information,2021,12(7):285.
[24]WANG X Y.Research on image retrieval method based on deep hash[D].Taiyuan:North University of China,2023.
[25]WANG W H,XIE E Z,LI X,et al.Pyramid vision transformer:a versatile backbone for dense prediction without convolutions[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.Piscataway,NJ:IEEE,2021:568-578.
[26]LI T,ZHANG Z,PEI L S,et al.HashFormer:vision transformer based deep hashing for image retrieval[J].IEEE Signal Processing Letters,2022,29:827-831.
[27]DUBEY S R,SINGH S K,CHU W T.Vision transformer hashing for image retrieval[C]//2022 IEEE International Conference on Multimedia and Expo(ICME).IEEE,2022:1-6.
[28]HE C,WEI H X.Image retrieval based on transformer and asymmetric learning strategy[J].Journal of Image and Graphics,2023,28(2):535-544.
[29]LI K C,WANG Y L,ZHANG J H,et al.UniFormer:unifying convolution and self-attention for visual recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2023,45(10):12581-12600.
[30]ZHANG Y L,LI K P,LI K,et al.Image super-resolution using very deep residual channel attention networks[C]//Proceedings of the European Conference on Computer Vision.2018:286-301.
[31]VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[J].Advances in Neural Information Processing Systems,2017,2:6000-6010.
[32]FAN L X,NG K W,JU C,et al.Deep polarized network for supervised learning of accurate binary hashing codes[C]//Proceedings of the 2020 International Joint Conference on Artificial Intelligence(IJCAI).2020:825-831.
[33]KRIZHEVSKY A,HINTON G.Learning multiple layers of features from tiny images[D].Toronto:University of Toronto,2009.
[34]CHUA T S,TANG J,HONG R,et al.NUS-WIDE:a real-world web image database from National University of Singapore[C]//Proceedings of the ACM International Conference on Image and Video Retrieval.New York:ACM,2009:1-9.
[35]CAO Z,LONG M,WANG J,et al.HashNet:deep learning to hash by continuation[C]//Proceedings of the IEEE International Conference on Computer Vision.IEEE,2017:5609-5618.
[36]XIE Y Z,WEI R K,SONG J K,et al.Label-affinity self-adaptive central similarity hashing for image retrieval[J].IEEE Transactions on Multimedia,2023,25:9161-9174.