Computer Science (计算机科学), 2025, Vol. 52, Issue (6): 159-166. doi: 10.11896/jsjkx.240400022
LIU Huayong (刘华咏), ZHU Ting (朱婷)
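Abstract: Supervised methods have achieved considerable success in cross-modal retrieval and remain a popular approach. However, they rely heavily on labeled data and do not make full use of the rich information contained in unlabeled data. To address this problem, researchers have turned to unsupervised methods, but relying on unlabeled data alone yields unsatisfactory results. This paper therefore proposes a GAN-based semantic alignment network for semi-supervised cross-modal hashing (GAN-SASCH). The model is built on a generative adversarial network (GAN) and incorporates the concept of semantic alignment. The GAN consists of two modules, a generator and a discriminator: the generator learns to fit the relevance distribution of the unlabeled data and generates fake data samples, while the discriminator judges whether a data-pair sample comes from the dataset or from the generator. The two modules play a minimax adversarial game that continually improves the performance of the GAN. Semantic alignment fully exploits the interaction and symmetry between modalities, unifies the similarity information of the different modalities, and effectively guides the learning of hash codes. In addition, adaptively learned optimization parameters are introduced to improve model performance. On the NUS-WIDE and MIRFLICKR25K datasets, the proposed method is compared with nine related state-of-the-art methods, and its effectiveness is verified using two evaluation measures, MAP and PR curves.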
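As an illustration of the MAP measure named in the abstract, the following is a minimal sketch of mean average precision over a Hamming ranking of hash codes. It is not the authors' evaluation code. The function names, the 0/1 code representation, and the rule that two items are relevant when they share at least one label are assumptions made for this example. The abstract also does not say whether MAP is computed over the full ranking or a top-k cutoff; the sketch uses the full ranking.

```python
from typing import Sequence, Set


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code, query_labels: Set[str], db_codes, db_labels) -> float:
    """Average precision of one query over a Hamming ranking of the database."""
    # Rank database items by Hamming distance to the query.
    # sorted() is stable, so ties keep their database order.
    order = sorted(
        range(len(db_codes)),
        key=lambda i: hamming_distance(query_code, db_codes[i]),
    )
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        # Assumption: an item is relevant if it shares at least one label.
        if set(query_labels) & set(db_labels[i]):
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    # A query with no relevant item contributes an AP of 0 in this sketch.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_codes, query_labels, db_codes, db_labels) -> float:
    """Mean of the per-query average precision values."""
    aps = [
        average_precision(code, labels, db_codes, db_labels)
        for code, labels in zip(query_codes, query_labels)
    ]
    return sum(aps) / len(aps) if aps else 0.0


# Toy text-to-image example: one text hash code queries three image hash codes.
text_codes = [[1, 1, 1, 1]]
text_labels = [{"sky"}]
image_codes = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
image_labels = [{"sky"}, {"dog"}, {"sky", "sea"}]
print(mean_average_precision(text_codes, text_labels, image_codes, image_labels))
```

In the toy example the relevant images sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2 = 5/6. Swapping the roles of the two modalities gives the image-to-text direction.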