Computer Science (计算机科学), 2025, Vol. 52, Issue (6): 159-166. doi: 10.11896/jsjkx.240400022

• Database & Big Data & Data Science •

Semi-supervised Cross-modal Hashing Method for Semantic Alignment Networks Based on GAN

LIU Huayong, ZHU Ting

  1. School of Computer Science, Central China Normal University, Wuhan 430079, China
  • Received: 2024-04-02 Revised: 2024-09-26 Online: 2025-06-15 Published: 2025-06-11
  • Corresponding author: ZHU Ting (zhuting_ccnu@163.com)
  • About author: LIU Huayong (lhywuhee@ccnu.edu.cn), born in 1978, Ph.D., associate professor, is a member of CCF (No. 35656M). His main research interests include cross-modal retrieval, computer vision and deep learning.
    ZHU Ting, born in 2001, postgraduate. Her main research interests include cross-modal retrieval and deep learning.
  • Supported by:
    Humanities and Social Sciences Research Project of the MoE (21YJA870005).

Abstract: Supervised methods have achieved considerable results in cross-modal retrieval and have become popular. However, these methods rely too heavily on labeled data and do not make full use of the rich information contained in unlabeled data. To address this problem, unsupervised methods have been studied, but results obtained from unlabeled data alone are not ideal. This paper therefore proposes a semi-supervised cross-modal hashing method for semantic alignment networks based on GAN (GAN-SASCH). The model is built on a generative adversarial network and incorporates the concept of semantic alignment. The generative adversarial network consists of two modules, a generator and a discriminator. The generator learns to fit the correlation distribution of the unlabeled data and generates fake data samples, while the discriminator judges whether a data pair comes from the dataset or from the generator. A minimax adversarial game between the two modules continuously improves the performance of the generative adversarial network. Semantic alignment makes full use of the interaction and symmetry between different modalities, unifies their similarity information, and effectively guides the learning of hash codes. In addition, adaptive learning of optimization parameters is introduced to improve model performance. On the NUS-WIDE and MIRFLICKR25K datasets, the proposed method is compared with 9 related state-of-the-art methods, and its effectiveness is verified using two evaluation metrics, MAP and PR curves.

Key words: Cross-modal hashing, Generative adversarial network, Semantic alignment, Semi-supervised, Adaptive learning
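
For orientation, the minimax game between generator and discriminator mentioned in the abstract is conventionally written as below. This is the standard objective from the original GAN formulation, shown only as a reference point; the abstract does not state the exact loss used by GAN-SASCH, so the symbols here ($G$ for the generator, $D$ for the discriminator, $p_{\text{data}}$ for the data distribution, $p_z$ for the generator's input distribution) are assumptions rather than the paper's notation.

```latex
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator maximizes $V$ by assigning high scores to real samples and low scores to generated ones, while the generator minimizes it by producing samples the discriminator cannot tell apart from real data.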

CLC Number: 

  • TP391
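
The abstract reports MAP as one of its two evaluation metrics but does not describe the protocol. The sketch below shows one common way to compute MAP for hash-based retrieval, assuming Hamming ranking and assuming that a database item counts as relevant when it shares at least one label with the query. The function names, the toy codes, and the labels are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Set


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query_code: Sequence[int], query_labels: Set[str],
                      db_codes: List[Sequence[int]],
                      db_labels: List[Set[str]]) -> float:
    """AP of one query under Hamming ranking (assumed relevance: shared label)."""
    # sorted() is stable, so ties in distance keep their database order.
    order = sorted(range(len(db_codes)),
                   key=lambda j: hamming(query_code, db_codes[j]))
    hits = 0
    precisions = []
    for rank, j in enumerate(order, start=1):
        if query_labels & db_labels[j]:      # at least one label in common
            hits += 1
            precisions.append(hits / rank)   # precision at this relevant hit
    # In this sketch a query with no relevant item contributes an AP of 0.
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(query_codes, query_labels, db_codes, db_labels) -> float:
    """Mean of the per-query average precisions."""
    aps = [average_precision(c, l, db_codes, db_labels)
           for c, l in zip(query_codes, query_labels)]
    return sum(aps) / len(aps) if aps else 0.0


# Toy example: one 4-bit query code (e.g. from text) against four
# database codes (e.g. from images). All values are made up.
db_codes = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 1, 1], [0, 0, 1, 1]]
db_labels = [{"sky"}, {"dog"}, {"sky", "dog"}, {"dog"}]
query_codes = [[1, 1, 0, 0]]
query_labels = [{"sky"}]

map_score = mean_average_precision(query_codes, query_labels, db_codes, db_labels)
print(f"MAP = {map_score:.4f}")  # relevant items at ranks 1 and 3: (1 + 2/3) / 2
```

In the toy data the relevant items land at ranks 1 and 3, so the average precision is (1 + 2/3) / 2 ≈ 0.8333. A PR curve can be derived from the same ranking by recording precision and recall at each cutoff.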