Computer Science, 2025, Vol. 52, Issue (6): 159-166. doi: 10.11896/jsjkx.240400022

• Database & Big Data & Data Science •

Semi-supervised Cross-modal Hashing Method for Semantic Alignment Networks Based on GAN

LIU Huayong, ZHU Ting   

  1. School of Computer Science, Central China Normal University, Wuhan 430079, China
  • Received: 2024-04-02 Revised: 2024-09-26 Online: 2025-06-15 Published: 2025-06-11
  • About author: LIU Huayong, born in 1978, Ph.D., associate professor, is a member of CCF (No. 35656M). His main research interests include cross-modal retrieval, computer vision and deep learning.
    ZHU Ting, born in 2001, postgraduate. Her main research interests include cross-modal retrieval and deep learning.
  • Supported by:
    Humanities and Social Sciences Research Project of the MoE(21YJA870005).

Abstract: Supervised methods have produced many results in cross-modal retrieval and have become popular. However, these methods rely too heavily on labeled data and do not make full use of the rich information contained in unlabeled data. Unsupervised methods have been studied to solve this problem, but relying solely on unlabeled data gives unsatisfactory results. This paper therefore proposes a semi-supervised cross-modal hashing method for semantic alignment networks based on GAN (GAN-SASCH). The model is built on a generative adversarial network that incorporates the concept of semantic alignment. The generative adversarial network is divided into two modules: the generator learns to fit the correlation distribution of the unlabeled data and generates fake data samples, and the discriminator determines whether a data-pair sample comes from the dataset or from the generator. Through a minimax adversarial game between these two modules, the performance of the generative adversarial network is continuously improved. Semantic alignment makes full use of the interaction and symmetry between different modalities, unifies the similarity information of different modalities, and effectively guides the learning of hash codes. Adaptive learning of optimization parameters is also introduced to improve the performance of the model. On the NUS-WIDE and MIRFLICKR25K datasets, the proposed method is compared with nine related state-of-the-art methods, and its effectiveness is verified using two evaluation metrics, mean average precision (MAP) and precision-recall (PR) curves.
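As a rough illustration of the MAP metric mentioned in the abstract, the sketch below ranks database items by Hamming distance to each query hash code and averages the precision at every relevant hit. It is not the authors' evaluation code: the function names, the ±1 code convention, and the rule that two items are relevant when they share at least one label are all assumptions made for this example.

```python
import numpy as np

def hamming_distance(query_codes, db_codes):
    """Pairwise Hamming distances between codes with entries in {-1, +1}.

    query_codes: (n, k) array, db_codes: (m, k) array -> (n, m) distances.
    Assumes the +/-1 code convention (an assumption of this sketch).
    """
    k = query_codes.shape[1]
    return 0.5 * (k - query_codes @ db_codes.T)

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """MAP over all queries, ranking the database by Hamming distance.

    Relevance rule (assumed): a database item is relevant to a query
    when their multi-hot label vectors share at least one label.
    """
    dist = hamming_distance(query_codes, db_codes)
    relevant = (query_labels @ db_labels.T) > 0
    ap_values = []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i], kind="stable")  # nearest codes first
        hits = relevant[i, order]
        if not hits.any():
            continue  # skip queries with no relevant item
        ranks = np.arange(1, hits.size + 1)
        precision_at_hit = np.cumsum(hits)[hits] / ranks[hits]
        ap_values.append(precision_at_hit.mean())
    return float(np.mean(ap_values)) if ap_values else 0.0

if __name__ == "__main__":
    # Toy data: 2 queries (e.g. image codes) against 3 database items (e.g. text codes).
    q_codes = np.array([[1, 1, -1, -1], [-1, -1, 1, 1]])
    d_codes = np.array([[1, 1, -1, -1], [1, 1, -1, 1], [-1, -1, 1, 1]])
    q_labels = np.array([[1, 0], [0, 1]])
    d_labels = np.array([[1, 0], [0, 1], [1, 0]])
    print(mean_average_precision(q_codes, d_codes, q_labels, d_labels))
```

In a cross-modal setting the query codes would come from one modality (for example images) and the database codes from the other (for example text), and the evaluation is typically run in both directions.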

Key words: Cross-modal hash, Generative adversarial network, Semantic alignment, Semi-supervised, Adaptive learning

CLC Number: 

  • TP391