Computer Science, 2025, Vol. 52, Issue (6A): 240400151-10. DOI: 10.11896/jsjkx.240400151

• Image Processing & Multimedia Technology •

FLIP-based Joint Similarity Preserving Hashing for Cross-modal Retrieval

TANG Lijun, YANG Zheng, ZHAO Nan, ZHAI Suwei

  1. Electric Power Research Institute, Yunnan Power Grid Co., Ltd., Kunming 650217, China
  • Online: 2025-06-16  Published: 2025-06-12
  • About author: TANG Lijun, born in 1985, postgraduate, senior engineer. His main research interests include the application of power grid automation and artificial intelligence technology.
    ZHAI Suwei, born in 1991, postdoctoral researcher. His main research interests include power system and automation, as well as artificial intelligence technology.

Abstract: Supervised cross-modal retrieval techniques have recently garnered significant attention. Existing methods primarily assess sample-wise similarity based on sample-level semantic relationships, while neglecting the potential of the label distribution to improve retrieval performance. Furthermore, they still suffer from inaccurate feature extraction and slow processing. To address these problems, we introduce a new method, termed FLIP-based joint similarity preserving hashing (FJSPH), for cross-modal retrieval. Specifically, we leverage the fast language-image pre-training (FLIP) model to extract more accurate cross-modal features. To further reduce cross-modal semantic differences, we enhance modal interaction and refine modal semantic representations through multimodal contrastive learning. In addition, we exploit both sample-wise and cluster-wise similarity to capture the semantic correlation between modalities, ensuring that samples sharing similar semantics are positioned closer together in Hamming space and thereby producing more discriminative hash codes. Experimental results on three cross-modal datasets indicate that FJSPH achieves excellent cross-modal retrieval performance.
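As summarized above, the method combines three trainable pieces: FLIP-extracted features, a contrastive image-text objective, and hash codes supervised by both sample-wise and cluster-wise similarity. The PyTorch sketch below is a hedged illustration of how such a joint objective could be wired together; the module and function names, the InfoNCE temperature, the learnable class centers, and the exact loss forms are all assumptions made for exposition, not the paper's actual formulation.

```python
# Illustrative sketch only -- names, loss forms, and hyperparameters are
# assumptions, not the FJSPH formulation from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashHead(nn.Module):
    """Maps (assumed frozen) FLIP features to K-bit relaxed hash codes in (-1, 1)."""
    def __init__(self, feat_dim: int, n_bits: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, n_bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.proj(x))  # relaxed codes; sign() at inference

def contrastive_loss(img_f: torch.Tensor, txt_f: torch.Tensor, tau: float = 0.07):
    """Symmetric InfoNCE between paired image/text features (modal interaction)."""
    img_f = F.normalize(img_f, dim=1)
    txt_f = F.normalize(txt_f, dim=1)
    logits = img_f @ txt_f.t() / tau
    targets = torch.arange(img_f.size(0), device=img_f.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def sample_wise_loss(codes_a: torch.Tensor, codes_b: torch.Tensor, labels: torch.Tensor):
    """Align code inner products with label similarity:
    S_ij = 1 if samples i and j share at least one label, else 0."""
    n_bits = codes_a.size(1)
    sim = (labels @ labels.t() > 0).float()      # (B, B) sample-wise similarity
    inner = codes_a @ codes_b.t() / n_bits       # scaled to [-1, 1]
    return F.mse_loss((inner + 1) / 2, sim)      # match after mapping to [0, 1]

def cluster_wise_loss(codes: torch.Tensor, labels: torch.Tensor, centers: torch.Tensor):
    """Pull each code toward the combined centers of the classes it carries."""
    target = F.normalize(labels @ centers, dim=1)  # (B, K) label-weighted centers
    return 1.0 - F.cosine_similarity(codes, target, dim=1).mean()

# Shapes assumed: img_f/txt_f (B, D) from FLIP, labels (B, C) multi-hot,
# centers an nn.Parameter of shape (C, K).
# loss = contrastive_loss(img_f, txt_f) \
#      + sample_wise_loss(codes_i, codes_t, labels) \
#      + cluster_wise_loss(codes_i, labels, centers) \
#      + cluster_wise_loss(codes_t, labels, centers)
```

At inference, the relaxed codes would be binarized with sign(), so retrieval reduces to Hamming-distance ranking, which is what places semantically similar samples close together in Hamming space.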

Key words: Joint similarity preserving, Fast language-image pre-training model, Cross-modal retrieval, Sample-wise similarity, Cluster-wise similarity

CLC Number: TP391