计算机科学 (Computer Science), 2022, Vol. 49, Issue (9): 123-131. doi: 10.11896/jsjkx.220600011
CAO Xiao-wen, LIANG Mei-yu, LU Kang-kang
Abstract: Cross-media hashing has attracted wide attention in cross-media search tasks because of its high search efficiency and low storage cost. However, existing methods cannot fully preserve the high-order semantic correlations and multi-label semantic information of multi-modal data, which degrades the quality of the learned hash codes. To address this problem, a Semantic Reasoning Based Cross-media Dual-way Adversarial Hashing Learning Model (SDAH) is proposed, which produces compact and consistent unified cross-media hash representations by mining the fine-grained semantic correlations between different modalities to the greatest extent. First, a fine-grained cross-media semantic correlation learning and reasoning method based on a cross-media co-attention mechanism is proposed: the co-attention mechanism jointly learns the fine-grained latent semantic correlations between images and texts and extracts salient semantic reasoning features for each modality. Then, a cross-media dual-way adversarial hashing network is constructed, which jointly learns intra-modal and inter-modal semantic similarity constraints and, through a dual-way adversarial learning mechanism, better aligns the semantic distributions of the hash codes of different modalities. This yields higher-quality and more discriminative unified cross-media hash representations, promotes cross-media semantic fusion, and improves cross-media search performance. Comparative experiments against existing methods on two public datasets confirm the superior performance of the proposed method across a variety of cross-media search scenarios.
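The two stages summarized in the abstract can be illustrated with a minimal NumPy sketch: a cross-media co-attention step that fuses image-region and word features, followed by a shared projection and sign binarization that yields unified hash codes compared by Hamming distance. This is only an illustrative toy under assumed shapes and random data, not the authors' SDAH implementation; in particular the dual-way adversarial alignment of code distributions and the learned projection are omitted here.

```python
import numpy as np

def softmax(x, axis):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(img, txt):
    """img: (R, d) region features; txt: (T, d) word features."""
    affinity = img @ txt.T / np.sqrt(img.shape[1])    # (R, T) region-word scores
    img_ctx = softmax(affinity, axis=1) @ txt         # text-attended image context
    txt_ctx = softmax(affinity, axis=0).T @ img       # image-attended text context
    # fuse original and cross-attended features, then pool over regions/words
    return (img + img_ctx).mean(axis=0), (txt + txt_ctx).mean(axis=0)

def hash_code(feature, proj):
    """Project a d-dim feature to k bits and binarize to {-1, +1}."""
    bits = np.sign(proj @ feature)
    bits[bits == 0] = 1                               # avoid zero bits
    return bits

rng = np.random.default_rng(0)
d, k = 64, 16                                         # feature dim, code length (assumed)
img_regions = rng.standard_normal((36, d))            # e.g. 36 detected image regions
txt_words = rng.standard_normal((20, d))              # e.g. 20 word embeddings
proj = rng.standard_normal((k, d))                    # shared projection (learned in SDAH)

img_feat, txt_feat = co_attention(img_regions, txt_words)
img_bits = hash_code(img_feat, proj)
txt_bits = hash_code(txt_feat, proj)
hamming = int((img_bits != txt_bits).sum())           # cross-modal Hamming distance
print(img_bits.shape, txt_bits.shape, hamming)
```

In the full model the shared projection would be trained so that semantically matched image-text pairs land near each other in Hamming space, with the adversarial discriminators pushing the two modalities' code distributions to be indistinguishable.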