Computer Science ›› 2021, Vol. 48 ›› Issue (9): 200-207. doi: 10.11896/jsjkx.200600119

• Computer Graphics & Multimedia •


Cross-modal Retrieval Combining Deep Canonical Correlation Analysis and Adversarial Learning

LIU Li-bo, GOU Ting-ting   

  1. School of Information Engineering, Ningxia University, Yinchuan 750021, China
  • Received: 2020-06-19 Revised: 2020-12-04 Online: 2021-09-15 Published: 2021-09-10
  • Corresponding author: LIU Li-bo (liulib@163.com)
  • About author: LIU Li-bo, born in 1974, Ph.D., professor, is a member of China Computer Federation. Her main research interests include intelligent information processing.
  • Supported by:
    National Natural Science Foundation of China (61862050), Scientific Research Innovation Project of First-Class Western Universities (ZKZD2017005) and Postgraduate Innovation Project of Ningxia University (GIP2019054)


Abstract: This paper proposes a cross-modal retrieval method (DCCA-ACMR) that combines deep canonical correlation analysis with adversarial learning. The method improves the utilization of unlabeled samples, learns more powerful feature projection models, and thereby raises cross-modal retrieval accuracy. Specifically, under the DCGAN framework: 1) deep canonical correlation analysis constraints are added between the image and text single-modality representation layers to build an image-text feature projection model that fully exploits the semantic relevance of sample pairs; 2) the image-text feature projection model serves as the generator and a modality classification model serves as the discriminator, which together form the image-text cross-modal retrieval model; 3) using both labeled and unlabeled samples, a common subspace representation of the samples is learned through the adversarial interplay between the generator and the discriminator. The proposed method is evaluated with mean average precision (mAP) on two public datasets, Wikipedia and NUSWIDE-10k. The average mAP values of image-to-text and text-to-image retrieval on the two datasets are 0.556 and 0.563, respectively. Experimental results show that DCCA-ACMR outperforms existing representative methods.
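The abstract describes the model only at a high level, so the following is a minimal, hypothetical PyTorch sketch of the two components it names: a DCCA-style correlation objective between paired image and text projections (the generator), and a modality classifier used as the discriminator. All network sizes, feature dimensions, optimizer settings, and the label-flipping trick below are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of the DCCA + adversarial idea; not the authors' code.
import torch
import torch.nn as nn

def dcca_correlation_loss(H1, H2, eps=1e-4):
    """Negative sum of canonical correlations between two views (standard DCCA objective)."""
    n = H1.size(0)
    H1 = H1 - H1.mean(dim=0, keepdim=True)            # center each view
    H2 = H2 - H2.mean(dim=0, keepdim=True)
    S11 = H1.t() @ H1 / (n - 1) + eps * torch.eye(H1.size(1))
    S22 = H2.t() @ H2 / (n - 1) + eps * torch.eye(H2.size(1))
    S12 = H1.t() @ H2 / (n - 1)

    def inv_sqrt(S):                                   # S^{-1/2} via eigendecomposition
        w, V = torch.linalg.eigh(S)
        return V @ torch.diag(w.clamp_min(eps).rsqrt()) @ V.t()

    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return -torch.linalg.svdvals(T).sum()              # maximize total correlation

# Generator: two projection networks mapping each modality into a common subspace
# (4096-d image features and 300-d text features are assumed dimensions).
img_proj = nn.Sequential(nn.Linear(4096, 1024), nn.ReLU(), nn.Linear(1024, 128))
txt_proj = nn.Sequential(nn.Linear(300, 1024), nn.ReLU(), nn.Linear(1024, 128))
# Discriminator: predicts which modality a common-subspace vector came from.
modality_clf = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))

g_opt = torch.optim.Adam(list(img_proj.parameters()) + list(txt_proj.parameters()), lr=1e-4)
d_opt = torch.optim.Adam(modality_clf.parameters(), lr=1e-4)
ce = nn.CrossEntropyLoss()

def train_step(img_feat, txt_feat):
    zi, zt = img_proj(img_feat), txt_proj(txt_feat)
    modality = torch.cat([torch.zeros(len(zi)), torch.ones(len(zt))]).long()

    # Discriminator step: classify the modality of each projected sample.
    d_loss = ce(modality_clf(torch.cat([zi, zt]).detach()), modality)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: correlate paired projections and try to fool the discriminator
    # (label flipping is used here as the adversarial signal).
    g_loss = dcca_correlation_loss(zi, zt) + ce(modality_clf(torch.cat([zi, zt])), 1 - modality)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

The sketch assumes each batch consists of aligned image-text pairs; how the method balances labeled and unlabeled samples is described in the full text, not here.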

Key words: Adversarial learning, Cross-modal retrieval, Deep canonical correlation analysis, Deep convolutional generative adversarial network
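The evaluation metric reported in the abstract is mean average precision (mAP). As general background rather than the authors' protocol, here is a minimal Python sketch of mAP for cross-modal retrieval, assuming cosine-similarity ranking and "shares the query's semantic label" as the relevance criterion; both assumptions are illustrative.

```python
# Hypothetical mAP sketch for cross-modal retrieval; not the paper's evaluation code.
import numpy as np

def mean_average_precision(query_vecs, gallery_vecs, query_labels, gallery_labels):
    """Queries come from one modality, the gallery from the other; all inputs are NumPy arrays."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    g = gallery_vecs / np.linalg.norm(gallery_vecs, axis=1, keepdims=True)
    sims = q @ g.T                                        # cosine similarities

    aps = []
    for i in range(len(q)):
        order = np.argsort(-sims[i])                      # best match first
        relevant = (gallery_labels[order] == query_labels[i]).astype(float)
        if relevant.sum() == 0:
            continue
        precision_at_k = np.cumsum(relevant) / (np.arange(len(relevant)) + 1)
        aps.append(float((precision_at_k * relevant).sum() / relevant.sum()))
    return float(np.mean(aps))
```

Image-to-text retrieval uses projected image vectors as queries against projected text vectors; text-to-image retrieval swaps the two roles.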

CLC number: TP391.3