Computer Science ›› 2021, Vol. 48 ›› Issue (9): 200-207. doi: 10.11896/jsjkx.200600119
LIU Li-bo, GOU Ting-ting
Abstract: This paper proposes a cross-modal retrieval method that combines deep canonical correlation analysis with adversarial learning (DCCA-ACMR). The method improves the utilization of unlabeled samples and learns a more powerful feature-projection model, thereby raising cross-modal retrieval accuracy. Specifically, within the DCGAN framework: 1) a deep canonical correlation analysis constraint is added between the representation layers of the image and text modalities to build an image-text feature-projection model that fully exploits the semantic correlation of sample pairs; 2) this feature-projection model serves as the generator and a modality-classification model serves as the discriminator, together forming the image-text cross-modal retrieval model; 3) using both labeled and unlabeled samples, a common-subspace representation is learned through the adversarial interplay of the generator and the discriminator. The method is evaluated on two public datasets, Wikipedia and NUSWIDE-10k, with mean average precision (mAP) as the metric. The average mAP values for image-to-text and text-to-image retrieval on the two datasets are 0.556 and 0.563, respectively. Experimental results show that DCCA-ACMR outperforms representative existing methods.
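The mAP metric used in the evaluation above can be sketched as a short NumPy routine. This is an illustrative sketch, not the paper's evaluation code: the function names are invented here, and ranking by cosine similarity in the learned common subspace is an assumption about the retrieval protocol.

```python
import numpy as np

def average_precision(ranking, relevant):
    """AP for one query: `ranking` lists gallery labels ordered by
    descending similarity; `relevant` is the query's semantic label."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranking, start=1):
        if label == relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant hit
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(query_feats, gallery_feats,
                           query_labels, gallery_labels):
    """Cross-modal mAP: rank items of the other modality by cosine
    similarity in the common subspace, then average AP over all queries."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                      # cosine similarity matrix
    order = np.argsort(-sims, axis=1)   # gallery indices, best match first
    aps = [average_precision(gallery_labels[order[i]], query_labels[i])
           for i in range(len(query_labels))]
    return float(np.mean(aps))
```

Image-to-text retrieval passes image projections as `query_feats` and text projections as `gallery_feats`; text-to-image retrieval swaps the two, and the paper reports the mean of both directions.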