Computer Science, 2021, Vol. 48, Issue (7): 93-98. doi: 10.11896/jsjkx.200600003
孙圣姿, 郭炳晖, 杨小博
SUN Sheng-zi, GUO Bing-hui, YANG Xiao-bo
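Abstract: Cross-modal retrieval has become a research hotspot in recent years. Multi-modal data are heterogeneous, yet different forms of information share similarities. Traditional single-modal methods can reconstruct the original data in only one way and do not consider the semantic similarity between different data, so they cannot retrieve effectively. This paper therefore builds a Cross-Modal Semantic Autoencoder with Embedding Consensus (ECA-CMSA), which maps the original data to a low-dimensional consensus space to preserve semantic information, learns the corresponding semantic code vectors, and introduces parameters to achieve denoising. Then, considering the similarity between modalities, an autoencoder is used to associate the feature projections with the semantic code vectors. In addition, a regularized sparse constraint is imposed on the low-dimensional matrix to balance the reconstruction error. The proposed method is evaluated on four multi-modal datasets, and the experimental results show that its query results improve, achieving effective cross-modal retrieval. Furthermore, ECA-CMSA can be applied to fields related to computers and networks, such as deep learning and subspace learning. The model overcomes obstacles in traditional methods and innovatively uses deep learning to convert multi-modal data into abstract representations, allowing it to obtain better accuracy and recognition results.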
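The general idea of projecting two modalities into a shared low-dimensional code space and retrieving across them can be illustrated with a minimal sketch. This is not the ECA-CMSA model: it is a simplified linear stand-in that assumes paired image and text feature matrices, uses alternating ridge (L2) regression in place of the sparse constraint and denoising parameters mentioned in the abstract, and ranks by cosine similarity. All function names (`fit_shared_codes`, `ridge`, `retrieve`) and hyperparameters (`k`, `lam`, `iters`) are hypothetical.

```python
import numpy as np

def ridge(X, Y, lam):
    """Closed-form ridge regression: argmin_W ||X W - Y||^2 + lam ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def fit_shared_codes(X_img, X_txt, k=8, lam=1e-2, iters=30, seed=0):
    """Learn shared codes S (n x k) and one linear encoder per modality.

    Assumption: row i of X_img and row i of X_txt describe the same item.
    """
    rng = np.random.default_rng(seed)
    n = X_img.shape[0]
    X = np.hstack([X_img, X_txt])          # joint view used for reconstruction
    S = rng.standard_normal((n, k))        # random initial codes
    for _ in range(iters):
        # Decoder step: D = argmin ||X - S D||^2 + lam ||D||^2
        D = np.linalg.solve(S.T @ S + lam * np.eye(k), S.T @ X)
        # Code step: S = argmin ||X - S D||^2 + lam ||S||^2
        S = np.linalg.solve(D @ D.T + lam * np.eye(k), D @ X.T).T
    # Encoders map each modality's features onto the shared codes.
    E_img = ridge(X_img, S, lam)
    E_txt = ridge(X_txt, S, lam)
    return S, E_img, E_txt

def retrieve(query_code, gallery_codes, top=5):
    """Return indices of the `top` gallery codes by cosine similarity."""
    q = query_code / (np.linalg.norm(query_code) + 1e-12)
    G = gallery_codes / (
        np.linalg.norm(gallery_codes, axis=1, keepdims=True) + 1e-12
    )
    return np.argsort(-(G @ q))[:top]
```

In this sketch, an image query would be encoded as `x_img @ E_img` and matched against text codes `X_txt @ E_txt`, so that both modalities are compared in the same code space.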