Computer Science (计算机科学), 2021, Vol. 48, Issue 7: 93-98. doi: 10.11896/jsjkx.200600003

• Database & Big Data & Data Science •

Embedding Consensus Autoencoder for Cross-modal Semantic Analysis

SUN Sheng-zi, GUO Bing-hui, YANG Xiao-bo

  1. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
  2. Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
  3. LMIB and School of Mathematical Sciences, Beihang University, Beijing 100191, China
  • Received: 2020-05-30 Revised: 2020-09-09 Online: 2021-07-15 Published: 2021-07-02
  • Corresponding author: GUO Bing-hui (guobinghui@buaa.edu.cn)
  • About author: SUN Sheng-zi, born in 1996, postgraduate, is a member of China Computer Federation. Her main research interests include artificial intelligence and pattern recognition. (znlx367@163.com)
    GUO Bing-hui, born in 1982, associate professor, is a professional member of China Computer Federation. His main research interests include data science and complex intelligent systems.
  • Supported by:
    Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102301), National Natural Science Foundation of China (11671025) and Fundamental Research of Civil Aircraft (MJ-F-2012-04).

Abstract: Cross-modal retrieval has become a popular research topic in recent years, because multi-modal data are heterogeneous while different forms of information still share semantic similarities. Traditional single-modal methods reconstruct the original data in only one way and do not consider the semantic similarity between different data, so they cannot support effective retrieval. This work therefore proposes a Cross-Modal Semantic Autoencoder with Embedding Consensus (ECA-CMSA). The high-dimensional original data are mapped to a low-dimensional consensus space to retain semantic information, the corresponding semantic code vectors are learned, and the different models are constrained by parameters to achieve denoising. Then, considering the similarity between the modalities, an autoencoder is used to associate the feature projections with the semantic code vectors. In addition, regularization and sparse constraints are applied to the low-dimensional matrices to balance the reconstruction errors. Experiments on four multi-modal data sets show that the query results are improved and that effective cross-modal retrieval is achieved. Furthermore, ECA-CMSA can also be applied to computer- and network-related fields such as deep learning and subspace learning. The model overcomes obstacles in traditional methods by using deep learning to convert multi-modal data into abstract representations, which yields better accuracy and better recognition results.

Key words: Autoencoder, Cross-modal retrieval, Embedding consensus, Sparse regularization
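
The abstract does not state the ECA-CMSA objective, so the following is only a rough sketch of the general idea it describes (a shared low-dimensional code, per-modality autoencoder projections, and a sparse penalty), not the authors' model. Everything specific here is an assumption made for illustration: the consensus code is taken from a truncated SVD of the concatenated features, each modality uses a tied-weight linear autoencoder, the sparse constraint is an L1 penalty solved by proximal gradient descent (ISTA), and the function names, the hyperparameters `alpha` and `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np


def consensus_code(views, k):
    """Assumed consensus code S (n x k): top-k left singular vectors of the
    concatenated modality features, rescaled so entries are of order one."""
    n = views[0].shape[0]
    U, _, _ = np.linalg.svd(np.hstack(views), full_matrices=False)
    return np.sqrt(n) * U[:, :k]


def objective(X, S, P, alpha, lam):
    """Encoder error + weighted decoder error + L1 penalty on the projection."""
    enc = 0.5 * np.linalg.norm(X @ P - S) ** 2            # X P should match S
    dec = 0.5 * alpha * np.linalg.norm(X - S @ P.T) ** 2  # S P^T should rebuild X
    return enc + dec + lam * np.abs(P).sum()


def fit_projection(X, S, alpha=1.0, lam=0.01, n_iter=200):
    """Fit a sparse tied-weight projection P (d x k) with ISTA."""
    A, B = X.T @ X, S.T @ S
    C = (1.0 + alpha) * (X.T @ S)
    # Step 1/L, where L bounds the Lipschitz constant of the smooth gradient.
    step = 1.0 / (np.linalg.norm(A, 2) + alpha * np.linalg.norm(B, 2))
    P = np.zeros((X.shape[1], S.shape[1]))
    history = [objective(X, S, P, alpha, lam)]
    for _ in range(n_iter):
        grad = A @ P + alpha * (P @ B) - C                 # gradient of enc + dec
        Z = P - step * grad
        P = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)  # soft threshold
        history.append(objective(X, S, P, alpha, lam))
    return P, history


def rank_by_cosine(query_codes, gallery_codes):
    """For each query row, gallery indices sorted by descending cosine similarity."""
    q = query_codes / (np.linalg.norm(query_codes, axis=1, keepdims=True) + 1e-12)
    g = gallery_codes / (np.linalg.norm(gallery_codes, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(q @ g.T), axis=1)


# Synthetic paired data: both "modalities" are noisy views of one latent factor.
rng = np.random.default_rng(0)
n, k = 200, 5
latent = rng.normal(size=(n, k))
X_img = latent @ rng.normal(size=(k, 30)) + 0.1 * rng.normal(size=(n, 30))
X_txt = latent @ rng.normal(size=(k, 12)) + 0.1 * rng.normal(size=(n, 12))

S = consensus_code([X_img, X_txt], k)
P_img, hist_img = fit_projection(X_img, S)
P_txt, hist_txt = fit_projection(X_txt, S)

# Image-to-text retrieval: encode both modalities and compare in the code space.
ranking = rank_by_cosine(X_img @ P_img, X_txt @ P_txt)
```

In this sketch, `alpha` trades encoder error against reconstruction error and `lam` controls how strongly the projection is pushed toward sparsity. Both would need tuning on real features, and the learned codes here are a stand-in for whatever semantic code vectors the paper actually learns.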

CLC Number:

  • TP39