Computer Science, 2021, Vol. 48, Issue (7): 93-98. doi: 10.11896/jsjkx.200600003

• Database & Big Data & Data Science •

Embedding Consensus Autoencoder for Cross-modal Semantic Analysis

SUN Sheng-zi, GUO Bing-hui, YANG Xiao-bo

  1. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China
    Peng Cheng Laboratory, Shenzhen, Guangdong 518055, China
    LMIB and School of Mathematical Sciences, Beihang University, Beijing 100191, China
  • Received:2020-05-30 Revised:2020-09-09 Online:2021-07-15 Published:2021-07-02
  • About author: SUN Sheng-zi, born in 1996, postgraduate, is a member of China Computer Federation. Her main research interests include artificial intelligence and pattern recognition. (znlx367@163.com)
    GUO Bing-hui, born in 1982, associate professor, is a professional member of China Computer Federation. His main research interests include data science and complex intelligent systems.
  • Supported by:
    Technological Innovation 2030-Artificial Intelligence Project (2018AAA0102301), National Natural Science Foundation of China (11671025), and Fundamental Research of Civil Aircraft (MJ-F-2012-04).

Abstract: Cross-modal retrieval has become a popular research topic because multi-modal data is heterogeneous and the similarities between different forms of information deserve attention. Traditional single-modal methods reconstruct the original information but do not consider the semantic similarity between different types of data. In this work, an Embedding Consensus Autoencoder for Cross-Modal Semantic Analysis (ECA-CMSA) is proposed, which maps the original data to a low-dimensional shared space to retain semantic information. To account for the similarity between modalities, an autoencoder is used to associate the feature projections with the semantic code vector. In addition, regularization and sparsity constraints are applied to the low-dimensional matrices to balance the reconstruction errors. The high-dimensional data is thus transformed into semantic code vectors, and the models for the different modalities are constrained by parameters to achieve denoising. Experiments on four multi-modal data sets show that the query results are improved and that effective cross-modal retrieval is achieved. Furthermore, ECA-CMSA can also be applied to computer- and network-related fields such as deep learning and subspace learning. The model overcomes obstacles in traditional methods by using deep learning to convert multi-modal data into abstract representations, which yields better accuracy and better recognition results.
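
The abstract does not give the objective function, so the following is only a minimal sketch of the general idea it describes: two modalities are projected into one shared low-dimensional code, the code is asked to reconstruct each modality, and regularization plus a sparsity penalty are added. Everything here is an assumption made for illustration, not the authors' formulation: the names `P_img`, `P_txt`, `S`, `lam`, `beta`, the tied-weight linear encoder/decoder, and the cosine-similarity ranking.

```python
import numpy as np

def consensus_objective(X_img, X_txt, P_img, P_txt, S, lam=0.1, beta=0.01):
    """Illustrative loss for a linear, tied-weight consensus autoencoder.

    X_img: (d_img, n) image features; X_txt: (d_txt, n) text features.
    P_img, P_txt: (k, d) projections into the shared k-dimensional space.
    S: (k, n) shared semantic codes (hypothetical name).
    """
    loss = 0.0
    for X, P in ((X_img, P_img), (X_txt, P_txt)):
        loss += np.linalg.norm(P @ X - S) ** 2    # encoder should agree with the shared code
        loss += np.linalg.norm(X - P.T @ S) ** 2  # decoder (tied weights) should reconstruct X
        loss += lam * np.linalg.norm(P) ** 2      # Frobenius-norm regularization
    loss += beta * np.abs(S).sum()                # L1 sparsity penalty on the codes
    return loss

def cross_modal_rank(query, P_query, gallery, P_gallery):
    """Rank gallery items of one modality for a query of the other modality.

    Both sides are projected into the shared space and compared by
    cosine similarity; the returned indices are sorted best match first.
    """
    q = P_query @ query                           # (k,)
    G = P_gallery @ gallery                       # (k, n)
    denom = np.linalg.norm(G, axis=0) * np.linalg.norm(q) + 1e-12
    sims = (G.T @ q) / denom
    return np.argsort(-sims)
```

In such a sketch, `lam` and `beta` would trade reconstruction error against the size of the projections and the sparsity of the codes, which is one way to read the "balance reconstruction errors" statement in the abstract.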

Key words: Autoencoder, Cross-modal retrieval, Embedding consensus, Sparse regularization

CLC Number: 

  • TP39