Cross-modal Retrieval Fusing Multilayer Semantics

doi:10.11896/j.issn.1002-137X.2019.03.034

Abstract

How to explore the inherent relations between different modalities is the core problem of cross-modal retrieval. Previous work demonstrates that models which integrate representation learning and correlation learning into a single process are better suited to cross-modal retrieval, but these models capture only 1-1 correspondences between modalities. However, different modalities are likely to have different granularities of semantic abstraction, and correlations between modalities are likely to occur at several semantic layers simultaneously. This paper proposes a cross-modal retrieval model that fuses multilayer semantics. Building on the architecture of the deep Boltzmann machine, an undirected graphical model, the proposed model associates each semantic layer of the text modality with multiple semantic layers of the image modality, and thereby explores the inherent N-M relations between modalities more fully. Experiments on three real, public datasets show that the model clearly outperforms state-of-the-art models and achieves higher retrieval accuracy.
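The abstract does not give the model's equations, so the following is only a minimal sketch, under stated assumptions, of what "each text semantic layer associated with multiple image semantic layers" could look like in a binary deep Boltzmann machine: one stack of hidden layers per modality, plus a cross-modal weight matrix between every image hidden layer and every text hidden layer. The class name `CrossLinkedDBM`, the layer sizes, the omission of biases, the mean-field update schedule, and the cosine-similarity ranking are all hypothetical. The weights are random rather than trained, so the resulting ranking only illustrates the inference mechanics, not the authors' method or results.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class CrossLinkedDBM:
    """Hypothetical two-pathway binary DBM in which every image hidden layer i
    is linked to every text hidden layer j (N-M cross-modal links).
    Biases are omitted and weights are random (untrained) for brevity."""

    def __init__(self, d_img, d_txt, img_layers, txt_layers, seed=0):
        rng = np.random.default_rng(seed)

        def init(m, n):
            return 0.1 * rng.standard_normal((m, n))

        img_sizes = [d_img] + list(img_layers)
        txt_sizes = [d_txt] + list(txt_layers)
        # W[k] connects layer k to layer k + 1 within one pathway (layer 0 = visible).
        self.W_img = [init(a, b) for a, b in zip(img_sizes[:-1], img_sizes[1:])]
        self.W_txt = [init(a, b) for a, b in zip(txt_sizes[:-1], txt_sizes[1:])]
        # C[i][j] links image hidden layer i to text hidden layer j.
        self.C = [[init(a, b) for b in txt_layers] for a in img_layers]
        self.img_layers, self.txt_layers = list(img_layers), list(txt_layers)

    @staticmethod
    def _pathway_input(W, mu, v, k):
        """Input to hidden layer k from its neighbours below and above in one pathway."""
        below = v if k == 0 else mu[k - 1]
        x = below @ W[k] if below is not None else np.zeros(W[k].shape[1])
        if k + 1 < len(W):
            x = x + mu[k + 1] @ W[k + 1].T
        return x

    def infer(self, v_img=None, v_txt=None, steps=20):
        """Mean-field inference; a missing modality (None) contributes no visible input.
        Returns the concatenated mean activations of all hidden layers."""
        mu_i = [np.full(n, 0.5) for n in self.img_layers]
        mu_t = [np.full(n, 0.5) for n in self.txt_layers]
        for _ in range(steps):
            for i in range(len(mu_i)):
                x = self._pathway_input(self.W_img, mu_i, v_img, i)
                # N-M links: every text hidden layer feeds image hidden layer i.
                x = x + sum(self.C[i][j] @ mu_t[j] for j in range(len(mu_t)))
                mu_i[i] = sigmoid(x)
            for j in range(len(mu_t)):
                x = self._pathway_input(self.W_txt, mu_t, v_txt, j)
                # ...and every image hidden layer feeds text hidden layer j.
                x = x + sum(mu_i[i] @ self.C[i][j] for i in range(len(mu_i)))
                mu_t[j] = sigmoid(x)
        return np.concatenate(mu_i + mu_t)


def rank_texts(model, query_img, text_db, steps=20):
    """Image-to-text retrieval: rank texts by cosine similarity of their joint codes."""
    q = model.infer(v_img=query_img, steps=steps)
    codes = np.stack([model.infer(v_txt=t, steps=steps) for t in text_db])
    sims = codes @ q / (np.linalg.norm(codes, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)


rng = np.random.default_rng(1)
model = CrossLinkedDBM(d_img=32, d_txt=50, img_layers=[16, 8], txt_layers=[12, 8])
query = rng.random(32)
texts = [rng.integers(0, 2, 50).astype(float) for _ in range(5)]
order = rank_texts(model, query, texts)
print(order)  # some permutation of 0..4; the order depends on the random weights
```

In this sketch, a 1-1 variant would keep only a single cross link (for example `C[-1][-1]` between the two top layers), whereas the N-M variant lets evidence at every text layer reach every image layer during inference, which is the property the abstract emphasizes. A trained model would also need biases, modality-appropriate visible units, and a learning procedure, none of which are specified in this abstract.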

Key words: Cross-modal, Deep learning, Fusion, Multilayer semantics, Retrieval

CLC Number: TP183

FENG Yao-gong, CAI Guo-yong. Cross-modal Retrieval Fusing Multilayer Semantics[J]. Computer Science, 2019, 46(3): 227-233.
