Computer Science ›› 2023, Vol. 50 ›› Issue (8): 163-169. doi: 10.11896/jsjkx.220700216

• Artificial Intelligence •

Multimodal Knowledge Graph Embedding with Text-Image Enhancement

XIAO Guiyang, WANG Lisong, JIANG Guohua

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210000, China
  • Received: 2022-07-21  Revised: 2022-11-24  Online: 2023-08-15  Published: 2023-08-02
  • Corresponding author: WANG Lisong (wangls@nuaa.edu.cn)
  • About author: XIAO Guiyang (xiaogy@nuaa.edu.cn), born in 1998, master. Her main research interests include knowledge graph embedding and deep learning.
    WANG Lisong, born in 1969, Ph.D., professor, is a member of China Computer Federation. His main research interests include natural language processing and formal methods.
  • Supported by:
    Key Project of the Foundation Strengthening Plan (2019JCJQZD33800).

Abstract: Most traditional knowledge representation learning methods focus only on the structured information in triples; they either cannot exploit additional information such as entity images, relation paths, and text descriptions, or fuse only one kind of additional information. A multimodal knowledge graph embedding method that fuses entity descriptions and entity images simultaneously is therefore proposed. Because text and images enhance each other, the method supplies more comprehensive external information and compensates for the deficiency that the incompleteness of a single information source brings to knowledge representation learning. First, entity descriptions and entity images are modeled to obtain a text-based and an image-based representation of each entity, and these serve as supplements to the structural representation in TransE. Finally, joint training of the three entity representations embeds the knowledge graph, texts, and images in a unified space, improving the accuracy of entity and relation prediction. Experimental results show that the hit rate of entity prediction of the proposed model improves by 3.09% over the method without additional information, by 0.97% over the method fusing only entity descriptions, and by 1.32% over the method fusing only entity images.
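
The abstract describes the architecture only at a high level. In the closely related models its reference list points to, entity descriptions are encoded with a Text-CNN [18] (as in DKRL [16]) and entity images with an ImageNet-pretrained CNN (as in IKRL [21]), and a triple is scored by summing TransE translation energies over the pairings of the structural, text-based, and image-based entity views. The PyTorch sketch below illustrates that reading only; the class and parameter names (TextCNN, MultimodalTransE, text_proj, img_proj) are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Kim-style Text-CNN over an entity description: parallel convolutions
    of several widths over word embeddings, max-pooled and concatenated."""
    def __init__(self, vocab_size, emb_dim, n_filters, widths=(2, 3, 4)):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, w) for w in widths])

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)         # (batch, emb_dim, seq_len)
        pooled = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return torch.cat(pooled, dim=1)              # (batch, n_filters * len(widths))

class MultimodalTransE(nn.Module):
    """TransE whose structural entity embedding is supplemented by projections
    of a description encoding and an image encoding into the same space."""
    def __init__(self, n_ent, n_rel, dim, text_dim, img_dim):
        super().__init__()
        self.ent_s = nn.Embedding(n_ent, dim)        # structure-based view
        self.rel = nn.Embedding(n_rel, dim)
        self.text_proj = nn.Linear(text_dim, dim)    # description-based view
        self.img_proj = nn.Linear(img_dim, dim)      # image-based view (e.g. CNN features)

    def views(self, idx, text_feat, img_feat):
        # Three views of the same entity: structural, textual, visual.
        return (self.ent_s(idx),
                self.text_proj(text_feat),
                self.img_proj(img_feat))

    def energy(self, head_views, r, tail_views):
        # Sum translation energies over every head-view/tail-view pairing so
        # that all three views are trained jointly in one space with r.
        return sum(torch.norm(h + r - t, p=1, dim=-1)
                   for h in head_views for t in tail_views)

def margin_loss(pos_energy, neg_energy, margin=1.0):
    # TransE margin ranking loss between a triple and its corrupted version.
    return F.relu(margin + pos_energy - neg_energy).mean()
```

Summing the energies of every head-view/tail-view pairing, rather than fusing the views first, is a common design choice in this model family because it keeps all three views in the same space as the relation vectors; whether this paper combines the views in exactly this way is an assumption.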

Key words: Knowledge representation learning, Entity descriptions, Entity images, Text-CNN, Joint training
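
The entity-prediction hit rate quoted in the abstract is the standard Hits@k link-prediction metric: the fraction of test queries whose correct entity is ranked among the k lowest-energy candidates. A minimal, self-contained illustration follows; the function name hits_at_k and the toy scores are illustrative, not from the paper.

```python
import torch

def hits_at_k(scores, gold, k=10):
    """scores: (n_queries, n_entities), lower = better (TransE-style energy).
    gold:   (n_queries,) index of the correct entity for each query."""
    order = scores.argsort(dim=1)                       # candidates, best first
    hit = (order[:, :k] == gold.unsqueeze(1)).any(dim=1)
    return hit.float().mean().item()

# Toy check: two queries over five candidate entities.
scores = torch.tensor([[0.1, 0.9, 0.5, 0.7, 0.3],
                       [0.8, 0.2, 0.6, 0.4, 0.9]])
print(hits_at_k(scores, torch.tensor([0, 1]), k=1))     # 1.0: both gold entities rank first
```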

CLC Number: TP183

References:
[1]BOLLACKER K,EVANS C,PARITOSH P,et al.Freebase:a collaboratively created graph database for structuring human knowledge[C]//Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data.2008:1247-1250.
[2]AUER S,BIZER C,KOBILAROV G,et al.DBpedia:A nucleus for a web of open data[C]//The Semantic Web:6th International Semantic Web Conference,2nd Asian Semantic Web Conference(ISWC 2007+ASWC 2007).Busan,Korea:Springer,2007:722-735.
[3]SUCHANEK F M,KASNECI G,WEIKUM G.Yago:a core of semantic knowledge[C]//Proceedings of the 16th International Conference on World Wide Web.2007:697-706.
[4]YANG B,YIH W T,HE X,et al.Embedding entities and relations for learning and inference in knowledge bases[J].arXiv:1412.6575,2014.
[5]YIN J,JIANG X,LU Z,et al.Neural generative question answering[J].arXiv:1512.01337,2015.
[6]WANG Q,MAO Z,WANG B,et al.Knowledge graph embedding:A survey of approaches and applications[J].IEEE Transactions on Knowledge and Data Engineering,2017,29(12):2724-2743.
[7]DONG X,GABRILOVICH E,HEITZ G,et al.Knowledge vault:A web-scale approach to probabilistic knowledge fusion[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2014:601-610.
[8]BORDES A,USUNIER N,GARCIA-DURAN A,et al.Translating embeddings for modeling multi-relational data[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems-Volume 2.2013:2787-2795.
[9]KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems-Volume 1.2012:1097-1105.
[10]MIKOLOV T,SUTSKEVER I,CHEN K,et al.Distributed representations of words and phrases and their compositionality[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems-Volume 2.2013:3111-3119.
[11]WANG Z,ZHANG J,FENG J,et al.Knowledge graph embedding by translating on hyperplanes[C]//Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence.2014:1112-1119.
[12]LIN Y,LIU Z,SUN M,et al.Learning entity and relation embeddings for knowledge graph completion[C]//Twenty-Ninth AAAI Conference on Artificial Intelligence.2015.
[13]FAN M,ZHOU Q,CHANG E,et al.Transition-based knowledge graph embedding with relational mapping properties[C]//Proceedings of the 28th Pacific Asia Conference on Language,Information and Computing.2014:328-337.
[14]GUO S,WANG Q,WANG B,et al.Semantically smooth knowledge graph embedding[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing(Volume 1:Long Papers).2015:84-94.
[15]LIN Y,LIU Z,LUAN H,et al.Modeling relation paths for representation learning of knowledge bases[J].arXiv:1506.00379,2015.
[16]XIE R,LIU Z,JIA J,et al.Representation learning of knowledge graphs with entity descriptions[C]//Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence.2016:2659-2665.
[17]JIANG T,LIU T,GE T,et al.Encoding temporal information for time-aware link prediction[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.2016:2350-2354.
[18]KIM Y.Convolutional Neural Networks for Sentence Classification[J].arXiv:1408.5882,2014.
[19]PASZKE A,GROSS S,MASSA F,et al.PyTorch:an imperative style,high-performance deep learning library[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems.2019:8026-8037.
[20]SHUTOVA E,KIELA D,MAILLARD J.Black holes and white rabbits:Metaphor identification with visual features[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.2016:160-170.
[21]XIE R,LIU Z,LUAN H,et al.Image-embodied knowledge representation learning[J].arXiv:1609.07028,2016.
[22]DENG J,DONG W,SOCHER R,et al.ImageNet:A large-scale hierarchical image database[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition.IEEE,2009:248-255.
[23]MILLER G A.WordNet:a lexical database for English[J].Communications of the ACM,1995,38(11):39-41.