Computer Science, 2019, Vol. 46, Issue (1): 238-244. doi: 10.11896/j.issn.1002-137X.2019.01.037

• Artificial Intelligence •

Cross-language Knowledge Linking Based on Bilingual Topic Model and Bilingual Embedding

YU Yuan-yuan¹, CHAO Wen-han¹, HE Yue-ying², LI Zhou-jun¹

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. National Computer Network Emergency Response Technical Team/Coordination Center, Beijing 100029, China
  • Received: 2018-01-24 Online: 2019-01-15 Published: 2019-02-25

Abstract: Cross-language knowledge linking (CLKL) is the task of establishing links between encyclopedia articles in different languages that describe the same content. CLKL can be divided into two stages: candidate selection and candidate ranking. First, this paper formulates candidate selection as a cross-language information retrieval problem and proposes a method that generates queries by combining the article title with keywords, which greatly improves the recall of candidate selection, reaching 93.8%. For candidate ranking, this paper trains a ranking model that combines a bilingual topic model with bilingual embeddings, and applies it to linking military articles between English Wikipedia and Chinese Baidu Baike. The evaluation results show that the model achieves an accuracy of 75%, which significantly improves the performance of CLKL. The proposed method does not depend on language-specific or domain-specific characteristics, so it can be easily extended to CLKL in other languages and other domains.
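
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`build_query`, `rank_candidates`), the frequency-based keyword picker, and the fixed weights `w_topic` and `w_embed` are all assumptions made for illustration. In particular, the paper trains a ranking model, whereas this sketch substitutes a hand-weighted sum of two cosine similarities, and it assumes topic distributions and embeddings have already been computed in a shared bilingual space.

```python
import math
from collections import Counter


def build_query(title, body_tokens, top_k=3):
    """Candidate selection: form a query from the title plus keywords.

    Keywords here are simply the top_k most frequent body tokens that are
    not already in the title (an assumed stand-in for keyword extraction).
    """
    title_tokens = title.lower().split()
    counts = Counter(t for t in body_tokens if t not in title_tokens)
    keywords = [word for word, _ in counts.most_common(top_k)]
    return title_tokens + keywords


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(source, candidates, w_topic=0.5, w_embed=0.5):
    """Candidate ranking: order candidates by a weighted sum of two features.

    Each article is a dict with hypothetical keys "id", "topic" (bilingual
    topic distribution) and "embed" (bilingual embedding vector).
    """
    scored = []
    for cand in candidates:
        score = (w_topic * cosine(source["topic"], cand["topic"])
                 + w_embed * cosine(source["embed"], cand["embed"]))
        scored.append((score, cand["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand_id for _, cand_id in scored]
```

In practice the query from `build_query` would be translated and sent to a retrieval engine over the target-language encyclopedia, and the returned candidates would then be passed to the ranking step.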

Key words: Bilingual embedding, Bilingual topic model, Cross-language information retrieval, Cross-language knowledge linking

CLC Number: 

  • TP391