Computer Science ›› 2019, Vol. 46 ›› Issue (1): 238-244. doi: 10.11896/j.issn.1002-137X.2019.01.037

• Artificial Intelligence •

Cross-language Knowledge Linking Based on Bilingual Topic Model and Bilingual Embedding

YU Yuan-yuan1, CHAO Wen-han1, HE Yue-ying2, LI Zhou-jun1   

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. National Computer Network Emergency Response Technical Team/Coordination Center, Beijing 100029, China
  • Received:2018-01-24 Online:2019-01-15 Published:2019-02-25

Abstract: Cross-language knowledge linking (CLKL) is the task of establishing links between encyclopedia articles in different languages that describe the same content. CLKL can be divided into two parts: candidate selection and candidate ranking. First, this paper formulates candidate selection as a cross-language information retrieval problem and proposes a method that generates the query by combining the article title with keywords, which raises the recall of candidate selection to 93.8%. For candidate ranking, this paper trains a ranking model that combines a bilingual topic model with bilingual embedding, and applies it to linking military articles between English Wikipedia and Chinese Baidu Baike. The evaluation results show that the model reaches an accuracy of 75%, which significantly improves the performance of CLKL. The proposed method does not depend on language-specific or domain-specific characteristics, so it can be easily extended to CLKL in other languages and other domains.
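The two stages described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the paper's implementation: the function names (`build_query`, `cosine`, `rank_candidates`), the dictionary layout of an article, and above all the fixed weights, which stand in for the trained ranking model. The topic and embedding vectors are assumed to already lie in a shared bilingual space.

```python
import math


def build_query(title, keywords, top_k=3):
    """Candidate selection: join the title with its top-k keywords into one query.

    Assumption: title and keywords are already in the target language.
    """
    return " ".join([title] + list(keywords[:top_k]))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(source, candidates, w_topic=0.5, w_embed=0.5):
    """Candidate ranking: order candidates by a weighted sum of two similarities.

    The fixed weights are a hypothetical stand-in for a learned ranking model.
    Each article is a dict with "title", "topic" and "embed" keys.
    """
    scored = []
    for cand in candidates:
        score = (w_topic * cosine(source["topic"], cand["topic"])
                 + w_embed * cosine(source["embed"], cand["embed"]))
        scored.append((score, cand["title"]))
    # Highest combined similarity first.
    return [title for _, title in sorted(scored, reverse=True)]
```

In practice the query from `build_query` would be sent to a search index over the target encyclopedia, and the returned candidates would then be passed to `rank_candidates`.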

Key words: Cross-language knowledge linking, Cross-language information retrieval, Bilingual topic model, Bilingual embedding

CLC Number: TP391