Computer Science ›› 2019, Vol. 46 ›› Issue (1): 238-244. doi: 10.11896/j.issn.1002-137X.2019.01.037

• Artificial Intelligence •

Cross-language Knowledge Linking Based on Bilingual Topic Model and Bilingual Embedding

YU Yuan-yuan1, CHAO Wen-han1, HE Yue-ying2, LI Zhou-jun1   

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. National Computer Network Emergency Response Technical Team/Coordination Center, Beijing 100029, China
  • Received: 2018-01-24 Online: 2019-01-15 Published: 2019-02-25

Abstract: Cross-language knowledge linking (CLKL) refers to establishing links between encyclopedia articles in different languages that describe the same content. CLKL can be divided into two parts: candidate selection and candidate ranking. First, this paper formulates candidate selection as a cross-language information retrieval problem and proposes a method that generates queries by combining the title with keywords, which greatly improves the recall of candidate selection, reaching 93.8%. For candidate ranking, this paper trains a ranking model that combines a bilingual topic model with bilingual embeddings, and applies it to linking military articles between English Wikipedia and Chinese Baidu Baike. The evaluation results show that the model achieves an accuracy of 75%, which significantly improves the performance of CLKL. The proposed method does not depend on language-specific or domain-specific characteristics, and it can be easily extended to CLKL in other languages and other domains.
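
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frequency-based keyword choice, and the fixed weight `alpha` are all assumptions made for illustration. In particular, the paper trains a ranking model, whereas this sketch simply mixes two cosine similarities linearly, and the topic and embedding vectors are assumed to be already mapped into a shared bilingual space.

```python
import math
from collections import Counter


def build_query(title, body_tokens, k=3):
    """Combine the title with k keywords from the article body.

    Assumption: keywords are the most frequent body tokens that are not
    already in the title; the abstract does not say how keywords are chosen.
    """
    title_tokens = title.lower().split()
    counts = Counter(t for t in body_tokens if t not in title_tokens)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    keywords = [token for token, _ in ordered[:k]]
    return title_tokens + keywords


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(source, candidates, alpha=0.5):
    """Order candidates by a weighted mix of topic and embedding similarity.

    Assumption: each article is a dict with a "topic" vector (bilingual topic
    distribution) and an "emb" vector (bilingual embedding).
    """
    def score(candidate):
        topic_sim = cosine(source["topic"], candidate["topic"])
        emb_sim = cosine(source["emb"], candidate["emb"])
        return alpha * topic_sim + (1 - alpha) * emb_sim

    return sorted(candidates, key=score, reverse=True)
```

In this sketch, `build_query` would feed a cross-language retrieval step that returns the candidate set, and `rank_candidates` would then order that set; replacing the fixed `alpha` with a learned ranker would be closer to what the abstract describes.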

Key words: Cross-language knowledge linking, Cross-language information retrieval, Bilingual topic model, Bilingual embedding

CLC Number: TP391