Cross-language Knowledge Linking Based on Bilingual Topic Model and Bilingual Embedding

YU Yuan-yuan¹, CHAO Wen-han¹, HE Yue-ying², LI Zhou-jun¹

  1. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
  2. National Computer Network Emergency Response Technical Team/Coordination Center, Beijing 100029, China
  • Received: 2018-01-24 Online: 2019-01-15 Published: 2019-02-25

Abstract: Cross-language knowledge linking (CLKL) is the task of establishing links between encyclopedia articles in different languages that describe the same content. CLKL can be divided into two stages: candidate selection and candidate ranking. For candidate selection, this paper formulates the task as a cross-language information retrieval problem and proposes a method that generates queries by combining an article's title with its keywords, which greatly improves the recall of candidate selection, reaching 93.8%. For candidate ranking, this paper trains a ranking model that combines a bilingual topic model with bilingual embeddings, and applies it to linking military articles between English Wikipedia and Chinese Baidu Baike. The evaluation results show that the model achieves an accuracy of 75%, which significantly improves the performance of CLKL. The proposed method does not depend on language-specific or domain-specific characteristics, and it can be easily extended to CLKL in other languages and other domains.
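
The two stages summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`build_query`, `rank_candidates`), the `max_keywords` and `weight` parameters, and the dictionary fields `topics` and `embedding` are all hypothetical. The paper trains a ranking model; the fixed weighted sum of cosine similarities used here is only a stand-in for it, and the topic distributions and embeddings are assumed to be already mapped into a shared bilingual space.

```python
import math


def build_query(title, keywords, max_keywords=3):
    """Candidate selection: join the title and its top keywords into one query string."""
    return " ".join([title] + list(keywords[:max_keywords]))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(source, candidates, weight=0.5):
    """Candidate ranking: sort candidates by a weighted sum of two similarities.

    Assumption: `topics` holds a bilingual topic distribution and `embedding`
    a bilingual embedding vector. The fixed `weight` replaces the trained
    ranking model described in the abstract.
    """
    def score(candidate):
        topic_sim = cosine(source["topics"], candidate["topics"])
        embed_sim = cosine(source["embedding"], candidate["embedding"])
        return weight * topic_sim + (1 - weight) * embed_sim

    return sorted(candidates, key=score, reverse=True)
```

In such a sketch, the string returned by `build_query` would be translated and submitted to a retrieval system over the target encyclopedia, and `rank_candidates` would then reorder the retrieved articles.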

Key words: Cross-language knowledge linking, Cross-language information retrieval, Bilingual topic model, Bilingual embedding

  • TP391
