Computer Science, 2019, Vol. 46, Issue (1): 238-244. doi: 10.11896/j.issn.1002-137X.2019.01.037
YU Yuan-yuan (余圆圆)¹, CHAO Wen-han (巢文涵)¹, HE Yue-ying (何跃鹰)², LI Zhou-jun (李舟军)¹
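Abstract: Cross-lingual knowledge linking establishes links between online encyclopedia articles in different languages that describe the same content. The task can be divided into two parts: candidate selection and candidate ranking. First, candidate selection is cast as a cross-lingual information retrieval problem, and a method that generates queries by combining article titles with keywords is proposed; it substantially raises the recall of candidate selection, to 93.8%. For candidate ranking, a ranking model that fuses a bilingual topic model with bilingual word embeddings is proposed and used to perform cross-lingual knowledge linking in the military domain between English Wikipedia and Chinese Baidu Baike. Experimental results show that the model achieves 75% accuracy, significantly improving cross-lingual knowledge linking performance. Because the proposed method does not depend on language-specific or domain-specific features, it can readily be extended to cross-lingual knowledge linking for other languages and other domains.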
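The abstract outlines a two-stage pipeline but gives no implementation details. Below is a minimal Python sketch of that pipeline under assumptions that are ours, not the authors': keywords are taken to be the most frequent non-title terms, queries are translated with a toy bilingual lexicon, retrieval is plain term overlap, and the ranking model is a fixed-weight linear fusion of two cosine similarities, one between document-topic vectors (`theta`, assumed to come from a bilingual topic model with a shared topic space) and one between averaged bilingual word vectors (`emb`). All function names, the `alpha` weight and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Stage 1: candidate selection treated as cross-lingual retrieval.
def build_query(title, body, lexicon, k=2):
    """Title terms plus the k most frequent non-title body terms (a crude,
    hypothetical stand-in for keyword extraction), mapped into the target
    language with a bilingual lexicon; terms missing from it are dropped."""
    counts = Counter(t for t in body if t not in title)
    keywords = [t for t, _ in counts.most_common(k)]
    return [lexicon[t] for t in title + keywords if t in lexicon]


def select_candidates(query, collection, n=2):
    """Keep the n target-language articles that share the most query terms."""
    q = set(query)
    return sorted(collection, key=lambda d: -len(q & set(collection[d])))[:n]


# Stage 2: candidate ranking by fusing two similarity features.
def fused_score(src, cand, alpha=0.5):
    """alpha * topic similarity + (1 - alpha) * embedding similarity.
    A fixed alpha is an assumption; a learned ranker could replace it."""
    return (alpha * cosine(src["theta"], cand["theta"])
            + (1 - alpha) * cosine(src["emb"], cand["emb"]))


def rank(src, candidates, features, alpha=0.5):
    """Order candidate ids by descending fused score."""
    return sorted(candidates, key=lambda c: -fused_score(src, features[c], alpha))


def candidate_recall(golds, candidate_sets):
    """Share of source articles whose true counterpart survives stage 1."""
    return sum(g in c for g, c in zip(golds, candidate_sets)) / len(golds)


def top1_accuracy(golds, rankings):
    """Share of source articles whose top-ranked candidate is the true one."""
    return sum(r[:1] == [g] for g, r in zip(golds, rankings)) / len(golds)


# Toy data, invented for illustration: one English article, three Chinese ones.
TITLE = ["aircraft", "carrier"]
BODY = ["navy", "deck", "navy", "jet", "deck", "navy", "aircraft"]
LEXICON = {"aircraft": "飞机", "carrier": "航母", "navy": "海军", "deck": "甲板"}
COLLECTION = {
    "航空母舰": ["航母", "海军", "舰载机"],
    "两栖攻击舰": ["海军", "甲板", "飞机", "登陆"],
    "战斗机": ["飞机", "空军"],
}
SRC = {"theta": [0.7, 0.2, 0.1], "emb": [0.9, 0.1]}
FEATURES = {
    "航空母舰": {"theta": [0.6, 0.3, 0.1], "emb": [0.8, 0.2]},
    "两栖攻击舰": {"theta": [0.2, 0.7, 0.1], "emb": [0.5, 0.5]},
    "战斗机": {"theta": [0.1, 0.1, 0.8], "emb": [0.1, 0.9]},
}

if __name__ == "__main__":
    query = build_query(TITLE, BODY, LEXICON)
    cands = select_candidates(query, COLLECTION)
    ranking = rank(SRC, cands, FEATURES)
    print(query, cands, ranking)
    print(candidate_recall(["航空母舰"], [cands]), top1_accuracy(["航空母舰"], [ranking]))
```

On this toy data, term overlap places 两栖攻击舰 above the true counterpart 航空母舰, but both survive candidate selection and the fused similarity then ranks 航空母舰 first. This mirrors the division of labour in the abstract: stage 1 mainly needs high recall, while stage 2 supplies precision. In a real system the fusion weights would more plausibly be learned than fixed by hand.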