Computer Science, 2017, Vol. 44, Issue (1): 95-99. doi: 10.11896/j.issn.1002-137X.2017.01.018

• The 6th China Conference on Data Mining (2016)

Research on Sense Guessing of Chinese Unknown Words Based on Knowledge Graph

ZHU Feng, GU Min, ZHENG Hao, GU Yan-hui, ZHOU Jun-sheng and QU Wei-guang

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023
  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61272221,9), the Jiangsu Provincial Social Science Foundation (12YYA002), the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (14KJB520022), and the Open Project of the Shandong Provincial Key Laboratory of Language Resource Development and Application.

Abstract: Traditional corpora used for studying the semantics of unknown words have many limitations, such as slow updating and language dependence. To address these problems, this paper proposes a method for sense guessing of Chinese unknown words based on a knowledge graph (KG). A KG is a semantic network containing entities, concepts and semantic relations. It holds a large number of entities, and new entities and relations can be added easily, which makes it possible to overcome the slow updating of traditional corpora. After introducing the structure of the knowledge graph, how its data are obtained and how they are processed, the paper explores KG-based sense guessing of Chinese unknown words. Finally, Baidu Baike, currently the largest Chinese knowledge graph, is used as a corpus resource alongside traditional corpora in experiments run under the same sense guessing model. The results obtained from the different knowledge bases are analyzed and improvements are proposed.

Key words: Sense guessing of Chinese unknown words, Semantic annotation, Knowledge graph
