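Research of Word Sense Disambiguation Based on Combination of Rules and Statistics

Computer Science, 2013, Vol. 40, Issue (12): 282-286.

MIAO Hai and ZHANG Yang-sen

School of Computer Science, Beijing Information Science and Technology University, Beijing 100192

Online: 2018-11-16 Published: 2018-11-16

Supported by:
This work was supported by the National Natural Science Foundation of China project "Research on Chinese Microblog Information Mining Methods Based on Semantic Analysis" (61370139), the National Natural Science Foundation of China project "Research on Automatic Detection and Correction of Chinese Text Errors Based on Semantic Analysis" (61070119), and the Beijing Municipal Universities Innovation Team Building and Teacher Career Development Program project "Theoretical Foundations and Intelligent Processing Technology for Big Data Content Understanding" (IDHT20130519).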

Abstract

To address the shortcomings of existing word sense disambiguation methods, this paper analyzes knowledge dictionaries of several different structures in terms of computability and computational complexity. The Grammatical Knowledge-base of Contemporary Chinese and the Modern Chinese Semantic Dictionary, both from the Institute of Computational Linguistics of Peking University, together with the homograph-annotated People's Daily corpus, were chosen as the knowledge sources for word sense disambiguation. A method for fusing multiple heterogeneous knowledge sources was studied, an agile rule knowledge base and a word sense collocation library were extracted, and a word sense disambiguation method combining rules and statistics was designed. Among the methods compared, the combination of maximum entropy and rules achieved the highest accuracy. Compared with the best result in SemEval 2007 (task #5), it improved the micro-average accuracy (MicroAve) and the macro-average accuracy (MacroAve) by 5.5% and 0.9%, respectively.

Key words: Word sense disambiguation, Knowledge source, Rule, Statistics
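
The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: MicroAve is the number of correctly disambiguated instances divided by the total number of instances, and MacroAve is the unweighted mean of the per-word accuracies. The function name `micro_macro_accuracy` and the toy records are hypothetical and are not taken from the paper or its data.

```python
from collections import defaultdict


def micro_macro_accuracy(records):
    """Return (micro, macro) accuracy.

    records: iterable of (target_word, predicted_sense, gold_sense) tuples.
    """
    correct = defaultdict(int)  # correct predictions per target word
    total = defaultdict(int)    # instances per target word
    for word, predicted, gold in records:
        total[word] += 1
        if predicted == gold:
            correct[word] += 1
    if not total:
        raise ValueError("no instances to score")
    # Micro-average: pool all instances, so frequent words weigh more.
    micro = sum(correct.values()) / sum(total.values())
    # Macro-average: average the per-word accuracies, so each word weighs equally.
    macro = sum(correct[w] / total[w] for w in total) / len(total)
    return micro, macro


# Hypothetical toy data: two ambiguous target words with invented sense labels.
records = [
    ("bank", "finance", "finance"),
    ("bank", "finance", "finance"),
    ("bank", "river", "finance"),
    ("bank", "river", "river"),
    ("spring", "season", "water"),
    ("spring", "season", "season"),
]
micro, macro = micro_macro_accuracy(records)
print(f"MicroAve={micro:.4f} MacroAve={macro:.4f}")
```

Because the micro-average pools instances while the macro-average weights every target word equally, a method can gain different amounts on the two measures, as with the 5.5% and 0.9% improvements reported in the abstract.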

MIAO Hai and ZHANG Yang-sen. Research of Word Sense Disambiguation Based on Combination of Rules and Statistics[J]. Computer Science, 2013, 40(12): 282-286. https://doi.org/

