Computer Science, 2017, Vol. 44, Issue (4): 252-255. doi: 10.11896/j.issn.1002-137X.2017.04.053

• Artificial Intelligence •

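Supervised WSD Method Based on Context Translation

YANG Zhi-zhuo

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61502287), the Science and Technology Innovation Project of Shanxi Provincial Universities (2015105), and the National 863 Program of China (2015AA015407).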

Abstract: To address the data sparseness problem of current supervised word sense disambiguation (WSD) methods, this paper presents a WSD method based on context translation. The method assumes that the context formed by the translations of an ambiguous word's context expresses a meaning similar to that of the original context. Under this assumption, a large amount of pseudo training data is first generated from the contexts formed by the translations. A Bayesian disambiguation model is then trained on both the authentic and the pseudo training data. Finally, the model is used to decide the sense of the ambiguous word. Experimental results show that, compared with the traditional disambiguation method, the proposed method improves disambiguation accuracy by 4.35% and outperforms the best supervised system that participated in the SemEval-2007 evaluation.

Key words: Word sense disambiguation, Data sparseness, Context expansion, Machine translation, Pseudo training data, Bayesian model
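The three steps described in the abstract (generate pseudo training data from translated contexts, train a Bayesian model on authentic plus pseudo data, then decide the sense) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: `TOY_LEXICON` is a hypothetical stand-in for a machine translation system, the bag-of-words features and add-one smoothing are a generic naive Bayes choice, and the sense labels and example words are invented. The abstract does not specify the paper's actual translation procedure, features, or model details.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for a machine translation system.
TOY_LEXICON = {
    "river": ["stream"],
    "water": ["liquid"],
    "money": ["cash"],
    "loan": ["credit"],
}


def make_pseudo_examples(examples, lexicon):
    """Build one pseudo example per labeled context from its word translations."""
    pseudo = []
    for context, sense in examples:
        translated = [t for w in context for t in lexicon.get(w, [])]
        if translated:
            # The pseudo context inherits the sense label of the original.
            pseudo.append((translated, sense))
    return pseudo


class NaiveBayesWSD:
    """Multinomial naive Bayes over context words, with add-one smoothing."""

    def fit(self, examples):
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for context, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(context)
            self.vocab.update(context)
        return self

    def predict(self, context):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            denom = sum(self.word_counts[sense].values()) + vocab_size
            score = math.log(n / total)  # log prior of the sense
            for w in context:
                score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Invented authentic training data for an ambiguous word such as "bank".
real = [
    (["river", "water"], "bank/shore"),
    (["money", "loan"], "bank/finance"),
]
pseudo = make_pseudo_examples(real, TOY_LEXICON)
model = NaiveBayesWSD().fit(real + pseudo)
```

In this toy setup, a test context made only of words absent from the authentic data, such as `["cash"]`, gives a model trained on `real` alone no evidence to separate the senses, whereas the pseudo examples supply counts for those words. This is one way to read how context expansion could ease data sparseness.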

