Computer Science, 2014, Vol. 41, Issue 2: 23-32.

CCML 2013

Review of Word Sense Induction

SUN Yu-xia, QU Wei-guang, DI Ying, ZHOU Jun-sheng

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210023, China
  • Online: 2018-11-14  Published: 2018-11-14
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61272221), the Jiangsu Provincial Social Science Foundation (12YYA002), and the National Social Science Foundation of China (11CYY030, 10CYY021).

CLC number: TP391  Document code: A

Abstract: For many natural language processing tasks, such as machine translation and information retrieval, using word senses rather than the words themselves as features yields much better performance. However, word sense disambiguation requires large sense-annotated corpora, and it also suffers from problems that hinder its application, such as the absence of some word senses from the sense inventory. Word sense induction (WSI) has therefore attracted increasing attention. This paper reviews recent work on and developments in WSI from three aspects: an overview of WSI, the related techniques, and evaluation methods. Finally, it summarizes this work and discusses future directions.

Key words: Word sense induction, Feature space, Graph, Evaluation
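The abstract frames WSI as grouping unlabeled occurrences of a word into senses and then evaluating those groups. The following is a minimal Python sketch of that pipeline under stated assumptions; it is an illustration, not a method from this paper. The four `bank` contexts, the gold labels, the stopword list, and the helper names (`context_vector`, `induce_senses`, `v_measure`) are hypothetical, and the number of senses k is assumed to be known. Average-link agglomerative clustering over bag-of-words cosine similarity stands in for the vector-space clustering techniques the survey covers, and V-measure (Rosenberg and Hirschberg, 2007), the harmonic mean of homogeneity and completeness, is one of the measures used in SemEval-2010 Task 14.

```python
"""Toy word sense induction (WSI) pipeline, for illustration only.

Clusters the contexts of an ambiguous target word into k induced senses,
then scores the clusters against gold sense labels with V-measure.
The example data, helper names, and parameters are hypothetical.
"""
import math
from collections import Counter
from itertools import combinations

# Hypothetical, hand-picked stopword list.
STOPWORDS = {"the", "a", "of", "and", "in", "with", "from", "to"}


def context_vector(sentence, target):
    """Bag-of-words vector of a context, dropping the target word and stopwords."""
    return Counter(w for w in sentence.lower().split()
                   if w != target and w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * v[w] for w, count in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def induce_senses(contexts, target, k):
    """Average-link agglomerative clustering of context vectors into k clusters.

    Returns one induced-sense id per context. Assumes k is known in advance.
    """
    vecs = [context_vector(c, target) for c in contexts]
    sim = [[cosine(a, b) for b in vecs] for a in vecs]
    clusters = [[i] for i in range(len(vecs))]
    while len(clusters) > k:
        # Find the pair of clusters with the highest average pairwise similarity.
        best_pair, best_sim = None, -1.0
        for (i, a), (j, b) in combinations(enumerate(clusters), 2):
            avg = sum(sim[x][y] for x in a for y in b) / (len(a) * len(b))
            if avg > best_sim:
                best_pair, best_sim = (i, j), avg
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into i (i < j)
        del clusters[j]
    labels = [0] * len(vecs)
    for cid, members in enumerate(clusters):
        for m in members:
            labels[m] = cid
    return labels


def entropy(counts, n):
    """Shannon entropy (natural log) of a distribution given as counts."""
    return -sum(c / n * math.log(c / n) for c in counts if c)


def v_measure(gold, pred, beta=1.0):
    """V-measure: weighted harmonic mean of homogeneity and completeness."""
    n = len(gold)
    joint = Counter(zip(gold, pred))
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    h_c = entropy(gold_counts.values(), n)
    h_k = entropy(pred_counts.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the contingency counts.
    h_c_given_k = -sum(m / n * math.log(m / pred_counts[k]) for (c, k), m in joint.items())
    h_k_given_c = -sum(m / n * math.log(m / gold_counts[c]) for (c, k), m in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)


if __name__ == "__main__":
    # Four hypothetical contexts of "bank": two financial, two riverside.
    contexts = [
        "deposit money in the bank account",
        "the bank approved the loan and the money",
        "river bank covered with grass and water",
        "fishing from the muddy bank of the river water",
    ]
    gold = ["finance", "finance", "river", "river"]
    pred = induce_senses(contexts, "bank", k=2)
    print(pred)                   # [0, 0, 1, 1]
    print(v_measure(gold, pred))  # 1.0
```

Average linkage is used here only because it is deterministic on this toy data; fixing k in advance sidesteps the harder problem of deciding how many senses a word has.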

