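Abstract: Word similarity computation is one of the key techniques in natural language processing and a widely studied fundamental problem. Traditional word similarity measures are mostly based either on semantic knowledge or on corpus statistics; these two classes of methods require, respectively, a hierarchically organized semantic dictionary and a large-scale corpus. This paper proposes a new word similarity measure based on Baidu Baike (百度百科). By analyzing the information in Baidu Baike entries, it assesses the similarity of two entries comprehensively in terms of the explanatory content that characterizes each entry, defines a formula for the similarity between entries, and obtains the overall similarity from the similarities computed between their corresponding parts. Experimental results show that, compared with existing similarity computation methods, the proposed algorithm is more effective and reasonable.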
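The idea of deriving an overall similarity from part-level similarities can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so everything below is an assumption made for illustration: the part names (`summary`, `tags`), the use of Jaccard overlap as the per-part measure, the weighted average as the combination rule, and the weights themselves are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entry_similarity(
    e1: Dict[str, Set[str]],
    e2: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of per-part similarities.

    Hypothetical combination rule: each part named in `weights` is compared
    with Jaccard overlap, and the results are averaged by weight. A part
    missing from an entry is treated as an empty token set.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = 0.0
    for part, w in weights.items():
        score += w * jaccard(e1.get(part, set()), e2.get(part, set()))
    return score / total


# Illustrative entries with made-up parts and tokens (not real Baidu Baike data).
apple = {"summary": {"fruit", "red", "sweet"}, "tags": {"food", "plant"}}
banana = {"summary": {"fruit", "yellow", "sweet"}, "tags": {"food", "plant"}}
weights = {"summary": 0.5, "tags": 0.5}

print(entry_similarity(apple, banana, weights))
```

In this sketch the weights control how much each part of an entry contributes to the overall score; how the parts are actually chosen and weighted is defined in the paper itself, not here.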