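Abstract: Word semantic similarity computation is widely used in natural language processing tasks such as word sense disambiguation, semantic information retrieval, and automatic text classification. Unlike traditional methods, this paper proposes a word semantic similarity measure based on community mining in Wikipedia. The method ignores the text content of word pages. Instead, it uses Wikipedia's large network of category-labeled word pages and applies HITS, a topic-based community discovery algorithm, to that network to obtain the community of each word page. Given these communities, the semantic similarity between two words is assessed from three aspects: (1) the semantic relation between the word pages; (2) the semantic relation between the word pages' communities; (3) the semantic relation between the categories to which those communities belong. Experiments on the standard WordSimilarity-353 dataset show that the algorithm is feasible and slightly outperforms several classic algorithms; in the best case, its Spearman correlation coefficient reaches 0.58.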
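To make the HITS step concrete, the following is a minimal sketch of the standard hub/authority iteration on a toy link graph. It is not the paper's implementation: the page names, the function names `hits` and `normalize`, and the iteration count are illustrative assumptions, and the abstract does not say how communities are derived from the resulting scores.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Run the standard HITS iteration.

    links maps each page to the list of pages it links to.
    Returns (hub_scores, authority_scores) as dictionaries.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link in.
        new_auths = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auths[dst] += hubs[src]
        # Hub update: sum the authority scores of the pages linked to.
        new_hubs = {
            p: sum(new_auths[dst] for dst in links.get(p, []))
            for p in pages
        }
        auths = normalize(new_auths)
        hubs = normalize(new_hubs)

    return hubs, auths


# Hypothetical toy page network (not taken from the paper's data).
toy_links = {
    "Tiger": ["Cat", "Mammal"],
    "Lion": ["Cat", "Mammal"],
    "Cat": ["Mammal"],
}

hubs, auths = hits(toy_links)
top_authority = max(auths, key=auths.get)
print("Top authority page:", top_authority)
```

In this toy graph the page with the most incoming links from strong hubs receives the highest authority score. A community-mining step would then have to group pages around such authorities; that grouping is left out here because the abstract does not describe it.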