Computer Science, 2016, Vol. 43, Issue (4): 45-49. doi: 10.11896/j.issn.1002-137X.2016.04.009


Semantic Similarity Computing Based on Community Mining of Wikipedia

PENG Li-zhen and WU Yang-yang   

Online: 2018-12-01  Published: 2018-12-01

Abstract: Word semantic similarity computation is widely used in natural language processing tasks such as word sense disambiguation, information retrieval and automatic text categorization. Unlike traditional methods, we present an algorithm that computes word semantic similarity through community mining of Wikipedia. Our method uses the large Wikipedia page network together with its category labels, rather than the textual content of the pages. To obtain the community of a word page, we apply HITS, a topic-based community discovery algorithm, to the page network. Based on the resulting communities, we measure the semantic similarity between two words from three aspects: (1) the semantic relations between the two word pages, (2) the semantic relations between the communities of the two word pages, and (3) the semantic relations between the categories to which the two communities belong. Tests on the standard WordSimilarity-353 dataset show that the proposed method is feasible and performs slightly better than some classic algorithms. In the best case, the Spearman correlation coefficient reaches 0.58.
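The community-discovery step can be illustrated with a minimal sketch of Kleinberg's HITS iteration. It assumes the page network is a dict mapping each page title to the titles it links to, builds a base set from a word page plus its in-links and out-links, and takes the top-k authority pages as that page's community. The helper names `hits` and `community`, the toy link graph and the top-k rule are illustrative assumptions, not the authors' implementation.

```python
import math

def hits(graph, iterations=50):
    """Kleinberg's HITS on a dict mapping page -> list of linked pages.

    Returns (hub, authority) score dicts, each L2-normalised.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a page = sum of hub scores of the pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub score of a page = sum of authority scores of the pages it links to.
        new_hub = {n: sum(auth[d] for d in graph.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth

def community(graph, seed, k=3):
    """Hypothetical community rule: top-k authorities in the seed's neighbourhood."""
    # Base set: the word page, the pages it links to, and the pages linking to it.
    base = {seed} | set(graph.get(seed, ()))
    base |= {src for src, targets in graph.items() if seed in targets}
    # Restrict the link graph to the base set before running HITS.
    sub = {n: [d for d in graph.get(n, ()) if d in base] for n in base}
    _, auth = hits(sub)
    return sorted(base, key=lambda n: (-auth[n], n))[:k]

# Toy link graph with made-up page titles (not real Wikipedia data).
graph = {
    "Car": ["Vehicle", "Engine", "Wheel"],
    "Automobile": ["Car", "Vehicle", "Engine"],
    "Truck": ["Vehicle", "Engine", "Car"],
    "Engine": ["Vehicle"],
    "Wheel": ["Vehicle"],
    "Bicycle": ["Wheel", "Vehicle"],
}
print(community(graph, "Car"))  # → ['Vehicle', 'Engine', 'Car']
```

Here "Bicycle" falls outside the base set of "Car" because neither page links to the other, so it can never enter that community. How the paper actually forms the root and base sets, and how many pages a community keeps, is not stated in the abstract.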

Key words: Semantic similarity, Community discovery, Wikipedia
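The reported result is Spearman's rank correlation between a method's similarity scores and the human ratings in WordSimilarity-353. A minimal sketch of that coefficient follows, computed as the Pearson correlation of average ranks so that tied scores are handled. The function names and the four sample score pairs are hypothetical and are not taken from the paper's experiments.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Made-up human ratings (0-10 scale) and system scores for four word pairs.
human = [9.0, 7.5, 3.1, 1.2]
system = [0.8, 0.6, 0.7, 0.1]
print(round(spearman(human, system), 3))  # → 0.8
```

Only the ordering of the scores matters, so a similarity measure on any scale can be compared directly with the 0-10 human ratings. If either list is constant, its rank variance is zero and the division raises `ZeroDivisionError`, which a real evaluation script would need to guard against.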
