Computer Science ›› 2013, Vol. 40 ›› Issue (6): 199-202.


Word Similarity Measurement Based on BaiduBaike

ZHAN Zhi-jian, LIANG Li-na and YANG Xiao-ping

  • Online:2018-11-16 Published:2018-11-16

Abstract: Word similarity measurement is an active research topic in natural language processing and in other areas of basic research. Traditional word similarity measures rely on semantic lexicons or large-scale corpora. We first discuss the applications of word similarity measurement, such as information retrieval, information extraction, text classification and example-based machine translation. We then summarize the two main strategies for measuring word similarity: one is based on an ontology or a semantic taxonomy, the other on word collocations in large corpora. BaiduBaike, an open online encyclopedia, can serve not only as a corpus but also as a knowledge resource with rich semantic information. Based on this semantic information and on the BaiduBaike category graph, we propose a new method that analyzes and computes Chinese word similarity along four dimensions: the baike card, the content of the word's entry, the open classification of the word, and its correlated words. We use a language network to select the top key terms from the entry content. Based on vector space model (VSM) theory, we calculate the similarity for each of these parts. We present a new "multi-path searching" algorithm on the BaiduBaike category graph. Finally, we propose a comprehensive similarity measure that combines the four parts. Experimental results show that the method performs well.

Key words: Word similarity, Language network, BaiduBaike, VSM
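
The vector space model step and the final combination mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the term-weighting scheme nor the weights of the four dimensions, so the raw term counts, the function names `cosine_similarity` and `combined_similarity`, the dimension labels and the equal weights below are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def combined_similarity(parts, weights):
    """Weighted average of per-dimension scores (weights are hypothetical)."""
    total = sum(weights[name] for name in parts)
    return sum(weights[name] * score for name, score in parts.items()) / total


# Hypothetical key terms for two entries; raw counts stand in for term weights.
terms_a = Counter(["fruit", "tree", "red"])
terms_b = Counter(["fruit", "tree", "yellow"])
content_score = cosine_similarity(terms_a, terms_b)

# Hypothetical scores for the other three dimensions, with equal weights.
parts = {"card": 1.0, "content": content_score, "category": 0.0, "related": 0.5}
weights = {"card": 0.25, "content": 0.25, "category": 0.25, "related": 0.25}
overall = combined_similarity(parts, weights)
```

A sparse dictionary representation is used because each entry contributes only a handful of key terms, so most vector components would be zero in a dense layout.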

