Computer Science, 2017, Vol. 44, Issue (Z11): 422-427. doi: 10.11896/j.issn.1002-137X.2017.11A.090


Text Similarity Calculation Based on Semantic Dictionary and Word Frequency Information

DONG Yuan and QIAN Li-ping   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: To address the shortcomings of existing methods in semantic understanding and in handling frequently occurring words, this paper proposes a text similarity algorithm based on a semantic dictionary and word frequency information, referred to as TSSDWFI. The algorithm evaluates the similarity between two texts by computing the expanded similarity between every pair of words in the texts and the maximum similarity matching between the words of the two texts. It uses a semantic dictionary to compute similarity between texts, and it takes into account both the similarity relationships between different words and the frequency with which each word appears in the text. Simulation results show that TSSDWFI achieves higher accuracy than existing algorithms.

Key words: Text mining, Text similarity, Semantic dictionary, Keywords, Word frequency
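
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea it describes (word-level similarity from a dictionary, best matches between the words of two texts, weighting by word frequency), not the TSSDWFI algorithm itself. The names `TOY_SIMILARITY`, `word_similarity`, `directed_similarity`, and `text_similarity` are hypothetical, the similarity scores are made-up placeholder values rather than entries from a real semantic dictionary, and each word is simply paired with its best-scoring counterpart, which is a simplification of a one-to-one maximum matching.

```python
from collections import Counter

# Hypothetical toy "semantic dictionary": symmetric word-pair scores in [0, 1].
# The values are placeholders, not taken from any real resource.
TOY_SIMILARITY = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("fast", "quick")): 0.8,
}


def word_similarity(a, b):
    """Return 1.0 for identical words, else the toy dictionary score (0.0 if absent)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def directed_similarity(src, dst):
    """Frequency-weighted average of each src word's best match among dst words."""
    counts = Counter(src)
    total = sum(counts.values())
    if total == 0 or not dst:
        return 0.0
    vocab = set(dst)
    weighted = sum(
        n * max(word_similarity(w, v) for v in vocab)
        for w, n in counts.items()
    )
    return weighted / total


def text_similarity(a, b):
    """Symmetric score: mean of the two directed similarities."""
    return 0.5 * (directed_similarity(a, b) + directed_similarity(b, a))


if __name__ == "__main__":
    t1 = ["fast", "car", "car"]
    t2 = ["quick", "automobile"]
    print(round(text_similarity(t1, t2), 4))
```

Here a word that occurs more often contributes proportionally more to the score, which is one plausible way to use frequency information; the paper's actual weighting may differ.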

