Computer Science, 2016, Vol. 43, Issue (8): 95-99. doi: 10.11896/j.issn.1002-137X.2016.08.020
李志华,陈超群,李村,胡振宇,张华伟
LI Zhi-hua, CHEN Chao-qun, LI Cun, HU Zhen-yu and ZHANG Hua-wei
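Abstract: To address the problem of measuring similarity between ciphertexts, a new similarity measurement method for encrypted text is proposed. By defining the concepts of a keyword's effective scope, relative scope, and dispersion domain, the method overcomes a shortcoming of existing keyword weighting methods, which cannot weight keywords reasonably fairly across documents of different length and structure. It also significantly reduces the number of keywords that the measurement depends on. Keywords are then re-extracted from each document, a ciphertext index entry is built from the document's keywords, and the similarity of ciphertexts is measured through these index entries. Experiments on real documents, with comparisons against other algorithms, show that the proposed method outperforms the compared algorithms in both precision and recall and achieves a good balance between the two.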
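The idea of comparing documents through keyword index entries, without decrypting the documents themselves, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the definitions of effective scope, relative scope, or dispersion domain, nor the cipher used for the index. The sketch assumes that keywords have already been extracted, swaps in HMAC-SHA256 as a stand-in for the keyword encryption, and uses plain Jaccard overlap as the similarity score. The names `index_entries` and `jaccard` and the demo key are hypothetical.

```python
import hashlib
import hmac


def index_entries(keywords, key):
    """Map each keyword to a deterministic keyed token.

    Assumption: HMAC-SHA256 stands in for whatever encryption the
    paper applies to its keyword index entries.
    """
    return {
        hmac.new(key, kw.encode("utf-8"), hashlib.sha256).hexdigest()
        for kw in keywords
    }


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical key and keyword lists, for illustration only.
key = b"demo-key-not-for-production"
doc_a = index_entries(["ciphertext", "similarity", "keyword", "index"], key)
doc_b = index_entries(["ciphertext", "similarity", "cloud"], key)

# Only the keyed tokens are compared; the plaintext keywords are not needed.
score = jaccard(doc_a, doc_b)
```

Because the tokens are deterministic under one key, equal keywords yield equal tokens, so set overlap on the index reflects keyword overlap in the plaintext. A weighted variant would need the keyword weights that the paper derives from its scope definitions, which are not reproduced here.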