Computer Science, 2016, Vol. 43, Issue (8): 95-99. doi: 10.11896/j.issn.1002-137X.2016.08.020


Similarity Measure Algorithm of Cipher-text Based on Re-extracted Keywords

LI Zhi-hua, CHEN Chao-qun, LI Cun, HU Zhen-yu and ZHANG Hua-wei   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: To measure the similarity or dissimilarity between cipher texts, a new similarity measure algorithm of cipher-text based on re-extracted keywords, called SMCTBRK, is proposed. By defining the new concepts of effective scope, relative scope and distributed scope of keywords, and by re-extracting the keywords in documents, SMCTBRK constructs encryption index items for the compared documents from a small number of re-extracted keywords. The encryption index items are organized as a feature vector. SMCTBRK then computes the similarity between different cipher texts from the encryption index items instead of from separate keywords. Experiments on real documents show that SMCTBRK is more promising than the Shingling algorithm and the Simhash algorithm in terms of accuracy and recall ratio.
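The general idea of comparing documents through encrypted index items rather than plaintext keywords can be illustrated with a minimal Python sketch. It is not the SMCTBRK algorithm itself: the abstract does not define effective, relative or distributed scope, so keyword re-extraction is replaced here by plain term frequency, HMAC-SHA256 stands in for the unspecified encryption of index items, and cosine similarity stands in for the unspecified similarity measure. The function names `top_keywords`, `encrypted_index` and `cosine_similarity` are hypothetical.

```python
import hashlib
import hmac
import math
import re
from collections import Counter


def top_keywords(text, k=5):
    """Pick the k most frequent words longer than 3 letters.

    Assumption: a plain frequency count stands in for the paper's
    keyword re-extraction, whose scope definitions are not given.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 3)
    return dict(counts.most_common(k))


def encrypted_index(text, key, k=5):
    """Build a feature vector whose entries are keyed hashes of keywords.

    Assumption: HMAC-SHA256 stands in for the paper's encryption of
    index items. The plaintext keyword never appears in the result.
    """
    return {
        hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest(): count
        for word, count in top_keywords(text, k).items()
    }


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    key = b"shared-secret-key"  # hypothetical key shared by both documents
    doc_a = encrypted_index("cloud storage cloud search encrypted index", key)
    doc_b = encrypted_index("encrypted cloud index for keyword search", key)
    print(cosine_similarity(doc_a, doc_b))
```

Because both documents are indexed under the same key, equal keywords map to equal hashes, so the comparison can be carried out on the index items alone. Indexes built under different keys are not comparable in this sketch.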

Affiliation: Department of Computer Science, School of IOT Engineering, Jiangnan University, Wuxi 214122, China

