Computer Science ›› 2015, Vol. 42 ›› Issue (10): 132-137.

Sensitive Information Inference Method Based on Semi-supervised Document Clustering

SU Ying-bin, DU Xue-hui, XIA Chun-tao, CAO Li-feng and CHEN Hua-cheng   

  • Online:2018-11-14 Published:2018-11-14

Abstract: Sensitive information leaked through clustering of, and inference over, multiple documents is both high-risk and hard to detect. To address this problem, a sensitive information inference method based on semi-supervised document clustering is proposed. First, a new second-order constraint active learning algorithm is designed; by selecting the most uncertain and most informative data, it obtains high-quality constraints in less time. Then, a new semi-supervised clustering algorithm that combines constraints with DBSCAN is proposed; it effectively resolves the fuzzy cluster boundaries of DBSCAN and improves the precision of document clustering. Finally, the possibility measure of sensitive information over similar documents is calculated from the semi-supervised clustering results. Experiments show that the precision of semi-supervised clustering improves significantly and that the inference method can infer sensitive information effectively.

Key words: Semi-supervised clustering, DBSCAN, Active learning, Sensitive information, Fuzzy math, Inference method
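
The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the general idea of combining pairwise constraints with DBSCAN; it is not the authors' method. It assumes that constraints are must-link / cannot-link pairs of point indices, that cannot-link pairs block cluster expansion, and that must-link pairs are reconciled in a post-pass. The function name `constrained_dbscan` and all of its parameters are hypothetical.

```python
from math import dist  # Python 3.8+

NOISE = -1


def constrained_dbscan(points, eps, min_pts, must_link=(), cannot_link=()):
    """Plain DBSCAN plus pairwise constraints (illustrative assumption only).

    A point is not added to a cluster that already contains one of its
    cannot-link partners; must-link pairs are reconciled afterwards.
    """
    n = len(points)
    forbidden = {i: set() for i in range(n)}
    for a, b in cannot_link:
        forbidden[a].add(b)
        forbidden[b].add(a)

    def neighbors(i):
        # Brute-force eps-neighborhood; includes the point itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    labels = [None] * n   # None means "not visited yet"
    members = {}          # cluster id -> set of point indices
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        members[cluster] = {i}
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] not in (None, NOISE):
                continue  # already belongs to some cluster
            if forbidden[j] & members[cluster]:
                continue  # a cannot-link constraint blocks this point
            was_unvisited = labels[j] is None
            labels[j] = cluster
            members[cluster].add(j)
            if was_unvisited:
                nb = neighbors(j)
                if len(nb) >= min_pts:  # j is a core point: keep expanding
                    queue.extend(nb)
        cluster += 1

    # Post-pass: honor must-link pairs where no cannot-link is violated.
    for a, b in must_link:
        la, lb = labels[a], labels[b]
        if la == lb:
            continue
        if la == NOISE or lb == NOISE:
            p, c = (a, lb) if la == NOISE else (b, la)
            if not forbidden[p] & members[c]:
                labels[p] = c
                members[c].add(p)
        elif not any(forbidden[x] & members[la] for x in members[lb]):
            for x in members.pop(lb):  # merge cluster lb into cluster la
                labels[x] = la
                members[la].add(x)
    return labels
```

In this sketch, a chain of evenly spaced points that plain DBSCAN would merge into one cluster can be split by a single cannot-link pair placed at the ambiguous boundary, which is the kind of boundary ambiguity that constraints are meant to resolve.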

