Computer Science ›› 2019, Vol. 46 ›› Issue (2): 120-126.doi: 10.11896/j.issn.1002-137X.2019.02.019

• Information Security • Previous Articles     Next Articles

Clustering Algorithm in Differential Privacy Preserving

HU Chuang1,2, YANG Geng1,2, BAI Yun-lu1,3   

  1. College of Computer Science,Nanjing University of Posts and Telecommunications,Nanjing 210003,China1
    Jiangsu Key Laboratory of Big Data Security & Intelligent Processing,Nanjing 210023,China2
    College of Information Technology,Nanjing University of Chinese Medicine,Nanjing 210023,China3
  • Received:2018-01-29 Online:2019-02-25 Published:2019-02-25

Abstract: Data mining has made great progress in the field of research and application of big data,but sensitive information disclosure could bring users many threats and losses.Therefore,how to protect data privacy in clustering analysis has become a hot issue in data mining and data privacy protection.Traditional differential privacy k-means is sensitive to the selection of its initial centers,and it has a certain blindness in the selection of cluster number k,which reduces the availability of clustering results.To improve the availability of clustering results of differential privacy k-means clustering,this paper presented a new DPk-means-up clustering algorithm based on differential privacy and carried out theoretical analysis and comparison experiment.Theoretical analysis shows that the algorithm satisfies ε-differential privacy,and can be applied to data sets with different sizes and dimensions.In addition,experimental results indicate that the proposed algorithm improves clustering availability than other differential privacy k-means clustering methods at the same level of privacy preserve.

Key words: Clustering algorithms, Differential privacy, k-means, Privacy preserving

CLC Number: 

  • TP309
[1]MADDEN S,FRANKLIN M J,HELLERSTEIN J M.TAG:a tiny aggregation service for ad-hoc sensor networks[C]∥Proceedings of the 5th Symposium on Operating Systems Design and Implementation.New York,USA,2002:131-146.
[2]MACHANAVAJJHALA A,GEHRKE J,KIFER D,et al.l-diversity:Privacy beyond k-anonymity[C]∥Proceedings of the 22nd International Conference on Data Engineering.IEEE,2006:24-24.
[3]GANTA S R,KASIVISWANATHAN S P,SMITH A.Composition attacks and auxiliary information in data privacy[C]∥Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2008:265-273.
[4]BLUM A,DWORK C,MCSHERRY F,et al.Practical privacy:the SuLQ framework[C]∥Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.ACM,2005:128-138.
[5]LI Y,HAO Z,WEN W,et al.Research on differential privacy preserving k-means clustering[J].Computer Science,2013,40(3):287-290.
[6]YU Q,LUO Y,CHEN C,et al.Outlier-eliminated k-means clustering algorithm based on differential privacy preservation[J].Applied Intelligence,2016,45(4):1179-1191.
[7]REN J,XIONG J,YAO Z,et al.DPLK-means:a novel differential privacy k-means mechanism[C]∥2017 IEEE Second International Conference on Data Science in Cyberspace(DSC).IEEE,2017:133-139.
[8]DWORK C.Differential privacy[C]∥Proceedings of the 33rd International Conference on Automata,Languages and Programming-Volume Part II.Springer,Berlin,Heidelberg,2006:1-19.
[9]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]∥Theory of Cryptography Conference(TCC).2006:265-284.
[10]DWORK C,LEI J.Differential privacy and robust statistics[C]∥ Proceedings of the 41st annual ACM Symposium on Theory of Computing.ACM,2009:371-380.
[11]DWORK C.A firm foundation for private data analysis[J]. Communications of the ACM,2011,54(1):86-95.
[12]DWORK C.Differential privacy[M]∥Encyclopedia of Cryptography and Security.Springer US,2011:338-340.
[13]LI N,QARDAJI W,SU D.Provably Private Data Anonymiza- tion:Or,k-Anonymity Meets Differential Privacy[J/OL].Corr,2010,abs/1101.2604:32-33.
[14]LI C,HAY M,RASTOGI V,et al.Optimizing linear counting queries under differential privacy[C]∥Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.ACM,2010:123-134.
[15]XIAO X,WANG G,GEHRKE J.Differential privacy via wavelet transforms[J].IEEE Transactions on Knowledge and Data Engineering,2011,23(8):1200-1214.
[16]HAY M,RASTOGI V,MIKLAU G,et al.Boosting the accura- cy of differentially private histograms through consistency[J].Proceedings of the VLDB Endowment,2010,3(1-2):1021-1032.
[17]HARDT M,TALWAR K.On the geometry of differential privacy[C]∥Proceedings of the 42nd ACM Symposium on Theory of Computing.ACM,2010:705-714.
[18]HARDT M,ROTHBLUM G N.Amultiplicative weights mecha- nism for privacy-preserving data analysis[C]∥Proceedings of the 51st Annual IEEE Symposium.IEEE,2010:61-70.
[19]NISSIM K,RASKHODNIKOVA S,SMITH A.Smooth sensitivity and sampling in private data analysis[C]∥Proceedings of the 39th annual ACM Symposium on Theory of Computing.ACM,2007:75-84.
[20]SU D,CAO J,LI N,et al.Differentially private k-means clustering and a hybrid approach to private optimization[J].ACM Transactions on Privacy and Security (TOPS),2017,20(4):1-33.
[21]DWORK C.Differential privacy:A survey of results[C]∥International Conference on Theory and Applications of Models of Computation.Springer,Berlin,Heidelberg,2008:1-19.
[22]MCSHERRY F D.Privacy integrated queries:an extensible platform for privacy-preserving data analysis[C]∥Proceedings of the 2009 ACM SIGMOD International Conference on Management of data.ACM,2009:19-30.
[23]ARTHUR D,VASSILVITSKII S.k-means++:The advantages of careful seeding[C]∥Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms.Society for Industrial and Applied Mathematics,2007:1027-1035.
[24]CHEN R,ACS G,CASTELLUCCIA C.Differentially private sequential data publication via variable-length n-grams[C]∥Proceedings of the 2012 ACM Conference on Computer and Communications Security.ACM,2012:638-649.
[25]DWORK C.Differential privacy in new settings[C]∥Procee- dings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms.Society for Industrial and Applied Mathema-tics,2010:174-183.
[26]FRÄNTI P.Clustering datasets[OL].http://cs.joensuu.fi/si-pu/datasets.
[27]NGUYEN H H,KIM J,KIM Y.Differential privacy in practice [J].Journal of Computing Science and Engineering,2013,7(3):177-186.
[28]JIANG H,YI S,LI J,et al.Ant clustering algorithm with K- harmonic means clustering[J].Expert Systems with Applications,2010,37(12):8679-8684.
[29]FANG Y J,ZHU J Z,ZHOU W,et al.A survey on data mining privacy protection algorithms[J].Netinfo Security,2017(2):6-11.(in Chinese)
方跃坚,朱锦钟,周文,等.数据挖掘隐私保护算法研究综述[J].信息网络安全,2017(2):6-11.
[30]ZHANG F X,JIANG C H.Research on query on privacy anonymity algorithm based on grid clustering[J].Netinfo Security,2015(8):53-58.(in Chinese)
张付霞,蒋朝惠.一种基于网格聚类的查询隐私匿名算法研究[J].信息网络安全,2015(8):53-58.
[1] TANG Ling-tao, WANG Di, ZHANG Lu-fei, LIU Sheng-yun. Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy [J]. Computer Science, 2022, 49(9): 297-305.
[2] HUANG Jue, ZHOU Chun-lai. Frequency Feature Extraction Based on Localized Differential Privacy [J]. Computer Science, 2022, 49(7): 350-356.
[3] WANG Jian. Back-propagation Neural Network Learning Algorithm Based on Privacy Preserving [J]. Computer Science, 2022, 49(6A): 575-580.
[4] WANG Mei-shan, YAO Lan, GAO Fu-xiang, XU Jun-can. Study on Differential Privacy Protection for Medical Set-Valued Data [J]. Computer Science, 2022, 49(4): 362-368.
[5] LYU You, WU Wen-yuan. Linear System Solving Scheme Based on Homomorphic Encryption [J]. Computer Science, 2022, 49(3): 338-345.
[6] KONG Yu-ting, TAN Fu-xiang, ZHAO Xin, ZHANG Zheng-hang, BAI Lu, QIAN Yu-rong. Review of K-means Algorithm Optimization Based on Differential Privacy [J]. Computer Science, 2022, 49(2): 162-173.
[7] DONG Xiao-mei, WANG Rui, ZOU Xin-kai. Survey on Privacy Protection Solutions for Recommended Applications [J]. Computer Science, 2021, 48(9): 21-35.
[8] SUN Lin, PING Guo-lou, YE Xiao-jun. Correlation Analysis for Key-Value Data with Local Differential Privacy [J]. Computer Science, 2021, 48(8): 278-283.
[9] ZHANG Xue-jun, YANG Hao-ying, LI Zhen, HE Fu-cun, GAI Ji-yang, BAO Jun-da. Differentially Private Location Privacy-preserving Scheme withSemantic Location [J]. Computer Science, 2021, 48(8): 300-308.
[10] CHEN Tian-rong, LING Jie. Differential Privacy Protection Machine Learning Method Based on Features Mapping [J]. Computer Science, 2021, 48(7): 33-39.
[11] WANG Le-ye. Geographic Local Differential Privacy in Crowdsensing:Current States and Future Opportunities [J]. Computer Science, 2021, 48(6): 301-305.
[12] PENG Chun-chun, CHEN Yan-li, XUN Yan-mei. k-modes Clustering Guaranteeing Local Differential Privacy [J]. Computer Science, 2021, 48(2): 105-113.
[13] XU Shou-kun, NI Chu-han, JI Chen-chen, LI Ning. Image Caption of Safety Helmets Wearing in Construction Scene Based on YOLOv3 [J]. Computer Science, 2020, 47(8): 233-240.
[14] WANG Mao-ni, PENG Chang-gen, HE Wen-zhu, DING Xing, DING Hong-fa. Privacy Metric Model of Differential Privacy via Graph Theory and Mutual Information [J]. Computer Science, 2020, 47(4): 270-277.
[15] ZHONG Ya,GUO Yuan-bo,LIU Chun-hui,LI Tao. User Attributes Profiling Method and Application in Insider Threat Detection [J]. Computer Science, 2020, 47(3): 292-297.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!