计算机科学 ›› 2019, Vol. 46 ›› Issue (2): 120-126.doi: 10.11896/j.issn.1002-137X.2019.02.019

• 信息安全 • 上一篇    下一篇

面向差分隐私保护的聚类算法

胡闯1,2, 杨庚1,2, 白云璐1,3   

  1. 南京邮电大学计算机学院 南京2100031
    江苏省大数据安全与智能处理重点实验室 南京 2100232
    南京中医药大学信息技术学院 南京2100233
  • 收稿日期:2018-01-29 出版日期:2019-02-25 发布日期:2019-02-25
  • 通讯作者: 杨 庚(1961-),男,博士,教授,主要研究领域为网络与信息安全、分布式与并行计算、大数据隐私保护,E-mail:yangg@njupt.edu.cn
  • 作者简介:胡 闯(1994-),男,硕士生,主要研究领域为差分隐私保护;白云璐(1988-),女,博士生,主要研究领域为信息安全与隐私保护。
  • 基金资助:
    本文受国家自然科学基金项目(61572263),江苏省自然科学基金政策引导类计划——前瞻性联合研究项目(2016ZS04)资助。

Clustering Algorithm in Differential Privacy Preserving

HU Chuang1,2, YANG Geng1,2, BAI Yun-lu1,3   

  1. College of Computer Science,Nanjing University of Posts and Telecommunications,Nanjing 210003,China1
    Jiangsu Key Laboratory of Big Data Security & Intelligent Processing,Nanjing 210023,China2
    College of Information Technology,Nanjing University of Chinese Medicine,Nanjing 210023,China3
  • Received:2018-01-29 Online:2019-02-25 Published:2019-02-25

摘要: 大数据时代的数据挖掘技术在研究和应用等领域取得了较大发展,但大量敏感信息披露给用户带来了众多威胁和损失。因此,在聚类分析过程中如何保护数据隐私成为数据挖掘和数据隐私保护领域的热点问题。传统差分隐私保护k-means算法对其初始中心点的选择较为敏感,而且在聚簇个数k值的选择上存在一定的盲目性,降低了聚类结果的可用性。为了进一步提高差分隐私k-means聚类方法聚类结果的可用性,研究并提出一种新的基于差分隐私的DPk-means-up聚类算法,同时进行了理论分析和比较实验。理论分析表明,该算法满足ε-差分隐私,可适用于不同规模和不同维度的数据集。此外,实验结果表明,在相同隐私保护级别下,与其他差分隐私k-means聚类方法相比,所提算法有效提高了聚类的可用性。

关键词: k-均值, 差分隐私, 聚类算法, 隐私保护

Abstract: Data mining has made great progress in the field of research and application of big data,but sensitive information disclosure could bring users many threats and losses.Therefore,how to protect data privacy in clustering analysis has become a hot issue in data mining and data privacy protection.Traditional differential privacy k-means is sensitive to the selection of its initial centers,and it has a certain blindness in the selection of cluster number k,which reduces the availability of clustering results.To improve the availability of clustering results of differential privacy k-means clustering,this paper presented a new DPk-means-up clustering algorithm based on differential privacy and carried out theoretical analysis and comparison experiment.Theoretical analysis shows that the algorithm satisfies ε-differential privacy,and can be applied to data sets with different sizes and dimensions.In addition,experimental results indicate that the proposed algorithm improves clustering availability than other differential privacy k-means clustering methods at the same level of privacy preserve.

Key words: Clustering algorithms, Differential privacy, k-means, Privacy preserving

中图分类号: 

  • TP309
[1]MADDEN S,FRANKLIN M J,HELLERSTEIN J M.TAG:a tiny aggregation service for ad-hoc sensor networks[C]∥Proceedings of the 5th Symposium on Operating Systems Design and Implementation.New York,USA,2002:131-146.
[2]MACHANAVAJJHALA A,GEHRKE J,KIFER D,et al.l-diversity:Privacy beyond k-anonymity[C]∥Proceedings of the 22nd International Conference on Data Engineering.IEEE,2006:24-24.
[3]GANTA S R,KASIVISWANATHAN S P,SMITH A.Composition attacks and auxiliary information in data privacy[C]∥Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2008:265-273.
[4]BLUM A,DWORK C,MCSHERRY F,et al.Practical privacy:the SuLQ framework[C]∥Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.ACM,2005:128-138.
[5]LI Y,HAO Z,WEN W,et al.Research on differential privacy preserving k-means clustering[J].Computer Science,2013,40(3):287-290.
[6]YU Q,LUO Y,CHEN C,et al.Outlier-eliminated k-means clustering algorithm based on differential privacy preservation[J].Applied Intelligence,2016,45(4):1179-1191.
[7]REN J,XIONG J,YAO Z,et al.DPLK-means:a novel differential privacy k-means mechanism[C]∥2017 IEEE Second International Conference on Data Science in Cyberspace(DSC).IEEE,2017:133-139.
[8]DWORK C.Differential privacy[C]∥Proceedings of the 33rd International Conference on Automata,Languages and Programming-Volume Part II.Springer,Berlin,Heidelberg,2006:1-19.
[9]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]∥Theory of Cryptography Conference(TCC).2006:265-284.
[10]DWORK C,LEI J.Differential privacy and robust statistics[C]∥ Proceedings of the 41st annual ACM Symposium on Theory of Computing.ACM,2009:371-380.
[11]DWORK C.A firm foundation for private data analysis[J]. Communications of the ACM,2011,54(1):86-95.
[12]DWORK C.Differential privacy[M]∥Encyclopedia of Cryptography and Security.Springer US,2011:338-340.
[13]LI N,QARDAJI W,SU D.Provably Private Data Anonymiza- tion:Or,k-Anonymity Meets Differential Privacy[J/OL].Corr,2010,abs/1101.2604:32-33.
[14]LI C,HAY M,RASTOGI V,et al.Optimizing linear counting queries under differential privacy[C]∥Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.ACM,2010:123-134.
[15]XIAO X,WANG G,GEHRKE J.Differential privacy via wavelet transforms[J].IEEE Transactions on Knowledge and Data Engineering,2011,23(8):1200-1214.
[16]HAY M,RASTOGI V,MIKLAU G,et al.Boosting the accura- cy of differentially private histograms through consistency[J].Proceedings of the VLDB Endowment,2010,3(1-2):1021-1032.
[17]HARDT M,TALWAR K.On the geometry of differential privacy[C]∥Proceedings of the 42nd ACM Symposium on Theory of Computing.ACM,2010:705-714.
[18]HARDT M,ROTHBLUM G N.Amultiplicative weights mecha- nism for privacy-preserving data analysis[C]∥Proceedings of the 51st Annual IEEE Symposium.IEEE,2010:61-70.
[19]NISSIM K,RASKHODNIKOVA S,SMITH A.Smooth sensitivity and sampling in private data analysis[C]∥Proceedings of the 39th annual ACM Symposium on Theory of Computing.ACM,2007:75-84.
[20]SU D,CAO J,LI N,et al.Differentially private k-means clustering and a hybrid approach to private optimization[J].ACM Transactions on Privacy and Security (TOPS),2017,20(4):1-33.
[21]DWORK C.Differential privacy:A survey of results[C]∥International Conference on Theory and Applications of Models of Computation.Springer,Berlin,Heidelberg,2008:1-19.
[22]MCSHERRY F D.Privacy integrated queries:an extensible platform for privacy-preserving data analysis[C]∥Proceedings of the 2009 ACM SIGMOD International Conference on Management of data.ACM,2009:19-30.
[23]ARTHUR D,VASSILVITSKII S.k-means++:The advantages of careful seeding[C]∥Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms.Society for Industrial and Applied Mathematics,2007:1027-1035.
[24]CHEN R,ACS G,CASTELLUCCIA C.Differentially private sequential data publication via variable-length n-grams[C]∥Proceedings of the 2012 ACM Conference on Computer and Communications Security.ACM,2012:638-649.
[25]DWORK C.Differential privacy in new settings[C]∥Procee- dings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms.Society for Industrial and Applied Mathema-tics,2010:174-183.
[26]FRÄNTI P.Clustering datasets[OL].http://cs.joensuu.fi/si-pu/datasets.
[27]NGUYEN H H,KIM J,KIM Y.Differential privacy in practice [J].Journal of Computing Science and Engineering,2013,7(3):177-186.
[28]JIANG H,YI S,LI J,et al.Ant clustering algorithm with K- harmonic means clustering[J].Expert Systems with Applications,2010,37(12):8679-8684.
[29]FANG Y J,ZHU J Z,ZHOU W,et al.A survey on data mining privacy protection algorithms[J].Netinfo Security,2017(2):6-11.(in Chinese)
方跃坚,朱锦钟,周文,等.数据挖掘隐私保护算法研究综述[J].信息网络安全,2017(2):6-11.
[30]ZHANG F X,JIANG C H.Research on query on privacy anonymity algorithm based on grid clustering[J].Netinfo Security,2015(8):53-58.(in Chinese)
张付霞,蒋朝惠.一种基于网格聚类的查询隐私匿名算法研究[J].信息网络安全,2015(8):53-58.
[1] 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩.
基于分层抽样优化的面向异构客户端的联邦学习
Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients
计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263
[2] 汤凌韬, 王迪, 张鲁飞, 刘盛云.
基于安全多方计算和差分隐私的联邦学习方案
Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy
计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108
[3] 吕由, 吴文渊.
隐私保护线性回归方案与应用
Privacy-preserving Linear Regression Scheme and Its Application
计算机科学, 2022, 49(9): 318-325. https://doi.org/10.11896/jsjkx.220300190
[4] 柴慧敏, 张勇, 方敏.
基于特征相似度聚类的空中目标分群方法
Aerial Target Grouping Method Based on Feature Similarity Clustering
计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203
[5] 黄觉, 周春来.
基于本地化差分隐私的频率特征提取
Frequency Feature Extraction Based on Localized Differential Privacy
计算机科学, 2022, 49(7): 350-356. https://doi.org/10.11896/jsjkx.210900229
[6] 王健.
基于隐私保护的反向传播神经网络学习算法
Back-propagation Neural Network Learning Algorithm Based on Privacy Preserving
计算机科学, 2022, 49(6A): 575-580. https://doi.org/10.11896/jsjkx.211100155
[7] 李利, 何欣, 韩志杰.
群智感知的隐私保护研究综述
Review of Privacy-preserving Mechanisms in Crowdsensing
计算机科学, 2022, 49(5): 303-310. https://doi.org/10.11896/jsjkx.210400077
[8] 王美珊, 姚兰, 高福祥, 徐军灿.
面向医疗集值数据的差分隐私保护技术研究
Study on Differential Privacy Protection for Medical Set-Valued Data
计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032
[9] 吕由, 吴文渊.
基于同态加密的线性系统求解方案
Linear System Solving Scheme Based on Homomorphic Encryption
计算机科学, 2022, 49(3): 338-345. https://doi.org/10.11896/jsjkx.201200124
[10] 孔钰婷, 谭富祥, 赵鑫, 张正航, 白璐, 钱育蓉.
基于差分隐私的K-means算法优化研究综述
Review of K-means Algorithm Optimization Based on Differential Privacy
计算机科学, 2022, 49(2): 162-173. https://doi.org/10.11896/jsjkx.201200008
[11] 张亚迪, 孙悦, 刘锋, 朱二周.
结合密度参数与中心替换的改进K-means算法及新聚类有效性指标研究
Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index
计算机科学, 2022, 49(1): 121-132. https://doi.org/10.11896/jsjkx.201100148
[12] 金华, 朱靖宇, 王昌达.
视频隐私保护技术综述
Review on Video Privacy Protection
计算机科学, 2022, 49(1): 306-313. https://doi.org/10.11896/jsjkx.201200047
[13] 雷羽潇, 段玉聪.
面向跨模态隐私保护的AI治理法律技术化框架
AI Governance Oriented Legal to Technology Bridging Framework for Cross-modal Privacy Protection
计算机科学, 2021, 48(9): 9-20. https://doi.org/10.11896/jsjkx.201000011
[14] 董晓梅, 王蕊, 邹欣开.
面向推荐应用的差分隐私方案综述
Survey on Privacy Protection Solutions for Recommended Applications
计算机科学, 2021, 48(9): 21-35. https://doi.org/10.11896/jsjkx.201100083
[15] 孙林, 平国楼, 叶晓俊.
基于本地化差分隐私的键值数据关联分析
Correlation Analysis for Key-Value Data with Local Differential Privacy
计算机科学, 2021, 48(8): 278-283. https://doi.org/10.11896/jsjkx.201200122
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!