计算机科学 ›› 2019, Vol. 46 ›› Issue (2): 120-126.doi: 10.11896/j.issn.1002-137X.2019.02.019
胡闯1,2, 杨庚1,2, 白云璐1,3
HU Chuang1,2, YANG Geng1,2, BAI Yun-lu1,3
摘要: 大数据时代的数据挖掘技术在研究和应用等领域取得了较大发展,但大量敏感信息披露给用户带来了众多威胁和损失。因此,在聚类分析过程中如何保护数据隐私成为数据挖掘和数据隐私保护领域的热点问题。传统差分隐私保护k-means算法对其初始中心点的选择较为敏感,而且在聚簇个数k值的选择上存在一定的盲目性,降低了聚类结果的可用性。为了进一步提高差分隐私k-means聚类方法聚类结果的可用性,研究并提出一种新的基于差分隐私的DPk-means-up聚类算法,同时进行了理论分析和比较实验。理论分析表明,该算法满足ε-差分隐私,可适用于不同规模和不同维度的数据集。此外,实验结果表明,在相同隐私保护级别下,与其他差分隐私k-means聚类方法相比,所提算法有效提高了聚类的可用性。
中图分类号:
[1]MADDEN S,FRANKLIN M J,HELLERSTEIN J M.TAG:a tiny aggregation service for ad-hoc sensor networks[C]∥Proceedings of the 5th Symposium on Operating Systems Design and Implementation.New York,USA,2002:131-146. [2]MACHANAVAJJHALA A,GEHRKE J,KIFER D,et al.l-diversity:Privacy beyond k-anonymity[C]∥Proceedings of the 22nd International Conference on Data Engineering.IEEE,2006:24-24. [3]GANTA S R,KASIVISWANATHAN S P,SMITH A.Composition attacks and auxiliary information in data privacy[C]∥Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2008:265-273. [4]BLUM A,DWORK C,MCSHERRY F,et al.Practical privacy:the SuLQ framework[C]∥Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.ACM,2005:128-138. [5]LI Y,HAO Z,WEN W,et al.Research on differential privacy preserving k-means clustering[J].Computer Science,2013,40(3):287-290. [6]YU Q,LUO Y,CHEN C,et al.Outlier-eliminated k-means clustering algorithm based on differential privacy preservation[J].Applied Intelligence,2016,45(4):1179-1191. [7]REN J,XIONG J,YAO Z,et al.DPLK-means:a novel differential privacy k-means mechanism[C]∥2017 IEEE Second International Conference on Data Science in Cyberspace(DSC).IEEE,2017:133-139. [8]DWORK C.Differential privacy[C]∥Proceedings of the 33rd International Conference on Automata,Languages and Programming-Volume Part II.Springer,Berlin,Heidelberg,2006:1-19. [9]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]∥Theory of Cryptography Conference(TCC).2006:265-284. [10]DWORK C,LEI J.Differential privacy and robust statistics[C]∥ Proceedings of the 41st annual ACM Symposium on Theory of Computing.ACM,2009:371-380. [11]DWORK C.A firm foundation for private data analysis[J]. Communications of the ACM,2011,54(1):86-95. [12]DWORK C.Differential privacy[M]∥Encyclopedia of Cryptography and Security.Springer US,2011:338-340. [13]LI N,QARDAJI W,SU D.Provably Private Data Anonymiza- tion:Or,k-Anonymity Meets Differential Privacy[J/OL].Corr,2010,abs/1101.2604:32-33. [14]LI C,HAY M,RASTOGI V,et al.Optimizing linear counting queries under differential privacy[C]∥Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.ACM,2010:123-134. [15]XIAO X,WANG G,GEHRKE J.Differential privacy via wavelet transforms[J].IEEE Transactions on Knowledge and Data Engineering,2011,23(8):1200-1214. [16]HAY M,RASTOGI V,MIKLAU G,et al.Boosting the accura- cy of differentially private histograms through consistency[J].Proceedings of the VLDB Endowment,2010,3(1-2):1021-1032. [17]HARDT M,TALWAR K.On the geometry of differential privacy[C]∥Proceedings of the 42nd ACM Symposium on Theory of Computing.ACM,2010:705-714. [18]HARDT M,ROTHBLUM G N.Amultiplicative weights mecha- nism for privacy-preserving data analysis[C]∥Proceedings of the 51st Annual IEEE Symposium.IEEE,2010:61-70. [19]NISSIM K,RASKHODNIKOVA S,SMITH A.Smooth sensitivity and sampling in private data analysis[C]∥Proceedings of the 39th annual ACM Symposium on Theory of Computing.ACM,2007:75-84. [20]SU D,CAO J,LI N,et al.Differentially private k-means clustering and a hybrid approach to private optimization[J].ACM Transactions on Privacy and Security (TOPS),2017,20(4):1-33. [21]DWORK C.Differential privacy:A survey of results[C]∥International Conference on Theory and Applications of Models of Computation.Springer,Berlin,Heidelberg,2008:1-19. [22]MCSHERRY F D.Privacy integrated queries:an extensible platform for privacy-preserving data analysis[C]∥Proceedings of the 2009 ACM SIGMOD International Conference on Management of data.ACM,2009:19-30. [23]ARTHUR D,VASSILVITSKII S.k-means++:The advantages of careful seeding[C]∥Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms.Society for Industrial and Applied Mathematics,2007:1027-1035. [24]CHEN R,ACS G,CASTELLUCCIA C.Differentially private sequential data publication via variable-length n-grams[C]∥Proceedings of the 2012 ACM Conference on Computer and Communications Security.ACM,2012:638-649. [25]DWORK C.Differential privacy in new settings[C]∥Procee- dings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms.Society for Industrial and Applied Mathema-tics,2010:174-183. [26]FRÄNTI P.Clustering datasets[OL].http://cs.joensuu.fi/si-pu/datasets. [27]NGUYEN H H,KIM J,KIM Y.Differential privacy in practice [J].Journal of Computing Science and Engineering,2013,7(3):177-186. [28]JIANG H,YI S,LI J,et al.Ant clustering algorithm with K- harmonic means clustering[J].Expert Systems with Applications,2010,37(12):8679-8684. [29]FANG Y J,ZHU J Z,ZHOU W,et al.A survey on data mining privacy protection algorithms[J].Netinfo Security,2017(2):6-11.(in Chinese) 方跃坚,朱锦钟,周文,等.数据挖掘隐私保护算法研究综述[J].信息网络安全,2017(2):6-11. [30]ZHANG F X,JIANG C H.Research on query on privacy anonymity algorithm based on grid clustering[J].Netinfo Security,2015(8):53-58.(in Chinese) 张付霞,蒋朝惠.一种基于网格聚类的查询隐私匿名算法研究[J].信息网络安全,2015(8):53-58. |
[1] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于分层抽样优化的面向异构客户端的联邦学习 Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients 计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263 |
[2] | 汤凌韬, 王迪, 张鲁飞, 刘盛云. 基于安全多方计算和差分隐私的联邦学习方案 Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy 计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108 |
[3] | 吕由, 吴文渊. 隐私保护线性回归方案与应用 Privacy-preserving Linear Regression Scheme and Its Application 计算机科学, 2022, 49(9): 318-325. https://doi.org/10.11896/jsjkx.220300190 |
[4] | 柴慧敏, 张勇, 方敏. 基于特征相似度聚类的空中目标分群方法 Aerial Target Grouping Method Based on Feature Similarity Clustering 计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203 |
[5] | 黄觉, 周春来. 基于本地化差分隐私的频率特征提取 Frequency Feature Extraction Based on Localized Differential Privacy 计算机科学, 2022, 49(7): 350-356. https://doi.org/10.11896/jsjkx.210900229 |
[6] | 王健. 基于隐私保护的反向传播神经网络学习算法 Back-propagation Neural Network Learning Algorithm Based on Privacy Preserving 计算机科学, 2022, 49(6A): 575-580. https://doi.org/10.11896/jsjkx.211100155 |
[7] | 李利, 何欣, 韩志杰. 群智感知的隐私保护研究综述 Review of Privacy-preserving Mechanisms in Crowdsensing 计算机科学, 2022, 49(5): 303-310. https://doi.org/10.11896/jsjkx.210400077 |
[8] | 王美珊, 姚兰, 高福祥, 徐军灿. 面向医疗集值数据的差分隐私保护技术研究 Study on Differential Privacy Protection for Medical Set-Valued Data 计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032 |
[9] | 吕由, 吴文渊. 基于同态加密的线性系统求解方案 Linear System Solving Scheme Based on Homomorphic Encryption 计算机科学, 2022, 49(3): 338-345. https://doi.org/10.11896/jsjkx.201200124 |
[10] | 孔钰婷, 谭富祥, 赵鑫, 张正航, 白璐, 钱育蓉. 基于差分隐私的K-means算法优化研究综述 Review of K-means Algorithm Optimization Based on Differential Privacy 计算机科学, 2022, 49(2): 162-173. https://doi.org/10.11896/jsjkx.201200008 |
[11] | 张亚迪, 孙悦, 刘锋, 朱二周. 结合密度参数与中心替换的改进K-means算法及新聚类有效性指标研究 Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index 计算机科学, 2022, 49(1): 121-132. https://doi.org/10.11896/jsjkx.201100148 |
[12] | 金华, 朱靖宇, 王昌达. 视频隐私保护技术综述 Review on Video Privacy Protection 计算机科学, 2022, 49(1): 306-313. https://doi.org/10.11896/jsjkx.201200047 |
[13] | 雷羽潇, 段玉聪. 面向跨模态隐私保护的AI治理法律技术化框架 AI Governance Oriented Legal to Technology Bridging Framework for Cross-modal Privacy Protection 计算机科学, 2021, 48(9): 9-20. https://doi.org/10.11896/jsjkx.201000011 |
[14] | 董晓梅, 王蕊, 邹欣开. 面向推荐应用的差分隐私方案综述 Survey on Privacy Protection Solutions for Recommended Applications 计算机科学, 2021, 48(9): 21-35. https://doi.org/10.11896/jsjkx.201100083 |
[15] | 孙林, 平国楼, 叶晓俊. 基于本地化差分隐私的键值数据关联分析 Correlation Analysis for Key-Value Data with Local Differential Privacy 计算机科学, 2021, 48(8): 278-283. https://doi.org/10.11896/jsjkx.201200122 |
|