计算机科学 ›› 2022, Vol. 49 ›› Issue (2): 162-173.doi: 10.11896/jsjkx.201200008
孔钰婷, 谭富祥, 赵鑫, 张正航, 白璐, 钱育蓉
KONG Yu-ting, TAN Fu-xiang, ZHAO Xin, ZHANG Zheng-hang, BAI Lu, QIAN Yu-rong
摘要: 差分隐私K-means算法(Differential Privacy K-means Algorithm,DP K-means)作为一种基于差分隐私技术的隐私保护数据挖掘(Privacy Preserving Data Mining,PPDM)模型,因简单高效且可保障数据的隐私而备受研究者的关注。文中首先阐述了差分隐私K-means算法的原理、隐私攻击模型,以分析算法的不足。然后从数据预处理、隐私预算分配、聚簇划分等3个角度讨论分析DP K-means算法改进研究的优缺点,并对研究中的相关数据集和通用评价指标进行了总结。最后指出DP K-means算法改进研究中亟待解决的挑战性问题,并展望了DP K-means算法的未来发展趋势。
中图分类号:
[1]GAO Z,SUN Y,CUI X,et al.Privacy-Preserving Hybrid K-Means[J].International Journal of Data Warehousing and Mi-ning,2018,14(2):1-17. [2]NELSON B,OLOVSSON T.Security and privacy for big data:A systematic literature review[C]//IEEE International Confe-rence on Big Data.IEEE,2017. [3]ZHANG X J,YANG H Y,LI Z,et al.Differentially Private Location Privacy-preserving Scheme with Semantic Location[J].Computer Science,2021,48(8):300-308. [4]PENG C C,CHEN Y L,XUN Y M.k-modes Clustering Gua-ranteeing Local Differential Privacy[J].Computer Science,2021,48(2):105-113. [5]ZHOU S G,LI F,TAO Y F,et al.Privacy Preservation in Database Applications:A Survey[J].Chinese Journal of Computers,2009,32(5):847-861. [6]MIN Z,YANG G,SANGAIAH A K,et al.A privacy protection-oriented parallel fully homomorphic encryption algorithm in cyber physical systems[J].EURASIP Journal on Wireless Communications and Networking,2019,2019(1):15. [7]LIU J,YIN S L,LI H,et al.A density-based clustering method for k-anonymity privacy protection[J].Journal of Information Hiding and Multimedia Signal Processing,2017,8(1):12-18. [8]TEMUUJIN O,AHN J,IM D H.Efficient L-Diversity algo-rithm for preserving privacy of dynamically published datasets[J].IEEE Access,2019,197:122878-122888. [9]DWORK C.Differential Privacy:A Survey of Results[C]//Theory and Applications of Models of Computation(TAMC 2008).Springer,2008:1-19. [10]FANG Y,ZHU J,ZHOU W,et al.A Survey on Data Mining Privacy Protection Algorithms[J].Netinfo Security,2017,2:6-11. [11]WANG J Y,LIU C,FU X C,et al.Crucial Patterns Mining with Differential Privacy over Data Streams[J].Journal of Software,2019,30(3):648-666. [12]INAN A,GURSOY M E,SAYGIN Y.Sensitivity analysis fornon-interactive differential privacy:bounds and efficient algorithms[J].IEEE Transactions on Dependable and Secure Computing,2020,17(1):194-207. [13]GAO Z Q,WANG Y T.Survey on differential privacy and its progress[J].Journal on Communications,2017,38:151-155. [14]BLUM A,DWORK C,MCSHERRY F,et al.Practical Privacy:The SuLQ Framework[C]//Proceedings of the Twenty-fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems(PODS '05).2016:128-138. [15]MCSHERRY F D.Privacy integrated queries:an extensible platform for privacy-preserving data analysis[C]//Proceedings of the 2009 ACM SIGMOD International Conference on Management of data.ACM,2019:19-30. [16]NISSIM K,RASKHODNIKOVA S,SMITH A.Smooth sensitivity and sampling in private data analysis[C]//Thirty-ninth ACM Symposium on Theory of Computing.ACM,2007. [17]FELDMAN D,FIAT A,KAPLAN H,et al.Private coresets[C]//Proceedings of the 41st Annual ACM Symposium on Theory of Computing,STOC 2009.Bethesda,MD,USA:ACM,2009. [18]MOHAN P,THAKURTA A,SHI E,et al.GUPT:privacy preserving data analysis made easy[C]//Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data.2012:349-360. [19]FLETCHER S,ISLAM M Z.Decision tree classification withdifferential privacy:A survey[J].ACM Computing Surveys,2019,52(4):1-33. [20]HAY M,MACHANAVAJJHALA A,MIKLAU G,et al.Principled Evaluation of Differentially Private Algorithms using DPBench[C]//ACM.2016:139-154. [21]YE Q Q,MENG X F,ZHU M J,et al.Survey on Local Differential Privacy[J].Journal of Software,2018,29(7):1981-2005. [22]HE X M,WANG X Y,CHEN H H.Study on choosing the parameter ε in differential privacy[J].Journal on Communications,2015,36(12):124-130. [23]CONSUL S,WILLIAMSON S.Differentially Private MedianForests for Regression and Classification[J].arXiv:2006.08795. [24]ALVIM M,CHATZIKOKOLAKIS K,PALAMIDESSI C,et al.Invited Paper:Local Differential Privacy on Metric Spaces:Optimizing the Trade-Off with Utility[C]//2018 IEEE 31st Computer Security Foundations Symposium (CSF).IEEE,2018:262-267. [25]BUN M,STEINKE T.Average-Case Averages:Private Algo-rithms for Smooth Sensitivity and Mean Estimation[J].Advances in Neural Information Processing Systems,2019,17:181-191. [26]WANG H,XU Z,XIONG L,et al.Conducting Correlated La-place Mechanism for Differential Privacy[C]//International Conference on Cloud Computing and Security.Cham:Springer,2017. [27]DONG J,DURFEE D,ROGERS R.Optimal Differential Privacy Composition for Exponential Mechanisms and the Cost of Adaptivity[J].arXiv:1909.13830. [28]YANG X,WANG T,REN X,et al.Survey on Improving Data Utility in Differentially Private Sequential Data Publishing[J].IEEE Transactions on Big Data,2021,7(4):729-749. [29]LIU F.Generalized Gaussian Mechanism for Differential Privacy[J].IEEE Transactions on Knowledge and Data Engineering,2018,31(4):747-756. [30]TALWAR K,HARDT M A W.Geometric mechanism for privacy-preserving answers:U.S.Patent 8,661,047[P].2014-2-25. [31]LI C,MIKLAU G,HAY M,et al.The matrix mechanism:optimizing linear counting queries under differential privacy[J].VLDB Journal — the International Journal on Very Large Data Bases,2015,24(6):757-781. [32]ZHAO J,CHEN Y,ZHANG W.Differential Privacy Preservation in Deep Learning:Challenges,Opportunities and Solutions[J].IEEE Access,2019,7:48901-48911. [33]GONG M,XIE Y,PAN K,et al.A Survey on Differentially Private Machine Learning[J].IEEE Computational Intelligence Magazine,2020,15(2):49-64. [34]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]//Theory of cryptography conference.Springer,2006:265-284. [35]LIN W C,TSAI C F,KE S W,et al.Top 10 data mining techniques in business applications:a brief survey[J].Kybernetes,2017,46(7):1158-1170. [36]XIONG J,REN J,CHEN L,et al.Enhancing privacy and availability for data clustering in intelligent electrical service of IoT[J].IEEE Internet of Things Journal,2018,6(2):1530-1540. [37]DWORK C,ROTHBLUM G N,VADHAN S P.Boosting andDifferential Privacy[C]//2010 IEEE 51st Annual Symposium on Foundations of Computer Science.IEEE,2010:51-60. [38]YANG J C,ZHAO C.A Survey on K-Means Clustering Algorithm[J].Computer Engineering and Applications,2019,55(23):7-14,63. [39]REN Y H.Survey of K-means algorithms on big data[J/OL].Application Research of Computers.https://doi.org/10.19734/j.issn.1001-3695.2019.10.0581. [40]XING Y N,QIAN Y R,NAN F Z,et al.Survey of optimization on K-means algorithm in Spark[J].2020,37(3):641-647. [41]LI Y,HAO Z F,WEN W,et al.Research on Differential Privacy Preserving k-means Clustering[J].Computer Science,2013,3:287-290. [42]REN J,XIONG J,YAO Z,et al.DPLK-Means:A Novel Diffe-rential Privacy K-Means Mechanism[C]//IEEE Second International Conference on Data Science in Cyberspace.IEEE,2017. [43]ZHANG G,ZHANG C,ZHANG H.Improved K-means Algo-rithm Based on Density Canopy[J].Knowledge-Based Systems,2018,145:289-297. [44]XIA D,NING F,HE W.Research on Parallel Adaptive Canopy-K-Means Clustering Algorithm for Big Data Mining Based on Cloud Platform[J].Journal of Grid Computing,2020,18(2):263-273. [45]LI L F.The analysis of K-means Clustering with Differential Privacy[D].Sichuan:Southwest Jiaotong University,2016. [46]YAO S.An Improved Differential Privacy K-Means Algorithm Based on MapReduce[C]//2018 11th International Symposium on Computational Intelligence and Design (ISCID).IEEE,2018:141-145. [47]SHANG T,ZHAO Z,GUAN Z,et al.A DP canopy k-means algorithm for privacy preservation of Hadoop Platform[C]//International Symposium on Cyberspace Safety and Security.Cham:Springer,2017:189-198. [48]SHAO R M,ZHANG L,LIU Y,et al.A-PAM Clustering Algorithm Based on Differential Privacy Preserving[C]//2015 International Conference on Software,Multimedia and Communication Engineering.2015. [49]YU Q,LUO Y,CHEN C,et al.Outlier-eliminated k-means clustering algorithm based on differential privacy preservation[J].Applied Intelligence,2016,45(4):1179-1191. [50]FAN Y K,LIU J W.Parallel K-means algorithm with differen-tial privacy preservation and outlier pruning[J].Application Research of Computers,2019,6:1776-1781. [51]MENG Q,ZHOU L.Research On Differential Privacy Preserving Clustering Algorithm Based On Spark Platform[J].Journal of Computers (Taiwan),2018,29(1):47-62. [52]DWORK C.Differential privacy[C]//International Colloquiumon Automata,Languages,and Programming.Springer,2006:1-12. [53]FAN Z,XU X.APDP k-means:A New Differential PrivacyClustering Algorithm Based on Arithmetic Progression Privacy Budget Allocation[C]//2019 IEEE 21st International Confe-rence on High Performance Computing and Communications.IEEE,2019. [54]SU D,CAO J,LI N,et al.Differentially Private K-Means Clustering[C]//ACM.2016:26-37. [55]FU Y M,LI Z D.Research on k-means++ Clustering Algorithm Based on Laplace Mechanism for Differential Privacy Protection[J].Netinfo Security,2019,218(2):49-58. [56]ZHANG Y,LIU N,WANG S,et al.A differential privacy protecting K-means clustering algorithm based on contour coefficients[J].Plos One,2018,13(11):1-15. [57]MO R,LIU J,YU W,et al.A Differential Privacy-Based Protecting Data Preprocessing Method for Big Data Mining[C]//2019 18th IEEE International Conference on Trust,Security And Privacy In Computing And Communications.IEEE,2019. [58]DONG S U,CAO J N E,NINGHUI L I,et al.Differentially Private K-Means Clustering and a Hybrid Approach to Private Optimization[J].ACM Transaction on Information & System Security,2017,20(4):16.1-16.33. [59]ZHANG J,XIAO X,YANG Y,et al.PrivGene:differentially private model fitting using genetic algorithms[C]//Proceedings of the 2013 ACM SIGMOD International Conference on Ma-nagement of Data.2013:665-676. [60]LU Z,SHEN H.A convergent differentially private k-means clustering algorithm[C]//Pacific-Asia Conference on Know-ledge Discovery and Data Mining.Springer,2019:612-624. [61]LU Z,SHEN H.Differentially Private k-Means Clustering with Guaranteed Convergence[J].arXiv:2002.01043,2020. [62]ACS G,MELIS L,CASTELLUCCIA C,et al.Differentially private mixture of generative neural networks[J].IEEE Transactions on Knowledge and Data Engineering,2018,31(6):1109-1121. [63]LV Z,WANG L,GUAN Z,et al.An Optimizing and Differen-tially Private Clustering Algorithm for Mixed Data in SDN based Smart Grid[J].IEEE Access,2019,7:45773-45782. [64]NGUYEN H H.Privacy-Preserving Mechanisms for k-ModesClustering[J].Computers & Security,2018,78:60-75. [65]NI T,QIAO M,CHEN Z,et al.Utility-efficient Differentially Private K-means Clustering based on Cluster Merging-ScienceDirect[J].Neurocomputing,2021,424:205-214. [66]GAO Z Q,ZHANG L J.DPHKMS:An Efficient Hybrid Clus-tering Preserving Differential Privacy in Spark[C]//Internatio-nal Conference on Emerging Internetworking.Cham:Springer,2017. [67]FRANK A,ASUNCION A.UCI machine learning repository[EB/OL]. [2022-01-13]. https://archive.ics.uci.edu/ml/index.php. [68]PENG H L,JIN K Z,FU C C,et al.Private Time Series Pattern Mining with Sequential Lattice[J].Acta Electronica Sinica,2020,48(1):153-163. [69]LI H C,WU X P,CHEN Y.k-means clustering method preserving differential privacy in MapReduce framework[J].Journal on Communications,2016,37(2):124-130. [70]XIA C,HUA J,TONG W,et al.Distributed K-Means clustering guaranteeing local differential privacy[J].Computers & Security,2020,90:101699.1-101699.11. [71]NGUYEN T D,GUPTA S,RANA S,et al.Privacy Aware K-Means Clustering with High Utility[C]//Pacific-Asia Confe-rence on Knowledge Discovery & Data Mining.Springer International Publishing,2016:388-400. [72]ZHANG T,ZHU T,XIONG P,et al.Correlated Differential Privacy:Feature Selection in Machine Learning[J].IEEE Transactions on Industrial Informatics,2019,16(3):2115-2124. [73]RATHI M,RAJAVAT A.High Dimensional Data Processing in Privacy Preserving Data Mining[C]//2020 IEEE 9th International Conference on Communication Systems and Network Technologies (CSNT).IEEE,2020. [74]ZHANG T,ZHU T,LIU R,et al.Correlated Data in Differential Privacy:Definition and Analysis[J].Concurrency and Computation:Practice and Experience,2020:e6015. [75]TRAN H Y,HU J.Privacy-preserving big data analytics a comprehensive survey[J].Journal of Parallel and Distributed Computing,2019,134:207-218. [76]GUAN Z,LV Z,DU X,et al.Achieving Data Utility-PrivacyTradeoff in Internet of Medical Things:A Machine Learning Approach[J].Future Generation Computer Systems,2019,98:60-68. [77]SAKELLARIOU G,GOUNARIS A.Homomorphically encryp-ted k-means on cloud-hosted servers with low client-side load[J].Computing,2019,101(12):1813-1836. |
[1] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于分层抽样优化的面向异构客户端的联邦学习 Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients 计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263 |
[2] | 汤凌韬, 王迪, 张鲁飞, 刘盛云. 基于安全多方计算和差分隐私的联邦学习方案 Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy 计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108 |
[3] | 吕由, 吴文渊. 隐私保护线性回归方案与应用 Privacy-preserving Linear Regression Scheme and Its Application 计算机科学, 2022, 49(9): 318-325. https://doi.org/10.11896/jsjkx.220300190 |
[4] | 黄觉, 周春来. 基于本地化差分隐私的频率特征提取 Frequency Feature Extraction Based on Localized Differential Privacy 计算机科学, 2022, 49(7): 350-356. https://doi.org/10.11896/jsjkx.210900229 |
[5] | 王健. 基于隐私保护的反向传播神经网络学习算法 Back-propagation Neural Network Learning Algorithm Based on Privacy Preserving 计算机科学, 2022, 49(6A): 575-580. https://doi.org/10.11896/jsjkx.211100155 |
[6] | 李利, 何欣, 韩志杰. 群智感知的隐私保护研究综述 Review of Privacy-preserving Mechanisms in Crowdsensing 计算机科学, 2022, 49(5): 303-310. https://doi.org/10.11896/jsjkx.210400077 |
[7] | 王美珊, 姚兰, 高福祥, 徐军灿. 面向医疗集值数据的差分隐私保护技术研究 Study on Differential Privacy Protection for Medical Set-Valued Data 计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032 |
[8] | 吕由, 吴文渊. 基于同态加密的线性系统求解方案 Linear System Solving Scheme Based on Homomorphic Encryption 计算机科学, 2022, 49(3): 338-345. https://doi.org/10.11896/jsjkx.201200124 |
[9] | 金华, 朱靖宇, 王昌达. 视频隐私保护技术综述 Review on Video Privacy Protection 计算机科学, 2022, 49(1): 306-313. https://doi.org/10.11896/jsjkx.201200047 |
[10] | 雷羽潇, 段玉聪. 面向跨模态隐私保护的AI治理法律技术化框架 AI Governance Oriented Legal to Technology Bridging Framework for Cross-modal Privacy Protection 计算机科学, 2021, 48(9): 9-20. https://doi.org/10.11896/jsjkx.201000011 |
[11] | 董晓梅, 王蕊, 邹欣开. 面向推荐应用的差分隐私方案综述 Survey on Privacy Protection Solutions for Recommended Applications 计算机科学, 2021, 48(9): 21-35. https://doi.org/10.11896/jsjkx.201100083 |
[12] | 孙林, 平国楼, 叶晓俊. 基于本地化差分隐私的键值数据关联分析 Correlation Analysis for Key-Value Data with Local Differential Privacy 计算机科学, 2021, 48(8): 278-283. https://doi.org/10.11896/jsjkx.201200122 |
[13] | 张学军, 杨昊英, 李桢, 何福存, 盖继扬, 鲍俊达. 融合语义位置的差分私有位置隐私保护方法 Differentially Private Location Privacy-preserving Scheme withSemantic Location 计算机科学, 2021, 48(8): 300-308. https://doi.org/10.11896/jsjkx.200900198 |
[14] | 陈天荣, 凌捷. 基于特征映射的差分隐私保护机器学习方法 Differential Privacy Protection Machine Learning Method Based on Features Mapping 计算机科学, 2021, 48(7): 33-39. https://doi.org/10.11896/jsjkx.201200224 |
[15] | 王辉, 朱国宇, 申自浩, 刘琨, 刘沛骞. 基于用户偏好和位置分布的假位置生成方法 Dummy Location Generation Method Based on User Preference and Location Distribution 计算机科学, 2021, 48(7): 164-171. https://doi.org/10.11896/jsjkx.200800069 |
|