计算机科学 ›› 2021, Vol. 48 ›› Issue (2): 105-113.doi: 10.11896/jsjkx.200700172
彭春春, 陈燕俐, 荀艳梅
PENG Chun-chun, CHEN Yan-li, XUN Yan-mei
摘要: 如何在保护数据隐私的同时进行可用性的数据挖掘已成为热点问题。鉴于在很多实际应用场景中,很难找到一个真正可信的第三方对用户的敏感数据进行处理,文中首次提出了一种支持本地化差分隐私技术的聚类方案——LDPK-modes(Local Differential Privacy K-modes)。与传统的基于中心化差分隐私的聚类算法相比,其不再需要一个可信的第三方对数据进行收集和处理,而由用户担任数据隐私化的工作,极大地降低了第三方窃取用户隐私的可能性。用户使用满足本地d-隐私(带有距离度量的本地差分隐私技术)定义的随机响应机制对敏感数据进行扰动,第三方收集到用户扰动数据后,恢复其统计特征,生成合成数据集,并进行k-modes聚类。在聚类过程中,将数据集上频繁出现的特征分配给初始聚类中心点,进一步提高了聚类结果的可用性。理论分析和实验结果表明了LDPK-modes的隐私性和聚类可用性。
中图分类号:
[1] DWORK C.Differential privacy:A survey of results[C]//International Conference on Theory and Applications of Models of Computation.Springer,Berlin,Heidelberg,2008:1-19. [2] SWEENEY L.k-anonymity:A model for protecting privacy[J].International Journal of Uncertainty,Fuzziness and Knowledge-Based Systems,2002,10(5):557-570. [3] MACHANAVAJJHALA A,KIFER D,GEHRKE J,et al.l-diversity:Privacy beyond k-anonymity[J].ACM Transactions on Knowledge Discovery from Data (TKDD),2007,1(1):3. [4] YE Q Q,MENG X F,ZHU M J,et al.Survey on local differen-tial privacy[J].Journal of Software,2018,29(7):1981-2005. [5] DUCHI J C,JORDAN M I,WAINWRIGHT M J.Local privacy and statistical minimax rates[C]//2013 IEEE 54th Annual Symposium on Foundations of Computer Science.IEEE,2013:429-438. [6] KASIVISWANATHAN S P,LEE H K,NISSIM K,et al.What can we learn privately[C]// Proc.of the 49th Annual IEEE Symp.on Foundations of Computer Science (FOCS).IEEE,2008:531-540. [7] HOPE T,CHAN J,KITTUR A,et al.Accelerating innovation through analogy mining[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2017:235-243. [8] MENDES R,VILELA J P.Privacy-preserving data mining:methods,metrics,and applications[J].IEEE Access,2017,5:10562-10582. [9] HAMMING R W.Error detecting and error correcting codes[J].The Bell System Technical Journal,1950,29(2):147-160. [10] GU X,LI M,CAO Y,et al.Supporting both range queries and frequency estimation with local differential privacy[C]//2019 IEEE Conference on Communications and Network Security (CNS).IEEE,2019:124-132. [11] WARNER S L.Randomized response:A survey technique foreliminating evasive answer bias[J].Journal of the American Statistical Association,1965,60(309):63-69. [12] ERLINGSSON Ú,PIHUR V,KOROLOVA A.Rappor:Randomized aggregatable privacy-preserving ordinal response[C]//Proceedings of the 2014 ACM SIGSAC Conference on Compu-ter and Communications Security.2014:1054-1067. [13] BLOOM B H.Space/Time trade-offs in hash coding with allowable errors[J].Communications of the ACM,1970,13(7):422-426. [14] BASSILY R,SMITH A.Local,private,efficient protocols forsuccinct histograms[C]//Proc.of the 47th Annual ACM on Symp.on Theory of Computing.ACM,2015:127-135. [15] DUCHI J C,JORDAN M I,WAINWRIGHT M J.Local privacy,data processing inequalities,and statistical minimax rates[J].arXiv:1302.3203,2013. [16] WAINWRIGHT M J,JORDAN M I,DUCHI J C.Privacy aware learning[C]//Advances in Neural Information Processing Systems.2012:1430-1438. [17] NGUYÊN T T,XIAO X,YANG Y,et al.Collecting and analyzing data from smart device users with local differential privacy[J].arXiv:1606.05053,2016. [18] BLUM A,DWORK C,MCSHERRY F,et al.Practical privacy:the SuLQ framework[C]//Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.2005:128-138. [19] REN J,XIONG J,YAO Z,et al.DPLK-means:A novel Differential Privacy K-means Mechanism[C]//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC).IEEE,2017:133-139. [20] FU Y M,LI Z Z.Research on k-means++ Clustering Algorithm Based on Laplace Mechanism for Differential Privacy Protection[J].Netinfo Security,2019,19(2):43-52. [21] HU C,YANG G,BAI Y L.Clustering Algorithm in Differential Privacy Preserving[J].Computer Science,2019,46(2):120-126. [22] XIA C,HUA J,TONG W,et al.Distributed K-Means clustering guaranteeing local differential privacy[J].Computers & Security,2020,90:1-11. [23] NGUYEN H H.Privacy-preserving mechanisms for k-modesclustering[J].Computers & Security,2018,78:60-75. [24] LYU Z,WANG L,GUAN Z,et al.An optimizing and differentially private clustering algorithm for mixed data in SDN-based smart grid[J].IEEE Access,2019,7:45773-45782. [25] WANG T,BLOCKI J,LI N,et al.Locally differentially private protocols for frequency estimation[C]//26th Security Sympo-sium (Security 17).2017:729-745. [26] NEWEY K W,MCFADDEN D.Large sample estimation andhypothesis[J].Handbook of Econometrics,1994,4:2111-2245. [27] NISSIM K,RASKHODNIKOVA S,SMITH A.Smooth sensitivity and sampling in private data analysis[C]//Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing.2007:75-84. [28] JIANG H,YI S,LI J,et al.Ant clustering algorithm with K-harmonic means clustering[J].Expert Systems with Applications,2010,37(12):8679-8684. |
[1] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于分层抽样优化的面向异构客户端的联邦学习 Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients 计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263 |
[2] | 汤凌韬, 王迪, 张鲁飞, 刘盛云. 基于安全多方计算和差分隐私的联邦学习方案 Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy 计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108 |
[3] | 吕由, 吴文渊. 隐私保护线性回归方案与应用 Privacy-preserving Linear Regression Scheme and Its Application 计算机科学, 2022, 49(9): 318-325. https://doi.org/10.11896/jsjkx.220300190 |
[4] | 柴慧敏, 张勇, 方敏. 基于特征相似度聚类的空中目标分群方法 Aerial Target Grouping Method Based on Feature Similarity Clustering 计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203 |
[5] | 黄觉, 周春来. 基于本地化差分隐私的频率特征提取 Frequency Feature Extraction Based on Localized Differential Privacy 计算机科学, 2022, 49(7): 350-356. https://doi.org/10.11896/jsjkx.210900229 |
[6] | 刘丽, 李仁发. 医疗CPS协作网络控制策略优化 Control Strategy Optimization of Medical CPS Cooperative Network 计算机科学, 2022, 49(6A): 39-43. https://doi.org/10.11896/jsjkx.210300230 |
[7] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于DBSCAN聚类的集群联邦学习方法 Clustered Federated Learning Methods Based on DBSCAN Clustering 计算机科学, 2022, 49(6A): 232-237. https://doi.org/10.11896/jsjkx.211100059 |
[8] | 郁舒昊, 周辉, 叶春杨, 王太正. SDFA:基于多特征融合的船舶轨迹聚类方法研究 SDFA:Study on Ship Trajectory Clustering Method Based on Multi-feature Fusion 计算机科学, 2022, 49(6A): 256-260. https://doi.org/10.11896/jsjkx.211100253 |
[9] | 毛森林, 夏镇, 耿新宇, 陈剑辉, 蒋宏霞. 基于密度敏感距离和模糊划分的改进FCM算法 FCM Algorithm Based on Density Sensitive Distance and Fuzzy Partition 计算机科学, 2022, 49(6A): 285-290. https://doi.org/10.11896/jsjkx.210700042 |
[10] | 陈景年. 一种适于多分类问题的支持向量机加速方法 Acceleration of SVM for Multi-class Classification 计算机科学, 2022, 49(6A): 297-300. https://doi.org/10.11896/jsjkx.210400149 |
[11] | 王健. 基于隐私保护的反向传播神经网络学习算法 Back-propagation Neural Network Learning Algorithm Based on Privacy Preserving 计算机科学, 2022, 49(6A): 575-580. https://doi.org/10.11896/jsjkx.211100155 |
[12] | 陈佳舟, 赵熠波, 徐阳辉, 马骥, 金灵枫, 秦绪佳. 三维城市场景中的小物体检测 Small Object Detection in 3D Urban Scenes 计算机科学, 2022, 49(6): 238-244. https://doi.org/10.11896/jsjkx.210400174 |
[13] | 李利, 何欣, 韩志杰. 群智感知的隐私保护研究综述 Review of Privacy-preserving Mechanisms in Crowdsensing 计算机科学, 2022, 49(5): 303-310. https://doi.org/10.11896/jsjkx.210400077 |
[14] | 邢云冰, 龙广玉, 胡春雨, 忽丽莎. 基于SVM的类别增量人体活动识别方法 Human Activity Recognition Method Based on Class Increment SVM 计算机科学, 2022, 49(5): 78-83. https://doi.org/10.11896/jsjkx.210400024 |
[15] | 朱哲清, 耿海军, 钱宇华. 面向化学结构的线段聚类算法 Line-Segment Clustering Algorithm for Chemical Structure 计算机科学, 2022, 49(5): 113-119. https://doi.org/10.11896/jsjkx.210700131 |
|