计算机科学 ›› 2021, Vol. 48 ›› Issue (2): 105-113.doi: 10.11896/jsjkx.200700172
彭春春, 陈燕俐, 荀艳梅
PENG Chun-chun, CHEN Yan-li, XUN Yan-mei
摘要: 如何在保护数据隐私的同时进行可用性的数据挖掘已成为热点问题。鉴于在很多实际应用场景中,很难找到一个真正可信的第三方对用户的敏感数据进行处理,文中首次提出了一种支持本地化差分隐私技术的聚类方案——LDPK-modes(Local Differential Privacy K-modes)。与传统的基于中心化差分隐私的聚类算法相比,其不再需要一个可信的第三方对数据进行收集和处理,而由用户担任数据隐私化的工作,极大地降低了第三方窃取用户隐私的可能性。用户使用满足本地d-隐私(带有距离度量的本地差分隐私技术)定义的随机响应机制对敏感数据进行扰动,第三方收集到用户扰动数据后,恢复其统计特征,生成合成数据集,并进行k-modes聚类。在聚类过程中,将数据集上频繁出现的特征分配给初始聚类中心点,进一步提高了聚类结果的可用性。理论分析和实验结果表明了LDPK-modes的隐私性和聚类可用性。
中图分类号:
[1] DWORK C.Differential privacy:A survey of results[C]//International Conference on Theory and Applications of Models of Computation.Springer,Berlin,Heidelberg,2008:1-19. [2] SWEENEY L.k-anonymity:A model for protecting privacy[J].International Journal of Uncertainty,Fuzziness and Knowledge-Based Systems,2002,10(5):557-570. [3] MACHANAVAJJHALA A,KIFER D,GEHRKE J,et al.l-diversity:Privacy beyond k-anonymity[J].ACM Transactions on Knowledge Discovery from Data (TKDD),2007,1(1):3. [4] YE Q Q,MENG X F,ZHU M J,et al.Survey on local differen-tial privacy[J].Journal of Software,2018,29(7):1981-2005. [5] DUCHI J C,JORDAN M I,WAINWRIGHT M J.Local privacy and statistical minimax rates[C]//2013 IEEE 54th Annual Symposium on Foundations of Computer Science.IEEE,2013:429-438. [6] KASIVISWANATHAN S P,LEE H K,NISSIM K,et al.What can we learn privately[C]// Proc.of the 49th Annual IEEE Symp.on Foundations of Computer Science (FOCS).IEEE,2008:531-540. [7] HOPE T,CHAN J,KITTUR A,et al.Accelerating innovation through analogy mining[C]//Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2017:235-243. [8] MENDES R,VILELA J P.Privacy-preserving data mining:methods,metrics,and applications[J].IEEE Access,2017,5:10562-10582. [9] HAMMING R W.Error detecting and error correcting codes[J].The Bell System Technical Journal,1950,29(2):147-160. [10] GU X,LI M,CAO Y,et al.Supporting both range queries and frequency estimation with local differential privacy[C]//2019 IEEE Conference on Communications and Network Security (CNS).IEEE,2019:124-132. [11] WARNER S L.Randomized response:A survey technique foreliminating evasive answer bias[J].Journal of the American Statistical Association,1965,60(309):63-69. [12] ERLINGSSON Ú,PIHUR V,KOROLOVA A.Rappor:Randomized aggregatable privacy-preserving ordinal response[C]//Proceedings of the 2014 ACM SIGSAC Conference on Compu-ter and Communications Security.2014:1054-1067. [13] BLOOM B H.Space/Time trade-offs in hash coding with allowable errors[J].Communications of the ACM,1970,13(7):422-426. [14] BASSILY R,SMITH A.Local,private,efficient protocols forsuccinct histograms[C]//Proc.of the 47th Annual ACM on Symp.on Theory of Computing.ACM,2015:127-135. [15] DUCHI J C,JORDAN M I,WAINWRIGHT M J.Local privacy,data processing inequalities,and statistical minimax rates[J].arXiv:1302.3203,2013. [16] WAINWRIGHT M J,JORDAN M I,DUCHI J C.Privacy aware learning[C]//Advances in Neural Information Processing Systems.2012:1430-1438. [17] NGUYÊN T T,XIAO X,YANG Y,et al.Collecting and analyzing data from smart device users with local differential privacy[J].arXiv:1606.05053,2016. [18] BLUM A,DWORK C,MCSHERRY F,et al.Practical privacy:the SuLQ framework[C]//Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.2005:128-138. [19] REN J,XIONG J,YAO Z,et al.DPLK-means:A novel Differential Privacy K-means Mechanism[C]//2017 IEEE Second International Conference on Data Science in Cyberspace (DSC).IEEE,2017:133-139. [20] FU Y M,LI Z Z.Research on k-means++ Clustering Algorithm Based on Laplace Mechanism for Differential Privacy Protection[J].Netinfo Security,2019,19(2):43-52. [21] HU C,YANG G,BAI Y L.Clustering Algorithm in Differential Privacy Preserving[J].Computer Science,2019,46(2):120-126. [22] XIA C,HUA J,TONG W,et al.Distributed K-Means clustering guaranteeing local differential privacy[J].Computers & Security,2020,90:1-11. [23] NGUYEN H H.Privacy-preserving mechanisms for k-modesclustering[J].Computers & Security,2018,78:60-75. [24] LYU Z,WANG L,GUAN Z,et al.An optimizing and differentially private clustering algorithm for mixed data in SDN-based smart grid[J].IEEE Access,2019,7:45773-45782. [25] WANG T,BLOCKI J,LI N,et al.Locally differentially private protocols for frequency estimation[C]//26th Security Sympo-sium (Security 17).2017:729-745. [26] NEWEY K W,MCFADDEN D.Large sample estimation andhypothesis[J].Handbook of Econometrics,1994,4:2111-2245. [27] NISSIM K,RASKHODNIKOVA S,SMITH A.Smooth sensitivity and sampling in private data analysis[C]//Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing.2007:75-84. [28] JIANG H,YI S,LI J,et al.Ant clustering algorithm with K-harmonic means clustering[J].Expert Systems with Applications,2010,37(12):8679-8684. |
[1] | 邓丽, 武金达, 李科学, 卢亚康. 基于TPE的SpaRC算法超参数优化方法[J]. 计算机科学, 2021, 48(2): 70-75. |
[2] | 王晓飞, 周超, 刘利刚. 基于特征聚类的轻量级图像搜索系统[J]. 计算机科学, 2021, 48(2): 148-152. |
[3] | 郭上铜, 王瑞锦, 张凤荔. 区块链技术原理与应用综述[J]. 计算机科学, 2021, 48(2): 271-281. |
[4] | 余雪勇, 陈涛. 边缘计算场景中基于虚拟映射的隐私保护卸载算法[J]. 计算机科学, 2021, 48(1): 65-71. |
[5] | 梁伟, 段晓东, 徐健锋. 基于差异性度量的基础聚类三支过滤算法[J]. 计算机科学, 2021, 48(1): 136-144. |
[6] | 张煜, 陆亿红, 黄德才. 基于密度峰值的加权犹豫模糊聚类算法[J]. 计算机科学, 2021, 48(1): 145-151. |
[7] | 李彦, 申德荣, 聂铁铮, 寇月. 面向加密云数据的多关键字语义搜索方法[J]. 计算机科学, 2020, 47(9): 318-323. |
[8] | 杨帆, 王俊斌, 白亮. 基于安全性的成对约束扩充算法[J]. 计算机科学, 2020, 47(9): 324-329. |
[9] | 徐守坤, 倪楚涵, 吉晨晨, 李宁. 基于YOLOv3的施工场景安全帽佩戴的图像描述[J]. 计算机科学, 2020, 47(8): 233-240. |
[10] | 陈庆超, 王韬, 冯文博, 尹世庄, 刘丽君. 基于最长连续间隔的未知二进制协议格式推断[J]. 计算机科学, 2020, 47(8): 313-318. |
[11] | 曹素娥, 杨泽民. 基于聚类分析算法和优化支持向量机的无线网络流量预测[J]. 计算机科学, 2020, 47(8): 319-322. |
[12] | 高方远, 王秀美. 一种基于块对角表示和近邻约束的子空间聚类方法[J]. 计算机科学, 2020, 47(7): 66-70. |
[13] | 李向利, 贾梦雪. 基于预处理的超图非负矩阵分解算法[J]. 计算机科学, 2020, 47(7): 71-77. |
[14] | 高玉潼, 雷为民, 原玥. 复杂环境下基于聚类分析的人脸目标识别[J]. 计算机科学, 2020, 47(7): 111-117. |
[15] | 张衡, 马明栋, 王得玉. 基于聚类网络的文本-视频特征学习[J]. 计算机科学, 2020, 47(7): 125-129. |
|