计算机科学 ›› 2021, Vol. 48 ›› Issue (8): 278-283.doi: 10.11896/jsjkx.201200122
孙林, 平国楼, 叶晓俊
SUN Lin, PING Guo-lou, YE Xiao-jun
摘要: 在群智感知系统中,从分布式数据源中持续收集和分析数据可以为先进的数据挖掘模型提供决策支持。由于数据中可能包含个人相关的信息,数据的采集和分析过程中通常伴随着隐私泄露的风险。本地化差分隐私作为先进的隐私保护方案可在用户的隐私性和数据的可用性之间提供较好的权衡。当前,键值数据作为异构类型数据,其同时含有分类数据和数值数据,基于本地化差分隐私在多维度下对键值数据进行关联分析面临着一定的挑战。针对隐私保护前提下键值数据的发布和关联分析问题,首先定义了键值数据的频率关联和均值关联问题,然后提出了适用于键值对的索引独热编码,为键值数据提供本地化差分隐私保护,最后在扰动的数据上对键值数据进行关联分析。基于仿真数据集和真实数据集的实验和理论分析验证了所提方案的有效性。
中图分类号:
[1]YANG G M,YANG J,ZHANG J P.Research on Data Publi-shing of Privacy Preserving[J].Computer Science,2011(9):17-23. [2]WANG B,YANG J.Research on Anonymity Technique forPersonalization Privacy-preserving Data Publishing[J].Compu-ter Science,2012,39(4):168-171. [3]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]//Theory of Cryptography Conference.Heidelberg:Springer,2006:265-284. [4]ABADI M,CHU A,GOODFELLOW I,et al.Deep learningwith differential privacy[C]//Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.Vienna:ACM,2016:308-318. [5]PHAN N H,WANG Y,WU X,et al.Differential privacy preservation for deep auto-encoders:an application of human behavior prediction[C]//Proceedings of the AAAI Conference on Artificial Intelligence.Arizona:AAAI Press,2016:1309-1315. [6]CHEN R,XIAO Q,ZHANG Y,et al.Differentially privatehigh-dimensional data publication via sampling-based inference[C]//Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Sydeny:ACM,2015:129-138. [7]DUCHI J C,JORDAN M I,WAINWRIGHT M J.Local privacy and statistical minimax rates[C]//2013 IEEE 54th Annual Symposium on Foundations of Computer Science.Washington:IEEE,2013:429-438. [8]BLOOM B H.Space/time trade-offs in hash coding with allowa-ble errors[J].Communications of the ACM,1970,13(7):422-426. [9]ERLINGSSON U,PIHUR V,KOROLOVA A.RAPPOR:Randomized aggregatable privacy-preserving ordinal response[C]//Proceedings of the 2014 ACM SIGSAC Conference on Compu-ter and Communications Security.Arizona:ACM,2014:1054-1067. [10]NGUYEN T T,XIAO X X,YANG Y,et al.Collecting and Ana-lyzing Data from Smart Device Users with Local Differential Privacy[EB/OL].https://arxiv.org/abs/1606.05053. [11]WANG N,XIAO X K,YANG Y,et al.Collecting and Analyzing Multidimensional Data with Local Differential Privacy[C]//IEEE 35th International Conference on Data Engineering.Macau:IEEE Press,2019:638-649. [12]QIN Z,YANG Y,YU T,et al.Heavy hitter estimation over set-valued data with local differential privacy[C]//Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.Vienna:ACM,2016:192-203. [13]SWANG T H,LI N H,JHA S.Locally differentially private frequent itemset mining[C]//2018 IEEE Symposium on Security and Privacy.San Francisco:IEEE Press,2018:127-143. [14]BASSILY R,NISSIM K,STEMMER U,et al.Practical locally private heavy hitters[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.New York:Curran Associates Inc,2017:2285-2293. [15]CORMODE G,KULKARNI T,SRIVASTAVA D.Marginalrelease under local differential privacy[C]//Proceedings of the 2018 International Conference on Management of Data.Houston:ACM,2018:131-146. [16]YE Q Q,HU H B,MENG X F.PrivKV:Key-Value Data Collection with Local Differential Privacy[C]//2019 IEEE Symposium on Security and Privacy (SP).San Francisco:IEEE Press,2019:317-331. [17]GU X L,LI M,CHENG Y Q,et al.PCKV:Locally Differentially Private Correlated Key-Value Data Collection with Optimized Utility[C]//29th USENIX Security Symposium.USENIX Association,2020:967-984. [18]ZHANG J,CORMODE G,PROCOPIUC C M,et al.Privbayes:Private data release via bayesian networks[J].ACM Transactions on Database Systems,2017,42(4):1-41. [19]BARAK B,CHAUDHURI K,DWORK C,et al.Privacy,accuracy,and consistency too:a holistic solution to contingency table release[C]//Proceedings of the Twenty-sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.New York:ACM,2007:273-282. [20]DING B L,KULKARNI J,YEKHANIN S.Collecting telemetry data privately[C]//Proceedings of the 31st International Conference on Neural Information Processing.California:ACM,2017:3574-3583. [21]DUCHI J C,JORDAN M I.Minimax Optimal Procedures forLocally Private Estimation[J].Journal of the American Statistical Association,2018,113(521):182-201. [22]WANG T H,BLOCKI J,LI N H,et al.Locally differentiallyprivate protocols for frequency estimation[C]//26th USENIX Security Symposium.Vancouver:USENIX Association,2017:729-745. |
[1] | 黄觉, 周春来. 基于本地化差分隐私的频率特征提取 Frequency Feature Extraction Based on Localized Differential Privacy 计算机科学, 2022, 49(7): 350-356. https://doi.org/10.11896/jsjkx.210900229 |
[2] | 李思颖, 徐杨, 王欣, 赵若成. 基于关联分析的铁路旅客同行预测方法 Railway Passenger Co-travel Prediction Based on Association Analysis 计算机科学, 2021, 48(9): 95-102. https://doi.org/10.11896/jsjkx.200700097 |
[3] | 孙明玮, 司维超, 董琪. 基于多维度数据的网络服务质量的综合评估研究 Research on Comprehensive Evaluation of Network Quality of Service Based on Multidimensional Data 计算机科学, 2021, 48(6A): 246-249. https://doi.org/10.11896/jsjkx.200900131 |
[4] | 彭春春, 陈燕俐, 荀艳梅. 支持本地化差分隐私保护的k-modes聚类方法 k-modes Clustering Guaranteeing Local Differential Privacy 计算机科学, 2021, 48(2): 105-113. https://doi.org/10.11896/jsjkx.200700172 |
[5] | 张琴, 陈红梅, 封云飞. 一种基于粗糙集和密度峰值的重叠社区发现方法 Overlapping Community Detection Method Based on Rough Sets and Density Peaks 计算机科学, 2020, 47(5): 72-78. https://doi.org/10.11896/jsjkx.190400160 |
[6] | 李刚, 王超, 韩德鹏, 刘强伟, 李莹. 基于深度主成分相关自编码器的多模态影像遗传数据研究 Study on Multimodal Image Genetic Data Based on Deep Principal Correlated Auto-encoders 计算机科学, 2020, 47(4): 60-66. https://doi.org/10.11896/jsjkx.190300073 |
[7] | 戴欢, 蒋敬敬, 束沁冬, 石鹏展, 史文华. 基于信道状态相位信息的生命体征监测算法 Vital Signs Monitoring Method Based on Channel State Phase Information 计算机科学, 2020, 47(10): 48-54. https://doi.org/10.11896/jsjkx.200500057 |
[8] | 王妍, 韩笑, 曾辉, 刘荆欣, 夏长清. 边缘计算环境下服务质量可信的任务迁移节点选择 Task Migration Node Selection with Reliable Service Quality in Edge Computing Environment 计算机科学, 2020, 47(10): 240-246. https://doi.org/10.11896/jsjkx.190900054 |
[9] | 鲁显光, 杜学绘, 王文娟. 基于改进FP growth的告警关联算法 Alert Correlation Algorithm Based on Improved FP Growth 计算机科学, 2019, 46(8): 64-70. https://doi.org/10.11896/j.issn.1002-137X.2019.08.010 |
[10] | 付泽强, 王晓锋, 孔军. 高性能网络安全告警信息的关联分析方法 High-performance Association Analysis Method for Network Security Alarm Information 计算机科学, 2019, 46(5): 116-121. https://doi.org/10.11896/j.issn.1002-137X.2019.05.018 |
[11] | 茹锋, 徐锦, 常琪, 阚丹会. 一种用于影像遗传学关联分析的高阶统计量结构化稀疏算法 High Order Statistics Structured Sparse Algorithm for Image Genetic Association Analysis 计算机科学, 2019, 46(4): 66-72. https://doi.org/10.11896/j.issn.1002-137X.2019.04.010 |
[12] | 李广璞, 黄妙华. 频繁项集挖掘的研究进展及主流方法 Research Progress and Mainstream Methods of Frequent Itemsets Mining 计算机科学, 2018, 45(11A): 1-11. |
[13] | 吴珺,王春枝. 面向大数据的多维粒矩阵关联分析及应用 Multiple Correlation Analysis and Application of Granular Matrix Based on Big Data 计算机科学, 2017, 44(Z11): 407-410. https://doi.org/10.11896/j.issn.1002-137X.2017.11A.086 |
[14] | 琚安康,郭渊博,朱泰铭,王通. 网络安全事件关联分析技术与工具研究 Survey on Network Security Event Correlation Analysis Methods and Tools 计算机科学, 2017, 44(2): 38-45. https://doi.org/10.11896/j.issn.1002-137X.2017.02.004 |
[15] | 胡文生,杨剑锋,赵明. 类设计质量评估方法的研究 Methodology for Classes Design Quality Assessment 计算机科学, 2017, 44(12): 150-155. https://doi.org/10.11896/j.issn.1002-137X.2017.12.029 |
|