Computer Science (计算机科学), 2018, Vol. 45, Issue 6: 36-40. doi: 10.11896/j.issn.1002-137X.2018.06.006
14th National Conference on Web Information Systems and Applications
CUI Yi-hui (崔一辉), SONG Wei (宋伟), PENG Zhi-yong (彭智勇), YANG Xian-di (杨先娣)
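Abstract: With the arrival of the big-data era, mining the latent value of big data has drawn growing attention from both academia and industry. At the same time, because Internet security incidents are frequent, users are increasingly concerned about leakage of their private data, and the security of user data has become one of the primary obstacles to big-data analysis. Existing research on user-data security focuses mostly on access control, ciphertext retrieval, and result verification; these techniques protect the data itself but cannot extract the latent value of the protected data. How to protect users' data while still mining its latent value is a key problem that urgently needs to be solved. This paper proposes an association rule mining method based on differential privacy: the data owner uses the Laplace mechanism and the exponential mechanism to protect user data during data publishing, and the data analyst mines association rules on a differentially private FP-tree. The security assumption is that an attacker who holds background knowledge of every tuple except the target's still cannot obtain information about the target, so the method offers a very high level of security. The proposed method balances security, performance, and accuracy: at the cost of some precision, it substantially improves both the security of user data and processing performance. Experimental results show that the loss of precision is within an acceptable range and that the method outperforms existing algorithms in performance.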
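The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not describe how the differentially private FP-tree is built, so the code only shows the generic building blocks. The function names (`laplace_noise`, `noisy_item_counts`, `exponential_mechanism`), the default sensitivity of 1, and the use of item support counts as the utility score are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_item_counts(transactions, epsilon, sensitivity=1.0, rng=None):
    """Laplace mechanism applied to per-item support counts.

    Assumption: each count is treated as a query with the given sensitivity
    (1 by default); splitting the privacy budget across items is not modelled.
    """
    rng = rng or random.Random()
    # Count each item at most once per transaction (support count).
    counts = Counter(item for t in transactions for item in set(t))
    scale = sensitivity / epsilon
    return {item: c + laplace_noise(scale, rng) for item, c in counts.items()}


def exponential_mechanism(scores, epsilon, sensitivity=1.0, rng=None):
    """Pick one key with probability proportional to exp(eps * score / (2 * sensitivity))."""
    rng = rng or random.Random()
    items = list(scores)
    top = max(scores.values())
    # Subtracting the maximum score keeps exp() from overflowing;
    # it does not change the selection probabilities.
    weights = [
        math.exp(epsilon * (scores[i] - top) / (2.0 * sensitivity)) for i in items
    ]
    return rng.choices(items, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical toy transactions, not data from the paper.
    tx = [["bread", "milk"], ["bread", "beer"], ["bread", "milk", "beer"]]
    print(noisy_item_counts(tx, epsilon=1.0))
    print(exponential_mechanism({"bread": 3, "milk": 2, "beer": 2}, epsilon=1.0))
```

A smaller `epsilon` gives a larger noise scale and a flatter selection distribution, which is the precision-for-privacy trade-off the abstract describes.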