计算机科学 ›› 2018, Vol. 45 ›› Issue (6): 36-40.doi: 10.11896/j.issn.1002-137X.2018.06.006

• 第十四届全国Web信息系统及其应用学术会议 • 上一篇    下一篇

基于差分隐私的多源数据关联规则挖掘方法

崔一辉, 宋伟, 彭智勇, 杨先娣   

  1. 武汉大学计算机学院 武汉430072
  • 收稿日期:2017-03-11 出版日期:2018-06-15 发布日期:2018-07-24
  • 作者简介:崔一辉(1981-),男,博士生,主要研究方向为可信数据管理;宋 伟(1978-),男,博士,副教授,主要研究方向为可信数据管理,E-mail:songwei@whu.edu.cn(通信作者);彭智勇(1963-),男,博士,教授,主要研究方向为海量数据管理;杨先娣(1974-),女,博士,副教授,主要研究方向为社区数据管理
  • 基金资助:
    本文受国家自然科学基金(61232002,61572378,61202034),湖南省自然科学基金面上项目(2017CFB420),CCF中文信息技术开放课题基金(CCF2014-01-02),武汉市创新团队项目(2014070504020237),武汉大学自主科研项目(2042016gf0020,2016-2017)资助

Mining Method of Association Rules Based on Differential Privacy

CUI Yi-hui, SONG Wei, PENG Zhi-yong, YANG Xian-di   

  1. Computer School,Wuhan University,Wuhan 430072,China
  • Received:2017-03-11 Online:2018-06-15 Published:2018-07-24

摘要: 随着大数据时代的到来,挖掘大数据的潜在价值越来越受到学术界和工业界的关注。但与此同时,由于互联网安全事件频发,用户越来越多地关注个人隐私数据的泄露问题,用户数据的安全问题成为阻碍大数据分析的首要问题之一。关于用户数据的安全性问题,现有研究更多地关注访问控制、密文检索和结果验证,虽然可以保证用户数据本身的安全性,但是无法挖掘出所保护数据的潜在价值。如何既能保护用户的数据安全又能挖掘数据的潜在价值,是亟需解决的关键问题之一。文中提出了一种基于差分隐私保护的关联规则挖掘方法,数据拥有者使用拉普拉斯机制和指数机制在数据发布的过程中对用户数据进行保护,数据分析者在差分隐私的FP-tree上进行关联规则挖掘。其中的安全性假设是:攻击者即使掌握了除攻击目标以外的所有元组数据信息的背景知识,仍旧无法获得攻击目标的信息,因此具有极高的安全性。所提方法是兼顾安全性、性能和准确性,以牺牲部分精确率为代价,大幅增加了用户数据的安全性和处理性能。实验结果表明,所提方法的精确性损失在可接受的范围内,性能优于已有算法的性能。

关键词: 差分隐私, 拉普拉斯机制, 隐私保护的数据挖掘, 指数机制

Abstract: With the advent of the era big data,the potential value of mining big data has attracted more and more attention from academia and industry.However,at the same time,due to frequent Internet security incidents,users are increasingly concerned about the disclosure of personal privacy data,and user data security issues become one of the most important obstacles to big data analysis.With regard to the study of user data security,the existing researches more focus on access control,ciphertext retrieval and result verification.The above researches can guarantee the security of user data itself,but can not dig out the potential value of protected data.Therefore,how to protect the security and dig the potential value of the data in the meantime is one of the key issues that need to be addressed.This paper proposed an association rules mining method based on differential privacy protection.Data owners use Laplacian mechanism and exponential mechanism to protect user data during data release.Data analysis is associated with differential privacy FP-tree Rule mining.The experimental results show that the performance and accuracy of the proposed method are superior to the existing methods.

Key words: Differential privacy, Exponential mechanism, Laplace mechanism, Privacy preserving data mining

中图分类号: 

  • TP311
[1]AGRAWAL R,SRIKANT R.Privacy-preserving data mining[C]//ACM Sigmod International Conference on Management of Data.ACM,2000:439-450.
[2]CHANDRAMOULI B,GOLDSTEIN J,QUAMAR A.Scalable progressive analytics on big data in the cloud[J].Proceedings of the VLDB Endowment,2013,6(14):1726-1737.
[3]CHANDRAMOULI B,GOLDSTEIN J,DUANS.Temporal analytics on big data for web advertising[C]//International Confe-rence on Data Engineering.IEEE Computer Socieyt,2013:90-101.
[4]LI B,MAZUR E,DIAO Y,et al.A platform for scalable one-pass analytics using mapreduce[C]//ACM SIGMOD International Conference on Management of Data.ACM,2011:985-996.
[5]JOHNSON A,SHMATIKOV V.Privacy-preserving data exploration in genome-wide association studies[C]//Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2013:1079-1087.
[6]BONOMI L,XIONG L.Mining frequent patterns with differential privacy[J].Proceedings of the VLDB Endowment,2013,6(12):1422-1427.
[7]XU S,SU S,CHENG X,et al.Differentially private frequent sequence mining via sampling-based candidate pruning[C]//2015 IEEE 31st International Conference on Data Engineering (ICDE).IEEE,2015:1035-1046.
[8]LI N,QARDAJI W,SU D,et al.Privbasis:Frequent itemset mining with differential privacy[J].Proceedings of the VLDB Endowment,2012,5(11):1340-1351.
[9]ZENG C,NAUGHTON J F,CAI J Y.On differentially private frequent itemsetmining[J].Proceedings of the VLDB Endowment,2012,6(1):25-36.
[10]BHASKAR R,LAXMAN S,SMITH A,et al.Discovering frequent patterns in sensitive data[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Disco-very and Data Mining.ACM,2010:503-512.
[11]WONG K S,KIM M H.Privacy-preserving frequent itemsets mining via secure collaborative framework[J].Security and Communication Networks,2012,5(3):263-272.
[12]NANAVATI N R,JINWALA D C.A novel privacy‐preserving scheme for collaborative frequent itemset mining across vertically partitioned data[J].Security and Communication Networks,2015,8(18):4407-4420.
[13]DWORK C,ROTH A.The algorithmic foundations of differential privacy[J].Foundations and Trends in Theoretical Compu-ter Science,2014,9(3/4):211-407.
[14]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]//Theory of Cryptography Conference.Springer Berlin Heidelberg,2006:265-284.
[15]GIANNOTTI F,LAKSHMANAN L V S,MONREALE A,et al.Privacy-preserving mining of association rules from outsourced transaction databases[J].IEEE Systems Journal,2013,7(3):385-395.
[16]MCSHERRY F D.Privacy integrated queries:an extensible platform for privacy-preserving data analysis[C]//Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data.ACM,2009:19-30.
[17]ROY I,SETTY S T V,KILZER A,et al.Airavat:Security and Privacy for MapReduce[C]//Usenix Symposium on Networked Systems Design and Implementation(NSDI 2010).San Jose,CA,USA,2010:297-312.
[18]HAN J,PEI J,YIN Y.Mining frequent patterns without candidate generation[C]//ACM SIGMOD International Conference on Management of data.ACM,2000:1-12.
[19]XIONG P,ZHU T Q,WANG X F.A survey on differential privacy and applications[J].Chinese Journal of Computers,2014,37(1):101-122.(in Chinese)
熊平,朱天清,王晓峰.差分隐私保护及其应用[J].计算机学报,2014,37(1):101-122.
[20]BLAKE C L,MERZ C J.UCI Repository of machine learning databases [OL].http://www.ics.uci.edu/~mlearn/MLReposi-tory.html.
[1] 汤凌韬, 王迪, 张鲁飞, 刘盛云.
基于安全多方计算和差分隐私的联邦学习方案
Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy
计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108
[2] 黄觉, 周春来.
基于本地化差分隐私的频率特征提取
Frequency Feature Extraction Based on Localized Differential Privacy
计算机科学, 2022, 49(7): 350-356. https://doi.org/10.11896/jsjkx.210900229
[3] 王美珊, 姚兰, 高福祥, 徐军灿.
面向医疗集值数据的差分隐私保护技术研究
Study on Differential Privacy Protection for Medical Set-Valued Data
计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032
[4] 孔钰婷, 谭富祥, 赵鑫, 张正航, 白璐, 钱育蓉.
基于差分隐私的K-means算法优化研究综述
Review of K-means Algorithm Optimization Based on Differential Privacy
计算机科学, 2022, 49(2): 162-173. https://doi.org/10.11896/jsjkx.201200008
[5] 董晓梅, 王蕊, 邹欣开.
面向推荐应用的差分隐私方案综述
Survey on Privacy Protection Solutions for Recommended Applications
计算机科学, 2021, 48(9): 21-35. https://doi.org/10.11896/jsjkx.201100083
[6] 孙林, 平国楼, 叶晓俊.
基于本地化差分隐私的键值数据关联分析
Correlation Analysis for Key-Value Data with Local Differential Privacy
计算机科学, 2021, 48(8): 278-283. https://doi.org/10.11896/jsjkx.201200122
[7] 张学军, 杨昊英, 李桢, 何福存, 盖继扬, 鲍俊达.
融合语义位置的差分私有位置隐私保护方法
Differentially Private Location Privacy-preserving Scheme withSemantic Location
计算机科学, 2021, 48(8): 300-308. https://doi.org/10.11896/jsjkx.200900198
[8] 陈天荣, 凌捷.
基于特征映射的差分隐私保护机器学习方法
Differential Privacy Protection Machine Learning Method Based on Features Mapping
计算机科学, 2021, 48(7): 33-39. https://doi.org/10.11896/jsjkx.201200224
[9] 王乐业.
群智感知中的地理位置本地化差分隐私机制:现状与机遇
Geographic Local Differential Privacy in Crowdsensing:Current States and Future Opportunities
计算机科学, 2021, 48(6): 301-305. https://doi.org/10.11896/jsjkx.201200223
[10] 彭春春, 陈燕俐, 荀艳梅.
支持本地化差分隐私保护的k-modes聚类方法
k-modes Clustering Guaranteeing Local Differential Privacy
计算机科学, 2021, 48(2): 105-113. https://doi.org/10.11896/jsjkx.200700172
[11] 王毛妮, 彭长根, 何文竹, 丁兴, 丁红发.
基于图论与互信息量的差分隐私度量模型
Privacy Metric Model of Differential Privacy via Graph Theory and Mutual Information
计算机科学, 2020, 47(4): 270-277. https://doi.org/10.11896/jsjkx.190400098
[12] 吴英杰, 黄鑫, 葛晨, 孙岚.
差分隐私流数据实时发布中的自适应参数优化
Adaptive Parameter Optimization for Real-time Differential Privacy Streaming Data Publication
计算机科学, 2019, 46(9): 99-105. https://doi.org/10.11896/j.issn.1002-137X.2019.09.013
[13] 李兰, 杨晨, 王安福.
差分隐私模型中隐私参数ε的选取研究
Study on Selection of Privacy Parameters ε in Differential Privacy Model
计算机科学, 2019, 46(8): 201-205. https://doi.org/10.11896/j.issn.1002-137X.2019.08.033
[14] 胡闯, 杨庚, 白云璐.
面向差分隐私保护的聚类算法
Clustering Algorithm in Differential Privacy Preserving
计算机科学, 2019, 46(2): 120-126. https://doi.org/10.11896/j.issn.1002-137X.2019.02.019
[15] 李森有, 季新生, 游伟, 赵星.
一种基于差分隐私的数据查询分级控制策略
Hierarchical Control Strategy for Data Querying Based on Differential Privacy
计算机科学, 2019, 46(11): 130-136. https://doi.org/10.11896/jsjkx.180901690
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!