计算机科学 ›› 2018, Vol. 45 ›› Issue (6): 36-40.doi: 10.11896/j.issn.1002-137X.2018.06.006

• 第十四届全国Web信息系统及其应用学术会议 • 上一篇    下一篇

基于差分隐私的多源数据关联规则挖掘方法

崔一辉, 宋伟, 彭智勇, 杨先娣   

  1. 武汉大学计算机学院 武汉430072
  • 收稿日期:2017-03-11 出版日期:2018-06-15 发布日期:2018-07-24
  • 作者简介:崔一辉(1981-),男,博士生,主要研究方向为可信数据管理;宋 伟(1978-),男,博士,副教授,主要研究方向为可信数据管理,E-mail:songwei@whu.edu.cn(通信作者);彭智勇(1963-),男,博士,教授,主要研究方向为海量数据管理;杨先娣(1974-),女,博士,副教授,主要研究方向为社区数据管理
  • 基金资助:
    本文受国家自然科学基金(61232002,61572378,61202034),湖南省自然科学基金面上项目(2017CFB420),CCF中文信息技术开放课题基金(CCF2014-01-02),武汉市创新团队项目(2014070504020237),武汉大学自主科研项目(2042016gf0020,2016-2017)资助

Mining Method of Association Rules Based on Differential Privacy

CUI Yi-hui, SONG Wei, PENG Zhi-yong, YANG Xian-di   

  1. Computer School,Wuhan University,Wuhan 430072,China
  • Received:2017-03-11 Online:2018-06-15 Published:2018-07-24

摘要: 随着大数据时代的到来,挖掘大数据的潜在价值越来越受到学术界和工业界的关注。但与此同时,由于互联网安全事件频发,用户越来越多地关注个人隐私数据的泄露问题,用户数据的安全问题成为阻碍大数据分析的首要问题之一。关于用户数据的安全性问题,现有研究更多地关注访问控制、密文检索和结果验证,虽然可以保证用户数据本身的安全性,但是无法挖掘出所保护数据的潜在价值。如何既能保护用户的数据安全又能挖掘数据的潜在价值,是亟需解决的关键问题之一。文中提出了一种基于差分隐私保护的关联规则挖掘方法,数据拥有者使用拉普拉斯机制和指数机制在数据发布的过程中对用户数据进行保护,数据分析者在差分隐私的FP-tree上进行关联规则挖掘。其中的安全性假设是:攻击者即使掌握了除攻击目标以外的所有元组数据信息的背景知识,仍旧无法获得攻击目标的信息,因此具有极高的安全性。所提方法是兼顾安全性、性能和准确性,以牺牲部分精确率为代价,大幅增加了用户数据的安全性和处理性能。实验结果表明,所提方法的精确性损失在可接受的范围内,性能优于已有算法的性能。

关键词: 隐私保护的数据挖掘, 差分隐私, 拉普拉斯机制, 指数机制

Abstract: With the advent of the era big data,the potential value of mining big data has attracted more and more attention from academia and industry.However,at the same time,due to frequent Internet security incidents,users are increasingly concerned about the disclosure of personal privacy data,and user data security issues become one of the most important obstacles to big data analysis.With regard to the study of user data security,the existing researches more focus on access control,ciphertext retrieval and result verification.The above researches can guarantee the security of user data itself,but can not dig out the potential value of protected data.Therefore,how to protect the security and dig the potential value of the data in the meantime is one of the key issues that need to be addressed.This paper proposed an association rules mining method based on differential privacy protection.Data owners use Laplacian mechanism and exponential mechanism to protect user data during data release.Data analysis is associated with differential privacy FP-tree Rule mining.The experimental results show that the performance and accuracy of the proposed method are superior to the existing methods.

Key words: Privacy preserving data mining, Differential privacy, Laplace mechanism, Exponential mechanism

中图分类号: 

  • TP311
[1]AGRAWAL R,SRIKANT R.Privacy-preserving data mining[C]//ACM Sigmod International Conference on Management of Data.ACM,2000:439-450.
[2]CHANDRAMOULI B,GOLDSTEIN J,QUAMAR A.Scalable progressive analytics on big data in the cloud[J].Proceedings of the VLDB Endowment,2013,6(14):1726-1737.
[3]CHANDRAMOULI B,GOLDSTEIN J,DUANS.Temporal analytics on big data for web advertising[C]//International Confe-rence on Data Engineering.IEEE Computer Socieyt,2013:90-101.
[4]LI B,MAZUR E,DIAO Y,et al.A platform for scalable one-pass analytics using mapreduce[C]//ACM SIGMOD International Conference on Management of Data.ACM,2011:985-996.
[5]JOHNSON A,SHMATIKOV V.Privacy-preserving data exploration in genome-wide association studies[C]//Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2013:1079-1087.
[6]BONOMI L,XIONG L.Mining frequent patterns with differential privacy[J].Proceedings of the VLDB Endowment,2013,6(12):1422-1427.
[7]XU S,SU S,CHENG X,et al.Differentially private frequent sequence mining via sampling-based candidate pruning[C]//2015 IEEE 31st International Conference on Data Engineering (ICDE).IEEE,2015:1035-1046.
[8]LI N,QARDAJI W,SU D,et al.Privbasis:Frequent itemset mining with differential privacy[J].Proceedings of the VLDB Endowment,2012,5(11):1340-1351.
[9]ZENG C,NAUGHTON J F,CAI J Y.On differentially private frequent itemsetmining[J].Proceedings of the VLDB Endowment,2012,6(1):25-36.
[10]BHASKAR R,LAXMAN S,SMITH A,et al.Discovering frequent patterns in sensitive data[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Disco-very and Data Mining.ACM,2010:503-512.
[11]WONG K S,KIM M H.Privacy-preserving frequent itemsets mining via secure collaborative framework[J].Security and Communication Networks,2012,5(3):263-272.
[12]NANAVATI N R,JINWALA D C.A novel privacy‐preserving scheme for collaborative frequent itemset mining across vertically partitioned data[J].Security and Communication Networks,2015,8(18):4407-4420.
[13]DWORK C,ROTH A.The algorithmic foundations of differential privacy[J].Foundations and Trends in Theoretical Compu-ter Science,2014,9(3/4):211-407.
[14]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]//Theory of Cryptography Conference.Springer Berlin Heidelberg,2006:265-284.
[15]GIANNOTTI F,LAKSHMANAN L V S,MONREALE A,et al.Privacy-preserving mining of association rules from outsourced transaction databases[J].IEEE Systems Journal,2013,7(3):385-395.
[16]MCSHERRY F D.Privacy integrated queries:an extensible platform for privacy-preserving data analysis[C]//Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data.ACM,2009:19-30.
[17]ROY I,SETTY S T V,KILZER A,et al.Airavat:Security and Privacy for MapReduce[C]//Usenix Symposium on Networked Systems Design and Implementation(NSDI 2010).San Jose,CA,USA,2010:297-312.
[18]HAN J,PEI J,YIN Y.Mining frequent patterns without candidate generation[C]//ACM SIGMOD International Conference on Management of data.ACM,2000:1-12.
[19]XIONG P,ZHU T Q,WANG X F.A survey on differential privacy and applications[J].Chinese Journal of Computers,2014,37(1):101-122.(in Chinese)
熊平,朱天清,王晓峰.差分隐私保护及其应用[J].计算机学报,2014,37(1):101-122.
[20]BLAKE C L,MERZ C J.UCI Repository of machine learning databases [OL].http://www.ics.uci.edu/~mlearn/MLReposi-tory.html.
[1] 董晓梅, 王蕊, 邹欣开. 面向推荐应用的差分隐私方案综述[J]. 计算机科学, 2021, 48(9): 21-35.
[2] 孙林, 平国楼, 叶晓俊. 基于本地化差分隐私的键值数据关联分析[J]. 计算机科学, 2021, 48(8): 278-283.
[3] 张学军, 杨昊英, 李桢, 何福存, 盖继扬, 鲍俊达. 融合语义位置的差分私有位置隐私保护方法[J]. 计算机科学, 2021, 48(8): 300-308.
[4] 陈天荣, 凌捷. 基于特征映射的差分隐私保护机器学习方法[J]. 计算机科学, 2021, 48(7): 33-39.
[5] 王乐业. 群智感知中的地理位置本地化差分隐私机制:现状与机遇[J]. 计算机科学, 2021, 48(6): 301-305.
[6] 彭春春, 陈燕俐, 荀艳梅. 支持本地化差分隐私保护的k-modes聚类方法[J]. 计算机科学, 2021, 48(2): 105-113.
[7] 王毛妮, 彭长根, 何文竹, 丁兴, 丁红发. 基于图论与互信息量的差分隐私度量模型[J]. 计算机科学, 2020, 47(4): 270-277.
[8] 吴英杰, 黄鑫, 葛晨, 孙岚. 差分隐私流数据实时发布中的自适应参数优化[J]. 计算机科学, 2019, 46(9): 99-105.
[9] 李兰, 杨晨, 王安福. 差分隐私模型中隐私参数ε的选取研究[J]. 计算机科学, 2019, 46(8): 201-205.
[10] 胡闯, 杨庚, 白云璐. 面向差分隐私保护的聚类算法[J]. 计算机科学, 2019, 46(2): 120-126.
[11] 李森有, 季新生, 游伟, 赵星. 一种基于差分隐私的数据查询分级控制策略[J]. 计算机科学, 2019, 46(11): 130-136.
[12] 彭慧丽,张啸剑,金凯忠. 基于差分隐私的社交推荐方法[J]. 计算机科学, 2017, 44(Z6): 395-398.
[13] 马银方,张琳. 基于差分隐私的LBS群组最近邻查询[J]. 计算机科学, 2017, 44(Z6): 336-341.
[14] 曹春萍,徐帮兵. 一种带隐私保护的基于标签的推荐算法研究[J]. 计算机科学, 2017, 44(8): 134-139.
[15] 沈思倩,毛宇光,江冠儒. 不完全数据集的差分隐私保护决策树研究[J]. 计算机科学, 2017, 44(6): 139-143.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 编辑部. 新网站开通,欢迎大家订阅![J]. 计算机科学, 2018, 1(1): 1 .
[2] 雷丽晖,王静. 可能性测度下的LTL模型检测并行化研究[J]. 计算机科学, 2018, 45(4): 71 -75 .
[3] 孙启,金燕,何琨,徐凌轩. 用于求解混合车辆路径问题的混合进化算法[J]. 计算机科学, 2018, 45(4): 76 -82 .
[4] 张佳男,肖鸣宇. 带权混合支配问题的近似算法研究[J]. 计算机科学, 2018, 45(4): 83 -88 .
[5] 伍建辉,黄中祥,李武,吴健辉,彭鑫,张生. 城市道路建设时序决策的鲁棒优化[J]. 计算机科学, 2018, 45(4): 89 -93 .
[6] 史雯隽,武继刚,罗裕春. 针对移动云计算任务迁移的快速高效调度算法[J]. 计算机科学, 2018, 45(4): 94 -99 .
[7] 周燕萍,业巧林. 基于L1-范数距离的最小二乘对支持向量机[J]. 计算机科学, 2018, 45(4): 100 -105 .
[8] 刘博艺,唐湘滟,程杰仁. 基于多生长时期模板匹配的玉米螟识别方法[J]. 计算机科学, 2018, 45(4): 106 -111 .
[9] 耿海军,施新刚,王之梁,尹霞,尹少平. 基于有向无环图的互联网域内节能路由算法[J]. 计算机科学, 2018, 45(4): 112 -116 .
[10] 崔琼,李建华,王宏,南明莉. 基于节点修复的网络化指挥信息系统弹性分析模型[J]. 计算机科学, 2018, 45(4): 117 -121 .