计算机科学 ›› 2019, Vol. 46 ›› Issue (9): 99-105.doi: 10.11896/j.issn.1002-137X.2019.09.013

• 第35届中国数据库学术会议 • 上一篇    下一篇

差分隐私流数据实时发布中的自适应参数优化

吴英杰, 黄鑫, 葛晨, 孙岚   

  1. (福州大学数学与计算机科学学院 福州350116)
  • 收稿日期:2018-07-13 出版日期:2019-09-15 发布日期:2019-09-02
  • 通讯作者: 孙 岚(1978-),女,硕士,讲师,主要研究方向为数据安全与隐私保护,E-mail:15009466@qq.com
  • 作者简介:吴英杰(1979-),男,博士,教授,CCF会员,主要研究方向为数据挖掘、数据安全与隐私保护,E-mail:yjwu@fzu.edu.cn;黄 鑫(1993-),男,硕士,主要研究方向为差分隐私;葛 晨(1992-),男,硕士,主要研究方向为差分隐私;
  • 基金资助:
    国家自然科学基金项目(61300026),福建省自然科学基金项目(2017J01754,2018J01797)

Adaptive Parameter Optimization for Real-time Differential Privacy Streaming Data Publication

WU Ying-jie, HUANG Xin, GE Chen, SUN Lan   

  1. (College of Mathematics and Computer Science,Fuzhou University,Fuzhou 350116,China)
  • Received:2018-07-13 Online:2019-09-15 Published:2019-09-02

摘要: 当前许多实际应用需要持续地对流数据的区间统计查询做出实时响应,并使用差分隐私保护模型来应对信息发布过程中的敏感数据泄露问题。现有研究采用树状数组作为组织和存储流数据的数据结构,以满足信息发布的实时性要求。然而,现有方法中的相关参数为预先确定的,并不能很好地适应查询的动态变化。为此,文中提出在流数据实时发布的框架上,引入历史查询信息,以实现发布过程中树高参数的动态调整。首先,使用移动平均法分析历史查询记录,并预测后续的查询范围分布;继而针对预测结果,通过理论推导,得出使得期望误差最小的树高;最终实现差分隐私流数据实时发布中树高参数的自适应优化。实验结果表明,该方法在保证了时间效率的同时,有效地提高了发布结果的精度。

关键词: 差分隐私, 历史查询, 流数据发布, 自适应参数优化

Abstract: Recently,many practical applications need to continuously respond to range queries over streaming data in real-time,and adopt the differential privacy protection to deal with the disclosure of sensitive data in the information publication process.Existing research adopts Fenwick tree as data structure to organize and store data items in the stream to satisfy the real-time requirement in the information publication process.However,parameters in the previous method are predefined,which cannot adapt dynamic changes of the queries well.To solve this problem,based on the framework of real-time differential privacy streaming data publication,this paper proposed a method introducing the historical query records to achieve the adaptive parameter optimization.Firstly,based on moving average method,the historical queries are analyzed to predict the subsequent queries.Then according to the prediction results,the optimistic height of the tree is calculated theoretically,which minimizes the expected error.Finally,the adaptive parameter optimization is achieved in real-time differential privacy streaming data publication.The experimental results show that this method can significantly improve the query accuracy while guaranteeing the time efficiency.

Key words: Adaptive parameter optimization, Differential privacy, Historical queries, Streaming data publication

中图分类号: 

  • TP391
[1]FUNG B C M,WANG K,CHEN R,et al.Privacy-preserving data publishing:A survey of recent developments[J].Acm Computing Surveys,2010,42(4):14.
[2]DWORK C.Differential Privacy:A Survey of Results[C]//International Conference on Theory and Applications of MODELS of Computation.Springer-Verlag,2008:1-19.
[3]ZHANG X J,MENG X F.Differential privacy in data publication and analysis[J].Chinese Journal of Computers,2014,37(4):927-949.(in Chinese)张啸剑,孟小峰.面向数据发布和分析的差分隐私保护[J].计算机学报,2014,37(4):927-949.
[4]DWORK C,NAOR M,PITASSI T,et al.Differential privacyunder continual observation///Proc. of the 42nd ACM Symposium on Theory of Computing(STOC).New York:ACM,2010:715-724.
[5]CHAN T H,SHI E,SONG D.Private and Continual Release of Statistics[J].ACM Trans. on Information and System Security,2011,14(3):1-24.
[6]GE C,WU Y J,SUN L.A Real-time Publishing Method of Differential Privacy Streaming Data[OL].http://kns.cnki.net/kcms/detail/11.5602.TP.20171016.1629.004.html.(in Chinese)葛晨,吴英杰,孙岚.一种差分隐私流数据实时发布方法[OL].http://kns.cnki.net/kcms/detail/11.5602.TP.20171016.1629.004.html.
[7]ZHANG X J,MENG X F.Stream histogram publication methodwith differential privacy[J].Journal of Software,2016,27(2):381-393.(in Chinese)张啸剑,孟小峰.基于差分隐私的流式直方图发布方法[J].软件学报,2016,27(2):381-393.
[8]BOLOT J,FAWAZ N,MUTHUKRISHNAN S,et al.Privatedecayed predicate sums on streams[C]//Proceedings of the 16th International Conference on Extending Database Technology.New York:ACM,2013:284-295.
[9]CHEN Y,MACHANAVAJJHALA A,HAY M,et al.PeGa-Sus:Data-Adaptive Differentially Private Stream Processing[C]//ACM Sigsac Conference.ACM,2017:1375-1388.
[10]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating Noise to Sensitivity in Private Data Analysis[J].Lecture Notes in Computer Science,2012,3876(8):265-284.
[11]FENWICK P M.A new data structure for cumulative frequency tables[J].Software Practice & Experience,1994,24(3):327-336.
[12]LI C,HAY M,RASTOGI V,et al.Optimizing linear countingqueries under differential privacy[C]//Twenty-Ninth ACM Sigmod-Sigact-Sigart Symposium on Principles of Database Systems(PODS 2010).Indianapolis,Indiana,USA,DBLP,2010:123-134.
[13]HAY M,RASTOGI V,MIKLAU G,et al.Boosting the accuracy of differentially private histograms through consistency[J].Proceedings of the Vldb Endowment,2010,3(1-2):1021-1032.
[14]KELLARIS G,PAPADOPOULOS S,XIAO X,et al.Differen-tially private event sequences over infinite streams[J].Procee-dings of the Vldb Endowment,2014,7(12):1155-1166.
[1] 汤凌韬, 王迪, 张鲁飞, 刘盛云.
基于安全多方计算和差分隐私的联邦学习方案
Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy
计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108
[2] 黄觉, 周春来.
基于本地化差分隐私的频率特征提取
Frequency Feature Extraction Based on Localized Differential Privacy
计算机科学, 2022, 49(7): 350-356. https://doi.org/10.11896/jsjkx.210900229
[3] 王美珊, 姚兰, 高福祥, 徐军灿.
面向医疗集值数据的差分隐私保护技术研究
Study on Differential Privacy Protection for Medical Set-Valued Data
计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032
[4] 孔钰婷, 谭富祥, 赵鑫, 张正航, 白璐, 钱育蓉.
基于差分隐私的K-means算法优化研究综述
Review of K-means Algorithm Optimization Based on Differential Privacy
计算机科学, 2022, 49(2): 162-173. https://doi.org/10.11896/jsjkx.201200008
[5] 董晓梅, 王蕊, 邹欣开.
面向推荐应用的差分隐私方案综述
Survey on Privacy Protection Solutions for Recommended Applications
计算机科学, 2021, 48(9): 21-35. https://doi.org/10.11896/jsjkx.201100083
[6] 孙林, 平国楼, 叶晓俊.
基于本地化差分隐私的键值数据关联分析
Correlation Analysis for Key-Value Data with Local Differential Privacy
计算机科学, 2021, 48(8): 278-283. https://doi.org/10.11896/jsjkx.201200122
[7] 张学军, 杨昊英, 李桢, 何福存, 盖继扬, 鲍俊达.
融合语义位置的差分私有位置隐私保护方法
Differentially Private Location Privacy-preserving Scheme withSemantic Location
计算机科学, 2021, 48(8): 300-308. https://doi.org/10.11896/jsjkx.200900198
[8] 陈天荣, 凌捷.
基于特征映射的差分隐私保护机器学习方法
Differential Privacy Protection Machine Learning Method Based on Features Mapping
计算机科学, 2021, 48(7): 33-39. https://doi.org/10.11896/jsjkx.201200224
[9] 王乐业.
群智感知中的地理位置本地化差分隐私机制:现状与机遇
Geographic Local Differential Privacy in Crowdsensing:Current States and Future Opportunities
计算机科学, 2021, 48(6): 301-305. https://doi.org/10.11896/jsjkx.201200223
[10] 彭春春, 陈燕俐, 荀艳梅.
支持本地化差分隐私保护的k-modes聚类方法
k-modes Clustering Guaranteeing Local Differential Privacy
计算机科学, 2021, 48(2): 105-113. https://doi.org/10.11896/jsjkx.200700172
[11] 王毛妮, 彭长根, 何文竹, 丁兴, 丁红发.
基于图论与互信息量的差分隐私度量模型
Privacy Metric Model of Differential Privacy via Graph Theory and Mutual Information
计算机科学, 2020, 47(4): 270-277. https://doi.org/10.11896/jsjkx.190400098
[12] 李兰, 杨晨, 王安福.
差分隐私模型中隐私参数ε的选取研究
Study on Selection of Privacy Parameters ε in Differential Privacy Model
计算机科学, 2019, 46(8): 201-205. https://doi.org/10.11896/j.issn.1002-137X.2019.08.033
[13] 胡闯, 杨庚, 白云璐.
面向差分隐私保护的聚类算法
Clustering Algorithm in Differential Privacy Preserving
计算机科学, 2019, 46(2): 120-126. https://doi.org/10.11896/j.issn.1002-137X.2019.02.019
[14] 李森有, 季新生, 游伟, 赵星.
一种基于差分隐私的数据查询分级控制策略
Hierarchical Control Strategy for Data Querying Based on Differential Privacy
计算机科学, 2019, 46(11): 130-136. https://doi.org/10.11896/jsjkx.180901690
[15] 崔一辉, 宋伟, 彭智勇, 杨先娣.
基于差分隐私的多源数据关联规则挖掘方法
Mining Method of Association Rules Based on Differential Privacy
计算机科学, 2018, 45(6): 36-40. https://doi.org/10.11896/j.issn.1002-137X.2018.06.006
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!