计算机科学 ›› 2022, Vol. 49 ›› Issue (10): 344-352.doi: 10.11896/jsjkx.210700242
王修君1,2,3, 莫磊1,3, 郑啸1,2,3, 高云全1,2,3
WANG Xiu-jun1,2,3, MO Lei1,3, ZHENG Xiao1,2,3, GAO Yun-quan1,2,3
摘要: 差分隐私技术作为一种有效的隐私保护机制,已被广泛应用在诸多领域。目前已有的静态数据集和动态数据集上的直方图发布方法在处理数据流滑动窗口模型时,往往只能够通过对数据直方图信息添加统一噪声的形式来实现数据保护,这导致了它们在实际应用中存在数据可用性低、时间复杂度高等问题。针对这些问题,文中通过将数据流近似计数技术综合到差分隐私保护算法中,进而提出了一种面向数据流滑动窗口模型的自适应直方图发布方法APS(Adaptive Histogram Publishing Method for Sliding Window)。APS算法首先利用数据流近似计数方法来预测下一时刻滑动窗口内数据的分布信息;然后通过比较估计值与真实值之间的差异来选取合适的发布值;最后对排序后的直方图区间进行聚类处理,并优化其桶内数据的误差。理论分析显示,APS算法能够在减少隐私预算的同时,有效地提高数据的可用性和缩短运行时间。在两种不同的真实数据集上的实验结果也验证了APS算法在数据可用性和运行时间上显著优于现有的基于分组的直方图发布算法。
中图分类号:
[1]FAN L,BONOMI L,XIONG L,et al.Monitoring web browsing behavior with differential privacy[C]//Proceedings of the 23rd International Conference on World Wide Web.2014:177-188. [2]CELA A,JURIK T,HAMOUCHE R,et al.Energy optimalreal-time navigation system[J].IEEE Intelligent Transportation Systems Magazine,2014,6(3):66-79. [3]ALMADANI B,SAEED B,ALROUBAIY A.Healthcare sys-tems integration using real time publish subscribe(RTPS) middleware[J].Computers & Electrical Engineering,2016,50(C):67-78. [4]DWORK C.Differential privacy[C]//Proceedings of the 33rdInternational Conference on Automata,Languages and Programming-Volume Part II.Springer-Verlag,2006:1-12. [5]LUO Z,WU D J,ADELI E,et al.Scalable Differential Privacy With Sparse Network Finetuning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:5059-5068. [6]YANG W,SUN Y E,HUANG H,et al.Persistent transportation traffic volume estimation with differential privacy[J].Journal of Ambient Intelligence and Humanized Computing,2021,12(1):213-231. [7]FICEK J,WANG W,CHEN H,et al.Differential privacy inhealth research:A scoping review[J].Journal of the American Medical Informatics Association,2021,28(10):2269-2276. [8]SUN Z,WANG Y,SHU M,et al.Differential privacy for data and model publishing of medical data[J].IEEE Access,2019,7:152103-152114. [9]FAN L,XIONG L,SUNDERAM V.Differentially private multi-dimensional time series release for traffic monitoring[C]//IFIP Annual Conference on Data and Applications Security and Privacy.Berlin:Springer,2013:33-48. [10]LIU J Q,ZHANG C,FANG Y.Epic:A differential privacyframework to defend smart homes against internet traffic analysis[J].IEEE Internet of Things Journal,2018,5(2):1206-1217. [11]LI H R,XIONG L,JIANG X Q.Differentially private histogramand synthetic data publication[M]//Medical Data Privacy Handbook.Cham:Springer,2015:35-58. [12]QU J J,CAI Y,XIA H K.A survey of differential privacy protection research for dynamic data release[J].Journal of Beijing University of Information Science and Technology(Natural Science Edition),2019,34(6):30-36. [13]XU J,ZHANG Z,XIAO X,et al.Differentially private histo-gram publication[J].The VLDB Journal,2013,22(6):797-822. [14]ACS G,CASTELLUCCIA C,CHEN R.Differentially privatehistogram publishing through lossy compression[C]//IEEE the 12th International Conference on Data Mining.IEEE,2012:1-10. [15]ZHANG X J,SHAO C,MENG X F.Accurate histogram release under differential privacy [J].Journal of Computer Research and Development,2016,53(5):1106-1117. [16]TANG H X,YANG G,BAI Y L.Histogram publishing algorithm based on adaptive differential privacy budget allocation strategy[J].Application Research of Computers,2020,37(7):1952-1957,1963. [17]CHAN T H H,SHI E,SONG D.Private and continual release of statistics[J].ACM Transactions on Information and System Security(TISSEC),2011,14(3):1-24. [18]RASTOGI V,NATH S.Differentially private aggregation ofdistributed time-series with transformation and encryption[C]//Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.2010:735-746. [19]LI H F,LEE S Y,SHAN M K.Online mining(recently) maximal frequent itemsets over data streams[C]//The 15th International Workshop on Research Issues in Data Engineering:Stream Data Mining and Applications(RIDE-SDMA'05).IEEE,2005:11-18. [20]NGUYEN H L,WOON Y K,NG W K.A survey on data stream clustering and classification[J].Knowledge and Information Systems,2015,45(3):535-569. [21]LIN F P,WU Y J,WANG Y L,et al.Differentially private statistical publication for two-dimensional data stream [J].Journal of Computer Applications,2015,35(1):88-92. [22]ZHANG X J,MENG X F.Streaming histogram publication me-thod with differential privacy[J].Journal of Software,2016,27(2):381-393. [23]LI Y,LI S.Research on Differential Private Streaming Histogram Publication Algorithm[C]//2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems(CCIS).IEEE,2018:598-603. [24]GAO R C,MA X B.Dynamic data histogram publishing basedon differential privacy[C]//2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications,Ubiquitous Computing & Communications,Big Data & Cloud Computing,Social Computing & Networking,Sustainable Computing & Communications(ISPA/IUCC/BDCloud/SocialCom/SustainCom).IEEE,2018:737-743. [25]WU X,TONG N,YE Z,et al.Histogram Publishing Algorithm Based on Sampling Sorting and Greedy Clustering[C]//International Conference on Blockchain and Trustworthy Systems.Springer,Singapore,2019:81-91. [26]SCOTT D W.Histogram[J].Wiley Interdisciplinary Reviews:Computational Statistics,2010,2(1):44-48. [27]CHEN Z,ZHANG A.A survey of approximate quantile computation on large-scale data[J].IEEE Access,2020,8:34585-34597. [28]YANG L,CAO J,YUAN Y,et al.A framework for partitioning and execution of data stream applications in mobile cloud computing[J].ACM SIGMETRICS Performance Evaluation Review,2013,40(4):23-32. [29]LI Y,ORGERIE A C,RODERO I,et al.End-to-end energymodels for Edge Cloud-based IoT platforms:Application to data stream analysis in IoT[J].Future Generation Computer Systems,2018,87:667-678. [30]TANG Y,GEDIK B.Autopipelining for data stream processing[J].IEEE Transactions on Parallel and Distributed Systems,2012,24(12):2344-2354. [31]KOLAJO T,DARAMOLA O,ADEBIYI A.Big data stream analysis:a systematic literature review[J].Journal of Big Data,2019,6(1):1-30. [32]GAROFALAKIS M,GEHRKE J,RASTOGI R.Data StreamManagement:Processing High-Speed Data Streams[J/OL].http://www.springer.com/-SGWID=0-102-1297-72039114-0. [33]CHAN T H H,LI M,SHI E,et al.Differentially private conti-nual monitoring of heavy hitters from distributed streams[C]//International Symposium on Privacy Enhancing Technologies Symposium.Berlin:Springer,2012:140-159. [34]LI C,MIKLAU G,HAY M,et al.The matrix mechanism:optimizing linear counting queries under differential privacy[J].The VLDB Journal,2015,24(6):757-781. |
[1] | 汤凌韬, 王迪, 张鲁飞, 刘盛云. 基于安全多方计算和差分隐私的联邦学习方案 Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy 计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108 |
[2] | 胡安祥, 尹小康, 朱肖雅, 刘胜利. 基于数据流特征的比较类函数识别方法 Strcmp-like Function Identification Method Based on Data Flow Feature Matching 计算机科学, 2022, 49(9): 326-332. https://doi.org/10.11896/jsjkx.220200163 |
[3] | 陈志强, 韩萌, 李慕航, 武红鑫, 张喜龙. 数据流概念漂移处理方法研究综述 Survey of Concept Drift Handling Methods in Data Streams 计算机科学, 2022, 49(9): 14-32. https://doi.org/10.11896/jsjkx.210700112 |
[4] | 李霞, 马茜, 白梅, 王习特, 李冠宇, 宁博. RIIM:基于独立模型的在线缺失值填补 RIIM:Real-Time Imputation Based on Individual Models 计算机科学, 2022, 49(8): 56-63. https://doi.org/10.11896/jsjkx.210600180 |
[5] | 陈圆圆, 王志海. 基于聚类分区的多维数据流概念漂移检测方法 Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition 计算机科学, 2022, 49(7): 25-30. https://doi.org/10.11896/jsjkx.210600155 |
[6] | 黄觉, 周春来. 基于本地化差分隐私的频率特征提取 Frequency Feature Extraction Based on Localized Differential Privacy 计算机科学, 2022, 49(7): 350-356. https://doi.org/10.11896/jsjkx.210900229 |
[7] | 庞兴龙, 朱国胜. 基于半监督学习的网络流量分析研究 Survey of Network Traffic Analysis Based on Semi Supervised Learning 计算机科学, 2022, 49(6A): 544-554. https://doi.org/10.11896/jsjkx.210600131 |
[8] | 王美珊, 姚兰, 高福祥, 徐军灿. 面向医疗集值数据的差分隐私保护技术研究 Study on Differential Privacy Protection for Medical Set-Valued Data 计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032 |
[9] | 夏源, 赵蕴龙, 范其林. 基于信息熵更新权重的数据流集成分类算法 Data Stream Ensemble Classification Algorithm Based on Information Entropy Updating Weight 计算机科学, 2022, 49(3): 92-98. https://doi.org/10.11896/jsjkx.210200047 |
[10] | 孔钰婷, 谭富祥, 赵鑫, 张正航, 白璐, 钱育蓉. 基于差分隐私的K-means算法优化研究综述 Review of K-means Algorithm Optimization Based on Differential Privacy 计算机科学, 2022, 49(2): 162-173. https://doi.org/10.11896/jsjkx.201200008 |
[11] | 董晓梅, 王蕊, 邹欣开. 面向推荐应用的差分隐私方案综述 Survey on Privacy Protection Solutions for Recommended Applications 计算机科学, 2021, 48(9): 21-35. https://doi.org/10.11896/jsjkx.201100083 |
[12] | 汤世征, 张岩峰. DragDL:一种易用的深度学习模型可视化构建系统 DragDL:An Easy-to-Use Graphical DL Model Construction System 计算机科学, 2021, 48(8): 220-225. https://doi.org/10.11896/jsjkx.200900045 |
[13] | 孙林, 平国楼, 叶晓俊. 基于本地化差分隐私的键值数据关联分析 Correlation Analysis for Key-Value Data with Local Differential Privacy 计算机科学, 2021, 48(8): 278-283. https://doi.org/10.11896/jsjkx.201200122 |
[14] | 张学军, 杨昊英, 李桢, 何福存, 盖继扬, 鲍俊达. 融合语义位置的差分私有位置隐私保护方法 Differentially Private Location Privacy-preserving Scheme withSemantic Location 计算机科学, 2021, 48(8): 300-308. https://doi.org/10.11896/jsjkx.200900198 |
[15] | 陈天荣, 凌捷. 基于特征映射的差分隐私保护机器学习方法 Differential Privacy Protection Machine Learning Method Based on Features Mapping 计算机科学, 2021, 48(7): 33-39. https://doi.org/10.11896/jsjkx.201200224 |
|