计算机科学 ›› 2022, Vol. 49 ›› Issue (10): 344-352.doi: 10.11896/jsjkx.210700242

• 信息安全 • 上一篇    下一篇

面向数据流滑动窗口的自适应直方图发布算法

王修君1,2,3, 莫磊1,3, 郑啸1,2,3, 高云全1,2,3   

  1. 1 安徽工业大学计算机科学与技术学院 安徽 马鞍山 243032
    2 合肥综合性国家科学中心人工智能研究院 合肥 230091
    3 安徽省工业互联网智能应用与安全工程实验室 安徽 马鞍山 243032
  • 收稿日期:2021-07-25 修回日期:2021-12-08 出版日期:2022-10-15 发布日期:2022-10-13
  • 通讯作者: 王修君(xjwang@ahut.edu.cn)
  • 基金资助:
    安徽高校协同创新项目(GXXT-2020-012);国家自然科学基金(61402008,61702006,61672038);安徽省重点研发与开发计划面上攻关项目(202004a05020009,201904a05020071);安徽省自然科学基金(2108085MF218);安徽高校自然科学研究项目(KJ2020A0249,KJ2020A0250);安徽普通高校重点实验室开放基金(CS2020-06)

Adaptive Histogram Publishing Algorithm for Sliding Window of Data Stream

WANG Xiu-jun1,2,3, MO Lei1,3, ZHENG Xiao1,2,3, GAO Yun-quan1,2,3   

  1. 1 School of Computer Science and Technology,Anhui University of Technology,Maanshan,Anhui 243032,China
    2 Institute for Artificial Intelligence,Hefei Comprehensive National Science Center,Hefei 230091,China
    3 Anhui Engineering Laboratory for Intelligent Applications and Security ofIndustrial Internet,Maanshan,Anhui 243032,China
  • Received:2021-07-25 Revised:2021-12-08 Online:2022-10-15 Published:2022-10-13
  • About author:WANG Xiu-jun,born in 1983,Ph.D,associate professor,is a member of China Computer Federation.His main research interests include RFID system tag management and data stream pro-cessing.
  • Supported by:
    University Synergy Innovation Program of Anhui Province(GXXT-2020-012),National Natural Science Foundation of China(61402008,61702006,61672038),Provincial Key Research and Development Program of Anhui Province(202004a05020009,201904a05020071), Natural Science Foundation of Anhui Province(2108085MF218), University Natural Science Research Project of Anhui Province(KJ2020A0249,KJ2020A0250) and Open Fund of Key Laboratory of Anhui Higher Education Institutes(CS2020-06).

摘要: 差分隐私技术作为一种有效的隐私保护机制,已被广泛应用在诸多领域。目前已有的静态数据集和动态数据集上的直方图发布方法在处理数据流滑动窗口模型时,往往只能够通过对数据直方图信息添加统一噪声的形式来实现数据保护,这导致了它们在实际应用中存在数据可用性低、时间复杂度高等问题。针对这些问题,文中通过将数据流近似计数技术综合到差分隐私保护算法中,进而提出了一种面向数据流滑动窗口模型的自适应直方图发布方法APS(Adaptive Histogram Publishing Method for Sliding Window)。APS算法首先利用数据流近似计数方法来预测下一时刻滑动窗口内数据的分布信息;然后通过比较估计值与真实值之间的差异来选取合适的发布值;最后对排序后的直方图区间进行聚类处理,并优化其桶内数据的误差。理论分析显示,APS算法能够在减少隐私预算的同时,有效地提高数据的可用性和缩短运行时间。在两种不同的真实数据集上的实验结果也验证了APS算法在数据可用性和运行时间上显著优于现有的基于分组的直方图发布算法。

关键词: 差分隐私, 滑动窗口, 数据流, 直方图发布, 近似误差, 拉普拉斯误差

Abstract: As one of the most effective privacy protection mechanisms,differential privacy has been widely used in many fields.The existing histogram publishing methods for either static data set or dynamic data set mainly protect the privacy of sliding windows in data streams by adding unified noise.This leads to low data availability,high time complexity and weak privacy protection in their practical applications.In this paper,we tackle this problem by integrating the approximate counting techniques into the differential privacy and proposing an adaptive histogram publishing method for sliding window(APS).Firstly,the proposed APS predicates the distributional information of the sliding windows in the data stream by using an approximate counting method.Secondly,it computes an appropriate value suitable for publishing by checking the difference between estimated values and actual values.Finally,it reduces statistical errors within each interval by clustering.Theoretical analysis shows that the APS algorithm can effectively improve data availability and reduce running time while reducing the privacy budget.Experimental results on two different real data sets also verify the superiority of APS algorithm over existing grouping-based histogram publishing algorithms in terms of data availability and running time.

Key words: Differential privacy, Sliding window, Data stream, Histogram publishing, Approximation error, Laplacian error

中图分类号: 

  • TP309
[1]FAN L,BONOMI L,XIONG L,et al.Monitoring web browsing behavior with differential privacy[C]//Proceedings of the 23rd International Conference on World Wide Web.2014:177-188.
[2]CELA A,JURIK T,HAMOUCHE R,et al.Energy optimalreal-time navigation system[J].IEEE Intelligent Transportation Systems Magazine,2014,6(3):66-79.
[3]ALMADANI B,SAEED B,ALROUBAIY A.Healthcare sys-tems integration using real time publish subscribe(RTPS) middleware[J].Computers & Electrical Engineering,2016,50(C):67-78.
[4]DWORK C.Differential privacy[C]//Proceedings of the 33rdInternational Conference on Automata,Languages and Programming-Volume Part II.Springer-Verlag,2006:1-12.
[5]LUO Z,WU D J,ADELI E,et al.Scalable Differential Privacy With Sparse Network Finetuning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:5059-5068.
[6]YANG W,SUN Y E,HUANG H,et al.Persistent transportation traffic volume estimation with differential privacy[J].Journal of Ambient Intelligence and Humanized Computing,2021,12(1):213-231.
[7]FICEK J,WANG W,CHEN H,et al.Differential privacy inhealth research:A scoping review[J].Journal of the American Medical Informatics Association,2021,28(10):2269-2276.
[8]SUN Z,WANG Y,SHU M,et al.Differential privacy for data and model publishing of medical data[J].IEEE Access,2019,7:152103-152114.
[9]FAN L,XIONG L,SUNDERAM V.Differentially private multi-dimensional time series release for traffic monitoring[C]//IFIP Annual Conference on Data and Applications Security and Privacy.Berlin:Springer,2013:33-48.
[10]LIU J Q,ZHANG C,FANG Y.Epic:A differential privacyframework to defend smart homes against internet traffic analysis[J].IEEE Internet of Things Journal,2018,5(2):1206-1217.
[11]LI H R,XIONG L,JIANG X Q.Differentially private histogramand synthetic data publication[M]//Medical Data Privacy Handbook.Cham:Springer,2015:35-58.
[12]QU J J,CAI Y,XIA H K.A survey of differential privacy protection research for dynamic data release[J].Journal of Beijing University of Information Science and Technology(Natural Science Edition),2019,34(6):30-36.
[13]XU J,ZHANG Z,XIAO X,et al.Differentially private histo-gram publication[J].The VLDB Journal,2013,22(6):797-822.
[14]ACS G,CASTELLUCCIA C,CHEN R.Differentially privatehistogram publishing through lossy compression[C]//IEEE the 12th International Conference on Data Mining.IEEE,2012:1-10.
[15]ZHANG X J,SHAO C,MENG X F.Accurate histogram release under differential privacy [J].Journal of Computer Research and Development,2016,53(5):1106-1117.
[16]TANG H X,YANG G,BAI Y L.Histogram publishing algorithm based on adaptive differential privacy budget allocation strategy[J].Application Research of Computers,2020,37(7):1952-1957,1963.
[17]CHAN T H H,SHI E,SONG D.Private and continual release of statistics[J].ACM Transactions on Information and System Security(TISSEC),2011,14(3):1-24.
[18]RASTOGI V,NATH S.Differentially private aggregation ofdistributed time-series with transformation and encryption[C]//Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.2010:735-746.
[19]LI H F,LEE S Y,SHAN M K.Online mining(recently) maximal frequent itemsets over data streams[C]//The 15th International Workshop on Research Issues in Data Engineering:Stream Data Mining and Applications(RIDE-SDMA'05).IEEE,2005:11-18.
[20]NGUYEN H L,WOON Y K,NG W K.A survey on data stream clustering and classification[J].Knowledge and Information Systems,2015,45(3):535-569.
[21]LIN F P,WU Y J,WANG Y L,et al.Differentially private statistical publication for two-dimensional data stream [J].Journal of Computer Applications,2015,35(1):88-92.
[22]ZHANG X J,MENG X F.Streaming histogram publication me-thod with differential privacy[J].Journal of Software,2016,27(2):381-393.
[23]LI Y,LI S.Research on Differential Private Streaming Histogram Publication Algorithm[C]//2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems(CCIS).IEEE,2018:598-603.
[24]GAO R C,MA X B.Dynamic data histogram publishing basedon differential privacy[C]//2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications,Ubiquitous Computing & Communications,Big Data & Cloud Computing,Social Computing & Networking,Sustainable Computing & Communications(ISPA/IUCC/BDCloud/SocialCom/SustainCom).IEEE,2018:737-743.
[25]WU X,TONG N,YE Z,et al.Histogram Publishing Algorithm Based on Sampling Sorting and Greedy Clustering[C]//International Conference on Blockchain and Trustworthy Systems.Springer,Singapore,2019:81-91.
[26]SCOTT D W.Histogram[J].Wiley Interdisciplinary Reviews:Computational Statistics,2010,2(1):44-48.
[27]CHEN Z,ZHANG A.A survey of approximate quantile computation on large-scale data[J].IEEE Access,2020,8:34585-34597.
[28]YANG L,CAO J,YUAN Y,et al.A framework for partitioning and execution of data stream applications in mobile cloud computing[J].ACM SIGMETRICS Performance Evaluation Review,2013,40(4):23-32.
[29]LI Y,ORGERIE A C,RODERO I,et al.End-to-end energymodels for Edge Cloud-based IoT platforms:Application to data stream analysis in IoT[J].Future Generation Computer Systems,2018,87:667-678.
[30]TANG Y,GEDIK B.Autopipelining for data stream processing[J].IEEE Transactions on Parallel and Distributed Systems,2012,24(12):2344-2354.
[31]KOLAJO T,DARAMOLA O,ADEBIYI A.Big data stream analysis:a systematic literature review[J].Journal of Big Data,2019,6(1):1-30.
[32]GAROFALAKIS M,GEHRKE J,RASTOGI R.Data StreamManagement:Processing High-Speed Data Streams[J/OL].http://www.springer.com/-SGWID=0-102-1297-72039114-0.
[33]CHAN T H H,LI M,SHI E,et al.Differentially private conti-nual monitoring of heavy hitters from distributed streams[C]//International Symposium on Privacy Enhancing Technologies Symposium.Berlin:Springer,2012:140-159.
[34]LI C,MIKLAU G,HAY M,et al.The matrix mechanism:optimizing linear counting queries under differential privacy[J].The VLDB Journal,2015,24(6):757-781.
[1] 汤凌韬, 王迪, 张鲁飞, 刘盛云.
基于安全多方计算和差分隐私的联邦学习方案
Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy
计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108
[2] 胡安祥, 尹小康, 朱肖雅, 刘胜利.
基于数据流特征的比较类函数识别方法
Strcmp-like Function Identification Method Based on Data Flow Feature Matching
计算机科学, 2022, 49(9): 326-332. https://doi.org/10.11896/jsjkx.220200163
[3] 陈志强, 韩萌, 李慕航, 武红鑫, 张喜龙.
数据流概念漂移处理方法研究综述
Survey of Concept Drift Handling Methods in Data Streams
计算机科学, 2022, 49(9): 14-32. https://doi.org/10.11896/jsjkx.210700112
[4] 李霞, 马茜, 白梅, 王习特, 李冠宇, 宁博.
RIIM:基于独立模型的在线缺失值填补
RIIM:Real-Time Imputation Based on Individual Models
计算机科学, 2022, 49(8): 56-63. https://doi.org/10.11896/jsjkx.210600180
[5] 陈圆圆, 王志海.
基于聚类分区的多维数据流概念漂移检测方法
Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition
计算机科学, 2022, 49(7): 25-30. https://doi.org/10.11896/jsjkx.210600155
[6] 黄觉, 周春来.
基于本地化差分隐私的频率特征提取
Frequency Feature Extraction Based on Localized Differential Privacy
计算机科学, 2022, 49(7): 350-356. https://doi.org/10.11896/jsjkx.210900229
[7] 庞兴龙, 朱国胜.
基于半监督学习的网络流量分析研究
Survey of Network Traffic Analysis Based on Semi Supervised Learning
计算机科学, 2022, 49(6A): 544-554. https://doi.org/10.11896/jsjkx.210600131
[8] 王美珊, 姚兰, 高福祥, 徐军灿.
面向医疗集值数据的差分隐私保护技术研究
Study on Differential Privacy Protection for Medical Set-Valued Data
计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032
[9] 夏源, 赵蕴龙, 范其林.
基于信息熵更新权重的数据流集成分类算法
Data Stream Ensemble Classification Algorithm Based on Information Entropy Updating Weight
计算机科学, 2022, 49(3): 92-98. https://doi.org/10.11896/jsjkx.210200047
[10] 孔钰婷, 谭富祥, 赵鑫, 张正航, 白璐, 钱育蓉.
基于差分隐私的K-means算法优化研究综述
Review of K-means Algorithm Optimization Based on Differential Privacy
计算机科学, 2022, 49(2): 162-173. https://doi.org/10.11896/jsjkx.201200008
[11] 董晓梅, 王蕊, 邹欣开.
面向推荐应用的差分隐私方案综述
Survey on Privacy Protection Solutions for Recommended Applications
计算机科学, 2021, 48(9): 21-35. https://doi.org/10.11896/jsjkx.201100083
[12] 汤世征, 张岩峰.
DragDL:一种易用的深度学习模型可视化构建系统
DragDL:An Easy-to-Use Graphical DL Model Construction System
计算机科学, 2021, 48(8): 220-225. https://doi.org/10.11896/jsjkx.200900045
[13] 孙林, 平国楼, 叶晓俊.
基于本地化差分隐私的键值数据关联分析
Correlation Analysis for Key-Value Data with Local Differential Privacy
计算机科学, 2021, 48(8): 278-283. https://doi.org/10.11896/jsjkx.201200122
[14] 张学军, 杨昊英, 李桢, 何福存, 盖继扬, 鲍俊达.
融合语义位置的差分私有位置隐私保护方法
Differentially Private Location Privacy-preserving Scheme withSemantic Location
计算机科学, 2021, 48(8): 300-308. https://doi.org/10.11896/jsjkx.200900198
[15] 陈天荣, 凌捷.
基于特征映射的差分隐私保护机器学习方法
Differential Privacy Protection Machine Learning Method Based on Features Mapping
计算机科学, 2021, 48(7): 33-39. https://doi.org/10.11896/jsjkx.201200224
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!