Computer Science ›› 2022, Vol. 49 ›› Issue (10): 344-352.doi: 10.11896/jsjkx.210700242

• Information Security • Previous Articles     Next Articles

Adaptive Histogram Publishing Algorithm for Sliding Window of Data Stream

WANG Xiu-jun1,2,3, MO Lei1,3, ZHENG Xiao1,2,3, GAO Yun-quan1,2,3   

  1. 1 School of Computer Science and Technology,Anhui University of Technology,Maanshan,Anhui 243032,China
    2 Institute for Artificial Intelligence,Hefei Comprehensive National Science Center,Hefei 230091,China
    3 Anhui Engineering Laboratory for Intelligent Applications and Security ofIndustrial Internet,Maanshan,Anhui 243032,China
  • Received:2021-07-25 Revised:2021-12-08 Online:2022-10-15 Published:2022-10-13
  • About author:WANG Xiu-jun,born in 1983,Ph.D,associate professor,is a member of China Computer Federation.His main research interests include RFID system tag management and data stream pro-cessing.
  • Supported by:
    University Synergy Innovation Program of Anhui Province(GXXT-2020-012),National Natural Science Foundation of China(61402008,61702006,61672038),Provincial Key Research and Development Program of Anhui Province(202004a05020009,201904a05020071), Natural Science Foundation of Anhui Province(2108085MF218), University Natural Science Research Project of Anhui Province(KJ2020A0249,KJ2020A0250) and Open Fund of Key Laboratory of Anhui Higher Education Institutes(CS2020-06).

Abstract: As one of the most effective privacy protection mechanisms,differential privacy has been widely used in many fields.The existing histogram publishing methods for either static data set or dynamic data set mainly protect the privacy of sliding windows in data streams by adding unified noise.This leads to low data availability,high time complexity and weak privacy protection in their practical applications.In this paper,we tackle this problem by integrating the approximate counting techniques into the differential privacy and proposing an adaptive histogram publishing method for sliding window(APS).Firstly,the proposed APS predicates the distributional information of the sliding windows in the data stream by using an approximate counting method.Secondly,it computes an appropriate value suitable for publishing by checking the difference between estimated values and actual values.Finally,it reduces statistical errors within each interval by clustering.Theoretical analysis shows that the APS algorithm can effectively improve data availability and reduce running time while reducing the privacy budget.Experimental results on two different real data sets also verify the superiority of APS algorithm over existing grouping-based histogram publishing algorithms in terms of data availability and running time.

Key words: Differential privacy, Sliding window, Data stream, Histogram publishing, Approximation error, Laplacian error

CLC Number: 

  • TP309
[1]FAN L,BONOMI L,XIONG L,et al.Monitoring web browsing behavior with differential privacy[C]//Proceedings of the 23rd International Conference on World Wide Web.2014:177-188.
[2]CELA A,JURIK T,HAMOUCHE R,et al.Energy optimalreal-time navigation system[J].IEEE Intelligent Transportation Systems Magazine,2014,6(3):66-79.
[3]ALMADANI B,SAEED B,ALROUBAIY A.Healthcare sys-tems integration using real time publish subscribe(RTPS) middleware[J].Computers & Electrical Engineering,2016,50(C):67-78.
[4]DWORK C.Differential privacy[C]//Proceedings of the 33rdInternational Conference on Automata,Languages and Programming-Volume Part II.Springer-Verlag,2006:1-12.
[5]LUO Z,WU D J,ADELI E,et al.Scalable Differential Privacy With Sparse Network Finetuning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2021:5059-5068.
[6]YANG W,SUN Y E,HUANG H,et al.Persistent transportation traffic volume estimation with differential privacy[J].Journal of Ambient Intelligence and Humanized Computing,2021,12(1):213-231.
[7]FICEK J,WANG W,CHEN H,et al.Differential privacy inhealth research:A scoping review[J].Journal of the American Medical Informatics Association,2021,28(10):2269-2276.
[8]SUN Z,WANG Y,SHU M,et al.Differential privacy for data and model publishing of medical data[J].IEEE Access,2019,7:152103-152114.
[9]FAN L,XIONG L,SUNDERAM V.Differentially private multi-dimensional time series release for traffic monitoring[C]//IFIP Annual Conference on Data and Applications Security and Privacy.Berlin:Springer,2013:33-48.
[10]LIU J Q,ZHANG C,FANG Y.Epic:A differential privacyframework to defend smart homes against internet traffic analysis[J].IEEE Internet of Things Journal,2018,5(2):1206-1217.
[11]LI H R,XIONG L,JIANG X Q.Differentially private histogramand synthetic data publication[M]//Medical Data Privacy Handbook.Cham:Springer,2015:35-58.
[12]QU J J,CAI Y,XIA H K.A survey of differential privacy protection research for dynamic data release[J].Journal of Beijing University of Information Science and Technology(Natural Science Edition),2019,34(6):30-36.
[13]XU J,ZHANG Z,XIAO X,et al.Differentially private histo-gram publication[J].The VLDB Journal,2013,22(6):797-822.
[14]ACS G,CASTELLUCCIA C,CHEN R.Differentially privatehistogram publishing through lossy compression[C]//IEEE the 12th International Conference on Data Mining.IEEE,2012:1-10.
[15]ZHANG X J,SHAO C,MENG X F.Accurate histogram release under differential privacy [J].Journal of Computer Research and Development,2016,53(5):1106-1117.
[16]TANG H X,YANG G,BAI Y L.Histogram publishing algorithm based on adaptive differential privacy budget allocation strategy[J].Application Research of Computers,2020,37(7):1952-1957,1963.
[17]CHAN T H H,SHI E,SONG D.Private and continual release of statistics[J].ACM Transactions on Information and System Security(TISSEC),2011,14(3):1-24.
[18]RASTOGI V,NATH S.Differentially private aggregation ofdistributed time-series with transformation and encryption[C]//Proceedings of the 2010 ACM SIGMOD International Conference on Management of data.2010:735-746.
[19]LI H F,LEE S Y,SHAN M K.Online mining(recently) maximal frequent itemsets over data streams[C]//The 15th International Workshop on Research Issues in Data Engineering:Stream Data Mining and Applications(RIDE-SDMA'05).IEEE,2005:11-18.
[20]NGUYEN H L,WOON Y K,NG W K.A survey on data stream clustering and classification[J].Knowledge and Information Systems,2015,45(3):535-569.
[21]LIN F P,WU Y J,WANG Y L,et al.Differentially private statistical publication for two-dimensional data stream [J].Journal of Computer Applications,2015,35(1):88-92.
[22]ZHANG X J,MENG X F.Streaming histogram publication me-thod with differential privacy[J].Journal of Software,2016,27(2):381-393.
[23]LI Y,LI S.Research on Differential Private Streaming Histogram Publication Algorithm[C]//2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems(CCIS).IEEE,2018:598-603.
[24]GAO R C,MA X B.Dynamic data histogram publishing basedon differential privacy[C]//2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications,Ubiquitous Computing & Communications,Big Data & Cloud Computing,Social Computing & Networking,Sustainable Computing & Communications(ISPA/IUCC/BDCloud/SocialCom/SustainCom).IEEE,2018:737-743.
[25]WU X,TONG N,YE Z,et al.Histogram Publishing Algorithm Based on Sampling Sorting and Greedy Clustering[C]//International Conference on Blockchain and Trustworthy Systems.Springer,Singapore,2019:81-91.
[26]SCOTT D W.Histogram[J].Wiley Interdisciplinary Reviews:Computational Statistics,2010,2(1):44-48.
[27]CHEN Z,ZHANG A.A survey of approximate quantile computation on large-scale data[J].IEEE Access,2020,8:34585-34597.
[28]YANG L,CAO J,YUAN Y,et al.A framework for partitioning and execution of data stream applications in mobile cloud computing[J].ACM SIGMETRICS Performance Evaluation Review,2013,40(4):23-32.
[29]LI Y,ORGERIE A C,RODERO I,et al.End-to-end energymodels for Edge Cloud-based IoT platforms:Application to data stream analysis in IoT[J].Future Generation Computer Systems,2018,87:667-678.
[30]TANG Y,GEDIK B.Autopipelining for data stream processing[J].IEEE Transactions on Parallel and Distributed Systems,2012,24(12):2344-2354.
[31]KOLAJO T,DARAMOLA O,ADEBIYI A.Big data stream analysis:a systematic literature review[J].Journal of Big Data,2019,6(1):1-30.
[32]GAROFALAKIS M,GEHRKE J,RASTOGI R.Data StreamManagement:Processing High-Speed Data Streams[J/OL].http://www.springer.com/-SGWID=0-102-1297-72039114-0.
[33]CHAN T H H,LI M,SHI E,et al.Differentially private conti-nual monitoring of heavy hitters from distributed streams[C]//International Symposium on Privacy Enhancing Technologies Symposium.Berlin:Springer,2012:140-159.
[34]LI C,MIKLAU G,HAY M,et al.The matrix mechanism:optimizing linear counting queries under differential privacy[J].The VLDB Journal,2015,24(6):757-781.
[1] TANG Ling-tao, WANG Di, ZHANG Lu-fei, LIU Sheng-yun. Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy [J]. Computer Science, 2022, 49(9): 297-305.
[2] CHEN Zhi-qiang, HAN Meng, LI Mu-hang, WU Hong-xin, ZHANG Xi-long. Survey of Concept Drift Handling Methods in Data Streams [J]. Computer Science, 2022, 49(9): 14-32.
[3] LI Xia, MA Qian, BAI Mei, WANG Xi-te, LI Guan-yu, NING Bo. RIIM:Real-Time Imputation Based on Individual Models [J]. Computer Science, 2022, 49(8): 56-63.
[4] CHEN Yuan-yuan, WANG Zhi-hai. Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition [J]. Computer Science, 2022, 49(7): 25-30.
[5] HUANG Jue, ZHOU Chun-lai. Frequency Feature Extraction Based on Localized Differential Privacy [J]. Computer Science, 2022, 49(7): 350-356.
[6] WANG Mei-shan, YAO Lan, GAO Fu-xiang, XU Jun-can. Study on Differential Privacy Protection for Medical Set-Valued Data [J]. Computer Science, 2022, 49(4): 362-368.
[7] XIA Yuan, ZHAO Yun-long, FAN Qi-lin. Data Stream Ensemble Classification Algorithm Based on Information Entropy Updating Weight [J]. Computer Science, 2022, 49(3): 92-98.
[8] KONG Yu-ting, TAN Fu-xiang, ZHAO Xin, ZHANG Zheng-hang, BAI Lu, QIAN Yu-rong. Review of K-means Algorithm Optimization Based on Differential Privacy [J]. Computer Science, 2022, 49(2): 162-173.
[9] DONG Xiao-mei, WANG Rui, ZOU Xin-kai. Survey on Privacy Protection Solutions for Recommended Applications [J]. Computer Science, 2021, 48(9): 21-35.
[10] SUN Lin, PING Guo-lou, YE Xiao-jun. Correlation Analysis for Key-Value Data with Local Differential Privacy [J]. Computer Science, 2021, 48(8): 278-283.
[11] ZHANG Xue-jun, YANG Hao-ying, LI Zhen, HE Fu-cun, GAI Ji-yang, BAO Jun-da. Differentially Private Location Privacy-preserving Scheme withSemantic Location [J]. Computer Science, 2021, 48(8): 300-308.
[12] CHEN Tian-rong, LING Jie. Differential Privacy Protection Machine Learning Method Based on Features Mapping [J]. Computer Science, 2021, 48(7): 33-39.
[13] GONG Jian-feng. Resisting Power Analysis Algorithm of Scalar Multiplication Based on Signed Sliding Window [J]. Computer Science, 2021, 48(6A): 533-537.
[14] WANG Le-ye. Geographic Local Differential Privacy in Crowdsensing:Current States and Future Opportunities [J]. Computer Science, 2021, 48(6): 301-305.
[15] PENG Chun-chun, CHEN Yan-li, XUN Yan-mei. k-modes Clustering Guaranteeing Local Differential Privacy [J]. Computer Science, 2021, 48(2): 105-113.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!