Computer Science ›› 2024, Vol. 51 ›› Issue (2): 322-332.doi: 10.11896/jsjkx.230600142

• Information Security • Previous Articles     Next Articles

High-dimensional Data Publication Under Local Differential Privacy

CAI Mengnan1,2, SHEN Guohua1,2,3, HUANG Zhiqiu1,2,3, YANG Yang1,2   

  1. 1 College of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China
    2 Key Laboratory of Safety-Critical Software,Ministry of Industry and Information Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China
    3 Collaborative Innovation Center of Novel Software Technology and Industrialization,Nanjing 210093,China
  • Received:2023-06-16 Revised:2023-11-15 Online:2024-02-15 Published:2024-02-22
  • About author:CAI Mengnan,born in 2000,postgra-duate.Her main research interests include privacy protection and deep lear-ning.SHEN Guohua,born in 1976,Ph.D,associate professor.His main research interests include software engineering,privacy protection,semantic web and description logic.
  • Supported by:
    National Natural Science Foundation of China(U2241216,61772270) and Opening Fund of Key Laboratory of Civil Aviation Emergency Science&Technology(CAAC)(NJ2022022).

Abstract: With the increasing availability of high-dimensional data collected from numerous users,preserving user privacy while utilizing high-dimensional data poses significant challenges.This paper focuses on the problem of high-dimensional data publication under local differential privacy.State-of-the-art solutions first construct probabilistic graphical models to generate a set of noisy low-dimensional marginal distributions of the input data,and then use them to approximate the joint distribution of the input dataset for generating synthetic datasets.However,existing methods have limitations in computing marginal distributions for a large number of attribute pairs to construct probabilistic graphical models,as well as in calculating joint distributions for attribute subsets within the probabilistic graphical models.To address these limitations,this paper proposes a method PrivHDP(high-dimensional data publication under local differential privacy) for high-dimensional data publication under local differential privacy.Firstly,it uses random sampling response instead of the traditional privacy budget splitting strategy to perturb user data.It proposes an adaptive marginal distribution computation method to compute the marginal distributions of pairwise attributes and construct a Markov network.Secondly,it employs a novel method to measure the correlation between pairwise attributes,replacing mutual information.This method introduces a threshold technique based on high-pass filtering to reduce the search space during the construction of the probabilistic graphical model.It combines sufficient triangulation operations and a joint tree algorithm to obtain a set of attribute subsets.Finally,based on joint distribution decomposition and redundancy elimination,the proposed method computes the joint distribution over attribute subsets.Experimental results on four real datasets demonstrate that the PrivHDP algorithm outperforms similar algorithms in terms of k-way query and SVM classification accuracy,validating its effectiveness and efficiency.

Key words: Local differential privacy, High-dimensional data, Data publication, Marginal distribution, Joint distribution

CLC Number: 

  • TP309
[1]SHEN H,ZHANG M W,SHEN J.Efficient privacy-preserving cube-data aggregation scheme for smart grids[J].IEEE Tran-sactions on Information Forensics and Security,2017,12(6):1369-1381.
[2]NAVEED M,AYDAY E,CLAYTON E W,etal.Privacy in the genomic era[J].ACM Computing Surveys(CSUR),2015,48(1):1-44.
[3]KHAN F,REHMAN A U,ZHENG J,et al.Mobile crowdsen-sing:A survey on privacy-preservation,task management,assignment models,and incentives mechanisms[J].Future Generation Computer Systems,2019,100:456-472.
[4]DWORK C.Differential privacy[C]//International colloquium on automata,languages,and programming.Berlin:Springer,2006:1-12.
[5]DWORK C,ROTH A.The algorithmic foundations ofdifferential privacy[J].Foundations and Trends© in Theoretical Computer Science,2014,9(3/4):211-407.
[6]DUCHI J C,JORDAN M I,WAINWRIGHT M J.Local privacy and statistical minimax rates[C]//2013 IEEE 54th Annual Symposium on Foundations of Computer Science.IEEE,2013:429-438.
[7]REN X B,YU C M,YU W,et al.LoPub:high-dimensionalcrowdsourced data publication with local differential privacy[J].IEEE Transactions on Information Forensics and Security,2018,13(9):2151-2166.
[8]ZHANG J,CORMODE G,PROCOPIUC C M,et al.Privbayes:Private data release via bayesian networks[J].ACM Transactions on Database Systems(TODS),2017,42(4):1-41.
[9]CHEN R,XIAO Q,ZHANG Y,et al.Differentially private high-dimensional data publication via sampling-based inference[C]//Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2015:129-138.
[10]CHEN X,LIU J,FENG X Q,et al.Differentially Private Synthetic Dataset Releasing Algorithm Based on Naive Bayes[J].Computer Science,2015,42(1):236-238.
[11]ZHANG Z K,WANG T H,LI N H,et al.PrivSyn:Differentially Private Data Synthesis[C]//30th USENIX Security Symposium(USENIX Security 21).2021:929-946.
[12]ZHANG X J,CHEN L,JIN K Z,et al.Private High-DimensionalData Publication with Junction Tree [J].Computer Research and Development,2018,55(12):2794-2809.
[13]LI H,XIONG L,JIANG X.Differentially private synthesization of multi-dimensional data using copula functions[C]//Advances in Database Technology:Proceedings.International Conference on Extending Database Technology.NIH Public Access,2014.
[14]CAI K T,LEI X Y,WEI J X,et al.DataSynthesis via Differen-tially Private Markov Random Fields[C]//Proceedings of the VLDB Endowment.New York:ACM,2021:2190-2202.
[15]GIUSEPPE V,GRACE T,MARK B,et al.New oracle-efficient algorithms for private synthetic data release[C]//International Conference on Machine Learning(ICML).Vienna:ACM,2020:9765-9774.
[16]FANTI G,PIHUR V,ERLINGSSON L.Building a RAPPORwith the Unknown:Privacy-Preserving Learning of Associations and Data Dictionaries[C]//Proceedings on Privacy Enhancing Technologies.2016:41-61.
[17]REN X B,YU C M,YU W,et al.High-dimensional crowd-sourced data distribution estimation with local privacy[C]//2016 IEEE International Conference on Computer and Information Technology(CIT).IEEE,2016:226-233.
[18]CORMODE G,KULKARNI T,SRIVASTAVA D.Marginal release under local differential privacy[C]//Proceedings of the 2018 International Conference on Management of Data.2018:131-146.
[19]WANG T,YANG X Y,REN X B,et al.Locally private high-dimensional crowdsourced data release based on copula functions[J].IEEE Transactions on Services Computing,2019,15(2):778-792.
[20]DU L K,ZHANG Z K,BAI S J,et al.AHEAD:adaptive hierarchical decomposition for range query under local differential privacy[C]//Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security.2021:1266-1288.
[21]KULKARNI T.Answering range queries under local differential privacy[C]//Proceedings of the 2019 International Conference on Management of Data.2019:1832-1834.
[22]LIU L K,ZHOU C L.RCP:Mean Value Protection Technology Under Local Differential Privacy[J].Computer Science,2023,50(2):333-345.
[23]WANG N,XIAO X K,YANG Y,et al.Collecting and analyzing multidimensional data with local differential privacy[C]//2019 IEEE 35th International Conference on Data Engineering(ICDE).IEEE,2019:638-649.
[24]WARNER S L.Randomized response:A survey technique foreliminating evasive answer bias[J].Journal of the American Statistical Association,1965,60(309):63-69.
[25]KAIROUZ P,BONAWITZ K,RAMAGE D.Discrete distribu-tion estimation under local privacy[C]//International Confe-rence on Machine Learning.PMLR,2016:2436-2444.
[26]WANG T H,BLOCKI J,LI N H,et al.Locally differentiallyprivate protocols for frequency estimation[C]//26th USENIX Security Symposium(USENIX Security 17).2017:729-745.
[27]QARDAJI W,YANG W N,LI N H.Priview:practical differentially private release of marginal contingency tables[C]//Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data.2014:1435-1446.
[1] LIU Likang, ZHOU Chunlai. RCP:Mean Value Protection Technology Under Local Differential Privacy [J]. Computer Science, 2023, 50(2): 333-345.
[2] CAO Dongtao, SHU Wenhao, QIAN Jin. Feature Selection Algorithm Based on Rough Set and Density Peak Clustering [J]. Computer Science, 2023, 50(10): 37-47.
[3] YIN Shiyu, ZHU Youwen, ZHANG Yue. Utility-optimized Local Differential Privacy Joint Distribution Estimation Mechanisms [J]. Computer Science, 2023, 50(10): 315-326.
[4] SHI Kun, ZHOU Yong, ZHANG Qi-liang, JIANG Shun-rong. Privacy-preserving Scheme of Energy Trading Data Based on Consortium Blockchain [J]. Computer Science, 2022, 49(11): 335-344.
[5] LIU Yi, MAO Ying-chi, CHENG Yang-kun, GAO Jian, WANG Long-bao. Locality and Consistency Based Sequential Ensemble Method for Outlier Detection [J]. Computer Science, 2022, 49(1): 146-152.
[6] SUN Lin, PING Guo-lou, YE Xiao-jun. Correlation Analysis for Key-Value Data with Local Differential Privacy [J]. Computer Science, 2021, 48(8): 278-283.
[7] ZHOU Gang, GUO Fu-liang. Research on Ensemble Learning Method Based on Feature Selection for High-dimensional Data [J]. Computer Science, 2021, 48(6A): 250-254.
[8] PENG Chun-chun, CHEN Yan-li, XUN Yan-mei. k-modes Clustering Guaranteeing Local Differential Privacy [J]. Computer Science, 2021, 48(2): 105-113.
[9] WU Ying-jie, HUANG Xin, GE Chen, SUN Lan. Adaptive Parameter Optimization for Real-time Differential Privacy Streaming Data Publication [J]. Computer Science, 2019, 46(9): 99-105.
[10] LIU Peng, YE Bin. Linear Discriminant Analysis of High-dimensional Data Using Random Matrix Theory [J]. Computer Science, 2019, 46(6A): 423-426.
[11] LI Meng-jie, XIE Qiang and DING Qiu-lin. Orthogonal Non-negative Matrix Factorization for K-means Clustering [J]. Computer Science, 2016, 43(5): 204-208.
[12] XU Tong-de. High-dimensional Data Discretization Method Based on Improved LLE [J]. Computer Science, 2015, 42(Z6): 146-150.
[13] WANG Wei, LI Lei and ZHANG Zhi-hong. Improvement of C4.5 Algorithm with Free Noise Capacity [J]. Computer Science, 2015, 42(12): 268-271.
[14] HE Jin-rong,DING Li-xin,HU Qing-hui and LI Zhao-kui. Properties of High-dimensional Data Space and Metric Choice [J]. Computer Science, 2014, 41(3): 212-217.
[15] YAN Zhen, PI De-chang,WU Wen-hao. Research on Frequent Itemsets Mining Algorithm Based on High-dimensional Sparse Dataset [J]. Computer Science, 2011, 38(6): 183-186.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!