计算机科学 ›› 2024, Vol. 51 ›› Issue (2): 322-332.doi: 10.11896/jsjkx.230600142
蔡梦男1,2, 沈国华1,2,3, 黄志球1,2,3, 杨阳1,2
CAI Mengnan1,2, SHEN Guohua1,2,3, HUANG Zhiqiu1,2,3, YANG Yang1,2
摘要: 从众多用户收集的高维数据可用性越来越高,庞大的高维数据涉及用户个人隐私,如何在使用高维数据的同时保护用户的隐私极具挑战性。文中主要关注本地差分隐私下的高维数据发布问题。现有的解决方案首先构建概率图模型,生成输入数据的一组带噪声的低维边缘分布,然后使用它们近似输入数据集的联合分布以生成合成数据集。然而,现有方法在计算大量属性对的边缘分布构建概率图模型,以及计算概率图模型中规模较大的属性子集的联合分布时存在局限性。基于此,提出了一种本地差分隐私下的高维数据发布方法PrivHDP(High-dimensional Data Publication Under Local Differential Privacy)。首先,该方法使用随机采样响应代替传统的隐私预算分割策略扰动用户数据,提出自适应边缘分布计算方法计算成对属性的边缘分布构建Markov网。其次,使用新的方法代替互信息度量成对属性间的相关性,引入了基于高通滤波的阈值过滤技术缩减概率图构建过程的搜索空间,结合充分三角化操作和联合树算法获得一组属性子集。最后,基于联合分布分解和冗余消除,计算属性子集上的联合分布。在4个真实数据集上进行实验,结果表明,PrivHDP 算法在k-way 查询和 SVM 分类精度方面优于同类算法,验证了所提方法的可用性与高效性。
中图分类号:
[1]SHEN H,ZHANG M W,SHEN J.Efficient privacy-preserving cube-data aggregation scheme for smart grids[J].IEEE Tran-sactions on Information Forensics and Security,2017,12(6):1369-1381. [2]NAVEED M,AYDAY E,CLAYTON E W,etal.Privacy in the genomic era[J].ACM Computing Surveys(CSUR),2015,48(1):1-44. [3]KHAN F,REHMAN A U,ZHENG J,et al.Mobile crowdsen-sing:A survey on privacy-preservation,task management,assignment models,and incentives mechanisms[J].Future Generation Computer Systems,2019,100:456-472. [4]DWORK C.Differential privacy[C]//International colloquium on automata,languages,and programming.Berlin:Springer,2006:1-12. [5]DWORK C,ROTH A.The algorithmic foundations ofdifferential privacy[J].Foundations and Trends© in Theoretical Computer Science,2014,9(3/4):211-407. [6]DUCHI J C,JORDAN M I,WAINWRIGHT M J.Local privacy and statistical minimax rates[C]//2013 IEEE 54th Annual Symposium on Foundations of Computer Science.IEEE,2013:429-438. [7]REN X B,YU C M,YU W,et al.LoPub:high-dimensionalcrowdsourced data publication with local differential privacy[J].IEEE Transactions on Information Forensics and Security,2018,13(9):2151-2166. [8]ZHANG J,CORMODE G,PROCOPIUC C M,et al.Privbayes:Private data release via bayesian networks[J].ACM Transactions on Database Systems(TODS),2017,42(4):1-41. [9]CHEN R,XIAO Q,ZHANG Y,et al.Differentially private high-dimensional data publication via sampling-based inference[C]//Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2015:129-138. [10]CHEN X,LIU J,FENG X Q,et al.Differentially Private Synthetic Dataset Releasing Algorithm Based on Naive Bayes[J].Computer Science,2015,42(1):236-238. [11]ZHANG Z K,WANG T H,LI N H,et al.PrivSyn:Differentially Private Data Synthesis[C]//30th USENIX Security Symposium(USENIX Security 21).2021:929-946. [12]ZHANG X J,CHEN L,JIN K Z,et al.Private High-DimensionalData Publication with Junction Tree [J].Computer Research and Development,2018,55(12):2794-2809. [13]LI H,XIONG L,JIANG X.Differentially private synthesization of multi-dimensional data using copula functions[C]//Advances in Database Technology:Proceedings.International Conference on Extending Database Technology.NIH Public Access,2014. [14]CAI K T,LEI X Y,WEI J X,et al.DataSynthesis via Differen-tially Private Markov Random Fields[C]//Proceedings of the VLDB Endowment.New York:ACM,2021:2190-2202. [15]GIUSEPPE V,GRACE T,MARK B,et al.New oracle-efficient algorithms for private synthetic data release[C]//International Conference on Machine Learning(ICML).Vienna:ACM,2020:9765-9774. [16]FANTI G,PIHUR V,ERLINGSSON L.Building a RAPPORwith the Unknown:Privacy-Preserving Learning of Associations and Data Dictionaries[C]//Proceedings on Privacy Enhancing Technologies.2016:41-61. [17]REN X B,YU C M,YU W,et al.High-dimensional crowd-sourced data distribution estimation with local privacy[C]//2016 IEEE International Conference on Computer and Information Technology(CIT).IEEE,2016:226-233. [18]CORMODE G,KULKARNI T,SRIVASTAVA D.Marginal release under local differential privacy[C]//Proceedings of the 2018 International Conference on Management of Data.2018:131-146. [19]WANG T,YANG X Y,REN X B,et al.Locally private high-dimensional crowdsourced data release based on copula functions[J].IEEE Transactions on Services Computing,2019,15(2):778-792. [20]DU L K,ZHANG Z K,BAI S J,et al.AHEAD:adaptive hierarchical decomposition for range query under local differential privacy[C]//Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security.2021:1266-1288. [21]KULKARNI T.Answering range queries under local differential privacy[C]//Proceedings of the 2019 International Conference on Management of Data.2019:1832-1834. [22]LIU L K,ZHOU C L.RCP:Mean Value Protection Technology Under Local Differential Privacy[J].Computer Science,2023,50(2):333-345. [23]WANG N,XIAO X K,YANG Y,et al.Collecting and analyzing multidimensional data with local differential privacy[C]//2019 IEEE 35th International Conference on Data Engineering(ICDE).IEEE,2019:638-649. [24]WARNER S L.Randomized response:A survey technique foreliminating evasive answer bias[J].Journal of the American Statistical Association,1965,60(309):63-69. [25]KAIROUZ P,BONAWITZ K,RAMAGE D.Discrete distribu-tion estimation under local privacy[C]//International Confe-rence on Machine Learning.PMLR,2016:2436-2444. [26]WANG T H,BLOCKI J,LI N H,et al.Locally differentiallyprivate protocols for frequency estimation[C]//26th USENIX Security Symposium(USENIX Security 17).2017:729-745. [27]QARDAJI W,YANG W N,LI N H.Priview:practical differentially private release of marginal contingency tables[C]//Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data.2014:1435-1446. |
|