计算机科学 ›› 2023, Vol. 50 ›› Issue (10): 315-326.doi: 10.11896/jsjkx.221000053

• 信息安全 • 上一篇    下一篇

效用优化的本地差分隐私联合分布估计机制

尹诗玉, 朱友文, 张跃   

  1. 南京航空航天大学计算机科学与技术学院 南京211106
  • 收稿日期:2022-10-09 修回日期:2023-02-15 出版日期:2023-10-10 发布日期:2023-10-10
  • 通讯作者: 朱友文(zhuyw@nuaa.edu.cn)
  • 作者简介:(shiyu.y@nuaa.edu.cn)
  • 基金资助:
    国家重点研发计划(2020YFB1005900);国家自然科学基金(62172216);江苏省自然科学基金(BK20211180);广西密码学与信息安全重点实验室研究课题(GCIS202107)

Utility-optimized Local Differential Privacy Joint Distribution Estimation Mechanisms

YIN Shiyu, ZHU Youwen, ZHANG Yue   

  1. College of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China
  • Received:2022-10-09 Revised:2023-02-15 Online:2023-10-10 Published:2023-10-10
  • About author:YIN Shiyu,born in 1998,postgraduate,is a student member of China Computer Federation.Her main research interests include data security and differential privacy.ZHU Youwen,born in 1986,Ph.D,professor,is a member of China Computer Federation.His main research interests include data security,privacy computing,and applied cryptography.
  • Supported by:
    National Key Research and Development Program of China(2020YFB1005900),National Natural Science Foundation of China(62172216),Natural Science Foundation of Jiangsu Province,China(BK20211180) and Guangxi Key Laboratory of Cryptography and Information Security(GCIS202107).

摘要: 相对于传统的中心化差分隐私,本地差分隐私(Local Differential Privacy,LDP) 具有不依赖可信第三方等优势,但也存在数据效用较低的问题。效用优化本地差分隐私模型ULDP(Utility-Optimized Local Differential Privacy) 利用不同输入的敏感度差异,可以提升估计结果的准确度。二维数据联合分布计算可广泛应用于众多数据分析场景,然而,如何在ULDP模型下实现二维数据联合分布估计,仍然是尚未解决的重要问题。针对这一问题,首先给出了二维ULDP模型的定义,兼顾了两个属性分别敏感与否的4种情况;其次,在该模型下,针对联合分布估计问题,提出了JuRR(Joint Utility-Optimized Randomized Response) 与CPRR(Cartesian Product Randomized Response) 2种机制,并理论证明了估计结果的无偏性;最后,在真实数据集上进行对比实验,讨论了不同参数对估计误差的影响。实验结果表明,所提2种机制具有更高的数据效用。

关键词: 本地差分隐私, 效用优化, 联合分布, 频率估计, 敏感度差异

Abstract: Compared with traditional centralized differential privacy,local differential privacy(LDP) has the advantage of not re-lying on trusted third parties,but it also has the problem of low data utility.The utility-optimized local differential privacy(ULDP) can improve the accuracy of estimation results by taking advantage of the sensitivity differences of different inputs.Two-dimensional data joint distribution calculation can be widely used in many data analysis scenarios.However,how to realize two-dimensional data joint distribution estimation under the ULDP model is still an important problem that has not yet been solved.Aiming at this problem,the definition of the two-dimensional ULDP model is given first,taking into account the four cases of whether the two attributes are sensitive or not.Secondly,under this model,for the joint distribution estimation problem,two mechanisms joint utility-optimized randomized response(JuRR) and cartesian product randomized response(CPRR) are proposed,and the unbiasedness of the estimation results is proved theoretically.Finally,comparative experiments are carried out on real datasets to discuss the influence of different parameters on the estimation error.Experimental results show that the proposed two mechanisms have better data utility.

Key words: Local differential privacy, Utility-optimized, Joint distribution, Frequency estimation, Difference in sensitivity

中图分类号: 

  • TP309
[1]SWEENEY L.k-anonymity:A Model for Protecting Privacy[J].International Journal of Uncertainty,Fuzziness and Know-ledge-Based Systems,2002,10(5):557-570.
[2]MACHANAVAJJHALA A,KIFER D,GEHRKE J,et al.l-diversity:Privacy beyond k-anonymity[C]//22nd International Conference on Data Engineering.2006.
[3]LI N,LI T,VENKATASUBRAMANIAN S.t-closeness:Privacy beyond k-anonymity and l-diversity[C]//IEEE 23rd International Conference on Data Engineering.Istanbul:IEEE Press,2007:106-115.
[4]YAO A C.Protocols for Secure Computations[C]//23rd An-nual Symposium on Foundations of Computer Science.Chicago:IEEE Press,1982:160-164.
[5]DWORK C.Differential privacy:A survey of results[C]//International Conference on Theory and Applications of Models of Computation.Berlin,Heidelberg:Springer Press,2008:1-19.
[6]KASIVISWANATHAN S P,LEE H K,NISSIM K,et al.What Can We Learn Privately?[J].SIAM Journal on Computing,2011,40(3):793-826.
[7]DUCHI J C,JORDAN M I,WAINWRIGHT M J.Local Privacy and Statistical Minimax Rates[C]//IEEE 54th Annual Symposium on Foundations of Computer Science.Berkeley:IEEE Press,2013:429-438.
[8]WANG T,BLOCKI J,LI N,et al.Locally Differentially Private Protocols for Frequency Estimation[C]//26th USENIX Security Symposium.Vancouver:USENIX Press,2017:729-745.
[9]DUCHI J C,JORDAN M I,WAINWRIGHT M J.Minimax Optimal Procedures for Locally Private Estimation[J].Journal of the American Statistical Association,2018,113(521):182-201.
[10]WANG N,XIAO X,YANG Y,et al.Collecting and Analyzing Multidimensional Data with Local Differential Privacy[C]//IEEE 35th Inter-national Conference on Data Engineering.Macao:IEEE Press,2019:638-649.
[11]ERLINGSSON Ú,PIHUR V,KOROLOVA A.Rappor:Ran-domized Aggregatable Privacy-Preserving Ordinal Response[C]//Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security.Scottsdale:ACM Press,2014:1054-1067.
[12]TEAM A D P.Learning with Privacy at Scale[J].Apple Mach.Learn.J,2017,1(8):1-25.
[13]SAMARATI P,SWEENEY L.Generalizing Data to Provide Anonymity when Disclosing Information[C]//PODS.Seattle:ACM Press,1998,98(188):1045-1145.
[14]YILMAZ E,AL-RUBAIE M,CHANG J M.Naive Bayes Classification under Local Differential Privacy[C]//IEEE 7th International Conference on Data Science and Advanced Analytics.Adelaide:IEEE Press,2020:709-718.
[15]MURAKAMI T,KAWAMOTO Y.Utility-Optimized Local Differential Privacy Mecha-nisms for Distribution Estimation[C]//28th USENIX Security Symposium.Santa Clara:USENIX Press,2019:1877-1894.
[16]CORMODE G,MADDOCK S,MAPLE C.Frequency Estimation under Local Differential Privacy[J].Proceedings of the VLDB Endowment,2021,14(11):2046-2058.
[17]WANG T,LI N,JHA S.Locally Differentially Private Frequent Itemset Mining[C]//2018 IEEE Symposium on Security and Privacy.San Francisco:IEEE Press,2018:127-143.
[18]ZHU S X,WANG L,SUN G L.A Perturbation Mechanism for Classified Transformation Satisfying Local Differential Privacy [J].Journal of Computer Research and Development,2022,59(2):430-439.
[19]ZHANG Z,WANG T,LI N,et al.Calm:Consistent Adaptive Local Marginal for Marginal Release under Local Differential Privacy[C]//Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security.New York:ACM Press,2018:212-229.
[20]YE Q,HU H,MENG X,et al.PrivKV:Key-Value Data Collection with Local Differential Privacy[C]//2019 IEEE Symposium on Security and Privacy.San Francisco:IEEE Press,2019:317-331.
[21]GU X,LI M,CHENG Y,et al.PCKV:Locally Differentially Private Correlated Key-Value Data Collection with Optimized Utility[C]//29th USENIX Security Symposium.Boston:USENIX Press,2020:967-984.
[22]YE Q,HU H,MENG X,et al.PrivKVM*:Revisiting Key-Va-lue Statistics Estimation with Local Differential Privacy[J].IEEE Transactions on Dependable and Secure Computing,2021,14(8):1-18.
[23]TIAN F,WU Z Q,LU L F,et al.A Sample Based Personalized Differential Privacy Mechanism for Trajectory Data Publication[J].Chinese Journal of Computers,2021,44(4):709-723.
[24]BALLE B,BELL J,GASCÓN A,et al.The Privacy Blanket of The Shuffle Model[C]//Annual International Cryptology Conference.Cham:Springer Press,2019:638-667.
[25]LIU Y F,WANG N,WANG Z G.Collecting and AnalyzingMultidimensional Categorical Data Under Shuffled Differential Privacy[J].Journal of Software,2022,33(3):1093-1110.
[26]GU X,LI M,XIONG L,et al.Providing Input-Discriminative Protection for Local Differential Privacy[C]//IEEE 36th International Conference on Data Engineering.Dallas:IEEE Press,2020:505-516.
[27]UCI MACHINE LEARNING REPOSITORY.Nursery Data Set [DB/OL].[2022-05-16].https://archive.ics.uci.edu/ml/datasets/Nursery.
[28]KAGGLE.Performance in Exams [DB/OL].[2022-06-25].https://www.kaggle.com/datasets/spscientist/students-perfor-mance-in-exams.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!