计算机科学 ›› 2023, Vol. 50 ›› Issue (4): 265-276.doi: 10.11896/jsjkx.220500292
赵禹齐, 杨敏
ZHAO Yuqi, YANG Min
摘要: 在过去的十年里,普遍的数据收集已经成为常态。随着大规模数据分析和机器学习的快速发展,数据隐私正面临着根本性的挑战。探索隐私保护和数据收集与分析之间的权衡是一个关键的科学问题。差分隐私已经成为实际上的数据隐私标准并得到了广泛的研究与应用,该技术可通过一定的随机化机制为用户数据提供严格的隐私保护。文中给出了差分隐私技术的全面概述,总结并分析了差分隐私的最新进展。具体来说,首先给出了差分隐私的理论总结,包括中心化模型、本地化模型和近年提出的洗牌模型,并对它们作了详细比较,分析了不同模型的优势和缺点。接着,在3个模型的基础上,从算法的角度介绍并分析了文献中一些典型的差分隐私机制,然后介绍了当前差分隐私技术在多个领域的应用。最后介绍了一些关于差分隐私的新研究课题,它们为差分隐私技术拓展了丰富的研究方向。
中图分类号:
[1]CPC CENTRAL COMMITTEE and STATE COUNCIL.Opi-nions on building a more perfect market-oriented allocation system and mechanism of factors[EB/OL].(2020-04-09)[2022-05-22].http://www.gov.cn/zhengce/2020-04/09/content_5500622.htm. [2]EUROPEAN UNION.General Data Protection Regulation[EB/OL].(2018-05-25)[2022-05-22].https://gdpr-info.eu/. [3]THE NATIONAL PEOPLE’S CONGRESS OF CHINA.Data Security Law of the People’s Republic of China[EB/OL].(2021-06-10)[2022-05-22].http://www.npc.gov.cn/npc/c30834/202106/7c9af12f51334a73b56d7938f99a788a.shtml. [4]DWORK C,KENTHAPADI K,MCSHERRY F,et al.Our Data,Ourselves:Privacy Via Distributed Noise Generation[C]//International Conference on Advances in Cryptology.Berlin:Springer,2006:486-503. [5]TENG W,ZHANG X F,FENG J Y,et al.A Comprehensive Survey on Local Differential Privacy Toward Data Statistics and Analysis in Crowdsensing[J].Sensors.2020,20(24):7030-7077. [6]KASIVISWANATHAN S P,LEE H K,NISSIM K,et al.What can we learn privately?[J].SIAM Journal on Computing,2011,40(3):793-826. [7]BITTAU A,ERLINGSSON Ú,MANIATIS P,et al.Prochlo:Strong Privacy for Analytics in the Crowd[C]//Proceedings of the 26th Symposium on Operating Systems Principles.New York:ACM,2017:441-459. [8]CHEU A,SMITH A,ULLMAN J,et al.Distributed Differential Privacy via Shuffling[C]//Advances in Cryptology-EUROCRYPT 2019.Berlin:Springer,2019:375-403. [9]CHAN T H H,SHI E,SONG D.Optimal Lower Bound for Differentially Private Multi-party Aggregation[C]//Algorithms- ESA 2012.ESA 2012.Berlin:Springer,2012:277-288. [10]BEIMEL A,NISSIM K,OMRI E.Distributed Private DataAnalysis:Simultaneously Solving How and What[C]//Advances in Cryptology-CRYPTO 2008.Berlin:Springer,2008:451-468. [11]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating Noise to Sensitivity in Private Data Analysis[C]//Theory of Cryptography.Berlin:Springer,2006:265-284. [12]ULLMAN J.Tight lower bounds for locally differentially private selection.[EB/OL].(2018-03-09)[2022-05-22].https://www.thetalkingmachines.com/sites/default/files/feeds/2018/02/10/15/1802.02638.pdf. [13]BAFNA M,ULLMAN J.The price of selection in differential privacy[C]//Proceedings of The 30th Conference on Learning Theory.New York:PMLR,2017:151-168. [14]MCSHERRY F,TALWAR K.Mechanism Design via Differen-tial Privacy[C]//Proceedings of the 48th Annual IEEE Sympo-sium on Foundations of Computer Science.Piscataway:IEEE,2007:94-103. [15]VADHAN S.The Complexity of Differential Privacy[M].Berlin:Springer,2017:347-450. [16]BASSILY R,SMITH A.Local,Private,Efficient Protocols forSuccinct Histograms[C]//Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing.New York:ACM,2015:127-135. [17]DWORK C,ROTH A.The algorithmic foundations of differential privacy[J].Found.Trends Theor.Comput.Sci.,2014,9(3/4):211-407. [18]WARNER S L.Randomized Response:A Survey Technique for Eliminating Evasive AnswerBias[J].Journal of the American Statistical Association,1965,60(309):63-69. [19]KAIROUZ P,BONAWITZ K,RAMAGE D.Discrete distribu-tion estimation under local privacy[C]//International Confe-rence on Machine Learning.New York:PMLR,2016:2436-2444. [20]WANG S W,HUANG L S,WANG P Z,et al.Private Weighted Histogram Aggregation in Crowdsourcing[C]//Wireless Algorithms,Systems,and Applications.Berlin:Springer,2016:250-261. [21]WANG T H,BLOCKI J,LI N H,et al.Locally differentiallyprivate protocols for frequency estimation[C]//Proceedings of the 26th USENIX Conference on Security Symposium.Berkeley:USENIX Association,2017:729-745. [22]WANG S W,HUANG L S,NIE Y W,et al.Local Differential Private Data Aggregation for Discrete Distribution Estimation[J].IEEE Transactions on Parallel and Distributed Systems,2019,30(9):2046-2059. [23]ACHARYA J,SUN Z,ZHANG H.Hadamard response:Estimating distributions privately,efficiently,and with little communication[C]//The 22nd International Conference on Artificial Intelligence and Statistics.PMLR,2019:1120-1129. [24]YE Q Q,HU H B,MENG X F,et al.PrivKV:Key-value data collection with local differential privacy[C]//2019 IEEE Symposium on Security and Privacy(SP).Piscataway,NJ:IEEE,2019:317-331. [25]GU X L,LI M,CHENG Y Q,et al.PCKV:Locally Differentially Private Correlated Key-ValueData Collection with Optimized Utility[C]//29th USENIX Security Symposium(USENIX Security 20).Berkeley:USENIX,2020:967-984. [26]WANG N,XIAO X K,YANG Y,et al.Collecting and analyzing multidimensional data with local differential privacy[C]//2019 IEEE 35th International Conference on Data Engineering(ICDE).IEEE,2019:638-649. [27]WANG T,ZHAO J,HU Z,et al.Local Differential Privacy for data collection and analysis[J].Neurocomputing,2021,426(8):114-133. [28]DUCHI J C,JORDAN M I,WAINWRIGHT M J.Minimax optimal procedures for locally private estimation[J].Journal of the American Statistical Association,2018,113(521):182-201. [29]LI N H,QARDAJI W,SU D.On sampling,anonymization,and differential privacy or,k-anonymization meets differential privacy[C]//Proceedings of the 7th ACM Symposium on Information,Computer and Communications Security.2012:32-33. [30]ARCOLEZI H H,COUCHOT J F,AL BOUNA B,et al.Random sampling plus fake data:Multidimensional frequency estimates with local differential privacy[C]//Proceedings of the 30th ACM International Conference on Information & Know-ledge Management.2021:47-57. [31]WANG T H,LOPUHAÄ-ZWAKENBERG M,LI Z T,et al.Locally differentially private frequency estimation with consistency[C]//27th Annual Network and Distributed System Security Symposium.The Internet Society,2020. [32]JIA J Y,GONG N Z.Calibrate:Frequency estimation and heavy hitter identification with local differential privacy via incorporating prior knowledge[C]//IEEE INFOCOM 2019-IEEE Conference on Computer Communications.Piscataway,NJ:IEEE,2019:2008-2016. [33]BELL J H,BONAWITZ K A,GASCÓN A,et al.Secure single-server aggregation with(poly)logarithmic overhead[C]//Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security.New York:ACM,2020:1253-1269. [34]BALLE B,BELL J,GASCÓN A,et al.The privacy blanket of the shuffle model[C]//Annual International Cryptology Conference.Berlin:Springer,2019:638-667. [35]BALLE B,BELL J,GASCÓN A,et al.Private Summation in the Multi-Message Shuffle Model[C]//Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security.New York:ACM,2020:657-676. [36]ISHAI Y,KUSHILEVITZ E,OSTROVSKY R,et al.Crypto-graphy from anonymity[C]//2006 47th Annual IEEE Sympo-sium on Foundations of Computer Science(FOCS’06).Pisca-taway,NJ:IEEE,2006:239-248. [37]CHAUDHURI K,MONTELEONI C.Privacy-preserving logistic regression[C]//Advances in Neural Information Processing Systems 21,Proceedings of the Twenty-Second Annual Confe-rence on Neural Information Processing Systems.New York:Curran Associates Inc.2008:289-296. [38]ZHANG J,ZHANG Z J,XIAO X K,et al.Functional Mechanism:Regression Analysis under Differential Privacy[J].Proceedings of the VLDB Endowment,2012,5(11):1364-1375. [39]WU N,FAROKHI F,SMITH D,et al.The value of collaboration in convex machine learning with differential privacy[C]//2020 IEEE Symposium on Security and Privacy(SP).Pisca-taway,NJ:IEEE,2020:304-317. [40]DENG L,CHEN X T,ZHANG Q H,et al.Differential privacy protection algorithms based on tree model[J].Journal of Chongqing University of Posts and Telecommunications(Natural Science Edition),2020,32(5):848-849. [41]GHANE S,JOLFAEI A,KULIK L,et al.Preserving privacy in the internet of connected vehicles[J].IEEE Transactions on Intelligent Transportation Systems,2020,22(8):5018-5027. [42]ZHAO P,ZHANG G L,WAN S H,et al.A survey of local differential privacy for securing internet of vehicles[J].The Journal of Supercomputing,2020,76(11):8391-8412. [43]JORGENSEN Z,YU T,CORMODE G.Publishing attributedsocial graphs with formal privacy guarantees[C]//Proceedings of the 2016 International Conference on Management of Data.New York:ACM,2016:107-122. [44]KASIVISWANATHAN S P,NISSIM K,RASKHODNIKOVA S,et al.Analyzing graphs with node differential privacy[C]//Theory of Cryptography Conference.Berlin:Springer,2013:457-476. [45]DAY W Y,LI N H,LYU M.Publishing graph degree distribution with node differential privacy[C]//Proceedings of the 2016 International Conference on Management of Data.New York:ACM,2016:123-138. [46]KARWA V,RASKHODNIKOVA S,SMITH A,et al.Private analysis of graph structure[J].Proceedings of the VLDB Endowment,2011,4(11):1146-1157. [47]RASKHODNIKOVA S,SMITH A.Lipschitz extensions fornode-private graph statistics and the generalized exponential mechanism[C]//2016 IEEE 57th Annual Symposium on Foundations of Computer Science(FOCS).Piscataway,NJ:IEEE,2016:495-504. [48]SALA A,ZHAO XIAOHAN,WILSON C,et al.Sharing graphs using differentially private graph models[C]//Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference.New York:ACM,2011:81-98. [49]XIA S Y,CHANG B Z,KNOPF K,et al.DPGraph:A Benchmark Platform for Differentially Private Graph Analysis[C]//Proceedings of the 2021 International Conference on Management of Data.New York:ACM,2021:2808-2812. [50]BLOCKI J,BLUM A,DATTA A,et al.The johnson-linden-strauss transform itself preserves differential privacy[C]//2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.Piscataway,NJ:IEEE,2012:410-419. [51]BLONDEL V D,GUILLAUME J L,LAMBIOTTE R,et al.Fast unfolding of communities in large networks[J].Journal of Statistical Mechanics:Theory and Experiment,2008,2008(10):P10008. [52]QIN Z,YU T,YANG Y,et al.Generating synthetic decentra-lized social graphs with local differential privacy[C]//Procee-dings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.New York:ACM,2017:425-438. [53]SESHADHRI C,KOLDA T G,PINAR A.Community structure and scale-free collections of Erdc″s-Rényi graphs[J].Physical Review E,2012,85(5):056109. [54]ZHANG H L,LATIF S,BASSILY R,et al.Differentially-Private Control-Flow Node Coverage for Software Usage Analysis.[C]//29th USENIX Security Symposium(USENIX Security 20),Berkeley,CA:USENIX,2020. [55]GORYCZKA S,XIONG L.A comprehensive comparison of mul-tiparty secure additions with differentialprivacy[J].IEEE Transactions on Dependable and Secure Computing,2015,14(5):463-477. [56]BOEHLER J,KERSCHBAUM F.Secure sublinear time diffe-rentially private median computation[C]//Network and Distri-buted System Security Symposium.The Internet Society,2020. [57]PATEL A A,DHARWA J N.An integrated hybrid recommendation model using graph database[C]//2016 International Conference on ICT in Business Industry & Government(ICTBIG).Piscataway,NJ:IEEE,2016:1-5. [58]XIONG P,ZHU T Q,WANG X F.A survey on differential privacy protection and its application[J].Chinese Journal of Computers,2014,37(1):101-122. [59]CALANDRINO J A,KILZER A,NARAYANAN A,et al. You might also like:Privacy risks of collaborative filtering[C]//2011 IEEE Symposium on Security and Privacy.Piscataway,NJ:IEEE,2011:231-246. [60]MCSHERRY F,MIRONOV I.Differentially private recommender systems:Building privacy into the netflix prize contenders[C]//Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2009:627-636. [61]SHIN H,KIM S,SHIN J,et al.Privacy enhanced matrix factorization for recommendation with local differential privacy[J].IEEE Transactions on Knowledge and Data Engineering,2018,30(9):1770-1782. [62]GAO C,HUANG C,LIN D S,et al.DPLCF:differentially private local collaborative filtering[C]//Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval.New York:ACM,2020:961-970. [63]ANDRÉS M E,BORDENABE N E,CHATZIKOKOLAKIS K,et al.Geo-indistinguishability:Differential privacy for location-based systems[C]//Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security.New York:ACM,2013:901-914. [64]ZHAO X G,LI Y H,YUAN Y,et al.LDPart:effective location-record data publication via local differential privacy[J].IEEE Access,2019,7:31435-31445. [65]MCSHERRY F D.Privacy integrated queries:an extensible platform for privacy-preserving data analysis[C]//Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data.New York:ACM,2009:19-30. [66]BARTHE G,FARINA G P,GABOARDI M,et al.Differentially private bayesian programming[C]//Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.New York:ACM,2016:68-79. [67]ZHANG D F,KIFER D.LightDP:Towards automating diffe-rential privacy proofs[C]//Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages.New York:ACM,2017:888-901. [68]WINOGRAD-CORT D,HAEBERLEN A,ROTH A,et al.Aframework for adaptive differential privacy[J].Proceedings of the ACM on Programming Languages,2017,1(ICFP):1-29. [69]ZHANG D,MCKENNA R,KOTSOGIANNIS I,et al.EKTELO:A framework for defining differentially-private computations[C]//Proceedings of the 2018 International Conference on Management of Data.2018:115-130. [70]LOBO-VESGA E,RUSSO A,GABOARDI M.A programming framework for differential privacy with accuracy concentration bounds[C]//2020 IEEE Symposium on Security and Privacy(SP).Piscataway,NJ:IEEE,2020:411-428. [71]CHEN Y,MACHANAVAJJHALA A.On the privacy properties of variants on the sparse vector technique[J].arXiv:1508.07306,2015. [72]LYU M,SU D,LI N.Understanding the Sparse Vector Technique for Differential Privacy[J].Proceedings of the VLDB Endowment,2017,10(6):637-648. [73]MCSHERRY F.Uber’s differential privacy.probably isn’t[EB/OL].(2018-02-25)[2022-05-22].https://github.com/frankmcsherry/blog/blob/master/posts/2018-02-25.md. [74]BICHSEL B,GEHR T,DRACHSLER-COHEN D,et al.DP-Finder:Finding differential privacy violations by sampling and optimization[C]//Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security.New York:ACM,2018:508-524. [75]WANG Y X,DING Z Y,KIFER D,et al.CheckDP:An automated and integrated approach for proving differential privacy or finding precise counterexamples[C]//Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security.New York:ACM,2020:919-938. [76]BALCER V,VADHAN S.Differential privacy on finite compu-ters[J].arXiv:1709.05396,2017. [77]MIRONOV I.On significance of the least significant bits for differential privacy[C]//Proceedings of the 2012 ACM Confe-rence on Computer and Communications Security.New York:ACM,2012:650-661. [78]GAZEAU I,MILLER D,PALAMIDESSI C.Preserving differential privacy under finite-precision semantics[J].Theoretical Computer Science,2016,655(Pt.B):92-108. [79]ILVENTO C.Implementing the exponential mechanism withbase-2 differential privacy[C]//Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Secu-rity.New York:ACM,2020:717-742. [80]XIONG A P,WANG T H,LI N H,et al.Towards effective differential privacy commu-nication for users’ data sharing decision and comprehension[C]//2020 IEEE Symposium on Security and Privacy(SP).Piscataway,NJ:IEEE,2020:392-410. [81]CUMMINGS R,KAPTCHUK G,REDMILES E M.I need abetter description:An Investigation Into User Expectations for Differential Privacy[C]//Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security.New York:ACM,2021:3037-3052. |
[1] | 彭钺峰, 赵波, 刘会, 安杨. 针对机器学习的成员推断攻击综述 Survey on Membership Inference Attacks Against Machine Learning 计算机科学, 2023, 50(3): 351-359. https://doi.org/10.11896/jsjkx.220100016 |
[2] | 刘利康, 周春来. RCP:本地差分隐私下的均值保护技术 RCP:Mean Value Protection Technology Under Local Differential Privacy 计算机科学, 2023, 50(2): 333-345. https://doi.org/10.11896/jsjkx.220700273 |
[3] | 徐苗苗, 陈珍萍. 基于对称加密和双层真值发现的连续群智感知激励机制 Incentive Mechanism for Continuous Crowd Sensing Based Symmetric Encryption and Double Truth Discovery 计算机科学, 2023, 50(1): 294-301. https://doi.org/10.11896/jsjkx.220400101 |
[4] | 陈得鹏, 刘肖, 崔杰, 何道敬. 面向机器学习的成员推理攻击综述 Survey of Membership Inference Attacks for Machine Learning 计算机科学, 2023, 50(1): 302-317. https://doi.org/10.11896/jsjkx.220800227 |
[5] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于分层抽样优化的面向异构客户端的联邦学习 Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients 计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263 |
[6] | 汤凌韬, 王迪, 张鲁飞, 刘盛云. 基于安全多方计算和差分隐私的联邦学习方案 Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy 计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108 |
[7] | 吕由, 吴文渊. 隐私保护线性回归方案与应用 Privacy-preserving Linear Regression Scheme and Its Application 计算机科学, 2022, 49(9): 318-325. https://doi.org/10.11896/jsjkx.220300190 |
[8] | 黄觉, 周春来. 基于本地化差分隐私的频率特征提取 Frequency Feature Extraction Based on Localized Differential Privacy 计算机科学, 2022, 49(7): 350-356. https://doi.org/10.11896/jsjkx.210900229 |
[9] | 王健. 基于隐私保护的反向传播神经网络学习算法 Back-propagation Neural Network Learning Algorithm Based on Privacy Preserving 计算机科学, 2022, 49(6A): 575-580. https://doi.org/10.11896/jsjkx.211100155 |
[10] | 李利, 何欣, 韩志杰. 群智感知的隐私保护研究综述 Review of Privacy-preserving Mechanisms in Crowdsensing 计算机科学, 2022, 49(5): 303-310. https://doi.org/10.11896/jsjkx.210400077 |
[11] | 王美珊, 姚兰, 高福祥, 徐军灿. 面向医疗集值数据的差分隐私保护技术研究 Study on Differential Privacy Protection for Medical Set-Valued Data 计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032 |
[12] | 吕由, 吴文渊. 基于同态加密的线性系统求解方案 Linear System Solving Scheme Based on Homomorphic Encryption 计算机科学, 2022, 49(3): 338-345. https://doi.org/10.11896/jsjkx.201200124 |
[13] | 孔钰婷, 谭富祥, 赵鑫, 张正航, 白璐, 钱育蓉. 基于差分隐私的K-means算法优化研究综述 Review of K-means Algorithm Optimization Based on Differential Privacy 计算机科学, 2022, 49(2): 162-173. https://doi.org/10.11896/jsjkx.201200008 |
[14] | 杨鸿健, 胡学先, 李可佳, 徐阳, 魏江宏. 隐私保护的非线性联邦支持向量机研究 Study on Privacy-preserving Nonlinear Federated Support Vector Machines 计算机科学, 2022, 49(12): 22-32. https://doi.org/10.11896/jsjkx.220500240 |
[15] | 瞿祥谋, 吴映波, 蒋晓玲. 一种非独立同分布问题下的联邦数据增强算法 Federated Data Augmentation Algorithm for Non-independent and Identical Distributed Data 计算机科学, 2022, 49(12): 33-39. https://doi.org/10.11896/jsjkx.220300031 |
|