计算机科学 (Computer Science), 2022, Vol. 49, Issue (7): 350-356. doi: 10.11896/jsjkx.210900229

• Information Security •

基于本地化差分隐私的频率特征提取

黄觉, 周春来   

  1. 中国人民大学信息学院 北京100872
  • 收稿日期:2021-09-27 修回日期:2021-12-20 出版日期:2022-07-15 发布日期:2022-07-12
  • 通讯作者: 周春来(czhou@ruc.edu.cn)
  • 作者简介:(3287401165@qq.com)
  • 基金资助:
    国家自然科学基金重点项目(61732006);国家自然科学基金(61972404,12071478)

Frequency Feature Extraction Based on Localized Differential Privacy

HUANG Jue, ZHOU Chun-lai   

  1. School of Information, Renmin University of China, Beijing 100872, China
  • Received: 2021-09-27 Revised: 2021-12-20 Online: 2022-07-15 Published: 2022-07-12
  • About author: HUANG Jue, born in 1998, postgraduate. His main research interests include uncertainty in artificial intelligence.
    ZHOU Chun-lai, Ph.D., associate professor, is a member of China Computer Federation. His main research interests include uncertainty in AI and privacy in data science.
  • Supported by:
    Key Program of the National Natural Science Foundation of China (61732006) and National Natural Science Foundation of China (61972404, 12071478).

摘要: 大数据时代信息技术不断发展,隐私问题越来越受到人们的关注。尤其是随着移动端的普及,如何在数据发布的同时保护用户个人的隐私信息是当前面临的重大挑战。此前学术界曾提出依赖于可信第三方的中心化差分隐私技术,但在实际应用中可信第三方的条件通常不成立;随后,在中心化差分隐私的基础上进一步提出了本地化差分隐私,它能够防止来自不可信第三方的隐私攻击,并且面对具有任意知识背景的隐私攻击者依然具有很强的防御效果。但是,市场通常不仅要迎合用户的需求,也要满足运营商的要求。为了对两者进行平衡,如何解决运营商的分析任务是亟待解决的问题。RAPPOR(Randomized Aggregatable Privacy-Preserving Ordinal Response)算法能够很好地完成这个任务,它通过使用两次随机响应机制对用户数据进行加密,保证了隐私保护的力度,并使用 Lasso 回归模型对加密数据进行解密,保证了频率特征提取的准确度。文中的贡献在于将RAPPOR算法应用于疫情信息采集,在保护受访者隐私信息的同时能获取真实的疫情资料,以美国各地新冠确诊人数的数据集进行实验,实验结果表明,所提方法较高程度地拟合了真实结果,完成了频率特征提取的分析任务。RAPPOR算法实现了本地化差分隐私技术从理论走向应用,切实保障了个人的隐私问题。

关键词: RAPPOR, 本地化差分隐私, 频率特征, 随机响应

Abstract: As information technology continues to develop in the era of big data, privacy has attracted growing attention. With the spread of mobile devices in particular, protecting users' private information while still publishing data has become a major challenge. Earlier work proposed centralized differential privacy, which relies on a trusted third party, but in practice that assumption usually does not hold. Localized differential privacy was later proposed on the basis of centralized differential privacy. It prevents privacy attacks from untrusted third parties and remains strongly protective against attackers with arbitrary background knowledge. However, the market has to meet the requirements of service providers as well as the needs of users. To balance the two, the analysis tasks of service providers must still be accomplished. The RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response) algorithm handles this task well. It randomizes user data by applying the randomized response mechanism twice, which ensures the strength of privacy protection, and it uses a Lasso regression model to decode the randomized data, which ensures the accuracy of frequency feature extraction. The contribution of this paper is to apply the RAPPOR algorithm to COVID-19 epidemic information collection, so that real epidemic data can be obtained while the privacy of respondents is protected. Experiments on a dataset of confirmed COVID-19 cases across the United States show that the proposed method fits the true results to a high degree and completes the frequency feature extraction task. The RAPPOR algorithm brings localized differential privacy from theory to application and effectively protects personal privacy.

Key words: Frequency characteristics, Localized differential privacy, Random response, RAPPOR

CLC Number (中图分类号):

  • TP311
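
The two-stage randomized response described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several simplifying assumptions: each respondent's value is one-hot encoded rather than hashed into a Bloom filter, and because of that the aggregator can debias each bit count directly instead of fitting the Lasso regression model the paper uses. The parameter names `f`, `p` and `q` follow the usual RAPPOR convention; the helper names and the category counts are hypothetical and synthetic, not the paper's COVID-19 data.

```python
import random


def one_hot(index, k):
    """Encode a category index as a k-bit one-hot vector."""
    return [1 if i == index else 0 for i in range(k)]


def permanent_rr(bits, f, rng):
    """First randomization: with probability f, replace a bit by a fair coin."""
    out = []
    for b in bits:
        u = rng.random()
        if u < f / 2:
            out.append(1)
        elif u < f:
            out.append(0)
        else:
            out.append(b)
    return out


def instantaneous_rr(bits, p, q, rng):
    """Second randomization: report 1 with probability q if the bit is 1, else p."""
    return [1 if rng.random() < (q if b else p) else 0 for b in bits]


def estimate_counts(reports, f, p, q):
    """Debias per-bit counts of 1s into estimated true counts per category."""
    n = len(reports)
    k = len(reports[0])
    # Probability that a reported bit is 1 when the true bit is 0 / 1.
    p_star = 0.5 * f * (p + q) + (1 - f) * p
    q_star = 0.5 * f * (p + q) + (1 - f) * q
    ones = [sum(r[i] for r in reports) for i in range(k)]
    return [(c - p_star * n) / (q_star - p_star) for c in ones]


def simulate(true_counts, f, p, q, seed=0):
    """Randomize every respondent locally, then estimate category counts."""
    rng = random.Random(seed)
    k = len(true_counts)
    reports = []
    for index, count in enumerate(true_counts):
        for _ in range(count):
            memo = permanent_rr(one_hot(index, k), f, rng)
            reports.append(instantaneous_rr(memo, p, q, rng))
    return estimate_counts(reports, f, p, q)


if __name__ == "__main__":
    # Synthetic counts for four hypothetical regions (illustrative only).
    synthetic = [5000, 3000, 1500, 500]
    estimates = simulate(synthetic, f=0.25, p=0.25, q=0.75)
    for true_count, estimate in zip(synthetic, estimates):
        print(true_count, round(estimate))
```

In this sketch the collector only ever sees the twice-randomized bit vectors, and the estimates approach the true counts as the number of respondents grows. With a Bloom-filter encoding, several categories share bits, which is where a regression step such as Lasso would be needed to separate them.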