计算机科学 ›› 2024, Vol. 51 ›› Issue (3): 368-377.doi: 10.11896/jsjkx.230100013

• 信息安全 • 上一篇    

基于差分隐私的人口普查关联多属性数据发布

尤菲芙, 蔡剑平, 孙岚   

  1. 福州大学计算机与大数据学院 福州350108
  • 收稿日期:2023-01-02 修回日期:2023-04-26 出版日期:2024-03-15 发布日期:2024-03-13
  • 通讯作者: 蔡剑平(jpingcai@163.com)
  • 作者简介:(youfeifu97@163.com)

Census Associated Multiple Attributes Data Release Based on Differential Privacy

YOU Feifu, CAI Jianping, SUN Lan   

  1. College of Computer and Data Science,Fuzhou University,Fuzhou 350108,China
  • Received:2023-01-02 Revised:2023-04-26 Online:2024-03-15 Published:2024-03-13
  • About author:YOU Feifu,born in 1997,postgraduate.Her main research interests include privacy protection and data security.CAI Jianping,born in 1990,Ph.D.His main research interests include differential privacy,federated learning,machine learning and optimization theory.

摘要: 发布未经保护的人口普查统计数据有泄露居民个人隐私信息的风险。基于差分隐私的人口普查数据保护方案已经得到研究者的广泛关注。在解决人口普查统计数据的地理区域之间的一致性约束时,具有更复杂层次性、一致性约束的关联多属性数据在现有方法下面临无法在单棵层次树中构建的挑战。文中提出了一种基于差分隐私的人口普查区域内部关联多属性统计数据最优一致发布方法,该方法能够实现复杂一致性约束统计数据的高效发布。首先将复杂的关联多属性之间的一致性约束划分为相对独立且易于求解的多重一致性约束,然后根据人口普查关联多属性数据的结构特性,通过数学分析在现有方法的基础上进行进一步的效率优化,最后结合多重一致性约束问题的逼近方法实现最优一致发布。在真实的人口普查数据集和合成数据集上进行实验,结果表明,所提方法能够在效率表现上优于同类方法1~2个数量级的同时保持与同类方法一致的精度。

关键词: 差分隐私, 隐私保护, 数据发布, 一致性约束, 人口普查

Abstract: The release of unprotected census statistics carries the risk of revealing residents' personal privacy information.Census data protection solutions based on differential privacy have received substantial attention from researchers.Existing methods address the consistency constraint among geographic regions of census statistics,but associated multi-attribute data with more complex hierarchical consistency constraints face the challenge of being unable to build in a single hierarchical tree under existing methods.In this paper,we propose a differentially privacy method for optimally consistent release of associated multiple attributes statistics within census regions,which can achieve efficient release of statistics with complex consistency constraints.Firstly,the consistency constraints among the complex associated multiple attributes are divided into relatively independent and easily solved multiple consistency constraints.Then,based on the structural characteristics of the census associated multiple attributes data,mathematical analysis is used to further optimize the efficiency based on the existing methods.Finally,the optimal consistent release is achieved by combining the approximation method of the multiple consistency constraints problem.Experiments on real census datasets and synthetic datasets show that the proposed method can outperform similar methods in efficiency performance by one to two orders of magnitude while maintaining the same accuracy as similar methods.

Key words: Differential privacy, Privacy protection, Data release, Consistency constraints, Census

中图分类号: 

  • TP30
[1]NING J Z.Major figures on 2020 population census of China[J].China Statistics,2021(5):4-5.
[2]HU Y,LI R.Method discussion on 2020 population census of China--based on the small census in 2015[J].The World of Survey and Research,2017(7):51-54.
[3]DINUR I,NISSIM K.Revealing information while preserving privacy[C]//Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.New York:Association for Computing Machinery,2003:202-210.
[4]CHEN W Q.Reform and development of the Chinese population census[J].The World of Survey and Research,2012(11):48-52.
[5]HAY M,RASTOGI V,MIKLAU G,et al.Boosting the accuracy of differentially private histograms through consistency[J].Proceedings of the VLDB Endowment,2010,3(1/2):1021-1032.
[6]WANG N,XIAO X,YANG Y,et al.PrivTrie:Effective Fre-quent Term Discovery under Local Differential Privacy[C]//2018 IEEE 34th International Conference on Data Engineering(ICDE).Paris:IEEE,2018:821-832.
[7]CAI J P,LIU X M,LI J Y,et al.Generation Matrix:An Embeddable Matrix Representation for Hierarchical Trees[J].arXiv:2201.11297,2022.
[8]ABOWD J M.The US Census Bureau adopts differential privacy[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.New York:Association for Computing Machinery,2018:2867-2867.
[9]FIORETTO F,VAN HENTENRYCK P,ZHU K.Differential privacy of hierarchical census data:An optimization approach[J].Artificial Intelligence,2021,296:103475.
[10]ABOWD J,ASHMEAD R,SIMSON G,et al.Census topdown:Differentially private data,incremental schemas,and consistency with public knowledge[R/OL].Washington:US Census Bureau,2019.https://github.com/uscensusbureau/census2020-das-e2e/blob/master/doc/20190711_0945_Consistency_for_Large_Scale_Differentially_Private_Histograms.pdf.
[11]KUO Y H,CUIU C C,KIFER D,et al.Differentially private hierarchical count-of-counts histograms[J].Proceedings of the VLDB Endowment,2018,11(11):1509-1521.
[12]CAI J P,LIU X M,XIONG J B,et al.Approximation method of multiple consistency constraint under differential privacy [J].Journal on Communications,2021,42(6):107-117.
[13]QARDAJI W,YANG W,LI N.Understanding hierarchicalmethods for differentially private histograms[J].Proceedings of the VLDB Endowment,2013,6(14):1954-1965.
[14]DING B,WINSLETT M,HAN J,et al.Differentially private data cubes:optimizing noise sources and consistency[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data.New York:Association for Computing Machinery,2011:217-228.
[15]LI C,HAY M,MIKLAU G,et al.A Data-and Workload-Aware Algorithm for Range Queries Under Differential Privacy[J].Proceedings of the VLDB Endowment,2014,7(5):341-352.
[16]CORMODE G,PROCOPIUC C,SRIVASTAVA D,et al.Diffe-rentially private spatial decompositions[C]//2012 IEEE 28th International Conference on Data Engineering.Arlington:IEEE,2012:20-31.
[17]ZHANG J,XIAO X,XIE X.Privtree:A differentially private algorithm for hierarchical decompositions[C]//Proceedings of the 2016 International Conference on Management of Data.New York:Association for Computing Machinery,2016:155-170.
[18]SHAHAM S,GHINITA G,AHUJA R,et al.HTF:Homogeneous Tree Framework for Differentially-Private Release of Location Data[C]//Proceedings of the 29th International Conference on Advances in Geographic Information Systems.New York:Association for Computing Machinery,2021:184-194.
[19]LI S,GENG Y,LI Y.A Differentially private hybrid decomposition algorithm based on quad-tree[J].Computers & Security,2021,109:102384.
[20]LI C,MIKLAU G,HAY M,et al.The matrix mechanism:optimizing linear counting queries under differential privacy[J].The VLDB Journal,2015,24:757-781.
[21]MCKENNA R,MIKLAU G,HAY M,et al.HDMM:Optimi-zing error of high-dimensional statistical queries under differential privacy[J].arXiv:2106.12118,2021.
[22]CARDOSO A R,ROGERS R.Differentially private histograms under continual observation:Streaming selection into the unknown[C]//International Conference on Artificial Intelligence and Statistics.New York:PMLR,2022:2397-2419.
[23]ZHU H,YIN F,PENG S,et al.Differentially private hierarchical tree with high efficiency[J].Computers & Security,2022,118:102727.
[24]LEE J,WANG Y,KIFER D.Maximum likelihood postproces-sing for differential privacy under consistency constraints[C]//Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:Association for Computing Machinery,2015:635-644.
[25]DWORK C.Differential privacy[C]//Automata,Languages and Programming:33rd International Colloquium.Berlin:Springer,2006:1-12.
[26]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]//Theory of Cryptography:Third Theory of Cryptography Conference.Berlin:Sprin-ger,2006:265-284.
[27]DWORK C,ROTH A.The algorithmic foundations of differential privacy[J].Foundations and Trends® in Theoretical Computer Science,2014,9(3/4):211-407.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!