计算机科学 ›› 2024, Vol. 51 ›› Issue (3): 368-377.doi: 10.11896/jsjkx.230100013
• 信息安全 • 上一篇
尤菲芙, 蔡剑平, 孙岚
YOU Feifu, CAI Jianping, SUN Lan
摘要: 发布未经保护的人口普查统计数据有泄露居民个人隐私信息的风险。基于差分隐私的人口普查数据保护方案已经得到研究者的广泛关注。在解决人口普查统计数据的地理区域之间的一致性约束时,具有更复杂层次性、一致性约束的关联多属性数据在现有方法下面临无法在单棵层次树中构建的挑战。文中提出了一种基于差分隐私的人口普查区域内部关联多属性统计数据最优一致发布方法,该方法能够实现复杂一致性约束统计数据的高效发布。首先将复杂的关联多属性之间的一致性约束划分为相对独立且易于求解的多重一致性约束,然后根据人口普查关联多属性数据的结构特性,通过数学分析在现有方法的基础上进行进一步的效率优化,最后结合多重一致性约束问题的逼近方法实现最优一致发布。在真实的人口普查数据集和合成数据集上进行实验,结果表明,所提方法能够在效率表现上优于同类方法1~2个数量级的同时保持与同类方法一致的精度。
中图分类号:
[1]NING J Z.Major figures on 2020 population census of China[J].China Statistics,2021(5):4-5. [2]HU Y,LI R.Method discussion on 2020 population census of China--based on the small census in 2015[J].The World of Survey and Research,2017(7):51-54. [3]DINUR I,NISSIM K.Revealing information while preserving privacy[C]//Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.New York:Association for Computing Machinery,2003:202-210. [4]CHEN W Q.Reform and development of the Chinese population census[J].The World of Survey and Research,2012(11):48-52. [5]HAY M,RASTOGI V,MIKLAU G,et al.Boosting the accuracy of differentially private histograms through consistency[J].Proceedings of the VLDB Endowment,2010,3(1/2):1021-1032. [6]WANG N,XIAO X,YANG Y,et al.PrivTrie:Effective Fre-quent Term Discovery under Local Differential Privacy[C]//2018 IEEE 34th International Conference on Data Engineering(ICDE).Paris:IEEE,2018:821-832. [7]CAI J P,LIU X M,LI J Y,et al.Generation Matrix:An Embeddable Matrix Representation for Hierarchical Trees[J].arXiv:2201.11297,2022. [8]ABOWD J M.The US Census Bureau adopts differential privacy[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.New York:Association for Computing Machinery,2018:2867-2867. [9]FIORETTO F,VAN HENTENRYCK P,ZHU K.Differential privacy of hierarchical census data:An optimization approach[J].Artificial Intelligence,2021,296:103475. [10]ABOWD J,ASHMEAD R,SIMSON G,et al.Census topdown:Differentially private data,incremental schemas,and consistency with public knowledge[R/OL].Washington:US Census Bureau,2019.https://github.com/uscensusbureau/census2020-das-e2e/blob/master/doc/20190711_0945_Consistency_for_Large_Scale_Differentially_Private_Histograms.pdf. [11]KUO Y H,CUIU C C,KIFER D,et al.Differentially private hierarchical count-of-counts histograms[J].Proceedings of the VLDB Endowment,2018,11(11):1509-1521. [12]CAI J P,LIU X M,XIONG J B,et al.Approximation method of multiple consistency constraint under differential privacy [J].Journal on Communications,2021,42(6):107-117. [13]QARDAJI W,YANG W,LI N.Understanding hierarchicalmethods for differentially private histograms[J].Proceedings of the VLDB Endowment,2013,6(14):1954-1965. [14]DING B,WINSLETT M,HAN J,et al.Differentially private data cubes:optimizing noise sources and consistency[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data.New York:Association for Computing Machinery,2011:217-228. [15]LI C,HAY M,MIKLAU G,et al.A Data-and Workload-Aware Algorithm for Range Queries Under Differential Privacy[J].Proceedings of the VLDB Endowment,2014,7(5):341-352. [16]CORMODE G,PROCOPIUC C,SRIVASTAVA D,et al.Diffe-rentially private spatial decompositions[C]//2012 IEEE 28th International Conference on Data Engineering.Arlington:IEEE,2012:20-31. [17]ZHANG J,XIAO X,XIE X.Privtree:A differentially private algorithm for hierarchical decompositions[C]//Proceedings of the 2016 International Conference on Management of Data.New York:Association for Computing Machinery,2016:155-170. [18]SHAHAM S,GHINITA G,AHUJA R,et al.HTF:Homogeneous Tree Framework for Differentially-Private Release of Location Data[C]//Proceedings of the 29th International Conference on Advances in Geographic Information Systems.New York:Association for Computing Machinery,2021:184-194. [19]LI S,GENG Y,LI Y.A Differentially private hybrid decomposition algorithm based on quad-tree[J].Computers & Security,2021,109:102384. [20]LI C,MIKLAU G,HAY M,et al.The matrix mechanism:optimizing linear counting queries under differential privacy[J].The VLDB Journal,2015,24:757-781. [21]MCKENNA R,MIKLAU G,HAY M,et al.HDMM:Optimi-zing error of high-dimensional statistical queries under differential privacy[J].arXiv:2106.12118,2021. [22]CARDOSO A R,ROGERS R.Differentially private histograms under continual observation:Streaming selection into the unknown[C]//International Conference on Artificial Intelligence and Statistics.New York:PMLR,2022:2397-2419. [23]ZHU H,YIN F,PENG S,et al.Differentially private hierarchical tree with high efficiency[J].Computers & Security,2022,118:102727. [24]LEE J,WANG Y,KIFER D.Maximum likelihood postproces-sing for differential privacy under consistency constraints[C]//Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:Association for Computing Machinery,2015:635-644. [25]DWORK C.Differential privacy[C]//Automata,Languages and Programming:33rd International Colloquium.Berlin:Springer,2006:1-12. [26]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]//Theory of Cryptography:Third Theory of Cryptography Conference.Berlin:Sprin-ger,2006:265-284. [27]DWORK C,ROTH A.The algorithmic foundations of differential privacy[J].Foundations and Trends® in Theoretical Computer Science,2014,9(3/4):211-407. |
|