Computer Science, 2024, Vol. 51, Issue (3): 368-377. doi: 10.11896/jsjkx.230100013

• Information Security •

Census Associated Multiple Attributes Data Release Based on Differential Privacy

YOU Feifu, CAI Jianping, SUN Lan   

  1. College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China
  • Received: 2023-01-02  Revised: 2023-04-26  Online: 2024-03-15  Published: 2024-03-13
  • About author: YOU Feifu, born in 1997, postgraduate. Her main research interests include privacy protection and data security. CAI Jianping, born in 1990, Ph.D. His main research interests include differential privacy, federated learning, machine learning and optimization theory.

Abstract: Releasing unprotected census statistics risks revealing residents' private information, and census data protection schemes based on differential privacy have received substantial attention from researchers. Existing methods handle the consistency constraints among the geographic regions of census statistics. However, associated multi-attribute data carry more complex hierarchical consistency constraints that cannot be represented in a single hierarchical tree, so existing methods cannot be applied to them directly. In this paper, we propose a differentially private method for the optimally consistent release of associated multi-attribute statistics within census regions, which efficiently releases statistics subject to complex consistency constraints. First, the consistency constraints among the complex associated attributes are divided into multiple relatively independent, easily solved consistency constraints. Then, based on the structural characteristics of census associated multi-attribute data, mathematical analysis is used to further improve on the efficiency of existing methods. Finally, the optimally consistent release is obtained by combining these steps with an approximation method for the multiple consistency constraints problem. Experiments on real census datasets and synthetic datasets show that the proposed method outperforms similar methods in efficiency by one to two orders of magnitude while maintaining the same accuracy.
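
The general idea of consistent release can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic Laplace mechanism followed by a least-squares projection onto the consistency constraints, shown only to make the setting concrete. The region, the attribute names, the counts, the privacy budget and the helper names laplace_release and project_to_consistency are all hypothetical. The example assumes add/remove-one-resident neighbouring datasets, so one resident changes the total, one sex cell and one age cell, giving an L1 sensitivity of 3. The two attributes share the same total, which is why they do not fit a single hierarchical tree.

```python
import numpy as np


def laplace_release(counts, epsilon, sensitivity, rng):
    """Add independent Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return counts + rng.laplace(loc=0.0, scale=scale, size=counts.shape)


def project_to_consistency(noisy, A):
    """Return the closest vector (in L2 distance) satisfying A @ x = 0.

    Uses the closed form x = y - A^T (A A^T)^{-1} A y, which requires
    the rows of A to be linearly independent.
    """
    correction = A.T @ np.linalg.solve(A @ A.T, A @ noisy)
    return noisy - correction


# Hypothetical statistics for one census region, in the order:
# [total, male, female, age_0_14, age_15_64, age_65_plus]
true_counts = np.array([1000.0, 480.0, 520.0, 180.0, 650.0, 170.0])

# Two consistency constraints that share the same total.
A = np.array([
    [1.0, -1.0, -1.0, 0.0, 0.0, 0.0],    # total = male + female
    [1.0, 0.0, 0.0, -1.0, -1.0, -1.0],   # total = sum of age groups
])

rng = np.random.default_rng(0)

# Assumed L1 sensitivity of 3: one resident affects three released counts.
noisy = laplace_release(true_counts, epsilon=1.0, sensitivity=3.0, rng=rng)

# Post-processing does not consume additional privacy budget.
consistent = project_to_consistency(noisy, A)

print("constraint residual before:", A @ noisy)
print("constraint residual after: ", A @ consistent)
```

Because the true counts already satisfy the constraints, the projection never moves the release further from the truth in L2 distance. Solving the dense system directly does not scale to large numbers of regions and attributes, which is the efficiency problem the abstract refers to.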

Key words: Differential privacy, Privacy protection, Data release, Consistency constraints, Census

CLC Number: 

  • TP30