计算机科学 ›› 2017, Vol. 44 ›› Issue (6): 139-143.doi: 10.11896/j.issn.1002-137X.2017.06.023

• 信息安全 • 上一篇    下一篇

不完全数据集的差分隐私保护决策树研究

沈思倩,毛宇光,江冠儒   

  1. 南京航空航天大学计算机科学与技术学院 南京211106,南京航空航天大学计算机科学与技术学院 南京211106;南京大学计算机软件新技术国家重点实验室 南京210093,卡尔斯鲁厄理工学院计算机系 巴登-符腾堡州76131
  • 出版日期:2018-11-13 发布日期:2018-11-13

Method of Constructing Differential Privacy Decision Tree Classifier with Incomplete Data Sets

SHEN Si-qian, MAO Yu-guang and JIANG Guan-ru   

  • Online:2018-11-13 Published:2018-11-13

摘要: 主要研究在对不完全数据集进行决策树分析时,如何加入差分隐私保护技术。首先简单介绍了差分隐私ID3算法和差分隐私随机森林决策树算法;然后针对上述算法存在的缺陷和不足进行了修改,提出指数机制的差分隐私随机森林决策树算法;最后对于不完全数据集提出了一种新的WP(Weight Partition)缺失值处理方法,能够在不需要插值的情况下,使决策树分析算法既能满足差分隐私保护,也能拥有更高的预测准确率和适应性。实验证明,无论是Laplace机制还是指数机制,无论是ID3算法还是随机森林决策树算法,都能适用于所提方法。

关键词: 差分隐私保护,不完全数据集,ID3算法,随机森林决策树

Abstract: We mainly studied the problem of constructing differential privacy decision tree classifier with incomplete data sets.We first introduced the differential privacy ID3 decision tree algorithm and differentially private random decision tree algorithm.Then we considered the weakness of the algorithms talked above,and created a new differentially private random decision tree algorithm with exponential mechanism.Finally,an approach for decision tree classifier with incomplete data sets was proposed,which yields better prediction while maintaining good privacy without inserting values,called WP(Weight Partition).And the experimental results show that our approach is suitable for either differential privacy ID3 decision trees or differentially private random decision trees,either laplace or exponential mechanism.

Key words: Differential privacy,Incomplete data sets,ID3 decision tree algorithm,Random decision tree algorithm

[1] DWORK C,DANOS V,KASHEFI K,et al.Automata,Languages and Programming[J].Verlag Lecture Notes in Computer Science,1980,27(3):282-298.
[2] CLIFTON C,KANTARCIOGLU M,VAIDYA J.Defining Privacy for Data Mining[C]∥Proceedings of the National Science Foundation Workshop on Next Generation Data Mining.2012:126-133.
[3] SWEENEY L.k-anonymity:A Model for Protecting Privacy[J].International Journal of Uncertainty,Fuzziness and Knowledge-Based Systems,2002,0(5):557-570.
[4] MACHANAVAJJHALA A,KIFER D,GEHRKE J,et al.l-Diversity:Privacy Beyond k-Anonymity[C]∥ Proceeding of the 22nd International Conference on Data Engineering.2006:24 .
[5] XIONG P,ZHU T Q,WANG X F.A Survey on Differential Privacy and Applications[J].Chinese Journal of Computers,2014,7(1):101-122.(in Chinese) 熊平,朱天清,王晓峰.差分隐私保护及其应用[J].计算机学报,2014,7(1):101-122.
[6] HAY M,RASTOGI V,MIKLAU G,et al.Boosting the Accuracy of Differentially Private Histograms Through Consistency[J].Proceedings of the VLDB Endowment,2010,3(1/2):1021-1032.
[7] XIAO X,WANG G,GEHRKE J.Differential Privacy via Wavelet Transforms[J].IEEE Transactions on Knowledge and Data Engineering,2011,3(8):1200-1214.
[8] CORMODE G,PROCOPIUC C,SRIVASTAVA D,et al.Diffe-rentially Private Spatial Decompositions[C]∥2012 IEEE 28th International Conference on Data Engineering (ICDE).2012:20-31.
[9] LI C,HAY M,RASTOGI V,et al.Optimizing Linear Counting Queries under Differential Privacy[C]∥Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.2010:123-134.
[10] XIONG P,ZHU T,NIU W,et al.A Differentially Private Algorithm for Location Data Release[J].Knowledge and Information Systems,2016,47(3):647-669.
[11] XIAO X,BENDER G,HAY M,et al.iReduct:Differential Privacy with Reduced Relative Errors[C]∥Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data.2011:229-240.
[12] CHAUDHURI K,MONTELEONI C,SARWATE A D.Diffe-rentially Private Empirical Risk Minimization[J].The Journal of Machine Learning Research,2011,2:1069-1109.
[13] ZHANG J,XIAO X,YANG Y,et al.PrivGene:DifferentiallyPrivate Model Fitting using Genetic Algorithms[C]∥Procee-dings of the 2013 ACM SIGMOD International Conference on Mana-gement of Data.2013:665-676.
[14] BLUM A,DWORK C,MCSHERRY F,et al.Practical privacy:the SuLQ Framework[C]∥Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.2005:128-138.
[15] MCSHERRY F D.Privacy Integrated Queries:an ExtensiblePlatform for Privacy-preserving Data Analysis[C]∥Proceedings of the 2009 ACM SIGMOD International Conference on Mana-gement of Data.2009:19-30.
[16] FRIEDMAN A,SCHUSTER A.Data mining with DifferentialPrivacy[C]∥Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2010:493-502.
[17] JAGANNATHAN G,PILLAIPAKKAMNATT K,W RIGHT R N.A Practical Differentially Private Random Decision Tree Classifier[C]∥IEEE International Conference on Data Mining Workshops,2009(ICDMW’09).2009:114-121.
[18] XIONG P,ZHU T Q,JING D W.Different Private Data Publishing Algorithm for Building Decision Tree[J].Application Research Computers,2014,1(10):3108-3112.(in Chinese) 熊平,朱天清,金大卫.一种面向决策树构建的差分隐私保护算法[J].计算机应用研究,2014,1(10):3108-3112.
[19] DWORK C.The promise of Differential Privacy:A Tutorial on Algorithmic Techniques[C]∥Foundations of Computer Science (FOCS).2011:1-2.
[20] DWORK C.Differential Privacy in New Settings[C]∥SODA.2010:174-183.
[21] DWORK C.Differential Privacy:A survey of Results[M]∥Theo-ry and Applications of Models of Computation.Springer,2008:1-19.
[22] MCSHERRY F,TALWAR K.Mechanism Design via Differen-tial Privacy[C]∥48th Annual IEEE Symposium on Foundations of Computer Science,2007(FOCS’07).2007:94-103.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!