Computer Science ›› 2023, Vol. 50 ›› Issue (8): 342-351. doi: 10.11896/jsjkx.220800255

• Information Security •

  • Corresponding author: HU Xuexian (xuexian_hu@hotmail.com)
  • First author's email: kejia_li@outlook.com

Differential Privacy Linear Regression Algorithm Based on Principal Component Analysis and Functional Mechanism

LI Kejia1, HU Xuexian1, CHEN Yue1, YANG Hongjian1, XU Yang1, LIU Yang1,2   

  1 The PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China
    2 School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450001, China
  • Received: 2022-08-29  Revised: 2023-03-23  Online: 2023-08-15  Published: 2023-08-02
  • About author: LI Kejia, born in 1997, postgraduate. Her main research interests include federated learning, differential privacy, and blockchain.
    HU Xuexian, born in 1982, Ph.D, associate professor, Ph.D supervisor. His main research interests include big data security, applied cryptography, and network security.
  • Supported by:
    National Natural Science Foundation of China (62172433, 62172434, 61862011, 61872449).


Abstract: With the continuous deployment of artificial intelligence applications and the ongoing promulgation of privacy protection laws and regulations, privacy protection in machine learning has become a research hotspot in information security. To address the high global sensitivity and poor model utility of existing differential privacy linear regression algorithms, we propose a differential privacy linear regression algorithm based on principal component analysis and the functional mechanism (PCAFM-DPLR). The algorithm replaces the traditional Laplace mechanism with the Gaussian mechanism and adds noise in each of its two main stages. First, to preserve privacy while reducing dimensionality, Gaussian noise is injected into the covariance matrix of the original data set, and a low-dimensional data set with differential privacy guarantees is obtained via principal component analysis. Second, to prevent possible privacy leakage during model training, Gaussian noise is added to the coefficients of the polynomial expansion of the objective function, and the optimal model parameters are obtained by minimizing the perturbed objective. Theoretical analysis and experimental results show that the linear regression model trained by PCAFM-DPLR effectively guarantees privacy while retaining good utility.
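The first stage described in the abstract, injecting Gaussian noise into the covariance matrix before extracting principal components, can be sketched as follows. This is a minimal NumPy illustration in the style of Analyze Gauss, not the paper's implementation; the function name, the assumed row-norm bound (each record has L2 norm at most 1, so the covariance has unit L2 sensitivity), and the noise calibration are illustrative assumptions.

```python
import numpy as np

def dp_pca(X, k, epsilon, delta, rng=None):
    """Sketch of differentially private PCA: perturb the covariance
    matrix with a symmetric Gaussian noise matrix, then project the
    data onto the top-k noisy principal components."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Assumption: every row of X has L2 norm <= 1, so swapping one
    # record changes X^T X by at most 1 in L2 norm.
    sensitivity = 1.0
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    cov = X.T @ X
    # Symmetrize the noise so the perturbed covariance stays symmetric.
    noise = rng.normal(0.0, sigma, size=(d, d))
    noise = np.triu(noise) + np.triu(noise, 1).T
    noisy_cov = cov + noise
    # Top-k eigenvectors of the noisy covariance define the DP projection.
    eigvals, eigvecs = np.linalg.eigh(noisy_cov)
    top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return X @ top_k, top_k
```

Because the noise is added once to the covariance matrix, the released low-dimensional data set can be reused in later training steps without further privacy cost for this stage.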
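The second stage, the functional mechanism, expands the least-squares objective into a polynomial in the model parameters and perturbs its coefficients rather than the gradients or the output. A minimal sketch under assumed preprocessing (row norms at most 1, labels clipped to [-1, 1]); the function name, the conservative sensitivity constant, and the small ridge term are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def dp_linreg_fm(X, y, epsilon, delta, rng=None):
    """Sketch of the functional mechanism for linear regression:
    add Gaussian noise to the linear and quadratic coefficients of
    the loss polynomial, then minimize the perturbed objective."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Assumption: ||x_i||_2 <= 1 and |y_i| <= 1, giving a conservative
    # L2 sensitivity of 6 for the stacked coefficients
    # (<= 4 for the linear part, <= 2 for the quadratic part).
    sensitivity = 6.0
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    lam1 = -2.0 * X.T @ y        # linear coefficients of the loss
    lam2 = X.T @ X               # quadratic coefficients of the loss
    lam1_noisy = lam1 + rng.normal(0.0, sigma, size=d)
    noise = rng.normal(0.0, sigma, size=(d, d))
    lam2_noisy = lam2 + np.triu(noise) + np.triu(noise, 1).T
    # The noisy quadratic form may be indefinite; a small ridge keeps
    # the minimization well-posed (a practical fix, assumed here).
    ridge = 1e-3 * np.eye(d)
    # Minimizer of w^T lam2 w + lam1^T w: solve 2*lam2*w + lam1 = 0.
    return np.linalg.solve(2.0 * (lam2_noisy + ridge), -lam1_noisy)
```

Since the noise enters only through the fixed polynomial coefficients, the closed-form minimization itself consumes no additional privacy budget.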

Key words: Differential privacy, Principal component analysis, Functional mechanism, Linear regression, Gaussian noise

CLC number: TP391