计算机科学 ›› 2022, Vol. 49 ›› Issue (9): 318-325.doi: 10.11896/jsjkx.220300190

• 信息安全 • 上一篇    下一篇

隐私保护线性回归方案与应用

吕由1,2, 吴文渊1   

  1. 1 中国科学院重庆绿色智能技术研究院 重庆 400714
    2 中国科学院大学 北京 100049
  • 收稿日期:2022-03-19 修回日期:2022-06-03 出版日期:2022-09-15 发布日期:2022-09-09
  • 通讯作者: 吴文渊(wuwenyuan@cigit.ac.cn)
  • 作者简介:(lvyou@cigit.ac.cn)
  • 基金资助:
    科技部重点研发项目(2020YFA0712303);贵州省科技计划项目([2020]4Y056);重庆市在渝院士牵头科技创新引导专项(cstc2020yszx-jcyjX0005)

Privacy-preserving Linear Regression Scheme and Its Application

LYU You1,2, WU Wen-yuan1   

  1. 1 Chongqing Institute of Green and Intelligent Technology,Chinese Academy of Sciences,Chongqing 400714,China
    2 University of Chinese Academy Sciences,Beijing 100049,China
  • Received:2022-03-19 Revised:2022-06-03 Online:2022-09-15 Published:2022-09-09
  • About author:LYU You,born in 1996,postgraduate.His main research interests include homomorphic encryption and information security.
    WU Wen-yuan,born in 1976,Ph.D,professor.His main research interests include lattice based cryptography,automated reasoning and symbolic computation.
  • Supported by:
    National Key Research and Development Project(2020YFA0712303),Guizhou Science and Technology Program([2020]4Y056) and Chongqing Science and Technology Program(cstc2020yszx-jcyjX0005).

摘要: 线性回归是一种基础且应用广泛的机器学习算法,线性回归模型的训练通常依赖于大量的数据,而现实中数据集一般由不同的用户持有且包含用户的隐私信息,当多个用户想要集中大量的数据训练效果更好的模型时,会不可避免地涉及用户的隐私问题。同态加密作为一种隐私保护技术,可以有效解决计算中的隐私泄露问题。针对数据集水平分布在两个用户上的场景,结合CKKS同态加密技术,设计了一种新的基于混合迭代方法的隐私保护线性回归方案。该方案分为两个阶段:第一阶段实现了密文域上的随机梯度下降算法;第二阶段设计了一种安全两方快速下降协议,该协议的核心思想基于雅可比迭代算法,可以有效弥补实际应用中梯度下降法收敛效果不佳的缺陷,加速了模型的收敛,从而降低了方案的计算代价和通信损耗,在高效训练线性回归模型的同时保护了两个用户的数据隐私。分析了方案的效率、通信损耗以及安全性,利用C++实现了该方案并将其应用于真实数据集。大量实验结果表明,该方案可以高效地解决特征规模较大的线性回归问题,可决系数的相对误差小于0.001,这表明得到的隐私保护线性回归模型在真实数据集上的应用效果接近于直接在明文数据上求得的模型,可以满足特定场景下的实际应用需求。

关键词: 隐私保护, 线性回归, 混合迭代方法, 同态加密

Abstract: Linear regression is an important and widely used machine learning algorithm.The training of linear regression model usually depends on a large amount of data.In reality,the data set is generally held by different users and contains their privacy information.When multiple users want to gather more data to train a better model,it inevitably involves users' privacy.As a privacy protection technology,homomorphic encryption can effectively solve the problem of privacy leakage in computing.A new privacy preserving linear regression scheme based on hybrid iterative method is designed for the scenario where data sets are distri-buted horizontally on two users.The scheme is divided into two stages.The first stage implements the statistic gradient descent algorithm in the ciphertext domain.In the second stage,a secure two-party fast descent protocol is designed.The core idea of the protocol is based on Jacobi iterative method,which can effectively make up for the poor convergence effect of gradient descent method in practical application,accelerate the convergence of the model,and protect the data privacy of two users while effectively training the linear regression model.The efficiency,communication loss and security of the scheme are analyzed.The scheme is implemented by using C++and applied to real data sets.A large number of experimental results show that the scheme can effectively solve the linear regression problem with large scale features.The relative error of decision coefficient is less than 0.001,which show that the application effect of the privacy preserving linear regression model in real data set is close to that obtained directly from unencrypted data,and the scheme can meet the practical application requirements in specific scenarios.

Key words: Privacy-preserving, Linear regression, Hybrid iterative method, Homomorphic encryption

中图分类号: 

  • TP309.7
[1]YANG Q,LIU Y,CHEN T J,et al.Federated Machine Lear-ning:Concept and Applications[J].ACM Transactions on Intelligent Systems Technology,2019,10(2):1-19.
[2]BOULEMTAFES A,DERHAB A,CHALLAL Y.A review of privacy-preserving techniques for deep learning[J].Neurocomputing,2020,384:21-45.
[3]TREVOR H,ROBERT T,JEROME F.The elements of statistical learning:data mining,inference and prediction[M].Sprin-ger,2009.
[4]YAO A C C.How to generate and exchange secrets[C]//Proceedings of the 27th Annual Symposium on Foundations of Computer Science.IEEE Computer Society,1986:162-167.
[5]PAILLIER P.Public-key cryptosystems based on composite degree residuosity classes[C]//Proceedings of the 17th International Conference on Theory and Application of Cryptographic Techniques.Springer-Verlag,1999:223-238.
[6]LYNBASHEVSKY V,PEIKERT C,REGEV O.On Ideal Lattices and Learning with Errors over Rings[M]//Advances in Cryptology-Eurocrypt 2010.Berlin:Springer,2010:1-23.
[7]GENTRY C.Fully Homomorphic Encryption Using Ideal Lattices[M]//Proceedings of the 2009 Acm Symposium on Theory of Computing(Stoc'09).2009:169-178.
[8]BRAKERSKI Z,GENTRY C,VAIKUNTANATHAN V.(Le-veled) Fully Homomorphic Encryption without Bootstrapping[J].ACM Trans. Comput. Theory,2014,6(3):1-36.
[9]FAN J,VERCAUTEREN F.Somewhat practical fully homo-morphic encryption[J].IACR Cryptology ePrint Archive,2012,144:1-19.
[10]CHEON J H,KIM A,KIM M,et al.Homomorphic Encryption for Arithmetic of Approximate Numbers[M]//Advances in Cryptology-Asiacrypt 2017,Pt I.Cham:Springer,2017:409-437.
[11]PAYMAN M,YUPENG Z.SecureML:A System for Scalable Privacy-Preserving Machine Learning[C]//2017 IEEE Symposium on Security and Privacy.2017:19-38.
[12]PAYMAN M,PETER R.ABY3:A Mixed Protocol Framework for Machine Learning[C]//Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security.Association for Computing Machinery,2018:35-52.
[13]NIKOLAENKO V,WEINSBERG U,IOANNIDIS S,et al.Privacy-Preserving Ridge Regression on Hundreds of Millions of Records[C]//2013 IEEE Symposium on Security and Privacy.IEEE Access,2013:334-348.
[14]GIACOMELLI I,JHA S,JOYE M,et al.Privacy-PreservingRidge Regression with only Linearly-Homomorphic Encryption[M]//Applied Cryptography and Network Security.Cham:Springer,2018:243-261.
[15]AKAVIA A,SHAUL H,WEISS M,et al.Linear-Regression on Packed Encrypted Data in the Two-Server Model[C]//Procee-dings of the 7th ACM Workshop on Encrypted Computing &Applied Homomorphic Cryptography.Association for Computing Machinery,2019:21-32.
[16]HU S,WANG Q,WANG J,et al.Securing Fast Learning!Ridge Regression over Encrypted Big Data[M]//2016 IEEE Trustcom/BigDataSE/ISPA.IEEE Access,2016:19-26.
[17]LU L,DING N.Horizontal Privacy-Preserving Linear Regression Which is Highly Efficient for Dataset of Low Dimension[C]//Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security.Association for Computing Machinery,2021:604-615.
[18]LYU Y,WU W Y.Linear System Solving Scheme Based on Homomorphic Encryption[J].Computer Science,2022,49(3):338-345.
[19]SMART N P,VERCAUTEREN F.Fully homomorphic SIMDoperations[J].Designs,Codes and Cryptography,2012,71(1):57-81.
[20]KIM A,SONG Y,KIM M,et al.Logistic regression model trai-ning based on the approximate homomorphic encryption[J].BMC Medical Genomics,2018,11(4):23-31.
[21]CHEON J H,HAN K,KIM A,et al.Bootstrapping for Approximate Homomorphic Encryption[M]//Advances in Cryptology-Eurocrypt 2018,Pt I.Cham:Springer,2018:360-384.
[22]CHEON J H,HAN K,KIM A,et al.A Full RNS Variant of Approximate Homomorphic Encryption[C]//Selected Areas in Cryptography-SAC 2018.Cham:Springer,2019:347-368.
[23]GENTRY C,HALEVI S,SMART N P.Homomorphic Evalua-tion of the AES Circuit[M]//Advances in Cryptology-Crypto 2012.Berlin:Springer,2012:850-867.
[24]LINDNER R,PEIKERT C.Better Key Sizes(and Attacks) for LWE-Based Encryption[M]//Topics in Cryptology-Ct-Rsa 2011.Berlin:Springer,2011:319-339.
[25]CHEON J H,KIM A,KIM M,et al.Implementation of HEAAN[OL].https://github.com/kimandrik/HEAAN.
[26]UCI.Machine Learning Repository [OL].http://archive.ics.uci.edu/ml/datasets.php.
[27]TIANCHI.Tianchi Data Sets [OL].https://tianchi.aliyun.com/dataset/.
[1] 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩.
基于分层抽样优化的面向异构客户端的联邦学习
Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients
计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263
[2] 汤凌韬, 王迪, 张鲁飞, 刘盛云.
基于安全多方计算和差分隐私的联邦学习方案
Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy
计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108
[3] 王健.
基于隐私保护的反向传播神经网络学习算法
Back-propagation Neural Network Learning Algorithm Based on Privacy Preserving
计算机科学, 2022, 49(6A): 575-580. https://doi.org/10.11896/jsjkx.211100155
[4] 李利, 何欣, 韩志杰.
群智感知的隐私保护研究综述
Review of Privacy-preserving Mechanisms in Crowdsensing
计算机科学, 2022, 49(5): 303-310. https://doi.org/10.11896/jsjkx.210400077
[5] 秦小月, 黄汝维, 杨波.
基于素数幂次阶分圆环的NTRU型全同态加密方案
NTRU Type Fully Homomorphic Encryption Scheme over Prime Power Cyclotomic Rings
计算机科学, 2022, 49(5): 341-346. https://doi.org/10.11896/jsjkx.210300089
[6] 王美珊, 姚兰, 高福祥, 徐军灿.
面向医疗集值数据的差分隐私保护技术研究
Study on Differential Privacy Protection for Medical Set-Valued Data
计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032
[7] 任花, 牛少彰, 王茂森, 岳桢, 任如勇.
基于奇异值分解的同态可交换脆弱零水印研究
Homomorphic and Commutative Fragile Zero-watermarking Based on SVD
计算机科学, 2022, 49(3): 70-76. https://doi.org/10.11896/jsjkx.210800015
[8] 吕由, 吴文渊.
基于同态加密的线性系统求解方案
Linear System Solving Scheme Based on Homomorphic Encryption
计算机科学, 2022, 49(3): 338-345. https://doi.org/10.11896/jsjkx.201200124
[9] 孔钰婷, 谭富祥, 赵鑫, 张正航, 白璐, 钱育蓉.
基于差分隐私的K-means算法优化研究综述
Review of K-means Algorithm Optimization Based on Differential Privacy
计算机科学, 2022, 49(2): 162-173. https://doi.org/10.11896/jsjkx.201200008
[10] 金华, 朱靖宇, 王昌达.
视频隐私保护技术综述
Review on Video Privacy Protection
计算机科学, 2022, 49(1): 306-313. https://doi.org/10.11896/jsjkx.201200047
[11] 陈长伟, 周晓峰.
快速局部协同表示分类器及其在人脸识别中的应用
Fast Local Collaborative Representation Based Classifier and Its Applications in Face Recognition
计算机科学, 2021, 48(9): 208-215. https://doi.org/10.11896/jsjkx.200800155
[12] 张小艳, 李秦伟, 付福杰.
基于数字承诺的区块链交易金额保密验证方法
Secret Verification Method of Blockchain Transaction Amount Based on Digital Commitment
计算机科学, 2021, 48(9): 324-329. https://doi.org/10.11896/jsjkx.200800123
[13] 雷羽潇, 段玉聪.
面向跨模态隐私保护的AI治理法律技术化框架
AI Governance Oriented Legal to Technology Bridging Framework for Cross-modal Privacy Protection
计算机科学, 2021, 48(9): 9-20. https://doi.org/10.11896/jsjkx.201000011
[14] 王辉, 朱国宇, 申自浩, 刘琨, 刘沛骞.
基于用户偏好和位置分布的假位置生成方法
Dummy Location Generation Method Based on User Preference and Location Distribution
计算机科学, 2021, 48(7): 164-171. https://doi.org/10.11896/jsjkx.200800069
[15] 季琰, 戴华, 姜莹莹, 杨庚, 易训.
面向混合云的可并行多关键词Top-k密文检索技术
Parallel Multi-keyword Top-k Search Scheme over Encrypted Data in Hybrid Clouds
计算机科学, 2021, 48(5): 320-327. https://doi.org/10.11896/jsjkx.200300160
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!