计算机科学 ›› 2022, Vol. 49 ›› Issue (9): 318-325.doi: 10.11896/jsjkx.220300190
吕由1,2, 吴文渊1
LYU You1,2, WU Wen-yuan1
摘要: 线性回归是一种基础且应用广泛的机器学习算法,线性回归模型的训练通常依赖于大量的数据,而现实中数据集一般由不同的用户持有且包含用户的隐私信息,当多个用户想要集中大量的数据训练效果更好的模型时,会不可避免地涉及用户的隐私问题。同态加密作为一种隐私保护技术,可以有效解决计算中的隐私泄露问题。针对数据集水平分布在两个用户上的场景,结合CKKS同态加密技术,设计了一种新的基于混合迭代方法的隐私保护线性回归方案。该方案分为两个阶段:第一阶段实现了密文域上的随机梯度下降算法;第二阶段设计了一种安全两方快速下降协议,该协议的核心思想基于雅可比迭代算法,可以有效弥补实际应用中梯度下降法收敛效果不佳的缺陷,加速了模型的收敛,从而降低了方案的计算代价和通信损耗,在高效训练线性回归模型的同时保护了两个用户的数据隐私。分析了方案的效率、通信损耗以及安全性,利用C++实现了该方案并将其应用于真实数据集。大量实验结果表明,该方案可以高效地解决特征规模较大的线性回归问题,可决系数的相对误差小于0.001,这表明得到的隐私保护线性回归模型在真实数据集上的应用效果接近于直接在明文数据上求得的模型,可以满足特定场景下的实际应用需求。
中图分类号:
[1]YANG Q,LIU Y,CHEN T J,et al.Federated Machine Lear-ning:Concept and Applications[J].ACM Transactions on Intelligent Systems Technology,2019,10(2):1-19. [2]BOULEMTAFES A,DERHAB A,CHALLAL Y.A review of privacy-preserving techniques for deep learning[J].Neurocomputing,2020,384:21-45. [3]TREVOR H,ROBERT T,JEROME F.The elements of statistical learning:data mining,inference and prediction[M].Sprin-ger,2009. [4]YAO A C C.How to generate and exchange secrets[C]//Proceedings of the 27th Annual Symposium on Foundations of Computer Science.IEEE Computer Society,1986:162-167. [5]PAILLIER P.Public-key cryptosystems based on composite degree residuosity classes[C]//Proceedings of the 17th International Conference on Theory and Application of Cryptographic Techniques.Springer-Verlag,1999:223-238. [6]LYNBASHEVSKY V,PEIKERT C,REGEV O.On Ideal Lattices and Learning with Errors over Rings[M]//Advances in Cryptology-Eurocrypt 2010.Berlin:Springer,2010:1-23. [7]GENTRY C.Fully Homomorphic Encryption Using Ideal Lattices[M]//Proceedings of the 2009 Acm Symposium on Theory of Computing(Stoc'09).2009:169-178. [8]BRAKERSKI Z,GENTRY C,VAIKUNTANATHAN V.(Le-veled) Fully Homomorphic Encryption without Bootstrapping[J].ACM Trans. Comput. Theory,2014,6(3):1-36. [9]FAN J,VERCAUTEREN F.Somewhat practical fully homo-morphic encryption[J].IACR Cryptology ePrint Archive,2012,144:1-19. [10]CHEON J H,KIM A,KIM M,et al.Homomorphic Encryption for Arithmetic of Approximate Numbers[M]//Advances in Cryptology-Asiacrypt 2017,Pt I.Cham:Springer,2017:409-437. [11]PAYMAN M,YUPENG Z.SecureML:A System for Scalable Privacy-Preserving Machine Learning[C]//2017 IEEE Symposium on Security and Privacy.2017:19-38. [12]PAYMAN M,PETER R.ABY3:A Mixed Protocol Framework for Machine Learning[C]//Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security.Association for Computing Machinery,2018:35-52. [13]NIKOLAENKO V,WEINSBERG U,IOANNIDIS S,et al.Privacy-Preserving Ridge Regression on Hundreds of Millions of Records[C]//2013 IEEE Symposium on Security and Privacy.IEEE Access,2013:334-348. [14]GIACOMELLI I,JHA S,JOYE M,et al.Privacy-PreservingRidge Regression with only Linearly-Homomorphic Encryption[M]//Applied Cryptography and Network Security.Cham:Springer,2018:243-261. [15]AKAVIA A,SHAUL H,WEISS M,et al.Linear-Regression on Packed Encrypted Data in the Two-Server Model[C]//Procee-dings of the 7th ACM Workshop on Encrypted Computing &Applied Homomorphic Cryptography.Association for Computing Machinery,2019:21-32. [16]HU S,WANG Q,WANG J,et al.Securing Fast Learning!Ridge Regression over Encrypted Big Data[M]//2016 IEEE Trustcom/BigDataSE/ISPA.IEEE Access,2016:19-26. [17]LU L,DING N.Horizontal Privacy-Preserving Linear Regression Which is Highly Efficient for Dataset of Low Dimension[C]//Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security.Association for Computing Machinery,2021:604-615. [18]LYU Y,WU W Y.Linear System Solving Scheme Based on Homomorphic Encryption[J].Computer Science,2022,49(3):338-345. [19]SMART N P,VERCAUTEREN F.Fully homomorphic SIMDoperations[J].Designs,Codes and Cryptography,2012,71(1):57-81. [20]KIM A,SONG Y,KIM M,et al.Logistic regression model trai-ning based on the approximate homomorphic encryption[J].BMC Medical Genomics,2018,11(4):23-31. [21]CHEON J H,HAN K,KIM A,et al.Bootstrapping for Approximate Homomorphic Encryption[M]//Advances in Cryptology-Eurocrypt 2018,Pt I.Cham:Springer,2018:360-384. [22]CHEON J H,HAN K,KIM A,et al.A Full RNS Variant of Approximate Homomorphic Encryption[C]//Selected Areas in Cryptography-SAC 2018.Cham:Springer,2019:347-368. [23]GENTRY C,HALEVI S,SMART N P.Homomorphic Evalua-tion of the AES Circuit[M]//Advances in Cryptology-Crypto 2012.Berlin:Springer,2012:850-867. [24]LINDNER R,PEIKERT C.Better Key Sizes(and Attacks) for LWE-Based Encryption[M]//Topics in Cryptology-Ct-Rsa 2011.Berlin:Springer,2011:319-339. [25]CHEON J H,KIM A,KIM M,et al.Implementation of HEAAN[OL].https://github.com/kimandrik/HEAAN. [26]UCI.Machine Learning Repository [OL].http://archive.ics.uci.edu/ml/datasets.php. [27]TIANCHI.Tianchi Data Sets [OL].https://tianchi.aliyun.com/dataset/. |
[1] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于分层抽样优化的面向异构客户端的联邦学习 Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients 计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263 |
[2] | 汤凌韬, 王迪, 张鲁飞, 刘盛云. 基于安全多方计算和差分隐私的联邦学习方案 Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy 计算机科学, 2022, 49(9): 297-305. https://doi.org/10.11896/jsjkx.210800108 |
[3] | 王健. 基于隐私保护的反向传播神经网络学习算法 Back-propagation Neural Network Learning Algorithm Based on Privacy Preserving 计算机科学, 2022, 49(6A): 575-580. https://doi.org/10.11896/jsjkx.211100155 |
[4] | 李利, 何欣, 韩志杰. 群智感知的隐私保护研究综述 Review of Privacy-preserving Mechanisms in Crowdsensing 计算机科学, 2022, 49(5): 303-310. https://doi.org/10.11896/jsjkx.210400077 |
[5] | 秦小月, 黄汝维, 杨波. 基于素数幂次阶分圆环的NTRU型全同态加密方案 NTRU Type Fully Homomorphic Encryption Scheme over Prime Power Cyclotomic Rings 计算机科学, 2022, 49(5): 341-346. https://doi.org/10.11896/jsjkx.210300089 |
[6] | 王美珊, 姚兰, 高福祥, 徐军灿. 面向医疗集值数据的差分隐私保护技术研究 Study on Differential Privacy Protection for Medical Set-Valued Data 计算机科学, 2022, 49(4): 362-368. https://doi.org/10.11896/jsjkx.210300032 |
[7] | 任花, 牛少彰, 王茂森, 岳桢, 任如勇. 基于奇异值分解的同态可交换脆弱零水印研究 Homomorphic and Commutative Fragile Zero-watermarking Based on SVD 计算机科学, 2022, 49(3): 70-76. https://doi.org/10.11896/jsjkx.210800015 |
[8] | 吕由, 吴文渊. 基于同态加密的线性系统求解方案 Linear System Solving Scheme Based on Homomorphic Encryption 计算机科学, 2022, 49(3): 338-345. https://doi.org/10.11896/jsjkx.201200124 |
[9] | 孔钰婷, 谭富祥, 赵鑫, 张正航, 白璐, 钱育蓉. 基于差分隐私的K-means算法优化研究综述 Review of K-means Algorithm Optimization Based on Differential Privacy 计算机科学, 2022, 49(2): 162-173. https://doi.org/10.11896/jsjkx.201200008 |
[10] | 金华, 朱靖宇, 王昌达. 视频隐私保护技术综述 Review on Video Privacy Protection 计算机科学, 2022, 49(1): 306-313. https://doi.org/10.11896/jsjkx.201200047 |
[11] | 陈长伟, 周晓峰. 快速局部协同表示分类器及其在人脸识别中的应用 Fast Local Collaborative Representation Based Classifier and Its Applications in Face Recognition 计算机科学, 2021, 48(9): 208-215. https://doi.org/10.11896/jsjkx.200800155 |
[12] | 张小艳, 李秦伟, 付福杰. 基于数字承诺的区块链交易金额保密验证方法 Secret Verification Method of Blockchain Transaction Amount Based on Digital Commitment 计算机科学, 2021, 48(9): 324-329. https://doi.org/10.11896/jsjkx.200800123 |
[13] | 雷羽潇, 段玉聪. 面向跨模态隐私保护的AI治理法律技术化框架 AI Governance Oriented Legal to Technology Bridging Framework for Cross-modal Privacy Protection 计算机科学, 2021, 48(9): 9-20. https://doi.org/10.11896/jsjkx.201000011 |
[14] | 王辉, 朱国宇, 申自浩, 刘琨, 刘沛骞. 基于用户偏好和位置分布的假位置生成方法 Dummy Location Generation Method Based on User Preference and Location Distribution 计算机科学, 2021, 48(7): 164-171. https://doi.org/10.11896/jsjkx.200800069 |
[15] | 季琰, 戴华, 姜莹莹, 杨庚, 易训. 面向混合云的可并行多关键词Top-k密文检索技术 Parallel Multi-keyword Top-k Search Scheme over Encrypted Data in Hybrid Clouds 计算机科学, 2021, 48(5): 320-327. https://doi.org/10.11896/jsjkx.200300160 |
|