计算机科学 ›› 2024, Vol. 51 ›› Issue (8): 387-395.doi: 10.11896/jsjkx.230800177

• 信息安全 • 上一篇    下一篇

基于同态加密的隐私保护主成分分析方法

张金斗, 陈经纬, 吴文渊, 冯勇   

  1. 中国科学院重庆绿色智能技术研究院 重庆 400714中国科学院大学重庆学院 重庆 400714
  • 收稿日期:2023-08-28 修回日期:2024-05-31 出版日期:2024-08-15 发布日期:2024-08-13
  • 通讯作者: 陈经纬(chenjingwei@cigit.ac.cn)
  • 作者简介:(zhangjindou@cigit.ac.cn)
  • 基金资助:
    科技部重点研发计划(2020YFA0712300);重庆市自然科学基金(cstc2021jcyj-msxmX0821,CSTB2023NSCQ-MSX0441,cstc2021yszx-jcyjX0004,2022YSZX-JCX0011CSTB,CSTB2023YSZX-JCX0008);中国科学院西部青年学者项目

Privacy-preserving Principal Component Analysis Based on Homomorphic Encryption

ZHANG Jindou, CHEN Jingwei, WU Wenyuan, FENG Yong   

  1. Chongqing School,University of Chinese Academy of Sciences,Chongqing 400714,China
  • Received:2023-08-28 Revised:2024-05-31 Online:2024-08-15 Published:2024-08-13
  • About author:ZHANG Jindou,born in 1997,postgra-duate.His main research interests include homomorphic encryption and information security.
    CHEN Jingwei,born in 1984,Ph.D,professor.His main research interests include error-free numerical computation and lattice-based cryptography.
  • Supported by:
    Ministry of Science and Technology Key Research and Development Programs(2020YFA0712300),Natural Science Foundation of Chongqing(cstc2021jcyj-msxmX0821,CSTB2023NSCQ-MSX0441,cstc2021yszx-jcyjX0004,2022YSZX-JCX0011CSTB,CSTB2023YSZX-JCX0008) and Western Light of the West Young Scholars program of the Chinese Academy of Sciences.

摘要: 在现实生活中,不同的行业之间,甚至同行业不同部门之间的数据并不互通,随着计算机算力的提升,制约模型训练效果的不是算力而是数据量。因此,想要得到更好的算法模型,仅靠某一方的数据是不够的,需要两方或者多方的参与,这就要求对各方的数据进行隐私保护。除此之外,随着收集的数据越来越详细,数据的维数也越来越大。面对高维的数据,数据降维是不可缺少的环节,而在数据降维方面,主成分分析(Principal Component Analysis,PCA)是常用的手段。当拥有数据的两方想要合作进行隐私保护的数据降维时,同态加密技术是一种解决办法。同态加密技术可以在保护数据隐私的前提下对加密数据进行计算,可以用在加密数据的PCA上。针对上述应用场景,利用CKKS同态加密方案,通过幂法迭代的SVD技术设计了一种两方加密数据进行PCA的方案,在保护两方数据隐私的前提下实现数据降维的目的;通过改进传统幂法迭代步骤,避免了代价高昂的同态密文除法运算,使得在选取较小的加密参数时,也能支持更多的幂法迭代次数,从而在缩短同态计算时间的同时提高计算精度。在公共数据集上进行测试,并与现有方案进行对比,该方案在计算耗时上缩短了约80%,与明文计算结果的均方误差缩减到1%以内。

关键词: 同态加密, 隐私保护, 主成分分析, 奇异值分解, 幂法

Abstract: In real life,data is not interconnected between different industries,or even between different departments within the same industry.With the improvement of computer computing power,it is not computing power but data volume that restricts the effectiveness of model training.Therefore,in order to obtain a better algorithm model,relying solely on one party’s data is not enough.It needs the participation of two or more parties,which requires privacy protection for all parties.In addition,as data collection becomes more detailed,the data dimension also increases.For high dimension data,dimension reduction is an indispensable step.And in terms of dimension reduction,principal component analysis(PCA) is a commonly used method.Homomorphic encryption is a solution when two parties want to collaborate on privacy protection data dimension reduction.Homomorphic encryption can compute encrypted data while protecting data privacy,and can be used to compute the PCA on encrypted data.In this paper,a two party encrypted data PCA scheme is designed using the CKKS homomorphic encryption scheme and the power method for dominant eigenvectors,achieving the goal of dimension reduction while protecting the privacy of both parties’ data.By improving the traditional power method iteration steps,the expensive homomorphic ciphertext division is avoided,allowing for more iterations with small encryption parameters,thereby reducing the computing time and improving the accuracy of the computed results.Through testing on public datasets and comparing it with some existing schemes,the scheme reduces the computational time by about 80%,and reduces the mean squared error to within 1% compared to the plaintext computation results.

Key words: Homomorphic encryption, Privacy preserving, Principal component analysis, Singular value decomposition, Power method

中图分类号: 

  • TP309.7
[1]TRAUTMAN L J,ORMEROD P C.Corporate directors’ and officers’ cybersecurity standard of care:The Yahoo data breach [J].AM UL Rev,2016,66:1231.
[2]PAULINA O.Russian Web hosting provider exposes data ofmore than 54M users [EB/OL].[2024-02-21].https://cybernews.com/security/web-hosting-ucoz-uid-data-leak.
[3]KRISHI C.Microsoft Azure Hit With The Largest Data Breach In Its History;Hundreds Of Executive Accounts Compromised [EB/OL].[2024-02-21].https://techreport.com/news/microsoft-azure-hit-with-the-largest-data-breach-in-its-history-hun-dreds-of-executive-accounts-compromised.
[4]NATALIE C.Bank of america customers left in the dark about data breach for 90 days [EB/OL].[2024-02-15].https://www.forbes.com/advisor/personal-finance/data-breach-affects-bank-of-america-customers.
[5]VOIGT P,VON DEM BUSSCHE A.The eu general data protection regulation(gdpr)[M].Springer,2017.
[6]BARRETT C.Are the EU GDPR and the California CCPA becoming the de facto global standards for data privacy and protection? [J].Scitech Lawyer,2019,15(3):24-29.
[7]PARASOL M.The impact of China’s 2016 Cyber Security Law on foreign technology firms,and on China’s big data and Smart City dreams [J].Computer law & security review,2018,34(1):67-98.
[8]LI F H,LI H,JIA Y,et al.Research Scope and Development Trends in Privacy Computing [J].Journal on Communications,2016,37(4):4.
[9]LI T,SAHU A K,TALWALKAR A,et al.Federated learning:Challenges,methods,and future directions [J].IEEE Signal Processing Magazine,2020,37(3):50-60.
[10]SABT M,ACHEMLAL M,BOUABDALLAH A.Trusted execution environment:What it is,and what it is not[C]//2015 IEEE Trustcom/BigDataSE/Ispa.IEEE,2015,1:57-64.
[11]YAO A C.Protocols for secure computations[C]//23rd Annual Symposium on Foundations of Computer Science(SFCS 1982).IEEE,1982:160-164.
[12]DURGA PRASAD K,ADI NARAYANA REDDY K,VA-SUMATHI D.Privacy-Preserving Naive Bayesian Classifier for Continuous Data and Discrete Data[C]//First International Conference on Artificial Intelligence and Cognitive Computing(AICC 2018).Springer Singapore,2019:289-299.
[13]GILAD-BACHRACH R,DOWLIN N,LAINE K,et al.Cryptonets:Applying neural networks to encrypted data with high throughput and accuracy[C]//International Conference on Machine Learning.PMLR,2016:201-210.
[14]HESAMIFARD E,TAKABI H,GHASEMI M.Cryptodl:Deep neural networks over encrypted data [J].arXiv:1711.05189,2017.
[15]SANYAL A,KUSNER M,GASCON A,et al.TAPAS:Tricks to accelerate(encrypted) prediction as a service[C]//International Conference on Machine Learning.PMLR,2018:4490-4499.
[16]BOURSE F,MINELLI M,MINIHOLD M,et al.Fast homomorphic evaluation of deep discretized neural networks[C]//Advances in Cryptology(CRYPTO 2018):38th Annual International Cryptology Conference.Santa Barbara,CA,USA,Part III 38.Springer International Publishing,2018:483-512.
[17]LU W,KAWASAKI S,SAKUMA J.Using fully homomorphic encryption for statistical analysis of categorical,ordinal and numerical data[J].Cryptology ePrint Archive,2016.
[18]RATHEE D,MISHRA P K,YASUDA M.Faster PCA andlinear regression through hypercubes in HElib[C]//Proceedings of the 2018 Workshop on Privacy in the Electronic Society.2018:42-53.
[19]BRAKERSKI Z,GENTRY C,VAIKUNTANATHAN V.(Le-veled) fully homomorphic encryption without bootstrapping [J].ACM Transactions on Computation Theory(TOCT),2014,6(3):1-36.
[20]CHEON J H,KIM A,KIM M,et al.Homomorphic encryption for arithmetic of approximate numbers[C]//Advances in Cryptology(ASIACRYPT 2017):23rd International Conference on the Theory and Applications of Cryptology and Information Security,Hong Kong,China,Part I 23.Springer International Publishing,2017:409-437.
[21]PANDA S.Principal component analysis using CKKs homomorphic scheme[C]//Cyber Security Cryptography and Machine Learning:5th International Symposium(CSCML 2021).Be’er Sheva,Israel,Springer International Publishing,2021:52-70.
[22]RIVEST R L,ADLEMAN L,DERTOUZOS M L.On databanks and privacy homomorphisms [J].Foundations of Secure Computation,1978,4(11):169-180.
[23]RIVEST R L,SHAMIR A,ADLEMAN L.A method for obtaining digital signatures and public-key cryptosystems [J].Communications of the ACM,1978,21(2):120-126.
[24]ELGAMAL T.A public key cryptosystem and a signaturescheme based on discrete logarithms [J].IEEE Transactions on Information Theory,1985,31(4):469-472.
[25]PAILLIER P.Public-key cryptosystems based on composite degree residuosity classes[C]//International Conference on the Theory and Applications of Cryptographic Techniques.Berlin,Heidelberg:Springer Berlin Heidelberg,1999:223-238.
[26]BONEH D,GOH E J,NISSIM K.Evaluating 2-DNF formulason ciphertexts[C]//Theory of Cryptography:Second Theory of Cryptography Conference(TCC 2005).Cambridge,MA,USA,Springer Berlin Heidelberg,2005:325-341.
[27]GENTRY C.Fully homomorphic encryption using ideal lattices[C]//Proceedings of the forty-first annual ACM Symposium on Theory of Computing.2009:169-178.
[28]FAN J,VERCAUTEREN F.Somewhat practical fully homo-morphic encryption[J/OL].https://eprint.iacr.org/2012/144.pdf.
[29]GENTRY C,SAHAI A,WATERS B.Homomorphic encryption from learning with errors:Conceptually-simpler,asymptotically-faster,attribute-based[C]//Advances in Cryptology-CRYPTO 2013:33rd Annual Cryptology Conference,Santa Barbara,CA,USA,Part I.Springer Berlin Heidelberg,2013:75-92.
[30]LYUBASHEVSKY V,PEIKERT C,REGEV O.On ideal lattices and learning with errors over rings[C]//Advances in Cryptology(EUROCRYPT 2010):29th Annual International Conference on the Theory and Applications of Cryptographic Techniques.French Riviera,Springer,2010:1-23.
[31]HALEVI S,SHOUP V.Algorithms in helib[C]//Advances in Cryptology-CRYPTO 2014:34th Annual Cryptology Confe-rence,Santa Barbara,CA,USA,Part I 34.2014:554-571.
[32]LINDELL Y.How to simulate it-A tutorial on the simulationproof technique[M]//Tutorials on the Foundations of Cryptography Information Security and Cryptography.2017:277-346.
[33]ALBRECHT M R,PLAYER R,SCOTT S.On the concretehardness of learning with errors [J].Journal of Mathematical Cryptology,2015,9(3):169-203.
[34]CORTEZ P,CERDEIRA A,ALMEIDA F,et al.Modeling wine preferences by data mining from physicochemical properties [J].Decision Support Systems,2009,47(4):547-53.
[35]BELHUMEUR P N,HESPANHA J P,KRIEGMAN D J.Eigenfaces vs.fisherfaces:Recognition using class specific linear projection [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,1997,19(7):711-720.
[36]LECUN Y,CORTES C,BURGES C.MNIST handwritten digit database.AT&T Labs(2010) [EB/OL].http://yann lecun com/exdb/mnist.
[37]XIAO H,RASUL K,VOLLGRAF R.Fashion-mnist:a novelimage dataset for benchmarking machine learning algorithms [J].arXiv:1708.07747,2017.
[38]DE VITO S,MASSERA E,PIGA M,et al.On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario [J].Sensors and Actuators B:Chemical,2008,129(2):750-757.
[39]TSANAS A,LITTLE M,MCSHARRY P,et al.Accurate telemonitoring of Parkinson’s disease progression by non-invasive speech tests[J/OL].Nature Proceedings,2009.https://doi.org/10.1038/npre.2009.3920.1.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!