计算机科学 ›› 2023, Vol. 50 ›› Issue (8): 321-332.doi: 10.11896/jsjkx.220700130

• 信息安全 • 上一篇    下一篇

基于同态加密的隐私保护数据分类协议

陆星缘1,2, 陈经纬3, 冯勇3, 吴文渊3   

  1. 1 公共大数据国家重点实验室 贵阳 550025
    2 贵州大学计算机科学与技术学院 贵阳 550025
    3 中国科学院重庆绿色智能技术研究院 重庆 400714
  • 收稿日期:2022-07-12 修回日期:2022-12-14 出版日期:2023-08-15 发布日期:2023-08-02
  • 通讯作者: 陈经纬(chenjingwei@cigit.ac.cn)
  • 作者简介:(568901982@qq.com)
  • 基金资助:
    贵州省科技计划项目([2020]4Y056);科技部重点研发计划项目(2020YFA0712303);重庆市科技项目(cstc2021jcyj-msxmX0821,cstc2020yszx-jcyjX0005,cstc2021yszx-jcyjX0004,2022YSZX-JCX0011CSTB,2021000263)

Privacy-preserving Data Classification Protocol Based on Homomorphic Encryption

LU Xingyuan1,2, CHEN Jingwei3, FENG Yong3, WU Wenyuan3   

  1. 1 State Key Laboratory of Public Big Data,Guizhou University,Guiyang 550025,China
    2 College of Computer Science and Technology,Guizhou University,Guiyang 550025,China
    3 Chongqing Institute of Green and Intelligent Technology,Chinese Academy of Sciences,Chongqing 400714,China
  • Received:2022-07-12 Revised:2022-12-14 Online:2023-08-15 Published:2023-08-02
  • About author:LU Xingyuan,born in 1997,postgra-duate.His main research interests include homomorphic encryption and privacy protection.
    CHEN Jingwei,born in 1984,Ph.D,associate researcher.His main research interests include symbolic computation and lattice based cryptography.
  • Supported by:
    Guizhou Science and Technology Program([2020]4Y056),Key Research and Development Program of the Ministry of Science and Technology(2020YFA0712303) and Chongqing Science and Technology Program(cstc2021jcyj-msxmX0821, cstc2020yszx-jcyjX0005,cstc2021yszx-jcyjX0004,2022YSZX-JCX0011CSTB,2021000263).

摘要: 随着大数据、云计算技术的发展,用户对于云计算服务的需求也与日俱增。在用户申请云计算服务时,其隐私数据需要在云平台进行存储与计算,而这也带来了隐私数据泄露的问题。同态加密允许在不解密的情况下对密文进行直接运算,得到的新密文解密后即为运算结果,因此可以用于保障用户的隐私数据安全。在半诚实模型下考虑如下两方面的计算框架:用户端按照指定方式将隐私数据加密为密文后发送到服务器端,服务器端根据同态加密方案允许明文与密文间进行运算的性质,使用训练得到的明文模型对用户端发送来的加密数据进行分类,最后将加密的分类结果发送回用户端,由用户端自行解密获得隐私数据的分类结果。在这个框架下,基于同态加密方案BGV设计了超平面分类器、决策树以及KNN这3种机器学习分类算法。根据每种分类器的特性,结合SIMD技术设计不同的密文数据打包策略与分类计算流程,使得用户端与服务器端之间的通信开销大幅降低。特别地,在预测阶段,超平面分类器与决策树实现了无交互的分类,KNN仅需1次交互即可完成分类,并基于HElib同态加密库,采用C++语言实现了这3种分类器。在UCI公开数据集上,超平面分类器能够在几十毫秒到几百毫秒内完成对1个待预测样本的分类,决策树最慢能够在几十毫秒内完成,两种分类器对密文数据的预测准确率均能超过90%,两方仅需要承担用户端发送给服务器端的加密隐私数据与服务器端发送回用户端的加密分类标签的通信开销;KNN分类器平均4s左右完成对1个待预测样本的分类,对密文数据的预测准确率在90%以上,两方除了隐私数据与分类标签的通信开销外,只需要额外负担一轮服务器端与用户端的中间计算结果即可完成分类。与基于同态加密的同类协议相比,在通信轮数、预测准确率、运行效率等方面均有不同程度的改进。

关键词: 同态加密, 安全多方计算, 隐私保护, 机器学习, HElib

Abstract: With the development of big data and cloud computing,the demand for cloud computing services is growing dramatically.When users apply for cloud computing services,their privacy data needs to be stored and computed on cloud platforms,which may cause leakage of private data.Homomorphic encryption allows direct computation on ciphertexts,and the decryption of the resulting ciphertext is the same as computing on plaintexts,so homomorphic encryption can protect users' private data.Here a framework for two parties in the semi-honest model is considered.The client encrypts the privacy data into ciphertext according to a homomorphic encryption scheme and sends it to the server,and the server uses the plain machine learning model to classify the encrypted data from the client.Finally,the server sends the encrypted classification result back to the client,and the client decrypts the classification result by itself.With the framework above,three machine learning classifiers,the hyperplane,decision tree,and k-nearest neighbor classifier,based on the Brakerski-Gentry-Vaikuntanathan(BGV) homomorphic encryption scheme are investigated.According to the characteristics of each classifier,different ciphertext data packaging strategies and calculation processes are designed with single-instruction-multiple-data (SIMD) technology,which significantly reduces the communication overhead between the client and the server.In the prediction phase,the hyperplane and decision tree classifiers achieve interaction-free,and the KNN classifier only needs one interaction.Moreover,the three classifiers are implemented with a homomorphic encryption library HElib.For several UCI public datasets,the hyperplane classifier can complete the privacy-preserving classification within tens of milliseconds to hundreds of milliseconds for a single sample,and the decision tree can complete it within tens of milliseconds.The prediction accuracy of the first two classifiers for ciphertext data exceeds 90%,and the two parties only need the communication cost of the client sending the encrypted private data to the server,and the server returns the encrypted classification label to the client.The k-nearest-neighbor classifier completes one sample's classification in about 4 seconds on average,and the prediction accuracy of ciphertext data is also more than 90%.In addition to the communication overhead of privacy data and classification labels,the two parties also need an additional round of intermediate calculation results between the server and the client to complete the classification.Compared with similar protocols based on homomorphic encryption,the proposed protocols have advantages in the number of communication rounds,prediction accuracy,and computational efficiency.

Key words: Homomorphic encryption, Secure multi-party computation, Privacy protection, Machine learning, HElib

中图分类号: 

  • TP309.2
[1]YANG Y P,ZHAO Y,ZHANG J M,et al.Recent Development of Theory and Application on Homomorphic Encryption [J].Chinese Journal of Electronics & Information Technology,2021,43(2):13.
[2]YAO C C.How to generate and exchange secrets[C]//27th Annual Symposium on Foundations of Computer Science.1986.
[3]SHAMIR A.How to share a secret[J].Communications of the ACM,1979,22(11):612-613.
[4]BLAKLEY G R.Safeguarding cryptographic keys[C]//Afips.IEEE Computer Society,1979.
[5]RABIN M O.Transaction protection by beacons[J].Journal of Computer and System Sciences,1981,27(2):256-267.
[6]RIVEST R L,ADLEMAN L M,DERTOUZOS M L.On DataBanks and Privacy Homomorphisms[J].Foundations of Secure Compuation,1978,4(11):169-180.
[7]GENTRY C.A fully homomorphic encryption scheme[M].Stanford University,2009.
[8]BRAKERSKI Z,GENTRY C,VAIKUNTANATHAN V.Fully Homomorphic Encryption without Bootstrapping[J].ACM Transactions on Computation Theory (TOCT) [J].Special issue on Innovations in Theoretical Computer Science 2012-Part II,2014,6(3):1-36.
[9]FAN J,VERCAUTEREN F.Somewhat practical fully homo-morphic encryption[J].IACR Cryptology Eprint Archive,2012,2012:144.
[10]CHEON J H,KIM A,KIM M,et al.Homomorphic Encryption for Arithmetic of Approximate Numbers[C]//International Conference on the Theory and Application of Cryptology and Information Security.Cham:Springer,2017.
[11]WANG C,YAO H N,WANG B N,et al.Progress in Quantum Computing Cryptography Attacks [J].Journal of Computers,2020,43(9):1691-1707.
[12]PAILLIER P.Public-Key Cryptosystems Based on CompositeDegree Residuosity Classes[C]//Advances in Cryptology-EUROCRYPT '99,International Conference on the Theory and Application of Cryptographic Techniques.Berlin:Springer,1999.
[13]YASUMURA Y,ISHIMAKI Y,YAMANA H.Secure NaiveBayes Classification Protocol over Encrypted Data Using Fully Homomorphic Encryption[C]//iiWAS2019:The 21st International Conference on Information Integration and Web-based Applications & Services.2019.
[14]CAI Y L,TANG C M.Privacy of Outsourced Two-Party K-Means Clustering[J].Concurrency and Computation-Practice & Experience,2021,33(8):1-12.
[15]PARK S,BYUN J,LEE J.Privacy-Preserving Fair Learning of Support Vector Machine with Homomorphic Encryption[C]//The Web Conference.2022:3572-3583.
[16]JIA C F,WANG F Y,CHEN Y,et al.Machine Le-arning Algorithm for a Homomorphic Encrypted D-ata Set [J].Journal of Tsinghua University(Science and Technology),2020,60(6):456-463.
[17]DEY P,CHAULYA S K,KUMAR S.Secure decision tree twin support vector machine training and classification process for encrypted IoT data via blockchain platform [J].Concurrency and Computation Practice and Experience,2021,33(16).
[18]UCI[OL].https://archive.ics.uci.edu.
[19]XU J,WANG A D,BI M,et al.Privacy-preserving k-Nearest Neighbor Classifier [J].Journal of Software,2019,30(11):3503-3517.
[20]HElib[OL].https://github.com/shaih/HElib.
[21]GENTRY C.Fully homomorphic encryption using ideal lattices[C]//ACM.ACM,2009:169-178.
[22]DUCAS L,STEHLÉ D,FISCHLIN M,et al.Sanitization ofFHE Ciphertexts[C]//International Conferenceon Advances in Cryptology-eurocrypt.Berlin:Springer,2016.
[23]ILIASHENKO I,ZUCCA V.Faster homomorphic comparisonoperations for BGV and BFV [J].Privacy Enhancing Technologies,2021,3(2021):246-264.
[24]CORTES C.Support-Vector Networks[J].Machine Learning,1995,20(1995):273-297.
[25]KHEDR A,GULAK G,VAIKUNTANATHAN V.SHIELD:Scalable Homomorphic Implementation of Encrypted Data-Classifiers [J].IEEE Transactions on Computers,2016,65(9):2848-2858.
[26]CHEN J Y,FENG Y,LIU Y,et al.Non-interactive Privacy-Preserving Nave Bayes Classifier Using Homomorphic Encryption [C]//International Conference on Security and Privacy in New Computing Environments.Cham:Springer,2022.
[27]OpenMP[OL].https://www.openmp.org/.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!