Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230600002-8. DOI: 10.11896/jsjkx.230600002

• Information Security •

  • Corresponding author: XU Ruzhi(xuruzhi@ncepu.edu.cn)

Differential Privacy Federated Learning Method Based on Knowledge Distillation

TAN Zhiwen, XU Ruzhi, WANG Naiyu, LUO Dan   

  1. School of Control and Computer Engineering,North China Electric Power University,Beijing 102206,China
  • Published:2024-06-06
  • About author:TAN Zhiwen,born in 1999,postgraduate.Her main research interests include federated learning and differential privacy.
    XU Ruzhi,born in 1966,Ph.D,professor,master supervisor.Her main research interests include information security,application of information technology in smart grid,and computer control.
  • Supported by:
    National Natural Science Foundation of China(61972148).



Abstract: Differential privacy, as a privacy protection technique, has been widely applied in federated learning. Existing research on applying differential privacy to federated learning either fails to consider unlabeled public data or ignores the difference in data volume between clients, which limits its application in real-world scenarios. This paper proposes a differential privacy federated learning method based on knowledge distillation that introduces an unlabeled public dataset, accounts for the differences in data volume between clients, and designs a dedicated differential privacy scheme for this scenario. First, clients are grouped by data volume into “large-data clients” and “general clients”. A teacher model is trained on the large-data clients’ data and used to add pseudo labels to the public dataset; the pseudo-labeled public dataset then participates in federated training as a “special client” alongside the general clients. Differential privacy is adopted to protect client data. Because only the labels of the special client’s data involve privacy, the special client is allocated a larger privacy budget than the general clients during federated training. The total privacy budget is capped: the budget of the federated training stage is fixed, and the budget of the pseudo-labeling stage is adjusted according to the clients’ privacy requirements and the parallel composition property of the privacy budget. Experiments on the MNIST and SVHN datasets show that, under the same privacy budget consumption, the proposed method trains models with higher accuracy than traditional methods. The scheme is scalable, and its highly flexible privacy budget allocation enables it to meet complex privacy requirements.
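The paper itself provides no code, but the two budget-related ideas in the abstract can be sketched briefly: splitting a fixed total privacy budget between the pseudo-labeling stage and the federated training stage (using parallel composition over disjoint client datasets), and PATE-style noisy pseudo-labeling by the teacher. This is a minimal Python sketch; the function names, the 2:1 special-to-general budget ratio, and the Laplace noisy-argmax labeling rule are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def allocate_budgets(eps_total, eps_fed, special_ratio=2.0):
    """Split a fixed total budget eps_total between the pseudo-labeling
    stage and the federated training stage (whose budget eps_fed is fixed)."""
    # By parallel composition, mechanisms run on disjoint client datasets
    # cost max(eps_i), so the special client can use the full stage budget
    # while general clients run under a smaller one.
    eps_special = eps_fed                  # label-only privacy -> larger budget
    eps_general = eps_fed / special_ratio  # general clients: full-data privacy
    eps_label = eps_total - eps_fed        # leftover for teacher pseudo-labeling
    assert eps_label > 0, "no budget left for the pseudo-labeling stage"
    return eps_special, eps_general, eps_label

def noisy_pseudo_label(class_votes, eps_label):
    """Noisy argmax: Laplace noise (sensitivity 1) added to the teacher's
    per-class vote counts before picking the label."""
    votes = np.asarray(class_votes, dtype=float)
    noise = rng.laplace(scale=1.0 / eps_label, size=votes.shape)
    return int(np.argmax(votes + noise))
```

For example, `allocate_budgets(eps_total=8.0, eps_fed=4.0)` leaves a budget of 4.0 for the pseudo-labeling stage; a larger `eps_label` means less noise on the vote counts and more reliable pseudo labels.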

Key words: Federated learning, Differential privacy, Knowledge distillation, Privacy protection, Privacy budget

CLC number: TP309.2

[1]MCMAHAN H B,MOORE E,RAMAGE D,et al.Communication-efficient learning of deep networks from decentralized data[C]//Proceedings of the 20th International Conference on Artificial Intelligence and Statistics,2017:1273-1282.
[2]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]//Theory of Cryptography Conference.2006:265-284.
[3]DWORK C,KENTHAPADI K,MCSHERRY F,et al.Our Data,Ourselves:Privacy Via Distributed Noise Generation[C]//Advances in Cryptology-EUROCRYPT 2006.2006:486-503.
[4]PHAN N H,WU X,HU H,et al.Adaptive Laplace Mechanism:Differential Privacy Preservation in Deep Learning[C]//IEEE International Conference on Data Mining.2017:385-394.
[5]ABADI M,CHU A,GOODFELLOW I,et al.Deep learning with differential privacy[C]//Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.Vienna,Austria:ACM,2016:308-318.
[6]LU Y,HUANG X,DAI Y,et al.Differentially private asyn-chronous federated learning for mobile edge computing in urban informatics[J].IEEE Transactions on Industrial Informatics,2020,16(3):2134-2143.
[7]WEI K,LI J,DING M,et al.Federated learning with differential privacy:algorithms and performance analysis[J].IEEE Transactions on Information Forensics and Security,2020,15:3454-3469.
[8]MCMAHAN H B,RAMAGE D,TALWAR K,et al.Learning differentially private recurrent language models[J].arXiv:1710.06963,2017.
[9]HINTON G,VINYALS O,DEAN J.Distilling the knowledge in a neural network[J].arXiv:1503.02531,2015.
[10]LIU L,ZHANG J,SONG S H,et al.Communication-Efficient Federated Distillation with Active Data Sampling[J].arXiv:2203.06900,2022.
[11]ITAHARA S,NISHIO T,KODA Y,et al.Distillation-Based Semi-Supervised Federated Learning for Communication-Efficient Collaborative Training with Non-IID Private Data[J].IEEE Transactions on Mobile Computing,2021(1):191-205.
[12]SUN L,LYU L.Federated Model Distillation with Noise-Free Differential Privacy[C]//International Joint Conference on Artificial Intelligence.International Joint Conferences on Artificial Intelligence Organization,2021:1563-1572.
[13]ZHAO Y,LI M,LAI L,et al.Federated learning with non-iid data[J].arXiv:1806.00582,2018.
[14]YAO X,HUANG T,ZHANG R X,et al.Federated learning with unbiased gradient aggregation and controllable meta updating[J].arXiv:1910.08234,2019.
[15]ZHU L,LIU X,LI Y,et al.A Fine-Grained Differentially Private Federated Learning Against Leakage From Gradients[J].IEEE Internet of Things Journal,2022,9(13):11500-11512.
[16]PAPERNOT N,ABADI M,ERLINGSSON U,et al.Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data[J].arXiv:1610.05755,2016.
[17]KERKOUCHE R,CS G,CASTELLUCCIA C,et al.Constrained Differentially Private Federated Learning for Low-bandwidth Devices[J].arXiv:2103.00342,2021.
[18]KAIROUZ P,MCMAHAN H B,AVENT B,et al.Advances and Open Problems in Federated Learning[J].arXiv:1912.04977,2019.
[19]SHI H,ZHANG Y,SHEN Z,et al.Towards Communication-Efficient and Privacy-Preserving Federated Representation Learning[J].arXiv:2109.14611,2021.
[20]ZHANG T,SONG A,DONG X,et al.Privacy-Preserving Asynchronous Grouped Federated Learning for IoT[J].IEEE Internet of Things Journal,2022,9(7):5511-5523.
[21]HUANG X,DING Y,JIANG Z L,et al.DP-FL:a novel diffe-rentially private federated learning framework for the unbalanced data[J].World Wide Web,2020,23:2529-2545.
[22]LIU J,LOU J,XIONG L,et al.Projected Federated Averaging with Heterogeneous Differential Privacy[J].PVLDB,2022,15(4):828-840.
[23]MCMAHAN H B,MOORE E,RAMAGE D,et al.Communication-efficient learning of deep networks from decentralized data[C]//Proceedings of the 20th International Conference on Artificial Intelligence and Statistics.2017:1273-1282.
[24]DWORK C.Differential Privacy[J].Lecture Notes in Computer Science,2006,26(2):1-12.
[25]HAEBERLEN A,PIERCE B C,NARAYAN A.Differential privacy under fire[C]//Proceedings of the 20th USENIX Conference on Security.San Francisco,USA,2011:33-33.
[26]MCSHERRY F.Privacy integrated queries:An extensible plat-form for privacy-preserving data analysis[J].Communications of the ACM,2010,53(9):89-97.
[27]LECUN Y.The MNIST Database of Handwritten Digits[OL].http://yann.lecun.com/exdb/mnist/.
[28]LECUN Y,BOTTOU L,BENGIO Y,et al.Gradient-based learning applied to document recognition[J].Proceedings of the IEEE,1998,86(11):2278-2324.
[29]NETZER Y,WANG T,COATES A,et al.Reading digits in natural images with unsupervised feature learning[C]//NIPS Workshop on Deep Learning & Unsupervised Feature Learning.2011.