Computer Science, 2024, Vol. 51, Issue (6A): 230600002-8. doi: 10.11896/jsjkx.230600002
谭智文, 徐茹枝, 王乃玉, 罗丹
TAN Zhiwen, XU Ruzhi, WANG Naiyu, LUO Dan
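Abstract: Differential privacy, as a privacy-preserving technique, has been widely applied in federated learning. Existing work on applying differential privacy to federated learning either ignores unlabeled public data or ignores differences in data volume among clients, which limits its use in real-world scenarios. This paper proposes a differentially private federated learning method based on knowledge distillation. The method introduces an unlabeled public dataset, accounts for differences in data volume among clients, and uses a differential privacy scheme designed specifically for this setting. First, clients are grouped by data volume into "large-data clients" and "general clients". A teacher model is trained on the data of the large-data clients and assigns pseudo-labels to the public dataset. The public dataset then acts as a "special client" and takes part in federated training together with the general clients. Differential privacy is used to protect the clients' data. Because only the labels of the special client's data are privacy-sensitive, the special client is allocated a larger privacy budget than the general clients during federated training. The total privacy budget is bounded: the budget of the federated training stage is held fixed, and the budget of the pseudo-labeling stage is adjusted according to the clients' privacy requirements and the parallel composition property of privacy budgets. Experiments on the MNIST and SVHN datasets show that, for the same privacy budget consumption, the method trains models with higher accuracy than traditional methods. The scheme is extensible, and its highly flexible privacy budget allocation allows it to meet complex privacy requirements.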
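The grouping step and the two standard composition rules for privacy budgets can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the grouping threshold, the budget values, or the exact accounting formula, so the function names, the threshold of 1000 samples, and all epsilon values below are hypothetical.

```python
from typing import Dict, List, Tuple


def group_clients(sizes: Dict[str, int], threshold: int) -> Tuple[List[str], List[str]]:
    """Split clients into (large-data, general) groups by sample count.

    The threshold is a hypothetical parameter; the abstract only says
    that clients are grouped by data volume.
    """
    large = sorted(c for c, n in sizes.items() if n >= threshold)
    general = sorted(c for c, n in sizes.items() if n < threshold)
    return large, general


def sequential_budget(epsilons: List[float]) -> float:
    """Sequential composition: mechanisms run on the same data add up."""
    return sum(epsilons)


def parallel_budget(epsilons: List[float]) -> float:
    """Parallel composition: mechanisms run on disjoint data cost the maximum."""
    return max(epsilons)


# Hypothetical client sizes (number of local samples).
client_sizes = {"a": 5000, "b": 300, "c": 4200, "d": 150}
large_clients, general_clients = group_clients(client_sizes, threshold=1000)

# Hypothetical budgets: the pseudo-labeling stage touches only the
# large-data clients, and federated training touches only the general
# clients, so the two groups hold disjoint data.
eps_label = 2.0
eps_fed_general = 1.0
overall_disjoint = parallel_budget([eps_label, eps_fed_general])

# For comparison: two mechanisms applied to the same data.
overall_same_data = sequential_budget([0.5, 1.0])
```

Under parallel composition the overall cost is governed by the larger of the two stage budgets, which is what leaves room to adjust the pseudo-labeling budget while the federated training budget stays fixed.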