Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 240500132-7. doi: 10.11896/jsjkx.240500132

• Information Security •

Federated Learning Privacy Protection Method Combining Dataset Distillation

WANG Chundong, ZHANG Qinghua, FU Haoran

  1. School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
    National Engineering Laboratory of Computer Virus Prevention and Control Technology, Tianjin 300384, China
  • Online: 2025-06-16 Published: 2025-06-12
  • Corresponding author: WANG Chundong (michael3769@163.com)
  • About author: WANG Chundong, born in 1969, Ph.D., professor, is a member of CCF (No. 16230M). His main research interests include big data and smart computing security, network security situation awareness, etc.
  • Supported by:
    National Natural Science Foundation of China (U1536122), Joint Funds of the Tianjin Municipal Commission of Education, China (2021YJSB252), and Science and Technology Commission Major Special Projects of Tianjin, China (15ZXDSGX00030).

Abstract: Federated learning trains a global model by exchanging model parameters rather than raw data, with the aim of preserving privacy. However, many studies have shown that an attacker can reconstruct the original training data from intercepted gradients, which leaks the clients' private data. In addition, differences in how clients sample data make the collected data non-independent and identically distributed (non-IID), and this data heterogeneity degrades the training performance of the global model. To counter gradient inversion attacks, the data distillation method is introduced into the federated learning framework and combined with data augmentation to improve the usability of the synthesized data. Furthermore, to address data heterogeneity in medical data from different institutions, batch normalization layers are introduced on the clients to mitigate client drift and improve the overall performance of the model. Experimental results show that the federated learning method combined with data distillation achieves performance close to that of other federated learning paradigms while strengthening the protection of medical data privacy.

Key words: Federated learning, Privacy protection, Data distillation, Image classification, Data heterogeneity

CLC Number:

  • TP183
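
The abstract states that batch normalization layers are introduced on the clients to mitigate client drift, but it does not give the aggregation rule. The following is a minimal sketch of one common way to do this, assuming that batch normalization parameters stay local to each client while all other parameters are averaged on the server, weighted by local sample counts. The function names, the parameter layout (a dictionary of flat float lists), and the convention that batch normalization parameter names start with `bn` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def is_bn_param(name: str) -> bool:
    # Assumed naming convention: any dotted name component starting with "bn"
    # marks a batch normalization parameter (for example "bn1.weight").
    return any(part.startswith("bn") for part in name.split("."))


def aggregate_non_bn(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average all non-BN parameters, weighted by each client's sample count."""
    total = sum(client_sizes)
    shared_names = [n for n in client_params[0] if not is_bn_param(n)]
    averaged: Params = {}
    for name in shared_names:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * s for p, s in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


def update_client(local: Params, averaged: Params) -> Params:
    """Take shared parameters from the server; keep the client's own BN parameters."""
    return {name: list(averaged.get(name, values)) for name, values in local.items()}


# Two hypothetical clients with different amounts of local data.
clients = [
    {"conv1.weight": [1.0, 2.0], "bn1.weight": [0.5]},
    {"conv1.weight": [3.0, 6.0], "bn1.weight": [1.5]},
]
sizes = [1, 3]

averaged = aggregate_non_bn(clients, sizes)
updated_first = update_client(clients[0], averaged)
updated_second = update_client(clients[1], averaged)
```

In this sketch both clients end up with the same convolution weights after the round, while each keeps batch normalization parameters fitted to its own feature distribution, which is the intuition behind using local normalization against heterogeneous data.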