Computer Science ›› 2023, Vol. 50 ›› Issue (11A): 220800021-8. doi: 10.11896/jsjkx.220800021

• Information Security •

Enhanced Federated Learning Framework Based on CutMix

WANG Chundong, DU Yingqi, MO Xiuliang, FU Haoran

  1. National Engineering Laboratory for Computer Virus Prevention and Control Technology, Tianjin 300384, China
     Engineering Research Center of Learning-Based Intelligent System, Ministry of Education, Tianjin 300384, China
     School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384, China
  • Published: 2023-11-09
  • Corresponding author: WANG Chundong (michael3769@163.com)
  • About author: WANG Chundong, born in 1969, Ph.D, professor, is a senior member of China Computer Federation. His main research interests include cyberspace security, blockchain technology, etc.

Abstract: The emergence of federated learning solves the "data silo" problem of traditional machine learning: it allows a collective model to be trained while keeping each client's local data private. When the clients' data are independently and identically distributed (IID), federated learning can reach an accuracy close to that of centralized machine learning. In real scenarios, however, differences in client devices and geographic locations mean that client data often contain noise and are non-independently and identically distributed (Non-IID). This paper therefore proposes a CutMix-based federated learning framework, CutMix Enhanced Federated Learning (CEFL), which first filters out noisy data with a data-cleaning algorithm and then trains with CutMix-based data augmentation, effectively improving the learning accuracy of federated models in real-world settings. Experiments on the MNIST and CIFAR-10 benchmark datasets show that, compared with the traditional federated learning algorithm, CEFL improves model accuracy on Non-IID data by 23% and 19% respectively.
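For reference, the "traditional federated learning algorithm" the abstract compares against is typically FedAvg, which builds the collective model by averaging client parameters weighted by local dataset size. A minimal sketch of that aggregation step (all names are illustrative, not from the paper):

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg-style).

    client_weights: one list of numpy arrays (model layers) per client
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    aggregated = []
    for layer in range(n_layers):
        # Each client's contribution is proportional to its data volume.
        layer_avg = sum(
            (size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes)
        )
        aggregated.append(layer_avg)
    return aggregated
```

Under IID data this average tracks centralized training closely; under Non-IID data the client updates diverge, which is the gap CEFL's cleaning and augmentation steps aim to narrow.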

Key words: Federated learning, Non-independent identically distributed data, Data cleaning, Data augmentation, Saliency detection
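The CutMix augmentation the framework builds on (Yun et al., ICCV 2019) cuts a rectangular patch from one training image into another and mixes the two labels in proportion to the patch area, with the mixing ratio drawn from a Beta distribution. A minimal sketch of the operation, with all variable names illustrative:

```python
import numpy as np

def cutmix(img_a, img_b, label_a, label_b, alpha=1.0, rng=None):
    """Paste a random rectangle from img_b into img_a (CutMix).

    Images are (H, W, C) arrays; labels are one-hot vectors.
    Returns the mixed image and the area-weighted mixed label.
    """
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)                   # mixing ratio ~ Beta(alpha, alpha)
    cut_h = int(h * np.sqrt(1 - lam))              # box size so its area is ~(1 - lam)
    cut_w = int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(h), rng.integers(w)      # random box centre
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # Recompute lambda from the clipped box so the label matches the visible area.
    lam_adj = 1 - (y2 - y1) * (x2 - x1) / (h * w)
    return mixed, lam_adj * label_a + (1 - lam_adj) * label_b
```

In CEFL this kind of augmentation is applied after data cleaning, so patches are mixed only between samples that passed the noise filter.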

CLC number: TP183