Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230800046-6. doi: 10.11896/jsjkx.230800046

• Big Data & Data Science •

  • Corresponding author: YANG Lei (sely@scut.edu.cn)
  • About author: ZHOU Tianyang (zhoutianyang2002@163.com)

Study on Client Selection Strategy and Dataset Partition in Federated Learning Based on Edge-TB

ZHOU Tianyang, YANG Lei   

  1. School of Software Engineering,South China University of Technology,Guangzhou 510006,China
  • Published:2024-06-06
  • About author:ZHOU Tianyang,born in 2002,postgraduate,is a student member of CCF(No.D6441G).His main research interest is federated learning.
    YANG Lei,born in 1986,Ph.D,professor,is a member of CCF(No.60282M).His main research interests include cloud and edge computing,distributed machine learning and federated learning.


Abstract: Federated learning is one of the real-world applications of distributed machine learning. To address heterogeneity in federated learning, this paper builds on the FedProx algorithm and proposes a client selection strategy that preferentially selects clients with large proximal terms. This strategy outperforms the common strategy of selecting clients with large local loss values: it effectively improves the convergence rate of FedProx under heterogeneous data and systems, and raises the accuracy achievable within a limited number of aggregation rounds. Based on the heterogeneous-data assumption of federated learning, a heterogeneous data partition process is designed, yielding a heterogeneous federated dataset built from a real image dataset as the experimental data. Using the open-source distributed machine learning framework Edge-TB as the experimental testbed and the heterogeneously partitioned CIFAR-10 as the dataset, experiments show that, with the new client selection strategy, the improved FedProx algorithm raises accuracy by 14.96% and reduces communication overhead by 6.3% compared with the original algorithm within a limited number of aggregation rounds. Compared with the SCAFFOLD algorithm, accuracy improves by 3.6%, communication overhead is reduced by 51.7%, and training time is reduced by 15.4%.
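The selection rule summarized above — score each client by the size of its FedProx proximal term, (μ/2)·‖w_local − w_global‖², and prefer the largest — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the coefficient `mu`, the function names, and the flat NumPy weight representation are all hypothetical.

```python
import numpy as np

def proximal_term(local_weights, global_weights, mu=0.01):
    """FedProx proximal term: (mu / 2) * ||w_local - w_global||^2,
    where the weights are given as lists of NumPy arrays (one per layer)."""
    diff = np.concatenate([(lw - gw).ravel()
                           for lw, gw in zip(local_weights, global_weights)])
    return 0.5 * mu * float(diff @ diff)

def select_clients(client_weights, global_weights, k, mu=0.01):
    """Return the ids of the k clients with the largest proximal terms,
    i.e. those whose local models drifted furthest from the global model."""
    scores = {cid: proximal_term(w, global_weights, mu)
              for cid, w in client_weights.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A large proximal term marks a client whose local update deviates most from the global model, which is exactly where data and system heterogeneity hurt convergence; prioritizing such clients is the intuition behind the proposed strategy.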

Key words: Distributed machine learning, Federated learning, Optimization algorithm, Regularization, Proximal term

CLC number: 

  • TP181
[1]MCMAHAN B,MOORE E,RAMAGE D,et al.Communication-efficient learning of deep networks from decentralized data[C]//Artificial Intelligence and Statistics.PMLR,2017:1273-1282.
[2]LI T,SAHU A K,TALWALKAR A,et al.Federated learning:Challenges,methods,and future directions[J].IEEE Signal Processing Magazine,2020,37(3):50-60.
[3]LI T,SAHU A K,ZAHEER M,et al.Federated optimization in heterogeneous networks[J].Proceedings of Machine Learning and Systems,2020,2:429-450.
[4]WANG J,LIU Q,LIANG H,et al.Tackling the objective inconsistency problem in heterogeneous federated optimization[J].Advances in Neural Information Processing Systems,2020,33:7611-7623.
[5]KARIMIREDDY S P,KALE S,MOHRI M,et al.Scaffold:Stochastic controlled averaging for federated learning[C]//International Conference on Machine Learning.PMLR,2020:5132-5143.
[6]XIE C,KOYEJO S,GUPTA I.Asynchronous federated optimization[J].arXiv:1903.03934,2019.
[7]NISHIO T,YONETANI R.Client selection for federated learning with heterogeneous resources in mobile edge[C]//2019 IEEE International Conference on Communications(ICC 2019).IEEE,2019:1-7.
[8]RIBERO M,VIKALO H.Communication-efficient federated learning via optimal client sampling[J].arXiv:2007.15197,2020.
[9]CHEN W,HORVATH S,RICHTARIK P.Optimal client sampling for federated learning[J].arXiv:2010.13723,2020.
[10]CHO Y J,WANG J,JOSHI G.Client selection in federated learning:Convergence analysis and power-of-choice selection strategies[J].arXiv:2010.01243,2020.
[11]LAI F,ZHU X,MADHYASTHA H V,et al.Oort:Efficient federated learning via guided participant selection[C]//OSDI.2021:19-35.
[12]FRABONI Y,VIDAL R,KAMENI L,et al.Clustered sampling:low-variance and improved representativity for clients selection in federated learning[C]//International Conference on Machine Learning.New York:PMLR,2021:3407-3416.
[13]WANG H,KAPLAN Z,NIU D,et al.Optimizing federated learning on non-iid data with reinforcement learning[C]//IEEE Conference on Computer Communications(INFOCOM 2020).IEEE,2020:1698-1707.
[14]CALDAS S,DUDDU S M K,WU P,et al.Leaf:A benchmark for federated settings[J].arXiv:1812.01097,2018.
[15]ZHAO Y,LI M,LAI L,et al.Federated learning with non-iid data[J].arXiv:1806.00582,2018.
[16]ZHU H,XU J,LIU S,et al.Federated learning on non-IID data:A survey[J].Neurocomputing,2021,465:371-390.
[17]LI Q,DIAO Y,CHEN Q,et al.Federated learning on non-iid data silos:An experimental study[C]//2022 IEEE 38th International Conference on Data Engineering(ICDE).IEEE,2022:965-978.
[18]YANG L,WEN F,CAO J,et al.EdgeTB:A hybrid testbed for distributed machine learning at the edge with high fidelity[J].IEEE Transactions on Parallel and Distributed Systems,2022,33(10):2540-2553.