Computer Science ›› 2023, Vol. 50 ›› Issue (11): 317-326. doi: 10.11896/jsjkx.221100224

• Computer Networks •


Efficient Distributed Training Framework for Federated Learning

FENG Chen, GU Jingjing   

  1. School of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China
  • Received: 2022-11-26 Revised: 2023-03-31 Online: 2023-11-15 Published: 2023-11-06
  • Corresponding author: GU Jingjing (gujingjing@nuaa.edu.cn)
  • About author: FENG Chen, born in 1998, postgraduate (fengchen_98@163.com). His main research interests include distributed machine learning and federated learning. GU Jingjing, born in 1986, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. Her main research interests include mobile computing and data mining.
  • Supported by:
    National Natural Science Foundation of China(62072235).


Abstract: Federated learning effectively addresses the problem of isolated data islands, but several challenges remain. First, the training nodes in federated learning exhibit large hardware heterogeneity, which affects both training speed and model performance. Existing research focuses mainly on federated optimization, but most methods do not solve the resource waste caused by the mismatched computing times of nodes under the synchronous communication mode. In addition, most training nodes in federated learning are mobile devices with poor network conditions, which leads to high communication overhead and severe network bottlenecks. Existing methods reduce the communication overhead by compressing the gradients uploaded by the training nodes, but this inevitably degrades model performance, making it difficult to achieve a good balance between quality and efficiency. To address these problems, in the computing stage, this paper proposes adaptive federated averaging (AFA), which adaptively coordinates the number of local iterations according to the hardware performance of each node, minimizing the overall idle time spent waiting for the global gradient download and improving the computational efficiency of federated learning. In the communication stage, it proposes double sparsification (DS), which minimizes the communication overhead by performing gradient sparsification on both the training nodes and the parameter server. In addition, each training node performs error compensation according to the dropped values of the local and global gradients, trading a small loss of model performance for a large reduction in communication overhead. Experimental results on an image classification dataset and a spatio-temporal prediction dataset show that the proposed method effectively improves the training speedup ratio and also yields some improvement in model performance.
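
The two components described above can be illustrated with short sketches. The first is a minimal Python sketch of the AFA idea, assuming a simple scheduling rule in which the server measures each node's time per local iteration and assigns proportionally more local steps to faster nodes so that all nodes finish a synchronous round at roughly the same wall-clock time; the function name and parameters are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of AFA-style local-step scheduling (illustrative names,
# not the paper's implementation). Each node reports its measured time per
# local iteration; the server assigns local step counts so that every node
# finishes local training at roughly the same time, shrinking the idle time
# spent waiting for the global gradient download.

def assign_local_steps(per_step_times, base_steps=10, min_steps=1):
    """per_step_times: measured seconds per local SGD step, one entry per node.
    base_steps: local steps given to the slowest node (an assumed baseline).
    Returns a list of local iteration counts, one per node."""
    round_budget = max(per_step_times) * base_steps   # target wall-clock time per round
    return [max(min_steps, int(round_budget / t)) for t in per_step_times]

# Example: three nodes, the first roughly 4x faster than the third.
print(assign_local_steps([0.05, 0.12, 0.20]))  # -> [40, 16, 10]
```

The second sketch illustrates double sparsification with error compensation, again under assumed details: top-k is used as the sparsifier, the training node keeps the values dropped from its upload and adds them back in the next round, and the downlink residual is kept on the server rather than reconstructed on the node from the lost global-gradient values as the paper describes. Class and function names are hypothetical.

```python
import numpy as np

def topk_sparsify(grad, ratio=0.01):
    """Keep only the largest-magnitude entries; return (sparse grad, dropped residual)."""
    k = max(1, int(grad.size * ratio))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse, grad - sparse

class Worker:
    """Training node: sparsifies its upload and keeps the dropped values for error feedback."""
    def __init__(self, dim):
        self.residual = np.zeros(dim)

    def upload(self, local_grad, ratio=0.01):
        sparse, self.residual = topk_sparsify(local_grad + self.residual, ratio)
        return sparse

class Server:
    """Parameter server: averages the uploads, then sparsifies again before broadcasting."""
    def __init__(self, dim):
        self.residual = np.zeros(dim)

    def broadcast(self, uploads, ratio=0.01):
        dense = np.mean(uploads, axis=0) + self.residual
        sparse, self.residual = topk_sparsify(dense, ratio)
        return sparse

# One simulated round: 3 workers, a 10 000-dimensional gradient, 1% density per hop.
dim = 10_000
workers = [Worker(dim) for _ in range(3)]
server = Server(dim)
uploads = [w.upload(np.random.randn(dim)) for w in workers]
global_update = server.broadcast(uploads)  # sparse global gradient sent back to every node
```

With 1% density on both the uplink and the downlink, the per-round gradient traffic in this sketch is cut by roughly two orders of magnitude in each direction, while the residuals ensure that dropped gradient information is re-injected in later rounds rather than permanently discarded.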

Key words: Federated learning, Distributed machine learning, Parallel computing, Parameter synchronization, Sparse representation

CLC Number: TP399