Computer Science ›› 2023, Vol. 50 ›› Issue (11): 317-326. doi: 10.11896/jsjkx.221100224

• Computer Network •

Efficient Distributed Training Framework for Federated Learning

FENG Chen, GU Jingjing   

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Received: 2022-11-26  Revised: 2023-03-31  Online: 2023-11-15  Published: 2023-11-06
  • About author: FENG Chen, born in 1998, postgraduate. His main research interests include distributed machine learning and federated learning. GU Jingjing, born in 1986, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. Her main research interests include mobile computing and data mining.
  • Supported by:
    National Natural Science Foundation of China (62072235).

Abstract: Federated learning effectively solves the problem of data silos, but several challenges remain. First, the training nodes in federated learning are highly heterogeneous in hardware, which affects both training speed and model performance. Existing research focuses mainly on federated optimization, but most methods do not address the resource waste caused by the differing computation times of nodes under the synchronous communication mode. In addition, most training nodes in federated learning are mobile devices, so poor network conditions lead to high communication overhead and severe network bottlenecks. Existing methods reduce communication overhead by compressing the gradients uploaded by the training nodes, but this inevitably degrades model performance, and it is difficult to strike a good balance between quality and speed. To address these problems, in the computation stage this paper proposes adaptive federated averaging (AFA), which adaptively coordinates the local iterations according to the hardware performance of each node, minimizing the idle time spent waiting for the global gradient download and improving the computational efficiency of federated learning. In the communication stage, it proposes double sparsification (DS), which minimizes communication overhead through gradient sparsification on both the training nodes and the parameter server. Furthermore, each training node compensates for the error according to the values dropped from the local gradient and the global gradient, greatly reducing the communication cost at the price of only a small loss in model performance. Experimental results on an image classification dataset and a spatio-temporal prediction dataset show that the proposed method effectively improves the training speedup ratio and also benefits model performance.
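
The AFA idea described above can be illustrated with a short sketch: each node scales its number of local iterations to its measured hardware speed, so that all nodes finish a synchronous round at roughly the same wall-clock time. This is a minimal illustration of the general technique under stated assumptions, not the paper's exact algorithm; all function and parameter names (measure_speed, adaptive_local_iterations, probe_steps) are illustrative.

import time

def measure_speed(model, batch, local_step, probe_steps=5):
    # Estimate a node's compute speed as local SGD steps per second
    # by timing a few probe iterations.
    start = time.perf_counter()
    for _ in range(probe_steps):
        local_step(model, batch)
    return probe_steps / (time.perf_counter() - start)

def adaptive_local_iterations(node_speed, slowest_speed, base_iters):
    # Scale the per-round local iteration count by relative hardware
    # speed: the slowest node runs base_iters steps, faster nodes run
    # proportionally more, so no node idles waiting for aggregation.
    return max(1, round(base_iters * node_speed / slowest_speed))

With this scaling, every node's round takes approximately the same wall-clock time, which shrinks the idle time before each global gradient download.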

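Similarly, the double sparsification (DS) stage can be sketched as top-k sparsification applied twice: once on each training node before upload, and once on the parameter server before broadcast, with the node accumulating the values it dropped as an error-compensation residual. This is a simplified reading of the abstract, assuming top-k magnitude selection; names such as topk_sparsify, DSWorker, and server_broadcast are hypothetical, and the paper's exact compensation rule (which also folds in the values lost from the global gradient) may differ.

import torch

def topk_sparsify(grad, ratio):
    # Keep only the largest-magnitude fraction `ratio` of entries;
    # return the sparse gradient and the residual of dropped values.
    k = max(1, int(grad.numel() * ratio))
    flat = grad.flatten()
    idx = flat.abs().topk(k).indices
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    sparse = sparse.view_as(grad)
    return sparse, grad - sparse

class DSWorker:
    # One training node's side of a DS round (hypothetical helper;
    # only the locally dropped values are tracked here for brevity).
    def __init__(self, ratio):
        self.ratio = ratio
        self.residual = None  # accumulated dropped values

    def upload(self, local_grad):
        # Error compensation: fold previously dropped values back in
        # before sparsifying the gradient for upload.
        if self.residual is None:
            self.residual = torch.zeros_like(local_grad)
        sparse, self.residual = topk_sparsify(local_grad + self.residual,
                                              self.ratio)
        return sparse

def server_broadcast(uploaded_grads, ratio):
    # Second sparsification stage: the parameter server averages the
    # uploaded sparse gradients and sparsifies again before broadcast.
    avg = torch.stack(uploaded_grads).mean(dim=0)
    global_sparse, _ = topk_sparsify(avg, ratio)
    return global_sparse

Because sparsification is applied in both directions, upload and download traffic both shrink, while the residual feedback limits the accumulated error that plain top-k dropping would otherwise introduce.
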
Key words: Federated learning, Distributed machine learning, Parallel computing, Parameter synchronization, Sparse representation

CLC Number: TP399