Computer Science ›› 2023, Vol. 50 ›› Issue (11): 317-326. doi: 10.11896/jsjkx.221100224
FENG Chen, GU Jingjing
Abstract: Federated learning effectively addresses the data-silo problem, but several challenges remain. First, the training nodes in federated learning exhibit substantial hardware heterogeneity, which affects both training speed and model performance. Existing work focuses mainly on federated optimization, yet most methods do not address the resource waste caused by the difficulty of coordinating per-node computation time under the synchronous communication mode. In addition, most training nodes in federated learning are mobile devices with poor network conditions and high communication cost, which aggravates the network bottleneck. Prior methods reduce communication cost by compressing the gradients uploaded by training nodes, but this inevitably degrades model performance, making a good quality-efficiency trade-off hard to achieve. To address these problems, in the computation phase we propose Adaptive Federated Averaging (AFA), which adaptively coordinates the number of local training iterations according to each node's hardware performance, minimizing the total idle time spent waiting to download the global gradient and improving the computational efficiency of federated learning. In the communication phase, we propose Double Sparsification (DS), which sparsifies gradients on both the training-node side and the parameter-server side to minimize communication cost. Furthermore, each training node performs error compensation based on the values dropped from the local and global gradients, trading a small loss in model performance for a large reduction in communication cost. Experiments on image-classification and time-series-prediction datasets show that the proposed scheme effectively improves the training speedup of federated learning and also yields a measurable improvement in model performance.
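The computation-phase idea — matching each node's local-iteration count to its hardware speed so that all nodes finish a synchronous round at roughly the same time, instead of fast nodes idling for stragglers — can be sketched as follows. The function name, the per-step timing interface, and the simple proportional rule are illustrative assumptions; the paper's actual AFA scheduling may differ.

```python
# Hypothetical sketch of the AFA idea: under synchronous aggregation, give
# each node as many local SGD steps as fit into a shared per-round
# wall-clock budget, so faster hardware does more local work per round
# rather than waiting idle for the global gradient.

def adaptive_local_steps(per_step_times, budget):
    """Assign each node a local-iteration count proportional to its speed.

    per_step_times: mapping node -> measured seconds per local SGD step
    budget: target wall-clock seconds for the computation phase of a round
    """
    return {node: max(1, round(budget / t))
            for node, t in per_step_times.items()}
```

With a 1-second budget, a node taking 0.5 s per step runs 2 local iterations while one taking 0.02 s per step runs 50, so both finish the round at about the same time and the synchronization barrier wastes little compute.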
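On the communication side, the client half of DS — top-k gradient sparsification with error compensation, i.e. accumulating the coordinates dropped by compression and re-adding them before the next round's compression — can be sketched as below; running the same compressor again over the aggregated gradient on the parameter server before broadcast would give the "double" sparsification. Class and function names are illustrative, not the paper's API.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of a gradient; zero the rest."""
    flat = grad.ravel()
    if k >= flat.size:
        return grad.copy()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)

class ErrorFeedbackCompressor:
    """Top-k compression with error compensation: the residual dropped in
    one round is accumulated locally and added back to the gradient before
    the next compression, so discarded information is only delayed,
    not lost."""

    def __init__(self, k):
        self.k = k
        self.residual = None

    def compress(self, grad):
        if self.residual is None:
            self.residual = np.zeros_like(grad)
        corrected = grad + self.residual          # error compensation
        sparse = topk_sparsify(corrected, self.k)
        self.residual = corrected - sparse        # remember what was dropped
        return sparse
```

A training node would call `compress` on each local gradient before upload; since only k entries are nonzero, the upload can be encoded as (index, value) pairs, which is where the communication saving comes from.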