Computer Science ›› 2025, Vol. 52 ›› Issue (1): 42-55. doi: 10.11896/jsjkx.240500095

• Research and Application of Large Language Model Technology •

Survey on Transmission Optimization Technologies for Federated Large Language Model Training

DUN Jingbo, LI Zhuo

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science & Technology University), Beijing 100101, China
    2. School of Computer Science, Beijing Information Science & Technology University, Beijing 100101, China
  • Received: 2024-05-22  Revised: 2024-09-11  Online: 2025-01-15  Published: 2025-01-09
  • Corresponding author: LI Zhuo (lizhuo@bistu.edu.cn)
  • About author: DUN Jingbo (dunjingbo@163.com)
  • Supported by:
    Natural Science Foundation of Beijing, China (4232024) and National Key R&D Program of China (2022YFF0604502).

Survey on Transmission Optimization Technologies for Federated Large Language Model Training

DUN Jingbo, LI Zhuo   

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science & Technology University), Beijing 100101, China
    2. School of Computer Science, Beijing Information Science & Technology University, Beijing 100101, China
  • Received: 2024-05-22  Revised: 2024-09-11  Online: 2025-01-15  Published: 2025-01-09
  • About author: DUN Jingbo, born in 2001, postgraduate. Her main research interests include federated large language models.
    LI Zhuo, born in 1983, Ph.D, professor, is a senior member of CCF (No.29832S). His main research interests include edge computing, distributed machine learning and mobile wireless networks.
  • Supported by:
    Natural Science Foundation of Beijing, China (4232024) and National Key R&D Program of China (2022YFF0604502).

Abstract: With the rapid development of artificial intelligence technology, large language models of many kinds keep emerging. However, the users and datasets of dedicated large language models mostly have privacy and security requirements, so data security and privacy issues urgently need to be addressed. Against this background, federated large language models have emerged and are receiving growing attention. Because of the enormous data volume of large language models and the distributed architecture of federated learning, the massive model exchanges between the many participating nodes and the cloud server incur high communication costs. To improve the model convergence rate, researchers have studied transmission optimization techniques for federated large language model training. This paper analyzes the challenges faced by federated large language models; surveys transmission optimization methods based on model fine-tuning, on model compression, and on distributed parallel processing; introduces existing open-source federated large language models and the transmission optimization techniques they employ; and discusses future research directions.
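To make the communication bottleneck described above concrete, the following back-of-the-envelope sketch (our own illustration, not material from the surveyed works) compares the uplink volume a single client would generate per round when it exchanges the full model with the cloud server versus when it transmits only LoRA adapter matrices, in the spirit of the fine-tuning-based transmission optimization methods reviewed later. The model size, layer count, hidden dimension, and LoRA rank are assumed values chosen only for illustration.

def full_model_bytes(n_params: float, bytes_per_param: int = 2) -> float:
    """Uplink volume if the entire model is exchanged (fp16 weights assumed)."""
    return n_params * bytes_per_param

def lora_bytes(n_layers: int, d_model: int, rank: int,
               adapted_matrices_per_layer: int = 2, bytes_per_param: int = 2) -> float:
    """Uplink volume if only the LoRA factors A (r x d) and B (d x r) are exchanged."""
    params_per_adapted_matrix = 2 * d_model * rank      # A and B together
    return n_layers * adapted_matrices_per_layer * params_per_adapted_matrix * bytes_per_param

if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical 7B-parameter decoder: 32 layers, hidden size 4096, LoRA rank 8,
    # with two attention projections adapted per layer.
    full = full_model_bytes(7e9)
    peft = lora_bytes(n_layers=32, d_model=4096, rank=8)
    print(f"full-model exchange: {full / GIB:8.2f} GiB per client per round")
    print(f"LoRA-only exchange : {peft / GIB:8.4f} GiB per client per round")
    print(f"reduction factor   : {full / peft:8.0f}x")

Under these assumptions, the adapter-only exchange shrinks per-round uplink traffic by roughly three orders of magnitude, which is why parameter-efficient fine-tuning is a central lever for transmission optimization in federated LLM training.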

Key words: Federated learning, Large language models, Transmission optimization, Communication overhead, Model compression

Abstract: With the rapid development of artificial intelligence technology, various types of large language models are emerging. However, most users and datasets involved in dedicated large language models have privacy and security requirements, so data security and privacy issues need to be solved urgently. Against this background, federated large language models have emerged and are gaining more and more attention. Due to the huge data volume of large language models and the distributed architecture of federated learning, the large number of model exchanges between the many participating nodes and the cloud server results in high communication costs. To improve the model convergence rate, researchers have investigated transmission optimization techniques for federated large language model training. This paper analyzes the challenges facing federated large language models; reviews transmission optimization methods based on model fine-tuning, model compression, and distributed parallel processing; introduces existing open-source federated large language models and the transmission optimization techniques they adopt; and gives an outlook on future research directions.
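As a complementary illustration of the compression-based methods mentioned above, the sketch below (a simplified example under our own assumptions, not an implementation from any cited system) shows one common recipe: each client keeps only the top-k entries of its model update and quantizes them to 8 bits before upload, and the server dequantizes and averages the sparse payloads in a FedAvg-style step. The tensor sizes, keep ratio, and helper names (compress_update, decompress_update) are hypothetical.

import numpy as np

def compress_update(update: np.ndarray, keep_ratio: float = 0.01):
    """Client side: keep the largest-magnitude entries, then quantize them to int8."""
    flat = update.ravel()
    k = max(1, int(keep_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]        # positions of the top-k entries
    vals = flat[idx]
    scale = float(np.abs(vals).max()) / 127.0           # symmetric linear quantization step
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(vals / scale), -127, 127).astype(np.int8)
    return idx.astype(np.uint32), q, np.float32(scale), update.shape

def decompress_update(idx, q, scale, shape):
    """Server side: rebuild a dense (mostly zero) update from the sparse payload."""
    dense = np.zeros(int(np.prod(shape)), dtype=np.float32)
    dense[idx] = q.astype(np.float32) * scale
    return dense.reshape(shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four hypothetical clients, each holding a 1024x1024 update of one weight matrix.
    client_updates = [rng.normal(size=(1024, 1024)).astype(np.float32) for _ in range(4)]
    payloads = [compress_update(u) for u in client_updates]
    aggregated = np.mean([decompress_update(*p) for p in payloads], axis=0)  # FedAvg-style mean
    raw = client_updates[0].nbytes
    sent = payloads[0][0].nbytes + payloads[0][1].nbytes + 4   # indices + int8 values + scale
    print(f"uplink per client: {sent / 1e6:.2f} MB compressed vs {raw / 1e6:.2f} MB raw")
    print(f"aggregated update shape: {aggregated.shape}")

In practice the keep ratio and bit width trade communication savings against convergence speed, which is exactly the tension that the compression-oriented techniques surveyed in the paper aim to balance.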

Key words: Federated learning, Large language models, Transmission optimization, Communication overhead, Model compression

CLC Number:

  • TP393