Computer Science ›› 2025, Vol. 52 ›› Issue (1): 42-55.doi: 10.11896/jsjkx.240500095

• Technology Research and Application of Large Language Model •

Survey on Transmission Optimization Technologies for Federated Large Language Model Training

DUN Jingbo, LI Zhuo   

  1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science & Technology University), Beijing 100101, China
  2. School of Computer Science, Beijing Information Science & Technology University, Beijing 100101, China
  • Received:2024-05-22 Revised:2024-09-11 Online:2025-01-15 Published:2025-01-09
  • About author: DUN Jingbo, born in 2001, postgraduate. Her main research interests include federated large language models and related topics.
    LI Zhuo, born in 1983, Ph.D, professor, is a senior member of CCF (No.29832S). His main research interests include edge computing, distributed machine learning and mobile wireless networks.
  • Supported by:
    Natural Science Foundation of Beijing,China(4232024) and National Key R&D Program of China(2022YFF0604502).

Abstract: With the rapid development of artificial intelligence, large language models of many kinds are emerging. However, the users and datasets involved in training domain-specific large language models typically have strict privacy and security requirements, so data security and privacy protection have become urgent problems, and federated large language models have emerged and attracted increasing attention. Because of the huge size of large language models and the distributed architecture of federated learning, the frequent model exchanges between numerous participating nodes and the cloud server result in high communication costs. To improve the model convergence rate, researchers have studied transmission optimization techniques for federated large language model training. This paper analyzes the challenges of federated large language models and reviews transmission optimization methods based on model fine-tuning, transmission optimization methods based on model structure compression, and transmission optimization based on distributed parallel processing. It then introduces existing open-source federated large language models and the transmission optimization techniques they adopt, and concludes with an outlook on future research directions.
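To make the communication-cost issue concrete, the following is a minimal, illustrative Python sketch of the parameter-efficient fine-tuning idea surveyed here: each client keeps the pretrained weights frozen locally, trains only LoRA-style low-rank adapter factors, and uploads just those factors for server-side averaging, so per-round upload traffic scales with r(d_in + d_out) instead of d_in × d_out. The names (Client, fedavg_adapters) and the sizes are hypothetical and do not come from any specific surveyed system.

```python
# Minimal, illustrative sketch (assumed names and sizes, not from any surveyed
# system): LoRA-style parameter-efficient fine-tuning in a federated setting.
# Each client keeps the pretrained weight matrix W frozen and uploads only the
# low-rank adapter factors A and B, so per-round upload traffic scales with
# r*(d_in + d_out) instead of d_in*d_out.
import numpy as np

D_IN, D_OUT, RANK = 4096, 4096, 8   # one weight matrix of the model, adapter rank r

class Client:
    def __init__(self, seed: int):
        rng = np.random.default_rng(seed)
        # Frozen pretrained weight: kept locally, never re-transmitted.
        self.W = rng.standard_normal((D_OUT, D_IN)).astype(np.float32)
        # Trainable LoRA factors: the only tensors exchanged each round.
        self.A = np.zeros((D_OUT, RANK), dtype=np.float32)
        self.B = (0.01 * rng.standard_normal((RANK, D_IN))).astype(np.float32)

    def local_update(self):
        # Placeholder for local training of (A, B); a real client would run
        # several steps of SGD on its private data here.
        self.A += 0.01
        self.B *= 0.999
        return self.A, self.B

def fedavg_adapters(updates):
    """FedAvg applied only to the adapter factors uploaded by the clients."""
    A_avg = np.mean([a for a, _ in updates], axis=0)
    B_avg = np.mean([b for _, b in updates], axis=0)
    return A_avg, B_avg

clients = [Client(seed) for seed in range(4)]
A_global, B_global = fedavg_adapters([c.local_update() for c in clients])

full_bytes = D_IN * D_OUT * 4              # uploading the full fp32 matrix
adapter_bytes = RANK * (D_IN + D_OUT) * 4  # uploading only A and B in fp32
print(f"full-model upload  : {full_bytes / 1e6:6.1f} MB per client per round")
print(f"adapter-only upload: {adapter_bytes / 1e6:6.3f} MB per client per round")
```

Under these illustrative sizes, the per-round upload drops from about 67 MB to about 0.26 MB for a single weight matrix; the surveyed adapter-, prompt- and LoRA-based methods exploit exactly this asymmetry, at the cost of restricting which parameters can be updated.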

Key words: Federated learning, Large language models, Transmission optimization, Communication overhead, Model compression

CLC Number: TP393
[1]ZHAO W X,ZHOU K,LI J,et al.A survey of large language models[J].arXiv:2303.18223,2023.
[2]MCMAHAN B,MOORE E,RAMAGE D,et al.Communication-efficient learning of deep networks from decentralized data[C]//Artificial Intelligence and Statistics.PMLR,2017:1273-1282.
[3]DEVLIN J,CHANG M W,LEE K,et al.Bert:Pre-training of deep bidirectional transformers for language understanding[J].arXiv:1810.04805,2018.
[4]BEBIS G,GEORGIOPOULOS M.Feed-forward neural networks[J].IEEE Potentials,1994,13(4):27-31.
[5]LUO J H,WU J.Neural network pruning with residual-connections and limited-data[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2020:1458-1467.
[6]NAZIR A,WANG Z.A Comprehensive Survey of ChatGPT:Advancements,Applications,Prospects,and Challenges[J].Meta-radiology,2023,1(2):100022.
[7]RADFORD A,WU J,CHILD R,et al.Language models are unsupervised multitask learners[J].OpenAI Blog,2019,1(8):9-33.
[8]BROWN T,MANN B,RYDER N,et al.Language models are few-shot learners[J].Advances in Neural Information Processing Systems,2020,33:1877-1901.
[9]ACHIAM J,ADLER S,AGARWAL S,et al.Gpt-4 technical report[J].arXiv:2303.08774,2023.
[10]ERHAN D,BENGIO Y,COURVILLE A,et al.Why does unsupervised pre-training help deep learning?[C]//Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics.2010:201-208.
[11]SHAHID O,POURIYEH S,PARIZI R M,et al.Communication efficiency in federated learning:Achievements and challenges[J].arXiv:2107.10996,2021.
[12]DRIESS D,XIA F,SAJJADI M S M,et al.Palm-e:An embodied multimodal language model[J].arXiv:2303.03378,2023.
[13]SUN Y,WANG S,FENG S,et al.Ernie 3.0:Large-scale knowledge enhanced pre-training for language understanding and generation[J].arXiv:2107.02137,2021.
[14]CHEN M,SHLEZINGER N,POOR H V,et al.Communication-efficient federated learning[J].Proceedings of the National Academy of Sciences,2021,118(17):e2024789118.
[15]RAJBHANDARI S,RASLEY J,RUWASE O,et al.Zero:Memory optimizations toward training trillion parameter models[C]//SC20:International Conference for High Performance Computing,Networking,Storage and Analysis.IEEE,2020:1-16.
[16]VM K,WARRIER H,GUPTA Y.Fine Tuning LLM for Enterprise:Practical Guidelines and Recommendations[J].arXiv:2404.10779,2024.
[17]CHEN C,FENG X,ZHOU J,et al.Federated large language model:A position paper[J].arXiv:2307.08925,2023.
[18]WANG J,LIU Q,LIANG H,et al.A novel framework for the analysis and design of heterogeneous federated learning[J].IEEE Transactions on Signal Processing,2021,69:5234-5249.
[19]HOULSBY N,GIURGIU A,JASTRZEBSKI S,et al.Parameter-efficient transfer learning for NLP[C]//International Conference on Machine Learning.PMLR,2019:2790-2799.
[20]HE R,LIU L,YE H,et al.On the effectiveness of adapter-based tuning for pretrained language model adaptation[J].arXiv:2106.03164,2021.
[21]LI X L,LIANG P.Prefix-tuning:Optimizing continuous prompts for generation[J].arXiv:2101.00190,2021.
[22]LESTER B,AL-RFOU R,CONSTANT N.The power of scale for parameter-efficient prompt tuning[J].arXiv:2104.08691,2021.
[23]LIU X,JI K,FU Y,et al.P-tuning:Prompt tuning can be comparable to fine-tuning across scales and tasks[C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics(Volume 2:Short Papers).2022:61-68.
[24]HU E J,SHEN Y,WALLIS P,et al.Lora:Low-rank adaptation of large language models[J].arXiv:2106.09685,2021.
[25]LIN B Y,HE C,ZENG Z,et al.Fednlp:Benchmarking federated learning methods for natural language processing tasks[J].arXiv:2104.08815,2021.
[26]CAI D,WU Y,WANG S,et al.FedAdapter:Efficient Federated Learning for Modern NLP[J].arXiv:2205.10162,2022.
[27]BENGIO Y,LOURADOUR J,COLLOBERT R,et al.Curriculum learning[C]//Proceedings of the 26th Annual International Conference on Machine Learning.2009:41-48.
[28]KIM G,YOO J,KANG S.Efficient federated learning with pre-trained large language model using several adapter mechanisms[J].Mathematics,2023,11(21):4479.
[29]SUN G,MENDIETA M,YANG T,et al.Exploring parameter-efficient fine-tuning for improving communication efficiency in federated learning[J].arXiv:2210.01708,2024.
[30]ZHAO H,DU W,LI F,et al.FedPrompt:Communication-Efficient and Privacy-Preserving Prompt Tuning in Federated Learning[C]//ICASSP 2023-2023 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP).IEEE,2023:1-5.
[31]YANG F E,WANG C Y,WANG Y C F.Efficient model personalization in federated learning via client-specific prompt generation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision.2023:19159-19168.
[32]CHE T,LIU J,ZHOU Y,et al.Federated learning of large language models with parameter-efficient prompt tuning and adaptive optimization[J].arXiv:2310.15080,2023.
[33]YI L,YU H,WANG G,et al.Fedlora:Model-heterogeneous personalized federated learning with lora tuning[J].arXiv:2310.13283,2023.
[34]JIANG F,DONG L,TU S,et al.Personalized wireless federated learning for large language models[J].arXiv:2404.13238,2024.
[35]JIANG J,LIU X,FAN C.Low-parameter federated learning with large language models[J].arXiv:2307.13896,2023.
[36]BABAKNIYA S,ELKORDY A R,EZZELDIN Y H,et al.SLoRA:Federated parameter efficient fine-tuning of language models[J].arXiv:2308.06522,2023.
[37]RAJE A.Communication-Efficient LLM Training for Federated Learning[D].Pittsburgh:Carnegie Mellon University,2024.
[38]HUANG W,WANG Y,CHENG A,et al.A Fast,Performant,Secure Distributed Training Framework For LLM[C]//ICASSP 2024-2024 IEEE International Conference on Acoustics,Speech and Signal Processing(ICASSP).IEEE,2024:4800-4804.
[39]ZHUANG W,CHEN C,LYU L.When foundation model meets federated learning:Motivations,challenges,and future directions[J].arXiv:2306.15546,2023.
[40]REED R.Pruning algorithms-a survey[J].IEEE Transactions on Neural Networks,1993,4(5):740-747.
[41]HAN S,POOL J,TRAN J,et al.Learning both weights and connections for efficient neural network[J].arXiv:1506.02626,2015.
[42]FRANTAR E,ALISTARH D.Sparsegpt:Massive language models can be accurately pruned in one-shot[C]//International Conference on Machine Learning.PMLR,2023:10323-10337.
[43]LI H,KADAV A,DURDANOVIC I,et al.Pruning Filters for Efficient ConvNets[J].arXiv:1608.08710,2016.
[44]JIANG Y,WANG S,VALLS V,et al.Model pruning enables efficient federated learning on edge devices[J].IEEE Transactions on Neural Networks and Learning Systems,2023,34(12):10374-10386.
[45]HUANG H,ZHANG L,SUN C,et al.Distributed pruning towards tiny neural networks in federated learning[C]//2023 IEEE 43rd International Conference on Distributed Computing Systems(ICDCS).IEEE,2023:190-201.
[46]MA X,FANG G,WANG X.Llm-pruner:On the structural pruning of large language models[J].Advances in Neural Information Processing Systems,2023,36:21702-21720.
[47]FRANTAR E,ALISTARH D.Sparsegpt:Massive language models can be accurately pruned in one-shot[C]//International Conference on Machine Learning.PMLR,2023:10323-10337.
[48]SUN M,LIU Z,BAIR A,et al.A Simple and Effective Pruning Approach for Large Language Models[J].arXiv:2306.11695,2023.
[49]HINTON G,VINYALS O,DEAN J.Distilling the knowledge in a neural network[J].arXiv:1503.02531,2015.
[50]GOU J,YU B,MAYBANK S J,et al.Knowledge distillation:A survey[J].International Journal of Computer Vision,2021,129(6):1789-1819.
[51]ANIL R,PEREYRA G,PASSOS A,et al.Large scale distributed neural network training through online distillation[J].arXiv:1804.03235,2018.
[52]WU C,WU F,LYU L,et al.Communication-efficient federated learning via knowledge distillation[J].Nature Communications,2022,13(1):2032.
[53]PENG Z,FAN X,CHEN Y,et al.FedPFT:Federated Proxy Fine-Tuning of Foundation Models[J].arXiv:2404.11536,2024.
[54]WU F J,LI Z T,LI Y L,et al.FedBiOT:LLM Local Fine-tuning in Federated Learning without Full Model[J].arXiv:2406.17706,2024.
[55]HAN S,MAO H,DALLY W J.Deep compression:Compressing deep neural networks with pruning,trained quantization and huffman coding[J].arXiv:1510.00149,2015.
[56]KIRTAS M,OIKONOMOU A,PASSALIS N,et al.Quantization-aware training for low precision photonic neural networks[J].Neural Networks,2022,155:561-573.
[57]LIU Z,OGUZ B,ZHAO C,et al.LLM-QAT:Data-Free Quantization Aware Training for Large Language Models[J].arXiv:2305.17888,2023.
[58]REISIZADEH A,MOKHTARI A,HASSANI H,et al.Fedpaq:A communication-efficient federated learning method with periodic averaging and quantization[C]//International Conference on Artificial Intelligence and Statistics.PMLR,2020:2021-2031.
[59]CHEN Y,CHEN Z,WU P,et al.FedOBD:Opportunistic block dropout for efficiently training large-scale neural networks through federated learning[J].arXiv:2208.05174,2022.
[60]KIM J,LEE J H,KIM S,et al.Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization[J].Advances in Neural Information Processing Systems,2024,36.
[61]DETTMERS T,PAGNONI A,HOLTZMAN A,et al.Qlora:Efficient finetuning of quantized llms[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems.2024:36187-36207.
[62]DETTMERS T,LEWIS M,BELKADA Y,et al.Gpt3.int8():8-bit matrix multiplication for transformers at scale[J].Advances in Neural Information Processing Systems,2022,35:30318-30332.
[63]LIN J,TANG J,TANG H,et al.AWQ:Activation-aware Weight Quantization for LLM Compression and Acceleration[J].arXiv:2306.00978,2023.
[64]BONDARENKO Y,NAGEL M,BLANKEVOORT T.Understanding and overcoming the challenges of efficient transformer quantization[J].arXiv:2109.12948,2021.
[65]WEN Z,YIN W,ZHANG Y.Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm[J].Mathematical Programming Computation,2012,4(4):333-361.
[66]JADERBERG M,VEDALDI A,ZISSERMAN A.Speeding up convolutional neural networks with low rank expansions[J].arXiv:1405.3866,2014.
[67]LEBEDEV V,GANIN Y,RAKHUBA M,et al.Speeding-up convolutional neural networks using fine-tuned cp-decomposition[J].arXiv:1412.6553,2014.
[68]WU X,YAO Z,HE Y.Zeroquant-fp:A leap forward in llms post-training w4a8 quantization using floating-point formats[J].arXiv:2307.09782,2023.
[69]ZHANG M,SHEN C,YANG Z,et al.Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning[J].arXiv:2305.18403,2023.
[70]XU M,CAI D,WU Y,et al.Fwdllm:Efficient fedllm using forward gradient[J].arXiv:2308.13894,2023.
[71]QIU Q,CHENG X,SAPIRO G.DCFNet:Deep neural network with decomposed convolutional filters[C]//International Conference on Machine Learning.PMLR,2018:4198-4207.
[72]NARAYANAN D,SHOEYBI M,CASPER J,et al.Efficient large-scale language model training on gpu clusters using megatron-lm[C]//Proceedings of the International Conference for High Performance Computing,Networking,Storage and Analysis.2021:1-15.
[73]JIA Z,ZAHARIA M,AIKEN A.Beyond Data and Model Parallelism for Deep Neural Networks[J].Proceedings of Machine Learning and Systems,2019,1:1-13.
[74]ANDROUTSOPOULOS K,CLARK D,HARMAN M,et al.State-based model slicing:A survey[J].ACM Computing Surveys(CSUR),2013,45(4):1-36.
[75]SU N,HU C,LI B,et al.TITANIC:Towards Production Federated Learning with Large Language Models[C]//IEEE INFOCOM.2024.
[76]SHOEYBI M,PATWARY M,PURI R,et al.Megatron-lm:Training multi-billion parameter language models using model parallelism[J].arXiv:1909.08053,2019.
[77]ZHU J,LI S,YOU Y.Sky Computing:Accelerating Geo-distributed Computing in Federated Learning[J].arXiv:2202.11836,2022.
[78]NAGRECHA K.Systems for parallel and distributed large-model deep learning training[J].arXiv:2301.02691,2023.
[79]LI S,ZHAO Y,VARMA R,et al.Pytorch distributed:Experiences on accelerating data parallel training[J].arXiv:2006.15704,2020.
[80]HUANG Y,CHENG Y,BAPNA A,et al.Gpipe:Efficient training of giant neural networks using pipeline parallelism[J].Advances in Neural Information Processing Systems,2019,32(10):103-112.
[81]HARLAP A,NARAYANAN D,PHANISHAYEE A,et al.Pipedream:Fast and efficient pipeline parallel dnn training[J].arXiv:1806.03377,2018.
[82]HE C,LI S,SO J,et al.Fedml:A research library and benchmark for federated machine learning[J].arXiv:2007.13518,2020.
[83]FAN T,KANG Y,MA G,et al.FATE-LLM:A Industrial Grade Federated Learning Framework for Large Language Models[J].arXiv:2310.10049,2023.
[84]KUANG W,QIAN B,LI Z,et al.Federatedscope-llm:A comprehensive package for fine-tuning large language models in federated learning[J].arXiv:2309.00363,2023.
[85]YE R,WANG W,CHAI J,et al.OpenFedLLM:Training Large Language Models on Decentralized Private Data via Federated Learning[J].arXiv:2402.06954,2024.
[86]YE R,GE R,ZHU X,et al.FedLLM-Bench:Realistic Benchmarks for Federated Learning of Large Language Models[J].arXiv:2406.04845,2024.
[87]XIA Q,YE W,TAO Z,et al.A survey of federated learning for edge computing:Research problems and solutions[J].High-Confidence Computing,2021,1(1):100008.
[88]ZOU W,LIU X,HOU S,et al.Affinity-Based Resource andTask Allocation in Edge Computing Systems[C]//2023 IEEE 22nd International Conference on Trust,Security and Privacy in Computing and Communications(TrustCom).IEEE,2023.
[89]LIU Z,HUANG T,LI B,et al.Epnet++:Cascade bi-directional fusion for multi-modal 3d object detection[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2022,45(7):8324-8341.
[90]JI W,WEI Y,ZHENG Z,et al.Deep multimodal learning for information retrieval[C]//Proceedings of the 31st ACM International Conference on Multimedia.2023:9739-9741.
[91]LIU F,ZHANG T,DAI W,et al.Few-shot Adaptation of Multi-modal Foundation Models:A Survey[J].arXiv:2401.01736,2024.
[92]FARAHANI A,VOGHOEI S,RASHEED K,et al.A brief review of domain adaptation[J].arXiv:2010.03978,2021.
[93]PENG L,LUO G,ZHOU S,et al.An in-depth evaluation of federated learning on biomedical natural language processing for information extraction[J].NPJ Digital Medicine,2024,7(1):127.
[94]CHEN X,SHI Q,YANG L,et al.ThriftyEdge:Resource-efficient edge computing for intelligent IoT applications[J].IEEE Network,2018,32(1):61-65.
[95]NGUYEN D C,DING M,PATHIRANA P N,et al.Federated learning for internet of things:A comprehensive survey[J].IEEE Communications Surveys & Tutorials,2021,23(3):1622-1658.