Computer Science, 2026, Vol. 53, Issue (1): 12-28. doi: 10.11896/jsjkx.250300030
• Research and Application of Large Language Model Technology •
LIU Lilong1, LIU Guoming2, QI Baoyuan3, DENG Xueshan4, XUE Dizhan4, QIAN Shengsheng4
• Related Articles •
[1] SHAO Xinyi, ZHU Jingwei, ZHANG Liang. LLM-based Business Process Adaptation Method to Respond Long-tailed Changes [J]. Computer Science, 2026, 53(1): 29-38.
[2] LI Maolin, LIN Jiajie, YANG Zhenguo. Confidence-guided Prompt Learning for Multimodal Aspect-level Sentiment Analysis [J]. Computer Science, 2025, 52(7): 241-247.
[3] CHEN Jinyin, XI Changkun, ZHENG Haibin, GAO Ming, ZHANG Tianxin. Survey of Security Research on Multimodal Large Language Models [J]. Computer Science, 2025, 52(7): 315-341.
[4] LI Bo, MO Xian. Application of Large Language Models in Recommendation System [J]. Computer Science, 2025, 52(6A): 240400097-7.
[5] HU Caishun. Study on Named Entity Recognition Algorithms in Audit Domain Based on Large Language Models [J]. Computer Science, 2025, 52(6A): 240700190-4.
[6] GU Huijie, FANG Wenchong, ZHOU Zhifeng, ZHU Wen, MA Guang, LI Yingchen. CSO-LSTM Based Power Prediction Method for New Energy Generation [J]. Computer Science, 2025, 52(6A): 240600053-11.
[7] GAO Hongkui, MA Ruixiang, BAO Qihao, XIA Shaojie, QU Chongxiao. Research on Hybrid Retrieval-augmented Dual-tower Model [J]. Computer Science, 2025, 52(6): 324-329.
[8] LI Hao, YANG Yumeng, ZHAO Boyang, ZHENG Puqi, LIN Hongfei. Adverse Drug Reaction Relationship Extraction Based on Chain of Thought Enhancement Under High and Low Resources [J]. Computer Science, 2025, 52(12): 224-230.
[9] XU Fuping, ZHOU Xiaohang, ZHANG Ning. Review of Impact of Personalized Recommendation Algorithms on User Decision-making Behavior [J]. Computer Science, 2025, 52(11A): 241100086-8.
[10] GUO Liwei, WU Yonghao, LIU Yong. Semantic Variations Based Defect Generation and Prediction Model Testing [J]. Computer Science, 2025, 52(11A): 241200059-7.
[11] HUANG Haixin, XU Chenglong, FU Yao. Research on Structured Pruning Algorithm Based on Information Fusion [J]. Computer Science, 2025, 52(11A): 241000041-6.
[12] PAN Jie, WANG Juan, WANG Nan. Large Language Models and Rumors: A Survey on Generation and Detection [J]. Computer Science, 2025, 52(11): 1-12.
[13] FANG Quan, ZHANG Jinlong, WANG Bingqian, HU Jun. Research on Domain Knowledge Question Answering via Large Language Models with Compositional Context Prompting [J]. Computer Science, 2025, 52(11): 13-21.
[14] ZHANG Haoran, HAO Wenning, JIN Dawei, CHENG Kai, ZHAI Ying. DF-RAG: A Retrieval-augmented Generation Method Based on Query Rewriting and Knowledge Selection [J]. Computer Science, 2025, 52(11): 30-39.
[15] ZHOU Yuchen, LI Peng, HAN Keji. Instruct-Malware: Control Flow Graph Based Large Language Model Analysis of Malware [J]. Computer Science, 2025, 52(11): 40-48.
|