Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 241000041-6. DOI: 10.11896/jsjkx.241000041
黄海新1, 徐成龙1, 付垚2
HUANG Haixin1, XU Chenglong1, FU Yao2
Abstract: To address the problems of high perplexity (PPL) in zero-shot evaluation, low text-generation accuracy, and slow inference exhibited by existing large language models after pruning, a pruning metric algorithm centered on jointly combining loss and magnitude (Loss And Magnitude, LAM) is proposed. During weight-importance estimation, loss-function information and weight-activation information are fused; the LAM algorithm removes the limitation caused by omitting the second-order derivative, for the sake of computational efficiency, in the Taylor expansion of the gradient information used to evaluate weight importance, thereby improving the accuracy and robustness of the pruning process and enhancing the generality of the pruning algorithm. When building coupled structures, a unidirectional coupling structure is proposed: neurons in the multi-layer perceptron (MLP) of a Transformer block are selected as the initial triggers, and the coupled structure is built by activating neurons only in the direction of the attention layer and the query, key, and value projection layers, which reduces the number of parameters required to identify coupled-structure groups and increases pruning speed and throughput. Zero-shot experiments on the WikiText2 and PTB datasets show that, at a pruning ratio of 25%, pruning LLaMA-7B yields PPL scores of 20.24 and 36.05 respectively, significantly lower than those of other pruning algorithms; the PPL scores of the pruned Vicuna-7B are 21.24 and 85.81, also better than those of other pruning algorithms, demonstrating higher generality and accuracy.
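To make the loss-and-magnitude idea concrete, the following is a minimal illustrative sketch of how an importance score of this kind can be computed for one linear layer; the first-order Taylor loss term, the per-input-channel activation norm, and the fusion coefficient alpha are assumptions for illustration, not the exact LAM formulation.

import torch

def lam_style_importance(weight, grad, activation_norm, alpha=0.5):
    """Illustrative loss-and-magnitude importance score (assumed form, not the exact LAM formula).

    weight          -- (out_features, in_features) weight matrix of a linear layer
    grad            -- gradient of the calibration loss with respect to `weight`
    activation_norm -- (in_features,) per-input-channel activation norm collected on calibration data
    alpha           -- assumed coefficient fusing the loss term and the magnitude term
    """
    # First-order Taylor term: estimated change in the loss if the weight is zeroed out.
    loss_term = (weight * grad).abs()
    # Magnitude term: weight magnitude scaled by the activation statistics of its input channel.
    magnitude_term = weight.abs() * activation_norm.unsqueeze(0)
    # Fuse the two sources of importance information into one score per weight.
    return alpha * loss_term + (1.0 - alpha) * magnitude_term

# Per-neuron (row) importance for structured pruning: sum the scores over each output channel.
# neuron_scores = lam_style_importance(W, W_grad, act_norm).sum(dim=1)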
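The unidirectional-coupling idea can likewise be sketched as a dependency walk that starts from trigger neurons in the MLP and follows forward edges only, toward the attention query/key/value projections; the layer names, the forward_deps map, and the assumption that coupled layers share the same channel indices are illustrative and not the paper's implementation.

def collect_coupled_group(trigger_layer, trigger_channels, forward_deps):
    """Collect {layer: channel indices} that must be pruned together, walking forward edges only."""
    group = {trigger_layer: set(trigger_channels)}
    frontier = [trigger_layer]
    while frontier:
        layer = frontier.pop()
        for nxt in forward_deps.get(layer, []):   # follow forward dependencies only
            if nxt not in group:                  # visit each coupled layer once
                group[nxt] = set(trigger_channels)
                frontier.append(nxt)
    return group

# Example with assumed layer names in the LLaMA/Hugging Face convention:
deps = {"mlp": ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"]}
print(collect_coupled_group("mlp", [3, 17, 42], deps))

Restricting the walk to forward edges avoids the bidirectional traversal used by general dependency-graph pruning, which is where the claimed reduction in the parameters needed to identify coupled-structure groups comes from.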