Computer Science ›› 2025, Vol. 52 ›› Issue (11A): 241000041-6. doi: 10.11896/jsjkx.241000041

• Artificial Intelligence •

Research on Structured Pruning Algorithm Based on Information Fusion

HUANG Haixin1, XU Chenglong1, FU Yao2   

  1 School of Automation and Electrical Engineering,Shenyang Ligong University,Shenyang 110159,China
    2 School of Information Science and Engineering,Shenyang Ligong University,Shenyang 110159,China
  • Online:2025-11-15 Published:2025-11-10

Abstract: Existing pruning algorithms for large language models leave the pruned model with high perplexity (PPL), low text-generation accuracy in zero-shot evaluation, and slow inference. To address these problems, this paper proposes LAM, a pruning metric based on the joint magnitude of the loss. When estimating weight importance, LAM fuses loss-function information with weight-activation information, which removes the limitation caused by omitting the second-order term of the Taylor expansion of the gradient during importance evaluation, and thereby improves the accuracy and robustness of the pruning process and enhances the generality of the pruning algorithm. When building coupling structures, a single coupling structure is proposed: the neurons of the multi-layer perceptron (MLP) in each Transformer block are selected as the initial triggers, and only the attention layer's query, key and value projections are considered when activating neurons to form the coupling structure. This reduces the number of parameters needed to identify coupled structure groups and improves pruning speed and throughput. Zero-shot experiments on the WikiText2 and PTB datasets show that, at a 25% pruning rate, the pruned LLaMA-7B achieves PPL scores of 20.24 and 36.05, respectively, lower than those of other pruning algorithms, and the pruned Vicuna-7B achieves PPL scores of 21.24 and 85.81, also better than other pruning algorithms, indicating that the proposed algorithm has higher generality and accuracy.
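To make the idea of fusing loss-gradient and activation information more concrete, the sketch below shows one plausible way to score MLP neurons for structured pruning in PyTorch. The abstract does not give the paper's exact LAM formula, so the specific fusion used here (a first-order Taylor term |w·∂L/∂w| per output neuron combined with a mean absolute activation, both normalised), the function name lam_importance, and the toy calibration loss are illustrative assumptions, not the authors' implementation.

# Hedged sketch: a loss+activation importance score for structured pruning of
# Transformer MLP neurons. The exact LAM formula is not given in the abstract;
# this fusion is an assumption for illustration only.
import torch
import torch.nn as nn

def lam_importance(linear: nn.Linear, activations: torch.Tensor) -> torch.Tensor:
    """Per-output-neuron importance fusing loss-gradient and activation information.

    Assumes linear.weight.grad has been populated by a backward pass over a small
    calibration batch, and activations holds that layer's outputs with shape
    (num_tokens, out_features).
    """
    # First-order Taylor term from the loss: |w * dL/dw|, summed over each output row.
    loss_term = (linear.weight * linear.weight.grad).abs().sum(dim=1)   # (out_features,)
    # Activation-magnitude term: mean absolute activation of each output neuron.
    act_term = activations.abs().mean(dim=0)                            # (out_features,)
    # Normalise both signals so their scales are comparable, then fuse them.
    loss_term = loss_term / (loss_term.norm() + 1e-8)
    act_term = act_term / (act_term.norm() + 1e-8)
    return loss_term * act_term

# Usage sketch: rank the neurons of one MLP projection and keep the top 75%
# (i.e. a 25% structured pruning rate, matching the abstract's experiments).
if __name__ == "__main__":
    torch.manual_seed(0)
    fc = nn.Linear(16, 64)
    x = torch.randn(32, 16)
    out = fc(x)
    out.pow(2).mean().backward()      # stand-in for a real language-modelling loss

    scores = lam_importance(fc, out.detach())
    keep = int(0.75 * fc.out_features)
    kept_idx = torch.topk(scores, keep).indices
    print(f"keeping {keep}/{fc.out_features} neurons, e.g. indices {kept_idx[:5].tolist()}")

In a full pipeline, the kept indices would then be used to slice the coupled rows and columns of the surrounding projections (the coupling-structure step described above) before LoRA-based recovery fine-tuning.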

Key words: Large language models, Model pruning, Taylor+ importance estimation, LoRA

CLC Number: TP391
[1]FRANTAR E,ALISTARH D.Sparsegpt:Massive language models can be accurately pruned in one-shot[C]//International Conference on Machine Learning.PMLR,2023:10323-10337.
[2]SUN M J,LIU Z,BAIR A,et al.A simple and effective pruning approach for large language models[J].arXiv:2306.11695,2023.
[3]MA X,FANG G,WANG X.Llm-pruner:On the structural pruning of large language models[J].Advances in Neural Information Processing Systems,2023,36:21702-21720.
[4]KIM B K,KIM G,KIM T H,et al.Shortened llama:A simple depth pruning for large language models[J].arXiv:2402.02834,2024.
[5]ZHU X P,YAO H D,LIU J,et al.Review of Evolution of Large Language Model Algorithms[J].ZTE Technology Journal,2024,30(2):9-20.
[6]FANG G,MA X,SONG M,et al.Depgraph:Towards any structural pruning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.2023:16091-16101.
[7]MOLCHANOV P,MALLYA A,TYREE S,et al.Importance estimation for neural network pruning[C]//CVPR.2019.
[8]DETTMERS T,LEWIS M,BELKADA Y,et al.LLM.int8():8-bit matrix multiplication for transformers at scale[J].arXiv:2208.07339,2022.
[9]LI H,KADAV A,DURDANOVIC I,et al.Pruning filters for efficient convnets[C]//ICLR.2017.
[10]HU E J,SHEN Y,WALLIS P,et al.Lora:Low-rank adaptation of large language models[J].arXiv:2106.09685,2021.
[11]HE T W,WANG H.Evaluating Perplexity of Chinese Sentences Based on Grammar & Semantics Analysis[J].Application Research of Computers,2017,34(12):3538-3542,3546.
[12]KIM B K,KIM G,KIM T H,et al.Shortened llama:A simple depth pruning for large language models[J].arXiv:2402.02834,2024.
[13]ZHU Y K,KIROS R,ZEMEL R,et al.Aligning books and movies:Towards story-like visual explanations by watching movies and reading books[C]//ICCV.2015.
[14]TAORI R,GULRAJANI I,ZHANG T Y,et al.Stanford alpaca:An instruction-following llama model[EB/OL].https://github.com/tatsu-lab/stanford_alpaca.
[15]TOUVRON H,MARTIN L,STONE K,et al.Llama 2:Open foundation and fine-tuned chat models[J].arXiv:2307.09288,2023.
[16]CHIANG W L,LI Z,LIN Z,et al.Vicuna:An open-source chatbot impressing gpt-4 with 90%* chatgpt quality[EB/OL].https://vicuna.lmsys.org(accessed 14 April 2023).
[17]SUN M J,LIU Z,BAIR A,et al.A simple and effective pruning approach for large language models[C]//ICLR.2024.
[18]AN Y,ZHAO X,YU T,et al.Fluctuation-based adaptive structured pruning for large language models[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2024:10865-10873.
[19]LV B,ZHOU Q,DING X,et al.KVPruner:Structural Pruning for Faster and Memory-Efficient Large Language Models[J].arXiv:2409.11057,2024.
[20]CHENG H,ZHANG M,SHI J Q.MINI-LLM:Memory-Efficient Structured Pruning for Large Language Models[J].arXiv:2407.11681,2024.