Computer Science ›› 2026, Vol. 53 ›› Issue (3): 136-142. DOI: 10.11896/jsjkx.250600087
李昊, 丁立中, 傅稼润, 令狐赵桓
LI Hao, DING Lizhong, FU Jiarun, LINGHU Zhaohuan
Abstract: Instruction fine-tuning of large language models on reasoning data explicitly models the multi-step logical dependencies of complex tasks and thereby substantially improves reasoning accuracy. However, fine-tuning relies on massive amounts of high-quality data, which drives compute costs up sharply. Existing data compression techniques focus mainly on reducing raw dataset size and generally lack methods designed for reasoning data: they ignore the multi-step logical dependencies and semantic relations within reasoning data, which damages the integrity of key reasoning chains and in turn degrades reasoning performance. To address this, this paper proposes Refinement Based on Inference Contribution (RBIC). RBIC builds a knowledge-domain graph from the semantic similarity of reasoning samples to locate core information precisely; it combines sample semantics with the large model's reasoning accuracy to partition samples into difficulty levels covering the full range of reasoning scenarios; and it quantifies each sample's inference contribution from the logical complexity of its multi-step reasoning, retaining the samples that contribute most to the model's reasoning. Experimental results show that after fine-tuning on the reasoning data refined by RBIC, average reasoning performance drops by only 1.13% while training time is reduced to 16% of the original, demonstrating that RBIC strikes an effective balance between model performance and resource consumption and can support efficient deployment and fine-tuning of multi-domain large models in resource-constrained settings.
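The abstract outlines a three-stage selection pipeline: semantic grouping of reasoning samples, difficulty partitioning based on the base model's accuracy, and contribution-weighted refinement. The following is a minimal, hypothetical Python sketch of such a pipeline. The function names, the near-duplicate similarity threshold, the equal weighting of difficulty and step-count complexity, and the use of precomputed sentence embeddings are illustrative assumptions, not the paper's actual RBIC implementation.

# Illustrative sketch of an RBIC-style data selection pipeline (assumptions noted above).
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the row vectors of a and b."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return a @ b.T

def rbic_select(embeddings: np.ndarray,
                accuracy: np.ndarray,   # per-sample accuracy of the base model, in [0, 1]
                n_steps: np.ndarray,    # number of reasoning steps per sample
                budget: int,
                sim_threshold: float = 0.85) -> list:
    """Select up to `budget` samples: drop semantic near-duplicates, then rank the
    remainder by a contribution score that rewards harder samples (low accuracy)
    and longer reasoning chains (more steps)."""
    n = embeddings.shape[0]
    sims = cosine_sim(embeddings, embeddings)

    # 1) Semantic grouping: greedily keep one representative per near-duplicate group.
    kept = []
    removed = np.zeros(n, dtype=bool)
    for i in range(n):
        if removed[i]:
            continue
        kept.append(i)
        removed |= sims[i] > sim_threshold  # marks i and its near-duplicates as handled

    # 2) Difficulty from base-model accuracy; logical complexity from reasoning depth.
    difficulty = 1.0 - accuracy                         # harder samples score higher
    complexity = n_steps / (n_steps.max() + 1e-12)      # normalized chain length
    contribution = 0.5 * difficulty + 0.5 * complexity  # assumed equal weighting

    # 3) Refinement: keep the highest-contribution representatives within the budget.
    kept = sorted(kept, key=lambda i: contribution[i], reverse=True)
    return kept[:budget]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(1000, 384))            # e.g. sentence-embedding vectors
    acc = rng.uniform(size=1000)
    steps = rng.integers(1, 12, size=1000).astype(float)
    subset = rbic_select(emb, acc, steps, budget=160)
    print(f"selected {len(subset)} of 1000 samples")

In this sketch the contribution score is a simple convex combination of difficulty (1 minus base-model accuracy) and normalized reasoning-chain length; the contribution measure actually used by RBIC may be defined differently in the full paper.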
CLC Number: