Computer Science, 2025, Vol. 52, Issue (12): 18-23. doi: 10.11896/jsjkx.241100182
ZHANG Lizheng, YANG Qiuhui, DAI Shengxin (张李政, 杨秋辉, 代声馨)
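Abstract: As software grows more complex, program defects grow in both scale and complexity. They consume a large share of development cost and can also cause security problems in the real world. Existing program repair methods generally suffer from poor repair effectiveness and high training cost. To address these problems, an automatic program repair method based on perturbation and a frozen pre-trained model is proposed. The method adds noise to the model parameters with a matrix-based perturbation method, which mitigates the overfitting of the pre-trained model to the program repair task during fine-tuning. It also freezes the encoder of the pre-trained model, which shortens training time and reduces the consumption of computing resources. In addition, a checkpoint ensemble strategy improves the model's repair effectiveness. Experiments on the 40 Python programs in the QuixBugs dataset show that the proposed method has clear advantages in training time, computing resource consumption, and repair effectiveness: it trains only 41.62% of the original model's parameters, shortens training time by 39.16%, and repairs 70% of the defects in the dataset, and the repaired defects span diverse types.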
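The three ingredients named in the abstract can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the abstract gives no formulas, so the noise rule (uniform noise scaled by each matrix's own standard deviation), the intensity `lam=0.15`, the `encoder.` name prefix, the round-robin merge of candidate patches, and all function names are assumptions made for illustration only.

```python
import random
import statistics
from itertools import zip_longest


def perturb_matrix(matrix, lam, rng):
    """Return a copy of `matrix` with uniform noise in [-lam/2, lam/2) * std(matrix).

    Assumption: noise is scaled per matrix by that matrix's own standard
    deviation; the abstract only says the perturbation is "matrix-based".
    """
    flat = [v for row in matrix for v in row]
    scale = lam * statistics.pstdev(flat)
    return [[v + (rng.random() - 0.5) * scale for v in row] for row in matrix]


def perturb_model(params, lam=0.15, seed=0):
    """Perturb every parameter matrix once, before fine-tuning starts.

    `lam` is a hypothetical noise intensity, not a value from the paper.
    """
    rng = random.Random(seed)
    return {name: perturb_matrix(m, lam, rng) for name, m in params.items()}


def trainable_fraction(params, frozen_prefix="encoder."):
    """Fraction of scalar parameters still trained when the encoder is frozen.

    Assumption: encoder parameters are identified by a name prefix.
    """
    def size(m):
        return len(m) * len(m[0])

    total = sum(size(m) for m in params.values())
    trainable = sum(
        size(m) for name, m in params.items()
        if not name.startswith(frozen_prefix)
    )
    return trainable / total


def ensemble_patches(ranked_lists):
    """Merge ranked candidate patches from several checkpoints.

    Assumption: candidates are interleaved rank by rank and duplicates are
    dropped, keeping the first occurrence.
    """
    merged, seen = [], set()
    for same_rank in zip_longest(*ranked_lists):
        for patch in same_rank:
            if patch is not None and patch not in seen:
                seen.add(patch)
                merged.append(patch)
    return merged


# Toy "model": one encoder matrix (6 values) and one decoder matrix (4 values).
toy_params = {
    "encoder.w": [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]],
    "decoder.w": [[1.0, 2.0], [3.0, 4.0]],
}
noisy_params = perturb_model(toy_params, lam=0.15, seed=0)
fraction = trainable_fraction(toy_params)
candidates = ensemble_patches([["a", "b"], ["b", "c"]])
```

In a real fine-tuning setup the same ideas would apply to the tensors of a pre-trained encoder-decoder model: the perturbation is applied once to the loaded weights, the encoder's parameters are excluded from gradient updates, and candidate patches generated from several saved checkpoints are pooled before validation against the test suite.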