Automated Program Repair Based on Perturbing and Freezing Pre-trained Model

doi:10.11896/jsjkx.241100182

Abstract

As software grows more complex, program defects grow in both scale and complexity. Defects consume a large share of development cost and can also lead to real-world security issues. Existing program repair methods generally suffer from limited repair effectiveness and high training costs. To address these issues, this paper proposes an automated program repair method based on perturbing and freezing a pre-trained model. A matrix-based perturbation method adds noise to the model parameters, which alleviates overfitting of the pre-trained model to the program repair task during fine-tuning. Freezing the encoder of the pre-trained model reduces training time and computational resource consumption. In addition, a checkpoint ensemble strategy is adopted to improve repair effectiveness. Experiments on the 40 Python programs in the QuixBugs dataset show that the proposed method has significant advantages in training time, computational resource consumption, and repair effectiveness: it trains only 41.62% of the parameters of the original model, reduces training time by 39.16%, and repairs 70% of the defects in the dataset, which cover a diverse range of defect types.
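The three ingredients named in the abstract can be sketched in a few lines of dependency-free Python. This is an illustrative sketch only, not the authors' implementation: the abstract does not state the noise distribution, so the uniform noise scaled by each matrix's standard deviation (in the spirit of NoisyTune-style perturbation) is an assumption, as are the names `perturb_and_freeze`, `trainable_fraction`, `ensemble_patches`, the `encoder.` name prefix, the default `lam`, and the round-robin merge used for the checkpoint ensemble. Whether frozen matrices are also perturbed is not stated either; the sketch perturbs all of them.

```python
import random
import statistics


def perturb_matrix(weights, lam, rng):
    """Add uniform noise in [-lam/2, lam/2], scaled by the matrix's own std.

    `weights` is one parameter matrix flattened to a list of floats.
    The noise form is an assumption; the abstract only says "matrix-based".
    """
    scale = statistics.pstdev(weights)
    return [w + rng.uniform(-lam / 2, lam / 2) * scale for w in weights]


def perturb_and_freeze(params, lam=0.15, frozen_prefix="encoder.", seed=0):
    """Perturb every matrix once, then mark encoder matrices as frozen.

    Returns (perturbed_params, trainable_flags). A real training loop would
    skip gradient updates for every name whose flag is False.
    """
    rng = random.Random(seed)
    perturbed = {name: perturb_matrix(ws, lam, rng) for name, ws in params.items()}
    trainable = {name: not name.startswith(frozen_prefix) for name in params}
    return perturbed, trainable


def trainable_fraction(params, trainable):
    """Share of scalar parameters that still receive updates."""
    total = sum(len(ws) for ws in params.values())
    kept = sum(len(ws) for name, ws in params.items() if trainable[name])
    return kept / total


def ensemble_patches(per_checkpoint_patches):
    """Merge ranked candidate patches from several checkpoints.

    Round-robin over ranks, dropping duplicates (hypothetical merge rule).
    """
    merged, seen = [], set()
    longest = max((len(p) for p in per_checkpoint_patches), default=0)
    for rank in range(longest):
        for patches in per_checkpoint_patches:
            if rank < len(patches) and patches[rank] not in seen:
                seen.add(patches[rank])
                merged.append(patches[rank])
    return merged
```

With a real model, the same idea would apply to each weight tensor before fine-tuning, and the reported 41.62% figure corresponds to what `trainable_fraction` measures here on toy data.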

Keywords: Automated program repair, Deep learning, Pre-trained model, Fine-tuning, Checkpoint ensemble

CLC Number: TP311.5

ZHANG Lizheng, YANG Qiuhui, DAI Shengxin. Automated Program Repair Based on Perturbing and Freezing Pre-trained Model [J]. Computer Science, 2025, 52(12): 18-23.

