Computer Science ›› 2025, Vol. 52 ›› Issue (1): 250-258. doi: 10.11896/jsjkx.240100019

• Computer Software •

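Patch Correctness Verification Method Based on CodeBERT and Stacking Ensemble Learning

HAN Wei, JIANG Shujuan, ZHOU Wei

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116
    Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu 221116
  • Received: 2024-01-02 Revised: 2024-06-04 Online: 2025-01-15 Published: 2025-01-09
  • Corresponding author: JIANG Shujuan (shjjiang@cumt.edu.cn)
  • About the author: (ts21170066p31@cumt.edu.cn)
  • Funding:
    National Natural Science Foundation of China (61673384)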

Patch Correctness Verification Method Based on CodeBERT and Stacking Ensemble Learning

HAN Wei, JIANG Shujuan, ZHOU Wei   

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
    Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  • Received: 2024-01-02 Revised: 2024-06-04 Online: 2025-01-15 Published: 2025-01-09
  • About author: HAN Wei, born in 1998, master. His main research interest is automatic program repair.
    JIANG Shujuan, born in 1966, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. 15801M). Her main research interests include software analysis and testing, and compilation techniques.
  • Supported by:
    National Natural Science Foundation of China (61673384).

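Abstract: In recent years, automatic program repair has become an important research topic in software engineering. However, most existing automatic repair techniques are based on patch generation and testing, so the patch verification stage is time-consuming. Moreover, because test suites are incomplete, many candidate patches pass the tests without actually being correct, which leads to patch overfitting. To improve the efficiency of patch verification and mitigate patch overfitting, a static patch verification method is proposed. The method first uses the large pre-trained model CodeBERT to automatically extract semantic features of the buggy code fragments and the patch code fragments, and then trains a Stacking ensemble learning model on historical bug-fix patch data; the trained model can then verify new bug-fix patches. The verification capability of the proposed method is evaluated on 1 000 patches related to the Defects4J defect dataset. Experimental results show that the method can effectively verify patch correctness and thereby improve the efficiency of patch verification.

Keywords: Automatic program repair, Patch verification, Pre-trained model, Ensemble learning, Defects4J defect dataset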

Abstract: In recent years, automatic program repair has become an important research topic in the field of software engineering. However, most existing automatic repair techniques follow a generate-and-test approach, which makes the patch verification step costly in time. In addition, because test suites are incomplete, many candidate patches pass all tests yet are not actually correct, which leads to the patch overfitting problem. To improve the efficiency of patch verification and alleviate patch overfitting, a static patch verification method is proposed. The method first uses the large pre-trained model CodeBERT to automatically extract the semantic features of defect code fragments and patch code fragments, and then uses historical defect repair patch data to train a Stacking ensemble learning model. The trained model can then verify new defect repair patches. The verification ability of the proposed method is evaluated on 1 000 patches related to the Defects4J defect dataset. Experimental results show that the static patch verification method can effectively verify the correctness of patches, thereby improving the efficiency of patch verification.
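The two-stage pipeline described in the abstract (feature vectors in, stacked classifier out) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the feature vectors are synthetic stand-ins for CodeBERT embeddings, the two base learners and the logistic meta learner are simple placeholders rather than the models used in the paper, and all function and class names are hypothetical.

```python
import math
import random


def make_toy_features(n, dim, seed=0):
    """Synthetic stand-in for embedding vectors of (buggy code, patch) pairs."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # toy labels: 1 = correct patch, 0 = overfitting patch
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0) for _ in range(dim)])
        y.append(label)
    return X, y


class CentroidScorer:
    """Base learner: relative closeness to the centroid of the correct class."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def score(self, x):
        d0 = math.dist(x, self.centroids[0])
        d1 = math.dist(x, self.centroids[1])
        return d0 / (d0 + d1 + 1e-12)  # near 1 when x is close to class 1


class KnnScorer:
    """Base learner: share of correct patches among the k nearest neighbours."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def score(self, x):
        ranked = sorted(zip(self.X, self.y), key=lambda p: math.dist(x, p[0]))
        return sum(t for _, t in ranked[: self.k]) / self.k


class LogisticMeta:
    """Meta learner: logistic regression over base scores, trained by SGD."""

    def fit(self, Z, y, lr=0.5, epochs=100):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        for _ in range(epochs):
            for z, t in zip(Z, y):
                grad = self.prob(z) - t
                self.w = [w - lr * grad * zi for w, zi in zip(self.w, z)]
                self.b -= lr * grad
        return self

    def prob(self, z):
        s = self.b + sum(w * zi for w, zi in zip(self.w, z))
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, s))))


def fit_stacking(X, y, make_bases, n_folds=5):
    """Train the meta learner on out-of-fold base scores, then refit bases on all data."""
    n = len(X)
    Z = [None] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        bases = [b.fit([X[i] for i in train], [y[i] for i in train]) for b in make_bases()]
        for i in range(fold, n, n_folds):  # held-out rows of this fold
            Z[i] = [b.score(X[i]) for b in bases]
    meta = LogisticMeta().fit(Z, y)
    return [b.fit(X, y) for b in make_bases()], meta


def predict(model, x):
    """Return 1 (predicted correct) or 0 (predicted overfitting) for one vector."""
    bases, meta = model
    return 1 if meta.prob([b.score(x) for b in bases]) >= 0.5 else 0


X, y = make_toy_features(200, dim=8)
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
model = fit_stacking(X_train, y_train, lambda: [CentroidScorer(), KnnScorer(k=5)])
accuracy = sum(predict(model, x) == t for x, t in zip(X_test, y_test)) / len(X_test)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

The meta learner is fitted only on out-of-fold scores, which is the defining step of stacking: it never sees a base prediction made on data that base learner was trained on. In a real setting the toy vectors would be replaced by embeddings of the defect and patch fragments, and the base learners by stronger classifiers.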

Key words: Automatic program repair, Patch verification, Pre-training model, Ensemble learning, Defects4J defect dataset

CLC Number:

  • TP311