Computer Science ›› 2022, Vol. 49 ›› Issue (11): 83-89. doi: 10.11896/jsjkx.210900207

• Computer Software •

Patch Validation Approach Combining Doc2Vec and BERT Embedding Technologies

HUANG Ying, JIANG Shu-juan, JIANG Ting-ting   

  1. Engineering Research Center of Mine Digitalization of Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
    School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  • Received: 2021-09-24 Revised: 2022-03-11 Online: 2022-11-15 Published: 2022-11-03
  • About author: HUANG Ying, born in 1996, postgraduate. Her main research interests include automatic program repair.
    JIANG Shu-juan, born in 1966, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. Her main research interests include software analysis and testing, and compilation techniques.
  • Supported by:
    National Natural Science Foundation of China (61673384).

Abstract: Automatic program repair has been a research hotspot in recent years and has made some progress. Most existing automatic program repair methods use a test suite to validate patch correctness. However, validating a large number of candidate patches against a test suite is costly and can also lead to patch overfitting. How to validate patch correctness both efficiently and effectively has therefore become an urgent problem. To reduce this cost and improve patch accuracy, this paper proposes an approach that combines two embedding techniques to validate patch correctness. First, the approach uses a Doc2Vec model to calculate the similarity between each patch and the buggy code; then it uses a BERT-based classifier to filter out incorrect patches from those that pass the similarity screening. To evaluate the approach, experiments are carried out on five open-source Java benchmarks. The results show that the approach can effectively validate patch correctness and improve the efficiency of patch validation.
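
The two-stage flow described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `toy_embed` is a bag-of-tokens stand-in for a trained Doc2Vec model, `is_correct` is a caller-supplied stand-in for the BERT-based classifier, and the `threshold` value and the rule "keep patches whose similarity is at or above the threshold" are hypothetical choices that the abstract does not specify.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def toy_embed(code: str) -> Counter:
    """Bag-of-tokens vector; a stand-in for Doc2Vec (assumption, not the paper's model)."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\d+|\S", code))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def validate_patches(
    buggy_code: str,
    candidates: List[str],
    is_correct: Callable[[str, str], bool],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Return the candidate patches that survive both stages."""
    buggy_vec = toy_embed(buggy_code)
    # Stage 1: screen candidates by similarity to the buggy code.
    screened = [
        patch for patch in candidates
        if cosine(buggy_vec, toy_embed(patch)) >= threshold
    ]
    # Stage 2: let the classifier (BERT-based in the paper) reject incorrect patches.
    return [patch for patch in screened if is_correct(buggy_code, patch)]
```

Running the cheap similarity screen first means the more expensive classifier only sees the patches that pass stage 1, which is one plausible reading of how the approach reduces validation cost.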

Key words: Automatic program repair, Patch validation, Code similarity, Embedding technology

CLC Number: 

  • TP311