Computer Science (计算机科学) ›› 2024, Vol. 51 ›› Issue (6A): 230400068-7. doi: 10.11896/jsjkx.230400068

• Computer Software & Architecture •

Bug Report Severity Prediction Based on Fine-tuned Embedding Model with Domain Knowledge

CHEN Bingting (陈冰婷), ZOU Weiqin (邹卫琴), CAI Biyu (蔡碧瑜), LIU Wenjie (刘文杰)

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Published: 2024-06-06
  • Corresponding author: ZOU Weiqin (weiqin@nuaa.edu.cn)
  • About author: CHEN Bingting (btchen@nuaa.edu.cn), born in 1998, postgraduate. Her main research interests include bug repository mining and related topics.
    ZOU Weiqin, born in 1988, Ph.D., professor, is a member of CCF (No. D3300M). Her main research interests include bug localization and software repository mining.
  • Supported by:
    National Natural Science Foundation of China (62002161), the Fund of Prospective Layout of Scientific Research for NUAA (Nanjing University of Aeronautics and Astronautics), and the Scientific Research Foundation for the Introduction of Talent for NUAA.

Abstract: Accurately predicting the severity of bug reports is crucial for assigning them quickly and correctly and for helping developers detect and fix software bugs in time. However, mainstream severity prediction methods, which are based on traditional information retrieval or general-purpose pre-trained models, are limited in effectiveness because they ignore either contextual semantics or the characteristics of bug reports. To address this problem, this paper proposes a bug report severity prediction method based on domain knowledge fine-tuning. The method uses the BERT pre-trained model, which takes the contextual semantics of text into account, and fine-tunes it on bug report data so that it learns the relevant domain knowledge. The fine-tuned BERT model is then used to extract semantic features from bug reports, and a support vector machine is trained on these features to build the severity prediction model. Experiments on 15 projects selected from Mozilla, Eclipse, and Apache show that, compared with traditional information retrieval methods, the proposed method improves accuracy, recall, and F1 score by 4.5% to 22.0%, 3.0% to 22.0%, and 4.0% to 22.0%, respectively. Compared with the general BERT model, the fine-tuned BERT model improves accuracy, recall, and F1 score by 2.0% to 5.1%, 1.9% to 5.1%, and 1.8% to 5.0%, respectively.
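The abstract reports its gains in accuracy, recall, and F1 score. As a rough illustration of how these three metrics can be computed for a multi-class severity task, the sketch below uses only the Python standard library. The function name `severity_metrics`, the severity labels, and the choice of macro averaging are assumptions made for this example; the abstract does not state which averaging scheme the authors used.

```python
from collections import Counter


def severity_metrics(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) for multi-class labels.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)  # actual reports per severity class
    pred_count = Counter(y_pred)  # predicted reports per severity class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / len(y_true)

    recalls, f1_scores = [], []
    for label in labels:
        # A class with no actual (or no predicted) reports scores 0.0.
        recall = correct[label] / true_count[label] if true_count[label] else 0.0
        precision = correct[label] / pred_count[label] if pred_count[label] else 0.0
        total = precision + recall
        f1_scores.append(2 * precision * recall / total if total else 0.0)
        recalls.append(recall)

    macro_recall = sum(recalls) / len(labels)
    macro_f1 = sum(f1_scores) / len(labels)
    return accuracy, macro_recall, macro_f1


# Made-up severity labels, purely for demonstration.
y_true = ["critical", "major", "major", "minor", "minor", "minor"]
y_pred = ["critical", "major", "minor", "minor", "minor", "major"]
accuracy, macro_recall, macro_f1 = severity_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} recall={macro_recall:.3f} f1={macro_f1:.3f}")
```

Macro averaging weights every severity class equally, which matters when severity levels are imbalanced; a weighted average would instead favor the most frequent class.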

Key words: Word embedding, BERT, Pretrained model, Bug report, Fine-tuning, Severity prediction

CLC Number:

  • TP311