Computer Science, 2024, Vol. 51, Issue (6A): 230400068-7. doi: 10.11896/jsjkx.230400068

• Computer Software & Architecture •

Bug Report Severity Prediction Based on Fine-tuned Embedding Model with Domain Knowledge

CHEN Bingting, ZOU Weiqin, CAI Biyu, LIU Wenjie   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Published: 2024-06-06
  • About author: CHEN Bingting, born in 1998, postgraduate. Her main research interests include bug repository mining.
    ZOU Weiqin, born in 1988, Ph.D., professor, is a member of CCF (No. D3300M). Her main research interests include bug localization and software repository mining.
  • Supported by:
    National Natural Science Foundation of China (62002161), Fund of Prospective Layout of Scientific Research for NUAA (Nanjing University of Aeronautics and Astronautics), and Scientific Research Foundation for the Introduction of Talent for NUAA.

Abstract: Accurately predicting the severity of bug reports is crucial for assigning them efficiently and for helping developers detect and fix software bugs in a timely manner. However, existing severity prediction methods based on traditional information retrieval or general-purpose pre-trained models are limited in accuracy because they neglect the contextual semantics of the text or the characteristics of bug reports. To address this problem, this paper proposes a severity prediction method based on fine-tuning with domain knowledge. It adopts a pre-trained BERT model, which takes the semantic context of text into account, and fine-tunes it on bug report data so that it learns the relevant domain knowledge. The fine-tuned BERT model is then used to extract semantic features from bug reports, and a support vector machine is trained on these features to build the severity prediction model. Experimental results on 15 projects, including Mozilla, Eclipse, and Apache, show that, compared with traditional information retrieval methods, the proposed method improves accuracy, recall, and F1 score by 4.5%~22.0%, 3.0%~22.0%, and 4.0%~22.0%, respectively. Compared with the general BERT model, the fine-tuned BERT model improves accuracy, recall, and F1 score by 2.0%~5.1%, 1.9%~5.1%, and 1.8%~5.0%, respectively.

Key words: Word embedding, BERT, Pretrained model, Bug report, Fine-tuning, Severity prediction

CLC Number: TP311
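
The two-stage pipeline described in the abstract (encode each bug report into a feature vector, then train a support vector machine on those vectors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `hashed_bow_encoder` is a hypothetical stand-in for the fine-tuned BERT encoder, `LinearSVM` is a small hinge-loss linear SVM written for this example, severity is simplified to two classes (+1 severe, -1 non-severe), and all names and hyperparameters are invented for illustration.

```python
import hashlib
import math
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[str], Vector]


def hashed_bow_encoder(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in for the fine-tuned BERT encoder (assumption).

    Returns an L2-normalised hashed bag-of-words vector. A real pipeline
    would return the fine-tuned model's embedding of the report instead.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class LinearSVM:
    """Binary linear SVM fitted by subgradient descent on the regularised hinge loss."""

    def __init__(self, lr: float = 0.1, reg: float = 0.01, epochs: int = 200) -> None:
        self.lr, self.reg, self.epochs = lr, reg, epochs
        self.w: Vector = []
        self.b = 0.0

    def decision(self, x: Sequence[float]) -> float:
        """Signed score of x relative to the separating hyperplane."""
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LinearSVM":
        # Labels are assumed to be +1 (severe) or -1 (non-severe).
        self.w = [0.0] * len(X[0])
        self.b = 0.0
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                violated = label * self.decision(x) < 1.0
                # Shrink the weights (L2 regulariser), then move toward the
                # sample only if it violates the margin.
                self.w = [wi * (1.0 - self.lr * self.reg) for wi in self.w]
                if violated:
                    self.w = [wi + self.lr * label * xi for wi, xi in zip(self.w, x)]
                    self.b += self.lr * label
        return self

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.decision(x) >= 0.0 else -1


def train_severity_model(
    reports: Sequence[str],
    labels: Sequence[int],
    encode: Encoder = hashed_bow_encoder,
) -> LinearSVM:
    """Stage 1: encode every report. Stage 2: fit the SVM on the features."""
    features = [encode(report) for report in reports]
    return LinearSVM().fit(features, labels)
```

Passing the encoder as a parameter keeps the classifier stage unchanged when the stand-in is swapped for a general or a fine-tuned embedding model, which mirrors the comparison the abstract reports between general BERT and fine-tuned BERT features.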