Computer Science, 2018, Vol. 45, Issue (6): 161-165. doi: 10.11896/j.issn.1002-137X.2018.06.028

• Software and Database Technology •

Multi-objective Supervised Defect Prediction Modeling Method Based on Code Changes

CHEN Xiang (1,2,3), WANG Qiu-ping (1)

  1. School of Computer Science and Technology, Nantong University, Nantong, Jiangsu 226019, China;
  2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China;
  3. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2017-04-24 Online: 2018-06-15 Published: 2018-07-24
  • About the authors: CHEN Xiang (born 1980), male, Ph.D., associate professor, CCF member; main research interest: software defect prediction; E-mail: xchencs@ntu.edu.cn (corresponding author). WANG Qiu-ping (born 1993), female, master's student; main research interest: software defect prediction (corresponding author).
  • Supported by:
    the National Natural Science Foundation of China (61202006, 61602267), the Open Project of the State Key Laboratory for Novel Software Technology at Nanjing University (KFKT2016B18), the Research Project of the Guangxi Key Laboratory of Trusted Software (kx201610), the Natural Science Research Project of Jiangsu Higher Education Institutions (15KJB520030, 16KJB520038), and the Nantong Science and Technology Platform Project (CP120130001).

Abstract: Defect prediction based on code changes offers lower code inspection cost, easier fault localization, and faster fixing. This paper is the first to formalize the problem as a multi-objective optimization problem: one objective is to maximize the number of identified buggy changes, and the other is to minimize the amount of code that must be inspected. Because the two objectives conflict to some degree, the paper proposes a method called MULTI, which generates a set of mutually non-dominated prediction models. The empirical study covers six large-scale open-source projects (227417 code changes in total) and uses ACC and POPT as performance indicators. The results show that MULTI performs significantly better than the classical supervised methods (EALR and Logistic) and the unsupervised methods (LT and AGE).

Key words: Code changes, Empirical studies, Multi-objective optimization, Software defect prediction
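
The non-dominance relation mentioned in the abstract can be illustrated with a small sketch. It is not the MULTI method itself: it only filters a hand-written list of candidate models down to those that no other candidate beats on both objectives. The score tuples, the names `dominates` and `non_dominated`, and the sample numbers are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical summary of one candidate model:
# (buggy_changes_found, lines_to_inspect).
# The first value should be maximized, the second minimized.
Score = Tuple[int, int]


def dominates(a: Score, b: Score) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(scores: List[Score]) -> List[Score]:
    """Keep only the candidates that no other candidate dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


# Made-up candidates, for illustration only.
candidates = [(40, 1000), (55, 1800), (30, 1200), (55, 2500), (10, 200)]
front = non_dominated(candidates)
print(front)
```

Each model left in `front` represents a different trade-off between finding more buggy changes and inspecting less code, which is why such a method returns a set of models rather than a single one.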

CLC Number: TP311.5