Cost-sensitive Convolutional Neural Network Model for Software Defect Prediction

doi:10.11896/jsjkx.191100502C

Abstract

Machine-learning-based software defect prediction has received wide attention from researchers in software engineering. A defect prediction model can, to some extent, analyze the distribution of defects in software, helping software quality assurance teams detect potential errors and allocate test resources reasonably. However, most existing defect prediction methods rely on hand-crafted software features such as lines of code, dependency between modules, and stack reference depth. Such methods do not take into account the latent semantic features of the source code, which may lead to unsatisfactory prediction performance. To address this problem, this paper uses a convolutional neural network to mine the semantic features implicit in source code and applies them to the software defect prediction task. To mine source-code semantic features, a three-layer convolutional neural network is used to extract abstract features from the data. To handle data imbalance, a cost-sensitive method is adopted: positive and negative examples are given different weights so as to balance their influence on model training. For the experimental data, multiple versions of 8 software systems from the open-source, defect-labeled PROMISE dataset were selected, 19 projects in total. For performance comparison, the proposed Cost-Sensitive Three-Layer Convolutional Neural Network (CS-TCNN) is compared with models such as logistic regression and deep belief networks, using AUC and MCC, two evaluation metrics widely used in defect prediction research. The experimental results demonstrate that CS-TCNN extracts the semantic features of program code more effectively and thereby improves the performance of the software defect prediction model.

Key words: Convolutional neural network, Cost-sensitive, Semantic feature mining, Software defect prediction
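
The cost-sensitive idea described in the abstract, giving positive (defective) and negative (clean) examples different weights during training, can be illustrated with a class-weighted binary cross-entropy. The sketch below is only an illustration under stated assumptions: the abstract does not say how CS-TCNN chooses its weights or which loss it uses, so the inverse-frequency weighting and the function names `weighted_bce` and `inverse_frequency_weights` are hypothetical, not the authors' implementation.

```python
import math


def inverse_frequency_weights(y_true):
    """Return (w_pos, w_neg) so that both classes carry equal total weight.

    Assumption: inverse class frequency is used here purely for illustration;
    the paper's actual weighting scheme is not given in the abstract.
    """
    n = len(y_true)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to derive weights")
    return n / (2.0 * n_pos), n / (2.0 * n_neg)


def weighted_bce(y_true, y_prob, w_pos, w_neg, eps=1e-12):
    """Mean binary cross-entropy with separate weights per class."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1.0 - p)
    return total / len(y_true)


# Toy imbalanced batch: 1 defective module out of 8.
labels = [1, 0, 0, 0, 0, 0, 0, 0]
w_pos, w_neg = inverse_frequency_weights(labels)

# A model that predicts "clean" for everything misses the only defect.
missed = [0.1] * len(labels)
plain = weighted_bce(labels, missed, 1.0, 1.0)
weighted = weighted_bce(labels, missed, w_pos, w_neg)
print(f"w_pos={w_pos:.3f} w_neg={w_neg:.3f}")
print(f"unweighted loss={plain:.3f} weighted loss={weighted:.3f}")
```

With these weights the single defective example contributes as much to the loss as the seven clean ones combined, so a model that ignores the minority class is penalized more heavily than under the unweighted loss.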

CLC number:

TP311

QIU Shao-jian, CAI Zi-yi, LU Lu. Cost-sensitive Convolutional Neural Network Model for Software Defect Prediction[J]. Computer Science, 2019, 46(11): 156-160. https://doi.org/10.11896/jsjkx.191100502C

