Computer Science, 2019, Vol. 46, Issue (11): 156-160. doi: 10.11896/jsjkx.191100502C

Software & Database Technology

Cost-sensitive Convolutional Neural Network Model for Software Defect Prediction

QIU Shao-jian, CAI Zi-yi, LU Lu

1. (School of Computer Science and Engineering, South China University of Technology, Guangzhou 510000, China)
Received: 2018-10-15  Online: 2019-11-15  Published: 2019-11-14

Abstract: Machine-learning-based software defect prediction has received wide attention from researchers in software engineering. A defect prediction model can be used to analyze the distribution of defects in software, helping the software quality assurance team detect potential errors and allocate testing resources reasonably. However, most existing defect prediction methods rely on hand-crafted features such as lines of code, dependencies between modules, and stack reference depth. These methods do not take into account the latent semantic features of the source code, which may lead to poor predictions. To address this problem, this paper applied convolutional neural networks to mine the semantic features implicit in the source code. To mine these semantic features effectively, a three-layer convolutional neural network was used to extract abstract features from the data. To handle data imbalance, a cost-sensitive method was adopted: positive and negative examples are given different weights, which balances their influence on model training. For the experimental data, multiple versions of eight software systems in the PROMISE defect dataset were selected, 19 projects in total. The proposed cost-sensitive software defect prediction model based on a convolutional neural network (CS-TCNN) was compared with logistic regression and a deep belief network. The evaluation metrics are AUC and MCC, both widely used in defect prediction research. The experimental results demonstrate that CS-TCNN can effectively extract semantic features from program code and improves the performance of software defect prediction.
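Two technical ingredients named in the abstract besides the CNN itself, cost-sensitive weighting of positive and negative examples and the AUC/MCC evaluation metrics, can be illustrated with a small stdlib-only Python sketch. It assumes inverse-class-frequency weights (w_c = n / (2 n_c)) applied to a binary cross-entropy loss; the abstract does not state which weights or loss CS-TCNN actually uses, and all function names here are hypothetical. Defective modules are labelled 1 (the positive class).

```python
import math


def class_weights(labels):
    """Inverse-frequency class weights (an assumed scheme, not necessarily the paper's).

    labels: list of 0/1, where 1 marks a defective module (positive class).
    Returns (w_neg, w_pos) with w_c = n / (2 * n_c), so both classes carry
    the same total weight and the rarer defective class gets the larger weight.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return n / (2 * n_neg), n / (2 * n_pos)


def weighted_bce(labels, probs, w_neg, w_pos, eps=1e-12):
    """Cost-sensitive binary cross-entropy, averaged over the examples."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total -= w_pos * math.log(p)        # loss on a defective example scaled by w_pos
        else:
            total -= w_neg * math.log(1.0 - p)  # loss on a clean example scaled by w_neg
    return total / len(labels)


def mcc(labels, preds):
    """Matthews correlation coefficient from 0/1 labels and 0/1 predictions."""
    tp = tn = fp = fn = 0
    for y, p in zip(labels, preds):
        if y == 1 and p == 1:
            tp += 1
        elif y == 0 and p == 0:
            tn += 1
        elif y == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(labels, scores):
    """ROC AUC as the probability that a random defective module is scored
    higher than a random clean one (ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(class_weights([0] * 8 + [1] * 2))            # (0.625, 2.5)
    print(round(mcc([1, 1, 0, 0], [1, 0, 0, 0]), 3))   # 0.577
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

With 8 clean and 2 defective modules, each defective example weighs four times as much as a clean one, so both classes contribute equally to the total loss. In a deep-learning framework the same idea is usually expressed through the loss function's weighting options (for example the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`) rather than computed by hand.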

Key words: Software defect prediction, Convolutional neural network, Semantic feature mining, Cost-sensitive

CLC Number: TP311