Research on Construction Method of Defect Prediction Dataset for Spacecraft Software

doi:10.11896/jsjkx.200900133

Abstract

Abstract: Software defect prediction datasets are the basis for building prediction models and carrying out defect prediction, but they face two problems. First, because data are difficult to collect at the source, few evaluation datasets are available. Second, published datasets differ widely across domains and their metrics are often inapplicable, so they are rarely used in engineering practice. Drawing on real software evaluation data from the domestic aerospace field, this paper systematically describes a method for designing spacecraft software metrics and the process of constructing a spacecraft software defect prediction dataset. Based on the characteristics of spacecraft software, a hybrid measurement method that combines code metrics with quality metrics is proposed, so that the relevant characteristics of spacecraft software can be described and measured comprehensively from different angles. In addition, to address the high labor and storage costs of large-scale data collection, processing, and analysis, a standardized dataset construction method is proposed that combines data cleaning under version division with module-level preprocessing. An application demonstration on the SPACE dataset built with this method shows that the method can be used effectively to construct high-quality, domain-specific software defect prediction datasets, and that the AutoWeka model achieves good prediction performance on it.

Key words: Data quality, Dataset, Software defect prediction, Software metrics, Spacecraft software

CLC Number:

TP311

ZHENG Xiao-meng, GAO Meng, TENG Jun-yuan. Research on Construction Method of Defect Prediction Dataset for Spacecraft Software[J]. Computer Science, 2021, 48(6A): 575-580. https://doi.org/10.11896/jsjkx.200900133

