Empirical Study on Defects in R Programming Language and Core Packages

doi:10.11896/jsjkx.220200181

Abstract

Abstract: The R programming language provides a wide range of statistical computing functions and is considered one of the programming languages best suited to artificial intelligence. A correct implementation of the language is a prerequisite for R programs to run correctly, yet R inevitably contains many software defects. This paper presents an empirical study of historical defects in the R programming language and its core packages. By analyzing 7,020 defect reports, we find that: 1) among the 35 R versions affected by these defects, R 3.1.2, R 3.0.2, and R 3.5.0 contain the most defects, and the defects are concentrated in a few components such as Documentation, Graphics, and Language; 2) the components with the highest overall defect priority are, in order, Startup, Installation, and Analyses, and those with the highest overall defect severity are, in order, I/O, Installation, and Accuracy, with a moderate rank correlation between defect priority and severity; 3) about 78% of the defects are fixed within one year; 4) semantic errors are the most common root cause of defects, among which missing features and data-processing errors account for a large share at every stage. These findings reveal some basic patterns of historical defects in the R programming language and its core packages. To some extent, they can help R developers improve development quality, help R maintainers detect and fix defects more efficiently, and help R users avoid potential risks.

Key words: R programming language, Empirical study, Software defect, Distribution of defects, Defect repair, Root cause
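The abstract reports a moderate rank correlation between defect priority and severity but does not say which coefficient was used. As a minimal sketch, assuming Spearman's coefficient and using made-up ordinal codes (the `priority` and `severity` lists below are hypothetical, not data from the paper), such a correlation could be computed as follows:

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical ordinal codes for six defect reports (illustration only).
priority = [1, 2, 2, 3, 4, 5]
severity = [2, 1, 3, 3, 5, 4]
rho = spearman_rho(priority, severity)
print(f"Spearman rho = {rho:.3f}")
```

Computing the Pearson correlation on average ranks, rather than using the shortcut formula based on squared rank differences, keeps the result valid when many reports share the same priority or severity level, which is the usual case for ordinal tracker fields.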

CLC Number: TP311

WANG Zi-yuan, BU De-xin, LI Ling-ling, ZHANG Xia. Empirical Study on Defects in R Programming Language and Core Packages[J]. Computer Science, 2022, 49(12): 89-98. https://doi.org/10.11896/jsjkx.220200181


