Computer Science ›› 2024, Vol. 51 ›› Issue (6): 1-11. doi: 10.11896/jsjkx.230200069

• Computer Software •

Empirical Study on Dependencies and Updates of R Packages

CHENG Hongzheng1, YANG Wenhua1,2

  1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  2 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210023, China
  • Received: 2023-02-10 Revised: 2023-05-01 Online: 2024-06-15 Published: 2024-06-05
  • Corresponding author: YANG Wenhua (ywh@nuaa.edu.cn)
  • About author (maybechz@163.com): CHENG Hongzheng, born in 1998, postgraduate, is a student member of CCF (No.O7976G). His main research interests include intelligent software engineering and empirical study.
    YANG Wenhua, born in 1990, Ph.D, associate professor, is a member of CCF (No.96710M). His main research interests include software engineering and self-adaptive software systems.

Abstract: R is a widely used tool for statistical analysis and statistical graphics. It is popular in statistics and artificial intelligence and has a rich open-source ecosystem in which the number of R packages keeps growing. New R packages are typically built by importing existing ones, which makes the dependencies between packages very complex and can even produce dependency conflicts. Besides the dependencies themselves, package updates also contribute to this problem. An in-depth empirical study of the dependencies and updates of R packages is therefore needed to understand their current state of development. Existing empirical studies of R, however, focus on the R ecosystem as a whole and do not specifically analyze package dependencies and updates. To fill this gap, this paper uses data from CRAN (Comprehensive R Archive Network) and GitHub to analyze four aspects of commonly used R packages: their dependency relationships, their updates, potential dependency conflicts, and the updates of their dependencies. The results show that dependency relationships between R packages are complex and that each package generally depends on many others, yet the dependencies are concentrated on a subset of R packages. Although commonly used R packages are updated frequently, many conflicts (inconsistencies) between dependencies remain; the paper also detects and classifies these dependency conflicts. The findings give R developers and users a better understanding of the current state of R packages, offer suggestions that can help package developers avoid pitfalls during development, and summarize directions for further research on R package dependencies and updates.

Key words: R package, Empirical study, Dependency, Update, Dependency conflict
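
The two notions the abstract relies on, dependencies concentrating on a subset of packages and conflicts (inconsistencies) between dependencies, can be illustrated with a small sketch. This is not the authors' tooling. The package names, versions, and the simplified `name (>= version)` syntax are hypothetical, and "conflict" is assumed here to mean only that a declared minimum version exceeds the version currently available, which may be narrower than the paper's classification.

```python
import re
from collections import Counter

# Hypothetical DESCRIPTION-style dependency fields (names and versions are made up).
DECLARED = {
    "pkgA": "pkgC (>= 1.2.0), pkgD",
    "pkgB": "pkgC (>= 2.0.0)",
    "pkgC": "pkgD (>= 0.9)",
    "pkgD": "",
}
# Hypothetical versions currently available in the repository.
AVAILABLE = {"pkgA": "0.3.1", "pkgB": "1.0.0", "pkgC": "1.5.0", "pkgD": "1.0.0"}

# Assumption: only "name" and "name (>= version)" specs are supported in this sketch.
SPEC = re.compile(
    r"^\s*([A-Za-z][A-Za-z0-9.]*)\s*(?:\(\s*>=\s*([0-9][0-9.\-]*)\s*\))?\s*$"
)


def parse_field(field):
    """Split a comma-separated field into (name, minimum version or None) pairs."""
    deps = []
    for item in field.split(","):
        if not item.strip():
            continue
        match = SPEC.match(item)
        if match is None:
            raise ValueError(f"unsupported dependency spec: {item!r}")
        deps.append((match.group(1), match.group(2)))
    return deps


def version_key(version):
    """Turn '1.2.3' or '1.2-3' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def in_degree(declared):
    """Count how many packages declare a dependency on each package."""
    counts = Counter()
    for field in declared.values():
        for name, _ in parse_field(field):
            counts[name] += 1
    return counts


def unmet_minimums(declared, available):
    """Return (package, dependency, required, available) where the minimum is not met."""
    problems = []
    for pkg, field in declared.items():
        for name, minimum in parse_field(field):
            if minimum is None:
                continue
            if version_key(available[name]) < version_key(minimum):
                problems.append((pkg, name, minimum, available[name]))
    return problems


if __name__ == "__main__":
    print(in_degree(DECLARED).most_common())
    print(unmet_minimums(DECLARED, AVAILABLE))
```

In this toy data, `pkgC` and `pkgD` each have two dependents, and `pkgB` asks for a `pkgC` version newer than the one available. A real analysis would read the dependency fields from CRAN metadata rather than from hand-written dictionaries.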

CLC Number:

  • TP311