Computer Science, 2023, Vol. 50, Issue (6A): 220600221-8. doi: 10.11896/jsjkx.220600221

• Software & Interdisciplinary •

Empirical Study on Application and Maintenance of Community Documentation in Open Source Software

ZHANG Yu1, WANG Zhe2, LI Zhixing1, YU Yue1, WANG Tao1, CAI Mengluan1

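  1 College of Computer Science and Technology, National University of Defense Technology, Changsha 410000;
    2 School of Public Policy and Management, Tsinghua University, Beijing 100084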
  • Publication date: 2023-06-10; Release date: 2023-06-12
  • Corresponding author: LI Zhixing (lizhixing15@nudt.edu.cn)
  • About author: ZHANG Yu (zhangyu_@nudt.edu.cn)
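  • Funding:
    Research on Open Source Ecosystem Construction, Governance and Security Assessment of Ubiquitous Operating System Based on Crowd Intelligence Paradigm (62141209)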

Empirical Study on Application and Maintenance of OSS Community Profile Documentation

ZHANG Yu1, WANG Zhe2, LI Zhixing1, YU Yue1, WANG Tao1, CAI Mengluan1   

  1 College of Computer Science and Technology, National University of Defense Technology, Changsha 410000, China;
    2 School of Public Policy and Management, Tsinghua University, Beijing 100084, China
  • Online: 2023-06-10 Published: 2023-06-12
  • About author: ZHANG Yu, born in 1995, postgraduate. Her main research interests include software engineering, data mining, and knowledge graphs in open source communities. LI Zhixing, born in 1992, assistant professor. His main research interests include software engineering, data mining, and knowledge discovery in open source communities.
  • Supported by:
    Research on Open Source Ecosystem Construction, Governance and Security Assessment of Ubiquitous Operating System Based on Crowd Intelligence Paradigm (62141209).

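Abstract: Community documentation is important to the development and management of open source software. Some studies have made a preliminary analysis of its content, but two questions remain largely unstudied: how community documentation is actually used in open source software, and how developers maintain it. To fill this gap, this paper uses quantitative analysis to examine the current state of its use and maintenance. We first randomly collect community documentation data from nearly 2000 open source projects hosted on GitHub. We then analyze how several factors affect documentation adoption: programming language, project owner type, project age, and community size. We also analyze the maintenance practices of community documentation in terms of document location, creation latency, maintainers, update frequency, and update reasons. The results show that README and LICENSE documents are more widely adopted, and adopted earlier, than CONTRIBUTING, CONDUCT, and TEMPLATE documents. Community documentation is also more common in TypeScript projects, projects with large communities, and projects owned by organizations. Most community documents sit in the project's root directory. A small subset of developers updates them infrequently, mainly to meet perfective and adaptive needs. This study helps developers and users of open source software better understand how community documentation is adopted and maintained. The findings can also guide the healthy development of open source communities.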
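The abstract describes measuring how widely each type of community document is adopted within groups of projects (for example by programming language or owner type), and where the documents are placed. The minimal Python sketch below shows that kind of counting. It assumes each repository is represented only by its list of file paths.

Several parts of it are illustrative assumptions, not the paper's stated criteria:
- the filename prefixes;
- the rule for detecting issue and pull request templates;
- the restriction to the repository root, `docs/` and `.github/`, which are the folders GitHub's documentation mentions for files such as README and CONTRIBUTING.

`classify` and `prevalence` are hypothetical helper names.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical matching rules (assumption): lowercase filename-stem prefixes per type.
DOC_PREFIXES = {
    "README": ("readme",),
    "LICENSE": ("license", "licence", "copying"),
    "CONTRIBUTING": ("contributing",),
    "CONDUCT": ("code_of_conduct", "code-of-conduct"),
}
TEMPLATE_NAMES = {"issue_template", "pull_request_template"}
DOC_TYPES = ("README", "LICENSE", "CONTRIBUTING", "CONDUCT", "TEMPLATE")
# Assumed search locations: repository root, docs/ and .github/.
ALLOWED_LOCATIONS = {"root", "docs", ".github"}


def classify(path):
    """Map a repo-relative path to (doc_type, location), or None if not a community document."""
    p = PurePosixPath(path)
    parts = [s.lower() for s in p.parts]
    location = "root" if len(parts) == 1 else parts[0]
    if location not in ALLOWED_LOCATIONS:
        return None
    stem = p.stem.lower()  # "README.md" -> "readme"
    # Templates may be single files or files inside an ISSUE_TEMPLATE-style folder.
    if stem in TEMPLATE_NAMES or any(d in TEMPLATE_NAMES for d in parts[:-1]):
        return ("TEMPLATE", location)
    for doc_type, prefixes in DOC_PREFIXES.items():
        if stem.startswith(prefixes):  # str.startswith accepts a tuple of prefixes
            return (doc_type, location)
    return None


def prevalence(repos):
    """repos: iterable of (group, paths); group is e.g. a language or owner type.
    Returns {group: {doc_type: share of the group's repos containing that type}}."""
    totals = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for group, paths in repos:
        totals[group] += 1
        found = {hit[0] for hit in map(classify, paths) if hit}  # count each type once per repo
        for doc_type in found:
            counts[group][doc_type] += 1
    return {g: {d: counts[g][d] / totals[g] for d in DOC_TYPES} for g in totals}


if __name__ == "__main__":
    repos = [
        ("TypeScript", ["README.md", "LICENSE", ".github/ISSUE_TEMPLATE/bug.md"]),
        ("TypeScript", ["readme.rst", "src/index.ts"]),
        ("Go", ["docs/CONTRIBUTING.md", "src/README.md"]),
    ]
    print(prevalence(repos)["TypeScript"]["LICENSE"])  # 0.5
```

Each document type is counted at most once per repository, so the result is a share of repositories, not a count of files. Nested files such as `src/README.md` are ignored under the location assumption.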

Keywords: open source software, community documentation, documentation prevalence, documentation maintenance, GitHub

Abstract: Community profile documentation is crucial for the establishment and management of open source software (OSS) communities. Prior research has analyzed the content of community profile documentation, but little is known about how common it is in practice or how OSS practitioners maintain it. We aim to complement the current understanding of community profile documentation with a quantitative description of its prevalence and maintenance. We randomly collect 2000 OSS projects from GitHub. Using this sample, we study documentation popularity by programming language, repository owner type, repository age, and community size. We also investigate the maintenance practice of community profile documentation in terms of location, creation latency, maintainers, update frequency, and change-triggering events. In GitHub OSS projects, README and LICENSE documentation is far more popular than CONTRIBUTING, CONDUCT, and TEMPLATE documentation, and it is created earlier. Community profile documentation is more likely to be found in TypeScript repositories, in repositories with larger communities, and in repositories owned by organizations. It is mainly placed in the root directory and changed by a small group of developers. Updates are infrequent and mostly driven by perfective and adaptive requirements.
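Three of the maintenance measures named in the abstract can be derived from a repository's commit history: creation latency, maintainers, and update frequency. The Python sketch below shows one way to compute them for a single document. It makes these assumptions:
- a hypothetical `Commit` record, with one entry per file touched by a commit;
- renames, moves and deletions are ignored;
- every commit after the one that created the file counts as an update.

These simplifications are for illustration only and are not the paper's procedure. Classifying the reasons for a change (for example perfective or adaptive) is not covered.

```python
from dataclasses import dataclass
from datetime import datetime

SECONDS_PER_YEAR = 365.25 * 24 * 3600


@dataclass(frozen=True)
class Commit:
    """Hypothetical minimal record: one commit touching one file path."""
    when: datetime
    author: str
    path: str


def maintenance_metrics(repo_created, commits, doc_path, now):
    """Creation latency, maintainer count/share and update rate for one document."""
    touches = sorted((c for c in commits if c.path == doc_path), key=lambda c: c.when)
    if not touches:
        return None  # the document never appears in this history
    created = touches[0].when
    alive_years = (now - created).total_seconds() / SECONDS_PER_YEAR
    updates = len(touches) - 1  # commits after the one that created the file
    doc_authors = {c.author for c in touches}
    all_authors = {c.author for c in commits}  # everyone who touched any file
    return {
        "creation_latency_days": (created - repo_created).total_seconds() / 86400,
        "maintainers": len(doc_authors),
        "maintainer_share": len(doc_authors) / len(all_authors),
        "updates_per_year": updates / alive_years if alive_years > 0 else 0.0,
    }


if __name__ == "__main__":
    history = [
        Commit(datetime(2020, 1, 11), "alice", "README.md"),
        Commit(datetime(2020, 3, 1), "dave", "src/app.py"),
        Commit(datetime(2020, 6, 1), "carol", "src/app.py"),
        Commit(datetime(2021, 1, 11), "bob", "README.md"),
    ]
    m = maintenance_metrics(datetime(2020, 1, 1), history, "README.md", datetime(2022, 1, 11))
    print(m["creation_latency_days"], m["maintainers"], m["maintainer_share"])  # 10.0 2 0.5
```

The update rate is normalized by the document's own lifetime rather than by the repository's age. This is a design choice: normalizing by repository age would understate the update rate of documents added late.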

Key words: Open source software, Community profile documentation, Documentation prevalence, Documentation maintenance, GitHub

CLC number: TP311