Computer Science ›› 2025, Vol. 52 ›› Issue (7): 26-36. doi: 10.11896/jsjkx.250200108

• Computer Software •

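Sub-community Detection and Evaluation in Open Source Projects: A Case Study of Apache IoTDB

WANG Weiwei1, LE Yang2, WANG Yankai1

  1 School of Software, Tsinghua University, Beijing 100084
  2 School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074
  • Received: 2025-02-26 Revised: 2025-06-12 Published: 2025-07-17
  • Corresponding author: WANG Weiwei (gongchengphd@163.com)
  • Funding:
    Chongqing Innovative Technology and Application Development Special Major Project (CSTB2023TIAD-STX0034)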

Sub-community Detection and Evaluation in Open Source Projects: An Example of Apache IoTDB

WANG Weiwei1, LE Yang2, WANG Yankai1   

  1 School of Software, Tsinghua University, Beijing 100084, China
  2 Department of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2025-02-26 Revised: 2025-06-12 Published: 2025-07-17
  • About author: WANG Weiwei, born in 1979, holds a master's degree. His main research interests include open source and industrial software.
  • Supported by:
    Chongqing Innovative Technology and Application Development Special Major Project (CSTB2023TIAD-STX0034).

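Abstract: As open-source collaboration becomes a widely adopted paradigm for software development, open-source projects are growing in scale and structural complexity. How to ensure the quality of large, complex software under the open-source collaboration model has become a pressing problem. In current open-source community practice, a project's community is usually treated as a single whole, which is at odds with the modular design of complex software. Focusing on the phenomenon of sub-communities in open-source projects, this paper analyzes code commit records and file change histories, models developers and code files as a graph, and proposes a sub-community detection algorithm based on developers and their code modification records. By introducing an intra-community participation coefficient and an inter-community participation coefficient, it builds a core developer identification model that gives project managers a quantitative tool for assessing developer contribution and collaboration importance. It also designs a sub-community scoring method that jointly considers module concentration and dispersion, to evaluate the quality of each sub-community's work during module development. An empirical analysis of the Apache IoTDB project mines 11,523 commits from 282 developers, constructs a collaboration network, and identifies 4 sub-communities with distinctive characteristics. The experimental results show that both the identified core developers and the code quality scores of the sub-communities are consistent with the actual state of development, verifying the effectiveness of the proposed model and method in open-source projects.

Keywords: Open-source project, Collaboration network, Sub-community detection, Core developers, Apache IoTDB, Code quality evaluation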

Abstract: As open-source collaboration has become a widely adopted paradigm in software development, the scale and structure of open-source projects have grown increasingly complex. Within the open-source collaboration model, ensuring software quality in large and intricate software systems has emerged as a critical issue. In the existing operational models of open-source communities, a project's community is often treated as a single entity, which contradicts the modular design principles of complex software. This study focuses on the phenomenon of sub-communities within open-source projects. Based on an analysis of code commit records and file change histories, a graph structure is constructed to model the relationships between developers and code files, upon which a sub-community detection algorithm is proposed that leverages developer activity and code modification records. By introducing intra-community and inter-community participation coefficients, this paper establishes a core developer identification model, providing project managers with a quantitative tool for assessing developer contributions and collaboration significance. Additionally, it designs a sub-community scoring method that considers both module concentration and dispersion to evaluate the quality of different sub-communities' work in the module development process. An empirical analysis is conducted using the Apache IoTDB project as a case study. By mining 11,523 commit records from 282 developers, it constructs a collaboration network and identifies four sub-communities with distinct characteristics. The experimental results indicate that the core developer identification outcomes and the code quality evaluation scores of each sub-community align with actual development conditions, validating the effectiveness of the proposed models and methods in open-source projects.

Key words: Open-source project, Collaboration network, Sub-community detection, Core developers, Apache IoTDB, Code quality evaluation
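
The pipeline outlined in the abstract (commits to a developer-file graph, then sub-communities, then participation coefficients) can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the detection method or the coefficient definitions, so the sketch assumes a simple weighted label propagation as a stand-in for community detection, and assumes the intra- and inter-community coefficients are the shares of a developer's collaboration weight inside and outside their own sub-community. All function names, developer names, and file paths are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_collaboration_graph(commits):
    """Project (developer, files) records onto a weighted developer graph.

    Two developers are linked once per file that both have modified.
    """
    devs_by_file = defaultdict(set)
    for dev, files in commits:
        for path in files:
            devs_by_file[path].add(dev)
    graph = defaultdict(lambda: defaultdict(int))
    for devs in devs_by_file.values():
        for a, b in combinations(sorted(devs), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {dev: dict(nbrs) for dev, nbrs in graph.items()}


def label_propagation(graph, max_rounds=20):
    """Assumed stand-in for sub-community detection (not the paper's method).

    Each developer adopts the label with the largest total edge weight among
    neighbours; ties go to the alphabetically smallest label.
    """
    labels = {dev: dev for dev in graph}
    for _ in range(max_rounds):
        changed = False
        for dev in sorted(graph):
            score = defaultdict(int)
            for nbr, weight in graph[dev].items():
                score[labels[nbr]] += weight
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[dev]:
                labels[dev] = best
                changed = True
        if not changed:
            break
    return labels


def participation(graph, labels):
    """Assumed definition: (intra, inter) shares of collaboration weight."""
    result = {}
    for dev, nbrs in graph.items():
        total = sum(nbrs.values())
        inside = sum(w for n, w in nbrs.items() if labels[n] == labels[dev])
        intra = inside / total
        result[dev] = (intra, 1 - intra)
    return result


# Hypothetical commit history: two modules with one bridging developer.
commits = [
    ("alice", ["storage/a.java", "storage/b.java"]),
    ("bob", ["storage/a.java", "storage/b.java"]),
    ("carol", ["storage/b.java", "query/q.java"]),
    ("dave", ["query/q.java", "query/p.java"]),
    ("erin", ["query/q.java", "query/p.java"]),
]
graph = build_collaboration_graph(commits)
labels = label_propagation(graph)
scores = participation(graph, labels)
```

In this toy history, a developer who bridges two modules ends up with a high inter-community share, which is the kind of signal a core developer model could use. A real analysis would more likely rely on an established community detection method and on the coefficient definitions given in the full paper.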

CLC Number:

  • TP311