Computer Science, 2025, Vol. 52, Issue (7): 26-36. doi: 10.11896/jsjkx.250200108

• Computer Software •

Sub-community Detection and Evaluation in Open Source Projects: An Example of Apache IoTDB

WANG Weiwei¹, LE Yang², WANG Yankai¹

  1 School of Software, Tsinghua University, Beijing 100084, China
  2 Department of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2025-02-26  Revised: 2025-06-12  Published: 2025-07-17
  • About author: WANG Weiwei, born in 1979, holds a master's degree. His main research interests include open source and industrial software.
  • Supported by: Chongqing Innovative Technology and Application Development Special Major Project (CSTB2023TIAD-STX0034).

Abstract: As open-source collaboration has become a widely adopted paradigm in software development, the scale and structure of open-source projects have grown increasingly complex. Within the open-source collaboration model, ensuring software quality in large and intricate software systems has emerged as a critical issue. Existing operational models of open-source communities often treat a project's community as a single entity, which contradicts the modular design principles of complex software. This study focuses on the phenomenon of sub-communities within open-source projects. Based on an analysis of code commit records and file change histories, a graph is constructed to model the relationships between developers and code files, and on this graph a sub-community detection algorithm that leverages developer activity and code modification records is proposed. By introducing intra-community and inter-community participation coefficients, this paper establishes a core developer identification model that gives project managers a quantitative tool for assessing developers' contributions and the significance of their collaboration. Additionally, it designs a sub-community scoring method that considers both modular concentration and dispersion to evaluate the quality performance of different sub-communities in the module development process. An empirical analysis is conducted using the Apache IoTDB project as a case study. By mining 11,523 commit records from 282 developers, the study constructs a collaboration network and identifies four sub-communities with clearly distinguishable characteristics. The experimental results indicate that the core developer identification results and the code quality evaluation scores of each sub-community align with actual development conditions, validating the effectiveness of the proposed models and methods in open-source projects.
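The abstract does not give the paper's formulas, so what follows is only a minimal Python sketch of the general pipeline under explicit assumptions. Commit records are taken as hypothetical `(developer, files)` pairs; two developers are linked with a weight equal to the number of distinct files both have modified (an assumed weighting); sub-communities come from a simple deterministic label-propagation pass that stands in for the paper's own detection algorithm; and the intra- and inter-community coefficients are approximated by Guimerà and Amaral's within-community degree z-score and participation coefficient, P = 1 - Σ_s (k_is / k_i)². All function names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev


def build_coedit_graph(commits):
    """Project (developer, files) records onto a weighted developer graph.

    Edge weight = number of distinct files both developers modified
    (hypothetical weighting; the paper's scheme may differ).
    """
    files_by_dev = defaultdict(set)
    for dev, files in commits:
        files_by_dev[dev].update(files)
    graph = {dev: {} for dev in files_by_dev}
    for a, b in combinations(sorted(files_by_dev), 2):
        shared = len(files_by_dev[a] & files_by_dev[b])
        if shared:
            graph[a][b] = graph[b][a] = shared
    return graph


def label_propagation(graph, max_iter=100):
    """Deterministic weighted label propagation (a stand-in detector).

    Nodes are visited in sorted order; each adopts the neighbour label with
    the largest total edge weight, breaking ties by the smallest label.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(int)
            for nbr, w in graph[node].items():
                scores[labels[nbr]] += w
            if not scores:  # an isolated developer keeps its own label
                continue
            best = max(scores.values())
            new = min(lab for lab, s in scores.items() if s == best)
            if new != labels[node]:
                labels[node], changed = new, True
        if not changed:
            break
    return labels


def participation(graph, labels):
    """Return {dev: (z, P)}: z-score of a developer's edge weight inside
    their own community, and participation coefficient across communities."""
    k_is = {n: defaultdict(int) for n in graph}  # weight toward each community
    for n, nbrs in graph.items():
        for m, w in nbrs.items():
            k_is[n][labels[m]] += w
    intra = {n: k_is[n][labels[n]] for n in graph}
    members = defaultdict(list)
    for n in graph:
        members[labels[n]].append(intra[n])
    roles = {}
    for n in graph:
        vals = members[labels[n]]
        sd = pstdev(vals)
        z = (intra[n] - mean(vals)) / sd if sd > 0 else 0.0
        k = sum(graph[n].values())
        p = (1.0 - sum((v / k) ** 2 for v in k_is[n].values())) if k else 0.0
        roles[n] = (z, p)
    return roles


# Hypothetical toy history: a "core" group and a "web" group that
# touch one common file, shared/util.py.
commits = [
    ("alice", ["core/a.py", "core/b.py"]),
    ("bob", ["core/a.py", "core/b.py"]),
    ("carol", ["core/a.py", "core/b.py", "shared/util.py"]),
    ("gina", ["core/a.py"]),
    ("dave", ["web/x.ts", "web/y.ts", "shared/util.py"]),
    ("erin", ["web/x.ts", "web/y.ts"]),
    ("frank", ["web/x.ts", "web/y.ts"]),
]
graph = build_coedit_graph(commits)
labels = label_propagation(graph)
roles = participation(graph, labels)
for dev, (z, p) in sorted(roles.items()):
    print(f"{dev:6s} community={labels[dev]:6s} z={z:+.2f} P={p:.2f}")
```

In this toy run the two groups separate into two sub-communities; carol and dave, the only developers who co-edit `shared/util.py`, are the only ones with P > 0, while gina, who touches a single core file, gets a negative z. How the paper itself combines the two coefficients into a core-developer score is not specified in the abstract.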

Key words: Open-source project, Collaboration network, Sub-community detection, Core developers, Apache IoTDB, Code quality evaluation

CLC Number: TP311