Computer Science ›› 2020, Vol. 47 ›› Issue (6): 51-58. doi: 10.11896/jsjkx.190300140

• Intelligent Software Engineering •

Analysis of Open Source Software Cliff Walls for Group Collaborative Development

HE Peng1,2, YU Lv-jun1

  1 School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
  2 Hubei Key Laboratory of Applied Mathematics, Wuhan 430062, China
  • Received: 2019-03-27 Online: 2020-06-15 Published: 2020-06-10
  • Corresponding author: HE Peng (penghe@hubu.edu.cn)
  • About author: HE Peng, born in 1988, Ph.D, associate professor, master supervisor, is a member of China Computer Federation. His main research interests include software measurement, software defect prediction, service recommendation and complex networks.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFB1003801), the National Natural Science Foundation of China (61902114), the Hubei Province Education Department Youth Talent Project (Q20171008) and the Open Fund of Hubei Key Laboratory of Applied Mathematics (HBAM201901).

Abstract: Because of their low entry threshold and high degree of freedom, open source software projects often suffer from slow progress, low efficiency and low quality during development. A software cliff wall, used as a criterion for judging project robustness, is a code contribution made within a short period of time that far exceeds normal incremental development, and it is a potential threat to sustainable development during software evolution. Analyzing the causes of software cliff walls is therefore an effective way to study the development process of open source projects in depth, to describe software evolution more accurately, and thus to improve development efficiency. The experiment takes 9 open source projects on GitHub, each spanning at least 5 years, as research objects. Using monthly and quarterly periods respectively, it constructs developer collaboration networks (DCNs) from the follow and comment information of developers on more than 150 000 commits, and treats a single commit of more than 10 000 lines of code as a software cliff wall. Nine metrics (number of nodes, number of edges, node update rate, modularity, average path length, average degree, node in-degree index, mean node out-degree, and diversity) are then used to analyze the relationship between DCNs and software cliff walls from three perspectives: network scale, network structure and network quality. The results show that: 1) cliff walls tend to form when the development team is small and its membership turns over heavily; 2) maintaining good 'small world' characteristics among developers helps to avoid cliff walls; 3) a quarterly period is more suitable than a monthly one for analyzing the relationship between DCNs and software cliff walls, and diversity in the organizational origins of team members also promotes the emergence of cliff walls to some extent.

Key words: Developer collaboration network, Group collaborative development, Open source software, Software cliff walls, Software evolution
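
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input shapes (a list of `(date, lines_changed)` pairs and a list of directed `(source, target)` developer edges), the function names, and the exact metric definitions (node update rate as the share of nodes absent from the previous period, average path length over reachable pairs in the undirected view) are all assumptions made for illustration. Only the 10 000-line threshold and the monthly/quarterly periods come from the abstract.

```python
from collections import defaultdict, deque
from datetime import date

CLIFF_THRESHOLD = 10_000  # lines in a single commit, as stated in the abstract


def period_key(day, granularity):
    """Map a date to a (year, month) or (year, quarter) bucket."""
    if granularity == "month":
        return (day.year, day.month)
    return (day.year, (day.month - 1) // 3 + 1)


def cliffs_by_period(commits, granularity="quarter"):
    """Count commits above the threshold in each period (assumed input shape)."""
    counts = defaultdict(int)
    for day, lines_changed in commits:
        if lines_changed > CLIFF_THRESHOLD:
            counts[period_key(day, granularity)] += 1
    return dict(counts)


def dcn_metrics(edges, previous_nodes=frozenset()):
    """A few scale and structure metrics for one period's directed network."""
    edge_set = {(u, v) for u, v in edges if u != v}  # drop self-loops and duplicates
    nodes = {n for edge in edge_set for n in edge}
    neighbours = defaultdict(set)  # undirected view, used only for path lengths
    for u, v in edge_set:
        neighbours[u].add(v)
        neighbours[v].add(u)

    total_distance, reachable_pairs = 0, 0
    for start in nodes:  # breadth-first search from every node
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total_distance += sum(dist.values())
        reachable_pairs += len(dist) - 1

    n = len(nodes)
    return {
        "nodes": n,
        "edges": len(edge_set),
        "mean_out_degree": len(edge_set) / n if n else 0.0,
        # Assumed definition: share of this period's nodes that are new.
        "update_rate": len(nodes - set(previous_nodes)) / n if n else 0.0,
        "avg_path_length": total_distance / reachable_pairs if reachable_pairs else 0.0,
    }
```

Calling `cliffs_by_period` once with `"month"` and once with `"quarter"` on the same commit list gives the two period views the study compares, and `dcn_metrics` can be applied to the edges observed in each period, passing the previous period's node set to obtain the update rate. Modularity, the in-degree index and diversity are left out because the abstract does not define them.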

CLC Number:

  • TP301